Microsoft C/C++ Optimising Compiler version 15.00.30729.01 for x64. The test project was compiled with /O2 /Ob2 /Oi /Ot /Oy /GL.
Test Code 1
Code:
#include <stdio.h>

inline int Alpha(int a, int b, int c, int d, int e)
{
    int la = a;
    int lb = b;
    int lc = c;
    int ld = d;
    int le = e;
    return la + lb + lc + ld + le;
}

void main()
{
    int lAlpha;
    lAlpha = Alpha(1, 2, 3, 4, 5);
    printf("%d", lAlpha);
}

Compiled Code 1
Code:
void main() {
000000013FAC1000 sub rsp,28h
    int lAlpha;
    lAlpha = Alpha(1, 2, 3, 4, 5);
    printf("%d", lAlpha);
000000013FAC1004 lea rcx,[string "%d" (13FAC21B0h)]
000000013FAC100B mov edx,0Fh
000000013FAC1010 call qword ptr [__imp_printf (13FAC2130h)]
}
000000013FAC1016 xor eax,eax
000000013FAC1018 add rsp,28h
000000013FAC101C ret

Analysis 1
The compiled code above shows that Microsoft C/C++ Optimising Compiler 15 evaluates the call at compile time.
Code:
000000013FAC100B mov edx,0Fh

The inline function Alpha is never executed at run time. Its result, 1 + 2 + 3 + 4 + 5 = 15 (0Fh), is calculated at compile time and passed to printf as an immediate value in EDX. The execution time of Alpha is therefore ZERO.
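The folding described above can also be made visible in the source. The following is only a sketch and assumes a C++11 or later compiler (the compiler tested in this post predates `constexpr`, so this is not the code that was compiled). `AlphaFolded` and `PrintAlpha` are hypothetical names introduced for the illustration.

```cpp
#include <cstdio>

// Same arithmetic as Alpha in Test Code 1, written as a single return
// statement so it is a valid constexpr function under C++11 rules.
constexpr int Alpha(int a, int b, int c, int d, int e)
{
    return a + b + c + d + e;
}

// Checked by the compiler: the build fails unless the sum is 15 (0Fh).
static_assert(Alpha(1, 2, 3, 4, 5) == 0x0F, "Alpha(1, 2, 3, 4, 5) must fold to 0Fh");

// Hypothetical constant: the value that ends up as the immediate operand.
constexpr int AlphaFolded = Alpha(1, 2, 3, 4, 5);

// Hypothetical helper: roughly what the call site becomes after inlining
// and constant folding, i.e. printf("%d", 15).
inline void PrintAlpha()
{
    std::printf("%d", AlphaFolded);
}
```

With `constexpr` the folding is guaranteed by the language wherever a constant expression is required, whereas with a plain inline function it depends on the optimiser, as in the test above.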
Test Code 2
Code:
unsigned __int8 SendBuffer[256];
unsigned __int32 SendBufferPointer = 0;

void Write8(unsigned __int8 Data)
{
    SendBuffer[SendBufferPointer] = Data;
    SendBufferPointer++;
}

void Write16(unsigned __int16 Data)
{
    *(unsigned __int16 *)&SendBuffer[SendBufferPointer] = Data;
    SendBufferPointer += 2;
}

void main()
{
    Write8(0x30);
    Write16(0x5050);
}

Compiled Code 2
Code:
void main() {
    Write8(0x30);
000000013FC51000 mov ecx,dword ptr [SendBufferPointer (13FC537E4h)]
000000013FC51006 lea r8,[SendBuffer (13FC536E0h)]
    Write16(0x5050);
000000013FC5100D mov edx,5050h
000000013FC51012 mov byte ptr [rcx+r8],30h
000000013FC51017 inc ecx
}
000000013FC51019 xor eax,eax
000000013FC5101B mov word ptr [rcx+r8],dx
000000013FC51020 add ecx,2
000000013FC51023 mov dword ptr [SendBufferPointer (13FC537E4h)],ecx
000000013FC51029 ret

Analysis 2
In this case, the two functions are expanded inline and then optimised together.
000000013FC51000 loads the value of SendBufferPointer into ECX (on x64, writing a 32-bit register zero-extends into the full register, so the high 32 bits of RCX are zero).
000000013FC51006 loads the address of SendBuffer[0] into R8. Therefore, ECX is the index register and R8 is the base register.
000000013FC5100D sets EDX to the Write16 value, 5050h (this instruction could equally be placed after 000000013FC51012).
000000013FC51012 stores 30h, the Write8 value, at RCX+R8 (SendBuffer[SendBufferPointer]).
000000013FC51017 increments the index register (corresponds to SendBufferPointer++).
000000013FC51019 sets the return value to zero (it has no effect on the writes; it is there only because the procedure is main).
000000013FC5101B stores DX (EDX was set to the Write16 value at 000000013FC5100D) at RCX+R8. RCX was incremented at 000000013FC51017, so the target is SendBuffer[SendBufferPointer] after SendBufferPointer++.
000000013FC51020 adds two to ECX and 000000013FC51023 stores the result back to SendBufferPointer, which is now 3.
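The instruction sequence walked through above corresponds roughly to the hand-merged function below. This is a sketch, not compiler output. `WriteBoth` is a hypothetical name, the fixed-width types come from `<cstdint>` instead of the `__int8`-style keywords used in the test, `memcpy` stands in for the pointer cast so the unaligned 16-bit store is well defined in portable C++, and the `assert` is a bounds check added for the sketch only (the test code has none).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

std::uint8_t  SendBuffer[256];
std::uint32_t SendBufferPointer = 0;

// Hypothetical hand-merged equivalent of: Write8(0x30); Write16(0x5050);
inline void WriteBoth()
{
    std::uint32_t index = SendBufferPointer;          // mov ecx,[SendBufferPointer]
    std::uint8_t *base = SendBuffer;                  // lea r8,[SendBuffer]
    const std::uint16_t word = 0x5050;                // mov edx,5050h

    assert(index + 3 <= sizeof(SendBuffer));          // sketch-only bounds check

    base[index] = 0x30;                               // mov byte ptr [rcx+r8],30h
    index++;                                          // inc ecx
    std::memcpy(base + index, &word, sizeof(word));   // mov word ptr [rcx+r8],dx
    index += 2;                                       // add ecx,2
    SendBufferPointer = index;                        // mov [SendBufferPointer],ecx
}
```

The point to notice is that SendBufferPointer is read from memory once and written back once, even though the source updates it in two separate functions.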
This is quite well optimised machine code. With the compiler set to maximum optimisation, the generated code is very close to optimal.
Test Code 3
Code:
#include <stdio.h>

unsigned __int8 ReceiveBuffer[256];
unsigned __int32 ReceiveBufferPointer = 0;

unsigned __int8 Read8()
{
    unsigned __int8 ReturnValue = ReceiveBuffer[ReceiveBufferPointer];
    ReceiveBufferPointer++;
    return ReturnValue;
}

unsigned __int16 Read16()
{
    unsigned __int16 ReturnValue = *(unsigned __int16 *)&ReceiveBuffer[ReceiveBufferPointer];
    ReceiveBufferPointer += 2;
    return ReturnValue;
}

void main()
{
    unsigned __int8 Dummy1 = Read8();
    unsigned __int16 Dummy2 = Read16();
    printf("%x %x", Dummy1, Dummy2);
}

Compiled Code 3
Code:
void main() {
000000013F0C1000 sub rsp,28h
    unsigned __int8 Dummy1 = Read8();
000000013F0C1004 mov edx,dword ptr [ReceiveBufferPointer (13F0C37E4h)]
000000013F0C100A lea r8,[ReceiveBuffer (13F0C3020h)]
000000013F0C1011 movzx ecx,byte ptr [rdx+r8]
000000013F0C1016 inc edx
    unsigned __int16 Dummy2 = Read16();
000000013F0C1018 mov eax,edx
000000013F0C101A add edx,2
    printf("%x %x", Dummy1, Dummy2);
000000013F0C101D movzx r8d,word ptr [rax+r8]
000000013F0C1022 mov dword ptr [ReceiveBufferPointer (13F0C37E4h)],edx
000000013F0C1028 mov edx,ecx
000000013F0C102A lea rcx,[string "%x %x" (13F0C21B0h)]
000000013F0C1031 call qword ptr [__imp_printf (13F0C2130h)]
}
000000013F0C1037 xor eax,eax
000000013F0C1039 add rsp,28h
000000013F0C103D ret

Analysis 3
This is almost the same as Analysis 2.
000000013F0C1004 loads the value of ReceiveBufferPointer into EDX and 000000013F0C100A loads the address of ReceiveBuffer[0] into R8.
000000013F0C1011 loads the byte at ReceiveBuffer[ReceiveBufferPointer] into ECX, zero-extended, and 000000013F0C1016 increments EDX (corresponds to ReceiveBufferPointer++).
000000013F0C1018 copies EDX to EAX because EDX is about to change (strictly, this copy is unnecessary).
000000013F0C101A adds two to EDX and 000000013F0C101D loads the 16-bit word at ReceiveBuffer[ReceiveBufferPointer+1], where ReceiveBufferPointer still has its original value, into R8D through the index saved in RAX. Therefore ECX holds the Read8 value and R8D holds the Read16 value.
000000013F0C1022 stores EDX back to ReceiveBufferPointer, so ReceiveBufferPointer is now the original value plus 3.
000000013F0C1028 moves ECX (the Read8 value) into EDX, the second argument register, 000000013F0C102A loads the address of the format string into RCX, the first argument register, and 000000013F0C1031 calls printf. R8D already holds the third argument.
Although this is not bad optimisation, it could be optimised further if it were written by a human being. Considering that the code is generated by pattern-based algorithms, I would still say it is well optimised.
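As an illustration of what a hand-written version of this read sequence might look like, here is a sketch that reads both values through one index and stores the pointer back once, without the extra register copy. `ReadPair` and `ReadBoth` are hypothetical names, the `<cstdint>` types replace the `__int8`-style keywords, `memcpy` replaces the pointer cast so the unaligned read is well defined in portable C++, and the `assert` is a sketch-only bounds check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

std::uint8_t  ReceiveBuffer[256];
std::uint32_t ReceiveBufferPointer = 0;

// Hypothetical result type holding both values read in one pass.
struct ReadPair
{
    std::uint8_t  first;   // value Read8 would return
    std::uint16_t second;  // value Read16 would return
};

// Hypothetical hand-merged equivalent of: Read8(); Read16();
inline ReadPair ReadBoth()
{
    const std::uint32_t index = ReceiveBufferPointer;   // single load of the pointer
    assert(index + 3 <= sizeof(ReceiveBuffer));         // sketch-only bounds check

    ReadPair result;
    result.first = ReceiveBuffer[index];                // byte at the original index
    std::memcpy(&result.second,                         // 16-bit word at index + 1
                ReceiveBuffer + index + 1,
                sizeof(result.second));

    ReceiveBufferPointer = index + 3;                   // single store of the pointer
    return result;
}
```

Whether a compiler produces exactly this shape depends on the compiler and its settings; the sketch only shows the data flow described in the analysis.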
Conclusion
We can use inline functions (or, if function expansion optimisation is enabled, functions that the compiler inlines automatically) to improve program performance, so Write/Read operations can still be used without loss of performance. However, the TitanMS style of processing creates a new Reader/Writer object and processes the data through that object, which is really inefficient. For efficiency, those parts should be rewritten without creating Reader/Writer objects. This applies particularly to Java/C#, where this kind of optimisation cannot be relied on, so the object-based method must not be used there.
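The contrast drawn in this conclusion might look like the following sketch. Nothing here is taken from TitanMS: `PacketWriterObject`, `PacketBuffer`, `PacketPosition`, `PutByte` and `PutWord` are all hypothetical names, and C++11 is assumed for `std::vector::data`. The first half builds each packet in an object that owns heap storage; the second half writes into a fixed buffer through small inline functions, as in Test Code 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical object-based writer: every instance owns a heap buffer that
// grows as data is appended.
class PacketWriterObject
{
public:
    void Write8(std::uint8_t data) { bytes_.push_back(data); }

    void Write16(std::uint16_t data)
    {
        std::uint8_t raw[sizeof(data)];
        std::memcpy(raw, &data, sizeof(data));           // native byte order
        bytes_.insert(bytes_.end(), raw, raw + sizeof(raw));
    }

    std::size_t Size() const { return bytes_.size(); }
    const std::uint8_t *Data() const { return bytes_.data(); }

private:
    std::vector<std::uint8_t> bytes_;
};

// Object-free alternative: a fixed buffer plus inline functions that the
// compiler can expand and merge at the call site.
std::uint8_t  PacketBuffer[256];
std::uint32_t PacketPosition = 0;

inline void PutByte(std::uint8_t data)
{
    assert(PacketPosition + 1 <= sizeof(PacketBuffer));  // sketch-only bounds check
    PacketBuffer[PacketPosition] = data;
    PacketPosition++;
}

inline void PutWord(std::uint16_t data)
{
    assert(PacketPosition + 2 <= sizeof(PacketBuffer));  // sketch-only bounds check
    std::memcpy(PacketBuffer + PacketPosition, &data, sizeof(data));
    PacketPosition += 2;
}
```

Both halves produce the same bytes for the same sequence of writes. The difference is in what the compiler has to work with: the second form is the shape analysed in this post, while the first involves allocation and container bookkeeping for each writer that is created.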
Written by Stephanos Yeo, Copyright




