C++ - extremely strange machine code behaviour

Go To StackoverFlow.com

7

The full code is here: http://pastebin.com/MM3vWmqA

In the function fast_generator I've added comments to two statements. If you switch those statements, the code will run ~1.8x faster. If you remove the first statement the code will perform faster than the original version, but slower compared to if they were switched.

The test cases should be the following.

First - slowest. 452ms.

counter++;
i--;

Second - faster than the first one. 280ms.

i--;
counter++;

Third - faster than the first, but slower than the second one. 421ms.

i--;

The assembler output for the original statements is.

inc edx
mov eax, 6

I have verified that when switching those statements the assembler output stays the same, with the only difference of these asm instructions being interchanged.

I've tested it with VC++10 and VC++11, same behaviour. Can someone explain why switching these statements speeds up the algorithm ~1.8x? If you think that std::clock() is not accurate, change size = 7. On my machine the difference with size = 7 is 12000ms vs 7000ms.

2012-04-04 21:23
by NFRCR
List your compile options or it didn't happen. (applies to any discussion of performance - Ben Voigt 2012-04-04 21:29
Not related to your question, but change int main(void) to int main() in C++ - Jesse Good 2012-04-04 21:30
In this case, a description of the platform, including processor model, RAM speed and capacity, etc. might also help - Ben Voigt 2012-04-04 21:30
I'm not able to repro this on either x86 or x64 with VS2010. The timings are the same. What processor are you using - Mysticial 2012-04-04 21:30
@Mysticial: VS2010 ships with C++ compiler version 16.0 - Ben Voigt 2012-04-04 21:32
@BenVoigt I don't think it matters, I seem to getting the same assembly output: with the two instructions switched - Mysticial 2012-04-04 21:33
Jesse, (void) is perfectly fine, it's a matter of style.

I'm compiling it as x86. Processor: Intel E8400 @ 4.0GHz. RAM: 4GB @ 900MHz.

Compiler options: /analyze- /O2 /Gd /nologo /MT /Gm- /GL /Fa"Release\" /Oi /Oy- /Zc:forScope /Fo"Release\" /Gy /Fp"Release\C++test.pch" /WX- /errorReport:queue /GS /Fd"Release\vc110.pdb" /fp:precise /W4 /Zi /Zc:wchart /EHs - NFRCR 2012-04-04 21:34

Oh ok, that's a Core 2 generation processor. I was testing on a Nehalem - Mysticial 2012-04-04 21:35
Same CPU loads during all executions? If you are on linux, use time instea - BЈовић 2012-04-04 21:37
I'm on Windows, and I've executed each of the 3 variants tens of times - NFRCR 2012-04-04 21:38
FWIW, on Ubuntu 10.04, g++ 4.4.3, -O4 -g -Wall -Werror -ansi -pedantic, I get identical results, regardless of the order of the source code lines - Robᵩ 2012-04-04 21:39
OK - I am able to reproduce this on a Core 2 machine. + - Mysticial 2012-04-04 21:52
@NFRCR: In C, its perfectly fine, but in C++ it is considered poor style (here is a reference). And, also here is an SO question talking about it - Jesse Good 2012-04-04 22:37
There's something even funnier going on than switching the instructions. When I compiled case 2 with different offsets, the timings varied wildly by up the a factor of two. It looks like instruction alignment has an effect - Mysticial 2012-04-04 22:49
@Jesse: Are you going to "correct" his brace style too? That has about as much merit as your complaint. And don't tell me that it's poor style just because it's redundant, plenty of redundancies in C++ are considered good style -- explicitly stating private in class definitions, explicitly stating virtual for overrides, etc - Josh 2012-04-05 02:12
@Jesse: I mentioned braces as a reminder that people can disagree on style without either of them being wrong. You simply told him to use your preferred style instead of his, and that is wholly inappropriate. That he might be in the minority in his style preference is irrelevant - Josh 2012-04-05 11:17
Jesse, I know that C++ doesn't require you to specify void if the function takes no arguments. However I just happen to prefer specifying that. I think specifying something explicitly will not hurt. For example it's perfectly acceptable if someone decides to write signed long int, even though just long would have been good enough. At times I have to compatible with C code, and it's not worth making your code inconsistent by using () for pure C++ and (void) for the rest - NFRCR 2012-04-05 18:46
@NFRC: Fine, if you want to disagree with the creator of C++ and the entire C++ community, go ahead (fun fact: Did you know that the creators of C++ wanted to disallow f(void)? The only reason its allowed is to be backwards compatible with C. - Jesse Good 2012-04-05 20:52
Time to either take this to chat, or get back on the question topic - Robert Harvey 2012-04-05 21:03


3

Your slow examples are decrementing i immediately before using it to index the array at the beginning of the loop. Your fast example adds an intervening step. Without being privy to the internal architecture of the processor it's impossible to know for sure, but most likely what is happening is the processor already has buffer[i] in its pipeline, but the decrement invalidates that value, causing a pipeline stall. With the intervening step it has more time to recover the correct value before it is needed.

By the way, mov eax, 5 isn't the instruction that's doing the i--. It would be helpful to post more of the assembly context, for those of us without your compiler.

2012-04-04 22:13
by Karl Bielefeldt
Actually, the mov eax, 5 is correct. If you look at the code, i is always 7 upon entering the loop. So it equates to mov eax, 6 The OP shows 5, but probably he just forgot to change the constant - Mysticial 2012-04-04 22:17
This doesn't explain the behavior of the 3rd case. The counter++ seems to be increasing the speed - Mysticial 2012-04-04 22:20
mov eax, 5 is indeed for i--, but if size = 6. I've edited the original post so it says mov eax, 6 now - NFRCR 2012-04-04 22:26
Ads