This is a bit of code compiled with g++ and using <complex> from libstdc++, but the issue is an optimization one. Something Bad(tm) happens when complex arguments are passed. The function prolog misuses the registers.

The small test case in question (which I'll attach separately), when compiled with "arm-elf-g++ -O2", produces the output:

    Comparing (1,0) with (0,0)
    (1,0) != (0,0)!

After a lot of playing I found it could also be reproduced with "arm-elf-g++ -O1 -fcse-skip-blocks":

    Comparing (1,0) with (0,0)
    (1,0) != (0,0)!

If compiled with "arm-elf-g++ -O2 -fno-cse-skip-blocks", however, the problem does not disappear, so I don't suspect -fcse-skip-blocks itself; it probably just changes the optimizer state enough to expose the problem. Curiously though, the output isn't quite the same:

    Comparing (1,0) with (1,0)
    (1,0) != (0,0)!

Various other -fno-* options with -O2 can cause the problem to appear and disappear as well, so beware of using a different gcc from 3.3.3 - just because the problem doesn't appear doesn't mean it has been fixed! I tried a native Linux gcc 3.3.3 on the test case and it works, which may well mean the problem is ARM-specific.

Looking at the generated assembler (in the -O1 -fcse-skip-blocks example) I thought at first glance there was a problem in main:

    adr   r2, .L20
    ldmia r2, {r2-r3}
    adr   r4, .L20+8
    ldmia r4, {r4-r5}

where .L20 is:

    .L20:
        .word 1072693248
        .word 0
        .word 0
        .word 0

However, the plain -O1 code, which appears to run correctly, also has .L20 defined like that. When compiled with -O0, the code does the equivalent of loading both z1 and z2 from the first two words of .L20, which is more what I'd expect.

So maybe there are in fact two problems here? One where .L20 is defined as above, and another being a code generation issue.

NB I haven't been able to play with 3.4 due to problems with it on my target.

Let me know if you want more info.
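For readers without access to the attachment, here is a minimal sketch of the kind of test that would print output in the shape quoted above. It is not the attached test case: the name `compare`, the typedef `cplx`, and the choice to pass both values by value are assumptions made purely for illustration, based only on the remark that the failure involves passing complex arguments.

```cpp
#include <cassert>
#include <complex>
#include <sstream>
#include <string>

typedef std::complex<double> cplx;  // hypothetical shorthand, not from the attachment

// Hypothetical helper: receives both complex values by value and reports
// on them in the same format as the output quoted in the report.
std::string compare(cplx z1, cplx z2) {
    std::ostringstream out;
    out << "Comparing " << z1 << " with " << z2 << "\n";
    if (z1 != z2) {
        out << z1 << " != " << z2 << "!\n";
    }
    return out.str();
}

// On a correctly behaving compiler, two equal values produce only the
// "Comparing" line and no inequality message.
void self_check() {
    assert(compare(cplx(1.0, 0.0), cplx(1.0, 0.0)) ==
           "Comparing (1,0) with (1,0)\n");
}
```

With a sketch like this, an inequality line printed for two arguments that were passed in as equal would point at the argument passing or the comparison rather than at the stream formatting.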
Created attachment 6135: Test case showing failure
Actually forget what I said about .L20 etc. I've just realised that the assembler generated there is okay... it just slipped my mind that the complex will be 4 words long, two for each double.
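For anyone who wants to double-check that reading of .L20, the sketch below confirms two things on a host with IEEE 754 doubles: the constant 1072693248 is 0x3FF00000, the high 32 bits of the double 1.0, and a std::complex<double> occupies four 32-bit words. The helper names (`split_double`, `complex_double_words`, `check_l20_reading`) are made up for this illustration, and the order in which the two halves of a double sit in memory is target-dependent, so only the values are checked here, not their positions.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the two 32-bit halves of a 64-bit double.
struct Halves {
    std::uint32_t high;
    std::uint32_t low;
};

// Copy the bit pattern of a double into an integer and split it.
// Assumes an 8-byte IEEE 754 double.
Halves split_double(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    Halves h;
    h.high = static_cast<std::uint32_t>(bits >> 32);
    h.low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    return h;
}

// Size of std::complex<double> measured in 32-bit words.
std::size_t complex_double_words() {
    return sizeof(std::complex<double>) / sizeof(std::uint32_t);
}

// Check the values against the four .word entries quoted from .L20.
void check_l20_reading() {
    assert(split_double(1.0).high == 1072693248u);  // 0x3FF00000
    assert(split_double(1.0).low == 0u);
    assert(split_double(0.0).high == 0u);
    assert(split_double(0.0).low == 0u);
    assert(complex_double_words() == 4);
}
```

Read this way, the four words are the real part 1.0 followed by the imaginary part 0.0 of a single complex value, which is consistent with the two ldmia instructions each loading one double.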
Richard, Paul, this is an ARM problem. Would you please take a look?
Cold case analysis time. Have you seen this problem in recent releases of GCC?
No feedback in 6 months. Closing as presumed fixed.
Sorry, yes, it was somewhat hard to replicate even with 3.3.3, and I was never able to reproduce it with more recent GCC (but never quite satisfied myself that it wasn't reproducible!). Time to draw a line.