Created attachment 53051 [details] source
Hello,
The attached code does not produce the same result when the -O3 flag is enabled. It seems that gcc reorders operations that should not be reordered in the matrix transposition. The trick here is that the attached code does an in-place partial transposition.
To reproduce:
gcc main0.c && ./a.out > O0.txt ; gcc main0.c -O3 && ./a.out > O3.txt ; md5sum O0.txt O3.txt
0b513fb110f11f0e9b143c53d5b7a634 O0.txt
12be7305e8e96decd579a1e42d45bc46 O3.txt
This behavior is odd, because matrix sizes lower than 16 do not trigger the suspected bug.
My gcc version is 10.3.1. I tested with https://godbolt.org/ : the problem seems to have been introduced in GCC 8.1, as GCC 7.5 gives the correct output. The latest release, GCC 12.1, also seems affected. Clang is fine and gives the right output.
Can someone confirm?
Best regards,
Franck
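For readers without access to the attachment, here is a minimal sketch of what an in-place partial transposition can look like. It is not the attached main0.c: the function names, the `int` element type, and the fixed size `N` are assumptions chosen for illustration only. It copies each element above the diagonal onto its mirror position below the diagonal, inside the same array.

```c
#include <assert.h>
#include <stddef.h>

#define N 16 /* hypothetical size; the report says sizes below 16 do not trigger the problem */

/* Fill the matrix with distinct values so that mistakes are visible. */
void fill_matrix(int m[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            m[i][j] = (int)(i * N + j);
}

/* In-place partial transposition (hypothetical helper):
   copy the strict upper triangle onto the strict lower triangle. */
void mirror_upper_to_lower(int m[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = i + 1; j < N; j++)
            m[j][i] = m[i][j];
}

/* Return 1 if the matrix equals its own transpose, 0 otherwise. */
int is_symmetric(int m[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (m[i][j] != m[j][i])
                return 0;
    return 1;
}

/* Small self-check: after mirroring, the matrix must be symmetric. */
void self_check(void)
{
    int m[N][N];
    fill_matrix(m);
    mirror_upper_to_lower(m);
    assert(is_symmetric(m));
}
```

Because source and destination live in the same array, the result of a loop nest like this can depend on the order in which the compiler performs the loads and stores, which is why comparing -O0 and -O3 output is a reasonable first check.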
I think you have an aliasing violation here. Does adding -fno-strict-aliasing fix the issue?
I think the way to fix the code is to do this: transpose_upper_to_lower (mat,&mat);
Confirmed.
main0.c:28:20: optimized: applying unroll and jam with factor 2
main0.c:29:24: optimized: loop with 16 iterations completely unrolled (header execution count 59700049)
main0.c:45:24: optimized: loop vectorized using 16 byte vectors
main0.c:45:24: optimized: loop turned into non-loop; it never loops
main0.c:41:5: optimized: loop with 3 iterations completely unrolled (header execution count 59700049)
main0.c:44:20: optimized: loop with 16 iterations completely unrolled (header execution count 0)
-fno-loop-unroll-and-jam fixes it. Can't check trunk right now whether it's fixed.
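To make the "unroll and jam with factor 2" message concrete, the sketch below applies that transformation by hand to a simple transpose loop nest. It is an illustration under assumptions, not GCC's actual output for main0.c: the function names, the `int` element type, and `N` are made up for the example. The outer loop advances by two, and the two copies of the inner loop body are fused ("jammed") into one inner loop.

```c
#include <assert.h>
#include <string.h>

#define N 16 /* assumed even, so the factor-2 version needs no remainder loop */

/* Reference loop nest: transpose src into dst. */
void transpose_plain(int dst[N][N], int src[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dst[j][i] = src[i][j];
}

/* The same work after a hand-written unroll and jam with factor 2:
   the outer loop steps by 2 and both inner bodies share one inner loop. */
void transpose_unroll_jam2(int dst[N][N], int src[N][N])
{
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < N; j++) {
            dst[j][i]     = src[i][j];
            dst[j][i + 1] = src[i + 1][j];
        }
}

/* Fill the matrix with distinct values. */
void fill_matrix(int m[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = i * N + j;
}

/* With distinct source and destination arrays, both orders agree. */
void check_equivalent(void)
{
    int src[N][N], a[N][N], b[N][N];
    fill_matrix(src);
    transpose_plain(a, src);
    transpose_unroll_jam2(b, src);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The jammed version executes iteration (i+1, j) before iteration (i, j+1), so it is only equivalent when those iterations do not depend on each other. In this sketch, calling both functions with the same array as source and destination gives different results: starting from the fill pattern above, element [0][1] ends up as 1 with the plain loop and as 16 with the jammed loop. That is the kind of dependence a compiler has to rule out before applying the transformation to an in-place operation.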
Not fixed on trunk.
Started with r8-5159-g1cc521f1a824b591.
Hello,
> Does adding -fno-strict-aliasing fix the issue?
Right, it does.
> I think you have an aliasing violation here.
I cannot say whether we have an aliasing violation here. My understanding is that an aliasing violation happens when pointers of different types refer to the same address.
> I think the way to fix the code is to do this:
> transpose_upper_to_lower (mat,&mat);
It does not change the result. The issue is still present.
> -fno-loop-unroll-and-jam fixes it. Can't check trunk right now whether it's fixed.
I can confirm this too.
Regards,
Franck
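As background on the aliasing question, here is a small sketch of what a strict-aliasing violation typically looks like in C, and a well-defined alternative. It is a generic illustration and says nothing about whether the attached code contains such a violation. The function name is hypothetical, and the sketch assumes `float` is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* A typical violation would read the float object through an lvalue of an
   incompatible type:

       uint32_t u = *(uint32_t *)&f;    // undefined behavior

   With -fstrict-aliasing (enabled at -O2 and above in GCC), the compiler may
   assume such accesses never refer to the same object and reorder them. */

/* Well-defined alternative: copy the object representation with memcpy. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Accessing the same object through pointers of the same type, or through `char *`, does not violate the aliasing rules, so two `int *` pointers into one matrix are allowed to overlap unless they are declared `restrict`.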