Bug 105771 - matrix partial transposition with -O3 since r8-5159-g1cc521f1a824b591
Summary: matrix partial transposition with -O3 since r8-5159-g1cc521f1a824b591
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 10.3.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-05-30 08:41 UTC by Franck Behaghel
Modified: 2022-05-30 11:04 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 10.2.0, 11.2.0, 12.1.1, 9.4.0
Last reconfirmed: 2022-05-30 00:00:00


Attachments
source (402 bytes, text/x-csrc)
2022-05-30 08:41 UTC, Franck Behaghel

Description Franck Behaghel 2022-05-30 08:41:51 UTC
Created attachment 53051
source

Hello,

The attached code produces a different result when compiled with -O3 than when compiled without optimization.

It seems that GCC reorders operations in the matrix transposition that should not be reordered. The subtlety is that the attached code performs an in-place partial transposition.


To reproduce:
gcc  main0.c  && ./a.out > O0.txt ; gcc main0.c -O3 && ./a.out > O3.txt ; md5sum O0.txt O3.txt 
0b513fb110f11f0e9b143c53d5b7a634  O0.txt
12be7305e8e96decd579a1e42d45bc46  O3.txt

Oddly, matrix sizes below 16 do not trigger the suspected bug.

My GCC version is 10.3.1.
I tested with https://godbolt.org/: the problem seems to have been introduced in GCC 8.1, since GCC 7.5 gives the correct output. The latest GCC, 12.1, also seems to be affected.

Clang is fine and gives the correct output.

Can someone confirm?
Best regards,
Franck
Comment 1 Andrew Pinski 2022-05-30 09:14:11 UTC
I think you have an aliasing violation here. Does adding -fno-strict-aliasing fix the issue?
Comment 2 Andrew Pinski 2022-05-30 09:16:09 UTC
I think the way to fix the code is to do this:
transpose_upper_to_lower (mat,&mat);
Comment 3 Richard Biener 2022-05-30 09:38:58 UTC
Confirmed.

main0.c:28:20: optimized: applying unroll and jam with factor 2
main0.c:29:24: optimized: loop with 16 iterations completely unrolled (header execution count 59700049)
main0.c:45:24: optimized: loop vectorized using 16 byte vectors
main0.c:45:24: optimized: loop turned into non-loop; it never loops
main0.c:41:5: optimized: loop with 3 iterations completely unrolled (header execution count 59700049)
main0.c:44:20: optimized: loop with 16 iterations completely unrolled (header execution count 0)

-fno-loop-unroll-and-jam fixes it.  Can't check trunk right now whether it's fixed.
Comment 4 Richard Biener 2022-05-30 09:55:04 UTC
Not fixed on trunk.
Comment 5 Martin Liška 2022-05-30 10:18:09 UTC
Started with r8-5159-g1cc521f1a824b591.
Comment 6 Franck Behaghel 2022-05-30 11:04:13 UTC
Hello,

> Does adding -fno-strict-aliasing fix the issue?
Right, it does. 

> I think you have an aliasing violation here.
I cannot tell whether there is an aliasing violation here. My understanding is that an aliasing violation occurs when pointers of different types refer to the same address.

> I think the way to fix the code is to do this:
> transpose_upper_to_lower (mat,&mat);
It does not change the result. The issue is still present.

> -fno-loop-unroll-and-jam fixes it.  Can't check trunk right now whether it's fixed.
I can confirm this too.


Regards,
Franck