Bug 115675 - [15 Regression] truncv4hiv4qi affects r14-1402-gd8545fb2c71683's optimization.
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 15.0
Importance: P3 normal
Target Milestone: 15.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2024-06-27 08:27 UTC by Hu Lin
Modified: 2024-06-28 01:42 UTC

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Hu Lin 2024-06-27 08:27:51 UTC
After r15-1678-ge5f8a39941f6f0, truncv4hiv4qi changes the dump and interferes with the optimization from r14-1402-gd8545fb2c71683.

When I compile pr108938-3.c with -mavx or with -mavx512bw -mavx512vl, GCC no longer generates bswap r32. I've discussed this with Hongtao, and we haven't found an easy way to handle it yet. I think it might be possible to match the current form in the target to restore the bswap optimization with -mavx.
Comment 1 Richard Biener 2024-06-27 12:27:14 UTC
so it's now SLP vectorized?
Comment 2 Hongtao Liu 2024-06-27 13:19:53 UTC
(In reply to Richard Biener from comment #1)
> so it's now SLP vectorized?

Yes, the vectorization does not look reasonable. It used to be vectorized as a v4qi vector CTOR + v4qi vector store. Now it's vectorized as a v4hi vector CTOR + truncv4hiv4qi + v4qi vector store.
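
To make the two shapes concrete, here is a rough sketch in GNU C vector extensions. It only mirrors the structure described above (narrow-then-construct versus construct-then-truncate); the function names are hypothetical, it assumes GCC 9 or later for __builtin_convertvector, and it is not the code the vectorizer itself produces.

```c
#include <assert.h>
#include <string.h>

typedef char  v4qi __attribute__((vector_size(4)));
typedef short v4hi __attribute__((vector_size(8)));

/* Previous shape: truncate each scalar first, then build a v4qi
   vector and store it.  (Hypothetical name.) */
void store_v4qi_ctor(char *a, const short *b)
{
    v4qi v = { (char)(b[0] >> 8), (char)b[0],
               (char)(b[1] >> 8), (char)b[1] };
    memcpy(a, &v, sizeof v);
}

/* Current shape: build a v4hi vector from the untruncated shorts,
   truncate the whole vector to v4qi, then store.  (Hypothetical name.) */
void store_v4hi_trunc(char *a, const short *b)
{
    v4hi w = { (short)(b[0] >> 8), b[0], (short)(b[1] >> 8), b[1] };
    v4qi v = __builtin_convertvector(w, v4qi);
    memcpy(a, &v, sizeof v);
}

/* Returns 1 when both shapes write the same four bytes. */
int shapes_agree(short first, short second)
{
    short b[2] = { first, second };
    char x[4], y[4];
    store_v4qi_ctor(x, b);
    store_v4hi_trunc(y, b);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both functions store the same bytes; the difference is only where the narrowing from short to char happens, which is what changes the form the later pattern matching sees.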
Comment 3 Hu Lin 2024-06-28 01:42:18 UTC
Dump from 192t.slp2:

Previous:

  char * vectp.10;
  vector(4) char * vectp_a.9;
  short int _1;
  short int _2;
  char _3;
  char _4;
  short int _5;
  short int _6;
  char _7;
  char _8;
  vector(4) char _16;

  <bb 2> [local count: 1073741824]:
  _1 = *b_10(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  _4 = (char) _1;
  _5 = MEM[(short int *)b_10(D) + 2B];
  _6 = _5 >> 8;
  _7 = (char) _6;
  _8 = (char) _5;
  _16 = {_3, _4, _7, _8};
  vectp.10_17 = a_11(D);
  MEM <vector(4) char> [(char *)vectp.10_17] = _16;

Current:

  char * vectp.11;
  vector(4) char * vectp_a.10;
  vector(4) char vect__3.9;
  short int _1;
  short int _2;
  char _3;
  char _4;
  short int _5;
  short int _6;
  char _7;
  char _8;
  vector(4) short int _16;

  <bb 2> [local count: 1073741824]:
  _1 = *b_10(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  _4 = (char) _1;
  _5 = MEM[(short int *)b_10(D) + 2B];
  _6 = _5 >> 8;
  _16 = {_2, _1, _6, _5};
  vect__3.9_17 = (vector(4) char) _16;
  _7 = (char) _6;
  _8 = (char) _5;
  vectp.11_18 = a_11(D);
  MEM <vector(4) char> [(char *)vectp.11_18] = vect__3.9_17;
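
For readers who want the scalar picture, the statements _1 through _8 in both dumps can be read back into C roughly as below. This is a reconstruction from the GIMPLE, not a copy of pr108938-3.c; the function names and signatures are assumptions. The second function is one possible byte-swap formulation of the same four stores, assuming a little-endian target such as x86_64; whether it matches exactly what r14-1402-gd8545fb2c71683 emits is not confirmed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar form read back from the GIMPLE (_1 .. _8).
   Name and signature are assumptions. */
void foo_scalar(char *a, const short *b)
{
    a[0] = (char)(b[0] >> 8);   /* _3: high byte of b[0] */
    a[1] = (char)b[0];          /* _4: low byte of b[0]  */
    a[2] = (char)(b[1] >> 8);   /* _7: high byte of b[1] */
    a[3] = (char)b[1];          /* _8: low byte of b[1]  */
}

/* Hypothetical byte-swap formulation of the same stores.
   Assumes little-endian byte order. */
void foo_bswap(char *a, const short *b)
{
    uint32_t x;
    memcpy(&x, b, sizeof x);      /* one 32-bit load of b[0] and b[1] */
    x = __builtin_bswap32(x);     /* reverse all four bytes */
    x = (x << 16) | (x >> 16);    /* rotate by 16 to restore the order of the halves */
    memcpy(a, &x, sizeof x);      /* one 32-bit store */
}

/* Returns 1 when both forms write the same four bytes. */
int forms_agree(short first, short second)
{
    short b[2] = { first, second };
    char s[4], v[4];
    foo_scalar(s, b);
    foo_bswap(v, b);
    return memcmp(s, v, sizeof s) == 0;
}

/* Small self-check over a few sample inputs. */
void check_examples(void)
{
    assert(forms_agree(0x1234, 0x5678));
    assert(forms_agree(-21555, 0x00FF));
    assert(forms_agree(0, -1));
}
```

In the "Previous" dump the CTOR {_3, _4, _7, _8} is built directly from these byte-sized values, while in the "Current" dump the CTOR holds the short values and the narrowing happens afterwards as a vector conversion.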