115683 – [15 Regression] SSE2 regressions after obsoleting vcond{,u,eq}.


Summary: [15 Regression] SSE2 regressions after obsoleting vcond{,u,eq}.

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	15.0

Importance:	P3 normal
Target Milestone:	15.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	114189 115517

Reported:	2024-06-27 13:14 UTC by Hongtao Liu
Modified:	2024-07-01 06:55 UTC
CC List:	1 user

See Also:
Host:
Target:	x86_64-- i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2024-07-01 00:00:00


Description Hongtao Liu 2024-06-27 13:14:45 UTC

Full list of failures:
g++: g++.target/i386/pr100637-1b.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1w.C  -std=gnu++14  scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C  -std=gnu++17  scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C  -std=gnu++20  scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C  -std=gnu++98  scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr103861-1.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
gcc: gcc.target/i386/pr88540.c scan-assembler minpd



One extra pcmpeq instruction is now generated in each of the 3 testcases below for a GTU comparison. x86 has no native unsigned greater-than (GTU) vector comparison, so it is emulated with psubus{b,w} + pcmpeq + pcmpeq, where the second pcmpeq only negates the mask. That negation can be eliminated in the vcond{,u,eq} expanders by simply swapping the if_true and if_false operands.

g++: g++.target/i386/pr100637-1b.C
g++: g++.target/i386/pr100637-1w.C
g++: g++.target/i386/pr103861-1.C
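To illustrate the GTU emulation and the proposed operand swap, here is a minimal scalar model of a single byte lane in portable C. It is a sketch, not GCC code: the helper names (psubus8, pcmpeq8, blend8 and so on) are hypothetical stand-ins for PSUBUSB, PCMPEQB and the SSE2 pand/pandn/por blend, and the word-sized (psubusw/pcmpeqw) case works the same way.

```c
#include <stdint.h>

/* Scalar model of one byte lane of PSUBUSB: unsigned saturating subtract. */
static uint8_t psubus8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar model of one byte lane of PCMPEQB: all-ones if equal, else zero. */
static uint8_t pcmpeq8(uint8_t a, uint8_t b)
{
    return a == b ? 0xFF : 0x00;
}

/* Bitwise blend (pand/pandn/por): take x where mask is set, y elsewhere. */
static uint8_t blend8(uint8_t mask, uint8_t x, uint8_t y)
{
    return (uint8_t)((mask & x) | (~mask & y));
}

/* Sequence from the report: psubus + pcmpeq yields the a <=u b mask,
   and a second pcmpeq against zero negates it into a >u b. */
static uint8_t select_gtu_negated(uint8_t a, uint8_t b, uint8_t x, uint8_t y)
{
    uint8_t leu = pcmpeq8(psubus8(a, b), 0); /* a <=u b */
    uint8_t gtu = pcmpeq8(leu, 0);           /* extra pcmpeq: ~leu */
    return blend8(gtu, x, y);
}

/* Same result without the negation: keep the a <=u b mask and swap the
   true/false operands of the blend. */
static uint8_t select_gtu_swapped(uint8_t a, uint8_t b, uint8_t x, uint8_t y)
{
    uint8_t leu = pcmpeq8(psubus8(a, b), 0); /* a <=u b */
    return blend8(leu, y, x);                /* operands swapped */
}

/* Exhaustively compare both sequences with a >u b ? x : y over all byte pairs. */
static int count_gtu_mismatches(void)
{
    const uint8_t x = 0xAA, y = 0x55;
    int mismatches = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t ref = a > b ? x : y;
            if (select_gtu_negated((uint8_t)a, (uint8_t)b, x, y) != ref
                || select_gtu_swapped((uint8_t)a, (uint8_t)b, x, y) != ref)
                mismatches++;
        }
    return mismatches;
}
```

count_gtu_mismatches() returns 0: both sequences agree with a >u b ? x : y for every pair of byte values, so the negating pcmpeq is redundant once the expander is free to swap the selected operands.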


This one may be a bit more difficult: it involves the x86-specific floating-point min/max{ps,pd} instructions, which are an exact match for a > b ? a : b (for max) and are not IEEE-conformant.

gcc: gcc.target/i386/pr88540.c scan-assembler minpd
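The per-lane behaviour can be modelled in a few lines of C. This is a sketch based on the documented MAXPD/MINPD semantics (return the first source only if the comparison holds, otherwise the second source); maxpd_lane and minpd_lane are hypothetical helper names, not GCC or intrinsic APIs.

```c
#include <math.h>

/* Scalar model of one lane of MAXPD: returns a only if a > b, otherwise b,
   so a NaN in either operand, or a pair of zeros, yields b. */
static double maxpd_lane(double a, double b)
{
    return a > b ? a : b;
}

/* Scalar model of one lane of MINPD: returns a only if a < b, otherwise b. */
static double minpd_lane(double a, double b)
{
    return a < b ? a : b;
}
```

Unlike C's fmax/fmin, which return the non-NaN operand whichever side it is on, the result here depends on operand order: maxpd_lane(NAN, 1.0) is 1.0 but maxpd_lane(1.0, NAN) is NaN, and for two zeros the sign comes from b. That is why only the exact conditional form, with matching operand order, can be mapped onto these instructions.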

Comment 1 Richard Biener 2024-07-01 06:55:01 UTC

For the a > b ? a : b and a < b ? a : b (or is that a <= b ? a : b for min{ps,pd}?) I wonder if we want new optabs like cond_fmax (that name is
already taken though - maybe fmax_gt and fmin_le?).

I also wonder if there are other archs with similar instructions.

The unsigned compare looks like a general trick we could use in vectorizer
pattern recognition or alternatively in vectorizable_comparison which needs
adjustments anyway in case vec_cmp expanders start to reject some compare
operators.  Of course the testcases use generic vectors so the same applies
to vector lowering (code sharing between vectorization and lowering as far
as "tricks" go would be nice though this one looks difficult to generalize).