[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394

Fri Oct 23 06:21:21 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #15)
> CI with -march=cascadelake reports
> 
..
> FAIL: gcc.target/i386/avx2-vpcmpeqq-2.c execution test

The failing test expands the following constant:

(gdb) p debug_tree (exp)
 <vector_cst 0x7ffff4bbd390
    type <vector_type 0x7ffff683a3f0
        type <boolean_type 0x7ffff683a348 public QI
            size <integer_cst 0x7ffff680cdc8 constant 8>
            unit-size <integer_cst 0x7ffff680cde0 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff683a348 precision:2 min <integer_cst 0x7ffff66c9df8 -2> max <integer_cst
0x7ffff66ff288 1>>
        QI size <integer_cst 0x7ffff680cdc8 8> unit-size <integer_cst
0x7ffff680cde0 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff683a3f0 nunits:4>
    constant tree_0 npatterns:2 nelts-per-pattern:2
    elt:0:  <integer_cst 0x7ffff4b9f738 type <boolean_type 0x7ffff683a348>
constant 0>
    elt:1:  <integer_cst 0x7ffff4b9f678 type <boolean_type 0x7ffff683a348>
constant -1> elt:2:  <integer_cst 0x7ffff4b9f738 0> elt:3:  <integer_cst
0x7ffff4b9f738 0>>

which shows the heuristic cannot work.  We could possibly refine it to
key on mode-precision component types, which _might_ work since x86 seems
to use the smallest integer mode that holds nunits bits, but that is of
course not guaranteed for non-x86 targets.

I wonder why we're insisting on "filling" the mask mode on GENERIC/GIMPLE
while RTL produces packed bits.  That is, why do we use a
QImode vector(4) <signed-boolean:2> here instead of a
QImode vector(4) <signed-boolean:1> if the target will in the end produce
the latter from, say, a V4SImode compare-to-mask?  As long as we didn't expose
temporaries of those types this was well hidden up to RTL expansion, which
then did "magic", but now we're really facing inconsistent representations.

Now targets _could_ opt to use QImode vector(4) <signed-boolean:2>, but then
{ -1, -1, -1, -1 } would have to be represented as 0b11111111 (with the
'padding bits' sign-extended).
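
To make the difference between the two layouts concrete, here is a minimal
sketch in plain C.  It is not GCC code: pack_mask is a hypothetical helper,
and it assumes element i occupies the bits starting at position
i * precision, counting from the low-order end.  It only illustrates how the
same boolean vector yields different scalar masks depending on the element
precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: pack N boolean vector
   elements (each 0 or -1) into a scalar mask, giving every element
   PRECISION bits.  Assumes element 0 lands in the low-order bits.  */
static uint32_t
pack_mask (const int *elts, unsigned n, unsigned precision)
{
  assert (precision >= 1 && n * precision <= 32);

  /* All-ones field of PRECISION bits; a "true" element sets all of it,
     which mirrors sign-extending -1 into the padding bits.  */
  uint32_t field = (precision == 32) ? UINT32_MAX : ((1u << precision) - 1);
  uint32_t mask = 0;

  for (unsigned i = 0; i < n; i++)
    if (elts[i] != 0)
      mask |= field << (i * precision);

  return mask;
}
```

Under these assumptions the constant from the dump, { 0, -1, 0, 0 }, packs to
0b00001100 with 2-bit elements but to 0b0010 with 1-bit elements, and
{ -1, -1, -1, -1 } packs to 0b11111111 versus 0b1111.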

For now I'm going to revert the patch, but I still believe
const_scalar_mask_from_tree is a red herring.