[Bug tree-optimization/69282] [6 Regression] aarch64/armhf ICE on SPEC2006 464.h264ref at -O3
wilson at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Jan 15 01:16:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69282
--- Comment #9 from Jim Wilson <wilson at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Jim Wilson from comment #7)
> > The simplified testcases fail on arm if you use -O3 -mfpu=neon.
> >
> > I can look at fixing the arm side of things if we need an md patch.
>
> Try my attached patch and see what the code generation is.
Looks like you changed options to -O2 -ftree-vectorize.
On the aarch64 side I see
        ldr     q0, [x0, x1]
        add     x0, x0, 16
        cmp     x0, 128
        cmeq    v0.4s, v0.4s, #0
        not     v0.16b, v0.16b
        cmlt    v0.4s, v0.4s, #0
        bit     v1.16b, v2.16b, v0.16b
        bic     v3.16b, v3.16b, v0.16b
        add     v2.4s, v2.4s, v4.4s
and on the arm side I see
        vld1.32  {q8}, [r3]
        adds     r3, r3, #16
        cmp      r2, r3
        vceq.i32 q8, q10, q8
        vbsl     q8, q10, q14
        vclt.s32 q8, q8, #0
        vbit     q9, q11, q8
        vbit     q12, q10, q8
        vadd.i32 q11, q11, q13
There is a vbsl instruction in the arm output, but it is still the same number
of instructions, and the apparently unnecessary second vector compare is still
there.
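
To spell out why the second compare looks unnecessary, here is a minimal
scalar sketch of one 32-bit lane. It is not the testcase from this bug, and
the helper names are made up for illustration. It models the aarch64 sequence
(cmeq, not, cmlt). It should also apply to the arm sequence, assuming q10
holds zero and q14 holds all ones, which the listing does not show. The first
compare already yields a lane mask of 0 or -1, and a signed "less than zero"
test on such a mask gives back the same mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one lane of "cmeq #0" followed by "not":
   all ones (-1) when the lane is non-zero, otherwise 0. */
static int32_t lane_cmpne_zero(int32_t x)
{
    return x != 0 ? -1 : 0;
}

/* Hypothetical model of one lane of a signed compare-less-than-zero
   (cmlt #0 on aarch64, vclt.s32 #0 on arm). */
static int32_t lane_cmplt_zero(int32_t m)
{
    return m < 0 ? -1 : 0;
}

/* Returns 1 if re-comparing the mask leaves it unchanged for every
   sample value, 0 otherwise. */
int second_compare_is_redundant(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, INT32_MIN, INT32_MAX };

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int32_t mask = lane_cmpne_zero(samples[i]);

        /* A compare result is always a full-lane mask. */
        assert(mask == 0 || mask == -1);

        if (lane_cmplt_zero(mask) != mask)
            return 0;
    }
    return 1;
}
```

Since the mask only ever takes the values 0 and -1, the function returns 1,
which is the sense in which the cmlt/vclt above does no useful work.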