[Bug tree-optimization/69282] [6 Regression] aarch64/armhf ICE on SPEC2006 464.h264ref at -O3

wilson at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Jan 15 01:16:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69282

--- Comment #9 from Jim Wilson <wilson at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Jim Wilson from comment #7)
> > The simplified testcases fail on arm if you use -O3 -mfpu=neon.
> > 
> > I can look at fixing the arm side of things if we need an md patch.
> 
> Try my attached patch and see what the code generation is.

Looks like you changed options to -O2 -ftree-vectorize.

On the aarch64 side I see
        ldr     q0, [x0, x1]
        add     x0, x0, 16
        cmp     x0, 128
        cmeq    v0.4s, v0.4s, #0
        not     v0.16b, v0.16b
        cmlt    v0.4s, v0.4s, #0
        bit     v1.16b, v2.16b, v0.16b
        bic     v3.16b, v3.16b, v0.16b
        add     v2.4s, v2.4s, v4.4s
and on the arm side I see
        vld1.32 {q8}, [r3]
        adds    r3, r3, #16
        cmp     r2, r3
        vceq.i32        q8, q10, q8
        vbsl    q8, q10, q14
        vclt.s32        q8, q8, #0
        vbit    q9, q11, q8
        vbit    q12, q10, q8
        vadd.i32        q11, q11, q13
There is a vbsl instruction in the arm output, but it is still the same number
of instructions, and the apparently unnecessary second vector compare is still
there.
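
To spell out why the second compare looks unnecessary, here is a rough scalar
sketch of one 32-bit lane of the aarch64 cmeq/not/cmlt chain above. The helper
names are made up for illustration, and this models only the three instructions
as listed, not anything the vectorizer does internally. After cmeq and not,
each lane is either 0 or all-ones (-1), so a signed "less than zero" compare
gives back the same mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane models of the three aarch64 instructions. */

/* cmeq v0.4s, v0.4s, #0 : lane becomes all-ones if it was zero, else 0. */
static int32_t lane_cmeq_zero(int32_t x) { return x == 0 ? -1 : 0; }

/* not v0.16b, v0.16b : bitwise inversion of the lane. */
static int32_t lane_not(int32_t m) { return ~m; }

/* cmlt v0.4s, v0.4s, #0 : lane becomes all-ones if it was negative, else 0. */
static int32_t lane_cmlt_zero(int32_t m) { return m < 0 ? -1 : 0; }

/* Returns 1 if the cmlt leaves the inverted mask unchanged for input x. */
static int second_compare_is_noop(int32_t x)
{
    int32_t mask = lane_not(lane_cmeq_zero(x));  /* 0 or -1 */
    return lane_cmlt_zero(mask) == mask;
}

/* Spot-check a few lane values, including the extremes. */
static void check_samples(void)
{
    const int32_t samples[] = { 0, 1, -1, 42, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(second_compare_is_noop(samples[i]));
}
```

Calling check_samples() should pass without tripping an assert, since the mask
fed to cmlt can only ever be 0 or -1.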

