[Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness

Mon Nov 17 16:23:00 GMT 2014

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #13 from ktkachov at gcc dot gnu.org ---
So I see this regression still, but only for some -mcpu options.
For example for -mcpu=cortex-a15 we get:
        mul     r3, r0, r3
        strd    r4, [sp, #-8]!
        umull   r4, r5, r0, r2
        mla     r1, r2, r1, r3
        mov     r0, r4
        add     r5, r1, r5
        mov     r1, r5
        ldrd    r4, [sp]
        add     sp, sp, #8

whereas for cortex-a7 we get:
        mul     r3, r0, r3
        mla     r3, r2, r1, r3
        umull   r0, r1, r0, r2
        add     r1, r3, r1
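
For reference, a minimal sketch of what both sequences compute, assuming the
test case is a plain 64-bit multiply and little-endian AAPCS argument passing
(a in r0/r1, b in r2/r3). The function names are hypothetical and are not
taken from the bug report:

```c
#include <stdint.h>

/* Hypothetical reduced test case: a plain 64-bit multiply. */
long long mul64(long long a, long long b)
{
    return a * b;
}

/* The same product written with 32-bit operations, mirroring the
   four-instruction cortex-a7 sequence above. The register names in the
   comments assume little-endian AAPCS argument passing. */
uint64_t mul64_by_parts(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);  /* r0, r1 */
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);  /* r2, r3 */

    uint32_t cross = a_lo * b_hi;                 /* mul   r3, r0, r3     */
    cross = b_lo * a_hi + cross;                  /* mla   r3, r2, r1, r3 */
    uint64_t low = (uint64_t)a_lo * b_lo;         /* umull r0, r1, r0, r2 */
    uint32_t hi = cross + (uint32_t)(low >> 32);  /* add   r1, r3, r1     */

    /* Low word comes straight from umull, high word from the final add. */
    return ((uint64_t)hi << 32) | (uint32_t)low;
}
```

In the cortex-a7 output the umull writes directly into r0/r1, the return
registers, so no extra moves are needed. In the cortex-a15 output the same
product lands in r4/r5 and has to be copied back.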

I think the problem here is reload.
If I look at the postreload dump, for the 'bad' RTL I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r4(DI) := r0(SI) * r2(SI) //with zero extension
r1(SI) := r2(SI) * r1(SI) + r3(SI)
r5(SI) := r1(SI) + r5(SI)
r0(DI) := r4(DI)

whereas for the good one I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r3(SI) := r2(SI) * r1(SI) + r3(SI)
r0(DI) := r0(SI) * r2(SI) //with zero extension
r1(SI) := r3(SI) + r1(SI)
r0(DI) := r0(DI)

In the good one the final insn is eliminated due to being dead, whereas in the
bad one the final DImode move is split into two moves.

Sched1 changed the order of the mult and mult-accumulate, but it's the register
allocator that causes the bad codegen.