[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

Wed Mar 16 01:56:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #19 from Richard Henderson <rth at gcc dot gnu.org> ---
(In reply to Jiong Wang from comment #16)
> But there is a performance issue as described at
>  
>   https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00281.html
> 
>   "this patch forces register scaling expression out of memory ref, so that
>    RTL CSE pass can handle common register scaling expressions"
> 
> This is particularly performance critical if a group of instructions is
> using the same "scaled register" inside a hot loop. CSE can reduce
> redundant calculations.

I wish that message had been a bit more complete with the description
of the performance issue.  I must guess from this...

>   ldr dst1, [reg_base1, reg_index, lsl #1]
>   ldr dst2, [reg_base2, reg_index, lsl #1]
>   ldr dst3, [reg_base3, reg_index, lsl #1]
> 
> into
> 
>   reg_index = reg_index << 1;
>   ldr dst1, [reg_base1, reg_index]
>   ldr dst2, [reg_base2, reg_index]
>   ldr dst3, [reg_base3, reg_index]

that it must have something to do with the smaller cpus, e.g. exynosm1,
based on the address cost tables.
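
For illustration, the quoted load pattern is the kind of thing one would
expect from source like the sketch below: three 16-bit arrays indexed by
the same value, so each access can use (base + index * 2).  The function
and parameter names are hypothetical and are not taken from the bug
report or the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not from the PR: each of the three loads
   addresses a 16-bit element at (base + i * 2), so all three share
   the same scaled index.  */
int32_t sum3(const int16_t *a, const int16_t *b, const int16_t *c, size_t i)
{
    return (int32_t)a[i] + b[i] + c[i];
}
```

Whether the compiler keeps the scaled index inside each load or computes
the shift once up front depends on the target tuning and its address
costs.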

I'll note for the record that you cannot hope to solve this with
the legitimize_address hook alone, for the simple reason that the hook
is called only for illegitimate addresses, and (base + index * 2) is
a legitimate address.

To include legitimate addresses, you'd have to force out the address
components somewhere else.  Perhaps in the mov expanders, since that's
one of the very few places mem's are allowed.  You'd want to do this
only if !cse_not_expected, i.e. only while CSE is still expected to run.

OTOH, it's also the sort of thing that one would hope that CSE itself
would be able to handle, by looking across various addresses, computing
sums of costs, and breaking out subexpressions as necessary.
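
As a rough sketch of the "sums of costs" comparison, the toy model below
totals the cost of N loads that keep the scaled index in the address and
compares it with one separate shift plus N plain (base + index) loads.
The struct, the function names and the cost values are all assumptions
made up for illustration; they are not GCC's cost tables and not how CSE
is implemented.

```c
#include <stdbool.h>

/* Hypothetical per-target costs; placeholders, not real tuning data.  */
struct addr_costs {
    int reg_plus_reg;         /* [base, index]            */
    int reg_plus_scaled_reg;  /* [base, index, lsl #n]    */
    int shift_insn;           /* separate shift of index  */
};

/* Total cost when every use keeps the scaled index in the address.  */
int cost_keep_scaled(const struct addr_costs *c, int n_uses)
{
    return n_uses * c->reg_plus_scaled_reg;
}

/* Total cost when the shift is computed once and shared by all uses.  */
int cost_hoist_shift(const struct addr_costs *c, int n_uses)
{
    return c->shift_insn + n_uses * c->reg_plus_reg;
}

/* Break out the shift only when that is strictly cheaper overall.  */
bool should_hoist_shift(const struct addr_costs *c, int n_uses)
{
    return cost_hoist_shift(c, n_uses) < cost_keep_scaled(c, n_uses);
}
```

With these made-up numbers, a target where the scaled form costs more
than the plain form favours hoisting once there are enough uses, while a
target where both forms cost the same never does.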