[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

wdijkstr at arm dot com gcc-bugzilla@gcc.gnu.org
Wed Mar 16 11:59:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #20 from Wilco <wdijkstr at arm dot com> ---
(In reply to Richard Henderson from comment #19)

> I wish that message had been a bit more complete with the description
> of the performance issue.  I must guess from this...
> 
> >   ldr dst1, [reg_base1, reg_index, lsl #1]
> >   ldr dst2, [reg_base2, reg_index, lsl #1]
> >   ldr dst3, [reg_base3, reg_index, lsl #1]
> > 
> > into
> > 
> >   reg_index = reg_index << 1;
> >   ldr dst1, [reg_base1, reg_index]
> >   ldr dst2, [reg_base2, reg_index]
> >   ldr dst3, [reg_base3, reg_index]
> 
> that it must have something to do with the smaller cpus, e.g. exynosm1,
> based on the address cost tables.

Some CPUs split an address with a shift by 1 into separate uops. So that would
mean 6 uops in the first example (2 per load) vs 4 when doing the shift
separately (1 shift plus 3 loads). According to the cost tables this might
actually be worse on exynosm1, as it has a cost for any indexing.
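A minimal sketch of the uop arithmetic above, assuming (as described) that each load with a shifted index is cracked into two uops, while a standalone shift and each plain register-offset load cost one uop each. The function names are illustrative only and do not model any particular CPU.

```python
def uops_shifted_index(n_loads: int) -> int:
    # Each "ldr dst, [base, index, lsl #1]" is assumed to issue as
    # 2 uops (address shift + load).
    return 2 * n_loads


def uops_separate_shift(n_loads: int) -> int:
    # One standalone shift, then each "ldr dst, [base, index]" is 1 uop.
    return 1 + n_loads


for n in (1, 2, 3, 4):
    print(n, uops_shifted_index(n), uops_separate_shift(n))
```

With three loads this gives 6 vs 4 uops, matching the example; under this model the two sequences only tie for a single load.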

> I'll note for the record that you cannot hope to solve this with
> the legitimize_address hook alone for the simple reason that it's not
> called for legitimate addresses, of which (base + index * 2) is
> a member.  The hook is only being called for illegitimate addresses.

Would it be possible to disallow expensive addresses initially, let CSE do its
thing, and then merge addresses with only 1 or 2 uses back into the loads/stores?
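A toy model of the proposed strategy, not GCC code: every shifted index is treated as broken out of the address, and any shifted index with at most `max_fold_uses` uses is merged back into its loads. The `Load` type, `plan_addresses`, and the temporary register names are hypothetical, and the model computes the end result directly rather than running separate split, CSE, and merge passes.

```python
from collections import Counter
from typing import List, NamedTuple


class Load(NamedTuple):
    dst: str
    base: str
    index: str
    shift: int  # 0 means a plain [base, index] address


def plan_addresses(loads: List[Load], max_fold_uses: int = 2) -> List[str]:
    """Share a shifted index used more than max_fold_uses times;
    otherwise fold the shift back into each load's address."""
    uses = Counter((ld.index, ld.shift) for ld in loads if ld.shift)
    out: List[str] = []
    emitted = set()
    for ld in loads:
        key = (ld.index, ld.shift)
        if ld.shift and uses[key] > max_fold_uses:
            # Many uses: compute the shift once and reuse it (the "CSE" step).
            tmp = f"{ld.index}_lsl{ld.shift}"  # hypothetical temp register name
            if key not in emitted:
                out.append(f"lsl {tmp}, {ld.index}, #{ld.shift}")
                emitted.add(key)
            out.append(f"ldr {ld.dst}, [{ld.base}, {tmp}]")
        elif ld.shift:
            # Few uses: merge the shift back into the load.
            out.append(f"ldr {ld.dst}, [{ld.base}, {ld.index}, lsl #{ld.shift}]")
        else:
            out.append(f"ldr {ld.dst}, [{ld.base}, {ld.index}]")
    return out


# Three loads sharing "idx, lsl #1", as in the quoted example.
loads = [Load(f"dst{i}", f"base{i}", "idx", 1) for i in (1, 2, 3)]
for line in plan_addresses(loads):
    print(line)
```

With three uses the shift is emitted once; with only two of the loads, the threshold keeps the shift inside both addresses.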

> To include legitimate addresses, you'd have to force out the address
> components somewhere else.  Perhaps in the mov expanders, since that's
> one of the very few places mem's are allowed.  You'd want to do this
> only if !cse_not_expected.

And presumably only for addresses we would prefer to be CSEd, such as expensive
shifts or indexing.

> OTOH, it's also the sort of thing that one would hope that CSE itself
> would be able to handle.  Looking across various addresses, computing
> sums of costs, and breaking out subexpressions as necessary.

Yes that would be the ideal, but one can dream...
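The cost-sum approach Richard describes can be sketched as a single comparison per shared subexpression: the summed cost of keeping the shift in every address vs. one standalone shift plus cheaper addresses. The `AddrCosts` fields and all numbers below are hypothetical and are not taken from GCC's AArch64 cost tables.

```python
from dataclasses import dataclass


@dataclass
class AddrCosts:
    # Hypothetical per-CPU costs, for illustration only.
    shifted_index: int  # cost of a [base, index, lsl #n] address
    reg_index: int      # cost of a [base, index] address
    shift_insn: int     # cost of a standalone shift instruction


def should_break_out(n_uses: int, c: AddrCosts) -> bool:
    """Break the shift out of the addresses only if the summed cost drops."""
    folded = n_uses * c.shifted_index
    split = c.shift_insn + n_uses * c.reg_index
    return split < folded


# A CPU where only the shifted index is expensive.
cheap_reg = AddrCosts(shifted_index=2, reg_index=1, shift_insn=1)
# A CPU that charges the same for any indexing.
any_index = AddrCosts(shifted_index=2, reg_index=2, shift_insn=1)
print(should_break_out(3, cheap_reg), should_break_out(3, any_index))
```

The second cost model shows why splitting can be a loss on a core that charges for any indexing: the extra shift buys nothing back in the addresses.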

Did you post your patch btw? We should go ahead with that (with Jiong's minor
modification) as it looks significantly better overall.

