[PATCH 2/2] Aarch64: Add branch diluter pass

Andrea Corallo andrea.corallo@arm.com
Fri Jul 24 13:21:21 GMT 2020


Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Fri, Jul 24, 2020 at 09:01:33AM +0200, Andrea Corallo wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> >> Correct, it's a sliding window only because the real load address is not
>> >> known to the compiler and the algorithm is conservative.  I believe we
>> >> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
>> >> least) the granule size, then we should be able to insert 'nop aligned
>> >> labels' precisely.
>> >
>> > Yeah, we have similar issues on Power...  Our "granule" (fetch group
>> > size, in our terminology) is 32 typically, but we align functions to
>> > just 16.  This is causing some problems, but aligning to bigger
>> > boundaries isn't a very happy alternative either.  WIP...
>> 
>> Interesting, I was expecting other CPUs to have a similar mechanism.
>
> On old cpus (like the 970) there were at most two branch predictions per
> cycle.  Nowadays, all branches are predicted; not sure when this changed,
> it is pretty long ago already.
>
>> > (We don't have this exact same problem, because our non-ancient cores
>> > can just predict *all* branches in the same cycle).
>> >
>> >> My main fear is that, given new cores tend to have big granules, code
>> >> size would blow up.  One advantage of the implemented algorithm is that,
>> >> even if slightly conservative, it impacts code size only where a high
>> >> branch density shows up.
>> >
>> > What is "big granules" for you?
>> 
>> N1 is 8 instructions, so 32 bytes as well.  I guess this may grow further
>> (my speculation).
>
> It has to sooner rather than later, yeah.  Or the mechanism has to change
> more radically.  Interesting times ahead, I guess :-)

Indeed :)
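
For readers following along, here is a rough sketch of the conservative
sliding-window idea discussed above.  It is an illustration only, not the
code from the patch: the function names and the per-granule branch limit
(MAX_BRANCHES) are assumptions made up for the example, and only the
8-instruction granule comes from the N1 figure quoted above.  Since the
real load address is unknown, any run of 8 consecutive instructions might
land in one granule, so the sketch pads with nops until no such run holds
too many branches.

```python
GRANULE_INSNS = 8  # N1 figure from the thread: 8 insns, 32 bytes at 4 bytes each
MAX_BRANCHES = 4   # hypothetical limit per granule, NOT taken from the patch


def dilute(insns, granule=GRANULE_INSNS, max_branches=MAX_BRANCHES):
    """Return a copy of insns with nops inserted so that no window of
    `granule` consecutive instructions holds more than `max_branches`
    branches.  Branches are modelled as the string "b"."""
    if granule < 2 or max_branches < 1:
        raise ValueError("need granule >= 2 and max_branches >= 1")
    out = []
    for insn in insns:
        if insn == "b":
            # The window ending at this branch covers the previous
            # granule - 1 emitted instructions plus the branch itself.
            # Pad until that window has room for one more branch.
            while out[-(granule - 1):].count("b") >= max_branches:
                out.append("nop")
        out.append(insn)
    return out


def worst_window(insns, granule=GRANULE_INSNS):
    """Highest branch count found in any window of `granule` instructions."""
    starts = max(1, len(insns) - granule + 1)
    return max(insns[i:i + granule].count("b") for i in range(starts))
```

With these assumed numbers, six back-to-back branches get four nops
inserted after the fourth one, while code with a branch every fourth
instruction is left untouched, which is the "only where a high branch
density shows up" property mentioned earlier.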

> About your patch itself.  The basic idea seems fine (I didn't look too
> closely), but do you really need a new RTX class for this?  That is not
> very appetising...

Agreed.  OTOH I'm not sure about the other options on the table and their
impact; the advantage of this one is that its impact is relatively
contained.

  Andrea

