[PATCH 2/2] Aarch64: Add branch diluter pass

Fri Jul 24 13:21:21 GMT 2020

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Fri, Jul 24, 2020 at 09:01:33AM +0200, Andrea Corallo wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> >> Correct, it's a sliding window only because the real load address is not
>> >> known to the compiler and the algorithm is conservative.  I believe we
>> >> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
>> >> least) the granule size; then we should be able to insert 'nop aligned
>> >> labels' precisely.
>> >
>> > Yeah, we have similar issues on Power...  Our "granule" (fetch group
>> > size, in our terminology) is 32 typically, but we align functions to
>> > just 16.  This is causing some problems, but aligning to bigger
>> > boundaries isn't a very happy alternative either.  WIP...
>> 
>> Interesting, I was expecting other CPUs to have a similar mechanism.
>
> On old CPUs (like the 970) there were at most two branch predictions per
> cycle.  Nowadays, all branches are predicted; I'm not sure when this
> changed, but it was quite a long time ago.
>
>> > (We don't have this exact same problem, because our non-ancient cores
>> > can just predict *all* branches in the same cycle).
>> >
>> >> My main fear is that, given that new cores tend to have big granules,
>> >> code size would blow up.  One advantage of the implemented algorithm is
>> >> that, even if slightly conservative, it impacts code size only where a
>> >> high branch density shows up.
>> >
>> > What is "big granules" for you?
>> 
>> N1 is 8 instructions, so 32 bytes as well; I guess this may grow further
>> (my speculation).
>
> It will have to, sooner rather than later, yeah.  Or the mechanism has to change
> more radically.  Interesting times ahead, I guess :-)

Indeed :)

> About your patch itself.  The basic idea seems fine (I didn't look too
> closely), but do you really need a new RTX class for this?  That is not
> very appetising...

Agreed.  OTOH, I'm not sure about the other options on the table and their
impact; the advantage of this approach is that its impact is relatively
contained.

  Andrea
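For readers following the thread, here is a minimal C sketch of the kind of conservative sliding-window check discussed above. It is not the patch's code: the real pass works on RTL, while here instructions are modeled as fixed 4-byte slots, and the names insn_kind and dilute_branches as well as the max_branches limit are hypothetical.  Because the real load address is unknown, any GRANULE consecutive instructions could end up in one fetch granule, so the sketch checks every window rather than only aligned ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction model: the sketch only needs to know
   whether each fixed-size (4-byte) slot holds a branch.  */
typedef enum { INSN_OTHER, INSN_BRANCH, INSN_NOP } insn_kind;

/* Count the branches among the last GRANULE - 1 emitted instructions,
   i.e. the ones that share a GRANULE-sized window with the next one.  */
static size_t
recent_branches (const insn_kind *out, size_t n_out, size_t granule)
{
  size_t span = granule - 1;
  size_t start = n_out > span ? n_out - span : 0;
  size_t count = 0;
  for (size_t i = start; i < n_out; i++)
    if (out[i] == INSN_BRANCH)
      count++;
  return count;
}

/* Copy the N_IN instructions of IN to OUT, inserting NOPs so that every
   window of GRANULE consecutive instructions holds at most MAX_BRANCHES
   branches.  OUT needs room for N_IN * GRANULE entries, since at most
   GRANULE - 1 NOPs are inserted before each instruction.  Returns the
   number of instructions written.  */
size_t
dilute_branches (const insn_kind *in, size_t n_in, insn_kind *out,
                 size_t granule, size_t max_branches)
{
  /* Both limits must be positive, or the NOP loop below never ends.  */
  assert (granule >= 1 && max_branches >= 1);
  size_t n_out = 0;
  for (size_t i = 0; i < n_in; i++)
    {
      if (in[i] == INSN_BRANCH)
        /* Pad with NOPs until enough earlier branches have fallen out
           of the window that this branch would complete.  */
        while (recent_branches (out, n_out, granule) >= max_branches)
          out[n_out++] = INSN_NOP;
      out[n_out++] = in[i];
    }
  return n_out;
}
```

With a granule of 8 instructions (the N1 figure above) and a hypothetical limit of two branches per granule, the input branch, branch, branch, other becomes 10 instructions: two branches, six NOPs, the third branch and the other instruction.  The third branch is delayed until the first one has left its 8-instruction window.  This also shows the trade-off mentioned earlier: NOPs are only inserted where branches are dense, but since the window slides, some padding would be unnecessary if functions were aligned to the granule and the real granule boundaries were known.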