[PATCH 2/2] Aarch64: Add branch diluter pass
Segher Boessenkool
segher@kernel.crashing.org
Fri Jul 24 11:53:47 GMT 2020
Hi!
On Fri, Jul 24, 2020 at 09:01:33AM +0200, Andrea Corallo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >> Correct, it's a sliding window only because the real load address is not
> >> known to the compiler and the algorithm is conservative. I believe we
> >> could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
> >> least) the granule size, then we should be able to insert 'nop aligned
> >> labels' precisely.
> >
> > Yeah, we have similar issues on Power... Our "granule" (fetch group
> > size, in our terminology) is 32 typically, but we align functions to
> > just 16. This is causing some problems, but aligning to bigger
> > boundaries isn't a very happy alternative either. WIP...
>
> Interesting, I was expecting other CPUs to have a similar mechanism.
On old CPUs (like the 970) there were at most two branch predictions per
cycle. Nowadays, all branches are predicted; not sure when this changed,
it is pretty long ago already.
> > (We don't have this exact same problem, because our non-ancient cores
> > can just predict *all* branches in the same cycle).
> >
> >> My main fear is that, given new cores tend to have big granules, code
> >> size would blow up. One advantage of the implemented algorithm is that
> >> even if slightly conservative, it impacts code size only where a high
> >> branch density shows up.
> >
> > What is "big granules" for you?
>
> N1 is 8 instructions so 32 bytes as well, I guess this may grow further
> (my speculation).
It has to sooner rather than later, yeah. Or the mechanism has to change
more radically. Interesting times ahead, I guess :-)
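
To make the granule discussion above concrete, here is a minimal sketch (not
the code from the patch) of why a conservative pass has to use a sliding
window when the function start is not aligned to the granule. It assumes
fixed 4-byte instructions and the 8-instruction granule quoted for N1; the
function names and the is_branch[] representation are hypothetical and exist
only for this illustration.

```c
#include <stddef.h>

/* Assumption: 8 instructions per granule (32 bytes at 4 bytes each),
   the N1 figure quoted in this thread.  */
#define GRANULE_INSNS 8

/* Conservative view: the load address is unknown, so count the worst
   case over every window of GRANULE_INSNS consecutive instructions.
   is_branch[i] is 1 if instruction i is a branch, else 0.  */
static int
max_branches_sliding (const int *is_branch, size_t n)
{
  int best = 0, cur = 0;
  for (size_t i = 0; i < n; i++)
    {
      cur += is_branch[i];
      if (i >= GRANULE_INSNS)
        cur -= is_branch[i - GRANULE_INSNS];  /* drop insn leaving window */
      if (cur > best)
        best = cur;
    }
  return best;
}

/* Precise view: the function is known to start start_off instructions
   into a granule, so only the real granule boundaries matter.  */
static int
max_branches_aligned (const int *is_branch, size_t n, size_t start_off)
{
  int best = 0, cur = 0;
  for (size_t i = 0; i < n; i++)
    {
      if ((i + start_off) % GRANULE_INSNS == 0)
        cur = 0;                              /* a new granule begins */
      cur += is_branch[i];
      if (cur > best)
        best = cur;
    }
  return best;
}
```

With four adjacent branches at instructions 6..9 of a 16-instruction
function, the aligned count is 2 when the function starts on a granule
boundary (start_off 0) but 4 when it starts half a granule in (start_off 4,
i.e. 16-byte but not 32-byte alignment). The sliding window reports 4 in
both cases, which is the conservative answer when the offset is unknown.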
About your patch itself. The basic idea seems fine (I didn't look too
closely), but do you really need a new RTX class for this? That is not
very appetising...
Segher