[PATCH, SH] Small builtin_strlen improvement
Oleg Endo
oleg.endo@t-online.de
Fri Apr 18 12:53:00 GMT 2014
Sorry for the delayed reply.
On Mon, 2014-03-31 at 09:44 +0200, Christian Bruel wrote:
> On 03/30/2014 11:02 PM, Oleg Endo wrote:
> > Hi,
> >
> > On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:
> >
> > >> This patch adds a few instructions to the inlined builtin_strlen to
> > >> unroll the remaining bytes for the word-at-a-time loop. This gives
> > >> 2 distinct execution paths (no fall-thru into the byte-at-a-time loop),
> > >> allowing block alignment to be assigned. This partially improves the
> > >> problem reported by Oleg in [Bug target/0539] New: [SH] builtin
> > >> string functions ignore loop and label alignment
> > Actually, my original concern was the (mis)alignment of the 4 byte inner
> > loop. AFAIR it's better for the SH pipeline if the first insn of a loop
> > is 4 byte aligned.
>
> yes, this is why I haven't closed the PR. IMHO the problem with the
> non-aligned loop stems from the generic alignment code in final.c.
> Changing branch frequencies has quite an impact on BB reordering as well.
> Further tuning of static branch estimations, or tuning of the LOOP_ALIGN
> macro is needed.
OK, I've updated PR 60539 accordingly.
> Note that my branch estimations in this code are very
> empirical; benchmarking with dynamic profiling would be nice as well.
> My point was just that forcing a local .align in this code is a
> workaround, as we should be able to rely on the generic reordering/align
> code for this. So the tuning of loop alignment is more global (and well
> exhibited here indeed).
I think that those two are separate issues. I've opened a new PR 60884
for this. Let's continue the discussions and experiments there.
Cheers,
Oleg
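
For readers following the thread, the shape of the code described in the quoted patch description can be sketched in plain C. This is only an illustration of a word-at-a-time loop followed by an unrolled tail, not the SH backend's actual expansion (which is emitted as SH instructions by the compiler). The names sketch_strlen and has_zero_byte are hypothetical, and the sketch assumes that reading the aligned 4-byte word containing the terminator is acceptable on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of w is zero (classic bit trick). */
static inline int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Illustrative sketch only, not the code GCC emits for SH. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Byte-at-a-time prologue until p is 4-byte aligned. */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Word-at-a-time loop: the hot path that benefits from alignment.
       Assumption: the whole aligned word holding the terminator is readable. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* Unrolled tail: the terminator is one of these 4 bytes, so there is
       no fall-through back into a byte-at-a-time loop. */
    if (p[0] == '\0') return (size_t)(p - s);
    if (p[1] == '\0') return (size_t)(p - s) + 1;
    if (p[2] == '\0') return (size_t)(p - s) + 2;
    return (size_t)(p - s) + 3;
}
```

With the tail unrolled, the prologue loop and the word loop are two separate blocks, which is what lets each be given its own alignment.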