[PATCH, SH] Small builtin_strlen improvement

Fri Apr 18 12:53:00 GMT 2014

Sorry for the delayed reply.

On Mon, 2014-03-31 at 09:44 +0200, Christian Bruel wrote:
> On 03/30/2014 11:02 PM, Oleg Endo wrote:
> > Hi,
> >
> > On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:
> >
> >> This patch adds a few instructions to the inlined builtin_strlen to
> >> unroll the remaining bytes for the word-at-a-time loop. This allows
> >> 2 distinct execution paths (no fall-thru into the byte-at-a-time loop),
> >> enabling block alignment assignment. This partially improves the
> >> problem reported by Oleg in [Bug target/60539] New: [SH] builtin
> >> string functions ignore loop and label alignment
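For readers following the thread, the idea described above can be sketched
in plain C. This is only an illustration under assumptions: the actual patch
emits SH instructions from the builtin expander, and the names
has_zero_byte and sketch_strlen are hypothetical, not taken from the patch.
The point is that once the word loop finds a word containing a zero byte,
the remaining bytes are checked by straight-line code rather than by falling
through into a byte-at-a-time loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical sketch of a word-at-a-time strlen with an unrolled tail. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Path 1: byte-at-a-time until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Path 2: word-at-a-time loop.  An aligned 4-byte read never crosses
       a page boundary, but it may read up to 3 bytes past the terminator,
       which strict ISO C does not permit; a compiler expander is free to
       do this, portable code is not. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* Unrolled tail: the terminator is one of the 4 bytes at p, so no
       loop (and no fall-thru into the byte loop) is needed. */
    if (p[0] == '\0') return (size_t)(p - s);
    if (p[1] == '\0') return (size_t)(p - s) + 1;
    if (p[2] == '\0') return (size_t)(p - s) + 2;
    return (size_t)(p - s) + 3;
}
```

With the tail unrolled, the byte loop and the word loop become two separate
blocks, which is what lets the alignment pass treat each loop head on its
own.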
> > Actually, my original concern was the (mis)alignment of the 4 byte inner
> > loop.  AFAIR it's better for the SH pipeline if the first insn of a loop
> > is 4 byte aligned.
> 
> yes, this is why I haven't closed the PR. IMHO the problem with the
> non-aligned loop stems from the generic alignment code in final.c.
> Changing branch frequencies has quite an impact on BB reordering as well.
> Further tuning of static branch estimations, or tuning of the LOOP_ALIGN
> macro is needed.

OK, I've updated PR 60539 accordingly.

> Note that my branch estimations in this code are very
> empirical; dynamic profiling benchmarks would be nice as well.
> My point was just that forcing a local .align in this code is a
> workaround, as we should be able to rely on the generic reordering/align
> code for this. So the tuning of loop alignment is more global (and well
> exhibited here indeed)

I think that those two are separate issues.  I've opened a new PR 60884
for this.  Let's continue the discussions and experiments there.

Cheers,
Oleg