This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATH, SH] Small builtin_strlen improvement
- From: Oleg Endo <oleg dot endo at t-online dot de>
- To: Christian Bruel <christian dot bruel at st dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>
- Date: Fri, 18 Apr 2014 14:41:50 +0200
- Subject: Re: [PATH, SH] Small builtin_strlen improvement
- References: <533288C1 dot 1080306 at st dot com> <1396213352 dot 2352 dot 12 dot camel at yam-132-YW-E178-FTW> <53391CDB dot 3030905 at st dot com>
Sorry for the delayed reply.
On Mon, 2014-03-31 at 09:44 +0200, Christian Bruel wrote:
> On 03/30/2014 11:02 PM, Oleg Endo wrote:
> > Hi,
> >
> > On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:
> >
> >> This patch adds a few instructions to the inlined builtin_strlen to
> >> unroll the remaining bytes for the word-at-a-time loop. This makes it
> >> possible to have 2 distinct execution paths (no fall-thru into the
> >> byte-at-a-time loop), allowing block alignment assignment. This partially
> >> improves the problem reported by Oleg in [Bug target/60539] New: [SH]
> >> builtin string functions ignore loop and label alignment
> > Actually, my original concern was the (mis)alignment of the 4 byte inner
> > loop. AFAIR it's better for the SH pipeline if the first insn of a loop
> > is 4 byte aligned.
>
> Yes, this is why I haven't closed the PR. IMHO the problem with the
> non-aligned loop stems from the generic alignment code in final.c.
> Changing branch frequencies also has quite an impact on BB reordering.
> Further tuning of the static branch estimations, or of the LOOP_ALIGN
> macro, is needed.
OK, I've updated PR 60539 accordingly.
> Note that my branch estimations in this code are very
> empirical; benchmarking with dynamic profiling would be nice as well.
> My point was just that forcing a local .align in this code is a
> workaround, as we should be able to rely on the generic reordering/align
> code for this. So the tuning of loop alignment is a more global issue (and
> well exhibited here indeed).
I think that those two are separate issues. I've opened a new PR 60884
for this. Let's continue the discussions and experiments there.
Cheers,
Oleg
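
For readers following the thread: the scheme discussed in the quoted patch description (a word-at-a-time loop whose remaining bytes are checked by unrolled code instead of falling through into the byte-at-a-time loop) can be sketched in portable C roughly as below. This is only an illustration under assumptions, not the SH backend expansion from the patch. The names sketch_strlen and has_zero_byte are hypothetical, the zero-byte test is the generic bit trick rather than anything SH-specific, and the sketch assumes that reading a whole aligned 32-bit word that contains the terminator is acceptable on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero iff at least one byte of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical sketch of a strlen with two separate execution paths. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Path 1: byte-at-a-time, used only until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Path 2: word-at-a-time loop. The aligned word holding the
       terminator is read in full, which may touch bytes past it. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* Unrolled tail: the terminator is one of the 4 bytes at p, so
       there is no fall-through back into the byte loop above. */
    if (p[0] == '\0') return (size_t)(p - s);
    if (p[1] == '\0') return (size_t)(p - s) + 1;
    if (p[2] == '\0') return (size_t)(p - s) + 2;
    return (size_t)(p - s) + 3;
}
```

Because the tail is unrolled, the word loop and the byte loop become separate blocks, which is what lets each of them receive its own alignment.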