This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Exploiting knowing sizes of string.


On Thu, Jun 04, 2015 at 09:06:40PM +0100, Richard Earnshaw wrote:
> On 04/06/15 20:57, Jakub Jelinek wrote:
> > On Thu, Jun 04, 2015 at 06:36:33PM +0200, Ondřej Bílka wrote:
> >> On Thu, Jun 04, 2015 at 04:01:50PM +0000, Joseph Myers wrote:
> >>> On Thu, 4 Jun 2015, Richard Earnshaw wrote:
> >>>
> >>>>> Change that into
> >>>>>
> >>>>> int foo(char *s)
> >>>>> {
> >>>>>   int l = strlen (s);
> >>>>>   char *p = memchr (s, 'a', l);
> >>>>>   return p+l;
> >>>>> }
> > 
> >> And Joseph you shouldn't restrict yourself only to values that are
> >> present in variables to cover case where its implicit one from strcpy
> >> converted to stpcpy.
> > 
> > memchr isn't handled in that pass right now at all, of course it could be
> > added, shouldn't be really hard.  Feel free to file a PR and/or write
> > a patch.
> > 
> > As for e.g. the inlining of the first (or a few more) iterations of strcmp
> > etc., that is certainly something that can be done in the compiler too and
> > the compiler should have much better information whether to do it or not,
> > as it shouldn't be done for -Os, or for basic blocks or functions predicted
> > cold, because it enlarges the code size quite a lot.
> 
> You should also be wary of making the strings passed to the library
> functions not naturally aligned.  That can result in the code in the
> library having to take a much slower path to regain any alignment done
> by peeling the initial iteration(s).  And if you're going to pass the
> full string(s) anyway, then you'd better be pretty sure that doing the
> check before the call is really likely to succeed.
> 
You can be, for these functions. If that causes a problem, it is because
your implementation is already slow.
Peeling is the wrong approach. With unaligned loads, just check whether
either argument crosses a page boundary and, if neither does, do an
unaligned load; no peeling is necessary.
With aligned loads it is a bit more tricky: you can emulate the initial
unaligned load with shifts, again with no peeling necessary.
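
A minimal C sketch of the unaligned-load variant follows. It is only an
illustration and is not taken from any libc. The 4 KiB page size, the 8-byte
word and the names crosses_page, has_zero_byte and sketch_strcmp are all
assumptions made for the example. The trailing byte loop merely stands in for
the main loop of a real implementation. Note that loading 8 bytes can read
past the terminating NUL, which ISO C does not strictly allow; real
implementations do this in assembly, where staying within the page is enough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Nonzero if the n bytes starting at p extend past the end of p's page. */
static int crosses_page(const void *p, size_t n)
{
    return ((uintptr_t)p & (SKETCH_PAGE_SIZE - 1)) > SKETCH_PAGE_SIZE - n;
}

/* Nonzero if any byte of v is zero (classic bit trick). */
static uint64_t has_zero_byte(uint64_t v)
{
    return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}

/* Hypothetical strcmp: one unaligned 8-byte check up front, no peeling. */
int sketch_strcmp(const char *a, const char *b)
{
    if (!crosses_page(a, 8) && !crosses_page(b, 8)) {
        uint64_t x, y;
        memcpy(&x, a, 8);   /* compilers turn this into one unaligned load */
        memcpy(&y, b, 8);
        if (x == y && !has_zero_byte(x)) {
            /* First 8 bytes are equal and contain no NUL: skip them. */
            a += 8;
            b += 8;
        }
        /* Otherwise the difference or the NUL lies in these 8 bytes;
           the loop below finds it. */
    }
    /* Stand-in for the real main loop. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Neither pointer is adjusted before the first load, so the caller's alignment
is left exactly as it was; the only precondition checked is that the 8-byte
reads stay within the current page.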

As for strcmp, the profile below shows that you are optimizing cold code.
Since a majority of about 90% of strings are misaligned, both absolutely
and relative to each other, that is the case you need to optimize. See the
overall statistics below; they are a bit skewed because make causes the
majority of strcmp calls, but alignment is rare in most other programs
too. See below, or the more detailed strcmp profile in my previous mail.

average size  21.8 calls 71742523 succeed  99.2% latencies -14.2 -14.0
s1    aligned to 4 bytes  26.6% aligned to 8 bytes  14.5% aligned to 16 bytes   8.1%
s2    aligned to 4 bytes  49.9% aligned to 8 bytes  41.9% aligned to 16 bytes  37.2%
s1-s2 aligned to 4 bytes  23.0% aligned to 8 bytes  11.2% aligned to 16 bytes   5.5%
n <= 0:  32.4% n <= 1:  34.7% n <= 2:  35.6% n <= 3:  36.2% n <= 4:  36.2%
n <= 8:  36.5% n <= 16:  38.9% n <= 32:  51.3% n <= 64: 100.0%

Since on aarch you use a byte-by-byte loop for 94.5% of inputs, it is clear
that your implementation is slow.

