Builtin/headers: Constant arguments and adding extra entry points.

Ondřej Bílka neleai@seznam.cz
Thu Jun 4 19:45:00 GMT 2015


I start with the simplest suggestion, which is precomputing values derived
from constant arguments, for example saving the cost of a multiplication in
strchr with:

char *strchr_c (char *x, unsigned long u);
#define strchr(x, c) \
  (__builtin_constant_p (c) \
   ? strchr_c (x, (unsigned char) (c) * (~0UL / 255)) \
   : strchr (x, c))
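
To illustrate why the multiplied constant is useful, here is a minimal
sketch of how a callee such as strchr_c might consume it. The helper names
first_match_in_word and pack8 are hypothetical and this is not glibc's
implementation; it only shows the well-known word-at-a-time zero-byte test
applied to one 8-byte chunk, with the broadcast value c * (~0ULL / 255)
supplied by the caller.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.  `word` holds 8 bytes with
   byte i stored in bits 8*i..8*i+7; `mask` is the searched byte replicated
   into every byte, i.e. c * (~0ULL / 255).  Returns the index of the first
   byte equal to c, or -1 if there is none.  */
static int
first_match_in_word (uint64_t word, uint64_t mask)
{
  const uint64_t ones = ~0ULL / 255;   /* 0x0101010101010101 */
  const uint64_t high = ones << 7;     /* 0x8080808080808080 */
  uint64_t x = word ^ mask;            /* matching bytes become zero */
  uint64_t zero = (x - ones) & ~x & high;
  if (zero == 0)
    return -1;
  /* Borrows can only set bits above the first real zero byte, so the
     lowest set bit is reliable.  __builtin_ctzll is a GCC builtin.  */
  return __builtin_ctzll (zero) / 8;
}

/* Pack 8 bytes so that byte i lands in bits 8*i..8*i+7, independent of
   host endianness.  */
static uint64_t
pack8 (const unsigned char *p)
{
  uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}
```

With a constant c the mask folds to a literal at compile time, so the
callee can start with the xor directly.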


Next, I am working on exploiting a constant n for memset and memcpy. This
cannot be done in gcc alone, because the implementation has to be chosen
based on the cpu, and for different sizes different implementations are
best on different cpus.

Some users try to always inline, as in rte_memcpy. That works better than
the gcc expansion as it's optimized for newer processors.

For sizes beyond 64 bytes, trying to fully expand memcpy and memset
doesn't make a lot of sense as a libcall is faster.

To get the benefits of inlining I am now working on the following
approach. For sizes < 64 use the builtin. For n in 64-1024 make an
indirect jump according to a cpu-specific table that you get from libc.

That would allow unrolling up to size 1024 into a sequence of movdqa's
without increasing the cache footprint much. The same goes for
memset(x,0,n), except that you need to pass a 0.0 argument to have a
zeroed xmm register.
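
The size-based dispatch described above could look roughly like the
following sketch. The names copy_fn, copy_generic, copy_table_sketch and
memcpy_sketch are hypothetical; no such table is exported by libc today,
so here every slot simply points at a generic routine that calls memcpy.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Stand-in for a cpu-tuned routine; a real table would hold different
   unrolled implementations per size class.  */
static void *
copy_generic (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

/* Hypothetical table, one entry per 64-byte size class up to 1024.
   In the proposed scheme libc would fill it for the running cpu; here
   every slot points at the generic routine.  The [a ... b] range
   designator is a GNU C extension.  */
static copy_fn copy_table_sketch[1024 / 64 + 1] = {
  [0 ... 1024 / 64] = copy_generic
};

/* Constant n in 64..1024: indirect call through the table.
   Anything else: plain memcpy, which gcc expands inline for small
   constant sizes.  */
#define memcpy_sketch(d, s, n) \
  (__builtin_constant_p (n) && (n) >= 64 && (n) <= 1024 \
   ? copy_table_sketch[(n) / 64] ((d), (s), (n)) \
   : memcpy ((d), (s), (n)))
```

Because the index (n) / 64 is a compile-time constant, the call site
reduces to one load from the table and one indirect call.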


An entry point for aligned input doesn't make a lot of sense. As the input
is short you want to jump into a copy of the header, and you save only the
difference between an aligned and an unaligned load plus the cross-page
check. As you need to duplicate the header, the icache cost could be bigger
than the saving.

What does make sense is inlining headers instead of expanding the whole
function. I am looking at the following expansion of strcmp/memcmp:

int
inline_strcmp (const char *x, const char *y)
{
  /* Compare the first bytes inline; call strcmp only when they match.  */
  int r = *((const unsigned char *) x) - *((const unsigned char *) y);
  return r ? r : strcmp (x, y);
}

int
inline_memcmp (const void *x, const void *y, size_t n)
{
  const unsigned char *a = x, *b = y;
  if (n == 0)
    return 0;
  int r = *a - *b;
  return r ? r : memcmp (a + 1, b + 1, n - 1);
}

Note that the end of the string is not tested as it's unlikely. The same
transformation could be done for strncmp, strcasecmp and strncasecmp, but
we at libc would need to improve the tls access of tolower, which now
requires a call, which defeats the purpose of inlining.
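
For strncmp the same header might look like the sketch below. The name
inline_strncmp_sketch is hypothetical; the only addition over the strcmp
case is the n == 0 check, and only the sign of the result is guaranteed to
agree with strncmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inline header for strncmp: decide on the first byte when
   possible, otherwise fall back to the library call.  */
static inline int
inline_strncmp_sketch (const char *x, const char *y, size_t n)
{
  if (n == 0)
    return 0;
  int r = *((const unsigned char *) x) - *((const unsigned char *) y);
  return r ? r : strncmp (x, y, n);
}
```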

That gives considerable savings, as in my profile 32.4% of strcmp calls
and 49.5% of memcmp calls differ in the first byte. From profiling
data, these branches are almost completely predictable, as I see long
sequences of calls that differ at byte 0 followed by sequences that differ
elsewhere. Of the programs measured, only make could be harmed. See the
attached data.

On x64, adding a check of the first 16 bytes using sse would also make
sense. Except for make, all other programs have 90% of calls differ within
the first 16 bytes.
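
A 16-byte header of this kind could be sketched with SSE2 intrinsics as
follows. The function name strcmp_sse_header_sketch is hypothetical, the
4096-byte page size is an assumption, and like libc's assembly routines the
code reads up to 16 bytes that may lie past the terminating nul, which
plain C does not formally allow. Without SSE2 it falls back to strcmp.

```c
#include <stdint.h>
#include <string.h>
#if defined (__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 16-byte header: resolve the comparison inline when the
   first difference or the terminating nul is within the first 16 bytes.  */
static inline int
strcmp_sse_header_sketch (const char *x, const char *y)
{
#if defined (__SSE2__)
  /* Assumed 4096-byte pages: only load when neither 16-byte read can
     cross into the next page.  */
  if (((uintptr_t) x & 4095) <= 4096 - 16
      && ((uintptr_t) y & 4095) <= 4096 - 16)
    {
      __m128i a = _mm_loadu_si128 ((const __m128i *) x);
      __m128i b = _mm_loadu_si128 ((const __m128i *) y);
      unsigned eq = _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, b));
      unsigned nul = _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, _mm_setzero_si128 ()));
      /* Bit i is set when byte i differs or x[i] is the terminator.  */
      unsigned stop = (~eq | nul) & 0xffff;
      if (stop != 0)
        {
          unsigned i = __builtin_ctz (stop);
          return (unsigned char) x[i] - (unsigned char) y[i];
        }
    }
#endif
  return strcmp (x, y);
}
```

Whether this pays off against the single-byte header depends on the code
size it adds at every call site.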

The same could be done for strchr/memchr headers, where the first 16 bytes
also account for the majority of calls.

In the case of make we should check whether in strchr(x,'/') we have
x[0] == '/', which happens 85.1% of the time.
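
The first-byte check for strchr could be written as the following sketch;
inline_strchr_sketch is a hypothetical name, and it behaves like strchr
for every input, including c == '\0'.

```c
#include <string.h>

/* Hypothetical inline header for strchr: test x[0] before calling.  */
static inline char *
inline_strchr_sketch (const char *x, int c)
{
  return *x == (char) c ? (char *) x : strchr (x, c);
}
```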

In the generic case the same header would be bigger, so the question of
whether it's profitable versus code size becomes more significant.

For similar questions I have on my todo list adding counters for userspace
profiling. The decision whether some optimization is profitable depends on
details, like the average size of input, that cannot be directly
determined from a profile. For example, in strstr we would need the
digraph that occurs least often.
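
As a rough idea of what such a counter might record, here is a minimal
sketch done purely with a wrapper function. The names limits, strcmp_hist
and counted_strcmp are hypothetical, and this is not how any existing
profiling tool works; it only buckets the offset of the first difference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counters.  Bucket i counts calls whose first difference
   (or terminating nul) is at an offset in (limits[i-1], limits[i]];
   the last bucket collects everything beyond 64.  */
static const size_t limits[] = { 0, 1, 2, 3, 4, 8, 16, 32, 64 };
#define NBUCKETS (sizeof limits / sizeof limits[0] + 1)
static unsigned long strcmp_hist[NBUCKETS];

static int
counted_strcmp (const char *x, const char *y)
{
  size_t pos = 0, b = 0;
  /* Number of leading bytes that match before a difference or the end.  */
  while (x[pos] == y[pos] && x[pos] != '\0')
    pos++;
  while (b < NBUCKETS - 1 && pos > limits[b])
    b++;
  strcmp_hist[b]++;
  return strcmp (x, y);
}
```

The counters are not thread-safe; a real implementation would need
per-thread or atomic updates.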

I don't know if that could be integrated into -fprofile-generate and
-fprofile-use, or must be done before that as it would change control
flow, or should be done just by macros. If we could convince people to
compile with profiling, it would also allow directly precomputing tables
like the ones below without large header hacks, and make things like
calculating perfect hashes possible without external tools.

For precomputed tables I so far know of two use cases.

One case is the memchr("abc",x,3) or strchr("abc",x) pattern, which I
found in libc used to test membership and which is obviously inefficient.
The second use case is the strpbrk family.

These have in common that they could benefit from a precomputed table with
1 for present bytes and 0 otherwise. While I could create such a table, I
couldn't do that without 256 warnings. The following constructs the table
just fine but complains:

warning: initializer element is not a constant expression

#include <stdio.h>
#include <string.h>

int
main (void)
{
  static char x[256] = {strchr ("aaa", 'a') == NULL, strchr ("aaa", 'b') == NULL};
  printf ("%i %i %s", x[0], x[1], x);
  return 0;
}

The same trick could be used for making a bitwise array.
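
For comparison, here is a sketch of what the precomputed tables would
contain for the set "abc", written out by hand with C99 designated
initializers. All names (abc_table, abc_bits, in_abc, in_abc_bits,
scan_table_sketch) are hypothetical, and the bit-array initializer assumes
ASCII.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte table for the set "abc": 1 for present bytes, 0 otherwise.  */
static const unsigned char abc_table[256] = {
  ['a'] = 1, ['b'] = 1, ['c'] = 1
};

/* Bit-array variant, 256 bits.  Assumes ASCII, where 'a', 'b' and 'c'
   all fall into the same 32-bit word.  */
static const uint32_t abc_bits[8] = {
  ['a' / 32] = 1u << ('a' % 32) | 1u << ('b' % 32) | 1u << ('c' % 32)
};

/* Membership test replacing strchr ("abc", c) != NULL for nonzero c.  */
static inline int
in_abc (int c)
{
  return abc_table[(unsigned char) c];
}

static inline int
in_abc_bits (int c)
{
  unsigned char u = (unsigned char) c;
  return (abc_bits[u / 32] >> (u % 32)) & 1;
}

/* strpbrk-style scan driven by a byte table.  */
static const char *
scan_table_sketch (const char *s, const unsigned char table[256])
{
  for (; *s != '\0'; s++)
    if (table[(unsigned char) *s])
      return s;
  return NULL;
}
```

The byte table costs 256 bytes per set against 32 for the bit array, at
the price of a shift and a mask per lookup.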

Also it's weird what you can and cannot do in static initializers.
I was surprised that I could use strchr but couldn't evaluate "abc"[2]
as 'c'.


When the bug above gets fixed, that allows these functions to be a lot
faster, as most of the time you get a match in the first 8 bytes.
-------------- next part --------------
Statistics of comparison routines collected with dryrun; for the source see

kam.mff.cuni.cz/~ondra/dryrun.tar.bz2


summary strcmp:


replaying ls

average size   0.2 calls      246 succeed  93.1% latencies   1.1   2.8
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes  21.5% aligned to 8 bytes  10.2% aligned to 16 bytes   3.7%
s1-s2 aligned to 4 bytes  21.5% aligned to 8 bytes  10.2% aligned to 16 bytes   3.7%
n <= 0:  88.2% n <= 1:  93.5% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying bash

average size   4.0 calls      711 succeed  57.7% latencies  -4.6  -4.6
s1    aligned to 4 bytes  65.4% aligned to 8 bytes  56.0% aligned to 16 bytes   2.0%
s2    aligned to 4 bytes  58.6% aligned to 8 bytes  50.8% aligned to 16 bytes   3.4%
s1-s2 aligned to 4 bytes  49.6% aligned to 8 bytes  39.9% aligned to 16 bytes  37.1%
n <= 0:   0.1% n <= 1:  49.9% n <= 2:  60.6% n <= 3:  64.3%  n <= 4:  71.6% n <= 8:  81.3% n <= 16:  99.4% n <= 32: 100.0% n <= 64: 100.0%
replaying dircolors

average size   1.0 calls       54 succeed  96.3% latencies  -5.1  -6.1
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes  98.1%
s2    aligned to 4 bytes   1.9% aligned to 8 bytes   1.9% aligned to 16 bytes   1.9%
s1-s2 aligned to 4 bytes   1.9% aligned to 8 bytes   1.9% aligned to 16 bytes   1.9%
n <= 0:  87.0% n <= 1:  87.0% n <= 2:  87.0% n <= 3:  87.0%  n <= 4:  88.9% n <= 8:  94.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ps

average size   1.7 calls      239 succeed  87.0% latencies   3.6  16.0
s1    aligned to 4 bytes  94.1% aligned to 8 bytes  88.3% aligned to 16 bytes  88.3%
s2    aligned to 4 bytes  28.0% aligned to 8 bytes  13.4% aligned to 16 bytes  11.3%
s1-s2 aligned to 4 bytes  28.5% aligned to 8 bytes  13.0% aligned to 16 bytes  12.6%
n <= 0:  58.6% n <= 1:  77.0% n <= 2:  77.4% n <= 3:  81.6%  n <= 4:  84.1% n <= 8:  94.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-add

average size  11.5 calls      221 succeed   0.9% latencies   1.4   0.8
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  97.3% aligned to 16 bytes  97.3%
s2    aligned to 4 bytes  92.8% aligned to 8 bytes  91.4% aligned to 16 bytes  91.0%
s1-s2 aligned to 4 bytes  94.6% aligned to 8 bytes  93.2% aligned to 16 bytes  92.8%
n <= 0:   1.4% n <= 1:   1.4% n <= 2:   1.8% n <= 3:  12.7%  n <= 4:  13.6% n <= 8:  29.0% n <= 16:  83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-keygen

average size  11.5 calls      222 succeed   0.9% latencies   1.7   2.0
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  97.3% aligned to 16 bytes  97.3%
s2    aligned to 4 bytes  92.3% aligned to 8 bytes  91.0% aligned to 16 bytes  90.5%
s1-s2 aligned to 4 bytes  94.1% aligned to 8 bytes  92.8% aligned to 16 bytes  92.3%
n <= 0:   1.4% n <= 1:   1.4% n <= 2:   1.8% n <= 3:  12.6%  n <= 4:  13.5% n <= 8:  29.3% n <= 16:  83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying mc

average size   7.3 calls    16244 succeed  62.2% latencies -182.0 -181.9
s1    aligned to 4 bytes  95.6% aligned to 8 bytes  95.3% aligned to 16 bytes  95.3%
s2    aligned to 4 bytes  80.4% aligned to 8 bytes  78.6% aligned to 16 bytes  77.3%
s1-s2 aligned to 4 bytes  79.6% aligned to 8 bytes  78.2% aligned to 16 bytes  76.9%
n <= 0:  28.6% n <= 1:  32.1% n <= 2:  35.6% n <= 3:  43.6%  n <= 4:  48.4% n <= 8:  61.3% n <= 16:  87.1% n <= 32:  99.7% n <= 64:  99.9%
replaying killall

average size   0.1 calls      281 succeed  99.3% latencies  10.9   0.5
s1    aligned to 4 bytes   0.4% aligned to 8 bytes   0.4% aligned to 16 bytes   0.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes   0.4% aligned to 8 bytes   0.4% aligned to 16 bytes   0.4%
n <= 0:  97.5% n <= 1:  99.6% n <= 2:  99.6% n <= 3:  99.6%  n <= 4:  99.6% n <= 8:  99.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying iceweasel

average size   5.8 calls    13136 succeed  86.7% latencies -39.2 -33.5
s1    aligned to 4 bytes  32.5% aligned to 8 bytes  14.5% aligned to 16 bytes   7.6%
s2    aligned to 4 bytes  31.8% aligned to 8 bytes  16.7% aligned to 16 bytes  10.9%
s1-s2 aligned to 4 bytes  28.6% aligned to 8 bytes  14.1% aligned to 16 bytes   6.8%
n <= 0:  33.0% n <= 1:  41.5% n <= 2:  45.8% n <= 3:  54.4%  n <= 4:  58.6% n <= 8:  68.4% n <= 16:  92.3% n <= 32:  99.9% n <= 64: 100.0%
replaying mutt

average size  28.3 calls    27644 succeed  39.4% latencies -157.4 -134.1
s1    aligned to 4 bytes  99.8% aligned to 8 bytes  73.0% aligned to 16 bytes  73.0%
s2    aligned to 4 bytes  85.0% aligned to 8 bytes  61.2% aligned to 16 bytes  59.0%
s1-s2 aligned to 4 bytes  84.9% aligned to 8 bytes  76.4% aligned to 16 bytes  74.3%
n <= 0:  19.0% n <= 1:  33.3% n <= 2:  35.0% n <= 3:  35.8%  n <= 4:  37.2% n <= 8:  39.2% n <= 16:  40.1% n <= 32:  56.7% n <= 64:  89.3%
replaying irb

average size   3.1 calls    10058 succeed  39.2% latencies -102.7 -98.0
s1    aligned to 4 bytes   0.3% aligned to 8 bytes   0.3% aligned to 16 bytes   0.1%
s2    aligned to 4 bytes  21.4% aligned to 8 bytes   8.2% aligned to 16 bytes   4.4%
s1-s2 aligned to 4 bytes  41.6% aligned to 8 bytes  28.5% aligned to 16 bytes  13.0%
n <= 0:   2.0% n <= 1:   9.2% n <= 2:  33.5% n <= 3:  74.8%  n <= 4:  84.7% n <= 8:  99.9% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying vim

average size   1.5 calls   161275 succeed  84.9% latencies 105.3 124.3
s1    aligned to 4 bytes  75.5% aligned to 8 bytes  71.3% aligned to 16 bytes  70.2%
s2    aligned to 4 bytes  47.0% aligned to 8 bytes  41.2% aligned to 16 bytes  39.8%
s1-s2 aligned to 4 bytes  45.2% aligned to 8 bytes  39.4% aligned to 16 bytes  37.2%
n <= 0:  54.1% n <= 1:  73.1% n <= 2:  81.8% n <= 3:  86.7%  n <= 4:  90.6% n <= 8:  96.7% n <= 16:  99.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ar

average size   0.2 calls  1000000 succeed  99.9% latencies   5.0   4.8
s1    aligned to 4 bytes  25.0% aligned to 8 bytes  13.0% aligned to 16 bytes   6.1%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  25.0% aligned to 8 bytes  13.0% aligned to 16 bytes   6.1%
n <= 0:  90.9% n <= 1:  97.6% n <= 2:  98.3% n <= 3:  99.6%  n <= 4:  99.7% n <= 8:  99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make

average size  30.1 calls  1000000 succeed  98.7% latencies   1.2   1.6
s1    aligned to 4 bytes  28.4% aligned to 8 bytes  20.7% aligned to 16 bytes   9.8%
s2    aligned to 4 bytes  26.6% aligned to 8 bytes  18.8% aligned to 16 bytes   7.7%
s1-s2 aligned to 4 bytes  22.2% aligned to 8 bytes  12.3% aligned to 16 bytes   4.5%
n <= 0:   4.2% n <= 1:   4.2% n <= 2:   5.3% n <= 3:   5.3%  n <= 4:   5.3% n <= 8:   5.3% n <= 16:   8.9% n <= 32:  77.8% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   5.8 calls    15151 succeed  37.9% latencies   2.9  -2.6
s1    aligned to 4 bytes  40.7% aligned to 8 bytes  37.0% aligned to 16 bytes  36.0%
s2    aligned to 4 bytes  97.1% aligned to 8 bytes  96.8% aligned to 16 bytes  45.7%
s1-s2 aligned to 4 bytes  40.3% aligned to 8 bytes  36.6% aligned to 16 bytes  35.1%
n <= 0:  12.5% n <= 1:  14.5% n <= 2:  15.0% n <= 3:  58.5%  n <= 4:  68.1% n <= 8:  80.2% n <= 16:  94.6% n <= 32:  98.0% n <= 64: 100.0%
replaying gcc

average size   0.5 calls      235 succeed  93.6% latencies   2.9   4.0
s1    aligned to 4 bytes  30.2% aligned to 8 bytes  17.0% aligned to 16 bytes   9.4%
s2    aligned to 4 bytes   5.5% aligned to 8 bytes   4.7% aligned to 16 bytes   4.7%
s1-s2 aligned to 4 bytes  25.1% aligned to 8 bytes  19.1% aligned to 16 bytes  11.1%
n <= 0:  74.9% n <= 1:  92.3% n <= 2:  93.2% n <= 3:  94.5%  n <= 4:  98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying /bin/bash

average size   5.8 calls     2108 succeed  37.3% latencies -39.0 -39.5
s1    aligned to 4 bytes  71.4% aligned to 8 bytes  54.8% aligned to 16 bytes   2.6%
s2    aligned to 4 bytes  59.3% aligned to 8 bytes  43.8% aligned to 16 bytes   1.8%
s1-s2 aligned to 4 bytes  50.9% aligned to 8 bytes  39.0% aligned to 16 bytes  35.7%
n <= 0:   0.1% n <= 1:  34.4% n <= 2:  45.2% n <= 3:  47.9%  n <= 4:  59.3% n <= 8:  68.5% n <= 16:  99.1% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/bin/lsof

average size   9.4 calls       56 succeed  33.9% latencies  29.8  29.5
s1    aligned to 4 bytes  98.2% aligned to 8 bytes  98.2% aligned to 16 bytes  98.2%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  98.2% aligned to 8 bytes  98.2% aligned to 16 bytes  98.2%
n <= 0:   1.8% n <= 1:  30.4% n <= 2:  30.4% n <= 3:  30.4%  n <= 4:  37.5% n <= 8:  55.4% n <= 16:  78.6% n <= 32: 100.0% n <= 64: 100.0%
replaying find

average size   0.2 calls      297 succeed  96.3% latencies  -0.9  -9.5
s1    aligned to 4 bytes  26.6% aligned to 8 bytes  15.8% aligned to 16 bytes   9.8%
s2    aligned to 4 bytes  20.2% aligned to 8 bytes   0.3% aligned to 16 bytes   0.3%
s1-s2 aligned to 4 bytes  31.3% aligned to 8 bytes  16.2% aligned to 16 bytes   8.8%
n <= 0:  93.6% n <= 1:  97.0% n <= 2:  97.3% n <= 3:  97.6%  n <= 4:  98.3% n <= 8:  99.7% n <= 16:  99.7% n <= 32: 100.0% n <= 64: 100.0%
replaying pager

average size   0.8 calls      116 succeed  94.8% latencies -18.6 -18.6
s1    aligned to 4 bytes  93.1% aligned to 8 bytes  92.2% aligned to 16 bytes  91.4%
s2    aligned to 4 bytes   7.8% aligned to 8 bytes   7.8% aligned to 16 bytes   6.9%
s1-s2 aligned to 4 bytes   6.0% aligned to 8 bytes   5.2% aligned to 16 bytes   5.2%
n <= 0:  75.0% n <= 1:  86.2% n <= 2:  87.9% n <= 3:  89.7%  n <= 4:  94.0% n <= 8:  98.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man

average size   1.0 calls     1723 succeed  97.6% latencies   1.6 -13.4
s1    aligned to 4 bytes  37.8% aligned to 8 bytes  26.7% aligned to 16 bytes  19.4%
s2    aligned to 4 bytes  56.3% aligned to 8 bytes  47.8% aligned to 16 bytes  38.2%
s1-s2 aligned to 4 bytes  34.5% aligned to 8 bytes  23.6% aligned to 16 bytes  18.7%
n <= 0:  71.7% n <= 1:  92.9% n <= 2:  93.4% n <= 3:  93.6%  n <= 4:  93.9% n <= 8:  97.3% n <= 16:  98.8% n <= 32:  99.5% n <= 64: 100.0%
replaying troff

average size   1.3 calls   178664 succeed  94.4% latencies -63.4 -59.8
s1    aligned to 4 bytes  86.8% aligned to 8 bytes  84.8% aligned to 16 bytes  83.9%
s2    aligned to 4 bytes  27.7% aligned to 8 bytes  17.3% aligned to 16 bytes   9.8%
s1-s2 aligned to 4 bytes  27.1% aligned to 8 bytes  16.5% aligned to 16 bytes   9.2%
n <= 0:  57.9% n <= 1:  63.9% n <= 2:  78.9% n <= 3:  90.7%  n <= 4:  95.6% n <= 8:  97.3% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying grotty

average size   6.1 calls     5553 succeed  62.8% latencies -18.7 -31.6
s1    aligned to 4 bytes  99.0% aligned to 8 bytes  98.9% aligned to 16 bytes  98.9%
s2    aligned to 4 bytes  90.2% aligned to 8 bytes  89.6% aligned to 16 bytes  89.4%
s1-s2 aligned to 4 bytes  89.2% aligned to 8 bytes  88.6% aligned to 16 bytes  88.3%
n <= 0:  11.1% n <= 1:  16.4% n <= 2:  31.1% n <= 3:  49.3%  n <= 4:  55.3% n <= 8:  56.4% n <= 16:  98.4% n <= 32: 100.0% n <= 64: 100.0%
replaying groff

average size   0.2 calls      696 succeed  98.4% latencies  12.6  10.3
s1    aligned to 4 bytes  91.7% aligned to 8 bytes  90.9% aligned to 16 bytes  90.9%
s2    aligned to 4 bytes  33.5% aligned to 8 bytes  18.4% aligned to 16 bytes   9.1%
s1-s2 aligned to 4 bytes  25.7% aligned to 8 bytes   9.9% aligned to 16 bytes   0.6%
n <= 0:  88.8% n <= 1:  98.3% n <= 2:  99.1% n <= 3:  99.6%  n <= 4:  99.7% n <= 8:  99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying as

average size   6.7 calls     5198 succeed  36.7% latencies  24.5  20.4
s1    aligned to 4 bytes  28.9% aligned to 8 bytes  14.7% aligned to 16 bytes   7.4%
s2    aligned to 4 bytes  28.9% aligned to 8 bytes  14.7% aligned to 16 bytes   7.3%
s1-s2 aligned to 4 bytes  74.1% aligned to 8 bytes  67.9% aligned to 16 bytes  64.6%
n <= 0:   4.0% n <= 1:  10.4% n <= 2:  13.8% n <= 3:  18.6%  n <= 4:  25.0% n <= 8:  67.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

summary memcmp:

replaying ls

average size   0.4 calls     9641 succeed 100.0% latencies  -6.2  -7.0
s1    aligned to 4 bytes  27.2% aligned to 8 bytes  12.3% aligned to 16 bytes   2.5%
s2    aligned to 4 bytes  26.0% aligned to 8 bytes  15.6% aligned to 16 bytes   8.4%
s1-s2 aligned to 4 bytes  25.0% aligned to 8 bytes  12.6% aligned to 16 bytes   6.4%
n <= 0:  63.7% n <= 1:  97.1% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying awk

average size   0.5 calls      158 succeed  93.0% latencies   0.9   0.9
s1    aligned to 4 bytes  51.3% aligned to 8 bytes  46.8% aligned to 16 bytes  46.8%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  51.3% aligned to 8 bytes  46.8% aligned to 16 bytes  46.8%
n <= 0:  78.5% n <= 1:  89.9% n <= 2:  93.7% n <= 3:  96.2%  n <= 4:  97.5% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying mc

average size   0.4 calls     1942 succeed  98.5% latencies -199.0 -199.0
s1    aligned to 4 bytes  31.3% aligned to 8 bytes  21.9% aligned to 16 bytes  16.3%
s2    aligned to 4 bytes  28.7% aligned to 8 bytes  22.8% aligned to 16 bytes  14.8%
s1-s2 aligned to 4 bytes  28.8% aligned to 8 bytes  19.4% aligned to 16 bytes  14.4%
n <= 0:  79.2% n <= 1:  96.7% n <= 2:  96.7% n <= 3:  96.7%  n <= 4:  98.7% n <= 8:  99.4% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying mutt

average size   4.7 calls    29693 succeed 100.0% latencies -251.6 -253.5
s1    aligned to 4 bytes  99.8% aligned to 8 bytes   1.4% aligned to 16 bytes   1.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes  99.8% aligned to 16 bytes  99.8%
s1-s2 aligned to 4 bytes  99.8% aligned to 8 bytes   1.3% aligned to 16 bytes   1.3%
n <= 0:   8.7% n <= 1:   8.9% n <= 2:   8.9% n <= 3:   8.9%  n <= 4:   8.9% n <= 8:  98.9% n <= 16:  98.9% n <= 32: 100.0% n <= 64: 100.0%
replaying irb

average size   2.9 calls      306 succeed  88.2% latencies -109.3 -112.0
s1    aligned to 4 bytes  34.0% aligned to 8 bytes  19.0% aligned to 16 bytes  13.7%
s2    aligned to 4 bytes  82.4% aligned to 8 bytes  73.5% aligned to 16 bytes  35.9%
s1-s2 aligned to 4 bytes  34.6% aligned to 8 bytes  19.9% aligned to 16 bytes  12.4%
n <= 0:  67.3% n <= 1:  69.9% n <= 2:  80.4% n <= 3:  81.0%  n <= 4:  84.6% n <= 8:  87.9% n <= 16:  89.5% n <= 32:  99.3% n <= 64: 100.0%
replaying vim

average size   1.5 calls   467979 succeed  99.1% latencies 101.4  95.6
s1    aligned to 4 bytes  25.6% aligned to 8 bytes  15.6% aligned to 16 bytes  10.0%
s2    aligned to 4 bytes  59.5% aligned to 8 bytes  47.0% aligned to 16 bytes  46.3%
s1-s2 aligned to 4 bytes  20.4% aligned to 8 bytes   8.6% aligned to 16 bytes   3.6%
n <= 0:   6.7% n <= 1:  52.2% n <= 2:  94.6% n <= 3:  98.4%  n <= 4:  99.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make

average size   7.2 calls  1000000 succeed  99.5% latencies   1.3   1.4
s1    aligned to 4 bytes  19.2% aligned to 8 bytes  12.3% aligned to 16 bytes   8.4%
s2    aligned to 4 bytes  27.5% aligned to 8 bytes  15.8% aligned to 16 bytes   6.6%
s1-s2 aligned to 4 bytes  24.8% aligned to 8 bytes  12.2% aligned to 16 bytes   6.0%
n <= 0:  72.1% n <= 1:  75.0% n <= 2:  75.3% n <= 3:  75.3%  n <= 4:  75.3% n <= 8:  76.1% n <= 16:  76.6% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   4.4 calls     6108 succeed  34.0% latencies   0.0  10.5
s1    aligned to 4 bytes  27.7% aligned to 8 bytes   2.2% aligned to 16 bytes   1.5%
s2    aligned to 4 bytes  80.8% aligned to 8 bytes  79.2% aligned to 16 bytes  42.5%
s1-s2 aligned to 4 bytes  27.9% aligned to 8 bytes   3.3% aligned to 16 bytes   2.4%
n <= 0:  23.8% n <= 1:  26.5% n <= 2:  27.2% n <= 3:  27.4%  n <= 4:  52.5% n <= 8:  96.1% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying gcc

average size   0.0 calls    63189 succeed  99.9% latencies   1.6   1.7
s1    aligned to 4 bytes   3.4% aligned to 8 bytes   3.2% aligned to 16 bytes   3.1%
s2    aligned to 4 bytes  26.5% aligned to 8 bytes  11.9% aligned to 16 bytes   6.6%
s1-s2 aligned to 4 bytes  24.7% aligned to 8 bytes  13.2% aligned to 16 bytes   7.7%
n <= 0:  96.3% n <= 1:  99.7% n <= 2:  99.9% n <= 3:  99.9%  n <= 4:  99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying pager

average size   0.9 calls      118 succeed  56.8% latencies -18.2 -18.2
s1    aligned to 4 bytes  23.7% aligned to 8 bytes  15.3% aligned to 16 bytes   8.5%
s2    aligned to 4 bytes  21.2% aligned to 8 bytes  16.9% aligned to 16 bytes  13.6%
s1-s2 aligned to 4 bytes  30.5% aligned to 8 bytes  17.8% aligned to 16 bytes  11.9%
n <= 0:  54.2% n <= 1:  56.8% n <= 2:  98.3% n <= 3:  98.3%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man

average size  12.3 calls      119 succeed  49.6% latencies -16.9  -5.0
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:   0.8% n <= 1:  21.8% n <= 2:  21.8% n <= 3:  21.8%  n <= 4:  21.8% n <= 8:  50.4% n <= 16:  89.1% n <= 32:  89.1% n <= 64: 100.0%
replaying as

average size   5.3 calls     8968 succeed   2.1% latencies  16.0   4.8
s1    aligned to 4 bytes  42.8% aligned to 8 bytes  39.1% aligned to 16 bytes  38.4%
s2    aligned to 4 bytes  35.4% aligned to 8 bytes  23.9% aligned to 16 bytes  18.8%
s1-s2 aligned to 4 bytes  26.3% aligned to 8 bytes  13.1% aligned to 16 bytes   7.4%
n <= 0:   0.2% n <= 1:   0.3% n <= 2:   1.5% n <= 3:  12.7%  n <= 4:  47.8% n <= 8:  98.9% n <= 16:  99.6% n <= 32: 100.0% n <= 64: 100.0%

summary strcasecmp:


replaying mutt

average size   1.2 calls    53965 succeed 100.0% latencies -252.2 -251.1
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes  31.7% aligned to 8 bytes  20.8% aligned to 16 bytes  11.9%
s1-s2 aligned to 4 bytes  31.7% aligned to 8 bytes  20.8% aligned to 16 bytes  11.9%
n <= 0:  63.4% n <= 1:  65.3% n <= 2:  65.3% n <= 3:  88.7%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.581
replaying irb

average size   1.0 calls      693 succeed  94.5% latencies -97.4 -97.9
s1    aligned to 4 bytes  30.4% aligned to 8 bytes  11.4% aligned to 16 bytes   4.2%
s2    aligned to 4 bytes  29.1% aligned to 8 bytes  14.3% aligned to 16 bytes  10.2%
s1-s2 aligned to 4 bytes  27.4% aligned to 8 bytes  13.6% aligned to 16 bytes   5.9%
n <= 0:  84.6% n <= 1:  88.3% n <= 2:  89.0% n <= 3:  89.8%  n <= 4:  90.3% n <= 8:  93.8% n <= 16:  99.6% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying vim

average size   0.5 calls     2194 succeed  95.2% latencies -19.8  -9.9
s1    aligned to 4 bytes  92.7% aligned to 8 bytes  92.6% aligned to 16 bytes  91.7%
s2    aligned to 4 bytes  27.7% aligned to 8 bytes  10.9% aligned to 16 bytes   6.5%
s1-s2 aligned to 4 bytes  26.5% aligned to 8 bytes  10.2% aligned to 16 bytes   5.3%
n <= 0:  87.2% n <= 1:  90.6% n <= 2:  91.3% n <= 3:  94.5%  n <= 4:  97.4% n <= 8:  99.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.024
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   5.3 calls      108 succeed   4.6% latencies  31.5  -6.5
s1    aligned to 4 bytes   6.5% aligned to 8 bytes   5.6% aligned to 16 bytes   5.6%
s2    aligned to 4 bytes   1.9% aligned to 8 bytes   0.9% aligned to 16 bytes   0.9%
s1-s2 aligned to 4 bytes  93.5% aligned to 8 bytes  93.5% aligned to 16 bytes  93.5%
n <= 0:   0.9% n <= 1:   0.9% n <= 2:   0.9% n <= 3:   0.9%  n <= 4:   3.7% n <= 8:  95.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.028
replaying /usr/bin/lsof

average size   0.1 calls      181 succeed  98.9% latencies  32.2  36.8
s1    aligned to 4 bytes  20.4% aligned to 8 bytes  17.1% aligned to 16 bytes  17.1%
s2    aligned to 4 bytes  17.7% aligned to 8 bytes   0.6% aligned to 16 bytes   0.6%
s1-s2 aligned to 4 bytes  26.0% aligned to 8 bytes  12.7% aligned to 16 bytes   6.1%
n <= 0:  97.2% n <= 1:  99.4% n <= 2:  99.4% n <= 3:  99.4%  n <= 4:  99.4% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying man

average size   2.1 calls    70892 succeed 100.0% latencies -353.3 -355.8
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:  38.8% n <= 1:  63.4% n <= 2:  74.7% n <= 3:  81.3%  n <= 4:  86.7% n <= 8:  95.5% n <= 16:  98.4% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.063
replaying preconv

average size   0.6 calls       75 succeed  97.3% latencies -35.2  -6.9
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  96.0% aligned to 16 bytes  96.0%
s2    aligned to 4 bytes  38.7% aligned to 8 bytes  21.3% aligned to 16 bytes   9.3%
s1-s2 aligned to 4 bytes  37.3% aligned to 8 bytes  21.3% aligned to 16 bytes   9.3%
n <= 0:  84.0% n <= 1:  85.3% n <= 2:  85.3% n <= 3:  86.7%  n <= 4:  98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.453


summary strncasecmp:

replaying mutt

average size   0.5 calls   233025 succeed  95.9% latencies -260.3 -259.2
s1    aligned to 4 bytes  24.4% aligned to 8 bytes  23.6% aligned to 16 bytes   0.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes  49.2% aligned to 16 bytes  25.8%
s1-s2 aligned to 4 bytes  24.4% aligned to 8 bytes  13.2% aligned to 16 bytes   7.5%
n <= 0:  81.1% n <= 1:  85.7% n <= 2:  87.6% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying vim

average size   2.8 calls    10719 succeed  98.3% latencies -20.9 -20.2
s1    aligned to 4 bytes  30.3% aligned to 8 bytes  11.4% aligned to 16 bytes   8.1%
s2    aligned to 4 bytes  20.7% aligned to 8 bytes   5.0% aligned to 16 bytes   3.5%
s1-s2 aligned to 4 bytes  27.9% aligned to 8 bytes   8.1% aligned to 16 bytes   3.7%
n <= 0:  55.5% n <= 1:  57.6% n <= 2:  58.4% n <= 3:  71.2%  n <= 4:  72.6% n <= 8:  86.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.002
replaying man

average size   1.3 calls      167 succeed  91.0% latencies -17.1  22.7
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:  50.3% n <= 1:  64.1% n <= 2:  66.5% n <= 3:  89.8%  n <= 4:  98.8% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying as

average size   0.0 calls     3267 succeed 100.0% latencies   1.5   7.7
s1    aligned to 4 bytes  24.4% aligned to 8 bytes  12.3% aligned to 16 bytes   6.0%
s2    aligned to 4 bytes   0.1% aligned to 8 bytes   0.0% aligned to 16 bytes   0.0%
s1-s2 aligned to 4 bytes  25.3% aligned to 8 bytes  11.6% aligned to 16 bytes   6.0%
n <= 0:  99.9% n <= 1: 100.0% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000

