This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Slow memcmp for aligned strings on Pentium 3


Kevin Atkinson writes:
 > On Fri, 4 Apr 2003, Jerry Quinn wrote:
 > 
 > > I just tried the same benchmark on a Pentium 4 out of curiosity.  Slightly
 > > different results:
 > > 
 > > Memory compare int:
 > >   10000
 > >   130000
 > >   Speed up: 0.076923
 > > Memory compare 15 bytes:
 > >   10000
 > >   370000
 > >   Speed up: 0.027027
 > > Memory compare 16 bytes:
 > >   20000
 > >   330000
 > >   Speed up: 0.060606
 > > Memory compare 64 bytes:
 > >   10000
 > >   1040000
 > >   Speed up: 0.009615
 > > Memory compare 256 bytes:
 > >   20000
 > >   2300000
 > >   Speed up: 0.008696
 > > 
 > > Perhaps this is to be expected since the routine uses shifts.
 > 
 > The shifts are only used when the size is not divisible by 4.  It seems
 > that on the Pentium 4 cmps is the way to go.  You might also want to
 > increase the number of loop iterations to get more meaningful results,
 > given the limited precision of clock().
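
For readers following along, here is a minimal sketch of the kind of clock()-based timing loop under discussion. It is not the cmps.c posted in this thread. The names compare_fn, bytewise_compare, time_compare and speed_up are hypothetical, and the byte-wise routine is only a stand-in candidate. The reported timings are all multiples of 10000 ticks, which suggests a coarse clock() granularity, so short runs may round to a handful of ticks and more iterations help.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcmp and any candidate replacement. */
typedef int (*compare_fn)(const void *, const void *, size_t);

/* Hypothetical stand-in candidate, NOT the routine from this thread:
   a plain byte-at-a-time comparison. */
static int bytewise_compare(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

/* Call cmp `iterations` times and return the elapsed clock() ticks.
   The call goes through a volatile function pointer and the result is
   stored to a volatile, so the compiler cannot inline, hoist, or drop
   the calls. */
static clock_t time_compare(compare_fn cmp, const void *a, const void *b,
                            size_t n, long iterations)
{
    compare_fn volatile call = cmp;
    volatile int sink = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink = call(a, b, n);
    (void)sink;
    return clock() - start;
}

/* Ratio in the same form as the "Speed up" lines: first timing divided
   by second timing. Returns 0.0 if the second timing is zero, i.e. below
   the resolution of clock(). */
static double speed_up(clock_t first, clock_t second)
{
    return second > 0 ? (double)first / (double)second : 0.0;
}
```

Under these assumptions, one would time memcmp and the candidate on the same buffers with the same iteration count, e.g. time_compare(memcmp, a, b, 16, iterations) and time_compare(bytewise_compare, a, b, 16, iterations), then pass the two tick counts to speed_up. With the first P4 figures quoted above, speed_up(10000, 130000) gives about 0.076923, so a value below 1 means the second routine took longer.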

Adding iterations didn't change the relative scores significantly.  It
still loses big on P4.  It also loses big on Athlon.  Here are Athlon
results using the later version you posted with 10x iterations:

jlquinn@smaug:~/gcc/test$ gcc3.3 -O3 -fomit-frame-pointer -march=athlon cmps.c
jlquinn@smaug:~/gcc/test$ ./a.out
Memory compare 15 bytes:
  310000
  5810000
  Speed up: 0.053356
Memory compare 16 bytes:
  300000
  5290000
  Speed up: 0.056711
Memory compare 64 bytes:
  460000
  13770000
  Speed up: 0.033406
Memory compare 256 bytes:
  470000

Jerry Quinn

