This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] i386 memcmp implementation and gcc builtin


Here is a revised version; thanks to everybody for the quick feedback!

I am now using EAX to hold the decreasing loop counter, and checking for <16
before checking for =0, as Jakub suggested.  It could be interesting to check
whether working on -EAX and using indexed addressing modes is worthwhile (I'm
not sure, because you'd lose the ability to decrement the loop counter).
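
To make the control flow concrete, here is a rough C sketch of the ordering
described above: one decreasing counter, tested against 16 first and against
zero only afterwards.  It is not a translation of memcmp.S (which is not
reproduced here); the name sketch_memcmp, the variable left, and the
byte-by-byte inner loop are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical C rendering of the loop structure; the real code is assembly. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    size_t left = n;    /* plays the role of the decreasing counter in EAX */

    /* Test "fewer than 16 left" first: this also rejects left == 0,
       so the common long-buffer path needs a single comparison. */
    while (left >= 16) {
        for (int i = 0; i < 16; i++) {
            if (p[i] != q[i])
                return p[i] < q[i] ? -1 : 1;
        }
        p += 16;
        q += 16;
        left -= 16;
    }

    /* Only the short tail is tested against zero. */
    while (left != 0) {
        if (*p != *q)
            return *p < *q ? -1 : 1;
        p++;
        q++;
        left--;
    }
    return 0;
}
```

With a counter that runs upward from -n and indexed addressing, the pointer
increments would disappear, but the loop could no longer simply count down,
which is the trade-off mentioned above.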

I ran the tests with 100 repetitions, since on machines only slightly newer
than mine a single repetition is far too short to measure anything; the
number is customizable.  A single program does both testing and benchmarking
(to avoid clashes my implementation is now called my_memcmp); compile it
with -fno-builtin of course!  As Roger Sayle suggested, I'm taking user time
instead of wall time (though my machine is so lightly loaded that I cannot
see the difference).
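
For readers without the attachment, this is one way user time (rather than
wall time) can be measured around a repeated call.  It is only a sketch under
the assumption of a POSIX system with getrusage(); test.c may well do it
differently, and the names user_seconds, cmp_fn and time_compare are
hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User-mode CPU time consumed by this process so far, in seconds. */
static double user_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Any function with the memcmp signature can be timed. */
typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Call fn `repetitions` times and return the user time that elapsed. */
double time_compare(cmp_fn fn, const void *a, const void *b,
                    size_t n, int repetitions)
{
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    double start = user_seconds();

    for (int r = 0; r < repetitions; r++)
        sink += fn(a, b, n);

    (void)sink;
    return user_seconds() - start;
}
```

A call such as time_compare(memcmp, buf1, buf2, len, 100) would then time the
libc version; passing my_memcmp instead times the custom one.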

Note that I'm using -mcpu=i686 -march=i686 (Andreas Jaeger suggested that, but
since I built my GCC myself it is already the default).

Results:

utente@engineer:~/esperimenti$ gcc -O3 -fno-builtin test.c memcmp.S
utente@engineer:~/esperimenti$  ./a.out
100 repetitions

Testing custom memcmp ..... done,      8.130 seconds
Testing libc memcmp ....... done,     10.250 seconds
Testing builtin memcmp .... done,      9.620 seconds
Testing loop overhead ..... done,      5.580 seconds

(CCed to gcc people because of the dismaying __builtin_memcmp performance on
the PII).

Paolo

Attachment: memcmp.S
Description: Binary data

Attachment: test.c
Description: Text document

