[Bug target/43052] Inline memcmp is *much* slower than glibc's

hubicka at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jul 4 10:50:00 GMT 2011


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-07-04 10:49:18 UTC ---
Created attachment 24670
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24670
memcpy/memset testing script

HJ,
can you please run the attached script with the new glibc as
sh test_stringop 64 640000000 gcc -march=native | tee out

In my quick testing with glibc 2.11 on Core i5 and AMD machines, inline
memcpy/memset is still a win on the i5 for all block sizes (our optimization
table is, however, wrong, since it is inherited from the generic one). For
blocks of 512 bytes and above, though, the inline code is only about as fast
as the glibc code and obviously longer.
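
For reference, a minimal harness of the kind the attached script presumably
drives could look like the sketch below (illustrative only -- the file name,
block sizes and iteration counts are my assumptions, not the script's
contents). Building it once with -mstringop-strategy=libcall and once with
-minline-all-stringops gives the two code paths to compare per block size;
since the size is not a compile-time constant here, the comparison is only
approximate.

/* bench.c -- illustrative sketch only; sizes and iteration counts are
   arbitrary.  Build e.g. (add -lrt on older glibc for clock_gettime):
     gcc -O2 -march=native -mstringop-strategy=libcall bench.c -o bench-libcall
     gcc -O2 -march=native -minline-all-stringops bench.c -o bench-inline  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int
main (void)
{
  size_t sizes[] = { 16, 64, 256, 512, 1024, 8192, 65536 };
  char *src = malloc (1 << 20), *dst = malloc (1 << 20);
  unsigned i;

  memset (src, 1, 1 << 20);
  for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
    {
      size_t n = sizes[i];
      long j, iters = (64L << 20) / n;   /* roughly constant byte count */
      struct timespec t0, t1;
      double sec;

      clock_gettime (CLOCK_MONOTONIC, &t0);
      for (j = 0; j < iters; j++)
        {
          memcpy (dst, src, n);
          /* Keep the copy from being hoisted or eliminated as dead.  */
          asm volatile ("" : : "r" (dst), "r" (src) : "memory");
        }
      clock_gettime (CLOCK_MONOTONIC, &t1);

      sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
      printf ("%8zu bytes: %6.2f GB/s\n", n, iters * n / sec / 1e9);
    }
  free (src);
  free (dst);
  return 0;
}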

On the AMD machine, a libcall is a win for blocks of 1k to 8k. For large
blocks, inline again seems to be a win, for whatever reason; probably the
prefetch logic is wrong in the older glibc.

If glibc's string operations have finally been made sane, we ought to revisit
the tables from which we generate the inline versions.
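
For context, the tables in question are the stringop_algs entries in the i386
backend's processor cost structures. The sketch below shows their shape only;
the struct follows the idea of stringop_algs in the i386 backend, but the enum
is a subset of the real algorithms and the threshold values are made up, not
the actual generic tuning.

/* Sketch (not verbatim) of the shape of the tables that drive inline
   expansion of string operations in the i386 backend.  */
enum stringop_alg { libcall, loop, rep_prefix_4_byte, rep_prefix_8_byte };

struct stringop_algs
{
  enum stringop_alg unknown_size;      /* strategy when size is unknown   */
  struct stringop_strategy
  {
    int max;                           /* upper block-size bound (-1 = any) */
    enum stringop_alg alg;             /* expansion to use up to that size  */
  } size[4];
};

/* One hypothetical memcpy table: 32-bit variant first, 64-bit second.  */
static const struct stringop_algs example_memcpy[2] = {
  {libcall, {{32, loop}, {8192, rep_prefix_4_byte}, {-1, libcall}}},
  {libcall, {{32, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}};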


