[Bug middle-end/31750] Suboptimal builtin_memset on x86 with SSE

jb at gcc dot gnu dot org gcc-bugzilla@gcc.gnu.org
Fri Apr 30 18:02:00 GMT 2010



------- Comment #4 from jb at gcc dot gnu dot org  2010-04-30 18:02 -------
Some more experimentation on different hardware reveals that the relative
performance of "rep stos" vs. a loop depends heavily on the size of the object
to set, on the optimization options (loop unrolling etc.), and presumably on
the hardware as well. The nice thing about "rep stos" is that it is at least
short, and in principle hw manufacturers could in the future tune the
microcode to provide an optimal implementation.
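
For anyone who wants to repeat this kind of comparison, here is a minimal
sketch of the two variants plus a crude timer. It is not the benchmark I ran;
the names fill_loop, fill_rep_stos and time_fill are made up for illustration,
and the inline asm assumes a GCC-compatible compiler on x86 (elsewhere it
falls back to the plain loop).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Plain byte loop. Note that an optimizing compiler may unroll or
 * vectorize it, or even replace it with a call to memset. */
void fill_loop(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* "rep stosb": store AL to [DI], CX times. Assumes the direction flag is
 * clear, as the usual x86 ABIs guarantee at function boundaries. */
void fill_rep_stos(unsigned char *dst, unsigned char value, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
#else
    fill_loop(dst, value, n);   /* hypothetical fallback for non-x86 */
#endif
}

typedef void (*fill_fn)(unsigned char *, unsigned char, size_t);

/* Crude CPU-time measurement of `reps` fills of `n` bytes, in seconds. */
double time_fill(fill_fn fn, unsigned char *buf, size_t n, int reps)
{
    assert(reps > 0 && n > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(buf, (unsigned char)r, n);
    clock_t end = clock();
    /* Read the buffer back so the stores are not trivially dead. */
    volatile unsigned char sink = buf[n - 1];
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill() with both functions over a range of sizes (say, from a few
bytes up to several MB) and at different optimization levels should be enough
to see the dependence on size and options described above. A serious benchmark
would also need to control for alignment, cache state and timer resolution.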

As I have no time to set up the comprehensive benchmark that would be required
before changing the current implementation (and presumably, given the
importance of memset(), others have already done this), I am closing this as
WONTFIX.


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WONTFIX


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31750


