This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/31750] Suboptimal builtin_memset on x86 with SSE
- From: "jb at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 30 Apr 2010 18:02:01 -0000
- Subject: [Bug middle-end/31750] Suboptimal builtin_memset on x86 with SSE
- References: <bug-31750-11659@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #4 from jb at gcc dot gnu dot org 2010-04-30 18:02 -------
Some more experimentation on different hardware shows that the relative
performance of "rep stos" vs. a loop depends heavily on the size of the object
being set, on the optimization options (loop unrolling etc.), and presumably on
the hardware as well. The nice thing about "rep stos" is that it is at least
short, and in principle hardware manufacturers could in future tune the
microcode to provide an optimal implementation.
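For anyone who wants to repeat this kind of comparison, here is a minimal
sketch of the two variants and a crude timer. It is not the code used for the
measurements above. The names fill_loop, fill_rep_stos and time_fill are
made up for illustration, the asm assumes GCC-style extended asm on x86 (with
a memset() fallback elsewhere), and clock() is far too coarse for a real
benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: plain byte loop. Depending on the options, the
   compiler may unroll it, vectorize it, or turn it into a memset() call. */
static void fill_loop(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Hypothetical helper: "rep stosb" via GCC extended asm on x86.
   Assumes the direction flag is clear, as the usual ABIs require.
   Falls back to memset() on other compilers or targets. */
static void fill_rep_stos(unsigned char *dst, unsigned char value, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("rep stosb"
                      : "+D" (dst), "+c" (n)   /* edi/rdi = dest, ecx/rcx = count */
                      : "a" (value)            /* al = byte to store */
                      : "memory");
#else
    memset(dst, value, n);
#endif
}

typedef void (*fill_fn)(unsigned char *, unsigned char, size_t);

/* CPU seconds spent on `reps` fills of `n` bytes, measured with clock().
   The fill value changes on every repetition so the stores differ. */
static double time_fill(fill_fn fn, unsigned char *buf, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(buf, (unsigned char)r, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill(fill_loop, buf, n, reps) and
time_fill(fill_rep_stos, buf, n, reps) over a range of n, and rebuilding with
different options (-O2, -O3, -funroll-loops), is where the size and option
dependence would be expected to show up. Note that GCC may recognize the loop
as a memset idiom; -fno-tree-loop-distribute-patterns should prevent that.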
I have no time to set up the comprehensive benchmark that would be required
before changing the current implementation (and, given the importance of
memset(), others have presumably already done so), so I am closing this as
WONTFIX.
--
jb at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WONTFIX
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31750