[Bug target/38824] [4.4 regression] performance regression of sse code from 4.2/4.3

hubicka at gcc dot gnu dot org gcc-bugzilla@gcc.gnu.org
Thu Jan 15 00:31:00 GMT 2009



------- Comment #5 from hubicka at gcc dot gnu dot org  2009-01-15 00:30 -------
Created an attachment (id=17106)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17106&action=view)
Proposed patch

The patch makes GCC generate a movaps load followed by addps. On Core 2 it
speeds up the testcase from 7s to 6.2s, so it appears to work as expected.
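As a rough illustration only, here is a hypothetical loop (not the testcase from this PR; `sum4` is an invented name). The comments show the two instruction sequences GCC could emit for its inner add. Exact register names depend on the target ABI, and whether GCC folds the load into addps is decided by the compiler, not by the C source. The sketch assumes an x86 target with SSE and 16-byte-aligned input.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical example: sums n floats four at a time.
   Assumes a is 16-byte aligned and n is a multiple of 4.
   GCC may compile the add as a folded load-execute op:
       addps  (%rdi), %xmm0
   or, as with the proposed patch, as a separate load plus add:
       movaps (%rdi), %xmm1
       addps  %xmm1, %xmm0
   Both sequences compute the same result. */
static float sum4(const float *a, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(a + i); /* aligned load; may be folded into addps */
        acc = _mm_add_ps(acc, v);      /* packed single-precision add */
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);         /* spill the four partial sums */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Compiling such a loop with `gcc -O2 -S` and inspecting the output is one way to see which form a given GCC version picks.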

The same speedup does not reproduce on an AMD machine, however, and I am not
sure whether it is just a coincidence here or whether Core 2 really prefers
split load and execute SSE operations (the manual does not recommend splitting
them).

H.J., perhaps you have some advice here? Or could we at least do some
benchmarking?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
