RFA: another patch to fix PR61360
Wed Sep 24 11:36:00 GMT 2014
>The "r->x" alternative results in "vector" decoding on amdfam10. This is AMD-speak for microcoded instructions, and the AMD optimization manual strongly recommends avoiding them. I have CC'd Ganesh; maybe he can provide more relevant data on the performance impact.
Yes, the AMD Software Optimization Guide (SWOG) recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move
the data first from the source register to a temporary location in memory and then from memory into
the destination register.
The guide lists this as an optimization as well, and it holds for all amdfam10 and BD (Bulldozer) family processors.
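
To make the two forms concrete, here is a minimal sketch of the direct "r->x" move next to the store-then-load sequence the SWOG describes. The function names are hypothetical and this is not code from the patch under review; it assumes GCC-style inline asm in AT&T syntax on x86-64, and falls back to a plain memcpy elsewhere so that it still compiles. It only shows the instruction shapes and says nothing about the actual performance difference.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names, for illustration only. */

#if defined(__x86_64__) && defined(__GNUC__)
/* Direct GPR -> XMM move: the "r->x" form discussed above. */
static double gpr_to_xmm_direct(uint64_t bits)
{
    double x;
    __asm__ ("movq %1, %0" : "=x" (x) : "r" (bits));
    return x;
}

/* Store the GPR to a temporary memory slot, then load the slot into XMM. */
static double gpr_to_xmm_via_memory(uint64_t bits)
{
    uint64_t slot;   /* temporary location in memory */
    double x;
    __asm__ ("movq %2, %1\n\t"   /* GPR    -> memory */
             "movq %1, %0"       /* memory -> XMM    */
             : "=x" (x), "=m" (slot)
             : "r" (bits));
    return x;
}
#else
/* Portable fallback (assumption: same bit pattern, no particular insn). */
static double gpr_to_xmm_direct(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

static double gpr_to_xmm_via_memory(uint64_t bits)
{
    return gpr_to_xmm_direct(bits);
}
#endif

/* Read the raw bit pattern back so both forms can be compared. */
static uint64_t bits_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Both functions leave the same 64-bit pattern in the XMM register; the only difference is whether the value takes a detour through memory on the way.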
I still have to dig through the performance numbers; I will try to get them.