RFA: another patch to fix PR61360

Gopalasubramanian, Ganesh Ganesh.Gopalasubramanian@amd.com
Wed Sep 24 11:36:00 GMT 2014


>The "r->x" alternative results in "vector" decoding on amdfam10. This is AMD-speak for microcoded instructions, and AMD optimization manual strongly recommends avoiding them. I have CC'd Ganesh, maybe he can provide more relevant data on the performance impact.

Thanks Uros!

Yes, the AMD Software Optimization Guide (SWOG) recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move
the data first from the source register to a temporary location in memory and then from memory into
the destination register
</snip>

This is listed as an optimization too. It holds for all amdfam10 and BD family processors.
I have to dig through the performance numbers and will try to get them.
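
As a rough illustration (not taken from the SWOG), the two sequences under discussion could be sketched in C with GCC-style inline assembly. The function names here are made up for this example, and it assumes an x86 target with SSE2 and the default AT&T assembler syntax. It only shows the instruction shapes; it says nothing about the actual cycle counts.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi128_si32 */
#include <stdint.h>

/* Hypothetical helper: direct GPR -> XMM move (the "r->x" alternative). */
static __m128i gpr_to_xmm_direct(int32_t v)
{
    __m128i out;
    __asm__ ("movd %1, %0" : "=x" (out) : "r" (v));
    return out;
}

/* Hypothetical helper: store the GPR to a temporary in memory, then
   load the XMM register from that memory location. */
static __m128i gpr_to_xmm_via_memory(int32_t v)
{
    int32_t tmp;   /* temporary memory location */
    __m128i out;
    __asm__ ("movl %2, %1\n\t"   /* GPR -> memory */
             "movd %1, %0"       /* memory -> XMM */
             : "=x" (out), "=m" (tmp)
             : "r" (v));
    return out;
}

/* Both sequences must leave the same value in the low 32 bits. */
static void check_moves_agree(int32_t v)
{
    assert(_mm_cvtsi128_si32(gpr_to_xmm_direct(v)) == v);
    assert(_mm_cvtsi128_si32(gpr_to_xmm_via_memory(v)) == v);
}
```

Calling check_moves_agree(42) confirms only that the two forms are functionally equivalent; which one is faster on a given processor is the question the performance numbers would have to answer.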

Regards
Ganesh

