This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: RFA: another patch to fix PR61360
- From: "Gopalasubramanian, Ganesh" <Ganesh dot Gopalasubramanian at amd dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>, Vladimir Makarov <vmakarov at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <rdsandiford at googlemail dot com>
- Date: Wed, 24 Sep 2014 11:36:41 +0000
- Subject: RE: RFA: another patch to fix PR61360
> The "r->x" alternative results in "vector" decoding on amdfam10. This is AMD-speak for microcoded instructions, and AMD optimization manual strongly recommends avoiding them. I have CC'd Ganesh, maybe he can provide more relevant data on the performance impact.
Thanks Uros!
Yes, the AMD SWOG recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move
the data first from the source register to a temporary location in memory and then from memory into
the destination register
</snip>
This is listed as an optimization too, and it holds for all amdfam10 and BD family processors.
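
To make the two routes concrete, here is a minimal sketch using SSE2 intrinsics. The function names are hypothetical and are not taken from the patch or from the SWOG. Which instructions the compiler actually emits depends on the tuning in effect (for example -mtune=amdfam10), so the comments describe the intent only; check the generated assembly before drawing conclusions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Direct GPR -> XMM move. This is the kind of move the "r->x"
   alternative covers (a movd from a general register). */
static __m128i gpr_to_xmm_direct(int32_t v)
{
    return _mm_cvtsi32_si128(v);
}

/* Store/load route in the spirit of the SWOG text: write the value to
   a temporary in memory, then bring it into the XMM register from
   there.  The volatile qualifier is an assumption of this sketch; it
   is only here to force the store and the reload to happen. */
static __m128i gpr_to_xmm_via_memory(int32_t v)
{
    volatile int32_t tmp = v;       /* store: GPR -> memory */
    return _mm_cvtsi32_si128(tmp);  /* reload from memory into XMM */
}

/* Helper to read back the low 32 bits of an XMM value. */
static int32_t xmm_low32(__m128i x)
{
    return _mm_cvtsi128_si32(x);
}
```

Both functions produce the same value in the low 32 bits of the result; only the path the data takes differs.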
I have to dig through the performance numbers; I will try to get them.
Regards
Ganesh