This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/55147] x86: wrong code for 64-bit load
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 31 Oct 2012 16:07:11 +0000
- Subject: [Bug target/55147] x86: wrong code for 64-bit load
- Auto-submitted: auto-generated
- References: <bug-55147-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55147
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-10-31 16:07:11 UTC ---
For the testcase from this PR it actually creates better assembly than with
the #c1 patch (without that patch the code is both longer and wrong). That is
because when bswapdi is split too late, nothing takes advantage of the fact
that only 32 bits of the result are used.
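To illustrate the "only 32 bits of the result are used" point, here is a minimal sketch. It is not the testcase from this PR, and the function names are made up for illustration. It only shows the identity that an early split can exploit: the low 32 bits of a 64-bit byte swap equal a 32-bit byte swap of the high half of the input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example, not the PR testcase: only the low 32 bits of
   the 64-bit byte swap are kept.  */
uint32_t
low_half_of_bswap64 (uint64_t x)
{
  return (uint32_t) __builtin_bswap64 (x);
}

/* Equivalent form: byte-swap just the high 32 bits of the input.
   If bswapdi is split into two 32-bit swaps early enough, the swap
   of the other half becomes dead and can be removed.  */
uint32_t
bswap32_of_high_half (uint64_t x)
{
  return __builtin_bswap32 ((uint32_t) (x >> 32));
}

/* Small self-check that the two forms agree on a few inputs.  */
void
self_check (void)
{
  uint64_t v[] = { 0, 1, 0x0102030405060708ULL, 0xffffffff00000000ULL };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    assert (low_half_of_bswap64 (v[i]) == bswap32_of_high_half (v[i]));
}
```

Whether the compiler actually reduces the first form to the second depends on when the 64-bit swap is split, which is the issue described above.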
For
unsigned long long
f1 (unsigned long long *p, int i)
{
  return __builtin_bswap64 (p[i]);
}

unsigned long long
f2 (unsigned long long p)
{
  return __builtin_bswap64 (p);
}

void
f3 (unsigned long long *p, int i, unsigned long long q)
{
  p[i] = __builtin_bswap64 (q);
}

void
f4 (unsigned long long *p, int i, unsigned long long *q)
{
  p[i] = __builtin_bswap64 (q[i]);
}
it creates the same number of insns of the same quality (just slightly
different RA decisions/scheduling) for f1-f3, but for f4 it creates slightly
worse code without bswapdi2: with bswapdi2 f4 needs just one call-saved
register, without it two, presumably because both bswap insns are scheduled
together.