This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, i386]: Fix PR target/30970, take 2
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Ian Lance Taylor <iant at google dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>
- Date: Tue, 27 Feb 2007 20:02:07 +0100
- Subject: Re: [PATCH, i386]: Fix PR target/30970, take 2
- References: <45E42C83.1000602@gmail.com> <m3bqjfd47d.fsf@localhost.localdomain>
Ian Lance Taylor wrote:
> > The addition to this patch is corrected MODES_TIEABLE_P functionality
> > for i386 targets. The problem is that the lower-subreg pass checks
> > MODES_TIEABLE_P to decide whether an XMM register can be split into
> > word_mode (DImode) pieces _without_copying_.
>
> Anyhow, your patch seems OK to me, though I wonder whether you will
> get worse register allocation in some cases for code which extracts
> the first float from a vector of floats. Of course those cases are
> not very common.
Extracting a float from a vector of floats, or an int from a vector of
ints, without going through memory has never worked for i386. One
possible reason for the float case is that movss reg,reg doesn't clear
the top 3 elements, but _mm_store_ss() simply ignores this, as shown in
the following testcase:
--cut here--
#include <xmmintrin.h>

float test (__m128 x)
{
  float a;

  _mm_store_ss (&a, x);
  return a + 1.0;
}
--cut here--
compiles to:
test:
.LFB509:
        addss   .LC0(%rip), %xmm0
        ret
For the int case, we could use movd, but gcc doesn't generate it, with
or without the patch.
This is the testcase:
--cut here--
typedef float __v4sf __attribute__ ((vector_size (16)));

float testf(__v4sf x, __v4sf y)
{
  union {
    __v4sf v;
    float f[4];
  } u;

  u.v = x + y;
  return u.f[0];
}

typedef int __v4si __attribute__ ((vector_size (16)));

int testi(__v4si x, __v4si y)
{
  union {
    __v4si v;
    int i[4];
  } u;

  u.v = x + y;
  return u.i[0];
}
--cut here--
The result:
testf:
.LFB2:
        addps   %xmm1, %xmm0
        movaps  %xmm0, -24(%rsp)
        movss   -24(%rsp), %xmm0
        ret
testi:
.LFB3:
        paddd   %xmm1, %xmm0
        movaps  %xmm0, -24(%rsp)
        movl    -24(%rsp), %eax
        ret
(Using the _mm_store_ss() intrinsic always produces equivalent code.)
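For comparison, here is a rough sketch of how the same extraction could be written at the source level without an explicit store, assuming the installed headers provide the SSE/SSE2 conversion intrinsics _mm_cvtss_f32() and _mm_cvtsi128_si32(). The helper names first_float and first_int are made up for illustration, and 32-bit builds would need -msse2:

```c
#include <xmmintrin.h>  /* SSE:  __m128,  _mm_cvtss_f32 */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi128_si32 */

/* Hypothetical helper: return element 0 of a vector of four floats.  */
static float first_float (__m128 v)
{
  return _mm_cvtss_f32 (v);
}

/* Hypothetical helper: return element 0 of a vector of four ints.
   _mm_cvtsi128_si32 is the intrinsic that corresponds to movd.  */
static int first_int (__m128i v)
{
  return _mm_cvtsi128_si32 (v);
}
```

Whether the compiler turns these into a plain register move or a movd, rather than a store and reload through the stack, depends on the GCC version and on how MODES_TIEABLE_P is defined, so this only illustrates the intent and not the generated code.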
To fix this, MODES_TIEABLE_P would need to be revisited. I'll open a
bug report for this enhancement.
Uros.