This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: rs6000.md/altivec.md problem in setting of vector registers
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: David Edelsohn <dje at makai dot watson dot ibm dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 23 Mar 2004 10:50:00 +0200
- Subject: Re: rs6000.md/altivec.md problem in setting of vector registers
I won't be able to dedicate much time at the moment to this rs6000 code
generation problem. I've included a test case that reproduces it, in case
anyone would like to look into it:
typedef int __attribute__((mode(V4SI))) v4si;
typedef int aint __attribute__ ((__aligned__(16)));
#define N 1024
typedef union {
  aint a[N];
  v4si pa[N/4];
} vec_union;
void
foo (short n){
  vec_union a;
  v4si va = {n,n,n,n};
  int i;
  for (i=0; i<N/4; i++){
    a.pa[i] = va;
  }
}
Below is the code that is being generated on powerpc and i386.
dorit
This is the code generated on powerpc, compiling with -O3 -floop-optimize2
-maltivec:
(relevant code marked with ">>";
4 scalar stores + 1 vector load, all invariant, all in the loop)
foo:
mfspr r5,256
oris r12,r5,0x8000
stw r5,-8(r1)
mtspr 256,r12
li r0,256
mflr r4
bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
stw r31,-4(r1)
mr r9,r3
mflr r31
mr r10,r3
stw r4,8(r1)
mr r11,r3
mr r12,r3
mtctr r0
addis r2,r31,ha16(L_a$non_lazy_ptr-"L00000000001$pb")
lwz r8,lo16(L_a$non_lazy_ptr-"L00000000001$pb")(r2)
li r2,0
L4:
addi r7,r1,-32
slwi r3,r2,4
>> stw r9,0(r7)
addi r2,r2,1
>> stw r10,4(r7)
>> stw r11,8(r7)
>> stw r12,12(r7)
>> lvx v0,0,r7
stvx v0,r3,r8
bdnz L4
lwz r8,-8(r1)
mtspr 256,r8
lwz r6,8(r1)
lwz r31,-4(r1)
mtlr r6
blr
This is the code generated on i386, compiling with -O3 -msse2:
(relevant code marked with ">>";
4 scalar stores out of the loop. 1 invariant vector load, in the loop)
foo:
pushl %ebp
xorl %edx, %edx
movl %esp, %ebp
subl $24, %esp
movswl 8(%ebp),%eax
>> movl %eax, -24(%ebp)
>> movl %eax, -20(%ebp)
>> movl %eax, -16(%ebp)
>> movl %eax, -12(%ebp)
movdqa -24(%ebp), %xmm0
.p2align 4,,15
.L5:
>> movl %edx, %ecx
incl %edx
sall $4, %ecx
movdqa %xmm0, a(%ecx)
cmpl $255, %edx
jle .L5
leave
ret
From: David Edelsohn <dje@makai.watson.ibm.com>
To: Dorit Naishlos/Haifa/IBM@IBMIL
cc: gcc@gcc.gnu.org
Subject: Re: rs6000.md/altivec.md problem in setting of vector registers
Date: 21/03/2004 00:18
>>>>> Dorit Naishlos writes:
Dorit> I focused on understanding what in the machine description explains the
Dorit> different ways Reload handles the same pattern ('set subreg') on the two
Dorit> platforms (i386/powerpc).
Altivec and SSE are integrated into their respective architectures
in different ways, so GCC's handling of one is not always appropriate for
the other. The vec_set and vec_extract patterns provide explicit control
over setting vector elements, so they are probably the best way to achieve
the optimal behavior.
David
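
As a source-level illustration of setting vector elements explicitly, here
is a minimal sketch. It is not code from this thread. It assumes a compiler
that supports subscripting vector_size types (newer GCC releases and Clang
do), and the names splat4 and fill are hypothetical. Whether GCC expands the
element stores through a target's vec_set pattern depends on the compiler
version and the machine description, so treat that part as an assumption
rather than a guarantee.

```c
#include <stddef.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: build {n, n, n, n} one element at a time by
   subscripting the vector, instead of initializing it through memory. */
static v4si splat4(int n)
{
    v4si v = {0, 0, 0, 0};
    for (int k = 0; k < 4; k++)
        v[k] = n;
    return v;
}

/* Hypothetical helper: store the same vector into `count` slots of `dst`.
   The vector is built once, before the loop, so only the vector store
   is left inside the loop at the source level. */
static void fill(v4si *dst, size_t count, short n)
{
    const v4si va = splat4(n);
    for (size_t i = 0; i < count; i++)
        dst[i] = va;
}
```

Calling fill(buf, 256, n) on a v4si buf[256] corresponds to the loop in the
test case above. The sketch only shows the source form; the generated code
still has to be checked per target and per set of options.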