This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: rs6000.md/altivec.md problem in setting of vector registers

From: Dorit Naishlos <DORIT at il dot ibm dot com>
To: David Edelsohn <dje at makai dot watson dot ibm dot com>
Cc: gcc at gcc dot gnu dot org
Date: Tue, 23 Mar 2004 10:50:00 +0200
Subject: Re: rs6000.md/altivec.md problem in setting of vector registers

I won't be able to dedicate much time to this rs6000 code generation
problem at the moment. I've included a test case that exhibits it, in case
anyone would like to look into it:

typedef int __attribute__((mode(V4SI))) v4si;
typedef int aint __attribute__ ((__aligned__(16)));
#define N 1024
typedef union {
   aint a[N];
   v4si pa[N/4];
} vec_union;

void
foo (short n){
  vec_union a;
  v4si va = {n,n,n,n};
  int i;

  for (i=0; i<N/4; i++){
    a.pa[i] = va;
  }
}
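
For anyone who wants to confirm the expected result before looking at the
generated code, here is a minimal self-checking sketch of the same loop. It
is not the test case as compiled above: it assumes GCC's generic
vector_size attribute instead of mode(V4SI), puts the buffer at file scope
(the listings below appear to address `a` as a global symbol), and adds a
hypothetical helper, check_fill, that reads the data back through the int
view of the union.

```c
#include <assert.h>

#define N 1024

/* Four 32-bit ints in one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

typedef union {
    int  a[N];        /* scalar view */
    v4si pa[N / 4];   /* vector view of the same storage */
} vec_union;

/* File-scope buffer (assumption: the original `a` was effectively global). */
static vec_union buf;

/* Same loop as foo(): va is loop-invariant, so ideally it is built once. */
void fill(short n)
{
    v4si va = {n, n, n, n};
    int i;

    for (i = 0; i < N / 4; i++)
        buf.pa[i] = va;
}

/* Hypothetical helper: every scalar element must equal n after the fill. */
void check_fill(short n)
{
    int i;

    fill(n);
    for (i = 0; i < N; i++)
        assert(buf.a[i] == n);
}
```

Calling check_fill(5) should pass on any target; the interesting part is
still where the compiler places the four scalar stores and the vector load.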

Below is the code that is being generated on powerpc and i386.

dorit


This is the code generated on powerpc, compiling with -O3 -floop-optimize2
-maltivec:
(relevant code marked with ">>";
4 scalar stores + 1 vector load, all invariant, all in the loop)

foo:
        mfspr r5,256
        oris r12,r5,0x8000
        stw r5,-8(r1)
        mtspr 256,r12
        li r0,256
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        mr r9,r3
        mflr r31
        mr r10,r3
        stw r4,8(r1)
        mr r11,r3
        mr r12,r3
        mtctr r0
        addis r2,r31,ha16(L_a$non_lazy_ptr-"L00000000001$pb")
        lwz r8,lo16(L_a$non_lazy_ptr-"L00000000001$pb")(r2)
        li r2,0
L4:
        addi r7,r1,-32
        slwi r3,r2,4
>>      stw r9,0(r7)
        addi r2,r2,1
>>      stw r10,4(r7)
>>      stw r11,8(r7)
>>      stw r12,12(r7)
>>      lvx v0,0,r7
        stvx v0,r3,r8
        bdnz L4

        lwz r8,-8(r1)
        mtspr 256,r8
        lwz r6,8(r1)
        lwz r31,-4(r1)
        mtlr r6
        blr


This is the code generated on i386, compiling with -O3 -msse2:
(relevant code marked with ">>";
4 scalar stores out of the loop. 1 invariant vector load, in the loop)

foo:
        pushl   %ebp
        xorl    %edx, %edx
        movl    %esp, %ebp
        subl    $24, %esp
        movswl  8(%ebp),%eax
>>      movl    %eax, -24(%ebp)
>>      movl    %eax, -20(%ebp)
>>      movl    %eax, -16(%ebp)
>>      movl    %eax, -12(%ebp)
        movdqa  -24(%ebp), %xmm0
        .p2align 4,,15
.L5:
>>      movl    %edx, %ecx
        incl    %edx
        sall    $4, %ecx
        movdqa  %xmm0, a(%ecx)
        cmpl    $255, %edx
        jle     .L5

        leave
        ret




                                                                                                                                   
David Edelsohn <dje@makai.watson.ibm.com>
21/03/2004 00:18

To:       Dorit Naishlos/Haifa/IBM@IBMIL
cc:       gcc@gcc.gnu.org
Subject:  Re: rs6000.md/altivec.md problem in setting of vector registers
                                                                                                                                   




>>>>> Dorit Naishlos writes:

Dorit> I focused on understanding what in the machine description explains the
Dorit> different ways Reload handles the same pattern ('set subreg') on the two
Dorit> platforms (i386/powerpc).

             Altivec and SSE are integrated into their respective
architectures in different ways, so what GCC does for one is not always
appropriate for the other. The vec_set and vec_extract patterns provide
explicit control over setting vector elements, so they are probably the
best way to achieve the optimal behavior.

David
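
At the source level, the closest analogue to element-wise set and extract
is subscripting a vector type. The sketch below is only an illustration,
not the machine-description change discussed in this thread: it assumes a
GCC version that accepts subscripts on vector types in C, and whether each
access is actually expanded through a target's vec_set or vec_extract
pattern depends on the back end. The names splat and lane_of are
hypothetical.

```c
/* Four 32-bit ints in one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Build {n,n,n,n} by writing each lane in turn (element insert). */
void splat(v4si *out, int n)
{
    v4si v = {0, 0, 0, 0};
    int lane;

    for (lane = 0; lane < 4; lane++)
        v[lane] = n;
    *out = v;
}

/* Read one lane back (element extract). */
int lane_of(const v4si *v, int lane)
{
    return (*v)[lane];
}
```

Building the vector this way once, before the loop, keeps the element
writes out of the loop body by construction, regardless of what the
optimizer hoists.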

References:
- Re: rs6000.md/altivec.md problem in setting of vector registers
  - From: David Edelsohn
