This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



a C routine to optimize GCC for


Hello,

I've written a short routine for which my GCC 2.95 doesn't
make use of Pentium Pro (PRO) instructions.
Please ignore the variable names; the names of the globals
were picked at random.

What matters is the code and how it gets optimized into assembler!

  int *board,*sweep,*Pindex,*snelbord;

or something like:

  int board[64],sweep[20],Pindex[64],snelbord[64];

Whether these are declared as arrays or as pointers should
not matter for the optimization of 'tryalles'.

int tryalles(void){
   int ut,*va,ua,summation=0;

   for( ua = 0 ; ua < 16 ; ua++ ) {
     va = Pindex;
     ut = 0;
     if( !sweep[snelbord[ua]] )
       ut = board[ua];
     va += ut;
     summation += *va;
   }
   return(summation);
}
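To illustrate what "no jump" means for this loop, here is a minimal sketch that computes the same sum without a conditional branch in the source: the comparison result is turned into an all-ones or all-zeros mask and ANDed with `board[ua]`. The function name `tryalles_nobranch` is hypothetical, and the arrays use the sizes from the second declaration above. A compiler will usually turn the `== 0` into a `setcc`-style instruction rather than a jump, but check the generated assembler to be sure.

```c
/* Hypothetical branch-free variant of tryalles(). The array sizes are
 * taken from the message so that the sketch compiles on its own. */
int board[64], sweep[20], Pindex[64], snelbord[64];

int tryalles_nobranch(void)
{
    int ua, summation = 0;

    for (ua = 0; ua < 16; ua++) {
        /* take is 1 when sweep[snelbord[ua]] == 0, otherwise 0 */
        int take = (sweep[snelbord[ua]] == 0);
        /* -take is either -1 (all bits set) or 0, so the AND yields
         * board[ua] or 0 without a conditional jump */
        int ut = board[ua] & -take;
        /* same as va = Pindex; va += ut; summation += *va; */
        summation += Pindex[ut];
    }
    return summation;
}
```

The mask trick relies on two's complement arithmetic, where `-1` has all bits set; it trades the unpredictable branch for a few ALU instructions that are always executed.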


        .align 4
.globl tryalles
        .type    tryalles,@function
tryalles:
        pushl %ebp
        movl %esp,%ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        xorl %esi,%esi
        movl $sweep,%edi
        xorl %ecx,%ecx
        movl $15,%ebx
        .p2align 4,,7
.L6:
        movl snelbord(%ecx),%eax
        xorl %edx,%edx
        sall $2,%eax
        cmpl $0,(%eax,%edi)
        jne .L7                <== DUH??? wasting 5-7.5 clocks on average
        movl board(%ecx),%edx
.L7:
        addl Pindex(,%edx,4),%esi
        addl $4,%ecx
        decl %ebx
        jns .L6
        movl %esi,%eax
        popl %ebx
        popl %esi
        popl %edi
        movl %ebp,%esp
        popl %ebp
        ret
.Lfe1:
        .size    tryalles,.Lfe1-tryalles
        .align 4

A 10-15 clock penalty for a branch misprediction is major, while
the few extra instructions needed to avoid that penalty can execute
at a rate of 3 instructions per clock!
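The 5 - 7.5 clock figure in the annotated listing is exactly half of the 10-15 clock penalty, which is what one would expect if the branch is mispredicted about half the time (for example when `sweep[snelbord[ua]]` behaves close to randomly). Under that assumption, the expected cost for the 16 iterations of `tryalles` works out as:

```latex
\begin{aligned}
E[\text{penalty per iteration}] &= p_{\text{miss}} \times \text{penalty}
  = 0.5 \times [10,\ 15] = [5,\ 7.5]\ \text{clocks} \\
E[\text{penalty per call}] &= 16 \times [5,\ 7.5] = [80,\ 120]\ \text{clocks}
\end{aligned}
```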

I would like to zoom in on the body of the loop.

First, the C code:
     va = Pindex;
     ut = 0;
     if( !sweep[snelbord[ua]] )
       ut = board[ua];
     va += ut;
     summation += *va;

Now, here is how this is currently translated into 32-bit assembler:
.L6:
        movl snelbord(%ecx),%eax
        xorl %edx,%edx
        sall $2,%eax           <== what do we need this shift instruction for?
        cmpl $0,(%eax,%edi)             
        jne .L7                <== We don't want to have this JNE!
        movl board(%ecx),%edx
.L7:
        addl Pindex(,%edx,4),%esi
        addl $4,%ecx
        decl %ebx
        jns .L6
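On the shift question: `sweep` holds `int`s, so the index loaded from `snelbord` has to be turned into a byte offset by multiplying it by `sizeof(int)`, which is 4 on this target, and `sall $2` is that multiplication. The i386 addressing modes can scale an index by 1, 2, 4 or 8 at no extra cost, so the shift could in principle be folded into the compare as `cmpl $0,(%edi,%eax,4)`, just as the later `Pindex(,%edx,4)` operand already does. The small C sketch below only checks that byte-offset arithmetic; both helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper mirroring "sall $2,%eax": turn an int index into a
 * byte offset. Assumes sizeof(int) == 4, as on the 32-bit x86 target. */
size_t int_byte_offset(size_t index)
{
    return index << 2;             /* same as index * 4 */
}

/* Read element `index` of an int array through a byte offset, the way
 * "cmpl $0,(%eax,%edi)" addresses sweep[]: base plus pre-scaled index. */
int load_via_byte_offset(const int *base, size_t index)
{
    return *(const int *)((const char *)base + int_byte_offset(index));
}
```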
  
Looking at an example like this, I immediately see the benefit of having
more registers than Intel provides... However, I would like to replace
the following 3 lines
with Pentium Pro instructions. I don't care how many lines they get replaced with,
because in my program I'm suffering a huge penalty here in a lot of cases!

        cmpl $0,(%eax,%edi)
        jne .L7                <== We don't want to have this JNE!
        movl board(%ecx),%edx
.L7:

Is it really that hard to replace the above with Pentium Pro instructions?
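One source-level change that may help is to make the load of `board[ua]` unconditional. A conditional move (`cmovcc`, introduced with the Pentium Pro) performs its memory read whether or not the condition holds, so a compiler can only turn the `if` into one when it can be sure that the read cannot fault; doing both loads up front removes that doubt and leaves a plain select. The sketch below assumes the array sizes from the message, and `tryalles_select` is a hypothetical name. Whether a particular GCC version actually emits `cmov` for it depends on that version and on a Pentium Pro target being selected, so treat it as something to try rather than a guarantee.

```c
/* Hypothetical cmov-friendly variant of tryalles(). The array sizes are
 * taken from the message so that the sketch compiles on its own. */
int board[64], sweep[20], Pindex[64], snelbord[64];

int tryalles_select(void)
{
    int ua, summation = 0;

    for (ua = 0; ua < 16; ua++) {
        int candidate = board[ua];          /* always loaded, no longer guarded */
        int blocked = sweep[snelbord[ua]];  /* the old branch condition */
        int ut = blocked ? 0 : candidate;   /* pure select between two registers */
        summation += Pindex[ut];            /* same as *(Pindex + ut) */
    }
    return summation;
}
```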

Greetings,
Vincent




Vincent Diepeveen
diep@xs4all.nl

---
...and furthermore, I am of the opinion that Dap ought to be
beamed into outer space...

