This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
a C routine to optimize GCC for
- To: gcc at gcc dot gnu dot org
- Subject: a C routine to optimize GCC for
- From: Vincent Diepeveen <diep at xs4all dot nl>
- Date: Mon, 30 Aug 1999 01:53:52 +0100
Hello,
I've written a short routine for which my gcc 2.95 doesn't
make use of Pentium Pro instructions.
Please ignore the variable names; they were picked
randomly for the globals.
It's about the code and how it gets optimized to assembler!
int *board,*sweep,*Pindex,*snelbord;
or something like:
int board[64],sweep[20],Pindex[64],snelbord[64];
The above definition of arrays/pointers should
not matter for the optimization of 'tryalles'.
int tryalles(void){
    int ut,*va,ua,summation=0;
    for( ua = 0 ; ua < 16 ; ua++ ) {
        va = Pindex;
        ut = 0;
        if( !sweep[snelbord[ua]] )
            ut = board[ua];
        va += ut;
        summation += *va;
    }
    return(summation);
}
    .align 4
.globl tryalles
    .type tryalles,@function
tryalles:
    pushl %ebp
    movl %esp,%ebp
    pushl %edi
    pushl %esi
    pushl %ebx
    xorl %esi,%esi
    movl $sweep,%edi
    xorl %ecx,%ecx
    movl $15,%ebx
    .p2align 4,,7
.L6:
    movl snelbord(%ecx),%eax
    xorl %edx,%edx
    sall $2,%eax
    cmpl $0,(%eax,%edi)
    jne .L7                       <== DUH??? wasting on average 5 - 7.5 clocks
    movl board(%ecx),%edx
.L7:
    addl Pindex(,%edx,4),%esi
    addl $4,%ecx
    decl %ebx
    jns .L6
    movl %esi,%eax
    popl %ebx
    popl %esi
    popl %edi
    movl %ebp,%esp
    popl %ebp
    ret
.Lfe1:
    .size tryalles,.Lfe1-tryalles
    .align 4
Suffering 10-15 clocks for a branch misprediction is major; the
few extra instructions needed to prevent that penalty can be executed
at a rate of 3 instructions a clock!
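
The two figures quoted so far fit together if one assumes the branch is mispredicted about half the time: 0.5 times a 10-15 clock penalty gives the 5 - 7.5 clocks noted next to the jne. The 50% rate is an assumption read back from those numbers, not something measured here, and expected_penalty is a hypothetical helper name used only for this sketch:

```c
/* Expected clocks lost per executed branch:
 * probability of a misprediction times the cost of one misprediction.
 * The 0.5 rate used in the examples is an assumption, not a measurement. */
double expected_penalty(double mispredict_rate, double penalty_clocks)
{
    return mispredict_rate * penalty_clocks;
}
```

With a rate of 0.5 this returns 5.0 for a 10-clock penalty and 7.5 for a 15-clock penalty.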
I would like to zoom in on the loop body.
First the loop body in C:
    va = Pindex;
    ut = 0;
    if( !sweep[snelbord[ua]] )
        ut = board[ua];
    va += ut;
    summation += *va;
Now, this is how it is currently translated to 32-bit assembler:
.L6:
    movl snelbord(%ecx),%eax
    xorl %edx,%edx
    sall $2,%eax                  <== what do we need this shift instruction for?
    cmpl $0,(%eax,%edi)
    jne .L7                       <== We don't want to have this JNE!
    movl board(%ecx),%edx
.L7:
    addl Pindex(,%edx,4),%esi
    addl $4,%ecx
    decl %ebx
    jns .L6
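
On the shift: in the listing above, sall $2 multiplies the value loaded from snelbord by 4, turning an element index into a byte offset that is then added to the address of sweep in %edi. A minimal C sketch of the same address arithmetic, assuming sizeof(int) == 4 as on 32-bit x86 (load_sweep_scaled is a hypothetical name used only here):

```c
#include <stddef.h>

int sweep[20];

/* Mirrors `sall $2,%eax` followed by the access `(%eax,%edi)`:
 * scale the element index to a byte offset, then add the array base.
 * Assumes sizeof(int) == 4, so idx << 2 equals idx * sizeof(int). */
int load_sweep_scaled(int idx)
{
    size_t byte_offset = (size_t)idx << 2;
    return *(const int *)((const char *)sweep + byte_offset);
}
```

For any valid index this reads the same element as sweep[idx]. The x86 addressing mode (%edi,%eax,4) can apply the same scaling inside the memory operand, which is why the separate shift looks avoidable.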
If I look at an example like this I directly see the use for
some more registers than Intel has... However, I would like to replace
the next 3 lines
with Pentium Pro instructions. I don't care how many lines they get replaced with,
because here in my program I'm suffering a huge penalty in a lot of cases!
    cmpl $0,(%eax,%edi)
    jne .L7                       <== We don't want to have this JNE!
    movl board(%ecx),%edx
.L7:
Is it so hard to replace the above with Pentium Pro instructions?
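
The Pentium Pro added conditional move (CMOVcc) instructions, which are the usual way to drop a branch like this one. Whether a given compiler version emits them for this loop is not guaranteed, so the following is only a sketch of a source-level workaround: the select is written with a mask instead of an if. The name tryalles_branchless is hypothetical, and the mask trick assumes two's complement integers, so that -1 has all bits set:

```c
int board[64], sweep[20], Pindex[64], snelbord[64];

/* Loop from the post, kept for comparison. */
int tryalles(void)
{
    int ut, *va, ua, summation = 0;
    for (ua = 0; ua < 16; ua++) {
        va = Pindex;
        ut = 0;
        if (!sweep[snelbord[ua]])
            ut = board[ua];
        va += ut;
        summation += *va;
    }
    return summation;
}

/* Same result without a data-dependent branch in the loop body.
 * mask is -1 (all bits set) when sweep[...] is zero and 0 otherwise,
 * so the index is either board[ua] or 0. */
int tryalles_branchless(void)
{
    int ua, summation = 0;
    for (ua = 0; ua < 16; ua++) {
        int mask = -(sweep[snelbord[ua]] == 0);
        summation += Pindex[board[ua] & mask];
    }
    return summation;
}
```

Unlike the original, this version always loads board[ua], which is harmless here because the index is always in bounds. It trades the possible misprediction for a few extra always-executed instructions.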
Greetings,
Vincent
Vincent Diepeveen
diep@xs4all.nl
---
...and furthermore I am of the opinion that Dap should be
beamed into outer space...