This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: failure notice
- To: gcc at gcc dot gnu dot org
- Subject: Re: failure notice
- From: "Timothy J. Wood" <bungi at omnigroup dot com>
- Date: Sun, 21 May 2000 02:18:25 -0700
- reply-to: tjw at omnigroup dot com
I'm working on the Quake3:Arena port for MacOS X DP4. This newly released version supposedly contains a compiler based on gcc 2.95.2 (at least, that is what cc -v claims).
The PPC is really poor at converting ints to floats, and an optimization problem in gcc is making this even worse. One of the current hot spots in Quake3:Arena exposes this problem under MacOS X DP4. Unfortunately, I don't have a Linux PPC system to test this on (maybe I'll reformat my PowerBook to have more partitions in the future :).
Some code that demonstrates the problem follows. I'm using doubles in this code to make the example assembly simpler (avoids rounding the double to a float).
---- cast.c ----
/* unsigned short -> double */
double test1(unsigned short *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

/* short -> double */
double test2(short *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

/* char -> double (plain char is signed or unsigned depending on the target) */
double test3(char *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

/* unsigned char -> double */
double test4(unsigned char *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i];
    return x;
}

/* 18-bit unsigned bitfield -> double */
typedef struct _foo {
    unsigned int i:18;
    unsigned int j:14;
} foo;

double test5(foo *stuff)
{
    unsigned int i;
    double x = 0;
    for (i = 0; i < 1000; i++)
        x += stuff[i].i;
    return x;
}
---- end cast.c ----
I compile this test case with:
cc -static -O2 -S cast.c
(-static avoids a bunch of Mach-O PIC crud that isn't relevant to the problem). For test1, I get something like the following:
	.data
	.const
	.align 2
LC0:
	.double 0r0.00000000000000000000e0
	.text
	.align 2
	.globl _test1
_test1:
	lis r7,ha16(LC0)
	la r7,lo16(LC0)(r7)
	lfd f1,0(r7)
	lis r11,0x4330
	lis r9,0x4330
	lis r10,0x8000
	li r8,1000
	mtctr r8
L8:
	lhz r0,0(r3)
	addi r3,r3,2
	stw r9,-16(r1)
	stw r10,-12(r1)
	lfd f13,-16(r1)
	xoris r8,r0,0x8000
	stw r8,-4(r1)
	stw r11,-8(r1)
	lfd f0,-8(r1)
	fsub f0,f0,f13
	fadd f1,f1,f0
	bdnz L8
	blr
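For anyone following the listing, the loop appears to use the usual PPC integer-to-double trick. This is a reading of the assembly, assuming IEEE 754 double precision and big-endian word order: the high word 0x43300000 lives at -8(r1), the low word at -4(r1), and the bias constant is built at -16(r1)/-12(r1) and loaded into f13. Writing D(h, l) for the double whose high word is h and whose low word is l:

```latex
% Exponent field of 0x43300000 is 0x433 = 1075, so the low word lands in the integer bits
D(\texttt{0x43300000}, l) = 2^{1075-1023}\left(1 + \frac{l}{2^{52}}\right) = 2^{52} + l, \qquad 0 \le l < 2^{32}

% Bias constant loaded into f13
B = D(\texttt{0x43300000}, \texttt{0x80000000}) = 2^{52} + 2^{31}

% xoris flips bit 31, so the stored low word is v + 2^{31}; fsub removes the bias exactly
D(\texttt{0x43300000}, v \oplus \texttt{0x80000000}) - B = (2^{52} + v + 2^{31}) - (2^{52} + 2^{31}) = v, \qquad -2^{31} \le v < 2^{31}
```

Only the low word depends on the element being converted; the high word and B are the same on every iteration.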
There are two problems here. First, the various constants are duplicated for each function (some of the types need more constants than this). Maybe the linker cleans this up; I don't know.
The second problem is more important. When building the double on the stack, there are two words: one that is constant throughout the loop and one that needs to be updated for each value. However, gcc doesn't realize that it can avoid rewriting the constant part of the double on each iteration of the loop, and in most of the cases above it ends up issuing twice as many store instructions as necessary.
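A C sketch of the same conversion with the loop-invariant parts computed once, to show which pieces actually need to change per element. It assumes IEEE 754 doubles whose byte order matches uint64_t (true on PPC and x86); the names bits_to_double and sum_ushort_bias are hypothetical. It builds the 64-bit pattern in a register and copies it with memcpy rather than issuing two separate stw instructions, so it illustrates the arithmetic and is not a guarantee of what gcc 2.95.2 would generate from it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
   Assumes IEEE 754 binary64 with the same byte order as uint64_t. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical equivalent of test1 written with the bias trick by hand. */
double sum_ushort_bias(const unsigned short *stuff, unsigned int n)
{
    /* Loop-invariant: high word 0x43300000 (exponent for 2^52) ... */
    const uint64_t hi = (uint64_t)0x43300000u << 32;
    /* ... and the bias constant 2^52 + 2^31 (what the loop reloads into f13). */
    const double bias = bits_to_double(hi | 0x80000000u);
    double x = 0;
    unsigned int i;

    for (i = 0; i < n; i++) {
        /* Only the low word changes: the value with bit 31 flipped (xoris). */
        uint32_t lo = (uint32_t)stuff[i] ^ 0x80000000u;
        /* (2^52 + v + 2^31) - (2^52 + 2^31) == v, exactly. */
        x += bits_to_double(hi | lo) - bias;
    }
    return x;
}
```

Writing the trick out by hand like this is mainly useful for seeing what is loop-invariant; the plain x += stuff[i] in cast.c remains the portable way to express the loop.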
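But in the case of converting unsigned shorts (test1, shown above), gcc goes bonkers and ends up doing part of the conversion twice and throwing away the results: on every iteration it re-stores 0x43300000 and 0x80000000 to -16(r1) and -12(r1) and reloads that constant into f13, even though it never changes. Only one of the four stw instructions in the loop (the one storing r8) actually depends on the element being converted.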
Can anyone replicate this result on a PPC Linux box with the latest sources? Either way, hopefully Apple will be able to fix these problems so that their Quake port for MacOS X will run faster :)
-tim