This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled
> On Jan 8, 2008 1:41 PM, Jan Hubicka <jh@suse.cz> wrote:
>
> But this doesn't work as expected, neither for -mfpmath=sse nor for
> -mfpmath=387. I have tried the 4.0, 4.1, 4.2 and 4.3 [patched / unpatched]
> branches with the following testcase:
>
> --cut here--
> double a[256];
>
> void test (void)
> {
>   int i;
>
>   for (i = 0; i < 256; i++)
>     a[i] = -a[i];
> }
> --cut here--
>
> There were no r-m-w instructions, always fchs and xor, no matter if
> data was float or double.
This is because we further split r-m-w instructions into a RISC-like
sequence on many modern CPUs when an integer register happens to be
available. This helps scheduling. The point is that read-write pairs are
faster if executed through the integer unit. Turning your testcase into a
benchmark:
double a[256];

int
main (void)
{
  int i;
  int b;

  /* Repeat the negation loop enough times to get a measurable run time.  */
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
  return 0;
}
I get:
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out
real 0m3.305s
user 0m3.304s
sys 0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out
real 0m3.305s
user 0m3.304s
sys 0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out
real 0m5.666s
user 0m5.668s
sys 0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out
real 0m5.666s
user 0m5.664s
sys 0m0.004s
a.out was built with GCC 3.3.5, which uses xor, while b.out was built with
mainline. The machine is an Athlon XP.
Honza
>
> Uros.