This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled


> On Jan 8, 2008 1:41 PM, Jan Hubicka <jh@suse.cz> wrote:
> 
> But this doesn't work as expected, neither for -mfpmath=sse nor for
> -mfpmath=387. I have tried 4.0, 4.1, 4.2 and 4.3 [patched / unpatched]
> branches with the following testcase:
> 
> --cut here--
> double a[256];
> 
> void test (void)
> {
>         int i;
> 
>         for (i = 0; i < 256; i++)
>                 a[i] = -a[i];
> }
> --cut here--
> 
> There were no r-m-w instructions, always fchs and xor, no matter if
> data was float or double.

This is because, on many modern CPUs, we further split r-m-w instructions
into a RISC-like sequence when an integer register happens to be available.
This helps scheduling.  The point is that read-write pairs are faster if
executed through the integer unit.  Turning your testcase into a benchmark:

double a[256];

int
main (void)
{
  int i;
  int b;

  /* Repeat the negation loop to get a measurable run time.  */
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
  return 0;
}

I get:
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out

real    0m3.305s
user    0m3.304s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out

real    0m3.305s
user    0m3.304s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out

real    0m5.666s
user    0m5.668s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out

real    0m5.666s
user    0m5.664s
sys     0m0.004s

a.out was built with GCC 3.3.5, which uses xor, while b.out was built with
mainline.  The timings are from an Athlon XP.
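
For illustration, here is roughly what the xor variant amounts to when
written out by hand in C.  This is only a sketch, not the code GCC emits:
the names neg_xor and negate_array_xor are made up for this example, and it
assumes double is a 64-bit IEEE 754 value, so that the sign is the top bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate X by flipping its sign bit with an
   integer xor instead of a floating-point negation.
   Assumes double is 64-bit IEEE 754 (sign in bit 63).  */
static double
neg_xor (double x)
{
  uint64_t bits;

  assert (sizeof (double) == sizeof (uint64_t));

  /* Load the value as an integer, flip the sign bit, store it back.  */
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Hypothetical helper: the testcase loop, done through integer ops.  */
static void
negate_array_xor (double *p, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    p[i] = neg_xor (p[i]);
}
```

Calling negate_array_xor (a, 256) in place of the inner loop of the
benchmark gives the same values as a[i] = -a[i] under the assumption above.
Whether the compiler actually keeps this in integer registers depends on the
target and options.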

Honza
> 
> Uros.

