This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled

From: Jan Hubicka <hubicka at ucw dot cz>
To: Uros Bizjak <ubizjak at gmail dot com>
Cc: Jan Hubicka <jh at suse dot cz>, Jan Hubicka <hubicka at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Tue, 8 Jan 2008 14:58:15 +0100
Subject: Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled
References: <4782864B.8000803@gmail.com> <20080108002636.GF16855@atrey.karlin.mff.cuni.cz> <5787cf470801072319g1ee19f25y64c605f3dfce0f9d@mail.gmail.com> <20080108124154.GD19896@kam.mff.cuni.cz> <5787cf470801080536j4769cad9oe4bf1d1b1919236d@mail.gmail.com>

> On Jan 8, 2008 1:41 PM, Jan Hubicka <jh@suse.cz> wrote:
> 
> But this doesn't work as expected, neither for -mfpmath=sse nor for
> -mfpmath=387. I have tried the 4.0, 4.1, 4.2 and 4.3 [patched / unpatched]
> branches with the following testcase:
> 
> --cut here--
> double a[256];
> 
> void test (void)
> {
>         int i;
> 
>         for (i = 0; i < 256; i++)
>                 a[i] = -a[i];
> }
> --cut here--
> 
> There were no r-m-w instructions, always fchs and xor, no matter if
> data was float or double.

This is because we further split r-m-w instructions into a RISC-like
sequence on many modern CPUs when an integer register happens to be
available.  This helps scheduling.  The point is that read-write pairs
are faster if executed through the integer unit.  Turning your testcase
into a benchmark:

double a[256];

int
main (void)
{
  int i;
  int b;

  /* Repeat the negation loop 10,000,000 times so the run time is
     measurable.  */
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];

  return 0;
}

I get:
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out

real    0m3.305s
user    0m3.304s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./a.out

real    0m3.305s
user    0m3.304s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out

real    0m5.666s
user    0m5.668s
sys     0m0.000s
hubicka@occam:/aux/hubicka/gcc/build/gcc$ time ./b.out

real    0m5.666s
user    0m5.664s
sys     0m0.004s

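a.out was built with GCC 3.3.5, which uses xor, while b.out was built
with mainline; both timings are from an Athlon XP.  That makes mainline
roughly 70% slower here (about 5.67s versus 3.31s).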

Honza
> 
> Uros.
