This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled


> On Jan 8, 2008 1:26 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> > > The testcase in the PR exposed a problem where an xorb with a partial QImode
> > > memory access was used to flip the sign bit of a DFmode value. This
> > > partial memory access resulted in a ~70% slowdown when -mfpmath=sse was
> > > used. The same problem affected the 64-bit binary, although there only a
> > > ~20% slowdown was measured on a Core 2 processor.
> >
> > Why do you see a slowdown on SFmode operations, and on 64-bit DFmode,
> > where an integer operation of matching size can be used?
> 
> Even in this case, I think that using read/modify/write insns is not
> desirable. As shown in the PR, these instructions can slip into
> loops, preventing the read from being hoisted before the loop and the
> write from being sunk after it.

Isn't that the case for all read/modify/write insns in the i386 ISA, then?

In the testcase, the store/load pair inside the loop results from a poor
register allocation decision.  It happens with or without your patch,
since no loop optimization runs after that point.
On current mainline I get:

.L4:
        movsd   -24(%ebp), %xmm1
        incl    %eax
        xorpd   .LC5, %xmm1
        cmpl    $512000001, %eax
        movsd   %xmm1, -24(%ebp)
        addsd   %xmm1, %xmm0
        jne     .L4
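A minimal source shape that plausibly produces a loop like the above (the names and the iteration count's use are illustrative, not the actual PR testcase) would be:

```c
/* Illustrative sketch: the negated value is stored back to its stack
   slot on every iteration, producing the movsd load/store pair around
   the xorpd in the loop body above. */
double loop_sum(double x, long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        x = -x;      /* movsd load, xorpd with sign-bit mask, movsd store */
        sum += x;    /* addsd */
    }
    return sum;
}
```

If register allocation kept x in an xmm register across the loop, both movsd instructions would disappear from the loop body.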

The reason why I am asking is that the patch used to be an important win
for block negate operations like

for (i = 0; i < BIGCONST; i++)
  a[i] = -a[i];

since an integer read-modify-write is faster than what the FP unit can
do, at least on older CPUs.
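To make the block-negate case concrete, here is a hedged sketch (the function name is hypothetical) of negating each element with a full-width integer read-modify-write that toggles the IEEE-754 sign bit, avoiding a round trip through the FP unit. The memcpy keeps the type punning strictly legal; a compiler typically lowers it to a plain 64-bit load and store:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: block negate by flipping each element's sign bit
   with a 64-bit integer xor, instead of loading each value into the
   FP/SSE unit.  Assumes IEEE-754 doubles. */
static void negate_block(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &a[i], sizeof bits);
        bits ^= UINT64_C(1) << 63;          /* toggle the sign bit */
        memcpy(&a[i], &bits, sizeof bits);
    }
}
```

This is exactly the transformation whose benefit on older CPUs motivated the original pattern; the patch under discussion trades it away to avoid the partial-access stalls from the PR.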

Honza
> 
> Uros.

