This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled


> On Jan 8, 2008 1:26 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> > > The testcase in the PR exposed a problem where an xorb with a partial QImode
> > > memory access was used to flip the sign bit of a DFmode value. This
> > > partial memory access resulted in a ~70% slowdown when -mfpmath=sse was
> > > used. The same problem affected the 64-bit binary, although there only a
> > > ~20% slowdown was measured on a Core 2 processor.
> >
> > Why do you see a slowdown on SFmode operations, and on 64-bit DFmode,
> > where an integer operation of matching size can be used?
> 
> Even in this case, I think that using read/modify/write insns is not
> desirable. As shown in the PR, these instructions can slip into
> loops, preventing the read from being hoisted before the loop and the
> write from being sunk after it.

Isn't that the case for all read/modify/write insns in the i386 ISA, then?

In the testcase, the store/load pair inside the loop results from a poor
register allocation decision.  It happens with or without your patch,
since no loop optimization runs after that point.
On current mainline I get:

.L4:
        movsd   -24(%ebp), %xmm1
        incl    %eax
        xorpd   .LC5, %xmm1
        cmpl    $512000001, %eax
        movsd   %xmm1, -24(%ebp)
        addsd   %xmm1, %xmm0
        jne     .L4
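A minimal source shape that plausibly produces a loop like the above (the names and the iteration count's use are illustrative, not the actual PR testcase) would be:

```c
/* Illustrative sketch: the negated value is stored back to its stack
   slot on every iteration, producing the movsd load/store pair around
   the xorpd in the loop body above. */
double loop_sum(double x, long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        x = -x;      /* movsd load, xorpd with sign-bit mask, movsd store */
        sum += x;    /* addsd */
    }
    return sum;
}
```

If register allocation kept x in an xmm register across the loop, both movsd instructions would disappear from the loop body.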

The reason why I am asking is that the patch used to be an important win
for block negate operations like

for (i = 0; i < BIGCONST; i++)
  a[i] = -a[i];

since an integer read-modify-write is faster than what the FP unit can
do, at least on older CPUs.
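To make the block-negate case concrete, here is a hedged sketch (the function name is hypothetical) of negating each element with a full-width integer read-modify-write that toggles the IEEE-754 sign bit, avoiding a round trip through the FP unit. The memcpy keeps the type punning strictly legal; a compiler typically lowers it to a plain 64-bit load and store:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: block negate by flipping each element's sign bit
   with a 64-bit integer xor, instead of loading each value into the
   FP/SSE unit.  Assumes IEEE-754 doubles. */
static void negate_block(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &a[i], sizeof bits);
        bits ^= UINT64_C(1) << 63;          /* toggle the sign bit */
        memcpy(&a[i], &bits, sizeof bits);
    }
}
```

This is exactly the transformation whose benefit on older CPUs motivated the original pattern; the patch under discussion trades it away to avoid the partial-access stalls from the PR.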

Honza
> 
> Uros.

