This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, i386]: Fix PR target/34682, 70% slowdown with SSE enabled
> On Jan 8, 2008 1:26 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > > The testcase in the PR exposed a problem where xorb with a partial QImode
> > > memory access was used to flip the sign bit of a DFmode value. This
> > > partial memory access resulted in a ~70% slowdown when -mfpmath=sse was
> > > used. The same problem affected the 64-bit binary, although there only a
> > > ~20% slowdown was measured on a Core2 processor.
> >
> > Why do you see a slowdown on SFmode operations and on 64-bit DFmode, where a
> > matching-size integer operation can be used?
>
> Even in this case, I think that using read/modify/write insns is not
> desirable. As shown in the PR, these instructions can slip into loops,
> preventing the read from being done before the loop and the write after it.
Isn't that the case for all read/modify/write insns in the i386 ISA, then?
In the testcase, the store/load pair inside the loop results from a poor
register allocation decision. It happens with or without your patch,
since no loop optimization is done after register allocation.
With current mainline I get:
.L4:
movsd -24(%ebp), %xmm1
incl %eax
xorpd .LC5, %xmm1
cmpl $512000001, %eax
movsd %xmm1, -24(%ebp)
addsd %xmm1, %xmm0
jne .L4
The reason I am asking is that the patch used to be an important win
for block negate operations like

  for (i = 0; i < BIGCONST; i++)
    a[i] = -a[i];

since an integer read-modify-write is faster than what the FP unit can do,
at least on older CPUs.
Honza
>
> Uros.