This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 2.95, x86: severe performance problems with short arithmetic
- To: John Wehle <john at feith dot com>
- Subject: Re: 2.95, x86: severe performance problems with short arithmetic
- From: Zack Weinberg <zack at bitmover dot com>
- Date: Tue, 10 Aug 1999 13:10:23 -0700
- cc: gcc at gcc dot gnu dot org
John Wehle wrote:
> > Looks like it would suffice to rip out the entire if block - but there
> > was a reason it was put there in the first place, right?
>
> ix86 word instructions generally require a prefix which makes the
> resulting instruction larger and complicates decoding. ix86 word
> instructions also can cause partial register stalls on P6 class
> processors (as you have apparently noticed :-). The various
> ix86 patterns in i386.md attempt to avoid word instructions
> when possible so as to hopefully produce better code, however at
> times they just make the situation worse. Consider the following:
>
> 1) A 16 bit write to a register immediately followed by a 32 bit read
> of the register. This will cause a stall. Converting the 16 bit
> write to a 32 bit write avoids the stall, however this may cause a
> stall with an earlier instruction (if I recall correctly).
Does this include something like this?
	movzbw %dl, %ax
	addl %eax, %esi
It certainly sounds like it, and would explain why such a tiny
difference turns into a 2x performance loss.
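
To make the dependency concrete, here is a minimal C sketch that models the register semantics of the two instructions above. It is only an illustration: the function names are hypothetical, and the code models what values end up in the registers, not the timing of any particular processor. The point is that a 16 bit write leaves the upper half of %eax untouched, so a following 32 bit read needs both the old upper half and the new lower half, whereas a full 32 bit zero-extending write (movzbl) does not depend on the old contents at all.

```c
#include <stdint.h>

/* Hypothetical model of `movzbw %dl, %ax`: the byte is zero-extended
   into the low 16 bits of eax, and the upper 16 bits keep whatever
   value they held before. */
uint32_t write16_zero_extended(uint32_t old_eax, uint8_t dl)
{
    return (old_eax & 0xFFFF0000u) | (uint32_t)dl;
}

/* Hypothetical model of `movzbl %dl, %eax`: the whole register is
   overwritten, so the result does not depend on the old eax. */
uint32_t write32_zero_extended(uint8_t dl)
{
    return (uint32_t)dl;
}

/* Hypothetical model of `addl %eax, %esi`: a full 32 bit read of eax. */
uint32_t add32(uint32_t esi, uint32_t eax)
{
    return esi + eax;
}
```

With the 16 bit form, the value that addl reads depends on the previous contents of %eax as well as on %dl; with the 32 bit form it depends on %dl alone. That merge of old and new halves is what the 32 bit read has to wait for in case 1) quoted above.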
...
> Really the code needs to do more analysis and scheduling to properly handle
> the issue of avoiding the prefix and partial register stalls.
> Any takers? :-)
It appears to have already been done, in the new_ia32_branch. I was
hoping for a quick fix that could go into 2.95.1, but I can wait for
2.96 or 3.0 or whatever.
zw