This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 2.95, x86: severe performance problems with short arithmetic
- To: Richard Henderson <rth@cygnus.com>
- Subject: Re: 2.95, x86: severe performance problems with short arithmetic
- From: Zack Weinberg <zack@bitmover.com>
- Date: Wed, 11 Aug 1999 10:23:25 -0700
- cc: Jeffrey A Law <law@cygnus.com>, John Wehle <john@feith.com>, gcc@gcc.gnu.org
Richard Henderson wrote:
> On Tue, Aug 10, 1999 at 11:00:53PM -0600, Jeffrey A Law wrote:
> > Instead we need to do more analysis to determine when it is profitable to
> > promote from 8/16 bit operations to 32bit operations.
>
> The new_ia32_branch currently just disables all such promotions when
> TARGET_PARTIAL_REG_STALL. Not ideal, perhaps, but much better than
> blindly going ahead with the promotions.
>
> Yes, a post-reload pass to match up register usage modes would be the
> ideal solution, but something like that isn't going to go into 2.95.1.
>
> IMO going through the 10 to 20 patterns in the existing code base
> that do such promotions and conditionally disabling them is the only
> viable course for 2.95.

If I make a patch to do this, will it be accepted for 2.95.1?

For 2.96/the new_ia32_branch, I wonder if it would be possible to
use the peephole2 framework you posted to widen register operations
throughout a function.

Going back to the sample code I posted...
.L3:
        movb    (%ebx),%dl      # 26 movqi+1/1
        incl    %ebx            # 27 addsi3+1/1
        testb   %dl,%dl         # 29 tstqi_1
        je      .L4             # 30 bleu+1
        movzbw  %dl,%ax         # 35 zero_extendqihi2+1
        addl    %eax,%esi       # 37 addhi3+1/1
        movb    %dl,(%ecx)      # 41 movqi+1/3
        incl    %ecx            # 42 addsi3+1/1
        cmpb    $10,%dl         # 44 cmpqi_1/2
        jne     .L3             # 45 bleu+1
.L4:
I wonder if it wouldn't be profitable to do the extend at the same
time as the fetch, like this:
.L3:
        movzbl  (%ebx),%edx
        incl    %ebx
        testb   %dl,%dl
        je      .L4
        addl    %edx,%esi
        movb    %dl,(%ecx)
        incl    %ecx
        cmpb    $10,%dl
        jne     .L3
.L4:
This is three bytes smaller (the movzbl from memory is 3 bytes, versus
2 for the movb plus 4 for the movzbw), uses one fewer register (%eax is
no longer needed), and I don't think it's any worse in decode penalties.
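
For illustration, here is a minimal C sketch of why widening at the load
should be safe. The original C source is not included in this message, so
the loop below is only a guess reconstructed from the assembly, and the
names copy_sum_narrow and copy_sum_wide are hypothetical. The point it
tries to show: if each byte is zero-extended to 32 bits when it is fetched
and the sum is kept in a 32-bit register, the low 16 bits of that sum
still match the 16-bit sum, so truncating once at the end is enough.

```c
#include <stdint.h>

/* Assumed shape of the loop (reconstructed from the assembly, not taken
   from the original source): copy bytes from src to dst up to and
   including the first '\n', stop without copying at a NUL, and keep a
   running sum of the bytes copied. */

/* Narrow form: the accumulator is truncated to 16 bits on every step. */
uint16_t copy_sum_narrow(const unsigned char *src, unsigned char *dst)
{
    uint16_t sum = 0;
    unsigned char c;

    while ((c = *src++) != 0) {
        sum = (uint16_t)(sum + c);
        *dst++ = c;
        if (c == '\n')
            break;
    }
    return sum;
}

/* Wide form: the byte is zero-extended to 32 bits as it is loaded and
   the add is done at full width; only the final result is truncated. */
uint16_t copy_sum_wide(const unsigned char *src, unsigned char *dst)
{
    uint32_t sum = 0;
    uint32_t c;

    while ((c = *src++) != 0) {
        sum += c;                   /* low 16 bits match the narrow sum */
        *dst++ = (unsigned char)c;  /* store only the low byte */
        if (c == '\n')
            break;
    }
    return (uint16_t)sum;
}
```

Both functions should return the same value and write the same bytes for
any NUL-terminated input, provided dst is large enough to hold the copied
bytes.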
zw