Performance regression
Roger Sayle
roger@eyesopen.com
Tue Sep 24 19:30:00 GMT 2002
Hi Dale,
> This patch:
>
> 2002-07-10 Roger Sayle <roger@eyesopen.com>
>
> PR c/2454
> * combine.c (nonzero_bits): LOAD_EXTEND_OP should only apply
> to SUBREGs of MEMs. (num_sign_bit_copies): Likewise.
>
> introduces a performance regression on the following code (ppc
> [on which LOAD_EXTEND_OP==ZERO_EXTEND and WORD_REGISTER_OPERATIONS]
> and probably other RISCs):
>
> struct x { unsigned char c; };
> unsigned char foo(struct x* x, unsigned char q) {
> return x->c + (unsigned int)(255 - x->c)*q;
> }
The posting describing the problems that this patch fixed is at
http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00383.html. The
only thing that has changed since then is that the definitive
semantics of SUBREGs are now described at line 11,087 of combine.c.
> With the changed semantics of SUBREG, this no longer happens.
The semantics didn't change; the correct semantics were enforced :>
The fundamental problem is that LOAD_EXTEND_OP applies to insns
that load values from memory. So although (SUBREG(MEM ...)) will
always use a zero-extending load, (SUBREG(REG ...)) may not when
the REG has been calculated by other means. To change your code
example a tiny bit consider
  unsigned char foo(int x, unsigned char q) {
    register unsigned char xc = x>>16;
    return xc + (unsigned int)(255 - xc)*q;
  }
Now xc could appear in the RTL as (subreg:QI (ashiftrt:SI .. )).
In this case, the high bits of the register storing xc are
undefined, left over from the high bits of x. Now, if GCC uses
a (subreg:SI (reg:QI ...)) instead of a zero_extend, the high
bits will never be cleared, leading to incorrect code.
[Can you check whether the original GCC behaviour produced
invalid code on PPC for the example above?]
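To make the difference concrete, here is a small C model of a 32-bit
register that holds a QImode value in its low byte. It is only a sketch
under stated assumptions, not GCC code: all of the function names are
made up for illustration, and the shift is modelled on an unsigned word
so that the C itself stays well defined.

```c
#include <stdint.h>

/* Hypothetical model: a 32-bit register whose low 8 bits hold a QImode
   value.  None of these names exist in GCC. */

/* xc computed by arithmetic: the register keeps the whole shifted word,
   so bits 8..31 are leftovers from the high bits of x. */
uint32_t reg_after_shift(uint32_t x)
{
    return x >> 16;
}

/* xc loaded from memory on a target where byte loads zero-extend
   (the LOAD_EXTEND_OP == ZERO_EXTEND case). */
uint32_t reg_after_byte_load(const unsigned char *p)
{
    return (uint32_t)*p;
}

/* Reading the register as SImode through a paradoxical subreg:
   the high bits are taken on trust. */
uint32_t read_as_subreg_si(uint32_t reg)
{
    return reg;
}

/* Reading the register through an explicit zero_extend:
   the high bits are cleared. */
uint32_t read_with_zero_extend(uint32_t reg)
{
    return reg & 0xFFu;
}
```

With x = 0x12345678 the shifted register holds 0x1234, so the subreg
read yields 0x1234 while the zero_extend read yields 0x34. After a
zero-extending byte load the two reads agree, which is the only case
where dropping the zero_extend is safe in this model.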
To restore the performance on your example, GCC needs some way to
determine that the REG was initialized by a zero-extending load
instruction, i.e. the zero_extend needs to see the mem.
One place to look might be simplify_unary_operation in simplify-rtx.c,
such that in combine.c the RTL for "zero_extend:SI (mem:QI ...)" would
be optimized to produce "subreg:SI (mem:QI)". This might avoid your
explicit zero_extend, but as my counter-example shows there will
still be cases where a zero_extend is required for correctness.
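The shape of the check I have in mind can be sketched on a toy
expression tree. This is not GCC's RTL representation and not the code
in simplify_unary_operation; the toy_rtx type, the TOY_* codes and the
function name are all hypothetical, chosen only to show that the
decision depends on the operand being a MEM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes; not GCC's enum rtx_code. */
enum toy_code { TOY_REG, TOY_MEM, TOY_ASHIFTRT, TOY_ZERO_EXTEND };

/* Hypothetical one-operand expression node; not GCC's rtx. */
struct toy_rtx {
    enum toy_code code;
    const struct toy_rtx *operand;   /* NULL for leaves */
};

/* Return true when a zero_extend could be replaced by a plain wider
   read of its operand.  loads_zero_extend models a target on which
   LOAD_EXTEND_OP is ZERO_EXTEND for the mode in question. */
bool zero_extend_is_redundant(const struct toy_rtx *x, bool loads_zero_extend)
{
    if (x == NULL || x->code != TOY_ZERO_EXTEND || x->operand == NULL)
        return false;

    /* Only a load from memory is known to clear the high bits;
       a REG or an arithmetic result gives no such guarantee. */
    return loads_zero_extend && x->operand->code == TOY_MEM;
}
```

In this model a zero_extend of a MEM is redundant on a zero-extending
target, while a zero_extend of a REG or of a shift result is always
kept, matching the counter-example.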
I hope this sheds some light on the issue. I'll think about some
other possible fixes.
Roger
--
Roger Sayle, E-mail: roger@eyesopen.com
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833