This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Performance regression
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Dale Johannesen <dalej at apple dot com>, gcc at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org, Richard dot Earnshaw at arm dot com
- Date: Wed, 25 Sep 2002 11:06:47 +0100
- Subject: Re: Performance regression
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
>
> Hi Dale,
>
> > This patch:
> >
> > 2002-07-10 Roger Sayle <roger@eyesopen.com>
> >
> > PR c/2454
> > * combine.c (nonzero_bits): LOAD_EXTEND_OP should only apply
> > to SUBREGs of MEMs. (num_sign_bit_copies): Likewise.
> >
> > introduces a performance regression on the following code (ppc
> > [on which LOAD_EXTEND_OP==ZERO_EXTEND and WORD_REGISTER_OPERATIONS]
> > and probably other RISCs):
> >
> > struct x { unsigned char c; };
> > unsigned char foo(struct x* x, unsigned char q) {
> > return x->c + (unsigned int)(255 - x->c)*q;
> > }
>
> The posting describing the problems that this patch fixed is at
> http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00383.html. The
> only thing that has changed since then is that the definitive
> semantics of SUBREGs are now described at line 11,087 of combine.c
>
> > With the changed semantics of SUBREG, this no longer happens.
> The semantics didn't change, the correct semantics were enforced :>
>
> The fundamental problem is that LOAD_EXTEND_OP applies to insns
> that load values from memory. So although (SUBREG(MEM ...)) will
> always use a zero-extending load, (SUBREG(REG ...)) may not when
> the REG has been calculated by other means. To change your code
> example a tiny bit consider
>
> unsigned char foo(int x, unsigned char q) {
> register unsigned char xc = x>>16;
> return xc + (unsigned int)(255 - xc)*q;
> }
>
> Now xc could appear in the RTL as (subreg:QI (ashiftrt:SI .. )).
> In this case, the high bits of the register storing xc are
> undefined, left over from the high bits of x. Now, if GCC uses
> a subreg:SI (reg:QI ...)) instead of a zero_extend, the high
> bits will never be cleared leading to incorrect code.
>
> [Can you check whether the original GCC behaviour produced
> invalid code on PPC for the example above?]
>
> To restore the performance on your example, GCC needs some way to
> determine that the REG was initialized by a zero extending load
> instruction, i.e. the zero_extend needs to see the mem.
>
> One place to look might be simplify_unary_operation in simplify_rtx.c,
> such that in combine.c the RTL for "zero_extend:SI (mem:QI ...)" would
> be optimized to produce "subreg:SI (mem:QI)". This might avoid your
> explicit zero_extend, but as my counter-example shows there will
> still be cases where a zero_extend is required for correctness.
>
> I hope this sheds some light on the issue. I'll think about some
> other possible fixes.
>
I found that I got significantly better code on the ARM when I rewrote the
movqi expander to for load-byte(mem) to use
if (GET_CODE (operands[1]) == MEM && optimize > 0)
{
rtx reg = gen_reg_rtx (SImode);
emit_insn (gen_zero_extendqisi2 (reg, operands[1]));
operands[1] = gen_lowpart (QImode, reg);
}
Of course, you can only do this during initial expansion (when you can
create new pseudos). The change meant that we never use implicit
zero-extension operations so the compiler was able to remove several
zero-/sign-extend operations which were clearly redundant.
R.