This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Performance regression


> 
> Hi Dale,
> 
> > This patch:
> >
> > 2002-07-10 Roger Sayle <roger@eyesopen.com>
> >
> >	PR c/2454
> >	* combine.c (nonzero_bits): LOAD_EXTEND_OP should only apply
> >	to SUBREGs of MEMs. (num_sign_bit_copies): Likewise.
> >
> > introduces a performance regression on the following code (ppc
> > [on which LOAD_EXTEND_OP==ZERO_EXTEND and WORD_REGISTER_OPERATIONS]
> > and probably other RISCs):
> >
> >	struct x { unsigned char c; };
> >	unsigned char foo(struct x* x, unsigned char q) {
> >	return x->c + (unsigned int)(255 - x->c)*q;
> >	}
> 
> The posting describing the problems that this patch fixed is at
> http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00383.html.  The
> only thing that has changed since then is that the definitive
> semantics of SUBREGs are now described at line 11,087 of combine.c
> 
> > With the changed semantics of SUBREG, this no longer happens.
> The semantics didn't change, the correct semantics were enforced :>
> 
> The fundamental problem is that LOAD_EXTEND_OP applies to insns
> that load values from memory.  So although (SUBREG(MEM ...)) will
> always use a zero-extending load, (SUBREG(REG ...)) may not when
> the REG has been calculated by other means.  To change your code
> example a tiny bit consider
> 
> 	unsigned char foo(int x, unsigned char q) {
> 	  register unsigned char xc = x>>16;
> 	  return xc + (unsigned int)(255 - xc)*q;
> 	}
> 
> Now xc could appear in the RTL as (subreg:QI (ashiftrt:SI .. )).
> In this case, the high bits of the register storing xc are
> undefined, left over from the high bits of x.  Now, if GCC uses
> a subreg:SI (reg:QI ...)) instead of a zero_extend, the high
> bits will never be cleared leading to incorrect code.
> 
> [Can you check whether the original GCC behaviour produced
> invalid code on PPC for the example above?]
> 
> To restore the performance on your example, GCC needs some way to
> determine that the REG was initialized by a zero extending load
> instruction, i.e. the zero_extend needs to see the mem.
> 
> One place to look might be simplify_unary_operation in simplify_rtx.c,
> such that in combine.c the RTL for "zero_extend:SI (mem:QI ...)" would
> be optimized to produce "subreg:SI (mem:QI)".  This might avoid your
> explicit zero_extend, but as my counter-example shows there will
> still be cases where a zero_extend is required for correctness.
> 
> I hope this sheds some light on the issue.  I'll think about some
> other possible fixes.
> 

I found that I got significantly better code on the ARM when I rewrote the 
movqi expander to for load-byte(mem) to use

          if (GET_CODE (operands[1]) == MEM && optimize > 0)
            {
              rtx reg = gen_reg_rtx (SImode);

              emit_insn (gen_zero_extendqisi2 (reg, operands[1]));
              operands[1] = gen_lowpart (QImode, reg);
            }

Of course, you can only do this during initial expansion (when you can 
create new pseudos).  The change meant that we never use implicit 
zero-extension operations so the compiler was able to remove several 
zero-/sign-extend operations which were clearly redundant.

R.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]