Performance regression

Roger Sayle roger@eyesopen.com
Tue Sep 24 19:30:00 GMT 2002


Hi Dale,

> This patch:
>
> 2002-07-10 Roger Sayle <roger@eyesopen.com>
>
>	PR c/2454
>	* combine.c (nonzero_bits): LOAD_EXTEND_OP should only apply
>	to SUBREGs of MEMs. (num_sign_bit_copies): Likewise.
>
> introduces a performance regression on the following code (ppc
> [on which LOAD_EXTEND_OP==ZERO_EXTEND and WORD_REGISTER_OPERATIONS]
> and probably other RISCs):
>
>	struct x { unsigned char c; };
>	unsigned char foo(struct x* x, unsigned char q) {
>	return x->c + (unsigned int)(255 - x->c)*q;
>	}

The posting describing the problems that this patch fixed is at
http://gcc.gnu.org/ml/gcc-patches/2002-07/msg00383.html.  The
only thing that has changed since then is that the definitive
semantics of SUBREGs are now described at line 11,087 of combine.c

> With the changed semantics of SUBREG, this no longer happens.
The semantics didn't change, the correct semantics were enforced :>

The fundamental problem is that LOAD_EXTEND_OP applies to insns
that load values from memory.  So although (SUBREG(MEM ...)) will
always use a zero-extending load, (SUBREG(REG ...)) may not when
the REG has been calculated by other means.  To change your code
example a tiny bit consider

	unsigned char foo(int x, unsigned char q) {
	  register unsigned char xc = x>>16;
	  return xc + (unsigned int)(255 - xc)*q;
	}

Now xc could appear in the RTL as (subreg:QI (ashiftrt:SI .. )).
In this case, the high bits of the register storing xc are
undefined, left over from the high bits of x.  Now, if GCC uses
a subreg:SI (reg:QI ...)) instead of a zero_extend, the high
bits will never be cleared leading to incorrect code.

[Can you check whether the original GCC behaviour produced
invalid code on PPC for the example above?]

To restore the performance on your example, GCC needs some way to
determine that the REG was initialized by a zero extending load
instruction, i.e. the zero_extend needs to see the mem.

One place to look might be simplify_unary_operation in simplify_rtx.c,
such that in combine.c the RTL for "zero_extend:SI (mem:QI ...)" would
be optimized to produce "subreg:SI (mem:QI)".  This might avoid your
explicit zero_extend, but as my counter-example shows there will
still be cases where a zero_extend is required for correctness.

I hope this sheds some light on the issue.  I'll think about some
other possible fixes.

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833



More information about the Gcc-patches mailing list