[PATCH] Optimize zero_extend into paradoxical subreg

Roger Sayle roger@eyesopen.com
Sat Sep 28 10:17:00 GMT 2002

On Sat, 28 Sep 2002, Richard Earnshaw wrote:
> > A recent discussion on GCC's semantics for paradoxical subregs on
> > this list suggested there was a need for additional optimizations
> > to optimize their use.  In that thread I suggested that simplify_rtx
> > could eliminate explicit zero_extension and sign_extension operations
> > on targets with LOAD_EXTEND_OP when the extension operand was a MEM.
> > The patch below implements this suggestion.
> >
> I'm not convinced this is a good idea.  Surely it would be better to go
> the other way -- ie translate (subreg:M1 (mem:M2) 0) into zero_ or
> sign_extend if M1 is wider than M2.
> There's a general principle for RTL that it's better to be explicit about
> an operation than rely on implicit meaning -- your patch seems to go
> against that.

However, on many GCC targets zero_extend or sign_extend requires real
instructions to be generated, whereas automatic sign/zero extension,
when present in hardware, does not.  Hence the optimization above takes
advantage of a special case (with well-defined semantics) to optimize
away those instructions.
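
To make the special case concrete, here is a minimal C model, not GCC
code.  It assumes a hypothetical target whose byte loads zero-extend
into a full 32-bit register (i.e. LOAD_EXTEND_OP for the byte mode is
ZERO_EXTEND); both function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed hardware behaviour: a byte load fills the whole 32-bit
   register, with the upper 24 bits already zero.  */
static uint32_t load_byte_zero_extending(const uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* An explicit zero_extend from 8 to 32 bits, modelled as a mask on
   the register value.  */
static uint32_t explicit_zero_extend_qi(uint32_t reg)
{
    return reg & 0xFFu;
}
```

Under that assumption, applying explicit_zero_extend_qi to the result
of load_byte_zero_extending never changes the value, which is why the
separate extension instruction can be dropped.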

My interpretation of this principle is that front and middle ends
should keep the RTL as explicit as possible.  But it's the job of
the optimizers to take advantage of machine idioms where they
exist.  Going from zero_extend to subreg is a lowering, and if
the RTL is explicit about what it's doing there shouldn't be a
need to go the other way.

Your proposal of going the other direction is also complicated on
the many platforms that don't implement zero_extend or sign_extend at
the RTL level, and instead convert these extensions into a pair of
shifts, which obfuscates the RTL further by the "better to be
explicit" principle.
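
For readers unfamiliar with the shift-pair idiom, the following C
sketch shows the arithmetic on a 32-bit value with an 8-bit narrow
mode.  The function names are illustrative only, and the signed
variant relies on GCC's documented behaviour for converting
out-of-range values to signed types and for right-shifting negative
values, both of which are implementation-defined in ISO C.

```c
#include <stdint.h>

/* Zero-extend the low 8 bits: shift left, then logical shift right.  */
static uint32_t zext8_by_shifts(uint32_t x)
{
    return (x << 24) >> 24;
}

/* Sign-extend the low 8 bits: shift left, then arithmetic shift
   right.  Assumes >> on a negative int32_t is arithmetic, as GCC
   defines it.  */
static int32_t sext8_by_shifts(uint32_t x)
{
    return (int32_t)(x << 24) >> 24;
}
```

With these definitions, zext8_by_shifts(0x123456F0) yields 0xF0 and
sext8_by_shifts(0x123456F0) yields -16, the same results an explicit
zero_extend or sign_extend of the low byte would give.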

In an ideal world we probably wouldn't need LOAD_EXTEND_OP:
target patterns would appropriately recognize "(zero_extend (mem ...))"
and GCC would recognize "(?rshift (?lshift ...))" as extend insns.

Hence the crux is whether we require backends to make LOAD_EXTEND_OP
explicit in their RTL or continue to support the other common style of
machine descriptions with patches such as mine.  I can be convinced
either way.  The powerpc and i960 people argued that this kind
of optimization is needed, and I knew how to do what they requested.
If the consensus is that this is a legacy issue, then as you say
my patch is indeed a step in the wrong direction.

