This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Optimize zero_extend into paradoxical subreg
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Richard dot Earnshaw at arm dot com, gcc-patches at gcc dot gnu dot org
- Date: Mon, 30 Sep 2002 10:16:27 +0100
- Subject: Re: [PATCH] Optimize zero_extend into paradoxical subreg
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> On Sat, 28 Sep 2002, Richard Earnshaw wrote:
> > > A recent discussion on GCC's semantics for paradoxical subregs on
> > > this list suggested there was a need for additional optimizations
> > > to optimize their use. In that thread I suggested that simplify_rtx
> > > could eliminate explicit zero_extension and sign_extension operations
> > > on targets with LOAD_EXTEND_OP when the extension operand was a MEM.
> > > The patch below implements this suggestion.
> > >
> > I'm not convinced this is a good idea. Surely it would be better to go
> > the other way -- i.e. translate (subreg:M1 (mem:M2) 0) into zero_ or
> > sign_extend if M1 is wider than M2.
> > There's a general principle for RTL that it's better to be explicit about
> > an operation than rely on implicit meaning -- your patch seems to go
> > against that.
> However, on many GCC targets zero_extend or sign_extend require real
> instructions to be generated, whereas automatic sign/zero extension
> when present in hardware does not. Hence the optimization above, takes
> advantage of a special case (with well defined semantics) to optimize
> away those instructions.
You miss the point.

1) Paradoxical subregs of a mem are *only* defined on machines which
define LOAD_EXTEND_OP; in all other cases we can only have a paradoxical
subreg of another register.

2) LOAD_EXTEND_OP documentation says:

    Define this macro to be a C expression indicating when insns that
    read memory in MODE, an integral mode narrower than a word, set the
    bits outside of MODE to be either the sign-extension or the
    zero-extension of the data read. Return `SIGN_EXTEND' for values
    of MODE for which the insn sign-extends, `ZERO_EXTEND' for which
    it zero-extends, and `NIL' for other modes.

So there would be no point in defining this on a machine where zero/sign
extension from memory is not a natural side effect of a load (and
therefore free), not least because if you did you'd almost certainly get
wrong code. So the issue about some machines needing several instructions
to zero/sign extend from memory is not relevant here (indeed, on the ARM
prior to Architecture v4, loading a half-word from memory did not extend
the result to a word, so LOAD_EXTEND_OP did return NIL to indicate this).
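
As a rough illustration of the three cases the documentation describes, here
is a small C model of a half-word load into a 32-bit register. It is only a
sketch of the semantics, not GCC code: the enum `extend_kind`, the function
`load_halfword`, and the `stale` parameter are hypothetical names invented
for this example. The `EXT_NIL` case models "the load says nothing about the
upper bits" by leaving whatever was there before.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the values LOAD_EXTEND_OP can return. */
enum extend_kind { EXT_NIL, EXT_ZERO, EXT_SIGN };

/* Model loading a 16-bit value from memory into a 32-bit register.
   `stale` represents the previous register contents; it only matters
   in the EXT_NIL case, where the upper bits are not defined by the load. */
uint32_t load_halfword(const uint8_t *mem, enum extend_kind kind,
                       uint32_t stale)
{
    uint16_t narrow;
    memcpy(&narrow, mem, sizeof narrow);   /* read the two bytes */

    switch (kind) {
    case EXT_ZERO:
        /* Upper 16 bits are cleared as a side effect of the load. */
        return (uint32_t)narrow;
    case EXT_SIGN:
        /* Upper 16 bits are copies of bit 15. */
        return (narrow & 0x8000u) ? (0xFFFF0000u | narrow)
                                  : (uint32_t)narrow;
    default:
        /* EXT_NIL: the load defines only the low 16 bits. */
        return (stale & 0xFFFF0000u) | narrow;
    }
}
```

With the bytes 0xFF 0xFF in memory, the zero-extending variant yields
0x0000FFFF and the sign-extending variant 0xFFFFFFFF, while the NIL variant
returns whatever the upper half held before. Only in the first two cases can
anything be assumed about the bits outside the narrow mode.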
> My interpretation of this principle is that front and middle ends
> should keep the RTL as explicit as possible. But it's the job of
> the optimizers to take advantage of machine idioms where they
> exist. Going from zero-extend to subreg is a lowering, and if
> the RTL is explicit about what it's doing there shouldn't be a
> need to go the other way.
No, the point of the optimizers is never to throw away information, just
to find a more efficient way of expressing the computation. If we start
throwing away important information then we can end up with ambiguities
(which seems to be the root cause of all the problems here).
> In an ideal world we probably wouldn't need LOAD_EXTEND_OP,
> target patterns would appropriately recognize "(zero_extend (mem ...))"
> and GCC would recognize "(?rshift (?lshift ...))" as extend insns.
I think working on the elimination of LOAD_EXTEND_OP might be a good move.
It might mean that we can eliminate another case of paradoxical subregs,
which (IMO) would be good.
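
For reference, the shift-pair idiom quoted above can be sketched in C for a
16-bit value held in a 32-bit word. The function names are made up for this
example. The sign-extending version assumes that converting an out-of-range
value to int32_t wraps and that `>>` on a negative int32_t is an arithmetic
shift; both are implementation-defined in ISO C, and both hold for GCC.

```c
#include <stdint.h>

/* Zero-extend the low 16 bits: left shift, then logical right shift. */
uint32_t zext16_by_shifts(uint32_t x)
{
    return (x << 16) >> 16;
}

/* Sign-extend the low 16 bits: left shift, then arithmetic right shift.
   Assumes the unsigned-to-signed conversion wraps and that >> on a
   negative int32_t is arithmetic (implementation-defined, true for GCC). */
uint32_t sext16_by_shifts(uint32_t x)
{
    return (uint32_t)((int32_t)(x << 16) >> 16);
}
```

For the input 0x1234ABCD the first function gives 0x0000ABCD and the second
0xFFFFABCD, which is what an explicit zero_extend or sign_extend of the low
half-word would produce.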