This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: [trunk] Addition to subreg section of rtl.text.

From: Joern Rennecke <joernr at arc dot com>
To: gcc at gcc dot gnu dot org, zadeck at naturalbridge dot com, law at redhat dot com, dj at redhat dot com, rth at redhat dot com, iant at google dot com, bonzini at gnu dot org, rsandifo at nildram dot co dot uk
Date: Thu, 20 Mar 2008 01:27:16 +0000
Subject: Re: [trunk] Addition to subreg section of rtl.text.
References: <20080317194917.GB31732@elsdt-razorfish.arc.com> <87ve3jbti6.fsf@firetop.home> <20080319214512.GD14705@elsdt-razorfish.arc.com> <87myou9vc2.fsf@firetop.home>

On Wed, Mar 19, 2008 at 10:56:29PM +0000, Richard Sandiford wrote:
> > I don't see why the target-independent code would need to know what the bits
> > inside a partial integer mode mean.
> 
> Consider:
> 
>    (set (subreg:HI (reg:PDI ...) ...) ...)
>    (set (zero_extract (subreg:SI (reg:PDI ...) ...) 16 ...) ...)
> 
> on a 32-bit machine with "..." filled in such that, with s/PDI/DI/, the
> second store would kill the first.  We want to know if the same is true
> of the original PDI version.

Yes.  This should still address the same bits in both cases, even if you
don't know which of them are valid and what numerical significance
they have.
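
To illustrate the point, here is a minimal sketch of the bit-range comparison. The byte offset 0, the extract position 0, and little-endian byte and bit numbering are hypothetical fill-ins for the elided operands, not values from the quoted example. The helper names are made up for illustration and are not GCC functions.

```python
def subreg_bits(byte_offset, outer_bits):
    """Bit range [lo, hi) of the inner register selected by a subreg.

    Assumes little-endian numbering, so byte 0 holds bits 0..7.
    """
    lo = byte_offset * 8
    return (lo, lo + outer_bits)


def zero_extract_bits(subreg_range, size, pos):
    """Bit range selected by a zero_extract of `size` bits at `pos`
    inside an already-selected subreg range (pos counted from the LSB)."""
    lo = subreg_range[0] + pos
    hi = lo + size
    if hi > subreg_range[1]:
        raise ValueError("extract must stay inside the subreg")
    return (lo, hi)


def covers(later, earlier):
    """True if the later store overwrites every bit of the earlier one."""
    return later[0] <= earlier[0] and earlier[1] <= later[1]


# Hypothetical operands: (subreg:HI (reg ...) 0) followed by
# (zero_extract (subreg:SI (reg ...) 0) 16 0).
first = subreg_bits(byte_offset=0, outer_bits=16)
second = zero_extract_bits(subreg_bits(0, 32), size=16, pos=0)
second_kills_first = covers(second, first)
```

The mode of the inner register (PDI or DI) never enters the computation. Only offsets and sizes select the bits, so the answer is the same in both cases.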

> But what we're trying to do here is define what bytes are _modified_
> by a definition as well as what bytes are live in a use.  I assume
> you're saying that a definition of any subreg that involves partial
> integer modes should behave like a full definition _and_ a full use
> of the partial mode?

The use and full definition come into play only if you would also have them
for the associated integral mode.
I.e. for QImode == word_mode, if you write to a QImode lowpart of
a PHImode value, the upper part remains untouched, and the lowpart
gets replaced.  In general, you can't assume that you could read back
the value that you have just written, though.  Although in the 20 bit
address case, you could; the funkiness is all in the highpart.

If you don't want to track low and highpart separately, you can
pretend that all of the register is used and then re-defined, just like for
a QImode subreg of HImode.

The same access, but with word_mode == HImode, will set the entire
PHImode register.
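
The two cases can be sketched as follows. This assumes that PHImode occupies two bytes, like HImode, and that a store to a subreg narrower than a word affects the whole word containing it. The function name and parameters are hypothetical, not taken from GCC.

```python
def bytes_touched(inner_size, outer_size, byte_offset, units_per_word):
    """Bytes of the inner register affected by a store to a subreg.

    A store narrower than a word is treated as affecting the whole
    containing word; wider stores affect exactly the bytes they name.
    """
    start = byte_offset
    end = byte_offset + outer_size
    if outer_size < units_per_word:
        start = (byte_offset // units_per_word) * units_per_word
        end = start + units_per_word
    return set(range(start, min(end, inner_size)))


# QImode lowpart of a 2-byte PHImode value, word_mode == QImode:
# only the lowpart is replaced.
qi_word = bytes_touched(inner_size=2, outer_size=1, byte_offset=0, units_per_word=1)

# Same access, word_mode == HImode: the entire register is set.
hi_word = bytes_touched(inner_size=2, outer_size=1, byte_offset=0, units_per_word=2)
```

Here `qi_word` contains only byte 0, while `hi_word` contains both bytes of the register.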

> If so, I'd argue that a special case like that is
> too complicated to be justified if no mainline port uses it.  It seems
> reasonable to require the port to behave "as if" the natural byte order
> applies.

Since you don't know how many invalid bits there are, requiring them to
be all on the side where the natural byte order would put the most significant
bits wouldn't help you.

But the SUBREGS and ZERO_EXTRACTs should still mean the same with respect to
selecting groups of bits.  You simply don't know which of them mean anything
and what their positional value is, if any, but you shouldn't need to.
So in that respect, it still behaves "as if" the natural byte order applies.

Note that it might be that the nature of a partial integer mode is such that
some narrowing subregs of it are simply invalid for a given mode, or storing
particular values into them is invalid, because they could form bit patterns
that have no equivalent in the actual hard register.

If the register is not valid in an integral mode, the register allocator
should never use it for ordinary values, so all that ends up in such a
register should result from target-specific expanders which have to know
what they are doing.

> >> OK, so in all these cases, "N words and a bit" modes can be treated
> >> like "N + 1 words, with the upper bits undefined"?  For both inner
> >> and outer modes?
> >
> > N + 1 words, yes, but it doesn't follow that it must be the upper bits
> > that are undefined.
> 
> Breaking the paragraph here because, as above, I'd argue that it's
> reasonable to assume that the upper bits are the undefined ones unless
> a mainline port needs something else.

The SH port does, in its handling of FPSCR.
But you shouldn't see any subregs in connection with this.
And even if it did, it would not really make liveness tracking of PSImode any
different than for SImode.

> Yes, MIPS is one such port, but we expressly forbid conversions
> between full-width and partial-width modes for the very reason
> given in mainline rtl.texi:
> 
>     It is also not valid to access a single word of a multi-word value in a
>     hard register when less registers can hold the value than would be
>     expected from its size.  For example, some 32-bit machines have
>     floating-point registers that can hold an entire @code{DFmode} value.
>     If register 10 were such a register @code{(subreg:SI (reg:DF 10) 4)}
>     would be invalid because there is no way to convert that reference to
>     a single machine register.  The reload pass prevents @code{subreg}
>     expressions such as these from being formed.

The reasoning there is flawed.  You could still identify a specific hard
register when you are presented with a DFmode subreg of a DCmode or V2DFmode
inner register.
And @code{(subreg:SI (reg:DF 10) 0)} would be a natural way to express that
you are using the floating point register as a 32 bit integer register,
with writes clobbering all 64 bits of the register.

> > IIRC you have to do something like (SUBREG:SI (SUBREG:DI (REG:DF...
> > and even spread it across multiple instruction patterns.
> > I don't see why we should be picky about the MODE_CLASS of inner or
> > outer modes of SUBREGs.
> 
> My understanding was that nested subregs aren't allowed (any more).

That's why I talked about spreading it across multiple instruction patterns.
Unfortunately that can leave you with multiple machine instructions
where one would do, just because the middle-end is in denial that these
things might exist.

> > registers to make this work sanely.  Also, group spill allocation
> > has extra costs in several ways, so if the predominant way to use the
> > wide registers is to use them as a whole, it is still desirable to
> > model them as wide registers and have the narrower accesses use
> > SUBREG and/or zero_extract.
> 
> As above, I think the rtl.texi documentation makes this invalid (and this
> is a long-standing restriction).

Well, Kenny also asked 'are the rules we've got here right?'.
I think some of the rules are overly restrictive, and prevent gcc
from achieving its full potential for generating efficient code.
Moreover, if a port has an extv / insv pattern that matches in mode with the
wide registers, it can legitimately use the zero_extract route.  It's
reload that contradicts the documentation in changing registers into MEMs
and thus creating zero_extracts from wide MEMs.

> Let's assume a 32-bit target with 64-bit registers.  If you model
> the 64-bit registers as single wide registers, you need to define
> class_cannot_change_mode in such a way that they are not allocated
> to (reg:DI PSEUDO)s that are accessed by things like:
> 
>        (subreg:SI (reg:DI PSEUDO) 0)
>    and (subreg:SI (reg:DI PSEUDO) 4)

Or alternatively, you might not allow integer modes in the floating point
registers.

> > There is a problem, though, with considering zero_extract as an escape hatch
> > if you do want to access only part of the register in special circumstances:
> > the documentation says that applied on memory, the inner mode must be
> > byte-sized - this will certainly be violated in reload - and that for
> > registers, the mode will be that of extv / insv.  Not all processors have
> > extv / insv instructions, and even if they had, you might need more than
> > one inner mode in different circumstances.  Why are we making any
> > stipulations about the inner mode?
> 
> I think one reason is that allowing zero_extracts of multi-word modes is
> (like this subreg thing) a little hard to pin down.  What happens when
> WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN on a 32-bit target, and you have:
> 
>     (zero_extract (reg:DI ....) (const_int 16) (const_int 24))
> 
> (which should be BITS_BIG_ENDIAN-neutral).  0x76543210 would be laid out
> in memory as "0x45670123", so is this extract equivalent to "0x70" or
> "0x43"?  You could probably make a case for both, and I doubt the
> target-independent code handles this consistently at the moment.

Huh?  The documentation says that zero_extract follows BITS_BIG_ENDIAN,
so the memory layout doesn't come into play.  We have a 64 bit value,
and BITS_BIG_ENDIAN determines which bits are meant.
BITS_BIG_ENDIAN ? 0 : 0x76 .
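
A minimal sketch of this register-value reading, in which memory layout plays no part. It assumes the position counts from the least significant bit when BITS_BIG_ENDIAN is false and from the most significant bit otherwise. The function is a hypothetical illustration, not a GCC routine.

```python
def zero_extract(value, width, size, pos, bits_big_endian):
    """Extract `size` bits starting at bit `pos` of a `width`-bit value.

    `pos` is counted from the least significant bit when
    bits_big_endian is False, and from the most significant bit
    when it is True.
    """
    shift = width - pos - size if bits_big_endian else pos
    if shift < 0 or shift + size > width:
        raise ValueError("field does not fit in the value")
    return (value >> shift) & ((1 << size) - 1)


# (zero_extract (reg:DI ...) (const_int 16) (const_int 24)) applied to
# the 64-bit value 0x76543210, with BITS_BIG_ENDIAN false.
result = zero_extract(0x76543210, width=64, size=16, pos=24,
                      bits_big_endian=False)
```

With BITS_BIG_ENDIAN false this selects bits 24 through 39 of the 64-bit value, giving 0x76.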

And even if you rewrote the meaning of zero_extract to follow other
endianness rules, you seem to be confused about the total size with
the extracted bit you stated above.

> (In other words, I was supporting the change in behaviour but
> opposing the addition of a new hook.)

Which would leave the SH out in the cold.  Not being an SH maintainer
anymore, I can live with that.
But I can't submit a new patch till all the copyright assignment papers
are filed...

Follow-Ups:
- Re: [trunk] Addition to subreg section of rtl.text.
  - From: Richard Sandiford

References:
- Re: [trunk] Addition to subreg section of rtl.text.
  - From: Joern Rennecke
- Re: [trunk] Addition to subreg section of rtl.text.
  - From: Richard Sandiford
- Re: [trunk] Addition to subreg section of rtl.text.
  - From: Joern Rennecke
- Re: [trunk] Addition to subreg section of rtl.text.
  - From: Richard Sandiford
