This is the mail archive of the mailing list for the GCC project.


Re: [trunk] Addition to subreg section of rtl.text.

Joern Rennecke writes:
> On Tue, Mar 18, 2008 at 09:40:49PM +0000, Richard Sandiford wrote:
>> > The most natural layout would be 0x45??0123 .
>> > But you could also have 0x345?012? , or even more exotic mappings.
>> Do we actually support the second mapping though?  Surely the
>> target-independent code needs to know how bytes are divided into words?
> I don't see why the target-independent code would need to know what the bits
> inside a partial integer mode mean.


Suppose we have:

   (set (subreg:HI (reg:PDI ...) ...) ...)
   (set (zero_extract (subreg:SI (reg:PDI ...) ...) 16 ...) ...)

on a 32-bit machine with "..." filled in such that, with s/PDI/DI/, the
second store would kill the first.  We want to know if the same is true
of the original PDI version.
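
To make the DI half of the question concrete, here is a minimal Python sketch. The offsets are purely illustrative (the "..." in the patterns above are left unspecified), the layout is assumed to be little-endian with bit positions counted from the least significant bit, and the helper names are hypothetical rather than GCC functions.

```python
# Hypothetical model: which bytes of an 8-byte DI register does a store modify?
# Assumes a little-endian, byte-addressed 32-bit target.

def subreg_bytes(subreg_byte, outer_size):
    """Bytes of the inner register covered by (subreg:OUTER (reg) subreg_byte)."""
    return set(range(subreg_byte, subreg_byte + outer_size))

def zero_extract_bytes(subreg_byte, width_bits, pos_bits):
    """Bytes touched by a byte-aligned zero_extract inside a subreg."""
    assert width_bits % 8 == 0 and pos_bits % 8 == 0
    start = subreg_byte + pos_bits // 8
    return set(range(start, start + width_bits // 8))

def second_kills_first(first, second):
    """The later store kills the earlier one if it overwrites every byte of it."""
    return first <= second

# Illustrative offsets only: (subreg:HI (reg:DI) 0) followed by
# (zero_extract (subreg:SI (reg:DI) 0) 16 0).
first = subreg_bytes(0, 2)
second = zero_extract_bytes(0, 16, 0)
```

For DI the answer falls out of byte arithmetic like this. The open question is whether the same arithmetic may be applied to PDI.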

> A partial exception to this is when arithmetic for partial integers has to
> be implemented using arithmetic for integral integers; in this case, it is
> assumed that moving partial integers to integral integers, performing the
> arithmetic, and moving back to partial integers will produce the right result.
> So, if partial integer addition or subtraction is present, and no named
> pattern for these operations exists, this implies that valid bits are
> contiguous, and that any unused lower bits will read as zero (assuming we
> are actually dealing with bits here.  Stranger scenarios are possible,
> e.g. mod 81 arithmetic.)

>> The reason Kenny's looking at this is that he wants to track which
>> bytes in a SUBREG are actually live.
> A conservative assumption is that all bits occupied by the integral mode the
> partial integral mode is associated with are live.  If we really find that
> there is a code quality issue when making this assumption, we can add a hook
> to define the salient semantics, but I doubt this will come up.

But what we're trying to do here is define what bytes are _modified_
by a definition as well as what bytes are live in a use.  I assume
you're saying that a definition of any subreg that involves partial
integer modes should behave like a full definition _and_ a full use
of the partial mode?  If so, I'd argue that a special case like that is
too complicated to be justified if no mainline port uses it.  It seems
reasonable to require the port to behave "as if" the natural byte order

If a new port needs something different, it should be the port
submitter's responsibility to add suitable infrastructure
(where "suitable" means that the target-independent code can
follow what's going on).

>> >> 3) What about things like 80-bit FP modes on a 32-bit or 64-bit
>> >> target? Is it valid to refer to pieces of an 80-bit FP pseudo? If so,
>> >> are the rules we've got here right?
>> >
>> > Where the 80-bit mode is stored in multiple words like for x86, you
>> > should be able to refer to word_mode subregs the way the value is
>> > stored in memory.  This is the only way you can get a sane equivalence
>> > between reloads via secondary memory and direct register-register
>> > moves involving word_mode GENERAL_REGS.
>> OK, so in all these cases, "N words and a bit" modes can be treated
>> like "N + 1 words, with the upper bits undefined"?  For both inner
>> and outer modes?
> N + 1 words, yes, but it doesn't follow that it must be the upper bits
> that are undefined.

Breaking the paragraph here because, as above, I'd argue that it's
reasonable to assume that the upper bits are the undefined ones unless
a mainline port needs something else.

> If that is actually the case, however, for an 80 bit
> value on a little-endian byte-addressed target, the port could refer
> to the bits in the highest words as (subreg:HI (reg:XF inner_reg) 8) or
> (subreg:HI (mem:XF mem_addr) 8) to make this explicit.

Agreed.  The rules in the rtl.texi proposal allow this.

> However, what would we do with a true-blue big endian target?  Would
> the highest bits be (subreg:HI (reg:XF inner_reg) 2) ?

That's my understanding, yes (and it's what the proposed rules allow).
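
The two offsets mentioned above (8 for little-endian, 2 for big-endian) can be reproduced with a small Python sketch. It assumes, as argued earlier in this message, that the undefined bits are the upper ones, so the padding sits at the most significant end of the storage. The function name and parameters are hypothetical.

```python
def high_piece_subreg_byte(value_bytes, storage_bytes, piece_bytes, big_endian):
    """SUBREG_BYTE of the most significant piece_bytes of the value.

    Assumption: the undefined padding bytes occupy the most significant
    end of the storage.
    """
    padding = storage_bytes - value_bytes
    if big_endian:
        # Padding comes first in memory order, then the value's top bytes.
        return padding
    # Little-endian: the value's top bytes sit just below the padding.
    return value_bytes - piece_bytes

# 80-bit XF stored in three 32-bit words (12 bytes), top 16 bits as HImode.
little = high_piece_subreg_byte(10, 12, 2, big_endian=False)
big = high_piece_subreg_byte(10, 12, 2, big_endian=True)
```

If a port put the undefined bits somewhere else, this arithmetic would no longer hold, which is why the target-independent code needs a fixed convention.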

>> >> 4) Do stores to subregs of hardreg invalidate just the registers
>> >> mentioned in the outer mode or do they invalidate the entire set of
>> >> registers mentioned in the inner mode? (our rules say only the outer
>> >> mode).
>> >
>> > Where the hardreg is actually a single hardware register, all of it is
>> > clobbered.  If it is a concatenation of multiple actual hard
>> > registers, the idea is that only the one that corresponds to the word
>> > that is stored into gets clobbered.  If more than one word is stored
>> > into, that would logically translate to changing each of the registers
>> > that each word corresponds to.
>> >
>> > What seems less defined is what happens when the underlying hard registers
>> > are smaller than a word, and either the mode size or SUBREG_BYTE
>> > is not a multiple of a word.
>> Yeah, my version of the question was more: do we support subregs of
>> hard registers in which the normal word-based semantics of pseudos
>> do not apply?
> Having some data registers larger than word size is quite common,
> particularly floating point registers on machines with a word size
> smaller than the largest supported floating point mode.
> IIRC we support this, but not very well.
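
The word-based rule described in the quoted answer to question 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: one hard register per word, consecutive register numbers, and register order following memory order. The function name is hypothetical and is not a GCC interface.

```python
def clobbered_hard_regs(first_regno, subreg_byte, outer_size, word_size=4):
    """Hard registers written by a store to (subreg:OUTER (reg:INNER first_regno) subreg_byte).

    Assumptions: each hard register holds exactly one word, the multi-word
    value occupies consecutive register numbers, and register order follows
    memory order.
    """
    first_word = subreg_byte // word_size
    last_word = (subreg_byte + outer_size - 1) // word_size
    # Every register that any stored byte falls into is clobbered as a whole.
    return [first_regno + w for w in range(first_word, last_word + 1)]

# (subreg:SI (reg:DI 10) 4) on a 32-bit target touches only the second register.
high_word_only = clobbered_hard_regs(10, 4, 4)
```

The sketch deliberately says nothing about hard registers that are smaller or larger than a word, which is the case the discussion identifies as poorly defined.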

Yes, MIPS is one such port, but we expressly forbid conversions
between full-width and partial-width modes for the very reason
given in mainline rtl.texi:

    It is also not valid to access a single word of a multi-word value in a
    hard register when less registers can hold the value than would be
    expected from its size.  For example, some 32-bit machines have
    floating-point registers that can hold an entire @code{DFmode} value.
    If register 10 were such a register @code{(subreg:SI (reg:DF 10) 4)}
    would be invalid because there is no way to convert that reference to
    a single machine register.  The reload pass prevents @code{subreg}
    expressions such as these from being formed.

(Well, OK, these days MIPS has a much more general class_cannot_change_mode
implementation, but the case we're talking about used to be handled
explicitly.)  I believe other ports with this restriction do the same
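
The condition in the quoted rtl.texi paragraph ("less registers can hold the value than would be expected from its size") can be written down as a small Python check. This is a sketch of that rule only, with hypothetical names. It is not how class_cannot_change_mode is implemented.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)

def word_subreg_is_representable(value_size, hard_reg_size, word_size):
    """True if a single-word subreg of the value can map to one hard register.

    The rule fails when the value fits in fewer hard registers than its
    size in words would suggest.
    """
    expected_regs = ceil_div(value_size, word_size)
    actual_regs = ceil_div(value_size, hard_reg_size)
    return actual_regs >= expected_regs

# DFmode (8 bytes) in one 8-byte FP register on a 32-bit target:
# (subreg:SI (reg:DF 10) 4) has no single register to refer to.
df_in_fpr = word_subreg_is_representable(8, 8, 4)
# DImode (8 bytes) in a pair of 4-byte general registers is fine.
di_in_gprs = word_subreg_is_representable(8, 4, 4)
```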

> Where the hardware allows transfers between differently sized registers,
> it seems most natural to use SUBREGs to express this.

My understanding is that this is invalid, and the rtl.texi documentation
seems to back this up.  MIPS for one uses a different technique.

> IIRC you have to do something like (SUBREG:SI (SUBREG:DI (REG:DF...
> and even spread it across multiple instruction patterns.
> I don't see why we should be picky about the MODE_CLASS of inner or
> outer modes of SUBREGs.

My understanding was that nested subregs aren't allowed (any more).

> If individual portions of multiple-word registers can be accessed individually
> like normal registers, it makes sense to model the individual parts as
> separate registers, but it is essential that all parts can be both
> read from and written to separately with moves from/to general purpose
> registers to make this work sanely.  Also, group spill allocation
> has extra costs in several ways, so if the predominant way to use the
> wide registers is to use them as a whole, it is still desirable to
> model them as wide registers and have the narrower accesses use
> SUBREG and/or zero_extract.

As above, I think the rtl.texi documentation makes this invalid (and this
is a long-standing restriction).

Let's assume a 32-bit target with 64-bit registers.  If you model
the 64-bit registers as single wide registers, you need to define
class_cannot_change_mode in such a way that they are not allocated
to (reg:DI PSEUDO)s that are accessed by things like:

       (subreg:SI (reg:DI PSEUDO) 0)
   and (subreg:SI (reg:DI PSEUDO) 4)

MIPS has had to do this for many years.

AIUI, using subregs to convert between one full-width mode and another
is fine, and the proposed rules allow this.

> But there is also part of an answer here for the original question:
> when a wide register is only partially available as separate words,
> it is more likely to be available as separate values to read.
> If you can't write separate parts separately, it follows that a subreg
> write would naturally clobber the entire register.
> There is a problem, though, with considering zero_extract as an escape hatch
> if you do want to access only part of the register in special circumstances:
> the documentation says that applied on memory, the inner mode must be
> byte-sized - this will certainly be violated in reload - and that for
> registers, the mode will be that of extv / insv.  Not all processors have
> extv / insv instructions, and even if they had, you might need more than
> one inner mode in different circumstances.  Why are we making any
> stipulations about the inner mode?

I think one reason is that allowing zero_extracts of multi-word modes is
(like this subreg thing) a little hard to pin down.  What happens when
WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN on a 32-bit target, and you have:

    (zero_extract (reg:DI ....) (const_int 16) (const_int 24))

(which should be BITS_BIG_ENDIAN-neutral).  0x76543210 would be laid out
in memory as "0x45670123", so is this extract equivalent to "0x70" or
"0x43"?  You could probably make a case for both, and I doubt the
target-independent code handles this consistently at the moment.
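
One way to arrive at the two candidate answers is sketched below in Python. Each digit stands for one byte of the DImode value, numbered by significance (0 is the least significant byte), and the extract position is counted in bits from the low end. The helper names are hypothetical, and the "memory order" reading is only one plausible interpretation.

```python
def memory_order(num_bytes, word_size, words_big_endian, bytes_big_endian):
    """Byte significance indices (0 = least significant) in memory order."""
    # Start from a fully little-endian layout, grouped into words.
    words = [list(range(w, w + word_size)) for w in range(0, num_bytes, word_size)]
    if bytes_big_endian:
        words = [list(reversed(w)) for w in words]
    if words_big_endian:
        words.reverse()
    return [b for w in words for b in w]

def extract_by_value(width_bits, pos_bits):
    """Bytes selected when the position is counted within the 64-bit value."""
    first = pos_bits // 8
    return list(range(first, first + width_bits // 8))

def extract_by_memory(layout, width_bits, pos_bits):
    """Bytes selected when the position is counted along the memory image."""
    first = pos_bits // 8
    return layout[first:first + width_bits // 8]

layout = memory_order(8, 4, words_big_endian=True, bytes_big_endian=False)
layout_text = "0x" + "".join(str(b) for b in layout)
# Value reading, printed most significant byte first.
value_reading = "0x" + "".join(str(b) for b in reversed(extract_by_value(16, 24)))
# Memory reading, printed in memory order.
memory_reading = "0x" + "".join(str(b) for b in extract_by_memory(layout, 16, 24))
```

Under these assumptions the value reading gives "0x43" and the memory reading gives "0x70", which matches the two possibilities named above.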

>> The current documentation expressly forbids taking
>> an SImode subreg of a DImode hard register on a 32-bit machine,
> Huh?  Then all our 32 bit ports which support long long must be broken.

I was talking specifically about single DImode registers.  Sorry for
not making that clear.

>> for example, and I agree that the subword hard register case is
>> also suspicious.
> I suppose it just doesn't happen often enough for anybody to have any
> strong opinion one way or other.  I suppose you can always express this
> with a zero_extract, so it would only become important if we had to
> worry about memory footprint of or processing time for zero_extract.
> So, pragmatically, I suppose we should go with whatever prohibition or
> definition allows the fastest implementation.


>> Without wanting to fan flames, isn't this something that should
>> be fixed in reload? ;)  Reload is amenable to change...
> We've already discussed this 16 months ago:
> FWIW, I did a small reload patch to my experimental local sources yesterday
> to tinker with reload types for a 0.2% size gain.

Sorry, I'd forgotten about that.  At least I was consistent ;)
Both then and now, I'm arguing that the change you want to make
should be correct for all targets.  Unless the thread got broken,
it doesn't seem like anyone has a justification for keeping the
current behaviour as an option.

(In other words, I was supporting the change in behaviour but
opposing the addition of a new hook.)

