Re: wide-int branch now up for public comment and review


Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>      * Code that does widening conversions.  The canonical way that
>>>        this is performed is to sign or zero extend the input value to
>>>        the max width based on the sign of the type of the source and
>>>        then to truncate that value to the target type.  This is in
>>>        preference to using the sign of the target type to extend the
>>>        value directly (which gets the wrong value for the conversion
>>>        of large unsigned numbers to larger signed types).
>
>
>> I don't understand this particular reason.  Using the sign of the source
>> type is obviously right, but why does that mean we need "infinite" precision,
>> rather than just doubling the precision of the source?
> in a sense, "infinite" does not really mean infinite; it really just 
> means large enough that you never lose any information off the 
> top.    For widening, all that "infinite" really needs to be is one 
> bit larger than the destination type.

I'm still being clueless, sorry, but why does the intermediate int need
to be wider than the destination type?  If you're going from an N1-bit
value to an N2>N1-bit value, I don't see why you ever need more than
N2 bits.  Maybe I'm misunderstanding what the comment means by
"widening conversions".

>> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
>> actually wants it to be an unsigned value.  Some code uses it to avoid
>> the undefinedness of signed overflow.  So these overloads could lead
>> to us accidentally zero-extending what's conceptually a signed value
>> without any obvious indication that that's happening.  Also, hex constants
>> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
>> meant to be zero-extended.
>>
>> I realise the same thing can happen if you mix "unsigned int" with
>> HOST_WIDE_INT, but the point is that you shouldn't really do that
>> in general, whereas we're defining these overloads precisely so that
>> a mixture can be used.
>>
>> I'd prefer some explicit indication of the sign, at least for anything
>> other than plain "int" (so that the compiler will complain about uses
>> of "unsigned int" and above).
>
> There is a part of me that finds this scary and a part of me that feels 
> that the concern is largely theoretical.    It does make it much easier 
> to read and understand the code to be able to write "t + 6" rather than 
> "wide_int (t) + wide_int::from_uhwi (6)", but of course you lose some 
> control over how 6 is converted.

I replied in more detail to Mike's comment, but the reason I suggested
only allowing this for plain "int" (and using templates to forbid
"unsigned int" and wider types) was that you could still write "t + 6".
You just couldn't write "t + 0x80000000" or "t + x", where "x" is a
HOST_WIDE_INT or similar.
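A hypothetical sketch of that restriction (not the branch's actual
template machinery; the stand-in class and names are mine):

  #include <cstdint>
  #include <type_traits>

  struct wide_int
  {
    int64_t val;   /* stand-in for the real representation */

    wide_int operator+ (const wide_int &b) const
    { return wide_int { val + b.val }; }

    /* "t + 6" compiles; "t + 0x80000000" and "t + some_hwi" do not,
       so anything other than plain int must state its signedness
       explicitly.  */
    template <typename T>
    wide_int operator+ (T b) const
    {
      static_assert (std::is_same<T, int>::value,
                     "only plain int mixes implicitly with wide_int");
      return wide_int { val + b };
    }
  };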

Code can store values in "HOST_WIDE_INT" or "unsigned HOST_WIDE_INT"
if we know at GCC compile time that HOST_WIDE_INT is big enough.
But code doesn't necessarily store values in a "HOST_WIDE_INT"
because we know at GCC compile time that the value is signed,
or in a "unsigned HOST_WIDE_INT" because we know at GCC compile
time that the value is unsigned.  The signedness often depends
on the input.  The language still forces us to pick one or the
other though.  I don't feel comfortable with having syntactic
sugar that reads too much into the choice.

>>>    Note that the bits above the precision are not defined and the
>>>    algorithms used here are careful not to depend on their value.  In
>>>    particular, values that come in from rtx constants may have random
>>>    bits.
>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
>> have random bits.  The upper bits must be a sign extension of the value.
>> There's exactly one valid rtx for each (value, mode) pair.  If you saw
>> something different then that sounds like a bug.  The rules should already
>> be fairly well enforced though, since something like (const_int 128) --
>> or (const_int 256) -- will not match a QImode operand.
>>
>> This is probably the part of the representation that I disagree most with.
>> There seem to be two main ways we could handle the extension to whole HWIs:
>>
>> (1) leave the stored upper bits undefined and extend them on read
>> (2) keep the stored upper bits in extended form
> It is not a matter of opening old wounds.   I had run some tests on 
> x86-64 and was never able to assume that the bits above the precision 
> had always been canonized.   I will admit that I did not try to run down 
> the bugs, because the rtx constructors did not have a mode associated 
> with them, nor was one required, so I thought this was not an easily 
> solvable problem.   So I have no idea if I hit the one and only bug or 
> was about to start drinking from a fire hose.

Hopefully the first one. :-)   Would you mind going back and trying again,
so that we at least have some idea what kind of bugs they were?

> But the back ends are full of GEN_INT (a) where a came from god 
> knows where, and you almost never see it properly canonized.   I think 
> that until GEN_INT takes a mandatory non-VOIDmode mode parameter, and 
> that constructor canonizes it, you are doomed to chase this bug 
> forever.

Well, sure, the bug crops up now and then (I fixed another instance
a week or so ago).  But generally this bug shows up as an ICE --
usually an unrecognisable instruction ICE -- precisely because this
is something that the code is already quite picky about.

Most of the GEN_INTs in the backend will be correct.  They're in
a position to make assumptions about instructions and target modes.
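
For the record, the invariant at stake is that the value stored in a
CONST_INT equals the sign extension of its low "prec" bits, whatever
the signedness of the original number.  A sketch, with int64_t
standing in for HOST_WIDE_INT (this is roughly the job that
trunc_int_for_mode does in GCC itself; assumes 0 < prec < 64):

  #include <cstdint>

  int64_t
  canonize (int64_t c, unsigned int prec)
  {
    uint64_t u = (uint64_t) c & (((uint64_t) 1 << prec) - 1); /* low bits */
    uint64_t sign = (uint64_t) 1 << (prec - 1);  /* sign bit of the mode */
    return (int64_t) ((u ^ sign) - sign);        /* extend from that bit */
  }

  /* canonize (128, 8) == -128, which is why (const_int 128) can never
     match a QImode operand.  */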

> Having said that, we actually do neither (1) nor (2) inside of 
> wide-int.  For rtl to wide-int, we leave the upper bits undefined and 
> never allow you to look at them because the constructor has a mode and 
> that mode allows you to draw a line in the sand.   There is no 
> constructor for the "infinite" wide ints from rtl so you have no way to 
> look.

Sorry, I don't understand the distinction you're making between this
and (1).  The precision is taken from the mode, and you flesh out the
upper bits on read if you care.

> Doing this allows us to do something that richi really wanted: avoiding 
> copying.   We do not get to do as much as richi would like, and when he 
> comes back from leave he may have other places where we can apply it, but 
> right now if you say w = t + 6 as above, it "borrows" the rep from t to 
> do the add; it does not really build a wide-int.  We also do this if t is 
> an rtx const.   But if we had to canonize the number, then we could not 
> borrow the rep.

But the point is that the number is already canonical for rtx, so you
should do nothing.  I.e. (2) will be doing less work.  If we have other
input types besides rtx that don't define the upper bits, then under (2)
their wide_int accessor functions (to_shwi[12] in the current scheme)
should do the extension.

>> The patch goes for (1) but (2) seems better to me, for a few reasons:
>>
>> * As above, constants coming from rtl are already in the right form,
>>    so if you create a wide_int from an rtx and only query it, no explicit
>>    extension is needed.
>>
>> * Things like logical operations and right shifts naturally preserve
>>    the sign-extended form, so only a subset of write operations need
>>    to take special measures.
> right now the internals of wide-int do not keep the bits above the 
> precision clean.   As you point out, this could be fixed by changing 
> lshift, add, sub, mul, div (and anything else I have forgotten about) 
> and removing the code that cleans this up on exit.   I actually do not 
> really care which way we go here, but if we do go on keeping the bits 
> above the precision clean inside wide-int, we are going to have to clean 
> the bits in the constructors from rtl, or fix some/a lot of bugs.
>
> But if you want to go with the stay-clean plan you have to start clean, 
> so at the rtl level this means copying.  And it is the not-copying trick 
> that pushed me in the direction we went.
>
> At the tree level, this is not an issue.   There are no constructors for 
> tree-csts that do not have a type, and before we copy the rep from the 
> wide-int to the tree, we clean the top bits.   (BTW - if I had to guess 
> what the bug is with the missing messages on the mips port, it would be 
> that the front ends HAD a bad habit of creating constants that did 
> not fit into a type and then later checking to see if there were any 
> interesting bits above the precision in the int-cst.  This no longer 
> works because we clean out those top bits on construction, but it would 
> not surprise me if we missed the fixed-point constant path.)   So at the 
> tree level we could easily go either way here, but there is a cost at 
> the rtl level to doing (2).

TBH, I think we should do (2) and fix whatever bugs you saw with invalid
rtx constants.
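
To spell out the difference, here is a sketch with int64_t standing in
for HOST_WIDE_INT and a single-element value of precision "prec"
(assumes 0 < prec < 64):

  /* Under (1) the upper bits are undefined in storage, so a reader
     must flesh them out; under (2) the invariant makes reads free, and
     AND, OR and arithmetic right shift preserve the form by
     themselves.  */

  #include <cstdint>

  int64_t
  read_scheme_1 (int64_t raw, unsigned int prec)
  {
    int shift = 64 - prec;
    /* Discard the undefined upper bits, then sign extend (relies on
       arithmetic right shift of signed values, as GCC itself does).  */
    return (int64_t) ((uint64_t) raw << shift) >> shift;
  }

  int64_t
  read_scheme_2 (int64_t stored)
  {
    return stored;   /* the invariant already guarantees the upper bits */
  }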

>>>    When the precision is 0, all the bits in the LEN elements of
>>>    VEC are significant with no undefined bits.  Precisionless
>>>    constants are limited to being one or two HOST_WIDE_INTs.  When two
>>>    are used the upper value is 0, and the high order bit of the first
>>>    value is set.  (Note that this may need to be generalized if it is
>>>    ever necessary to support 32bit HWIs again).
>> I didn't understand this.  When are two HOST_WIDE_INTs needed for
>> "precision 0"?
> If a large unsigned constant comes in with the top bit set, the 
> canonized value takes two HWIs, the top HWI being 0.

Ah, yeah.  But then that sounds like another reason to only allow
"int" to have zero precision.

>>> /* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
>>>     PREC, the information is lost. */
>>> HOST_WIDE_INT
>>> to_shwi (unsigned int prec = 0) const
>> Just dropping the excess bits seems dangerous.  I think we should assert
>> instead, at least when prec is 0.
> there are times when this is useful.   There is a lot of code that just 
> wants to look at the bottom bits to do some alignment stuff.  I guess 
> that code could just grab the bottom elt of the array, but that has not 
> generally been how this has been done.

Yeah, using elt to mean "get the low HOST_WIDE_INT" sounds better to me FWIW.
"to_shwi" sounds like a type conversion whereas "elt" is more obviously
accessing only part of the value.
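
I.e. the alignment-style use would look something like this (a sketch;
"elt" as suggested above, everything else illustrative):

  /* Only the low HOST_WIDE_INT matters for an alignment check, so
     reading element 0 of the array is enough.  */
  bool
  aligned_p (const wide_int &w, unsigned HOST_WIDE_INT align)
  {
    /* align is assumed to be a power of 2.  */
    return (w.elt (0) & (align - 1)) == 0;
  }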

>>>    gcc_unreachable ();
>>> #if 0
>>>    return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
>>> #endif
>> Maybe remove this?
>>
>> The inline div routines do:
> I will work on this this weekend.   Tree VRP has not been our friend and 
> sometimes does not like to compile this function.

OK.  If it's just temporary then never mind, but a comment might help.
I remember you mentioned tree-vrp in a reply to one of Richard's reviews,
so this is probably something you've been asked before.  And will probably
be asked again if there's no comment :-)

>> I think these are going to bitrot quickly if we #if 0 then out.
>> I think we should either use:
>>
>>    if (DEBUG_WIDE_INT)
>>      debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
>>
>> or drop them.
> my plan is to leave these in while the branch is still being developed 
> and then get rid of them before it is merged.    My guess is that I am 
> going to need them still when I try the 32-bit HWI test.

Ah, OK, sounds good.

One thing I hit yesterday while trying out the "lightweight classes"
idea from the end of the mail is that, as things stand, most operations
end up with machine instructions to do:

  if (*p1 == 0)   /* zero ("flexible") precision on one operand...  */
    *p1 = *p2;    /* ...adopt the other operand's precision */

  if (*p2 == 0)
    *p2 = *p1;    /* and vice versa */

E.g. for wide_int + wide_int, wide_int + tree and wide_int + rtx,
where the compiler has no way of statically proving that the precisions
are nonzero.  Considering all the template goo we're adding to avoid
a copy (even though that copy is going to be 24 bytes in the worst case
on most targets, and could be easily SRAed), this seems like a relatively
high overhead for syntactic sugar, especially on hosts that need branches
rather than conditional moves.

Either all this is premature optimisation (including avoiding the
copying) or we should probably do something to avoid these checks too.
I wonder how easy it would be to restrict this use of "zero precision"
(i.e. flexible precision) to those where primitive types like "int" are
used as template arguments to operators, and require a precision when
constructing a wide_int.  I wouldn't have expected "real" precision 0
(from zero-width bitfields or whatever) to need any special handling
compared to precision 1 or 2.
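
Roughly this kind of thing (a hypothetical sketch; the helper names
are mine, not the branch's API):

  /* A primitive operand picks up its precision from the wide_int
     operand at the call site, so no wide_int object ever carries
     precision 0 and the runtime "if (*p == 0)" checks go away.  */
  template <typename T>
  wide_int
  add (const wide_int &a, T b)
  {
    return a.add_hwi ((HOST_WIDE_INT) b, a.get_precision ());
  }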

Thanks,
Richard

