patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1

Kenneth Zadeck zadeck@naturalbridge.com
Wed Apr 3 13:36:00 GMT 2013


On 04/03/2013 05:17 AM, Richard Biener wrote:

> In the end you will have a variable-size storage in TREE_INT_CST thus
> you will have at least to emit _code_ copying over meta-data and data
> from the tree representation to the wide-int (similar for RTX CONST_DOUBLE/INT).
> I'm objecting to the amount of code you emit and agree that the runtime
> cost is copying the meta-data (hopefully optimizable via CSE / SRA)
> and in most cases one (or two) iterations of the loop copying the data
> (not optimizable).
I did get rid of the bitsize in the wide-int patch, so at this point the 
metadata is just the precision and the len.   Not really a lot here.   As 
usual we pay a high price in gcc for not having pushed the tree rep down 
into the rtl level; if we had, it would have been acceptable to have the 
tree type bleed into the wide-int code.
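
Concretely, the per-value state in the current patch boils down to the 
following (a sketch only; the array bound name is illustrative, the patch 
derives it from the largest integer mode any target supports):

    class wide_int
    {
      /* The significant HWIs of the value, least significant first.  */
      HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
      /* Number of elements of val actually in use.  */
      unsigned short len;
      /* Precision of the value in bits.  */
      unsigned int precision;
      /* ... plus a purely functional set of operations ... */
    };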


>> 2)  You present this as if the implementor actually should care about the
>> implementation and you give 3 alternatives:  the double_int, the current
>> one, and HWI.     We have tried to make it so that the client should not
>> care.   Certainly in my experience here, I have not found a place to care.
> Well, similar as for the copying overhead for tree your approach requires
> overloading operations for HOST_WIDE_INT operands to be able to
> say wi + 1 (which is certainly desirable), or the overhead of using
> wide_int_one ().
>
>> In my opinion double_int needs to go away.  That is the main thrust of my
>> patches.   There is no place in a compiler for an ABI that depends on
>> constants fitting into two words whose size is defined by the host.
> That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
> preserve a way to ask for an integer type on the host with (at least)
> N bits.  Almost all double-int users really ask for an integer type on the
> host that has at least as many bits as the pointer representation (or
> word_mode) on
> the target (we do have HOST_WIDEST_INT == 32 bits for 64-bit pointer
> targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
> precision - that is just what happens to be there.  Thus I am providing
> a way to say get me a host integer with at least N bits (VRP asks for
> this, for example).
>
> What I was asking for is that whatever can provide the above should share
> the functional interface with wide-int (or the other way around).  And I
> was claiming that wide-int is too fat, because current users of double-int
> eventually store double-ints permanently.
The problem is that, in truth, double-int is too fat.  99.something% of 
all constants fit in 1 HWI and that is likely to be true forever (I 
understand that tree-vrp may need some thought here).  The rtl level has, 
for as long as I have known it, had 2 reps for integer constants, so it 
was relatively easy to slide the CONST_WIDE_INT in.  It seems like the 
right trickery here, rather than adding a storage model for wide-ints, 
might be to use C++ to invisibly support several (and by "several" I 
really mean 2) classes of TREE_CSTs.

>
>> This is not a beauty contest argument: we have public ports that are
>> beginning to use modes larger than two x86-64 HWIs, and I have a private
>> port that has such modes.  It is my experience that any pass that uses
>> this interface has one of three behaviors: it silently gets the wrong
>> answer, it ICEs, or it fails to do the transformation.  If we leave
>> double_int as an available option, then any use of it potentially will
>> have one of these three behaviors.  And so one of my strong objections
>> to this direction is that I do not want to fight this kind of bug for
>> the rest of my life.
>> Having a single storage model that just always works is in my opinion a
>> highly desirable option.  What you have never answered in a concrete manner
>> is, if we decide to provide this generality, what it would be used for.
>> There is no place in a portable compiler where the right answer for every
>> target is two HOST wide integers.
>>
>> However, I will admit that the HWI option has some merits.   We try to
>> address this in our implementation by restricting what is done inline in
>> wide-int.h to the cases that fit in an HWI, and only dropping into the
>> heavy code in wide-int.c if the mode is larger (which it rarely will
>> be).   However, a case could be made that for certain kinds of things,
>> like string lengths and such, we could use another interface or, as you
>> argue, a different storage model with the same interface.   I just do
>> not see that the cost of the conversion code is really going to show up
>> on anyone's radar.
> What's the issue with abstracting away the model so a fixed-size 'len'
> is possible?  (leaving aside the argument that this would easily allow
> an adaptor to tree)
I have a particularly pessimistic perspective because I have already 
written most of this patch.   It is not that I do not want to change that 
code, it is that I have seen a certain set of mistakes that were made and 
I do not want to fix them more than once.   At the rtl level you can see 
the transition from only supporting 32-bit ints, to supporting 64-bit 
ints, to finally supporting two HWIs, and that transition code is not 
pretty.  My rtl patch fixes the places where an optimization was only 
performed if the data type was 32 bits or smaller, as well as the places 
where the optimization was only performed if the data type was smaller 
than 64 bits (in addition to fixing all of the places where the code ICEs 
or simply gets the wrong answer if the type is larger than TImode).  The 
tree level is only modestly better, I believe only because it is newer.  
I have not seen any 32-bit-only code there, but it is littered with 
transformations that only work for 64 bits.   What is that 64-bit-only 
code going to look like in 5 years?

I want to make it easier to write the general code than to write code 
that only solves the problem for the size of port that the implementor is 
currently working on.   So I see the storage model as a way to guarantee 
that we keep fighting this battle forever, because it will let the 
implementor decide that the optimization only needs to be done for 
integers of a particular size.
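
Here is the shape of the code I keep running into; a hypothetical example, 
not lifted from any particular file, and from_tree below just stands in 
for whatever tree-to-wide-int conversion routine we end up with:

    /* Today: the transformation is guarded by a HWI-sized check, so any
       type wider than a HWI silently loses the optimization.  */
    if (TYPE_PRECISION (TREE_TYPE (t)) <= HOST_BITS_PER_WIDE_INT)
      {
        HOST_WIDE_INT c = TREE_INT_CST_LOW (t);
        /* ... transformation using c ... */
      }

    /* What I want people to be able to write instead (sketch):  */
    wide_int c = wide_int::from_tree (t);
    /* ... the same transformation, valid at any precision ... */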

However, I get the fact that, from your perspective, what you really want 
is a solution to the data-structure problem in tree-vrp.  My patch for 
tree-vrp scans the entire function to find the largest type used in the 
function and then does all of the math at 2x that size.  But I have to 
admit that I thought it was weird that you use tree csts as your 
long-term storage.   If this were my pass, I would have allocated a 2d 
array of some type, as long as the function in one dimension and twice as 
wide as the largest int used in the other dimension, rather than 
overloading tree-cst, and then had a set of friend functions in 
double-int to get values in and out.  Of course you do not need friends 
in double-int because the rep is exposed, but in wide-int the rep is 
hidden, since the interface is now purely functional.
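
Something like this is what I have in mind for the side storage 
(hypothetical; all of the names here are made up):

    /* One row per SSA name; each row is wide enough to hold twice the
       largest precision used in the function, measured in HWIs.  */
    struct vrp_value_storage
    {
      unsigned hwis_per_value;	/* 2 * max precision, in HWIs.  */
      HOST_WIDE_INT *values;	/* num_ssa_names * hwis_per_value.  */
    };

    /* Get the slot for one SSA name.  */
    static inline HOST_WIDE_INT *
    vrp_value_slot (struct vrp_value_storage *s, unsigned ssa_version)
    {
      return &s->values[ssa_version * s->hwis_per_value];
    }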

I just have to believe that there is a better way to do tree-vrp than 
messing up wide-int for the rest of the compiler.
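
For what it is worth, spelling out the encoding in richi's proposal 
quoted below, as I read it, on a host with 64-bit HWIs and with val[0] 
the least significant element:

    /* -1 at any precision: one all-ones element, sign bit set.  */
    len = 1;  val[0] = (HOST_WIDE_INT) -1;

    /* (unsigned HOST_WIDE_INT) -1, i.e. 2^64 - 1 as a nonnegative
       value: a second, zero element is needed to carry the sign.  */
    len = 2;  val[0] = (HOST_WIDE_INT) -1;  val[1] = 0;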

>> 3) Your trick will work at the tree level, but not at the rtl level.   The
>> wide-int code cannot share storage with the CONST_INTs.    We tried this,
>> and there are a million bugs that would have to be fixed to make it work.
>> It could have worked if CONST_INTs had carried a mode around, but since they
>> do not, you end up with the same CONST_INT sharing the rep for several
>> different types and that just did not work unless you are willing to do
>> substantial cleanups.
> I don't believe you.  I am only asking for the adaptors to tree and RTL to
> work in an RVALUE-ish way (no modification, as obviously RTL and tree
> constants may be shared).  I think your claim is because you have those
> precision and bitsize members in your wide-int, which I believe is a
> design red herring.  I suppose we should concentrate on addressing that
> one first.  Thus, let me repeat a few questions on your design to eventually
> let you understand my concerns:
>
> Given two wide-ints, a and b, what precision will the result of
>       a + b
> have?  Consider a having precision 32 and b having precision 64
> on a 32-bit HWI host.
>
> You define wide-int to have no sign information:
>
> +   The representation does not contain any information about
> +   signedness of the represented value, so it can be used to represent
> +   both signed and unsigned numbers.  For operations where the results
> +   depend on signedness (division, comparisons), the signedness must
> +   be specified separately.  For operations where the signedness
> +   matters, one of the operands to the operation specifies either
> +   wide_int::SIGNED or wide_int::UNSIGNED.
>
> but the result of operations on mixed precision operands _does_ depend
> on the sign; nevertheless most operations do not get a signedness argument.
> Nor would that help, because it depends on the signedness of the individual
> operands!
>
> double-int gets around this by having a common "precision" to which
> all smaller precision constants have to be sign-/zero-extended.  So
> does CONST_INT and CONST_DOUBLE.
>
> Note that even with the same precision you have introduced the same problem
> with the variable len.
>
> My proposal is simple - all wide-ints are signed!  wide-int is basically
> an arbitrary precision signed integer format.  The sign is encoded in
> the most significant bit of the last active HWI of the representation
> (val[len - 1] & ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 1))).
> All values
> with less precision than len * HOST_BITS_PER_WIDE_INT are
> properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>
> This lets you define mixed-len operations by implicitly sign-/zero-extending
> the operands to whatever len is required for the operation.
>
> Furthermore it allows you to get rid of the precision member (and
> bitsize anyway).
> Conversion from tree / RTL requires information on the signedness of the
> input (trivial for tree, for RTL all constants are signed - well,
> sign-extended).
> Whenever you want to transfer the wide-int to tree / RTL you have to
> sign-/zero-extend according to the desired precision.  If you need
> something other than arbitrary precision arithmetic you have to
> explicitly truncate / extend at suitable places - with overflow checking
> being trivial here.  For optimization purposes selected operations may
> benefit from a combined implementation
> receiving a target precision and signedness.  Whatever extra meta-data
> RTL requires does not belong in wide-int but in the RTX.  Trivially
> a mode comes to my mind (on tree we have a type), and trivially
> each RTX has a mode.  And each mode has a precision and bitsize.
> It lacks a sign, so all RTL integer constants are sign-extended for
> encoding efficiency purposes.  Mixed-mode operations will not
> occur (mixed len operations will), mixed-mode ops are exclusively
> sign-/zero-extensions and truncations.
>
> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
> precision would be { -1 }.
>
> That was my proposal.  Now, can you please properly specify yours?
>
> Thanks,
> Richard.
>
>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>> This patch contains a large number of the changes requested by Richi.
>>>> It
>>>> does not contain any of the changes that he requested to abstract the
>>>> storage layer.   That suggestion appears to be quite unworkable.
>>> I of course took this claim as a challenge ... with the following result.
>>> It is
>>> of course quite workable ;)
>>>
>>> The attached patch implements the core wide-int class and three storage
>>> models (fixed size for things like plain HWI and double-int, variable size
>>> similar to how your wide-int works and an adaptor for the double-int as
>>> contained in trees).  With that you can now do
>>>
>>> HOST_WIDE_INT
>>> wi_test (tree x)
>>> {
>>>     // template argument deduction doesn't do the magic we want it to do
>>>     // to make this kind of implicit conversions work
>>>     // overload resolution considers this kind of conversions so we
>>>     // need some magic that combines both ... but seeding the overload
>>>     // set with some instantiations doesn't seem to be possible :/
>>>     // wide_int<> w = x + 1;
>>>     wide_int<> w;
>>>     w += x;
>>>     w += 1;
>>>     // template argument deduction doesn't deduce the return value type,
>>>     // not considering the template default argument either ...
>>>     // w = wi (x) + 1;
>>>     // we could support this by providing rvalue-to-lvalue promotion
>>>     // via a traits class?
>>>     // otoh it would lead to sub-optimal code anyway so we should
>>>     // make the result available as reference parameter and only support
>>>     // wide_int <> res; add (res, x, 1); ?
>>>     w = wi (x).operator+<wide_int<> >(1);
>>>     wide_int<>::add(w, x, 1);
>>>     return w.to_hwi ();
>>> }
>>>
>>> we are somewhat limited with C++ unless we want to get really fancy.
>>> Eventually providing operator+ just doesn't make much sense for
>>> generic wide-int combinations (though then the issue is its operands
>>> are no longer commutative, which I think is the case with your wide-int
>>> or double-int as well - they don't support "1 + wide_int" for obvious
>>> reasons).
>>>
>>> So there are implementation design choices left undecided.
>>>
>>> Oh, and the operation implementations are crap (they compute nonsense).
>>>
>>> But you should get the idea.
>>>
>>> Richard.
>>


