patch to fix constant math - 4th patch - the wide-int class.

Kenneth Zadeck zadeck@naturalbridge.com
Wed Oct 31 14:39:00 GMT 2012


On 10/31/2012 08:44 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>>> <mikestump@comcast.net> wrote:
>>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>>
>>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>>>>> I do not believe that this can happen.  The core compiler includes
>>>>>>>>>>>> all modes up to TImode, so by default we already support up to 128
>>>>>>>>>>>> bits.
>>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>>>>>>>>> are not power of two, and, if there isn't an int mode that is
>>>>>>>>>> bigger, we'd
>>>>>>>>>> want to round up the partial int bit size.  Something like (2 *
>>>>>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>>> HOST_BITS_PER_WIDE_INT should do it.
>>>>>>>>>>
>>>>>>>>>>>>> I still would like to have the ability to provide specializations of
>>>>>>>>>>>>> wide_int
>>>>>>>>>>>>> for "small" sizes, thus ideally wide_int would be a template
>>>>>>>>>>>>> templated
>>>>>>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>> If you want to go down this path after the patches get in, go for it.
>>>>>>>>>>>> I
>>>>>>>>>>>> see no use at all for this.
>>>>>>>>>>>> This was not meant to be a plug-in replacement for double_int.  The
>>>>>>>>>>>> goal of this patch is to get the compiler to do the constant math the
>>>>>>>>>>>> way that the target does it.  Any such instantiation is by definition
>>>>>>>>>>>> placing some predefined limit that some target may not want.
>>>>>>>>>>> Well, what I don't really like is that we now have two implementations
>>>>>>>>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>>>>>>>>> I also don't like too much is that we have two different interfaces to
>>>>>>>>>>> operate
>>>>>>>>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>>>>>>>> latter ...
>>>>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>>>>>>>>> waste of time, as the time is better spent removing double-int from the
>>>>>>>>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>>>>>>>>> anything to double-int is wrong, as once double-int is removed,
>>>>>>>>>> then all the
>>>>>>>>>> api changes to make double-int share from wide-int is wasted and
>>>>>>>>>> must then
>>>>>>>>>> be removed.  The path forward is the complete removal of
>>>>>>>>>> double-int; it is
>>>>>>>>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>>>>>> get rid of it - you
>>>>>>>>> will make compile-time math a _lot_ slower.  Just profile when you for
>>>>>>>>> example
>>>>>>>>> change get_inner_reference to use wide_ints.
>>>>>>>>>
>>>>>>>>> To be able to remove double_int in favor of wide_int requires _at least_
>>>>>>>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>>>>>>>
>>>>>>>>> It might be a non-issue for math that operates on trees or RTXen due to
>>>>>>>>> the allocation overhead we pay, but in recent years we transitioned
>>>>>>>>> important
>>>>>>>>> paths away from using tree math to using double_ints _for speed reasons_.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>> I do not know why you believe this about the speed.  double_int always
>>>>>>>> does synthetic math, since you do everything at 128-bit precision.
>>>>>>>>
>>>>>>>> The thing about wide_int is that, since it does math at the precision's
>>>>>>>> size, it almost never uses synthetic operations, since the sizes for
>>>>>>>> almost every instance can be done using the native math on the machine.
>>>>>>>> Almost every call has a check to see if the operation can be done
>>>>>>>> natively.
>>>>>>>> I seriously doubt that you are going to do TImode math much faster than I
>>>>>>>> do it, and if you do, who cares.
>>>>>>>>
>>>>>>>> The number of calls does not affect the performance in any negative way,
>>>>>>>> and in fact it is more efficient, since common things that require more
>>>>>>>> than one operation in double_int are typically done in a single operation.
>>>>>>> Simple double-int operations like
>>>>>>>
>>>>>>> inline double_int
>>>>>>> double_int::and_not (double_int b) const
>>>>>>> {
>>>>>>>    double_int result;
>>>>>>>    result.low = low & ~b.low;
>>>>>>>    result.high = high & ~b.high;
>>>>>>>    return result;
>>>>>>> }
>>>>>>>
>>>>>>> are always going to be faster than conditionally executing only one
>>>>>>> operation
>>>>>>> (but inside an offline function).
>>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>>> appropriate here.
>>>>>>
>>>>>> It's interesting that you gave this example, since before you were
>>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>>> removed in favour of separate and() and not() operations, but why
>>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>> I was more concerned about fused operations that use precision
>>>>> or bitsize as input.  That is for example
>>>>>
>>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>>> +  bool only_sign_bit_p () const;
>>>>> The first is construct a wide-int with precision prec (and sign- or
>>>>> zero-extend it) and then call only_sign_bit_p on it.  Such function
>>>>> should not be necessary and existing callers should be questioned
>>>>> instead of introducing it.
>>>>>
>>>>> In fact wide-int seems to have so many "fused" operations that
>>>>> we run out of sensible recognizable names for them.  Which results
>>>>> in a lot of confusion on what the functions actually do (at least for me).
>>>> Well, I suppose I can't really say anything useful either way on
>>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>>
>>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>>> of what's fused and what isn't.
>>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>>> operations introduced separately together with the places they are
>>>>> used.  In the current way it's way too tedious to go over all of them
>>>>> and match them with callers, lookup enough context and then
>>>>> make up my mind on whether the caller should do sth different or not.
>>>>>
>>>>> Thus, consider the big initial API a reason that all this review takes
>>>>> so long ...
>>>>>
>>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>>> storing an integer representation in it, then calling tree arithmetic
>>>>>> routines that pull out the integer representation again and create a
>>>>>> tree to hold the result, is going to be much slower than using either
>>>>>> double_int or wide_int.  I'd be very surprised if we notice any
>>>>>> measurable difference between double_int and wide_int here.
>>>>>>
>>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>>> wide integer really shouldn't have any significance.
>>>>>>
>>>>>> Your main complaint seems to be that the wide_int API is different
>>>>>> from the double_int one, but we can't literally use the same API, since
>>>>>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>>>>>> Having a precision that is separate from the underlying representation
>>>>>> is IMO the most important feature of wide_int, so:
>>>>>>
>>>>>>     typedef wide_int<2> double_int;
>>>>>>
>>>>>> is never going to be a drop-in, API-compatible replacement for double_int.
>>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>>> you have a double_int<N> class.
>>>> But you don't!  Because...
>>>>
>>>>> Thus wide-int should have a base
>>>>> of that kind and just add precision / bitsize ontop of that.  It wouldn't
>>>>> be a step forward if we end up replacing double_int uses with
>>>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>>>> would it?
>>>> ...the precision and bitsize isn't an optional extra, either conceptually
>>>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>>>> the hood, but the value of N is an internal implementation detail.
>>>> No operations are done to N HWIs, they're done to the number of bits
>>>> in the operands.  Whereas a double_int<N> class does everything to N HWIs.
>>> If that's the only effect then either bitsize or precision is redundant (and
>>> we also have len ...).  Note I did not mention len above, thus the base
>>> class would retain 'len' and double-int would simply use 2 for it
>>> (if you don't template it but make it variable).
>> But that means that wide_int has to model a P-bit operation as a
>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>> after the fact, which seems unnecessarily convoluted.
> It does that right now.  The operations are carried out in a loop
> over len HOST_WIDE_INT parts, the last HWI is then special-treated
> to account for precision/size.  (yes, 'len' is also used as optimization - the
> fact that len ends up being mutable is another thing I dislike about
> wide-int.  If wide-ints are cheap then all ops should be non-mutating
> (at least to 'len')).
There are currently two places where len is mutable.  They are parts
where I did not believe that I had the expertise to rewrite the
double-int code: the conversion to and from float and the conversion
to and from fixed.  In those cases the API is exposed so that
wide-ints can be built.  I did not think that such temporary
scaffolding would be the source of ridicule.

All of the uses of wide-int as integers are truly functional.

>
>>   I still don't
>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>> X*HOST_WIDE_INT operation for any X) has any special meaning.
> Well, the same reason as a HOST_WIDE_INT variable has a meaning.
> We use it to constrain what we (efficiently) want to work on.  For example
> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
> doing bit-constant-propagation in loops (for TImode integers on a x86_64 host).
>
> Oh, and I don't necessary see a use of double_int in its current form
> but for an integer representation on the host that is efficient to manipulate
> integer constants of a target dependent size.  For example the target
> detail that we have partial integer modes with bitsize > precision and that
> the bits > precision apparently have a meaning when looking at the
> bit-representation of a constant should not be part of the base class
> of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
> about the reason we track bitsize in addition to precision - I think it's
> abstraction at the wrong level, the tree level does fine without knowing
> about bitsize).
>
> Richard.
>
>> Richard


