[RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int

Richard Sandiford richard.sandiford@arm.com
Fri Sep 29 10:30:06 GMT 2023


Richard Biener <rguenther@suse.de> writes:
> On Thu, 28 Sep 2023, Jakub Jelinek wrote:
>
>> Hi!
>> 
>> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
>> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases; widest_int
>> > > > is really trying to be a poor man's GMP by limiting the maximum precision.
>> > > 
>> > > I'd characterise widest_int as "a wide_int that is big enough to hold
>> > > all supported integer types, without losing sign information".  It's
>> > > not big enough to do arbitrary arithmetic without losing precision
>> > > (in the way that GMP is).
>> > > 
>> > > If the new limit on integer sizes is 65535 bits for all targets,
>> > > then I think that means that widest_int needs to become a 65536-bit type.
>> > > (But not with all bits represented all the time, of course.)
>> > 
>> > If the widest_int storage would be dependent on the len rather than
>> > precision for how it is stored, then I think we'd need a new method which
>> > would be called at the start of filling the limbs where we'd tell how many
>> > limbs there would be (i.e. what will set_len be called with later on), and
>> > do nothing for all storages but the new widest_int_storage.
>> 
>> So, I've spent some time on this.  In the patch, wide_int uses fixed or
>> variable length (aka len) limb storage depending on the precision (precision >
>> WIDE_INT_MAX_PRECISION means a heap-allocated limb array, otherwise it is
>> inline), while widest_int always has a very large precision
>> (WIDEST_INT_MAX_PRECISION, currently defined to the INTEGER_CST-imposed
>> limit of 255 64-bit limbs) but uses an inline array for lengths
>> corresponding to up to WIDE_INT_MAX_PRECISION bits, and for larger ones,
>> like wide_int, a heap-allocated array of limbs.
>> These changes make both wide_int and widest_int obviously non-POD: not
>> trivially default constructible, trivially copy constructible, trivially
>> destructible, or trivially copyable, and so not a good fit for GC and some
>> vec operations.
>> One common use of wide_int in GC structures was in dwarf2out.{h,cc}; but as
>> large _BitInt constants don't appear in RTL, we really don't need such large
>> precisions there.
>> So, for wide_int the patch introduces rwide_int, a restricted wide_int,
>> which acts like the old wide_int (except that it is now trivially default
>> constructible and asserts that the precision isn't set above
>> WIDE_INT_MAX_PRECISION).
>> For widest_int, the nastiness is that, because it now always has the huge
>> precision of 16320 bits:
>> a) we need to be told upfront in wide-int.h before calling the large
>>    value internal functions in wide-int.cc how many elements we'll need for
>>    the result (some reasonable upper estimate is fine)
>> b) various wide-int.cc functions were lazy and assumed the precision is
>>    small enough, and often used up to that many elements, which is
>>    undesirable; so the patch now tries to decrease that and use xi.len etc.
>>    based estimates instead where possible (sometimes only if the precision
>>    is above WIDE_INT_MAX_PRECISION)
>> c) with the higher precision, behavior changes for lrshift (-1, 2) etc., or
>>    for unsigned division with a dividend having the most significant bit set
>>    in widest_int - while such values were previously considered to be above
>>    or equal to 1 << (WIDE_INT_MAX_PRECISION - 2), now that holds with
>>    WIDEST_INT_MAX_PRECISION and so they are much larger; but lrshift on
>>    widest_int is, I think, only done in ccp, and I'd strongly hope that we
>>    treat the values as unsigned and so with usually much smaller length; so
>>    it is just when we call wi::lrshift (-1, 2) or similar that results
>>    change.
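The width dependence described in (c) can be illustrated with plain unsigned arithmetic.  A minimal sketch (the helper name lrshift_all_ones is made up for illustration; it just masks an all-ones value to a given precision before shifting, the way a wide-int value of that precision would behave):

```cpp
#include <cstdint>

/* Logically right-shift an "all ones" (i.e. -1) value represented at
   PRECISION bits by AMT bits.  Shows that the result depends on the
   precision: raising widest_int's precision from WIDE_INT_MAX_PRECISION
   to WIDEST_INT_MAX_PRECISION therefore changes the result of
   wi::lrshift (-1, 2) and similar calls.  */
inline uint64_t
lrshift_all_ones (unsigned precision, unsigned amt)
{
  /* Mask down to PRECISION bits to mimic a value of that precision
     (capped at 64 bits for this toy example).  */
  uint64_t mask = precision >= 64 ? ~0ULL : (1ULL << precision) - 1;
  return mask >> amt;
}
```

With precision 16 the result is 0x3fff (2^14 - 1), with precision 32 it is 0x3fffffff (2^30 - 1): the same "value" -1 shifts to a different result once the precision differs.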
>> I've noticed that for wide_int or widest_int references even simple
>> operations like eq_p liked to allocate and immediately free huge buffers;
>> this was caused by wide_int doing an allocation on creation with a
>> particular precision and e.g. get_binary_precision running into that.  So
>> I've duplicated that code to avoid the allocations when all we need is just
>> a precision.
>> 
>> The patch below doesn't actually build anymore because of the vec.h asserts
>> (which do point to useful stuff though), so temporarily I've also applied
>> the following
>> --- gcc/vec.h.xx	2023-09-28 12:56:09.055786055 +0200
>> +++ gcc/vec.h	2023-09-28 13:15:31.760487111 +0200
>> @@ -1197,7 +1197,7 @@ template<typename T, typename A>
>>  inline void
>>  vec<T, A, vl_embed>::qsort (int (*cmp) (const void *, const void *))
>>  {
>> -  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
>> +//  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
>>    if (length () > 1)
>>      gcc_qsort (address (), length (), sizeof (T), cmp);
>>  }
>> @@ -1422,7 +1422,7 @@ template<typename T>
>>  void
>>  gt_ggc_mx (vec<T, va_gc> *v)
>>  {
>> -  static_assert (std::is_trivially_destructible <T>::value, "");
>> +//  static_assert (std::is_trivially_destructible <T>::value, "");
>>    extern void gt_ggc_mx (T &);
>>    for (unsigned i = 0; i < v->length (); i++)
>>      gt_ggc_mx ((*v)[i]);
>> hack.  The two spots that trigger are tree-ssa-loop-niter.cc doing qsort on
>> a widest_int vector (to be exact, swapping elements in the vector of
>
> For this (besides choosing a fixed smaller widest_int as indicated in
> the other mail) sorting could be done indirect by sorting a
> [0, 1, 2 ... n-1 ] vector instead.
>
>> And, now the question is what to do about this.  I guess for omp_general
>> I could just use generic_wide_int <fixed_wide_int_storage <1024> > or
>> something similar; after all, widest_int wasn't really great when it
>> had a maximum precision of WIDE_INT_MAX_PRECISION with different values on
>> different targets, it has very few uses and is easy to change (thinking
>> about this makes me wonder what we do for offloading if the offload host
>> has a different WIDE_INT_MAX_PRECISION from the offload target).
>> 
>> But the more important question is what to do about loop/niters analysis.
>> I think for number-of-iterations analysis it might be OK to punt somehow
>> (if there is a way to tell that the number of iterations is unknown) if we
>> get some bound which is too large to be expressible in some reasonably small
>> fixed precision (whether that is WIDE_INT_MAX_PRECISION or something
>> different is a question).  We could either introduce yet another
>> widest_int-like storage which would still have WIDEST_INT_MAX_PRECISION
>> precision, but would ICE if the length is set to something above its fixed
>> width.  One problem is that the write_val estimations are often just
>> conservatively larger and could trigger even if the value fits in the end.
>> Or we could use
>> generic_wide_int <fixed_wide_int_storage <WIDE_INT_MAX_PRECISION> > (perhaps
>> call that rwidest_int); the drawback would be that it would be slightly
>> harder to use as it has a different precision from widest_int, so we'd need
>> to do some from () conversion on it or the like.  Plus I really don't know
>> the niters code well enough to know how to punt.
>
> I think when widest_int is no longer bound by something like the
> largest integer mode but now has to cater for arbitrary large _BitInt
> we have to get rid of widest_int or we have to make it variable-precision
> and reallocate it like auto_vec<T, n>.
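The auto_vec-style scheme could be sketched along these lines (a toy illustration, not the patch's actual storage classes; the names limb_storage and INLINE_LIMBS are invented):

```cpp
#include <cstdint>

/* Keep up to INLINE_LIMBS limbs inline and fall back to a heap
   allocation only for larger lengths, analogous to auto_vec<T, N>:
   the common small case needs no allocation, arbitrary _BitInt sizes
   still work.  */
constexpr unsigned INLINE_LIMBS = 9;  /* e.g. WIDE_INT_MAX_PRECISION / 64 */

struct limb_storage
{
  uint64_t inline_buf[INLINE_LIMBS];
  uint64_t *heap_buf = nullptr;
  unsigned len = 0;

  /* Announce the length up front (cf. the set_len discussion above)
     and return the buffer the limbs should be written into.  */
  uint64_t *set_len (unsigned l)
  {
    len = l;
    if (l <= INLINE_LIMBS)
      return inline_buf;
    delete[] heap_buf;
    heap_buf = new uint64_t[l];
    return heap_buf;
  }
  uint64_t *val () { return len <= INLINE_LIMBS ? inline_buf : heap_buf; }
  ~limb_storage () { delete[] heap_buf; }
};
```

The non-trivial destructor is exactly what trips the vec.h static_asserts quoted earlier, which is why GC containers of such a type are painful.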

Yeah, I think I agree with this.  widest_int really combined two things:

(a) a way of storing any integer IL value without loss of precision

(b) a way of attaching sign information

Arithmetic on widest_int is dubious, because it wraps at an essentially
arbitrary point.

_BitInt means that (a) is no longer a sensible concept.  And (b) could
be achieved by having a variant of wide_int with sign information
(like LLVM's APSInt).

In a way, _BitInt undermines the whole premise of the wide_int
representation.  Originally the idea was to have a structure that
was entirely self-contained and GC-friendly.  We're having to give
up on both of those (rightly and understandably).  So we might want
to reconsider how many elements are stored inline, and whether other
aspects of the design could be tweaked.

We could probably also clean up a lot of stuff now that we have
access to C++11.  (Or maybe even C++14. :) )

But that could end up being a large rewrite.

The approach in the patch looks good to me from a quick scan FWIW.
Will try to review over the weekend.

Thanks,
Richard

> For GC we can have the storage still heap allocated but of course
> CTOR/DTOR is going to be a pain (so better not use widest_int in GC).
>
>> ipa_bits is even worse, because unlike niter analysis, I think it is very
>> much desirable to support IPA VRP of all supported _BitInt sizes.  Shall
>> we perhaps use trailing_wide_int storage in there, or conditionally
>> rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit, something else?
>
> trailing_wide_int storage is the way to go here
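The trailing-storage pattern puts the limbs in the same allocation as the header, which is what makes it GC-friendly; a simplified sketch (the name trailing_int is invented, and GCC's real trailing_wide_int machinery is more elaborate):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

/* Header plus limbs in one block: the limb array lives directly after
   the struct, so a single allocation (and a single GC object) holds
   everything.  */
struct trailing_int
{
  unsigned precision;
  unsigned len;
  uint64_t limbs[1];  /* actually LEN limbs allocated past the struct */

  static trailing_int *create (unsigned len)
  {
    size_t size = offsetof (trailing_int, limbs)
		  + len * sizeof (uint64_t);
    trailing_int *p = static_cast<trailing_int *> (::operator new (size));
    p->len = len;
    return p;
  }
};
```

Because the object is one contiguous block with no separately owned pointer, it is trivially destructible and can be traversed by the GC without special CTOR/DTOR handling.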
>
>> What about slsr?  This is after bitint lowering, so it shouldn't be
>> performing opts on larger BITINT_TYPEs and so could also go with the
>> rwidest_int.
>
> Just to say I don't really like adding another "widest" int, but
> slsr shouldn't need to GC any of that so widest_int should be fine?
>
> Richard.

