[wide-int] int_traits <tree>

Richard Biener rguenther@suse.de
Thu Oct 17 12:32:00 GMT 2013


On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

> On 10/17/2013 04:46 AM, Richard Biener wrote:
> > > > the case actually comes up on the ppc because they do a lot of 128-bit
> > > > math.  I think I got through the x86-64 without noticing this.
> > > Well, it'd be suspicious if we're directly using 128-bit numbers
> > > in addr_wide_int.  The justification for the assertion was that we
> > > should explicitly truncate to addr_wide_int when deliberately
> > > ignoring upper bits, beyond bit or byte address width.  128 bits
> > > definitely falls into that category on powerpc.
> > My question is whether, with 8-bit HWI, 0x00 0xff 0xff is a valid
> > wide-int value if it has precision 16.  AFAIK that is what the
> > code produces, but now Kenny says this is only for some kinds
> > of wide-ints but not all?  That is, why is
> 
> The issue is not that the rules are different between the different flavors
> of wide int; it is that the circumstances are different.  The rule is that
> bits above the precision exist only if the precision is not a multiple of
> HOST_BITS_PER_WIDE_INT, and in that case those bits are always an extension
> of the sign bit.
> 
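> In code, maintaining that invariant looks something like the following
> (a sketch only; sext_hwi is the sign-extension helper from hwint.h,
> and val/len/precision are as in the storage classes):
> 
>   unsigned int small_prec = precision % HOST_BITS_PER_WIDE_INT;
>   /* Bits above PRECISION live in the top block and must replicate
>      the sign bit.  */
>   if (small_prec)
>     val[len - 1] = sext_hwi (val[len - 1], small_prec);
> 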
> max_wide_int and addr_wide_int have precisions large enough that it is
> impossible to generate an unsigned number on the target big enough to run
> up against the precision.  However, for the fixed-precision case you can,
> and at least on the ppc you do, generate unsigned numbers that are big
> enough to have to be recanonicalized like this.
> 
> > inline wi::storage_ref
> > wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
> >                                         unsigned int precision,
> >                                         const_tree x)
> > {
> >   unsigned int len = TREE_INT_CST_NUNITS (x);
> >   const HOST_WIDE_INT *val
> >     = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
> >   return wi::storage_ref (val, len, precision);
> > }
> > 
> > not a valid implementation together with making sure that the
> > INTEGER_CST tree rep has that extra word of zeros if required?
> > I would like to see us move in that direction (given that the
> > tree rep of INTEGER_CST has transitioned to variable-length already).
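> > 
> > For comparison, if the rep did not guarantee the extra zero word,
> > decompose would have to go through the scratch buffer, along these
> > lines (a sketch; the padding condition is my reading of the
> > canonicalization rule above):
> > 
> > inline wi::storage_ref
> > wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
> >                                         unsigned int precision,
> >                                         const_tree x)
> > {
> >   unsigned int len = TREE_INT_CST_NUNITS (x);
> >   const HOST_WIDE_INT *val
> >     = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
> >   /* An unsigned value whose top block has the sign bit set needs an
> >      explicit zero block to be a valid canonical value.  */
> >   if (TYPE_UNSIGNED (TREE_TYPE (x))
> >       && val[len - 1] < 0
> >       && len * HOST_BITS_PER_WIDE_INT < precision)
> >     {
> >       for (unsigned int i = 0; i < len; i++)
> >         scratch[i] = val[i];
> >       scratch[len] = 0;
> >       return wi::storage_ref (scratch, len + 1, precision);
> >     }
> >   return wi::storage_ref (val, len, precision);
> > }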
> > 
> > Btw, code such as
> > 
> > tree
> > wide_int_to_tree (tree type, const wide_int_ref &pcst)
> > {
> > ...
> >    unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
> >    bool recanonize = sgn == UNSIGNED
> >      && small_prec
> >      && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;
> > 
> > definitely needs a comment.  I would have thought _all_ unsigned
> > numbers need re-canonicalization.  Well, maybe only if we're
> > forcing the extra zeros.
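> > 
> > For concreteness, I'd expect the recanonicalization itself to be just
> > (a sketch; "top" standing in for the value's most significant block):
> > 
> >   /* E.g. prec == 16, len == 1: the top block is sign-extended from
> >      bit 15, so an unsigned read must clear those sign-bit copies.  */
> >   if (recanonize)
> >     top = zext_hwi (top, small_prec);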
> It will get a comment.
> > [This function shows another optimization issue:
> > 
> >      case BOOLEAN_TYPE:
> >        /* Cache false or true.  */
> >        limit = 2;
> >        if (wi::leu_p (cst, 1))
> >          ix = cst.to_uhwi ();
> > 
> > I would have expected cst <= 1 to be optimized to cst.len == 1
> > && cst.val[0] <= 1.  It expands to
> > 
> > <L27>:
> >    MEM[(long int *)&D.50698 + 16B] = 1;
> >    MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch;
> >    MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
> >    MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
> >    _277 = MEM[(const struct wide_int_storage *)&cst + 260B];
> >    if (_277 <= 64)
> >      goto <bb 42>;
> >    else
> >      goto <bb 43>;
> > 
> >    <bb 42>:
> >    xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
> >    _494 = MEM[(const long int *)&cst];
> >    _495 = (long unsigned int) _494;
> >    yl_496 = zext_hwi (_495, _277);
> >    _497 = xl_491 < yl_496;
> >    goto <bb 44>;
> > 
> >    <bb 43>:
> >    _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)&cst].val, len_274, _277);
> > 
> > this keeps D.50698 and cst un-SRAable; inline storage is problematic
> > for this reason.  But the representation should guarantee that the
> > compare with a low-precision (32-bit) constant can be evaluated at
> > compile time if the len of the larger value is > 1, no?
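> > 
> > I.e. with the canonical form something like this should be possible
> > (a sketch, ignoring the sub-HWI precision wrinkle):
> > 
> >   /* A value needing more than one block is, as an unsigned number,
> >      larger than any single-HWI constant, so len alone decides.  */
> >   if (cst.len > 1)
> >     return false;
> >   return (unsigned HOST_WIDE_INT) cst.val[0] <= 1;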
> > 
> >    <bb 44>:
> >    # _504 = PHI <_497(42), _503(43)>
> >    D.50698 ={v} {CLOBBER};
> >    if (_504 != 0)
> >      goto <bb 45>;
> >    else
> >      goto <bb 46>;
> > 
> >    <bb 45>:
> >    pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B];
> >    goto <bb 229> (<L131>);
> > 
> >    <bb 46>:
> >    _65 = generic_wide_int<wide_int_storage>::to_uhwi (&cst, 0);
> >    ix_66 = (int) _65;
> >    goto <bb 91>;
> > 
> > The question is whether we should try to optimize wide-int for
> > such cases or simply not use wi::leu_p (cst, 1) but rather
> > 
> >   if (cst.fits_uhwi_p () && cst.to_uhwi () <= 1)
> > 
> > ?
> I find this ugly, but I see where you are coming from.  The problem is
> that both you and I know that the len has to be 1, but the optimizer does
> not.  This is a case where I think we made a mistake getting rid of
> wi::one_p, wi::zero_p and wi::minus_one_p.  The internals of one_p were
> return (len == 1 && val[0] == 1), and I think that is much nicer than
> what you put there.  On the other hand, it seems that a person more
> skilled with C++ than I am could specialize the comparisons with an
> integer constant, since I believe that constant must fit in one HWI
> (I am a little concerned about large unsigned constants).
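> 
> Something along these lines is what I have in mind (only a sketch;
> leu_p_large is a stand-in for the out-of-line path):
> 
> inline bool
> wi::leu_p (const wide_int_ref &x, unsigned HOST_WIDE_INT c)
> {
>   /* Under the canonical form a nonnegative single-block value is
>      just val[0], whatever the precision.  */
>   if (x.len == 1 && x.val[0] >= 0)
>     return (unsigned HOST_WIDE_INT) x.val[0] <= c;
>   return leu_p_large (x.val, x.len, x.precision, c);
> }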

one_p and zero_p wouldn't catch a compare with 3 ;)  So yes, it looks
like that is needed if we cannot trick the optimizers into doing the
right thing with something like

inline bool
wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
{
  if (x.precision <= HOST_BITS_PER_WIDE_INT
      && y.precision <= HOST_BITS_PER_WIDE_INT)
    return x.slow () < y.slow ();
  else if (x.precision <= HOST_BITS_PER_WIDE_INT)   // why not len == 1?
    ... optimize
  else if (y.precision <= HOST_BITS_PER_WIDE_INT)
    ... optimize
  else
    return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
                        y.precision);
}

which of course pessimizes the case where the precisions are not
known at compile-time.  So maybe the above should always be

  if (__builtin_constant_p (x.precision)
      && x.precision <= HOST_BITS_PER_WIDE_INT
     ....

and lts_p_large should handle the small precision case efficiently
(but out-of-line?)
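
Spelled out, something like this (a sketch, untested):

inline bool
wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
{
  /* Take the fast path only when the optimizers can see the
     precisions; otherwise go straight to the out-of-line routine,
     which then has to handle the small-precision case efficiently
     itself.  */
  if (__builtin_constant_p (x.precision)
      && __builtin_constant_p (y.precision)
      && x.precision <= HOST_BITS_PER_WIDE_INT
      && y.precision <= HOST_BITS_PER_WIDE_INT)
    return x.slow () < y.slow ();
  return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
                      y.precision);
}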

Richard.


