This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [wide-int] int_traits <tree>


On 10/17/2013 07:30 AM, Richard Biener wrote:
On Thu, 17 Oct 2013, Richard Sandiford wrote:

Richard Biener <rguenther@suse.de> writes:
On Thu, 17 Oct 2013, Richard Sandiford wrote:

Kenneth Zadeck <zadeck@naturalbridge.com> writes:
As mentioned in my message yesterday, I thought your new way of canonising
unsigned tree constants meant that there was always an upper zero bit.
Is that right?
i believe this is correct.
If so, xprecision < precision is a no-op, because the number always
has the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1 HWIs.
say my HWI size is 8 bits (just to keep from typing a million 'f's.   if
i have a 16 bit unsigned number that is all 1s in the tree world it is 3
hwis
0x00 0xff 0xff.

but inside regular wide int, it would take 1 wide int whose value is 0xff.
inside of max it would be the same as the tree, but then the test
precision < xprecision + hbpwi never kicks in because precision is
guaranteed to be huge.

inside of addr_wide_int, i think we tank with the assertion.
It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
The initial length is greater than the maximum length of an addr_wide_int,
but your "len = MAX (len, max_len)" deals with that.
It's

   len = MIN (len, max_len)
Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision
precision can only be zero if xprecision is zero which looked to
me like it cannot happen - or rather it should be fixed.
Despite the comment above the code, I don't think this MIN is there
for the zero-precision case.  I think it's there to handle the new
tree representation.

The new tree representation can have a length greater than max_len
for an unsigned tree constant that occupies a whole number of HWIs.
The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.
Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
It was changed because you asked me to change it. However, you end up with special cases either way.
the case actually comes up on the ppc because they do a lot of 128 bit
math.    I think i got thru the x86-64 without noticing this.
Well, it'd be suspicious if we're directly using 128-bit numbers
in addr_wide_int.  The justification for the assertion was that we
should explicitly truncate to addr_wide_int when deliberately
ignoring upper bits, beyond bit or byte address width.  128 bits
definitely falls into that category on powerpc.
My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.
No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,
In which case?  This is:

    precision == 16
    xprecision == 16
    len == 3
    max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it
again to 1, so the "0x00 0xff 0xff" becomes "0xff".  The "0x00 0xff"
is still there in the array, but not used.

but now Kenny says this is only for some kind
of wide-ints but not all?  That is, why is

inline wi::storage_ref
wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                         unsigned int precision, const_tree
x)
{
   unsigned int len = TREE_INT_CST_NUNITS (x);
   const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x,
0);
   return wi::storage_ref (val, len, precision);
}

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?
The fundamental problem here is that we're trying to support two cases:

(a) doing N-bit arithemtic in cases where the inputs have N bits
(b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
     and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if it's
extended to 32-bits you get two different numbers 0xffff8000 and 0x00008000,
depending on the sign.

So for one value of the "precision" parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the "precision" parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the tree
constant has a single representation.
But a correctly extended one, up to its len!  (as opposed to RTL)

So I think the possibilities are:

(1) Use the representation of an N-bit wide_int to store N-bit tree constants.
     Do work when extending them to wider wide_ints.

(2) Use the representation of max_wide_int to store N-bit tree constants.
     Do work when creating an N-bit wide_int.

(3) Store both representations in the tree constant.

(4) Require all tree arithmetic to be done in the same way as rtl arithmetic,
     with explicit extensions.  This gets rid of case (b).

(5) Require all tree arithemtic to be done in wider wide_ints than the inputs,
     which I think is what you preferred.  This gets rid of case (a).

(6) Allow the same wide_int constant to have several different representations.
     Remember that this is to some extent what Kenny's original implementation
     did, since partial HWIs were filled with don't-cares.  You and I are the
     ones who argued that each wide_int should have a single representation.
As far as I can see when we need to extend the tree rep to a bigger
precision (extend according to its sign) the only thing we have to
do to make the wide-int rep canonical is sign-extend the tree rep
at the desired precision.  And as we never truncate this sign-extension
is a no-op for signed tree constants, is a no-op for unsigned
tree constants when precision > xprecision but not for unsigned tree
constants and precision == xprecision when the MSB is set.

I don't see where adding the fake zero padding for unsigneds in the
tree rep helps here.

Thus,

Index: gcc/tree.h
===================================================================
--- gcc/tree.h  (revision 203743)
+++ gcc/tree.h  (working copy)
@@ -5189,28 +5189,28 @@ wi::int_traits <const_tree>::decompose (
    /* Got to be careful of precision 0 values.  */
    if (precision)
      len = MIN (len, max_len);
-  if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
+  if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED
+      && precision == xprecision
+      && len == max_len)
      {
        unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
-      if (small_prec)
+      if (small_prec
+         && len == max_len)
         {
           /* We have to futz with this because the canonization for
              short unsigned numbers in wide-int is different from the
              canonized short unsigned numbers in the tree-cst.  */
-         if (len == max_len)
-           {
-             for (unsigned int i = 0; i < len - 1; i++)
-               scratch[i] = val[i];
-             scratch[len - 1] = sext_hwi (val[len - 1], precision);
-             return wi::storage_ref (scratch, len, precision);
-           }
+         for (unsigned int i = 0; i < len - 1; i++)
+           scratch[i] = val[i];
+         scratch[len - 1] = sext_hwi (val[len - 1], small_prec);
+         while (len > 1 && scratch[len - 1] == (HOST_WIDE_INT)-1)
+           len--;
+         return wi::storage_ref (scratch, len, precision);
         }
        /* We have to futz here because a large unsigned int with
          precision 128 may look (0x0 0xFFFFFFFFFFFFFFFF 0xF...) as a
          tree-cst and as (0xF...) as a wide-int.  */
-      else if (precision == xprecision && len == max_len)
-        while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
-          len--;
+      while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
+       len--;
      }
/* Signed and the rest of the unsigned cases are easy. */

explain: the current code misses compressing the now sign-extended
value in the small_prec case (and it still sext_hwi's to the wrong
precision if precision > HOST_BITS_PER_WIDE_INT).  It restricts
special-casing to the only relevant case, no explicit or
implicit extension.

?

Richard.


[This function shows another optimization issue:

     case BOOLEAN_TYPE:
       /* Cache false or true.  */
       limit = 2;
       if (wi::leu_p (cst, 1))
         ix = cst.to_uhwi ();

I would have expected cst <= 1 be optimized to cst.len == 1 &&
cst.val[0] <= 1.  It expands to

<L27>:
   MEM[(long int *)&D.50698 + 16B] = 1;
   MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct
wide_int_ref_storage *)&D.50698].scratch;
   MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
   MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
   _277 = MEM[(const struct wide_int_storage *)&cst + 260B];
   if (_277 <= 64)
     goto <bb 42>;
   else
     goto <bb 43>;

   <bb 42>:
   xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
   _494 = MEM[(const long int *)&cst];
   _495 = (long unsigned int) _494;
   yl_496 = zext_hwi (_495, _277);
   _497 = xl_491 < yl_496;
   goto <bb 44>;

   <bb 43>:
   _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage
*)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage
*)&cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic
for this reason.  But the representation should guarantee the
compare with a low precision (32 bit) constant is evaluatable
at compile-time if len of the larger value is > 1, no?

   <bb 44>:
   # _504 = PHI <_497(42), _503(43)>
   D.50698 ={v} {CLOBBER};
   if (_504 != 0)
     goto <bb 45>;
   else
     goto <bb 46>;

   <bb 45>:
   pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B];
   goto <bb 229> (<L131>);

   <bb 46>:
   _65 = generic_wide_int<wide_int_storage>::to_uhwi (&cst, 0);
   ix_66 = (int) _65;
   goto <bb 91>;

The question is whether we should try to optimize wide-int for
such cases or simply not use wi:leu_p (cst, 1) but rather

  if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)

?
I think we should do the first, trying to optimise wide_int.

Thanks,
Richard




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]