wide-int more performance fixes for wide multiplication.

Sat Dec 14 17:22:00 GMT 2013

On 12/14/2013 09:30 AM, Richard Sandiford wrote:
>>>> +  /* True if the result needs to be negated.  */
>>>> +  bool is_neg = false;
>>>>    
>>>>      /* If the top level routine did not really pass in an overflow, then
>>>>         just make sure that we never attempt to set it.  */
>>>>      bool needs_overflow = (overflow != 0);
>>>> +  const HOST_WIDE_INT zero = 0;
>>>>      if (needs_overflow)
>>>>        *overflow = false;
>>>>    
>>>> @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>>>>          return 1;
>>>>        }
>>>>    
>>>> -  /* We do unsigned mul and then correct it.  */
>>>> -  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
>>>> -	     half_blocks_needed, prec, SIGNED);
>>>> -  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
>>>> -	     half_blocks_needed, prec, SIGNED);
>>>> -
>>>> -  /* The 2 is for a full mult.  */
>>>> -  memset (r, 0, half_blocks_needed * 2
>>>> -	  * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
>>>> +  ulen = op1len * 2;
>>>> +  vlen = op2len * 2;
>>>> +
>>>> +  if (sgn == SIGNED)
>>>> +    {
>>>> +      if (wi::neg_p (op1))
>>>> +	{
>>>> +	  HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
>>>> +	
>>> E.g. following on from the above:
>>>
>>> 	  /* Add an extra HWI so that we can represent the negative of
>>> 	     the minimum.  */
>>> 	  HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);
>>>
>>>> +	  int nop1len
>>>> + = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED,
>>>> 0);
>>> Not sure about the "prec + 1" here, since it could cause nop1len to be
>>> greater than the maximum length for "prec".  And you go on to unpack
>>> nop1 in "prec" rather than "prec + 1".
>> the unpacking with prec is the wrong part.  That should have been prec + 1
> Hmm, I still think it's the opposite, because...
>
>>> AIUI, once you've taken the absolutes, you've got two unsigned numbers
>>> of precision "prec" and create a "prec * 2" result, so naybe:
>>> sub_large (..., prec, UNSIGNED, 0) instead?
>> the signed and unsigned does not matter here.    the prec + 1 means we
>> never overflow.
> ...my point is that after this we treat the number as unsigned,
> so it isn't a question of whether the absolute value overflows
> the precision as a signed number but whether it overflows as an
> unsigned number.  And it never will.  It might need 1 extra HWI
> than the original input if op1len < blocks_needed, but it shouldn't
> need extra precision.
>
> E.g. say for 8-bit HWIs we start out with a signed -0x80 * -0x80.
> We convert that into an unsigned 0x80 * 0x80, which is still
> precision 8 * precision 8 -> precision 16.  And we'd need to be
> able to handle those precision-8 values correctly (with no extra
> zero HWI) if we had an unsigned 0x80 * 0x80 from the outset.
it will not work like that.     You are thinking that i really can do an 
8 bit X 8 bit multiply.    I cannot.
The case that i am worried about is 115 bits x 115 bits.    in this case 
i have to make sure that the top bits of the top block are provisioned 
properly.    If you think about your case above where bit 114 is set and 
the bits below it are zeros, then i need to invert it so that the bit 
115 and above are zero.     otherwise i get the top of the block 
polluted with the 1 bits which would mess up the top block multiplied by 
any of the lower blocks.

however, you are, at a deeper level correct.    if i change the parms to 
wi_unpack to UNSIGNED then it is likely you are correct.    I will think 
this through today and see if i can find a counter example.    but it 
may be true that if i invert the number, i always want the top bits of 
the block to be 0 which is what the sign parameter controls.

>>>> +	  is_neg = !is_neg;
>>>> +	  ulen = nop1len * 2;
>>>> +	  wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
>>>> +		     ulen, prec, SIGNED);
>>> Back on the alloca theme, we'd have:
>>>
>>> 	  u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);
>>>
>>> before this and other unpacks.
>>>
>>> Note that there's supposed to be space after ")" in casts.  Lots of
>>> instances in the patch :-)
i actually did not know this.   i will fix.
>>>
>>>> +      /* UNSIGNED mul. */
>>>> +      /* For large positive integers inside regular wide-ints, we must
>>>> +	 explictly expand out the 1s to full width for the mul to be
>>>> +	 correct. Note that the top bit will never be set for a
>>>> +	 unsigned number in offset_int of widest-int.  */
>>> s/of/or/.  But I think the comment is confusing because the unsignedness
>>> is a property of the multiplication rather than the inputs.  We should never
>>> be doing an unsigned operation on widest_int to begin with.  And if
>>> offset_ints are being treated as having a sign then the same is true there.
>>>
>>> If we ever do have an unsigned multiplication on those types for some reason
>>> then -1 will (rightly) be treated as the maximum unsigned value.
>> The real question here is "do you want to do an assert or do you want to
>> explain what happens if the input comes in that way?"
>