This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: wide-int more performance fixes for wide multiplication.
- From: Richard Biener <rguenther at suse dot de>
- To: Kenneth Zadeck <zadeck at naturalbridge dot com>, Mike Stump <mikestump at comcast dot net>, gcc-patches <gcc-patches at gcc dot gnu dot org>, rdsandiford at googlemail dot com
- Date: Mon, 16 Dec 2013 12:19:18 +0100
- Subject: Re: wide-int more performance fixes for wide multiplication.
- References: <52A61506 dot 5000407 at naturalbridge dot com> <87sitv3bkq dot fsf at talisman dot default> <52AC4EC9 dot 3020806 at naturalbridge dot com> <87ob4j33pd dot fsf at talisman dot default> <52AC7C44 dot 5080805 at naturalbridge dot com> <87txea7auv dot fsf at talisman dot default> <52ADCC04 dot 90505 at naturalbridge dot com> <87iouq6p9u dot fsf at talisman dot default> <52ADF98A dot 6070801 at naturalbridge dot com>
On 12/15/13 7:48 PM, Kenneth Zadeck wrote:
>
> On 12/15/2013 11:40 AM, Richard Sandiford wrote:
>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> It is certainly true that in order to do an unbounded set of operations,
>>> you would have to check on every operation. So my suggestion that we
>>> should remove the checking from the infinite precision would not support
>>> this. But the reality is that there are currently no places in the
>>> compiler that do this.
>>>
>>> Currently all of the uses of widest-int are one or two operations, and
>>> the style of code writing is that you do these and then you deal with
>>> the overflow at the time that you convert the widest-int to a tree. I
>>> think that it is important to maintain the style of programming where,
>>> for a small finite number of computations, you do not need to check until
>>> you convert back.
>>>
>>> The problem with making the buffer size so tight is that we do not have
>>> adequate reserves to allow this style for any supportable type.
>>> I personally think that 2x + some small n is what we need to have.
>>>
>>>
>>> I am not as familiar with how this is used (or how it is to be used when
>>> all of the offset math is converted to use wide-int), but there appear
>>> to be two uses of multiply. One is the "harmless" mult by 3, and the
>>> other is where people are trying to compute the size of arrays. These
>>> last operations do need to be checked for overflow. The question here is
>>> do you want to force those operations to overflow individually or do you
>>> want to check when you convert out. Again, I think 2x + some small
>>> number is what we might want to consider.
>> It's a fair question, but personally I think checking for overflow
>> on the operation is much more robust. Checking on conversion doesn't
>> allow you to stop thinking about overflow, it just changes the way you
>> think about it: rather than handling explicit overflow flags, you have
>> to remember to ask "is the range of the unconverted result within the
>> range of widest_int", which I bet is something that would be easily
>> forgotten once widest_int & co. are part of the furniture.
>>
>> E.g. the SPARC operation (picked only because I remember it):
>>
>>   for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
>>     {
>>       tree e0 = VECTOR_CST_ELT (arg0, i);
>>       tree e1 = VECTOR_CST_ELT (arg1, i);
>>
>>       bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;
>>
>>       tmp = wi::neg (e1, &neg1_ovf);
>>       tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
>>       if (wi::neg_p (tmp))
>>         tmp = wi::neg (tmp, &neg2_ovf);
>>       else
>>         neg2_ovf = false;
>>       result = wi::add (result, tmp, SIGNED, &add2_ovf);
>>       overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
>>     }
>>
>>   gcc_assert (!overflow);
>>
>>   return wide_int_to_tree (rtype, result);
>>
>> seems pretty natural. If instead it was modelled as a widest_int
>> chain without overflow then it would be less obviously correct.
>>
>> Thanks,
>> Richard
> Let us for the sake of argument assume that this was common code rather
> than code in a particular port, because code in a particular port can
> know more about the environment than common code is allowed to.
>
> My main point is that this code is in wide-int not widest-int because at
> this level the writer of this code actually wants to model what the
> target wants to do. So doing the adds in precision and testing
> overflow is perfectly fine at every step. But this loop CANNOT be
> written in a style where you tested the overflow at the end because if
> this is common code you cannot make any assumptions about the largest
> mode on the machine. If the buffer was 2x + n in size, then it would
> be reasonably safe to assume that the number of elements in the vector
> could be represented in an integer and so you could wait till the end.
>
> I think my point (and I feel a little uncomfortable putting words in
> richi's mouth, but I believe this was his point early on) was that he
> thinks of widest_int as an infinite precision representation. He was
> the one who was pushing for the entire rep to
> be done with a large internal (or perhaps unbounded) rep because he felt
> that this was more natural to not have to think about overflow. He
> wanted you to be able to chain a mult and a divide and not see the
> product get truncated before the divide was done. The rep that we
> have now really sucks with respect to this because widest_int truncates
> if you are close to the largest precision on the machine and does not if
> you are small with respect to that.
>
> My other point is that while you think that the example above is nice,
> the experience with double-int is contrary to this. People will say
> (and test) only the normal modes, and anyone trying to use large modes
> will die a terrible death of a thousand cuts.
Well - the cases that matter in practice are
1) the things we have offset_int for - code that does bit vs. byte
quantity calculations on addresses or address offsets. It used
either HWI before (and probably still does, and thus is buggy) or
double-int. The usual pattern was/is to do host_integerp (t, 0)
and then TREE_LOW_CST (t) * BITS_PER_UNIT (oops) or blindly assume
that doing things in double-int works (which it does in practice).
2) passes that want to know whether a single operation overflows -
the multiple-operations-then-check-overflow-after-the-fact style is
seldom used; where it is, it is mainly in the frontends, which use
trees and thus get a sticky TREE_OVERFLOW.  Yes, infinite precision
would make this work as well, and yes, originally I thought of
basing all of wide-int on an internally infinite precision
implementation (and luckily we are close enough that I may end
up fixing the implementation detail to work that way ...).
With the infinite precision internal rep you'd have explicit
truncations and sign-/zero-extensions at the right point and
failing to do that before conversion to tree/RTX could have been
easily turned into ICEs saying we overflowed and nobody cared.
Well. Let's see how the thing we have now works out.
Richard.