This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: wide-int more performance fixes for wide multiplication.


On 12/16/2013 01:37 PM, Richard Biener wrote:
Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 12/16/2013 09:37 AM, Richard Biener wrote:
Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 12/16/2013 06:19 AM, Richard Biener wrote:
On 12/15/13 7:48 PM, Kenneth Zadeck wrote:
On 12/15/2013 11:40 AM, Richard Sandiford wrote:
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
It is certainly true that in order to do an unbounded set of operations you
would have to check on every operation, so my suggestion that we should
remove the checking from the infinite precision would not support this.  But
the reality is that there are currently no places in the compiler that do
this.

Currently all of the uses of widest-int are one or two operations, and the
style of code writing is that you do these and then deal with the overflow at
the time that you convert the widest-int to a tree.  I think that it is
important to maintain the style of programming where a small, finite number
of computations does not need to be checked until they convert back.

The problem with making the buffer size so tight is that we do not have
adequate reserves to allow this style for any supportable type.  I personally
think that 2x + some small n is what we need to have.

I am not as familiar with how this is used (or will be used when all of the
offset math is converted to use wide-int), but there appear to be two uses of
multiply.  One is the "harmless" mult by 3 and the other is where people are
trying to compute the size of arrays.  These last operations do need to be
checked for overflow.  The question here is: do you want to force those
operations to overflow individually, or do you want to check when you convert
out?  Again, I think 2x + some small number is what we might want to consider.
It's a fair question, but personally I think checking for overflow on the
operation is much more robust.  Checking on conversion doesn't allow you to
stop thinking about overflow, it just changes the way you think about it:
rather than handling explicit overflow flags, you have to remember to ask
"is the range of the unconverted result within the range of widest_int",
which I bet is something that would be easily forgotten once widest_int & co.
are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

         for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
           {
             tree e0 = VECTOR_CST_ELT (arg0, i);
             tree e1 = VECTOR_CST_ELT (arg1, i);

             bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

             tmp = wi::neg (e1, &neg1_ovf);
             tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
             if (wi::neg_p (tmp))
               tmp = wi::neg (tmp, &neg2_ovf);
             else
               neg2_ovf = false;
             result = wi::add (result, tmp, SIGNED, &add2_ovf);
             overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
           }

         gcc_assert (!overflow);

         return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int chain
without overflow checks, then it would be less obviously correct.
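
For comparison, a check-at-conversion version of the same loop might look
roughly like the sketch below.  This is illustrative only, not code from the
patch: it assumes the wi::to_widest and wi::fits_to_tree_p helpers from the
wide-int API and defers the single overflow check to the point where the
widest_int result is converted back to a tree.

         /* Sketch only: accumulate in widest_int, check once at conversion.  */
         widest_int result = 0;
         for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
           {
             widest_int e0 = wi::to_widest (VECTOR_CST_ELT (arg0, i));
             widest_int e1 = wi::to_widest (VECTOR_CST_ELT (arg1, i));

             /* |e0 - e1|, with no per-operation overflow flags.  */
             widest_int diff = e0 - e1;
             if (wi::neg_p (diff))
               diff = -diff;
             result += diff;
           }

         /* The one place overflow is considered: does the accumulated
            value fit the result type?  */
         gcc_assert (wi::fits_to_tree_p (result, rtype));
         return wide_int_to_tree (rtype, result);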

Thanks,
Richard
Let us for the sake of argument assume that this was common code rather than
code in a particular port, because code in a particular port can know more
about the environment than common code is allowed to.

My main point is that this code uses wide-int, not widest-int, because at
this level the writer of the code actually wants to model what the target
wants to do.  So doing the adds in precision and testing overflow at every
step is perfectly fine.  But this loop CANNOT be written in a style where you
test the overflow at the end, because if this is common code you cannot make
any assumptions about the largest mode on the machine.  If the buffer were
2x + n in size, then it would be reasonably safe to assume that the number of
elements in the vector could be represented in an integer, and so you could
wait till the end.
My point (and I feel a little uncomfortable putting words in richi's mouth,
but I believe this was his point early on) is that he thinks of widest int as
an infinite precision representation.  He was the one pushing for the entire
rep to be done with a large internal (or perhaps unbounded) rep, because he
felt it was more natural not to have to think about overflow.  He wanted you
to be able to chain a mult and a divide and not see the product get truncated
before the divide was done.  The rep that we have now really sucks with
respect to this, because widest int truncates if you are close to the largest
precision on the machine and does not if you are small with respect to it.

My other point is that while you think that the example above is nice, the
experience with double-int is contrary to this.  People will say (and test)
the normal modes, and anyone trying to use large modes will die a terrible
death of a thousand cuts.
Well - the cases that matter in practice are

1) the things we have offset_int for - code that does bit vs. byte quantity
calculations on addresses or address offsets.  It used either HWI before
(and probably still does, and thus is buggy) or double-int.  The usual
pattern was/is to do host_integerp (t, 0) and then
TREE_LOW_CST (t) * BITS_PER_UNIT (oops), or to blindly assume that doing
things in double-int works (which it does in practice).  (A sketch of the old
pattern next to the offset_int one follows below.)

2) passes that want to know whether a single operation overflows

The multiple-operations-then-check-overflow-after-the-fact style is seldom
used - it is used mainly from the frontends, which use trees and thus get a
sticky TREE_OVERFLOW.  Yes, infinite precision would make this work as well,
and yes, originally I thought of basing all of wide-int on an internally
infinite precision implementation (and luckily we are close enough that I may
end up fixing the implementation detail to work that way ...).  With the
infinite precision internal rep you'd have explicit truncations and
sign-/zero-extensions at the right points, and failing to do that before
conversion to tree/RTX could easily have been turned into ICEs saying we
overflowed and nobody cared.
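
To make point 1) concrete, here is a rough sketch of the old host-dependent
pattern next to an offset_int version.  It is illustrative only, not code
from any particular pass: it uses the host_integerp/tree_low_cst spellings
mentioned above together with the wi::to_offset conversion, and assumes t is
a byte-offset INTEGER_CST.

  /* Old pattern: the scaling to bits is done in HOST_WIDE_INT, so a byte
     offset near the top of the HWI range silently wraps.  */
  HOST_WIDE_INT bitpos = 0;
  if (host_integerp (t, 0))
    bitpos = tree_low_cst (t, 0) * BITS_PER_UNIT;   /* oops */

  /* offset_int pattern: the bit/byte arithmetic is done in a type that is
     wide enough for any target address offset, so the multiply cannot wrap.  */
  offset_int obitpos = wi::to_offset (t) * BITS_PER_UNIT;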

Well.  Let's see how the thing we have now works out.
richi,

I think that you miss the point.  Right now on x86-64 the max size is
128 + 64 bits.  On my private platform it is 256 + 64 bits.

If, at the tree level, I do a widest-int multiply of two TImode values that
are large with respect to their precision, I get different answers on the two
targets.  95% of the existing trunk bugs that we fixed were because people
thought that double-int was big enough so that they did not have to care.
But what you and Richard are saying is that you do need to care (even though
none of the code currently does care), but only for rarely executed cases.
With widest-int and a larger buffer we can make that expectation true in all
of the places where you get in, do some finite work and get out.  In my
opinion, being able to do this is the only reason for widest-int, and if it
does not work reliably then I would be inclined to convert everything to the
explicit precision, where the definition of what is done is rock solid.
Because to me, if the rep is not completely predictable, then it is not worth
using.
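
A standalone analogue of that width dependence (plain C++, made-up values,
nothing to do with the wide-int code itself): once the true product no longer
fits, the width of the intermediate buffer decides the answer you get.

  #include <cstdint>
  #include <cstdio>

  int main ()
  {
    /* Two values that are "large with respect to their precision".  */
    uint32_t a = 0xFFFF0000u, b = 0x00010001u;

    uint32_t narrow = a * b;             /* product truncated to 32 bits */
    uint64_t wide = (uint64_t) a * b;    /* full product kept in 64 bits */

    printf ("32-bit intermediate: %#x\n", narrow);    /* 0xffff0000 */
    printf ("64-bit intermediate: %#llx\n",
            (unsigned long long) wide);               /* 0xffffffff0000 */
    return 0;
  }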
At the moment we do not have the infinite precision code.  We have
offset_int, which effectively has a precision, and widest int, which is
indeed somewhat useless.  We have fixed int, where you specify your desired
"infinite" precision, which is better.

Note that we have shifted the double-int host dependence to a widest-int
target dependence, which is equally bad.
Richard.
Yes, my point is that it does not need to be this bad; we can make the buffer
large enough to hide this, at least for the kind of bounded computation that
we have in the compiler.  Having written most of this code, we tend to get
in, do a couple of things and then get out.  If we up the buffer size, that
will always work reliably.

I realize that it will not be perfect, in that with enough bounded-size
computation you would still hit the limit.  But right now every multiply
(except the ones by 3 in the offset_int code) is a source of failure.  We can
do a lot better than that by upping the buffer size.

You and I have slightly different experiences here.  I did not do a lot of
the offset_int code and so I am not as familiar with the potential for bugs
there.  But I did a significant amount of the widest-int code, and for that,
2x + 2 bits makes all the potential problems go away.
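
As a back-of-the-envelope check of the 2x + 2 figure (plain C++, not GCC
code): the product of two N-bit values needs at most 2N bits, and each
further addition can grow the result by at most one bit, so a multiply
followed by a couple of adds stays inside a 2N + 2 bit buffer.

  #include <cstdint>
  #include <cassert>

  int main ()
  {
    const uint64_t max16 = 0xFFFFu;                   /* largest 16-bit input */
    uint64_t worst = max16 * max16 + max16 + max16;   /* multiply, then two adds */
    assert (worst < (UINT64_C (1) << 34));            /* fits in 2*16 + 2 bits */
    return 0;
  }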

My guess is that the multiplies in offset-int other than the by-3 ones most
likely need to be checked.
As long as we have persistent widest ints we cannot make them large.  Btw, most arithmetic is from user code and thus subject to language constraints, including overflow.  It's only when the compiler introduces artificial ops that possible issues start.

Richard.

When Richard S. made his last round of changes, which removed the performance issues with widest and offset ints, I was ready to concede that I was wrong in fighting you about your love of the pseudo infinite precision.  It is not that I disagree with your view that the persistent forms have to have a reasonable size; it is just that the cost that imposes on the correctness of the rest of the compiler is pretty steep.  I think it is clear that widest int is not the rep of choice where you have a choice between the default version and widest-int.

kenny




