This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: patch to fix constant math - 4th patch - the wide-int class.



On 10/31/2012 10:24 AM, Richard Biener wrote:
On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 10/31/2012 10:05 AM, Richard Biener wrote:
On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 10/31/2012 08:11 AM, Richard Biener wrote:
On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
Richard Biener <richard.guenther@gmail.com> writes:
On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
Richard Biener <richard.guenther@gmail.com> writes:
On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 10/25/2012 06:42 AM, Richard Biener wrote:
On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com> wrote:
On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 10/23/2012 10:12 AM, Richard Biener wrote:
+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with max byte mode size 4 on a 64bit host.
I do not believe that this can happen.  The core compiler includes all modes up to TI mode, so by default we are already up to 128 bits.
And mode bitsizes are always power-of-two?  I suppose so.
Actually, no, they are not.  Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size.  Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it.

I still would like to have the ability to provide specializations of wide_int for "small" sizes, thus ideally wide_int would be a template templated on the number of HWIs in val.  Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;
If you want to go down this path after the patches get in, go for it.  I see no use at all for this.
This was not meant to be a plug-in replacement for double_int.  The goal of this patch is to get the compiler to do the constant math the way that the target does it.  Any such instantiation is by definition placing some predefined limit that some target may not want.
Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers.  What I also don't like too much is that we have two different interfaces to operate on them!  Can't you see how I come to not liking this?  Especially the latter ...
double_int is logically dead.  Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler.  All the necessary semantics and code of double-int _has_ been refactored into wide-int already.  Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, all the API changes made so that double-int shares from wide-int are wasted and must then be removed.  The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.
double_int, compared to wide_int, is fast and lean.  I doubt we will get rid of it - you will make compile-time math a _lot_ slower.  Just profile when you for example change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2.

It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_.

Richard.
i do not know why you believe this about the speed.  double_int always does synthetic math, since you do everything at 128-bit precision.

the thing about wide_int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine.  almost every call has a check to see if the operation can be done natively.  I seriously doubt that you are going to do TI mode math much faster than i do it, and if you do, who cares.

the number of calls does not affect the performance in any negative way, and in fact it is more efficient, since common things that require more than one operation in double_int are typically done in a single operation.
Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function).
OK, this is really in reply to the 4.8 thing, but it felt more appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in favour of separate and() and not() operations, but why
not provide a fused one if there are clients who'll make use of it?
I was more concerned about fused operations that use precision or bitsize as input.  That is for example

+  bool only_sign_bit_p (unsigned int prec) const;
+  bool only_sign_bit_p () const;

The first constructs a wide-int with precision prec (and sign- or zero-extends it) and then calls only_sign_bit_p on it.  Such a function should not be necessary, and existing callers should be questioned instead of introducing it.

In fact wide-int seems to have so many "fused" operations that we run out of sensible recognizable names for them.  Which results in a lot of confusion about what the functions actually do (at least for me).
Well, I suppose I can't really say anything useful either way on
that one, since I'm not writing the patch and I'm not reviewing it :-)

I think Kenny's API is just taking that to its logical conclusion.
There doesn't seem to be anything sacrosanct about the current choice
of what's fused and what isn't.
Maybe.  I'd rather have seen an initial small wide-int API and fused operations introduced separately together with the places they are used.  In the current way it's way too tedious to go over all of them and match them with callers, look up enough context and then make up my mind on whether the caller should do something different or not.

Thus, consider the big initial API a reason that all this review takes
so long ...

The speed problem we had using trees for internal arithmetic isn't
IMO a good argument for keeping double_int in preference to wide_int.
Allocating and constructing tree objects to hold temporary values,
storing an integer representation in it, then calling tree arithmetic
routines that pull out the integer representation again and create a
tree to hold the result, is going to be much slower than using either
double_int or wide_int.  I'd be very surprised if we notice any
measurable difference between double_int and wide_int here.

I still see no reason to keep double_int around.  The width of a host
wide integer really shouldn't have any significance.

Your main complaint seems to be that the wide_int API is different from the double_int one, but we can't literally use the same API, since double_int has an implicit precision and bitsize, and wide_int doesn't.  Having a precision that is separate from the underlying representation is IMO the most important feature of wide_int, so:

typedef wide_int<2> double_int;

is never going to be a drop-in, API-compatible replacement for double_int.
My reasoning was that if you strip wide-int of precision and bitsize you have a double_int<N> class.
But you don't! Because...

Thus wide-int should have a base of that kind and just add precision / bitsize on top of that.  It wouldn't be a step forward if we end up replacing double_int uses with wide_int uses with a precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
...the precision and bitsize aren't an optional extra, either conceptually or in implementation.  wide_int happens to use N HOST_WIDE_INTs under the hood, but the value of N is an internal implementation detail.  No operations are done on N HWIs; they're done on the number of bits in the operands.  Whereas a double_int<N> class does everything on N HWIs.
If that's the only effect then either bitsize or precision is redundant (and we also have len ...).  Note I did not mention len above; thus the base class would retain 'len' and double-int would simply use 2 for it (if you don't template it but make it variable).

Richard.

NO, in your own words, there are two parts of the compiler that want the infinite model.  The rest wants to do the math the way the target does it.  My version now accommodates both.  In tree-vrp it scans the gimple and determines what the largest type is, and that is the basis of all of the math in this pass.  If you just make double_int bigger, then you are paying for big math everywhere.
You have an artificial limit on what 'len' can be.  And you do not accommodate users that do not want to pay the storage penalty for that arbitrary upper limit choice.  That's all because 'len' may grow (mutate).  You could alternatively not allow bitsize to grow / mutate and have allocation tied to bitsize instead of len.
It is not artificial, it is based on the target.  I chose to do it that way because i "knew" that having a fixed size would be faster than doing an alloca for every wide-int.  If we feel that it is important to be able to do truly arbitrary infinite precision arithmetic (as opposed to just some fixed amount larger than the size of the type) then we can have a subclass that does this.  However, there is no need to do this in the compiler currently, and so using this as an argument against wide-int is really not fair.
Well, it is artificial, as you had to increase it by a factor of two to handle the use in VRP.  It is also "artificial" as it wastes storage.

However, as machines allow wider math, you really do not want to penalize every program that is compiled with this burden.  It was OK for double_int to do so, but when machines commonly have OImode, it will still be the case that more than 99% of the variables will be 64 bits or less.

I do worry a lot about the effect of adding layers like this on the efficiency of the compiler.  You made a valid point that a lot of the double_int routines could be done inline, and my plan is to take the parts of the functions that notice that the precision fits in a HWI and do them inline.

If your proposed solution causes a function call to access the elements,
then we are doomed.
Well, if it causes a function call to access a pointer to the array of elements
which you can cache then it wouldn't be that bad.  And with templates
we can inline the access anyway.
I do not see how to do this with templates without building in the largest template size.


Richard.

Ideally the wide-int interface would have two storage models:

class alloc_storage
{
    unsigned len; /* or bitsize */
    HOST_WIDE_INT *hwis;

    HOST_WIDE_INT& operator[] (unsigned i) { return hwis[i]; }
};

class max_mode
{
    HOST_WIDE_INT hwis[largest integer mode size in hwi];

    HOST_WIDE_INT& operator[] (unsigned i) { return hwis[i]; }
};

template <class storage>
class wide_int : storage
{
  ...
};

so you can even re-use the (const) in-place storage of INTEGER_CSTs or RTL CONST_WIDEs.  And VRP would simply have its own storage supporting 2 times the largest integer mode (or whatever choice it has).  And double-int would simply embed two.

Maybe this is the perfect example for introducing virtual functions as well, to ease inter-operability between the wide-int variants without making each member a template on the 2nd wide-int operand (it's of course auto-deduced, but well ...).

The above is just a brain-dump; details may need further thinking.  Like the operator[]: maybe the storage model should just be able to return a pointer to the array of HWIs, so that the actual workers can be out-of-line and non-templates.

Richard.

Richard


