This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1



On 04/03/2013 05:17 AM, Richard Biener wrote:

In the end you will have a variable-size storage in TREE_INT_CST thus
you will have at least to emit _code_ copying over meta-data and data
from the tree representation to the wide-int (similar for RTX CONST_DOUBLE/INT).
I'm objecting to the amount of code you emit and agree that the runtime
cost is copying the meta-data (hopefully optimizable via CSE / SRA)
and in most cases one (or two) iterations of the loop copying the data
(not optimizable).
i did get rid of the bitsize in the wide-int patch so at this point the meta data is the precision and the len. not really a lot here. As usual we pay a high price in gcc for not pushing the tree rep down into the rtl level, then it would have been acceptable to have the tree type bleed into the wide-int code.


2)  You present this as if the implementor actually should care about the
implementation and you give 3 alternatives:  the double_int, the current
one, and HWI.     We have tried to make it so that the client should not
care.   Certainly in my experience here, I have not found a place to care.
Well, similar as for the copying overhead for tree your approach requires
overloading operations for HOST_WIDE_INT operands to be able to
say wi + 1 (which is certainly desirable), or the overhead of using
wide_int_one ().

In my opinion double_int needs to go away.  That is the main thrust of my
patches.   There is no place in a compiler for an abi that depends on
constants fitting into 2 two words whose size is defined by the host.
That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
preserve a way to ask for an integer type on the host with (at least)
N bits.  Almost all double-int users really ask for an integer type on the
host that has at least as many bits as the pointer representation (or
word_mode) on
the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
precision - that is just what happens to be there.  Thus I am providing
a way to say get me a host integer with at least N bits (VRP asks for
this, for example).

What I was asking for is that whatever can provide the above should share
the functional interface with wide-int (or the othert way around).  And I
was claiming that wide-int is too fat, because current users of double-int
eventually store double-ints permanently.
The problem is that, in truth, double int is too fat. 99.something% of all constants fit in 1 hwi and that is likely to be true forever (i understand that tree vpn may need some thought here). The rtl level, which has, for as long as i have known it, had 2 reps for integer constants. So it was relatively easy to slide the CONST_WIDE_INT in. It seems like the right trickery here rather than adding a storage model for wide-ints might be a way to use the c++ to invisibly support several (and by "several" i really mean 2) classes of TREE_CSTs.


This is not a beauty contest argument, we have public ports are beginning to
use modes that are larger than two x86-64 HWIs and i have a private port
that has such modes and it is my experience that any pass that uses this
interface has one of three behaviors: it silently gets the wrong answer, it
ices, or it fails to do the transformation.  If we leave double_int as an
available option, then any use of it potentially will have one of these
three behaviors.  And so one of my strong objections to this direction is
that i do not want to fight this kind of bug for the rest of my life.
Having a single storage model that just always works is in my opinion a
highly desirable option.  What you have never answered in a concrete manner
is, if we decide to provide this generality, what it would be used for.
There is no place in a portable compiler where the right answer for every
target is two HOST wide integers.

However, i will admit that the HWI option has some merits.   We try to
address this in our implementation by dividing what is done inline in
wide-int.h to the cases that fit in an HWI and then only drop into the heavy
code in wide-int.c if mode is larger (which it rarely will be).   However, a
case could be made that for certain kinds of things like string lengths and
such, we could use another interface or as you argue, a different storage
model with the same interface.   I just do not see that the cost of the
conversion code is really going to show up on anyone's radar.
What's the issue with abstracting away the model so a fixed-size 'len'
is possible?  (let away the argument that this would easily allow an
adaptor to tree)
I have a particularly pessimistic perspective because i have already written most of this patch. It is not that i do not want to change that code, it is that i have seen a certain set of mistakes that were made and i do not want to fix them more than once. At the rtl level you can see the transition from only supporting 32 bit ints to supporting 64 bit its to finally supporting two HWIs and that transition code is not pretty. My rtl patch fixes the places where an optimization was only made if the data type was 32 bits or smaller as well as the places where the optimization was made only if the data type is smaller than 64 bits (in addition to fixing all of the places where the code ices or simply gets the wrong answer if it is larger than TImode.) The tree level is only modestly better, I believe only because it is newer. I have not seen any 32 bit only code, but it is littered with transformations that only work for 64 bits. What is that 64 bit only code going to look like in 5 years?

I want to make it easier to write the general code than to write the code that only solves the problem for the size port that the implementor is currently working on. So I perceive the storage model as a way to keep having to fight this battle forever because it will allow the implementor to make a decision that the optimization only needs to be done for a particular sized integer.

However, i get the fact that from your perspective, what you really want is a solution to the data structure problem in tree-vrp. My patch for tree vrp scans the entire function to find the largest type used in the function and then does all of the math at 2x that size. But i have to admit that i thought it was weird that you use tree cst as your long term storage. If this were my pass, i would have allocated a 2d array of some type that was as large as the function in 1d and twice as large as the largest int used in the other dimension and not overloaded tree-cst and then had a set of friend functions in double int to get in and out. Of course you do not need friends in double int because the rep is exposed, but in wide-int that is now hidden since it now is purely functional.

I just have to believe that there is a better way to do tree-vrp than messing up wide-int for the rest of the compiler.

3) your trick will work at the tree level, but not at the rtl level.   The
wide-int code cannot share storage with the CONST_INTs.    We tried this,
and there are a million bugs that would have to be fixed to make it work.
It could have worked if CONST_INTs had carried a mode around, but since they
do not, you end up with the same CONST_INT sharing the rep for several
different types and that just did not work unless you are willing to do
substantial cleanups.
I don't believe you.  I am only asking for the adaptors to tree and RTL to
work in an RVALUE-ish way (no modification, as obviously RTL and tree
constants may be shared).  I think your claim is because you have that
precision and bitsize members in your wide-int which I believe is a
design red herring.  I suppose we should concentrate on addressing that
one first.  Thus, let me repeat a few questions on your design to eventually
let you understand my concerns:

Given two wide-ints, a and b, what precision will the result of
      a + b
have?  Consider a having precision 32 and b having precision 64
on a 32-bit HWI host.

You define wide-int to have no sign information:

+   The representation does not contain any information about
+   signedness of the represented value, so it can be used to represent
+   both signed and unsigned numbers.  For operations where the results
+   depend on signedness (division, comparisons), the signedness must
+   be specified separately.  For operations where the signness
+   matters, one of the operands to the operation specifies either
+   wide_int::SIGNED or wide_int::UNSIGNED.

but the result of operations on mixed precision operands _does_ depend
on the sign, nevertheless most operations do not get a signedness argument.
Nor would that help, because it depends on the signedness of the individual
operands!

double-int get's around this by having a common "precision" to which
all smaller precision constants have to be sign-/zero-extended.  So
does CONST_INT and CONST_DOUBLE.

Note that even with same precision you have introduced the same problem
with the variable len.

My proposal is simple - all wide-ints are signed!  wide-int is basically
an arbitrary precision signed integer format.  The sign is encoded in
the most significant bit of the last active HWI of the representation
(val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
with less precision than len * HOST_BITS_PER_WIDE_INT are
properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.

This let's you define mixed len operations by implicitely sign-/zero-extending
the operands to whatever len is required for the operation.

Furthermore it allows you to get rid of the precision member (and
bitsize anyway).
Conversion from tree / RTL requires information on the signedness of the
input (trivial for tree, for RTL all constants are signed - well,
sign-extended).
Whenever you want to transfer the wide-int to tree / RTL you have to
sign-/zero-extend according to the desired precision.  If you need sth else
than arbitrary precision arithmetic you have to explicitely truncate / extend
at suitable places - with overflow checking being trivial here.  For
optimization
purposes selected operations may benefit from a combined implementation
receiving a target precision and signedness.  Whatever extra meta-data
RTL requires does not belong in wide-int but in the RTX.  Trivially
a mode comes to my mind (on tree we have a type), and trivially
each RTX has a mode.  And each mode has a precision and bitsize.
It lacks a sign, so all RTL integer constants are sign-extended for
encoding efficiency purposes.  mixed-mode operations will not
occur (mixed len operations will), mixed-mode ops are exclusively
sign-/zero-extensions and truncations.

Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
precision would be { -1 }.

That was my proposal.  Now, can you please properly specify yours?

Thanks,
Richard.

On 04/02/2013 11:04 AM, Richard Biener wrote:
On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
This patch contains a large number of the changes requested by Richi.
It
does not contain any of the changes that he requested to abstract the
storage layer.   That suggestion appears to be quite unworkable.
I of course took this claim as a challenge ... with the following result.
It is
of course quite workable ;)

The attached patch implements the core wide-int class and three storage
models (fixed size for things like plain HWI and double-int, variable size
similar to how your wide-int works and an adaptor for the double-int as
contained in trees).  With that you can now do

HOST_WIDE_INT
wi_test (tree x)
{
    // template argument deduction doesn't do the magic we want it to do
    // to make this kind of implicit conversions work
    // overload resolution considers this kind of conversions so we
    // need some magic that combines both ... but seeding the overload
    // set with some instantiations doesn't seem to be possible :/
    // wide_int<> w = x + 1;
    wide_int<> w;
    w += x;
    w += 1;
    // template argument deduction doesn't deduce the return value type,
    // not considering the template default argument either ...
    // w = wi (x) + 1;
    // we could support this by providing rvalue-to-lvalue promotion
    // via a traits class?
    // otoh it would lead to sub-optimal code anyway so we should
    // make the result available as reference parameter and only support
    // wide_int <> res; add (res, x, 1); ?
    w = wi (x).operator+<wide_int<> >(1);
    wide_int<>::add(w, x, 1);
    return w.to_hwi ();
}

we are somewhat limited with C++ unless we want to get really fancy.
Eventually providing operator+ just doesn't make much sense for
generic wide-int combinations (though then the issue is its operands
are no longer commutative which I think is the case with your wide-int
or double-int as well - they don't suport "1 + wide_int" for obvious
reasons).

So there are implementation design choices left undecided.

Oh, and the operation implementations are crap (they compute nonsense).

But you should get the idea.

Richard.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]