This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1


On 04/03/2013 09:53 AM, Richard Biener wrote:
On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 04/03/2013 05:17 AM, Richard Biener wrote:

In the end you will have a variable-size storage in TREE_INT_CST thus
you will have at least to emit _code_ copying over meta-data and data
from the tree representation to the wide-int (similar for RTX
CONST_DOUBLE/INT).
I'm objecting to the amount of code you emit and agree that the runtime
cost is copying the meta-data (hopefully optimizable via CSE / SRA)
and in most cases one (or two) iterations of the loop copying the data
(not optimizable).
i did get rid of the bitsize in the wide-int patch so at this point the meta
data is the precision and the len.
not really a lot here.   As usual we pay a high price in gcc for not pushing
the tree rep down into the rtl level, then it would have been acceptable to
have the tree type bleed into the wide-int code.



2)  You present this as if the implementor actually should care about the
implementation and you give 3 alternatives:  the double_int, the current
one, and HWI.     We have tried to make it so that the client should not
care.   Certainly in my experience here, I have not found a place to
care.
Well, similar as for the copying overhead for tree your approach requires
overloading operations for HOST_WIDE_INT operands to be able to
say wi + 1 (which is certainly desirable), or the overhead of using
wide_int_one ().

In my opinion double_int needs to go away.  That is the main thrust of my
patches.   There is no place in a compiler for an abi that depends on
constants fitting into 2 two words whose size is defined by the host.
That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
preserve a way to ask for an integer type on the host with (at least)
N bits.  Almost all double-int users really ask for an integer type on the
host that has at least as many bits as the pointer representation (or
word_mode) on
the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
precision - that is just what happens to be there.  Thus I am providing
a way to say get me a host integer with at least N bits (VRP asks for
this, for example).

What I was asking for is that whatever can provide the above should share
the functional interface with wide-int (or the othert way around).  And I
was claiming that wide-int is too fat, because current users of double-int
eventually store double-ints permanently.
The problem is that, in truth, double int is too fat. 99.something% of all
constants fit in 1 hwi and that is likely to be true forever (i understand
that tree vpn may need some thought here).  The rtl level, which has, for as
long as i have known it, had 2 reps for integer constants. So it was
relatively easy to slide the CONST_WIDE_INT in.  It seems like the right
trickery here rather than adding a storage model for wide-ints might be a
way to use the c++ to invisibly support several (and by "several" i really
mean 2) classes of TREE_CSTs.
The truth is that _now_ TREE_INT_CSTs use double-ints and we have
CONST_INT and CONST_DOUBLE.  What I (and you) propose would
get us to use variable-size storage for both, allowing to just use a single
HOST_WIDE_INT in the majority of cases.  In my view the constant
length of the variable-size storage for TREE_INT_CSTs is determined
by its type (thus, it doesn't have "optimized" variable-size storage
but "unoptimized" fixed-size storage based on the maximum storage
requirement for the type).  Similar for RTX CONST_INT which would
have fixed-size storage based on the mode-size of the constant.
Using optimized space (thus using the encoding properties) requires you
to fit the 'short len' somewhere which possibly will not pay off in the end
(for tree we do have that storage available, so we could go with optimized
storage for it, not sure with RTL, I don't see available space there).
There are two questions here: one is the fact that you object to the fact that we represent small constants efficiently and the second is that we take advantage of the fact that fixed size stack allocation is effectively free for short lived objects like wide-ints (as i use them).

At the rtl level your idea does not work. rtl constants do not have a mode or type. So if you do not compress, how are you going to determine how many words you need for the constant 1. I would love to have a rep that had the mode in it. But it is a huge change that requires a lot of hacking to every port.

I understand that this makes me vulnerable to the argument that we should not let the rtl level ever dictate anything about the tree level, but the truth is that a variable len rep is almost always used for big integers. In our code, most constants of large types are small numbers. (Remember i got into this because the tree constant prop thinks that left shifting any number by anything greater than 128 is always 0 and discovered that that was just the tip of the iceberg.) But mostly i support the decision to canonize numbers to the smallest number of HWIs because most of the algorithms to do the math can be short circuited. I admit that if i had to effectively unpack most numbers to do the math, that the canonization would be a waste. However, this is not really relevant to this conversation. Yes, you could get rid of the len, but this such a small part of picture.

Furthermore, I am constrained at the rtl level because it is just too dirty to share the storage. We tried that and the amount of whack a mole we were playing was killing us.

I am comfortable making big changes at the portable level because i can easily test them, but changes to fundamental data structures inside every port is more than i am willing to do. If you are going to do that, then you are going to have start giving rtl constants modes and that is a huge change (that while desirable i cannot do).

At the tree level you could share the storage, I admit that that is easy, i just do not think that it is desirable. The stack allocation inside the wide-int is very cheap and the fact that the number that comes out is almost always one hwi makes this very efficient. Even if I was starting from scratch this would be a strong contender to be the right representation.

Fundamentally, i do not believe that the amount of copying that i am proposing at the tree level will be significant. Yes, if you force me to use an uncompressed format you can make what i propose expensive.

This is not a beauty contest argument, we have public ports are beginning
to
use modes that are larger than two x86-64 HWIs and i have a private port
that has such modes and it is my experience that any pass that uses this
interface has one of three behaviors: it silently gets the wrong answer,
it
ices, or it fails to do the transformation.  If we leave double_int as an
available option, then any use of it potentially will have one of these
three behaviors.  And so one of my strong objections to this direction is
that i do not want to fight this kind of bug for the rest of my life.
Having a single storage model that just always works is in my opinion a
highly desirable option.  What you have never answered in a concrete
manner
is, if we decide to provide this generality, what it would be used for.
There is no place in a portable compiler where the right answer for every
target is two HOST wide integers.

However, i will admit that the HWI option has some merits.   We try to
address this in our implementation by dividing what is done inline in
wide-int.h to the cases that fit in an HWI and then only drop into the
heavy
code in wide-int.c if mode is larger (which it rarely will be).
However, a
case could be made that for certain kinds of things like string lengths
and
such, we could use another interface or as you argue, a different storage
model with the same interface.   I just do not see that the cost of the
conversion code is really going to show up on anyone's radar.
What's the issue with abstracting away the model so a fixed-size 'len'
is possible?  (let away the argument that this would easily allow an
adaptor to tree)
I have a particularly pessimistic perspective because i have already written
most of this patch.   It is not that i do not want to change that code, it
is that i have seen a certain set of mistakes that were made and i do not
want to fix them more than once.   At the rtl level you can see the
transition from only supporting 32 bit ints to supporting 64 bit its to
finally supporting two HWIs and that transition code is not pretty.  My rtl
patch fixes the places where an optimization was only made if the data type
was 32 bits or smaller as well as the places where the optimization was made
only if the data type is smaller than 64 bits (in addition to fixing all of
the places where the code ices or simply gets the wrong answer if it is
larger than TImode.)  The tree level is only modestly better, I believe only
because it is newer. I have not seen any 32 bit only code, but it is
littered with transformations that only work for 64 bits.   What is that 64
bit only code going to look like in 5 years?
The user interface of wide-int does not depend on whether a storage model
is abstracted or not.  If you take advantage of the storage model by
making its interface leaner then it will.  But I guess converting everything
before settling on the wide-int interface may not have been the wisest
choice in the end (providing a wide-int that can literally replace double-int
would have got you testing coverage without any change besides
double-int.[ch] and wide-int.[ch]).
I think that it was exactly the correct decision. We have made significant changes to the structure as we have gone along. I have basically done most of what you have suggested, (the interface is completely functional, i got rid of the bitsize...) What you are running into is that mike stump, richard sandiford and myself actually believe that the storage model is a fundamentally bad idea.

I want to make it easier to write the general code than to write the code
that only solves the problem for the size port that the implementor is
currently working on.   So I perceive the storage model as a way to keep
having to fight this battle forever because it will allow the implementor to
make a decision that the optimization only needs to be done for a particular
sized integer.

However, i get the fact that from your perspective, what you really want is
a solution to the data structure problem in tree-vrp.
No, that's just a convenient example.  What I really want is a wide-int
that is less visibly a replacement for CONST_DOUBLE.
I think that that is unfair. Variable length reps are the standard technique for doing wide math. I am just proposing using data structures that are common best practices in the rest of the world and adapting them so that they match the gcc world better than just hacking in a gmp interface.

  My patch for tree vrp
scans the entire function to find the largest type used in the function and
then does all of the math at 2x that size.  But i have to admit that i
thought it was weird that you use tree cst as your long term storage.   If
this were my pass, i would have allocated a 2d array of some type that was
as large as the function in 1d and twice as large as the largest int used in
the other dimension and not overloaded tree-cst and then had a set of friend
functions in double int to get in and out. Of course you do not need friends
in double int because the rep is exposed, but in wide-int that is now hidden
since it now is purely functional.

I just have to believe that there is a better way to do tree-vrp than
messing up wide-int for the rest of the compiler.
It's not "messing up", it's making wide-int a generally useful thing and
not tying it so closely to RTL.
Again, this is unfair.

3) your trick will work at the tree level, but not at the rtl level.
The
wide-int code cannot share storage with the CONST_INTs.    We tried this,
and there are a million bugs that would have to be fixed to make it work.
It could have worked if CONST_INTs had carried a mode around, but since
they
do not, you end up with the same CONST_INT sharing the rep for several
different types and that just did not work unless you are willing to do
substantial cleanups.
I don't believe you.  I am only asking for the adaptors to tree and RTL to
work in an RVALUE-ish way (no modification, as obviously RTL and tree
constants may be shared).  I think your claim is because you have that
precision and bitsize members in your wide-int which I believe is a
design red herring.  I suppose we should concentrate on addressing that
one first.  Thus, let me repeat a few questions on your design to
eventually
let you understand my concerns:

Given two wide-ints, a and b, what precision will the result of
       a + b
have?  Consider a having precision 32 and b having precision 64
on a 32-bit HWI host.

You define wide-int to have no sign information:

+   The representation does not contain any information about
+   signedness of the represented value, so it can be used to represent
+   both signed and unsigned numbers.  For operations where the results
+   depend on signedness (division, comparisons), the signedness must
+   be specified separately.  For operations where the signness
+   matters, one of the operands to the operation specifies either
+   wide_int::SIGNED or wide_int::UNSIGNED.

but the result of operations on mixed precision operands _does_ depend
on the sign, nevertheless most operations do not get a signedness
argument.
Nor would that help, because it depends on the signedness of the
individual
operands!

double-int get's around this by having a common "precision" to which
all smaller precision constants have to be sign-/zero-extended.  So
does CONST_INT and CONST_DOUBLE.

Note that even with same precision you have introduced the same problem
with the variable len.

My proposal is simple - all wide-ints are signed!  wide-int is basically
an arbitrary precision signed integer format.  The sign is encoded in
the most significant bit of the last active HWI of the representation
(val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
with less precision than len * HOST_BITS_PER_WIDE_INT are
properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.

This let's you define mixed len operations by implicitely
sign-/zero-extending
the operands to whatever len is required for the operation.

Furthermore it allows you to get rid of the precision member (and
bitsize anyway).
Conversion from tree / RTL requires information on the signedness of the
input (trivial for tree, for RTL all constants are signed - well,
sign-extended).
Whenever you want to transfer the wide-int to tree / RTL you have to
sign-/zero-extend according to the desired precision.  If you need sth
else
than arbitrary precision arithmetic you have to explicitely truncate /
extend
at suitable places - with overflow checking being trivial here.  For
optimization
purposes selected operations may benefit from a combined implementation
receiving a target precision and signedness.  Whatever extra meta-data
RTL requires does not belong in wide-int but in the RTX.  Trivially
a mode comes to my mind (on tree we have a type), and trivially
each RTX has a mode.  And each mode has a precision and bitsize.
It lacks a sign, so all RTL integer constants are sign-extended for
encoding efficiency purposes.  mixed-mode operations will not
occur (mixed len operations will), mixed-mode ops are exclusively
sign-/zero-extensions and truncations.

Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
precision would be { -1 }.

That was my proposal.  Now, can you please properly specify yours?
And you chose to not answer that fundamental question of how your
wide-int is _supposed_ to work?  Ok, maybe I shouldn't have distracted
you with the bits before this.
sorry, i missed this question by accident.

we did not do infinite precision by design. We looked at the set of programming languages that gcc either compiles or might compile and we looked at the set of machines that gcc targets and neither of these two sets of entities define their operations in terms of infinite precision math. They always do math within a particular precision. Scripting languages do use infinite precision but gcc does not compile any of them. So unless you are going to restrict your set of operations to those that satisfy the properties of a ring, it is generally not strictly safe to do your math in infinite precision.

we do all of the math in the precision defined by the types or modes that are passed in.

constants are defined to be the size of the precision unless they can be compressed. The len field tells how many words are actually needed to exactly represent the constant if (len-1)*HOST_WIDE_BITS_PER_WIDE_INT is less than the precision. The decompression process is to add a number of words to the high order side of the number. These words must contain a zero if the highest represented bit is 0 and -1 if the highest represented bit is 1.

i.e. this looks a lot like sign extension, but the numbers them selves are not inherently signed or unsigned as in your representation. It is up to the operators to imply the signess. So the unsigned multiply operation is different than the signed multiply operation. But these are applied to the bits without an interpretation of their sign.

Richard.

Thanks,
Richard.

On 04/02/2013 11:04 AM, Richard Biener wrote:
On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
This patch contains a large number of the changes requested by Richi.
It
does not contain any of the changes that he requested to abstract the
storage layer.   That suggestion appears to be quite unworkable.
I of course took this claim as a challenge ... with the following
result.
It is
of course quite workable ;)

The attached patch implements the core wide-int class and three storage
models (fixed size for things like plain HWI and double-int, variable
size
similar to how your wide-int works and an adaptor for the double-int as
contained in trees).  With that you can now do

HOST_WIDE_INT
wi_test (tree x)
{
     // template argument deduction doesn't do the magic we want it to do
     // to make this kind of implicit conversions work
     // overload resolution considers this kind of conversions so we
     // need some magic that combines both ... but seeding the overload
     // set with some instantiations doesn't seem to be possible :/
     // wide_int<> w = x + 1;
     wide_int<> w;
     w += x;
     w += 1;
     // template argument deduction doesn't deduce the return value type,
     // not considering the template default argument either ...
     // w = wi (x) + 1;
     // we could support this by providing rvalue-to-lvalue promotion
     // via a traits class?
     // otoh it would lead to sub-optimal code anyway so we should
     // make the result available as reference parameter and only support
     // wide_int <> res; add (res, x, 1); ?
     w = wi (x).operator+<wide_int<> >(1);
     wide_int<>::add(w, x, 1);
     return w.to_hwi ();
}

we are somewhat limited with C++ unless we want to get really fancy.
Eventually providing operator+ just doesn't make much sense for
generic wide-int combinations (though then the issue is its operands
are no longer commutative which I think is the case with your wide-int
or double-int as well - they don't suport "1 + wide_int" for obvious
reasons).

So there are implementation design choices left undecided.

Oh, and the operation implementations are crap (they compute nonsense).

But you should get the idea.

Richard.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]