This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon


Richard,

I spent a good part of the afternoon talking to Mike about this. He is on the c++ standards committee and is a much more seasoned c++ programmer than I am.

He convinced me that with a large amount of engineering and c++ "foolishness" that it was indeed possible to get your proposal to POSSIBLY work as well as what we did.

But now the question is why would any want to do this?

At the very least you are talking about instantiating two instances of wide-ints, one for the stack allocated uses and one for the places where we just move a pointer from the tree or the rtx. Then you are talking about creating connectors so that the stack allocated functions can take parameters of pointer version and visa versa.

Then there is the issue that rather than just saying that something is a wide int, that the programmer is going to have to track it's origin. In particular, where in the code right now i say.

wide_int foo = wide_int::from_rtx (r1);
wide_int bar = wide_int::from_rtx (r2) + foo;

now i would have to say

wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

then when i want to call some function using a wide_int ref that function now must be either overloaded to take both or i have to choose one of the two instantiations (presumably based on which is going to be more common) and just have the compiler fix up everything (which it is likely to do).

And so what is the payoff:
1) No one except the c++ elite is going to understand the code. The rest of the community will hate me and curse the ground that i walk on.
2) I will end up with a version of wide-int that can be used as a medium life container (where i define medium life as not allowed to survive a gc since they will contain pointers into rtxes and trees.)
3) An no clients that actually wanted to do this!! I could use as an example one of your favorite passes, tree-vrp. The current double-int could have been a medium lifetime container since it has a smaller footprint, but in fact tree-vrp converts those double-ints back into trees for medium storage. Why, because it needs the other fields of a tree-cst to store the entire state. Wide-ints also "suffer" this problem. their only state are the data, and the three length fields. They have no type and none of the other tree info so the most obvious client for a medium lifetime object is really not going to be a good match even if you "solve the storage problem".


The fact is that wide-ints are an excellent short term storage class that can be very quickly converted into our two long term storage classes. Your proposal is requires a lot of work, will not be easy to use and as far as i can see has no payoff on the horizon. It could be that there could be future clients for a medium lifetime value, but asking for this with no clients in hand is really beyond the scope of a reasonable review.

I remind you that the purpose of these patches is to solve problems that exist in the current compiler that we have papered over for years. If someone needs wide-ints in some way that is not foreseen then they can change it.

kenny

On 11/26/2012 11:30 AM, Richard Biener wrote:
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
On 11/26/2012 10:03 AM, Richard Biener wrote:
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
wrote:
On 11/04/2012 11:54 AM, Richard Biener wrote:
On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
I would like you to respond to at least point 1 of this email.   In it
there is code from the rtl level that was written twice, once for the
case when the size of the mode is less than the size of a HWI and once
for the case where the size of the mode is less that 2 HWIs.

my patch changes this to one instance of the code that works no matter
how large the data passed to it is.

you have made a specific requirement for wide int to be a template
that
can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
I
would like to know how this particular fragment is to be rewritten in
this model?   It seems that I would have to retain the structure where
there is one version of the code for each size that the template is
instantiated.
I think richi's argument was that wide_int should be split into two.
There should be a "bare-metal" class that just has a length and HWIs,
and the main wide_int class should be an extension on top of that
that does things to a bit precision instead.  Presumably with some
template magic so that the length (number of HWIs) is a constant for:

typedef foo<2> double_int;

and a variable for wide_int (because in wide_int the length would be
the number of significant HWIs rather than the size of the underlying
array).  wide_int would also record the precision and apply it after
the full HWI operation.

So the wide_int class would still provide "as wide as we need"
arithmetic,
as in your rtl patch.  I don't think he was objecting to that.
That summarizes one part of my complaints / suggestions correctly.  In
other
mails I suggested to not make it a template but a constant over object
lifetime
'bitsize' (or maxlen) field.  Both suggestions likely require more
thought
than
I put into them.  The main reason is that with C++ you can abstract from
where
wide-int information pieces are stored and thus use the arithmetic /
operation
workers without copying the (source) "wide-int" objects.  Thus you
should
be able to write adaptors for double-int storage, tree or RTX storage.
We had considered something along these lines and rejected it.   I am not
really opposed to doing something like this, but it is not an obvious
winning idea and is likely not to be a good idea.   Here was our thought
process:

if you abstract away the storage inside a wide int, then you should be
able
to copy a pointer to the block of data from either the rtl level integer
constant or the tree level one into the wide int.   It is certainly true
that making a wide_int from one of these is an extremely common operation
and doing this would avoid those copies.

However, this causes two problems:
1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
make
the object.   it created the base object and then it allocated the array.
Richard S noticed that we could just allocate one CONST_WIDE_INT that had
the array in it.   Doing it this way saves one ggc allocation and one
indirection when accessing the data within the CONST_WIDE_INT.   Our plan
is
to use the same trick at the tree level.   So to avoid the copying, you
seem
to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
I did not propose having a pointer to the data in the RTX or tree int.
Just
the short-lived wide-ints (which are on the stack) would have a pointer to
the data - which can then obviously point into the RTX and tree data.
There is the issue then what if some wide-ints are not short lived. It makes
me nervous to create internal pointers to gc ed memory.
I thought they were all short-lived.

2) You are now stuck either ggcing the storage inside a wide_int when
they
are created as part of an expression or you have to play some game to
represent the two different storage plans inside of wide_int.
Hm?  wide-ints are short-lived and thus never live across a garbage
collection
point.  We create non-GCed objects pointing to GCed objects all the time
and everywhere this way.
Again, this makes me nervous but it could be done.  However, it does mean
that now the wide ints that are not created from rtxes or trees will be more
expensive because they are not going to get their storage "for free", they
are going to alloca it.
No, those would simply use the embedded storage model.

however, it still is not clear, given that 99% of the wide ints are going to
fit in a single hwi, that this would be a noticeable win.
Currently even if they fit into a HWI you will still allocate 4 times the
larges integer mode size.  You say that doesn't matter because they
are short-lived, but I say it does matter because not all of them are
short-lived enough.  If 99% fit in a HWI why allocate 4 times the
largest integer mode size in 99% of the cases?

    Clearly this
is where you think that we should be going by suggesting that we abstract
away the internal storage.   However, this comes at a price:   what is
currently an array access in my patches would (i believe) become a
function
call.
No, the workers (that perform the array accesses) will simply get
a pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.
so is your plan that the wide int constructors from rtx or tree would just
copy the pointer to the array on top of the array that is otherwise
allocated on the stack?    I can easily do this.   But as i said, the gain
seems quite small.

And of course, going the other way still does need the copy.
The proposal was to template wide_int on a storage model, the embedded
one would work as-is (embedding 4 times largest integer mode), the
external one would have a pointer to data.  All functions that return a
wide_int produce a wide_int with the embedded model.  To avoid
the function call penalty you described the storage model provides
a way to get a pointer to the first element and the templated operations
simply dispatch to a worker that takes this pointer to the first element
(as the storage model is designed as a template its abstraction is going
to be optimized away by means of inlining).

Richard.

   From a performance point of view, i believe that this is a non
starter. If you can figure out how to design this so that it is not a
function call, i would consider this a viable option.

On the other side of this you are clearly correct that we are copying the
data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
But
this is why we represent data inside of the wide_ints, the INT_CSTs and
the
CONST_WIDE_INTs in a compressed form.   Even with very big types, which
are
generally rare, the constants them selves are very small.   So the copy
operation is a loop that almost always copies one element, even with
tree-vrp which doubles the sizes of every type.

There is the third option which is that the storage inside the wide int
is
just ggced storage.  We rejected this because of the functional nature of
wide-ints.    There are zillions created, they can be stack allocated,
and
they last for very short periods of time.
Of course - GCing wide-ints is a non-starter.

As is probably obvious, I don't agree FWIW.  It seems like an
unnecessary
complication without any clear use.  Especially since the number of
Maybe the double_int typedef is without any clear use.  Properly
abstracting from the storage / information providers will save
compile-time, memory and code though.  I don't see that any thought
was spent on how to avoid excessive copying or dealing with
long(er)-lived objects and their storage needs.
I actually disagree.    Wide ints can use a bloated amount of storage
because they are designed to be very short lived and very low cost
objects
that are stack allocated.   For long term storage, there is INT_CST at
the
tree level and CONST_WIDE_INT at the rtl level.  Those use a very compact
storage model.   The copying entailed is only a small part of the overall
performance.
Well, but both trees and RTXen are not viable for short-lived things
because
the are GCed!  double-ints were suitable for this kind of stuff because
the also have a moderate size.  With wide-ints size becomes a problem
(or GC, if you instead use trees or RTXen).

Everything that you are suggesting along these lines is adding to the
weight
of a wide-int object.
On the contrary - it lessens their weight (with external already
existing storage)
or does not do anything to it (with the embedded storage).

   You have to understand there will be many more
wide-ints created in a normal compilation than were ever created with
double-int.    This is because the rtl level had no object like this at
all
and at the tree level, many of the places that should have used double
int,
short cut the code and only did the transformations if the types fit in a
HWI.
Your argument shows that the copy-in/out from tree/RTX to/from wide-int
will become a very frequent operation and thus it is worth optimizing it.

This is why we are extremely defensive about this issue.   We really did
think a lot about it.
I'm sure you did.

Richard.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]