This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
> Richard,
>
> I spent a good part of the afternoon talking to Mike about this. He is on
> the c++ standards committee and is a much more seasoned c++ programmer than
> I am.
>
> He convinced me that with a large amount of engineering and c++
> "foolishness" that it was indeed possible to get your proposal to POSSIBLY
> work as well as what we did.
>
> But now the question is why would any want to do this?
>
> At the very least you are talking about instantiating two instances of
> wide-ints, one for the stack allocated uses and one for the places where we
> just move a pointer from the tree or the rtx. Then you are talking about
> creating connectors so that the stack allocated functions can take
> parameters of pointer version and visa versa.
>
> Then there is the issue that rather than just saying that something is a
> wide int, that the programmer is going to have to track it's origin. In
> particular, where in the code right now i say.
>
> wide_int foo = wide_int::from_rtx (r1);
> wide_int bar = wide_int::from_rtx (r2) + foo;
>
> now i would have to say
>
> wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
> wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;
No, you'd say
wide_int foo = wide_int::from_rtx (r1);
and the static, non-templated from_rtx method would automagically
return (always!) a "wide_int_ptr" kind. The initialization then would
use the assignment operator that mediates between wide_int and
"wide_int_ptr", doing the copying.
The user should get a 'stack' kind by default when specifying wide_int,
like implemented with
struct wide_int_storage_stack;
struct wide_int_storage_ptr;
template <class storage = wide_int_storage_stack>
class wide_int : public storage
{
...
static wide_int <wide_int_storage_ptr> from_rtx (rtx);
}
the whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case
like in
wide_int res = wide_int::from_rtx (x) + 1;
if you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide
the copy somehow).
And of course for code like VRP that keeps a lattice of wide_ints to
be able to reduce its footprint by using ptr storage and explicit allocations
(that's a secondary concern, of course). And for VRP to specify that
it needs more than the otherwise needed MAX_INT_MODE_SIZE.
ptr storage would not have this arbitrary limitation, only embedded
storage (should) have.
> then when i want to call some function using a wide_int ref that function
> now must be either overloaded to take both or i have to choose one of the
> two instantiations (presumably based on which is going to be more common)
> and just have the compiler fix up everything (which it is likely to do).
Nope, they'd be
class wide_int ...
{
template <class storage1, class storage2>
wide_int operator+(wide_int <storage1> a, wide_int<storage2> b)
{
return wide_int::plus_worker (a.precision, a. ...., a.get_storage_ptr (),
b.precision, ...,
b.get_storage_ptr ());
}
> And so what is the payoff:
> 1) No one except the c++ elite is going to understand the code. The rest of
> the community will hate me and curse the ground that i walk on.
Maybe for the implementation - but look at hash-table and vec ... not for
usage certainly.
> 2) I will end up with a version of wide-int that can be used as a medium
> life container (where i define medium life as not allowed to survive a gc
> since they will contain pointers into rtxes and trees.)
> 3) An no clients that actually wanted to do this!! I could use as an
> example one of your favorite passes, tree-vrp. The current double-int
> could have been a medium lifetime container since it has a smaller
> footprint, but in fact tree-vrp converts those double-ints back into trees
> for medium storage. Why, because it needs the other fields of a tree-cst
> to store the entire state. Wide-ints also "suffer" this problem. their
> only state are the data, and the three length fields. They have no type
> and none of the other tree info so the most obvious client for a medium
> lifetime object is really not going to be a good match even if you "solve
> the storage problem".
>
> The fact is that wide-ints are an excellent short term storage class that
> can be very quickly converted into our two long term storage classes. Your
> proposal is requires a lot of work, will not be easy to use and as far as i
> can see has no payoff on the horizon. It could be that there could be
> future clients for a medium lifetime value, but asking for this with no
> clients in hand is really beyond the scope of a reasonable review.
>
> I remind you that the purpose of these patches is to solve problems that
> exist in the current compiler that we have papered over for years. If
> someone needs wide-ints in some way that is not foreseen then they can
> change it.
The patches introduce a lot more temporary wide-ints (your words) and
at the same time makes construction of them from tree / rtx very expensive
both stack space and compile-time wise. Look at how we for example
compute TREE_INT_CST + 1 - int_cst_binop internally uses double_ints
for the computation and then instantiates a new tree for holding the result.
Now we'd use wide_ints for this requring totally unnecessary copying.
Why not in the first place try to avoid that. And try to avoid making
wide_ints 4 times as large as really necessary just for the sake of VRP!
(VRP should have a way to say "_I_ want larger wide_ints", without putting
this burden on all other users).
Richard.
> kenny
>
>
> On 11/26/2012 11:30 AM, Richard Biener wrote:
>>
>> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>>
>>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>>>
>>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck
>>>> <zadeck@naturalbridge.com>
>>>> wrote:
>>>>>
>>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>>>
>>>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>>>>
>>>>>>>> I would like you to respond to at least point 1 of this email. In
>>>>>>>> it
>>>>>>>> there is code from the rtl level that was written twice, once for
>>>>>>>> the
>>>>>>>> case when the size of the mode is less than the size of a HWI and
>>>>>>>> once
>>>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>>>
>>>>>>>> my patch changes this to one instance of the code that works no
>>>>>>>> matter
>>>>>>>> how large the data passed to it is.
>>>>>>>>
>>>>>>>> you have made a specific requirement for wide int to be a template
>>>>>>>> that
>>>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>>>> I
>>>>>>>> would like to know how this particular fragment is to be rewritten
>>>>>>>> in
>>>>>>>> this model? It seems that I would have to retain the structure
>>>>>>>> where
>>>>>>>> there is one version of the code for each size that the template is
>>>>>>>> instantiated.
>>>>>>>
>>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>>> and the main wide_int class should be an extension on top of that
>>>>>>> that does things to a bit precision instead. Presumably with some
>>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>>
>>>>>>> typedef foo<2> double_int;
>>>>>>>
>>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>>> array). wide_int would also record the precision and apply it after
>>>>>>> the full HWI operation.
>>>>>>>
>>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>>> arithmetic,
>>>>>>> as in your rtl patch. I don't think he was objecting to that.
>>>>>>
>>>>>> That summarizes one part of my complaints / suggestions correctly. In
>>>>>> other
>>>>>> mails I suggested to not make it a template but a constant over object
>>>>>> lifetime
>>>>>> 'bitsize' (or maxlen) field. Both suggestions likely require more
>>>>>> thought
>>>>>> than
>>>>>> I put into them. The main reason is that with C++ you can abstract
>>>>>> from
>>>>>> where
>>>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>>>> operation
>>>>>> workers without copying the (source) "wide-int" objects. Thus you
>>>>>> should
>>>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>>>
>>>>> We had considered something along these lines and rejected it. I am
>>>>> not
>>>>> really opposed to doing something like this, but it is not an obvious
>>>>> winning idea and is likely not to be a good idea. Here was our
>>>>> thought
>>>>> process:
>>>>>
>>>>> if you abstract away the storage inside a wide int, then you should be
>>>>> able
>>>>> to copy a pointer to the block of data from either the rtl level
>>>>> integer
>>>>> constant or the tree level one into the wide int. It is certainly
>>>>> true
>>>>> that making a wide_int from one of these is an extremely common
>>>>> operation
>>>>> and doing this would avoid those copies.
>>>>>
>>>>> However, this causes two problems:
>>>>> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>>> make
>>>>> the object. it created the base object and then it allocated the
>>>>> array.
>>>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that
>>>>> had
>>>>> the array in it. Doing it this way saves one ggc allocation and one
>>>>> indirection when accessing the data within the CONST_WIDE_INT. Our
>>>>> plan
>>>>> is
>>>>> to use the same trick at the tree level. So to avoid the copying, you
>>>>> seem
>>>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>>>
>>>> I did not propose having a pointer to the data in the RTX or tree int.
>>>> Just
>>>> the short-lived wide-ints (which are on the stack) would have a pointer
>>>> to
>>>> the data - which can then obviously point into the RTX and tree data.
>>>
>>> There is the issue then what if some wide-ints are not short lived. It
>>> makes
>>> me nervous to create internal pointers to gc ed memory.
>>
>> I thought they were all short-lived.
>>
>>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>>> they
>>>>> are created as part of an expression or you have to play some game to
>>>>> represent the two different storage plans inside of wide_int.
>>>>
>>>> Hm? wide-ints are short-lived and thus never live across a garbage
>>>> collection
>>>> point. We create non-GCed objects pointing to GCed objects all the time
>>>> and everywhere this way.
>>>
>>> Again, this makes me nervous but it could be done. However, it does mean
>>> that now the wide ints that are not created from rtxes or trees will be
>>> more
>>> expensive because they are not going to get their storage "for free",
>>> they
>>> are going to alloca it.
>>
>> No, those would simply use the embedded storage model.
>>
>>> however, it still is not clear, given that 99% of the wide ints are going
>>> to
>>> fit in a single hwi, that this would be a noticeable win.
>>
>> Currently even if they fit into a HWI you will still allocate 4 times the
>> larges integer mode size. You say that doesn't matter because they
>> are short-lived, but I say it does matter because not all of them are
>> short-lived enough. If 99% fit in a HWI why allocate 4 times the
>> largest integer mode size in 99% of the cases?
>>
>>>>> Clearly this
>>>>> is where you think that we should be going by suggesting that we
>>>>> abstract
>>>>> away the internal storage. However, this comes at a price: what is
>>>>> currently an array access in my patches would (i believe) become a
>>>>> function
>>>>> call.
>>>>
>>>> No, the workers (that perform the array accesses) will simply get
>>>> a pointer to the first data element. Then whether it's embedded or
>>>> external is of no interest to them.
>>>
>>> so is your plan that the wide int constructors from rtx or tree would
>>> just
>>> copy the pointer to the array on top of the array that is otherwise
>>> allocated on the stack? I can easily do this. But as i said, the
>>> gain
>>> seems quite small.
>>>
>>> And of course, going the other way still does need the copy.
>>
>> The proposal was to template wide_int on a storage model, the embedded
>> one would work as-is (embedding 4 times largest integer mode), the
>> external one would have a pointer to data. All functions that return a
>> wide_int produce a wide_int with the embedded model. To avoid
>> the function call penalty you described the storage model provides
>> a way to get a pointer to the first element and the templated operations
>> simply dispatch to a worker that takes this pointer to the first element
>> (as the storage model is designed as a template its abstraction is going
>> to be optimized away by means of inlining).
>>
>> Richard.
>>
>>>>> From a performance point of view, i believe that this is a non
>>>>> starter. If you can figure out how to design this so that it is not a
>>>>> function call, i would consider this a viable option.
>>>>>
>>>>> On the other side of this you are clearly correct that we are copying
>>>>> the
>>>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>>>> But
>>>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>>>> the
>>>>> CONST_WIDE_INTs in a compressed form. Even with very big types, which
>>>>> are
>>>>> generally rare, the constants them selves are very small. So the copy
>>>>> operation is a loop that almost always copies one element, even with
>>>>> tree-vrp which doubles the sizes of every type.
>>>>>
>>>>> There is the third option which is that the storage inside the wide int
>>>>> is
>>>>> just ggced storage. We rejected this because of the functional nature
>>>>> of
>>>>> wide-ints. There are zillions created, they can be stack allocated,
>>>>> and
>>>>> they last for very short periods of time.
>>>>
>>>> Of course - GCing wide-ints is a non-starter.
>>>>
>>>>>>> As is probably obvious, I don't agree FWIW. It seems like an
>>>>>>> unnecessary
>>>>>>> complication without any clear use. Especially since the number of
>>>>>>
>>>>>> Maybe the double_int typedef is without any clear use. Properly
>>>>>> abstracting from the storage / information providers will save
>>>>>> compile-time, memory and code though. I don't see that any thought
>>>>>> was spent on how to avoid excessive copying or dealing with
>>>>>> long(er)-lived objects and their storage needs.
>>>>>
>>>>> I actually disagree. Wide ints can use a bloated amount of storage
>>>>> because they are designed to be very short lived and very low cost
>>>>> objects
>>>>> that are stack allocated. For long term storage, there is INT_CST at
>>>>> the
>>>>> tree level and CONST_WIDE_INT at the rtl level. Those use a very
>>>>> compact
>>>>> storage model. The copying entailed is only a small part of the
>>>>> overall
>>>>> performance.
>>>>
>>>> Well, but both trees and RTXen are not viable for short-lived things
>>>> because
>>>> the are GCed! double-ints were suitable for this kind of stuff because
>>>> the also have a moderate size. With wide-ints size becomes a problem
>>>> (or GC, if you instead use trees or RTXen).
>>>>
>>>>> Everything that you are suggesting along these lines is adding to the
>>>>> weight
>>>>> of a wide-int object.
>>>>
>>>> On the contrary - it lessens their weight (with external already
>>>> existing storage)
>>>> or does not do anything to it (with the embedded storage).
>>>>
>>>>> You have to understand there will be many more
>>>>> wide-ints created in a normal compilation than were ever created with
>>>>> double-int. This is because the rtl level had no object like this at
>>>>> all
>>>>> and at the tree level, many of the places that should have used double
>>>>> int,
>>>>> short cut the code and only did the transformations if the types fit in
>>>>> a
>>>>> HWI.
>>>>
>>>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>>>> will become a very frequent operation and thus it is worth optimizing
>>>> it.
>>>>
>>>>> This is why we are extremely defensive about this issue. We really
>>>>> did
>>>>> think a lot about it.
>>>>
>>>> I'm sure you did.
>>>>
>>>> Richard.
>>>
>>>
>