This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
  wide_int foo = wide_int::from_rtx (r1);
  wide_int bar = wide_int::from_rtx (r2) + foo;

  wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
  wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 11/26/2012 10:03 AM, Richard Biener wrote:

I thought they were all short-lived.

On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
On 11/04/2012 11:54 AM, Richard Biener wrote:
On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
Kenneth Zadeck <zadeck@naturalbridge.com> writes:

I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less than 2 HWIs.
my patch changes this to one instance of the code that works no matter how large the data passed to it is.
you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWIs. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size at which the template is instantiated.

I think richi's argument was that wide_int should be split into two. There should be a "bare-metal" class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for:
typedef foo<2> double_int;
and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation.
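The split being proposed could be sketched roughly like this (all names here are hypothetical illustrations, not identifiers from the actual patches):

```cpp
#include <cassert>

typedef long long HWI;  // stand-in for HOST_WIDE_INT

// "Bare-metal" class: just a length and HWIs. N is a compile-time
// constant, so the length costs nothing at runtime.
template <int N>
struct fixed_wide
{
  HWI val[N];
  enum { len = N };  // length is a constant for fixed instantiations
};

// double_int would then be the 2-HWI instantiation.
typedef fixed_wide<2> double_int_sketch;

// wide_int instead records a runtime length (the number of
// significant HWIs, not the size of the underlying array) and a
// precision that is applied after the full-HWI operation.
struct wide_int_sketch
{
  enum { MAX_HWIS = 4 };
  HWI val[MAX_HWIS];
  int len;        // significant HWIs, variable at runtime
  int precision;  // results are truncated to this many bits
};
```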
So the wide_int class would still provide "as wide as we need" arithmetic, as in your rtl patch. I don't think he was objecting to that.

That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant-over-object-lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract away where the wide-int pieces of information are stored and thus use the arithmetic / operation workers without copying the (source) "wide-int" objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage.

We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obviously winning idea and may well not be a good one. Here was our thought process:
if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies.
However, this causes two problems: 1) Mike's first cut at CONST_WIDE_INT did two ggc allocations to make the object: it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data.

There is the issue then of what happens if some wide-ints are not short-lived. It makes me nervous to create internal pointers to GCed memory.
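The single-allocation trick mentioned above is the familiar trailing-array layout; a hypothetical sketch, with malloc standing in for the ggc allocator and invented names throughout:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

typedef long long HWI;

// One allocation holds both the header and the HWI array, so
// reading an element needs no second indirection through a
// separately allocated array.
struct const_wide_int_sketch
{
  int len;     // number of HWIs that follow
  HWI val[1];  // trailing array, over-allocated below
};

// Allocate the header plus space for n HWIs in a single block
// (the real code would use the ggc allocator, not malloc).
static const_wide_int_sketch *
make_const_wide_int (const HWI *src, int n)
{
  size_t size = sizeof (const_wide_int_sketch) + (n - 1) * sizeof (HWI);
  const_wide_int_sketch *p = (const_wide_int_sketch *) malloc (size);
  p->len = n;
  memcpy (p->val, src, n * sizeof (HWI));
  return p;
}
```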
No, those would simply use the embedded storage model.

2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression, or you have to play some game to represent the two different storage plans inside of wide_int.

Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way.

Again, this makes me nervous but it could be done. However, it does mean that the wide ints that are not created from rtxes or trees will be more expensive, because they are not going to get their storage "for free"; they are going to alloca it.
However, it still is not clear, given that 99% of the wide ints are going to fit in a single HWI, that this would be a noticeable win.

Currently, even if they fit into a HWI you will still allocate 4 times the largest integer mode size. You say that doesn't matter because they are short-lived, but I say it does matter because not all of them are short-lived enough. If 99% fit in a HWI, why allocate 4 times the largest integer mode size in 99% of the cases?
Clearly this is where you think that we should be going by suggesting that we abstract away the internal storage. However, this comes at a price: what is currently an array access in my patches would (I believe) become a function call.

No, the workers (that perform the array accesses) will simply get a pointer to the first data element. Then whether it's embedded or external is of no interest to them.

So is your plan that the wide int constructors from rtx or tree would just copy the pointer to the array on top of the array that is otherwise allocated on the stack? I can easily do this. But as I said, the gain seems quite small.
And of course, going the other way still does need the copy.

The proposal was to template wide_int on a storage model: the embedded one would work as-is (embedding 4 times the largest integer mode), while the external one would have a pointer to the data. All functions that return a wide_int produce one with the embedded model. To avoid the function-call penalty you described, the storage model provides a way to get a pointer to the first element, and the templated operations simply dispatch to a worker that takes this pointer (as the storage model is designed as a template, its abstraction is going to be optimized away by means of inlining).
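The storage-model proposal could be sketched as follows — a toy illustration under assumed names, not the actual interface of any patch:

```cpp
#include <cassert>

typedef long long HWI;

// Embedded storage: the array lives inside the object itself,
// as in the posted patches (sized for the largest supported mode).
struct embedded_storage
{
  HWI val[4];
  const HWI *elements () const { return val; }
};

// External storage: points into already existing data (e.g. the
// trailing array of a CONST_WIDE_INT), so no copy is needed.
struct external_storage
{
  const HWI *ptr;
  const HWI *elements () const { return ptr; }
};

// The worker performs the actual array accesses through a plain
// pointer; where the HWIs live is of no interest to it.
static HWI
add_low_worker (const HWI *a, const HWI *b)
{
  return a[0] + b[0];  // toy operation on the least-significant HWI
}

// The templated operation just dispatches to the worker.  Because
// elements () is a tiny inline function, the abstraction is
// expected to be optimized away by inlining.
template <typename S1, typename S2>
HWI
add_low (const S1 &a, const S2 &b)
{
  return add_low_worker (a.elements (), b.elements ());
}
```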
Richard.
Of course - GCing wide-ints is a non-starter.

From a performance point of view, I believe that this is a non-starter. If you can figure out how to design this so that it is not a function call, I would consider this a viable option.
On the other side of this, you are clearly correct that we are copying the data when we make wide ints from INT_CSTs or CONST_WIDE_INTs. But this is why we represent the data inside the wide_ints, the INT_CSTs and the CONST_WIDE_INTs in a compressed form. Even with very big types, which are generally rare, the constants themselves are very small. So the copy operation is a loop that almost always copies one element, even with tree-vrp, which doubles the sizes of every type.
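The compressed form being described can be sketched like this (hypothetical helper names; the real encoding details differ, but the idea is that redundant sign-extension HWIs are not stored, so the copy loop nearly always runs once):

```cpp
#include <cassert>

typedef long long HWI;

// Compute the compressed length: trailing HWIs that are pure sign
// extension of the element below them need not be stored.
static int
compressed_len (const HWI *val, int n)
{
  while (n > 1 && val[n - 1] == (val[n - 2] < 0 ? -1 : 0))
    n--;
  return n;
}

// Constructing a wide value from the compressed form copies only
// the significant HWIs -- in practice almost always a single one.
static int
copy_compressed (HWI *dst, const HWI *src, int len)
{
  for (int i = 0; i < len; i++)
    dst[i] = src[i];
  return len;
}
```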
There is the third option which is that the storage inside the wide int is just ggced storage. We rejected this because of the functional nature of wide-ints. There are zillions created, they can be stack allocated, and they last for very short periods of time.
As is probably obvious, I don't agree FWIW. It seems like an unnecessary complication without any clear use. Especially since the number of

Maybe the double_int typedef is without any clear use. Properly abstracting from the storage / information providers will save compile time, memory and code though. I don't see that any thought was spent on how to avoid excessive copying or how to deal with long(er)-lived objects and their storage needs.

I actually disagree. Wide ints can use a bloated amount of storage because they are designed to be very short-lived and very low-cost objects that are stack allocated. For long-term storage, there is INT_CST at the tree level and CONST_WIDE_INT at the rtl level. Those use a very compact storage model. The copying entailed is only a small part of the overall performance.

Well, but both trees and RTXen are not viable for short-lived things because they are GCed! double-ints were suitable for this kind of stuff because they also have a moderate size. With wide-ints, size becomes a problem (or GC, if you instead use trees or RTXen).
Everything that you are suggesting along these lines is adding to the weight of a wide-int object.

On the contrary - it lessens their weight (with external, already existing storage) or does not do anything to it (with the embedded storage).
You have to understand that there will be many more wide-ints created in a normal compilation than were ever created with double-int. This is because the rtl level had no object like this at all, and at the tree level many of the places that should have used double-int short-cut the code and only did the transformations if the types fit in a HWI.

Your argument shows that the copy-in/out from tree/RTX to/from wide-int will become a very frequent operation and thus it is worth optimizing it.
I'm sure you did.

This is why we are extremely defensive about this issue. We really did think a lot about it.
Richard.