[RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
Richard Biener
richard.guenther@gmail.com
Wed Feb 17 10:48:00 GMT 2016
On Wed, Feb 17, 2016 at 8:30 AM, Jeff Law <law@redhat.com> wrote:
> On 02/14/2016 11:38 AM, Richard Biener wrote:
>>
>> On February 14, 2016 5:35:13 PM GMT+01:00, Jeff Law <law@redhat.com>
>> wrote:
>>>
>>> On 02/12/2016 10:21 AM, Jeff Law wrote:
>>>>>
>>>>> But really we simply need a better DSE algorithm.
>>>>
>>>> So the way to fix DSE is to keep the existing algorithm and track the
>>>> hunks of Complex/aggregates that have been set a second time.
>>>>
>>>> My first thought was to implement this as an inverted bitmap, i.e., set
>>>> it to 1 for every byte in the complex/aggregate that is set by the first
>>>> store.  Clear bits for each byte in subsequent stores to the pieces.  If
>>>> the bitmap reaches an empty state, then the initial store is dead.
>>>>
>>>> Adjusting *ref could work too, but would probably be painful if the
>>>> subsequent stores don't happen in a convenient order.
>>>
>>> So that was scary easy. We should have done this a long time ago.
>>>
>>> Essentially I call ao_get_ref_base to get the offset/size/max_size
>>> filled in for the first statement. Those are used to initialize the
>>> live bytes bitfield, as long as max_size != -1.
>>>
>>> Then when we have a possible killing statement, we use the ao in a
>>> similar manner to determine which bytes to clear (taking care that the
>>> base is the same between the two references and that in the killing
>>> statement the size/max_size are the same).
>>>
>>> When all the live bytes are zero then we've killed the original
>>> statement.
>>>
>>> It's ~20 lines of code.
>>>
>>> I need to pull together some additional tests, but it looks likely
>>> we'll
>>> be able to wrap this up easily for gcc-6.
>>
>>
>> BTW, we had something like this before, but it had both correctness and,
>> more importantly, scalability issues.
>
> Just a couple more tidbits.
>
> I instrumented a bootstrap -- the improved DSE finds ~20k additional DSE
> opportunities during a GCC bootstrap that could not be found by the current
> DSE. Yes, 20k additional statements deleted by tree DSE. Yow!
Well, DCE can also do quite some DSE, and it runs after DSE - did those 20k
additional DSE opportunities affect the overall end result?
> Of those additional opportunities, none requires bit-level tracking.  So we
> can just punt when the size/offset is not byte sized/aligned.
Yep.  I expect us to eventually lower all that bit-precision stuff.
> Of those additional opportunities > 99% are for sizes of 64 bytes or
> smaller. Thus we can pack those into 1 or 2 bitmap elements, depending on
> the starting offset. So the bitmap side will be efficient with no real
> searching if we choose our PARAM value wisely.
So then please use a uint64_t or even uint32_t mask.  Which means
a fixed-size SBITMAP (32 bits) if you like to use the bitmap interface.
> The outliers are, well, strange. There were cases where we found new DSE
> opportunities for objects of size 2k bytes or larger. There weren't many of
> these, but I was surprised at the size. Most likely it's a clobber or mem*
> thing that's participating in DSE. But I haven't looked closely at those
> cases yet.
I suspect it's memset followed by actually initializing all elements. We have
quite some of those I think.
> There's a ton of statements that are clobbering zero-sized objects. My code
> can determine when those clobbers are redundant (with some later clobber),
> but I haven't looked closely to see if that's actually a good thing to do or
> not.
>
> Anyway, I still don't see anything which makes me think this can't wrap-up
> in the immediate future.
Can you share your work-in-progress patch?
Thanks,
Richard.
> jeff