This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.


On 02/14/2016 11:38 AM, Richard Biener wrote:
On February 14, 2016 5:35:13 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
On 02/12/2016 10:21 AM, Jeff Law wrote:
But really we simply need a better DSE algorithm.
So the way to fix DSE is to keep the existing algorithm and track the
hunks of Complex/aggregates that have been set a second time.

My first thought was to implement this as an inverted bitmap.  ie,
set
it to 1 for every byte in the complex/aggregate that is set by the
first
store.  Clear bits for each byte in subsequent stores to the pieces.
If
the bitmap reaches an empty state, then the initial store is dead.

Adjusting *ref could work too, but would probably be painful if the
subsequent stores don't happen in a convenient order.
So that was scary easy.  We should have done this a long time ago.

Essentially I call ao_get_ref_base to get the offset/size/max_size
filled in for the first statement.  Those are used to initialize the
live bytes bitfield, as long as max_size != -1.

Then when we have a possible killing statement, we use the ao in a
similar manner to determine which bytes to clear (taking care that the
base is the same between the two references and that in the killing
statement that the size/max_size are the same.).

When all the live bytes are zero then we've killed the original
statement.

It's ~20 lines of code.

I need to pull together some additional tests, but it looks likely
we'll
be able to wrap this up easily for gcc-6.

BTW, we had sth like this before but it had both correctness and more importantly scalability issues.
Just a couple more tibits.

I instrumented a bootstrap -- the improved DSE finds ~20k additional DSE opportunities during a GCC bootstrap that could not be found by the current DSE. Yes, 20k additional statements deleted by tree DSE. Yow!

Of those additional opportunities, none require bit level tracking. So we can just punt when the size/offset is not byte sized/aligned.

Of those additional opportunities > 99% are for sizes of 64 bytes or smaller. Thus we can pack those into 1 or 2 bitmap elements, depending on the starting offset. So the bitmap side will be efficient with no real searching if we choose our PARAM value wisely.

The outliers are, well, strange. There were cases where we found new DSE opportunities for objects of size 2k bytes or larger. There weren't many of these, but I was surprised at the size. Most likely it's a clobber or mem* thing that's participating in DSE. But I haven't looked closely at those cases yet.

There's a ton of statements that are clobbering zero-sized objects. My code can determine when those clobbers are redundant (with some later clobber), but I haven't looked closely to see if that's actually a good thing to do or not.

Anyway, I still don't see anything which makes me think this can't wrap-up in the immediate future.

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]