This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.

From: Jeff Law <law at redhat dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Wed, 17 Feb 2016 00:30:07 -0700
Subject: Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
Authentication-results: sourceware.org; auth=none
References: <56BD1EFB dot 90008 at redhat dot com> <CAFiYyc3iYgzrprV2V=C02nbAmZntPy7QH_=Dxo_UXps7sneh_g at mail dot gmail dot com> <56BE14B0 dot 9040801 at redhat dot com> <56C0ACC1 dot 60905 at redhat dot com> <0D3F08EF-9DC3-4848-AEC7-1AE639464B3D at gmail dot com>

On 02/14/2016 11:38 AM, Richard Biener wrote:

On February 14, 2016 5:35:13 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:

On 02/12/2016 10:21 AM, Jeff Law wrote:

But really we simply need a better DSE algorithm.

So the way to fix DSE is to keep the existing algorithm and track the
hunks of Complex/aggregates that have been set a second time.

My first thought was to implement this as an inverted bitmap.  ie,

set

it to 1 for every byte in the complex/aggregate that is set by the

first

store.  Clear bits for each byte in subsequent stores to the pieces.

If

the bitmap reaches an empty state, then the initial store is dead.

Adjusting *ref could work too, but would probably be painful if the
subsequent stores don't happen in a convenient order.

So that was scary easy.  We should have done this a long time ago.

Essentially I call ao_get_ref_base to get the offset/size/max_size
filled in for the first statement.  Those are used to initialize the
live bytes bitfield, as long as max_size != -1.

Then when we have a possible killing statement, we use the ao in a
similar manner to determine which bytes to clear (taking care that the
base is the same between the two references and that in the killing
statement that the size/max_size are the same.).

When all the live bytes are zero then we've killed the original
statement.

It's ~20 lines of code.

I need to pull together some additional tests, but it looks likely
we'll
be able to wrap this up easily for gcc-6.


BTW, we had sth like this before but it had both correctness and more importantly scalability issues.

Just a couple more tibits.

I instrumented a bootstrap -- the improved DSE finds ~20k additional DSEopportunities during a GCC bootstrap that could not be found by thecurrent DSE. Yes, 20k additional statements deleted by tree DSE. Yow!

Of those additional opportunities, none require bit level tracking. Sowe can just punt when the size/offset is not byte sized/aligned.

Of those additional opportunities > 99% are for sizes of 64 bytes orsmaller. Thus we can pack those into 1 or 2 bitmap elements, dependingon the starting offset. So the bitmap side will be efficient with noreal searching if we choose our PARAM value wisely.

The outliers are, well, strange. There were cases where we found newDSE opportunities for objects of size 2k bytes or larger. There weren'tmany of these, but I was surprised at the size. Most likely it's aclobber or mem* thing that's participating in DSE. But I haven't lookedclosely at those cases yet.

There's a ton of statements that are clobbering zero-sized objects. Mycode can determine when those clobbers are redundant (with some laterclobber), but I haven't looked closely to see if that's actually a goodthing to do or not.

Anyway, I still don't see anything which makes me think this can'twrap-up in the immediate future.


jeff

Follow-Ups:
- Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Richard Biener

References:
- [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Jeff Law
- Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Richard Biener
- Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Jeff Law
- Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Jeff Law
- Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]