This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3


On 01/04/2017 06:23 AM, Richard Biener wrote:
On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
This is the first of the 4 part patchkit to address deficiencies in our DSE
implementation.

This patch addresses the P2 regression 33562 which has been a low priority
regression since gcc-4.3.  To summarize, DSE no longer has the ability to
detect an aggregate store as dead if subsequent stores are done in a
piecemeal fashion.
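
For illustration only, the kind of code described above might look like the following. This is a made-up example, not the testcase attached to PR 33562, and the names struct pair and init_pair are hypothetical. The aggregate store writes every byte of *p, and the two member stores that follow overwrite all of those bytes again, so the aggregate store is dead even though no single later store covers it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aggregate; two ints with no padding between them. */
struct pair { int a; int b; };

void init_pair(struct pair *p)
{
    assert(p != NULL);

    struct pair zero = { 0, 0 };
    *p = zero;   /* aggregate store: writes all bytes of *p */
    p->a = 1;    /* piecemeal store: overwrites the bytes of a */
    p->b = 2;    /* piecemeal store: overwrites the bytes of b */
    /* Taken together, the two member stores cover the aggregate store,
       so the aggregate store could be removed.  */
}
```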

I originally tackled this by changing how we lower complex objects. That was
sufficient to address 33562, but was reasonably rejected.

This version attacks the problem by improving DSE to track stores to memory
at a byte level.  That allows us to determine if a series of stores
completely covers an earlier store (thus making the earlier store dead).
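
A minimal sketch of the byte-level idea, assuming one flag per byte rather than the sbitmap the patch uses. All names here (MAX_TRACKED, struct live_bytes and its helpers) are hypothetical and are not taken from the patch. Each later store clears the flags for the bytes it overwrites; when no flag remains set, the earlier store is completely covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on the tracked size, mirroring the 256 byte limit
   mentioned in this message.  */
#define MAX_TRACKED 256

/* One flag per byte of the earlier store; true means "still live".  */
struct live_bytes {
    size_t size;
    bool live[MAX_TRACKED];
};

/* Start tracking a store of SIZE bytes.  Returns false if the store is
   too large to track bytewise.  */
static bool live_bytes_init(struct live_bytes *lb, size_t size)
{
    assert(lb != NULL);
    if (size > MAX_TRACKED)
        return false;
    lb->size = size;
    for (size_t i = 0; i < size; i++)
        lb->live[i] = true;
    return true;
}

/* Record a later store of LEN bytes at OFFSET (relative to the start of
   the tracked store).  Bytes outside the tracked store are ignored.  */
static void live_bytes_kill(struct live_bytes *lb, size_t offset, size_t len)
{
    for (size_t i = offset; i < lb->size && i - offset < len; i++)
        lb->live[i] = false;
}

/* True when every byte of the earlier store has been overwritten.  */
static bool live_bytes_all_dead(const struct live_bytes *lb)
{
    for (size_t i = 0; i < lb->size; i++)
        if (lb->live[i])
            return false;
    return true;
}
```

For a 16 byte store, killing bytes 0..7 leaves it live; killing bytes 8..15 as well makes it dead.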

A useful side effect of this is we can detect when parts of a store are dead
and potentially rewrite the store.  This patch implements that for complex
object initializations.  While not strictly part of 33562, it's so closely
related that I felt it belongs as part of this patch.
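
As a rough sketch of how a partially dead store could be trimmed, assuming the per-byte liveness is available as a plain array: count the dead bytes at the head and at the tail, and the store only needs to write what lies between. The function name compute_trims is hypothetical, and this is not the trim computation from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Given SIZE liveness flags (true = byte still live), report how many
   bytes at the start (*HEAD) and at the end (*TAIL) of the store are
   dead.  If every byte is dead, *HEAD is SIZE and *TAIL is 0.  */
static void compute_trims(const bool *live, size_t size,
                          size_t *head, size_t *tail)
{
    assert(live != NULL && head != NULL && tail != NULL);

    size_t h = 0;
    while (h < size && !live[h])
        h++;

    size_t t = 0;
    while (t < size - h && !live[size - 1 - t])
        t++;

    *head = h;
    *tail = t;
}
```

A store with liveness { dead, dead, live, live, live, dead } gives head 2 and tail 1, so only the middle three bytes still need to be written.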

This originally limited the size of the tracked memory space to 64 bytes.  I
bumped the limit after working through the CONSTRUCTOR and mem* trimming
patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
sleep if we throttled back to 64 or 128 bytes.

Later patches in the kit will build upon this patch.  So if pieces look like
skeleton code, that's because it is.

The changes since the V2 patch are:

1. Using sbitmaps rather than bitmaps.
2. Returning a tri-state from dse_classify_store (renamed from
dse_possible_dead_store_p)
3. More efficient trim computation
4. Moving trimming code out of dse_classify_store
5. Refactoring code to delete dead calls/assignments
6. dse_optimize_stmt moves into the dse_dom_walker class
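
To make item 2 concrete, here is one way a tri-state result could be derived from per-byte liveness. The message does not spell out the three states, so the enumerator names and the classification rule below are assumptions for illustration, not the interface of dse_classify_store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state: fully live, some bytes dead, or fully dead.  */
enum store_status {
    STORE_LIVE,
    STORE_MAYBE_PARTIAL_DEAD,
    STORE_DEAD
};

/* Classify a store from SIZE liveness flags (true = byte still live).
   A zero-sized store has no live bytes and is classified as dead.  */
static enum store_status classify_store(const bool *live, size_t size)
{
    assert(live != NULL || size == 0);

    size_t dead = 0;
    for (size_t i = 0; i < size; i++)
        if (!live[i])
            dead++;

    if (dead == size)
        return STORE_DEAD;
    if (dead > 0)
        return STORE_MAYBE_PARTIAL_DEAD;
    return STORE_LIVE;
}
```

Keeping the middle state separate lets the caller decide whether trimming is worthwhile, which matches item 4's move of the trimming code out of the classifier.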

Not surprisingly, this patch has most of the changes based on prior feedback
as it includes the raw infrastructure.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

New functions in sbitmap.c lack function comments.

bitmap_count_bits lacks a GCC_VERSION >= 3400 guard (the version
is not important, but non-GCC host compilers are).  See bitmap.c for a
fallback.
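
A sketch of the kind of guard being asked for, written as standalone C. GCC_VERSION is a macro internal to the GCC sources, so this example tests __GNUC__ and __GNUC_MINOR__ instead; the function name count_bits_word is hypothetical and the fallback shown is a generic one, not a copy of the code in bitmap.c.

```c
/* Count the set bits in one word.  Uses the GCC builtin when the host
   compiler is GCC 3.4 or later, and a portable loop otherwise.  */
static unsigned count_bits_word(unsigned long x)
{
#if defined(__GNUC__) \
    && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
    return (unsigned) __builtin_popcountl(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain.  */
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```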

Both bitmap_clear_range and bitmap_set_range look rather inefficient...
(it's not likely anybody will clean this up after you)
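
For comparison, a word-at-a-time range set might look like the sketch below. The patch's own bitmap_set_range is not quoted in this message, so this says nothing about what it does; the names word_t, WORD_BITS and set_range are hypothetical. The idea is to mask the first and last words and fill whole words in between, instead of setting one bit per iteration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned long word_t;
#define WORD_BITS (sizeof(word_t) * CHAR_BIT)

/* Set COUNT bits starting at bit START in WORDS.  The caller must
   ensure WORDS is large enough to hold bit START + COUNT - 1.  */
static void set_range(word_t *words, size_t start, size_t count)
{
    assert(words != NULL);
    if (count == 0)
        return;

    size_t end = start + count - 1;          /* index of the last bit */
    size_t first = start / WORD_BITS;
    size_t last = end / WORD_BITS;

    /* Bits from START's position upward in the first word.  */
    word_t first_mask = ~(word_t) 0 << (start % WORD_BITS);
    /* Bits from position 0 up to END's position in the last word.  */
    word_t last_mask = ~(word_t) 0 >> (WORD_BITS - 1 - end % WORD_BITS);

    if (first == last) {
        words[first] |= first_mask & last_mask;
        return;
    }

    words[first] |= first_mask;
    for (size_t i = first + 1; i < last; i++)
        words[i] = ~(word_t) 0;              /* whole words in between */
    words[last] |= last_mask;
}
```

A clear-range counterpart would follow the same shape, using `&= ~mask` on the edge words and storing zero in the middle words.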

I'd say split out the sbitmap.[ch] changes.

+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+        "dse-max-object-size",
+        "Maximum size (in bytes) of objects tracked by dead store elimination.",
+        256, 0, 0)

the docs suggest that DSE doesn't handle larger stores but it does (just in
the original limited way).  Maybe "tracked bytewise" is better.

Oh, and new --params need documenting in invoke.texi.
Fixed.

jeff

