This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE


Jeff Law <law@redhat.com> writes:
> This is the first of the 4 part patchkit to address deficiencies in our 
> DSE implementation.
>
> This patch addresses the P2 regression 33562 which has been a low 
> priority regression since gcc-4.3.  To summarize, DSE no longer has the 
> ability to detect an aggregate store as dead if subsequent stores are 
> done in a piecemeal fashion.
>
> I originally tackled this by changing how we lower complex objects. 
> That was sufficient to address 33562, but was reasonably rejected.
>
> This version attacks the problem by improving DSE to track stores to 
> memory at a byte level.  That allows us to determine if a series of 
> stores completely covers an earlier store (thus making the earlier store 
> dead).
>
> A useful side effect of this is we can detect when parts of a store are 
> dead and potentially rewrite the store.  This patch implements that for 
> complex object initializations.  While not strictly part of 33562, it's 
> so closely related that I felt it belongs as part of this patch.
>
> This originally limited the size of the tracked memory space to 64 
> bytes.  I bumped the limit after working through the CONSTRUCTOR and 
> mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
> I wouldn't lose sleep if we throttled back to 64 or 128 bytes.
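
To illustrate the byte-level tracking described in the quoted text, here is a minimal sketch. It is not the patch's implementation: the (offset, size) store representation and the names `live_bytes_after`, `is_dead` and `trim` are hypothetical, and it assumes there are no intervening reads of the memory (a real pass has to check that).

```python
# Hypothetical sketch of byte-level dead store tracking.
# Stores are (offset, size) pairs in bytes; intervening reads are assumed absent.

MAX_TRACKED_BYTES = 256  # the "fairly arbitrary" limit from the quoted text


def live_bytes_after(earlier, later_stores):
    """Return the byte offsets of `earlier` not overwritten by `later_stores`.

    Returns None when `earlier` is too large to track.
    """
    off, size = earlier
    if size > MAX_TRACKED_BYTES:
        return None
    live = set(range(off, off + size))
    for s_off, s_size in later_stores:
        live -= set(range(s_off, s_off + s_size))
    return live


def is_dead(earlier, later_stores):
    """True if every byte of `earlier` is overwritten by a later store."""
    live = live_bytes_after(earlier, later_stores)
    return live is not None and not live


def trim(earlier, later_stores):
    """Shrink `earlier` to the smallest (offset, size) spanning its live bytes.

    Returns None if the store is completely dead, and `earlier` unchanged
    if it is too large to track.
    """
    live = live_bytes_after(earlier, later_stores)
    if live is None:
        return earlier
    if not live:
        return None
    lo, hi = min(live), max(live)
    return (lo, hi - lo + 1)
```

For example, a 16-byte aggregate store followed by two 8-byte stores at offsets 0 and 8 is reported dead, while a single later store to the first 8 bytes only allows the earlier store to be trimmed to (8, 8).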

FWIW (and this shouldn't affect whether the patch goes in)...

If SVE support is accepted for GCC 8 then we have the additional problem
that the sizes of useful aggregates can be runtime invariants.  For example,
in the SVE equivalent of float64x2x2_t, it would be good to be able to
detect that an assignment to the full structure is dead if we later
assign to both of the individual vectors.  Probably the most convenient
way of doing that would be to track ranges rather than individual bytes,
like the local part of the RTL DSE pass does.  (It was relatively easy
to convert local RTL DSE to variable-length modes, but the global part
also uses byte bitmaps.)
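
As a rough sketch of why ranges cope with runtime-invariant sizes where a byte bitmap cannot: below, every offset and size is a pair (c, k) standing for c + k*x bytes, where x >= 0 is unknown at compile time. That representation, the value chosen for `VL`, and the names `add` and `covers_full` are all assumptions made for illustration, not anything taken from the SVE patches or from RTL DSE.

```python
# Hypothetical sketch: range tracking with symbolic sizes.
# A quantity (c, k) means c + k*x bytes for an unknown runtime x >= 0.
# Two quantities are equal for every x exactly when both components match.

ZERO = (0, 0)
VL = (16, 16)  # assumed vector length in bytes: 16 + 16*x


def add(a, b):
    """Add two symbolic quantities component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def covers_full(full_size, stores):
    """Conservatively test whether `stores` cover [0, full_size).

    `stores` is a list of (start, size) pairs of symbolic quantities.
    Only stores that begin exactly where the covered prefix ends extend
    the prefix, so the answer may be False when coverage cannot be proven.
    """
    end = ZERO
    remaining = list(stores)
    progress = True
    while progress and end != full_size:
        progress = False
        for store in remaining:
            start, size = store
            if start == end:
                end = add(start, size)
                remaining.remove(store)
                progress = True
                break  # restart the scan with the new prefix end
    return end == full_size
```

With these assumptions, `covers_full(add(VL, VL), [(VL, VL), (ZERO, VL)])` holds: the two vector-sized stores together cover a structure of two vectors, even though no byte count is known at compile time.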

I guess bitmaps are going to be more efficient for small structures or
for accesses that occur in an arbitrary order.  But tracking ranges
means adding at most one fixed-size range record per update, so I suppose
the throttle would be on the number of records (= number of gaps + 1)
rather than the size of the structure.
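
A minimal sketch of that trade-off, using plain integer offsets for simplicity. The `RangeTracker` class and the `MAX_RECORDS` value are hypothetical: each update adds at most one record (and may merge several), and the tracker gives up conservatively once the number of records passes the throttle.

```python
# Hypothetical sketch: tracking overwritten ranges with a cap on the
# number of records rather than on the size of the structure.

MAX_RECORDS = 4  # assumed throttle; any small constant would do


class RangeTracker:
    def __init__(self):
        self.ranges = []      # sorted, disjoint, non-adjacent [start, end)
        self.gave_up = False  # set once the record limit is exceeded

    def add_store(self, start, size):
        """Record a store of `size` bytes at byte offset `start`."""
        if self.gave_up or size <= 0:
            return
        end = start + size
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))  # no overlap and not adjacent
            else:
                # Overlapping or adjacent: fold into the new record.
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        merged.sort()
        if len(merged) > MAX_RECORDS:
            # Too fragmented: forget everything, which is conservative.
            self.gave_up = True
            self.ranges = []
        else:
            self.ranges = merged

    def covers(self, start, size):
        """True if [start, start + size) lies inside one recorded range."""
        if self.gave_up:
            return False
        return any(s <= start and start + size <= e for s, e in self.ranges)
```

Stores at offsets 8 and 0 of 8 bytes each collapse into the single record (0, 16), whereas five scattered one-byte stores exceed the assumed limit and make the tracker give up.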

Thanks,
Richard

