This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Design notes, RFC] Address-lowering prototype design (PR46556)


On Tue, 2011-06-14 at 17:21 +0200, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> >> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >> >
> >> > <snip>
> >> >
> >> >> >> > Loss of aliasing information
> >> >> >> > ============================
> >> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> >> > instruction scheduling choices.  I tracked this down to
> >> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >> >
> >> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> >> > that they access different fields of the same structure.  This is done using
> >> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> >> > translated into the rtx's during expand.  When address lowering is not
> >> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> >> >> > field references are disjoint.
> >> >> >> >
> >> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> >> > address lowering is to be successful.
> >> >> >> >
> >> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >> >>
> >> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> >> of problems.  Removing it didn't cause much degradation because we now
> >> >> >> preserve points-to information.
> >> >> >>
> >> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> >> disambiguation was indeed an issue.
> >> >> >>
> >> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> >> >> we derive from it.  We keep the original access type that we can use
> >> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >> >
> >> >> > Yes, I really don't like what I have at the moment, either.  I put it in
> >> >> > place as a stopgap to let me proceed to look for other performance
> >> >> > problems.
> >> >> >
> >> >> > The question is how we can infer useful information for TBAA from the
> >> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> >> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> >> > many cases to maintain correctness, and couldn't prove the type
> >> >> > information for the important cases in SPEC I was trying to address.
> >> >> >
> >> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> >> > way to solve the TBAA issue.  The performance degradations are too
> >> >> > costly.
> >> >>
> >> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
> >> >> of the outermost valid type and that of the innermost as both are
> >> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> >> >> only have storage for one such alias-set.  Thus my idea at some point
> >> >> was to store the other one as well in some form.  It will not be
> >> >> the full information (after all, the complete access path does provide
> >> >> some extra information - see aliasing_component_refs_p).
> >> >
> >> > This is what concerns me.  TBAA information for the outer and inner
> >> > components doesn't seem sufficient to provide what
> >> > nonoverlapping_component_refs_p is currently able to prove.  The latter
> >> > searches for a common RECORD_TYPE somewhere along the two access paths,
> >> > and then disambiguates if the two associated referenced fields differ.
> >> > For a simple case like "struct x { int a; int b; };", a and b have the
> >> > same type and alias-set, so the alias-set information doesn't add
> >> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
> >> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >> >
> >> > Obviously the offset is sufficient to disambiguate for this simple case
> >> > with a common base type, but when the shared record types aren't at the
> >> > outermost level, we can't detect whether it is.
> >> >
> >> > At the moment I don't see how we can avoid degradation unless we keep
> >> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
> >> > COMPONENT_REFs.  I hope I'm wrong.
> >>
> >> You are not wrong.  But the question is, does it make a difference?
> >>
> >> Richard.
> >
> > Yes, it does.  This scenario occurs in 188.ammp, among others, and leads
> > to a large degradation without the change.  The performance-critical
> > loop in mm_fv_update_nonbon makes heavy use of indirect references to
> > the ATOM structure that contains numerous float variables.  When the
> > COMPONENT_REFs have been converted to MEM_REFs, the alias machinery can
> > no longer disambiguate these, which constrains the scheduler.  The
> > result of the poor scheduling (on PowerPC, at least) is a large increase
> > in floating-point spill code in the critical loop.
> 
> As they appear in loops I wonder why IVOPTs doesn't already expose
> this problem?

That's a fair question.  I haven't analyzed it that deeply yet.  I
suspect that most accesses to different ATOMs are through general
pointers rather than arrays.  Any opportunities for IVOPTs may be too
obfuscated to be easily found.  If I recall correctly, this is also a
deep loop nest with some nontrivial control flow.

> 
> In general the answer to missed TBAA optimizations is of course
> "make sure that PTA works", which usually means using LTO ...
> 
> I really really don't like preserving TBAA related information as trees.
> Instead if we really think preserving access paths is a must (I have
> already significantly reduced the number of preserved paths with
> introducing MEM_REFs!) then we have to find a different representation.
> 

Right, I had been wondering whether some opportunities had been
similarly lost with the MEM_REFs introduced in IVOPTs.  

I will ponder how else we could represent the necessary information from
the access path.  One important goal is to avoid bloating the IR when
the additional information isn't needed (one of the downsides of
something like TMR_ORIGINAL).  We only need this data for (chains of)
COMPONENT_REFs converted into [TARGET_]MEM_REFs, so we don't want to
bloat every MEM_REF or TMR with additional data.

Unfortunately, because of the layout of tree_exprs, the only way to
directly add something to a MEM_REF node is in the operand array.
Although we'd rather not raise the size of all MEM_REFs, I haven't
thought of anything less wasteful.  I considered hiding the extra data
in the type system somehow, but we would need something like a separate
type variant for each MEM_REF created from a different chain of
COMPONENT_REFs.  Even if that was workable, which I doubt, it would cost
more space and complexity than adding a new operand would.  In the end,
we probably require some sort of operand that represents the hidden type
information.  I don't want to use a hackish side table like my prototype
does; the data should be directly represented in the IR for
maintainability.

> I suppose you can turn the AMMP case into a (small) testcase that
> illustrates the issue and allows easier analysis and test of fixes?
> 

I can give it the old college try. :)  I'll have to give some thought
how to detect reliably from dumps that two things either are or are not
aliased.  But I agree this is important to isolate, and I should be able
to come up with something.

Thanks,
Bill

> Richard.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]