[PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
David Malcolm
dmalcolm@redhat.com
Thu Jan 21 17:15:20 GMT 2021
On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > > gimple.h has this comment for gimple's uid field:
> > >
> > > /* UID of this statement. This is used by passes that want to
> > > assign IDs to statements. It must be assigned and used by
> > > each
> > > pass. By default it should be assumed to contain
> > > garbage. */
> > > unsigned uid;
> > >
> > > and gimple_set_uid has:
> > >
> > > Please note that this UID property is supposed to be undefined
> > > at
> > > pass boundaries. This means that a given pass should not
> > > assume it
> > > contains any useful value when the pass starts and thus can
> > > set it
> > > to any value it sees fit.
> > >
> > > which suggests that any pass can use the uid field as an
> > > arbitrary
> > > scratch space.
> > >
> > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > mode:
> > > fatal error: Cgraph edge statement index out of range
> > > on certain inputs with -fanalyzer.
> > >
> > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > WPA phase. The analyzer pass writes to the uid fields of all
> > > stmts.
> > >
> > > The error occurs when LTRANS is streaming callgraph edges back
> > > in.
> > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > gimple_uid, and input_edge reads it back in.
> > >
> > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > or the stream-in at LTRANS will fail.
> > >
> > > Is it intended that the LTO machinery relies on the value of the
> > > uid
> > > field being preserved during WPA (or, at least, needs to be saved
> > > and
> > > restored by passes that touch it)?
> >
> > I belive this is solely at the cgraph stream out to stream in
> > boundary but
> > this may be a blurred area since while we materialize the whole
> > cgraph
> > at once the function bodies are streamed in on demand.
> >
> > Honza can probably clarify things.
>
> Well, the uids are used to associate cgraph edges with
> statements. At
> WPA stage you do not have function bodies and thus uids serves role
> of
> pointers to the statement. If you load the body in (via get_body)
> the
> uids are replaced by pointers and when you stream out uids are
> recomputed again.
>
> When do you touch the uids? At WPA time or from small IPA pass in
> ltrans?
The analyzer is here in passes.def:
INSERT_PASSES_AFTER (all_regular_ipa_passes)
NEXT_PASS (pass_analyzer);
and so in LTO runs as the first regular IPA pass at WPA time,
when do_whole_program_analysis calls:
execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
FWIW I hope to eventually have a way to summarize function bodies in
the analyzer, but I don't yet, so I'm currently brute forcing things by
loading all function bodies at the start of the analyzer (when
-fanalyzer is enabled).
I wonder if that's messing things up somehow?
Does the stream-out from WPA make any assumptions about the stmt uids?
For example,
#define STMT_UID_NOT_IN_RANGE(uid) \
(gimple_stmt_max_uid (fn) < uid || uid == 0)
seems to assume that the UIDs are per-function ranges from
[0-gimple_stmt_max_uid (fn)]
which isn't the case for the uids set by the analyzer. Maybe that's
the issue here?
Sorry for not being more familiar with the IPA/LTO code
Dave
> hozna
> > Note LTO uses this exactly because of this comment to avoid
> > allocating
> > extra memory for an 'index' but it could of course leave gimple_uid
> > alone
> > at some extra expense (eventually paid for in generic cgraph data
> > structures
> > and thus for not only the streaming time).
> >
> > > On the assumption that this is the case, this patch updates the
> > > comments
> > > in gimple.h referring to passes being able to set uid to any
> > > value to
> > > note the caveat for IPA passes, and it updates the analyzer to
> > > save
> > > and restore the UIDs, fixing the error.
> > >
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > OK for master?
> >
> > The analyzer bits are OK, let's see how Honza can clarify the
> > situation.
> >
> > Thanks,
> > Richard.
> >
> > > gcc/analyzer/ChangeLog:
> > > PR analyzer/98599
> > > * supergraph.cc (saved_uids::make_uid_unique): New.
> > > (saved_uids::restore_uids): New.
> > > (supergraph::supergraph): Replace assignments to stmt-
> > > >uid with
> > > calls to m_stmt_uids.make_uid_unique.
> > > (supergraph::~supergraph): New.
> > > * supergraph.h (class saved_uids): New.
> > > (supergraph::~supergraph): New decl.
> > > (supergraph::m_stmt_uids): New field.
> > >
> > > gcc/ChangeLog:
> > > PR analyzer/98599
> > > * doc/gimple.texi: Document that UIDs must not change
> > > during IPA
> > > passes.
> > > * gimple.h (gimple::uid): Likewise.
> > > (gimple_set_uid): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > > PR analyzer/98599
> > > * gcc.dg/analyzer/pr98599-a.c: New test.
> > > * gcc.dg/analyzer/pr98599-b.c: New test.
> > > ---
> > > gcc/analyzer/supergraph.cc | 53
> > > +++++++++++++++++++++--
> > > gcc/analyzer/supergraph.h | 15 +++++++
> > > gcc/doc/gimple.texi | 6 +++
> > > gcc/gimple.h | 13 +++++-
> > > gcc/testsuite/gcc.dg/analyzer/pr98599-a.c | 8 ++++
> > > gcc/testsuite/gcc.dg/analyzer/pr98599-b.c | 1 +
> > > 6 files changed, 90 insertions(+), 6 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > >
> > > diff --git a/gcc/analyzer/supergraph.cc
> > > b/gcc/analyzer/supergraph.cc
> > > index 419f6424f76..40acfbd16a8 100644
> > > --- a/gcc/analyzer/supergraph.cc
> > > +++ b/gcc/analyzer/supergraph.cc
> > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple
> > > *stmt)
> > > return edge;
> > > }
> > >
> > > +/* class saved_uids.
> > > +
> > > + In order to ensure consistent results without relying on the
> > > ordering
> > > + of pointer values we assign a uid to each gimple stmt,
> > > globally unique
> > > + across all functions.
> > > +
> > > + Normally, the stmt uids are a scratch space that each pass
> > > can freely
> > > + assign its own values to. However, in the case of LTO, the
> > > uids are
> > > + used to associate call stmts with callgraph edges between the
> > > WPA phase
> > > + (where the analyzer runs in LTO mode) and the LTRANS phase;
> > > if the
> > > + analyzer changes them in the WPA phase, it leads to errors
> > > when
> > > + streaming the code back in at LTRANS.
> > > +
> > > + Hence this class has responsibility for tracking the original
> > > uids
> > > + and restoring them once the pass is complete (in the
> > > supergraph dtor). */
> > > +
> > > +/* Give STMT a globally unique uid, storing its original uid so
> > > it can
> > > + later be restored. */
> > > +
> > > +void
> > > +saved_uids::make_uid_unique (gimple *stmt)
> > > +{
> > > + unsigned next_uid = m_old_stmt_uids.length ();
> > > + unsigned old_stmt_uid = stmt->uid;
> > > + stmt->uid = next_uid;
> > > + m_old_stmt_uids.safe_push
> > > + (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > > +}
> > > +
> > > +/* Restore the saved uids of all stmts. */
> > > +
> > > +void
> > > +saved_uids::restore_uids () const
> > > +{
> > > + unsigned i;
> > > + std::pair<gimple *, unsigned> *pair;
> > > + FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > > + pair->first->uid = pair->second;
> > > +}
> > > +
> > > /* supergraph's ctor. Walk the callgraph, building supernodes
> > > for each
> > > CFG basic block, splitting the basic blocks at
> > > callsites. Join
> > > together the supernodes with interprocedural and
> > > intraprocedural
> > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> > >
> > > /* First pass: make supernodes (and assign UIDs to the gimple
> > > stmts). */
> > > {
> > > - unsigned next_uid = 0;
> > > -
> > > /* Sort the cgraph_nodes? */
> > > cgraph_node *node;
> > > FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> > > {
> > > gimple *stmt = gsi_stmt (gpi);
> > > m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > - stmt->uid = next_uid++;
> > > + m_stmt_uids.make_uid_unique (stmt);
> > > }
> > >
> > > /* Append statements from BB to the current supernode,
> > > splitting
> > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> > > gimple *stmt = gsi_stmt (gsi);
> > > node_for_stmts->m_stmts.safe_push (stmt);
> > > m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > - stmt->uid = next_uid++;
> > > + m_stmt_uids.make_uid_unique (stmt);
> > > if (cgraph_edge *edge = supergraph_call_edge (fun,
> > > stmt))
> > > {
> > > m_cgraph_edge_to_caller_prev_node.put(edge,
> > > node_for_stmts);
> > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> > > }
> > > }
> > >
> > > +/* supergraph's dtor. Reset stmt uids. */
> > > +
> > > +supergraph::~supergraph ()
> > > +{
> > > + m_stmt_uids.restore_uids ();
> > > +}
> > > +
> > > /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> > > Cluster the supernodes by function, then by BB from original
> > > CFG. */
> > >
> > > diff --git a/gcc/analyzer/supergraph.h
> > > b/gcc/analyzer/supergraph.h
> > > index fc4a753c5a4..b2125a35410 100644
> > > --- a/gcc/analyzer/supergraph.h
> > > +++ b/gcc/analyzer/supergraph.h
> > > @@ -79,6 +79,18 @@ struct supergraph_traits
> > > typedef supercluster cluster_t;
> > > };
> > >
> > > +/* A class to manage the setting and restoring of statement
> > > uids. */
> > > +
> > > +class saved_uids
> > > +{
> > > +public:
> > > + void make_uid_unique (gimple *stmt);
> > > + void restore_uids () const;
> > > +
> > > +private:
> > > + auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > > +};
> > > +
> > > /* A "supergraph" is a directed graph formed by joining together
> > > all CFGs,
> > > linking them via interprocedural call and return edges.
> > >
> > > @@ -90,6 +102,7 @@ class supergraph : public
> > > digraph<supergraph_traits>
> > > {
> > > public:
> > > supergraph (logger *logger);
> > > + ~supergraph ();
> > >
> > > supernode *get_node_for_function_entry (function *fun) const
> > > {
> > > @@ -205,6 +218,8 @@ private:
> > >
> > > typedef hash_map<const function *, unsigned>
> > > function_to_num_snodes_t;
> > > function_to_num_snodes_t m_function_to_num_snodes;
> > > +
> > > + saved_uids m_stmt_uids;
> > > };
> > >
> > > /* A node within a supergraph. */
> > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > > index 4b3d7d7452e..417f37169ff 100644
> > > --- a/gcc/doc/gimple.texi
> > > +++ b/gcc/doc/gimple.texi
> > > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes
> > > that want to assign
> > > IDs to every statement. These IDs must be assigned and used by
> > > each pass.
> > >
> > > +Normally the uid field is a scratch space for use by passes that
> > > +need one. However, in LTO mode the UIDs are used to associate
> > > call
> > > +statements with callgraph edges between the WPA phase and the
> > > LTRANS
> > > +phase; hence if they are changed by an IPA pass they should be
> > > restored
> > > +at the end of that pass.
> > > +
> > > @item @code{location}
> > > This is a @code{location_t} identifier to specify source code
> > > location for this statement. It is inherited from the front
> > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > index 3ec86f5f082..7d01bd4dd6a 100644
> > > --- a/gcc/gimple.h
> > > +++ b/gcc/gimple.h
> > > @@ -260,7 +260,11 @@ struct GTY((desc
> > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> > >
> > > /* UID of this statement. This is used by passes that want to
> > > assign IDs to statements. It must be assigned and used by
> > > each
> > > - pass. By default it should be assumed to contain
> > > garbage. */
> > > + pass. By default it should be assumed to contain garbage.
> > > + However, in LTO mode the UIDs are used to associate call
> > > stmts with
> > > + callgraph edges between the WPA phase and the LTRANS phase;
> > > hence
> > > + if they are changed by an IPA pass they should be restored
> > > at the
> > > + end of that pass. */
> > > unsigned uid;
> > >
> > > /* [ WORD 2 ]
> > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask
> > > plf)
> > > Please note that this UID property is supposed to be
> > > undefined at
> > > pass boundaries. This means that a given pass should not
> > > assume it
> > > contains any useful value when the pass starts and thus can
> > > set it
> > > - to any value it sees fit. */
> > > + to any value it sees fit.
> > > +
> > > + That said, in LTO mode the UIDs are used to associate call
> > > stmts with
> > > + callgraph edges between the WPA phase and the LTRANS phase;
> > > hence
> > > + if they are changed by an IPA pass they should be restored at
> > > the
> > > + end of that pass. */
> > >
> > > static inline void
> > > gimple_set_uid (gimple *g, unsigned uid)
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > new file mode 100644
> > > index 00000000000..2bbf37b0e6e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > @@ -0,0 +1,8 @@
> > > +/* { dg-do link } */
> > > +/* { dg-require-effective-target lto } */
> > > +/* { dg-additional-options "-Os -flto" } */
> > > +/* { dg-additional-sources pr98599-b.c } */
> > > +
> > > +int b(int x);
> > > +int a() { b(5); }
> > > +int main() { a(); }
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > new file mode 100644
> > > index 00000000000..cfdeb3bf134
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > @@ -0,0 +1 @@
> > > +int b(int x) { return x; }
> > > --
> > > 2.26.2
> > >
More information about the Gcc-patches
mailing list