[PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

Jan Hubicka hubicka@ucw.cz
Thu Jan 21 19:09:21 GMT 2021


> On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > > gimple.h has this comment for gimple's uid field:
> > > > 
> > > >   /* UID of this statement.  This is used by passes that want to
> > > >      assign IDs to statements.  It must be assigned and used by
> > > > each
> > > >      pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > >   unsigned uid;
> > > > 
> > > > and gimple_set_uid has:
> > > > 
> > > >    Please note that this UID property is supposed to be undefined
> > > > at
> > > >    pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >    contains any useful value when the pass starts and thus can
> > > > set it
> > > >    to any value it sees fit.
> > > > 
> > > > which suggests that any pass can use the uid field as an
> > > > arbitrary
> > > > scratch space.
> > > > 
> > > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > > mode:
> > > >   fatal error: Cgraph edge statement index out of range
> > > > on certain inputs with -fanalyzer.
> > > > 
> > > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > stmts.
> > > > 
> > > > The error occurs when LTRANS is streaming callgraph edges back
> > > > in.
> > > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > gimple_uid, and input_edge reads it back in.
> > > > 
> > > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > > or the stream-in at LTRANS will fail.
> > > > 
> > > > Is it intended that the LTO machinery relies on the value of the
> > > > uid
> > > > field being preserved during WPA (or, at least, needs to be saved
> > > > and
> > > > restored by passes that touch it)?
> > > 
> > > I belive this is solely at the cgraph stream out to stream in
> > > boundary but
> > > this may be a blurred area since while we materialize the whole
> > > cgraph
> > > at once the function bodies are streamed in on demand.
> > > 
> > > Honza can probably clarify things.
> > 
> > Well, the uids are used to associate cgraph edges with
> > statements.  At
> > WPA stage you do not have function bodies and thus uids serves role
> > of
> > pointers to the statement.  If you load the body in (via get_body)
> > the
> > uids are replaced by pointers and when you stream out uids are
> > recomputed again.
> > 
> > When do you touch the uids? At WPA time or from small IPA pass in
> > ltrans?
> 
> The analyzer is here in passes.def:
>   INSERT_PASSES_AFTER (all_regular_ipa_passes)
>   NEXT_PASS (pass_analyzer);
> 
> and so in LTO runs as the first regular IPA pass at WPA time,
> when do_whole_program_analysis calls:
>   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> 
> FWIW I hope to eventually have a way to summarize function bodies in
> the analyzer, but I don't yet, so I'm currently brute forcing things by
> loading all function bodies at the start of the analyzer (when
> -fanalyzer is enabled).
> 
> I wonder if that's messing things up somehow?

Actually I think it should work.  If you do get_body or
get_untransformed_body (that will be equal at that time) you ought to
get ids in symtab datastructure rewritten to pointers and at stream out
time we should assign new ids...
> 
> Does the stream-out from WPA make any assumptions about the stmt uids?
> For example, 
>   #define STMT_UID_NOT_IN_RANGE(uid) \
>     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> seems to assume that the UIDs are per-function ranges from
>   [0-gimple_stmt_max_uid (fn)]
> which isn't the case for the uids set by the analyzer.  Maybe that's
> the issue here?
> 
> Sorry for not being more familiar with the IPA/LTO code

There is lto_prepare_function_for_streaming which assigns uids to be
incremental.   So I guess problem is that it is not called at WPA time
if function is in memory (since at moment we do not really modify bodies
at WPA time, however we do stream them in sometimes to icf compare them
or to update profile).

So i guess fix would be to arrange lto_prepare_function_for_streaming to
be called on all functions with body defined before WPA stream-out?

Honza
> Dave
> 
> 
> > hozna
> > > Note LTO uses this exactly because of this comment to avoid
> > > allocating
> > > extra memory for an 'index' but it could of course leave gimple_uid
> > > alone
> > > at some extra expense (eventually paid for in generic cgraph data
> > > structures
> > > and thus for not only the streaming time).
> > > 
> > > > On the assumption that this is the case, this patch updates the
> > > > comments
> > > > in gimple.h referring to passes being able to set uid to any
> > > > value to
> > > > note the caveat for IPA passes, and it updates the analyzer to
> > > > save
> > > > and restore the UIDs, fixing the error.
> > > > 
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > OK for master?
> > > 
> > > The analyzer bits are OK, let's see how Honza can clarify the
> > > situation.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > gcc/analyzer/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * supergraph.cc (saved_uids::make_uid_unique): New.
> > > >         (saved_uids::restore_uids): New.
> > > >         (supergraph::supergraph): Replace assignments to stmt-
> > > > >uid with
> > > >         calls to m_stmt_uids.make_uid_unique.
> > > >         (supergraph::~supergraph): New.
> > > >         * supergraph.h (class saved_uids): New.
> > > >         (supergraph::~supergraph): New decl.
> > > >         (supergraph::m_stmt_uids): New field.
> > > > 
> > > > gcc/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * doc/gimple.texi: Document that UIDs must not change
> > > > during IPA
> > > >         passes.
> > > >         * gimple.h (gimple::uid): Likewise.
> > > >         (gimple_set_uid): Likewise.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * gcc.dg/analyzer/pr98599-a.c: New test.
> > > >         * gcc.dg/analyzer/pr98599-b.c: New test.
> > > > ---
> > > >  gcc/analyzer/supergraph.cc                | 53
> > > > +++++++++++++++++++++--
> > > >  gcc/analyzer/supergraph.h                 | 15 +++++++
> > > >  gcc/doc/gimple.texi                       |  6 +++
> > > >  gcc/gimple.h                              | 13 +++++-
> > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
> > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
> > > >  6 files changed, 90 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > 
> > > > diff --git a/gcc/analyzer/supergraph.cc
> > > > b/gcc/analyzer/supergraph.cc
> > > > index 419f6424f76..40acfbd16a8 100644
> > > > --- a/gcc/analyzer/supergraph.cc
> > > > +++ b/gcc/analyzer/supergraph.cc
> > > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple
> > > > *stmt)
> > > >    return edge;
> > > >  }
> > > > 
> > > > +/* class saved_uids.
> > > > +
> > > > +   In order to ensure consistent results without relying on the
> > > > ordering
> > > > +   of pointer values we assign a uid to each gimple stmt,
> > > > globally unique
> > > > +   across all functions.
> > > > +
> > > > +   Normally, the stmt uids are a scratch space that each pass
> > > > can freely
> > > > +   assign its own values to.  However, in the case of LTO, the
> > > > uids are
> > > > +   used to associate call stmts with callgraph edges between the
> > > > WPA phase
> > > > +   (where the analyzer runs in LTO mode) and the LTRANS phase;
> > > > if the
> > > > +   analyzer changes them in the WPA phase, it leads to errors
> > > > when
> > > > +   streaming the code back in at LTRANS.
> > > > +
> > > > +   Hence this class has responsibility for tracking the original
> > > > uids
> > > > +   and restoring them once the pass is complete (in the
> > > > supergraph dtor).  */
> > > > +
> > > > +/* Give STMT a globally unique uid, storing its original uid so
> > > > it can
> > > > +   later be restored.  */
> > > > +
> > > > +void
> > > > +saved_uids::make_uid_unique (gimple *stmt)
> > > > +{
> > > > +  unsigned next_uid = m_old_stmt_uids.length ();
> > > > +  unsigned old_stmt_uid = stmt->uid;
> > > > +  stmt->uid = next_uid;
> > > > +  m_old_stmt_uids.safe_push
> > > > +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > > > +}
> > > > +
> > > > +/* Restore the saved uids of all stmts.  */
> > > > +
> > > > +void
> > > > +saved_uids::restore_uids () const
> > > > +{
> > > > +  unsigned i;
> > > > +  std::pair<gimple *, unsigned> *pair;
> > > > +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > > > +    pair->first->uid = pair->second;
> > > > +}
> > > > +
> > > >  /* supergraph's ctor.  Walk the callgraph, building supernodes
> > > > for each
> > > >     CFG basic block, splitting the basic blocks at
> > > > callsites.  Join
> > > >     together the supernodes with interprocedural and
> > > > intraprocedural
> > > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> > > > 
> > > >    /* First pass: make supernodes (and assign UIDs to the gimple
> > > > stmts).  */
> > > >    {
> > > > -    unsigned next_uid = 0;
> > > > -
> > > >      /* Sort the cgraph_nodes?  */
> > > >      cgraph_node *node;
> > > >      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> > > >             {
> > > >               gimple *stmt = gsi_stmt (gpi);
> > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > -             stmt->uid = next_uid++;
> > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > >             }
> > > > 
> > > >           /* Append statements from BB to the current supernode,
> > > > splitting
> > > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> > > >               gimple *stmt = gsi_stmt (gsi);
> > > >               node_for_stmts->m_stmts.safe_push (stmt);
> > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > -             stmt->uid = next_uid++;
> > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > >               if (cgraph_edge *edge = supergraph_call_edge (fun,
> > > > stmt))
> > > >                 {
> > > >                   m_cgraph_edge_to_caller_prev_node.put(edge,
> > > > node_for_stmts);
> > > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> > > >    }
> > > >  }
> > > > 
> > > > +/* supergraph's dtor.  Reset stmt uids.  */
> > > > +
> > > > +supergraph::~supergraph ()
> > > > +{
> > > > +  m_stmt_uids.restore_uids ();
> > > > +}
> > > > +
> > > >  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> > > >     Cluster the supernodes by function, then by BB from original
> > > > CFG.  */
> > > > 
> > > > diff --git a/gcc/analyzer/supergraph.h
> > > > b/gcc/analyzer/supergraph.h
> > > > index fc4a753c5a4..b2125a35410 100644
> > > > --- a/gcc/analyzer/supergraph.h
> > > > +++ b/gcc/analyzer/supergraph.h
> > > > @@ -79,6 +79,18 @@ struct supergraph_traits
> > > >    typedef supercluster cluster_t;
> > > >  };
> > > > 
> > > > +/* A class to manage the setting and restoring of statement
> > > > uids.  */
> > > > +
> > > > +class saved_uids
> > > > +{
> > > > +public:
> > > > +  void make_uid_unique (gimple *stmt);
> > > > +  void restore_uids () const;
> > > > +
> > > > +private:
> > > > +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > > > +};
> > > > +
> > > >  /* A "supergraph" is a directed graph formed by joining together
> > > > all CFGs,
> > > >     linking them via interprocedural call and return edges.
> > > > 
> > > > @@ -90,6 +102,7 @@ class supergraph : public
> > > > digraph<supergraph_traits>
> > > >  {
> > > >  public:
> > > >    supergraph (logger *logger);
> > > > +  ~supergraph ();
> > > > 
> > > >    supernode *get_node_for_function_entry (function *fun) const
> > > >    {
> > > > @@ -205,6 +218,8 @@ private:
> > > > 
> > > >    typedef hash_map<const function *, unsigned>
> > > > function_to_num_snodes_t;
> > > >    function_to_num_snodes_t m_function_to_num_snodes;
> > > > +
> > > > +  saved_uids m_stmt_uids;
> > > >  };
> > > > 
> > > >  /* A node within a supergraph.  */
> > > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > > > index 4b3d7d7452e..417f37169ff 100644
> > > > --- a/gcc/doc/gimple.texi
> > > > +++ b/gcc/doc/gimple.texi
> > > > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes
> > > > that want to assign
> > > >  IDs to every statement. These IDs must be assigned and used by
> > > >  each pass.
> > > > 
> > > > +Normally the uid field is a scratch space for use by passes that
> > > > +need one.  However, in LTO mode the UIDs are used to associate
> > > > call
> > > > +statements with callgraph edges between the WPA phase and the
> > > > LTRANS
> > > > +phase; hence if they are changed by an IPA pass they should be
> > > > restored
> > > > +at the end of that pass.
> > > > +
> > > >  @item @code{location}
> > > >  This is a @code{location_t} identifier to specify source code
> > > >  location for this statement. It is inherited from the front
> > > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > > index 3ec86f5f082..7d01bd4dd6a 100644
> > > > --- a/gcc/gimple.h
> > > > +++ b/gcc/gimple.h
> > > > @@ -260,7 +260,11 @@ struct GTY((desc
> > > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> > > > 
> > > >    /* UID of this statement.  This is used by passes that want to
> > > >       assign IDs to statements.  It must be assigned and used by
> > > > each
> > > > -     pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > > +     pass.  By default it should be assumed to contain garbage.
> > > > +     However, in LTO mode the UIDs are used to associate call
> > > > stmts with
> > > > +     callgraph edges between the WPA phase and the LTRANS phase;
> > > > hence
> > > > +     if they are changed by an IPA pass they should be restored
> > > > at the
> > > > +     end of that pass.  */
> > > >    unsigned uid;
> > > > 
> > > >    /* [ WORD 2 ]
> > > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask
> > > > plf)
> > > >     Please note that this UID property is supposed to be
> > > > undefined at
> > > >     pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >     contains any useful value when the pass starts and thus can
> > > > set it
> > > > -   to any value it sees fit.  */
> > > > +   to any value it sees fit.
> > > > +
> > > > +   That said, in LTO mode the UIDs are used to associate call
> > > > stmts with
> > > > +   callgraph edges between the WPA phase and the LTRANS phase;
> > > > hence
> > > > +   if they are changed by an IPA pass they should be restored at
> > > > the
> > > > +   end of that pass.  */
> > > > 
> > > >  static inline void
> > > >  gimple_set_uid (gimple *g, unsigned uid)
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > new file mode 100644
> > > > index 00000000000..2bbf37b0e6e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > @@ -0,0 +1,8 @@
> > > > +/* { dg-do link } */
> > > > +/* { dg-require-effective-target lto } */
> > > > +/* { dg-additional-options "-Os -flto" } */
> > > > +/* { dg-additional-sources pr98599-b.c } */
> > > > +
> > > > +int b(int x);
> > > > +int a() { b(5); }
> > > > +int main() { a(); }
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > new file mode 100644
> > > > index 00000000000..cfdeb3bf134
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > @@ -0,0 +1 @@
> > > > +int b(int x) { return x; }
> > > > --
> > > > 2.26.2
> > > > 
> 


More information about the Gcc-patches mailing list