[PATCH 00/49] RFC: Add a static analysis framework to GCC

David Malcolm dmalcolm@redhat.com
Wed Dec 11 20:11:00 GMT 2019


On Mon, 2019-12-09 at 09:10 +0100, Richard Biener wrote:
> On Fri, Dec 6, 2019 at 11:31 PM Jeff Law <law@redhat.com> wrote:
> > On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > > This patch kit introduces a static analysis pass for GCC that
> > > > can
> > > > diagnose
> > > > various kinds of problems in C code at compile-time (e.g.
> > > > double-
> > > > free,
> > > > use-after-free, etc).
> > > 
> > > I haven't looked at the analyzer bits in any detail yet so I have
> > > just some very high-level questions.  But first let me say I'm
> > > excited to see this project! :)
> > > 
> > > It looks like the analyzer detects some of the same problems as
> > > some existing middle-end warnings (e.g., -Wnonnull,
> > > -Wuninitialized),
> > > and some that I have been working toward implementing (invalid
> > > uses
> > > of freed pointers such as returning them from functions or
> > > passing
> > > them to others), and others still that I have been thinking about
> > > as possible future projects (e.g., detecting uses of
> > > uninitialized
> > > arrays in string functions).
> > > 
> > > What are your thoughts about this sort of overlap?  Do you expect
> > > us to enhance both sets of warnings in parallel, or do you see us
> > > moving away from issuing warnings in the middle-end and toward
> > > making the analyzer the main source of these kinds of
> > > diagnostics?
> > > Maybe even replace some of the problematic middle-end warnings
> > > with the analyzer?  What (if anything) should we do about
> > > warnings
> > > issued for the same problems by both the middle-end and the
> > > analyzer?
> > > Or about false negatives?  E.g., a bug detected by the middle-end
> > > but not the analyzer or vice versa.
> > > 
> > > What do you see as the biggest pros and cons of either approach?
> > > (Middle-end vs analyzer.)  What limitations is the analyzer
> > > approach inherently subject to that the middle-end warnings
> > > aren't,
> > > and vice versa?
> > > 
> > > How do we prioritize between the two approaches (e.g., choose
> > > where to add a new warning)?
> > Given the cost of David's analyzer, I would tend to prioritize the
> > more
> > localized analysis.  Also note that because of the compile-time
> > complexities we end up pruning paths from the search space and lose
> > precision when we have to merge nodes.   These issues are inherent
> > in
> > the depth of analysis we're looking to do.
> > 
> > So the way to think about things is David's work is a slower,
> > deeper
> > analysis than what we usually do.  So things that are reasonable
> > candidates for -Wall would need to use the traditional mechansisms.
> > Things that require deeper analysis would be done in David's
> > framework.
> > 
> > Also note that part of David's work is to bring a fairly generic
> > engine
> > that we can expand with different domain specific analyzers.  It
> > just
> > happens to be the case that the first place he's focused is on
> > double-
> > free and use-after-free.  But (IMHO) the gem is really the generic
> > engine.
> 
> So if the "generic engine" lives inside GCC can the actual analyzers
> be plugins on a (stable) "analyzer plugin API"?

I like the idea of having plugins be able to support the analyzer
itself, so that new checkers can be registered by a plugin, analogous
to plugins that register new passes.  AIUI the clang static analyzer
works in such a fashion.

However, speaking to the "(stable)" part of your question: to do
anything useful, the checkers have to query GCC's IR (as well as
interact with the state of the analyzer), and so this reopens the
question of what the plugin API to GCC's IR is.

I'm focusing on building a concrete example of a checker (double-free)
and a few other examples; trying to generalize it into something
pluggable feels very much like something not to attempt in the initial
version.

> Does the analyzer work with LTO at whole-program scope btw?

My understanding of LTO is a little hazy, but yes, I think.

The first thing the analyzer does (in engine.cc) is:

  /* If using LTO, ensure that the cgraph nodes have function bodies.  */
  cgraph_node *node;
  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
    node->get_untransformed_body ();

before then building a "supergraph" that combines CFGs and the callgraph.

BTW, for more on implementation details, prebuilt HTML of the internal
docs are at:
https://dmalcolm.fedorapeople.org/gcc/static-analyzer/gccint/Static-Analyzer.html

Dave



More information about the Gcc-patches mailing list