Static Analyzer project

Status

Initial implementation was added in GCC 10; major rewrite occurred in GCC 11.

Only C is currently supported. (I hope to support C++ in GCC 14, but it is out of scope for GCC 13.)

User-facing documentation: prebuilt HTML

Internal documentation: prebuilt HTML

Git branch with some additional material: devel/analyzer (though this is now a long way behind "master" in other areas)

Integration tests for -fanalyzer: https://github.com/davidmalcolm/gcc-analyzer-integration-tests

Bugs relating to the analyzer

Also: RFEs for new GCC warnings, many of which might be analyzer-related.

See also Summer of Code project ideas

History

GCC 13 (under development; currently adding 20 new warnings, for a total of 47):

GCC 12 (added 5 more warnings, for a total of 27):

GCC 11 (added 7 more warnings, for 22 total):

GCC 10 (initial release, with 15 new warnings):

Implementation overview

This project introduces a static analysis pass for GCC that can diagnose various kinds of problems in C code at compile-time (e.g. double-free and use-after-free).

The analyzer runs as an IPA pass on the gimple SSA representation. It associates state machines with data, with transitions at certain statements and edges. It finds "interesting" interprocedural paths through the user's code, in which bogus state transitions happen.

For example, given:

   free (ptr);
   free (ptr);

at the first call, ptr transitions to the "freed" state, and at the second call the analyzer complains, since ptr is already in the "freed" state (unless ptr is NULL, in which case it stays in the NULL state for both calls).
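The transition described above can be modelled in a few lines of C. This is only a toy sketch for illustration, not the analyzer's actual code: the state names, on_free and count_double_frees are all hypothetical, and the real malloc state machine has more states than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified set of states for a pointer value.  */
enum ptr_state { PTR_START, PTR_NULL, PTR_NONNULL, PTR_FREED };

/* Toy model of the transition at a call to free (ptr).
   Sets *double_free if the pointer was already in the "freed" state.  */
enum ptr_state
on_free (enum ptr_state before, bool *double_free)
{
  *double_free = (before == PTR_FREED);
  if (before == PTR_NULL)
    return PTR_NULL;   /* free (NULL) is a no-op, so the state is unchanged */
  return PTR_FREED;
}

/* Apply on_free N_CALLS times, starting from INITIAL, and return how
   many of the calls were flagged as a double-free.  */
int
count_double_frees (enum ptr_state initial, int n_calls)
{
  assert (n_calls >= 0);
  int n_flagged = 0;
  enum ptr_state state = initial;
  for (int i = 0; i < n_calls; i++)
    {
      bool flagged;
      state = on_free (state, &flagged);
      if (flagged)
        n_flagged++;
    }
  return n_flagged;
}
```

In this model, two calls starting from PTR_START flag the second call, whereas two calls starting from PTR_NULL flag nothing.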

Specific state machines include:

A visualization of the malloc state machine can be seen at https://dmalcolm.fedorapeople.org/gcc/2019-11-22/sm-malloc.png

There are also two state-machine-based checkers that are just proof-of-concept at this stage:

There's a separation between the state machines and the analysis engine, so it ought to be relatively easy to add new warnings.

For any given diagnostic emitted by a state machine, the analysis engine generates the simplest feasible interprocedural path of control flow for triggering the diagnostic. The patch kit adds support to GCC's diagnostic subsystem for associating such a "diagnostic_path" with a diagnostic.
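To illustrate what "feasible" means here, consider the following pair of functions. They are hypothetical examples written for this page (not taken from the testsuite), and the comments describe what I would expect from a path-sensitive analysis rather than verified analyzer output. Each function returns the number of calls to free it made, purely so that the behaviour is easy to check.

```c
#include <stdlib.h>

/* The two conditions are mutually exclusive, so no feasible path
   reaches both calls to free: exactly one of them runs.  */
int
free_exactly_once (void *p, int flag)
{
  int n_frees = 0;
  if (flag)
    {
      free (p);
      n_frees++;
    }
  if (!flag)
    {
      free (p);
      n_frees++;
    }
  return n_frees;
}

/* Here the path through both branches is feasible (when A and B are
   both nonzero), so a double-free diagnostic would be expected, with
   a path that takes both "if" branches.  */
int
free_maybe_twice (void *p, int a, int b)
{
  int n_frees = 0;
  if (a)
    {
      free (p);
      n_frees++;
    }
  if (b)
    {
      free (p);
      n_frees++;
    }
  return n_frees;
}
```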

The analyzer itself is implemented as an interprocedural pass for GCC. It is off by default, and must be enabled via -fanalyzer. It can be disabled altogether at configure time when building GCC via --disable-analyzer.

To mitigate feature creep, I've been focusing on implementing double-free detection, albeit with an eye to building something that can be developed into a more fully-featured static analyzer. For example, I haven't yet attempted to track buffer overflows in this version, but I believe that could be added on top of this foundation.

More details of the internals can be seen in the documentation (prebuilt HTML)

Diagnostic Paths

The patch kit also expands GCC's diagnostic subsystem in various ways:

(a) adding the ability to associate a "diagnostic path" with a diagnostic, describing a sequence of events predicted by the compiler that lead to the problem occurring, with their locations in the user's source, and text descriptions.

For example, the following warning has a 6-event interprocedural path:

malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
  'make_boxed_int': events 1-2
    |
    |   18 | make_boxed_int (int i)
    |      | ^~~~~~~~~~~~~~
    |      | |
    |      | (1) entry to 'make_boxed_int'
    |   19 | {
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
    |
    +--> 'wrapped_malloc': events 3-4
           |
           |    7 | void *wrapped_malloc (size_t size)
           |      |       ^~~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to 'wrapped_malloc'
           |    8 | {
           |    9 |   return malloc (size);
           |      |          ~~~~~~~~~~~~~
           |      |          |
           |      |          (4) this call could return NULL
           |
    <------+
    |
  'make_boxed_int': events 5-6
    |
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
    |   21 |   result->i = i;
    |      |   ~~~~~~~~~~~~~
    |      |             |
    |      |             (6) 'result' could be NULL: unchecked value from (4)
    |

The diagnostic-printing code has consolidated the path into 3 runs of events (where the events are near each other and within the same function), using ASCII art to show the interprocedural call and return.
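For reference, source code of roughly the following shape would give rise to a path like the one above. This is a reconstruction from the quoted lines in the diagnostic, not a copy of the testsuite file: the definition of boxed_int is an assumption, the line numbers will not match, and make_boxed_int_checked is a hypothetical variant added to show the missing check.

```c
#include <stdlib.h>

/* Assumed definition; only the field 'i' is implied by the diagnostic.  */
typedef struct boxed_int
{
  int i;
} boxed_int;

/* Thin wrapper around malloc (events 3-4 in the path).  */
void *wrapped_malloc (size_t size)
{
  return malloc (size);   /* (4) this call could return NULL */
}

boxed_int *
make_boxed_int (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  result->i = i;          /* (6) dereference without checking for NULL */
  return result;
}

/* Hypothetical variant with the check in place.  */
boxed_int *
make_boxed_int_checked (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  if (!result)
    return NULL;
  result->i = i;
  return result;
}
```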

A colorized version of the above can be seen at:

Other examples can be seen at:

and:

An example of detecting a historical double-free CVE can be seen at:

The support for associating diagnostic paths with a diagnostic was committed to trunk on 2020-01-10 as r280142.

(b) adding the ability to associate additional metadata with a diagnostic. The only such metadata added by the patch kit are CWE classifications (for the new warnings), so that we can emit e.g.:

malloc-1.c: In function ‘test_42a’:
malloc-1.c:466:1: warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
  463 |   void *p = malloc (1024);
      |             ^~~~~~~~~~~~~
      |             |
      |             (1) allocated here
......
  466 | }
      | ~            
      | |
      | (2) ‘p’ leaks here; was allocated at (1)

The CWE support was committed to trunk as r279556 on 2019-12-18.

Scope

Earlier versions of the patch kit implemented the analyzer as a GCC plugin, and added support for "in-tree" plugins (i.e. GCC plugins that would live in the GCC source tree and be shipped as part of the GCC tarball), but that idea was dropped in v3 to simplify things.

Many projects implement some kind of wrapper around malloc and free, so there is enough interprocedural support to cope with that. However, there is only very primitive support for summarizing larger functions, and for planning and performing an efficient interprocedural analysis on non-trivial functions that have state-machine effects.
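As a sketch of the kind of wrapper code meant here (my_alloc, my_release and the two callers are hypothetical names invented for this example), the double release in release_twice only shows up if the analysis follows the calls into the wrapper:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical project-specific wrappers around malloc and free.  */
void *my_alloc (size_t size) { return malloc (size); }
void my_release (void *ptr) { free (ptr); }

/* Buggy: releases BUF twice via the wrapper.  This is only harmless
   at run time if BUF is NULL.  */
void
release_twice (void *buf)
{
  my_release (buf);
  my_release (buf);
}

/* Correct usage of the same wrappers: allocate, check, use, release
   exactly once.  Returns the length of the copy, or 0 on failure.  */
size_t
copy_and_measure (const char *s)
{
  size_t len = strlen (s);
  char *copy = my_alloc (len + 1);
  if (!copy)
    return 0;
  memcpy (copy, s, len + 1);
  size_t n = strlen (copy);
  my_release (copy);
  return n;
}
```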

In theory the analyzer can work with LTO, and perform cross-TU analysis. There's a bare-bones prototype of this in the testsuite, which finds a double-free spanning two TUs; see:

However this is just a proof-of-concept at this stage (see the internal docs for more notes on its limitations).

User interface

-fanalyzer turns on all the warnings (it also enables the expensive traversal that they rely on). All of the warnings are of the form -Wanalyzer-name-of-warning e.g. -Wanalyzer-malloc-leak. They can be disabled individually via -Wno-analyzer-name-of-warning e.g. -Wno-analyzer-malloc-leak.
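A minimal sketch of how this might look in practice follows. The file name example.c and the function deliberately_leaky are hypothetical, and the pragma part assumes that GCC's usual diagnostic pragmas apply to analyzer warnings in the same way as to other -W options.

```c
#include <stdlib.h>

/* Possible command lines (example.c is a placeholder file name):
     gcc -fanalyzer -c example.c
     gcc -fanalyzer -Wno-analyzer-malloc-leak -c example.c  */

/* Suppress one analyzer warning for a single function, rather than
   for the whole translation unit.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wanalyzer-malloc-leak"
int
deliberately_leaky (void)
{
  int *p = malloc (sizeof (int));
  if (!p)
    return -1;
  *p = 7;
  return *p;   /* 'p' is intentionally never freed */
}
#pragma GCC diagnostic pop
```

Compilers that do not know the -Wanalyzer-malloc-leak option may warn about the pragma itself.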

Rationale

There's benefit in integrating a checker directly into the compiler, so that

  1. the programmer can see the diagnostics as he or she works on the code, rather than at some later point. I think that if the analyzer can be made sufficiently fast, many people would opt in to deeper but more expensive warnings. (I'm aiming for 2x compile time as my rough estimate of what's reasonable in exchange for being told up-front about various kinds of pointer snafu.)
  2. the analyzer is working with precisely the code that's being compiled (avoiding preprocessor issues, supporting exactly the dialect/extensions of the languages that GCC supports, etc).

Correctness

The analyzer is neither sound nor complete, but does attempt to explore "interesting" paths through the code. There are bugs... (see the xfails and TODOs in the testsuite, and the "Limitations" section of the internal docs).

Performance

Using -fanalyzer roughly doubles the compile time on various testcases I've tried (krb5, zlib), but also sometimes takes a lot longer (again, see the "Limitations" section of the internal docs; there are bugs...).

StaticAnalyzer (last edited 2023-02-22 18:54:00 by DavidMalcolm)