Static Analyzer project
Status
Initial implementation was added in GCC 10; major rewrite occurred in GCC 11.
Only C is currently supported (I hope to support C++ in GCC 12, but it is out of scope for GCC 11).
Internal documentation: prebuilt HTML
Git branch with some additional material: devel/analyzer (though this is now a long way behind "master" in other areas)
Bugs relating to the analyzer
tracker bug for reintroducing -Wanalyzer-use-of-uninitialized-value
issues in gcc that other static analyzers find that gcc misses
Also: RFEs for new GCC warnings, many of which might be analyzer-related.
History
2021-04-08: Fixes to leak detection
2021-03-11: Reimplemented how the analyzer finds the shortest feasible path for each diagnostic, fixing various false negatives
2021-01-29: Simplified compound conditionals in analyzer paths
2021-01-28: Blog post: Static analysis updates in GCC 11
2021-01-18: Added support to the analyzer for the "malloc" attribute, extending the malloc/free checking to cover arbitrary allocator/deallocator pairs
2020-11-11: Improvements to -Wanalyzer-stale-setjmp-buffer
2020-11-11: Added -Wanalyzer-shift-count-negative and -Wanalyzer-shift-count-overflow
2020-11-10: v2 of -fdiagnostics-path-format=html
2020-10-28: Committed various fixes for non-determinism in the analyzer (four commits)
2020-10-22: [PATCH/RFC] Add -fdiagnostics-path-format=html
2020-10-14: Posted patch to add plugin support to -fanalyzer, with example of checking for CPython GIL errors
2020-10-12: Added -Wanalyzer-write-to-const and -Wanalyzer-write-to-string-literal
2020-10-05: Posted RFC: add "deallocated_by" attribute for use by analyzer
2020-09-23: Added -fno-analyzer-feasibility debug option
2020-09-22: Added -fdump-analyzer-json debug option
2020-09-09: Generalized malloc/free checking to new/delete (to the extent that this can be done, given how much support for C++ is missing)
2020-08-25: Talk at GNU Tools Track of LPC 2020: "GCC’s -fanalyzer option"
2020-08-13: Major rewrite of how state is tracked within the analyzer, fixing numerous bugs and simplifying the implementation
2020-04-28: Removal of -Wanalyzer-use-of-uninitialized-value for GCC 10
2020-04-21: First CVE found using -fanalyzer, CVE-2020-1967
2020-03-26: Blog post: Static analysis in GCC 10
2020-01-15: Updated branch from dmalcolm/analyzer to devel/analyzer
2020-01-14:
2020-01-10: diagnostic_path support committed to trunk (r280142)
2020-01-09: v6 of analyzer patch kit
2020-01-08: v5 of patches
2020-01-02: Avoid printing redundant data when printing diagnostic paths
2019-12-19: Add support for tracking sets of functions; add -Wanalyzer-use-of-closed-file
2019-12-18: CWE support committed to trunk
2019-12-17: Fixed false positives seen with reproducer for CVE-2005-1689
2019-12-13: v4 of patches
2019-12-10: "analyzer" component and "analyzer branch" version added to GCC bugzilla
2019-12-08: Converted the analyzer to be built in to the compiler, rather than a plugin; Git branch: analyzer-v3-unsquashed
2019-12-04: Added check for unsafe calls within signal handlers (screenshot)
2019-11-27: Reworking of command-line options (from --analyzer to -fanalyzer and -fno-analyzer)
2019-11-19: v2 of patches (rebased; LTO fixes); Git branch: dmalcolm/analyzer-v2
2019-11-15: Initial proof-of-concept posted to gcc-patches
Implementation overview
This project introduces a static analysis pass for GCC that can diagnose various kinds of problems in C code at compile time (e.g. double-free and use-after-free).
The analyzer runs as an IPA pass on the gimple SSA representation. It associates state machines with data, with transitions at certain statements and edges. It finds "interesting" interprocedural paths through the user's code, in which bogus state transitions happen.
For example, given:
free (ptr); free (ptr);
at the first call, ptr transitions to the "freed" state, and at the second call the analyzer complains, since ptr is already in the "freed" state (unless ptr is NULL, in which case it stays in the NULL state for both calls).
Specific state machines include:
- a checker for malloc/free (for detecting double-free, resource leaks, use-after-free, etc.)
- a checker for stdio's FILE stream API (for detecting double-fclose, leaks, etc.)
- a checker for detecting uses of async-signal-unsafe functions from within a signal handler (CWE-479)
A visualization of the malloc state machine can be seen at https://dmalcolm.fedorapeople.org/gcc/2019-11-22/sm-malloc.png
There are also two state-machine-based checkers that are just proof-of-concept at this stage:
- a checker for tracking exposure of sensitive data (e.g. writing passwords to log files, aka CWE-532), and
- a checker for tracking "taint", where data potentially under an attacker's control is used without sanitization for things like array indices (CWE-129).
There's a separation between the state machines and the analysis engine, so it ought to be relatively easy to add new warnings.
For any given diagnostic emitted by a state machine, the analysis engine generates the simplest feasible interprocedural path of control flow for triggering the diagnostic. The patch kit adds support to GCC's diagnostic subsystem for associating such a "diagnostic_path" with a diagnostic.
The analyzer itself is implemented as an interprocedural pass for GCC. It is off by default, and must be enabled via -fanalyzer. It can be disabled altogether at configure time when building GCC via --disable-analyzer.
To mitigate feature creep, I've been focusing on implementing double-free detection, albeit with an eye to building something that can be developed into a more fully-featured static analyzer. For example, I haven't yet attempted to track buffer overflows in this version, but I believe that that could be added on top of this foundation.
More details of the internals can be seen in the internal documentation (prebuilt HTML).
Diagnostic Paths
The patch kit also expands GCC's diagnostic subsystem in various ways:
(a) adding the ability to associate a "diagnostic path" with a diagnostic, describing a sequence of events predicted by the compiler that lead to the problem occurring, with their locations in the user's source, and text descriptions.
For example, the following warning has a 6-event interprocedural path:
malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
  'make_boxed_int': events 1-2
    |
    |   18 | make_boxed_int (int i)
    |      | ^~~~~~~~~~~~~
    |      | |
    |      | (1) entry to 'make_boxed_int'
    |   19 | {
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
    |
    +--> 'wrapped_malloc': events 3-4
           |
           |    7 | void *wrapped_malloc (size_t size)
           |      |       ^~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to 'wrapped_malloc'
           |    8 | {
           |    9 |   return malloc (size);
           |      |          ~~~~~~~~~~~~~
           |      |          |
           |      |          (4) this call could return NULL
           |
    <------+
    |
  'make_boxed_int': events 5-6
    |
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
    |   21 |   result->i = i;
    |      |   ~~~~~~~~~~~~~
    |      |   |
    |      |   (6) 'result' could be NULL: unchecked value from (4)
    |
The diagnostic-printing code has consolidated the path into 3 runs of events (where the events are near each other and within the same function), using ASCII art to show the interprocedural call and return.
A colorized version of the above can be seen at:
Other examples can be seen at:
and:
An example of detecting a historical double-free CVE can be seen at:
The support for associating diagnostic paths with a diagnostic was committed to trunk on 2020-01-10 as r280142.
(b) adding the ability to associate additional metadata with a diagnostic. The only such metadata added by the patch kit are CWE classifications (for the new warnings), so that we can emit e.g.:
malloc-1.c: In function ‘test_42a’:
malloc-1.c:466:1: warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
  463 |   void *p = malloc (1024);
      |             ^~~~~~~~~~~~~
      |             |
      |             (1) allocated here
......
  466 | }
      | ~
      | |
      | (2) ‘p’ leaks here; was allocated at (1)
The CWE support was committed to trunk as r279556 on 2019-12-18.
Scope
Earlier versions of the patch kit implemented the analyzer via a GCC plugin, and implemented support for "in-tree" plugins, i.e. GCC plugins that would live in the GCC source tree and be shipped as part of the GCC tarball, but that idea was dropped in v3 to simplify things.
Many projects implement some kind of wrapper around malloc and free, so there is enough interprocedural support to cope with that, but only very primitive support for summarizing larger functions and planning/performing an efficient interprocedural analysis on non-trivial functions that have state-machine effects.
In theory the analyzer can work with LTO, and perform cross-TU analysis. There's a bare-bones prototype of this in the testsuite, which finds a double-free spanning two TUs; see:
However this is just a proof-of-concept at this stage (see the internal docs for more notes on its limitations).
User interface
-fanalyzer turns on all the warnings (it also enables the expensive traversal that they rely on). All of the warnings are of the form -Wanalyzer-name-of-warning e.g. -Wanalyzer-malloc-leak. They can be disabled individually via -Wno-analyzer-name-of-warning e.g. -Wno-analyzer-malloc-leak.
Rationale
There's benefit in integrating a checker directly into the compiler, so that
- the programmer can see the diagnostics while working on the code, rather than at some later point. I think that if the analyzer can be made sufficiently fast, many people would opt in to deeper but more expensive warnings. (I'm aiming for 2x compile time as my rough estimate of what's reasonable in exchange for being told up-front about various kinds of pointer snafu.)
- the analyzer is working with precisely the code that's being compiled (avoiding preprocessor issues, supporting exactly the dialect/extensions of the languages that GCC supports, etc)
Correctness
The analyzer is neither sound nor complete, but does attempt to explore "interesting" paths through the code. There are bugs... (see the xfails and TODOs in the testsuite, and the "Limitations" section of the internal docs).
Performance
Using -fanalyzer roughly doubles the compile time on various testcases I've tried (krb5, zlib), but also sometimes takes a lot longer (again, see the "Limitations" section of the internal docs; there are bugs...).