Static Analyzer project
Status
Initial implementation was added in GCC 10; major rewrite occurred in GCC 11.
Only C is currently supported (I hope to support C++ in GCC 14, but it is out of scope for GCC 13).
User-facing documentation: prebuilt HTML
Internal documentation: prebuilt HTML
Git branch with some additional material: devel/analyzer (though this is now a long way behind "master" in other areas)
Integration tests for -fanalyzer: https://github.com/davidmalcolm/gcc-analyzer-integration-tests
Bugs relating to the analyzer
Also: RFEs for new GCC warnings, many of which might be analyzer-related.
See also Summer of Code project ideas
History
GCC 13 (under development; currently adding 20 new warnings, for a total of 47):
2023-02-21: -fanalyzer now stops exploring execution paths after certain warnings to prevent noisy cascades of diagnostics.
2023-01-18: Created integration test suite for -fanalyzer
2022-11-23: Committed revamp of how the analyzer tracks heap-allocated regions
- 2022-11-15:
Committed two new warnings relating to sockets:
- -Wanalyzer-fd-phase-mismatch (e.g. calling 'accept' on a socket before calling 'listen' on it)
- -Wanalyzer-fd-type-mismatch (e.g. using a stream socket operation on a datagram socket)
Committed support for named constants
2022-11-13: New warning: -Wanalyzer-tainted-assertion
2022-11-11: New warning: -Wanalyzer-infinite-recursion
2022-11-10: New warning: -Wanalyzer-deref-before-check
2022-11-07: Posted patch to add warnings relating to sockets (waiting on named constants patch)
2022-11-07: Started adding support for errno
2022-11-03: Use std::unique_ptr in many places internally
2022-10-31: Posted patch for supporting named constants (not yet approved)
2022-10-04: Revamp of call summarization
2022-09-16: Talk at GNU Tools Cauldron 2022: What’s new in GCC -fanalyzer ?
2022-09-14: Talk at LPC 2022: GCC's -fanalyzer and the Linux kernel
- 2022-09-09:
Added support for plugins to supply behaviors of known functions
Added new warning -Wanalyzer-exposure-through-uninit-copy and proof-of-concept GCC plugin for using it with Linux kernel
2022-09-08: Tim Lange (GSoC) generalized -Wanalyzer-out-of-bounds to also handle various cases of symbolic values for offsets and capacities
2022-08-18: New warning: -Wanalyzer-imprecise-fp-arithmetic (implemented by Tim Lange as part of GSoC)
2022-08-12: Tim Lange (GSoC) committed initial implementation of -Wanalyzer-out-of-bounds (initially limited to just those cases where the offsets and capacities are constants)
2022-08-05: New warning: -Wanalyzer-jump-through-null
2022-08-02: Immad Mir (GSoC) added support for creat, dup, dup2 and dup3 to the file descriptor analysis.
2022-07-28: New warning: -Wanalyzer-putenv-of-auto-var
2022-07-23: Immad Mir (GSoC) implemented three new attributes for use on functions that work with file descriptors
2022-07-20: Experimented with combining -fanalyzer and gccrs to detect bugs in unsafe Rust code.
2022-07-02: Tim Lange (GSoC) implemented -Wanalyzer-allocation-size
2022-07-02: Immad Mir (GSoC) implemented five new warnings, relating to misuses of file descriptors:
- -Wanalyzer-fd-access-mode-mismatch
- -Wanalyzer-fd-double-close
- -Wanalyzer-fd-leak
- -Wanalyzer-fd-use-after-close
- -Wanalyzer-fd-use-without-check
2022-06-24: Reimplemented call_string class
2022-06-22: Posted experimental patches for replay of serialized diagnostics (including analyzer warnings)
2022-06-15: Implemented fixups to how -fanalyzer emits execution paths in the face of inlined functions
2022-06-02: Implemented SARIF output for GCC diagnostics (including analyzer warnings)
2022-05-16: Implemented four new warnings, relating to misuses of <stdarg.h>:
- -Wanalyzer-va-arg-type-mismatch
- -Wanalyzer-va-list-exhausted
- -Wanalyzer-va-list-leak
- -Wanalyzer-va-list-use-after-va-end
GCC 12 (added 5 more warnings, for a total of 27):
2022-04-12: Blog post: The state of static analysis in the GCC 12 compiler
2022-03-24: Sped up -fanalyzer on a particularly slow Linux kernel source file, reducing wallclock time of cc1 from 254 seconds (~4 minutes) to 36 seconds (compared to 19 seconds without -fanalyzer)
2022-03-10: -Wanalyzer-write-to-const and -Wanalyzer-write-to-string-literal now respect __attribute__((access, write))
2022-02-23: Analyzer now handles __attribute__ ((const))
2022-01-14: Blog post about using the analyzer on D code
2022-01-13: Add __attribute__ ((tainted_args))
2022-01-12: Extended -Wanalyzer-tainted-size to use the "access" attribute
2021-11-13: Patches posted for adding "trust boundaries" to GCC
2021-11-13: Added four new taint-based warnings:
- -Wanalyzer-tainted-allocation-size
- -Wanalyzer-tainted-divisor
- -Wanalyzer-tainted-offset
- -Wanalyzer-tainted-size
2021-09-23: Session about -fanalyzer and kernel at LPC 2021
2021-09-20: Talk about -fanalyzer at GNU Tools track of LPC 2021
video (starts at 0:33:38)
2021-08-30: Rewrite of realloc handling, supporting analysis path "bifurcation"
2021-08-23: Rewrite of switch handling
2021-08-23: Report by Ankur Saini on his GSoC project to implement C++ vfunc calls in analyzer (and other dynamic dispatch)
2021-08-04: Initial implementation of asm support
2021-07-15: Reimplementation of -Wanalyzer-use-of-uninitialized-value for GCC 12
2021-06-30: Rewrite of memory state tracking needed for reimplementing tracking of uninitialized values
2021-06-15: Initial implementation of tracking sizes of dynamic allocations
2021-06-08: Fixed bitfield handling
GCC 11 (added 7 more warnings, for 22 total):
2021-04-08: Fixes to leak detection
2021-03-11: Reimplemented how the analyzer finds the shortest feasible path for each diagnostic, fixing various false negatives
2021-01-29: Simplified compound conditionals in analyzer paths
2021-01-28: Blog post: Static analysis updates in GCC 11
2021-01-18: Added support to the analyzer for the "malloc" attribute, extending the malloc/free checking in the analyzer to cover arbitrary allocator/deallocator pairs
2020-11-11: Improvements to -Wanalyzer-stale-setjmp-buffer
2020-11-11: Added -Wanalyzer-shift-count-negative and -Wanalyzer-shift-count-overflow
2020-11-10: v2 of -fdiagnostics-path-format=html
2020-10-28: Committed various fixes for non-determinism in the analyzer
2020-10-22: [PATCH/RFC] Add -fdiagnostics-path-format=html
2020-10-14: Posted patch to add plugin support to -fanalyzer, with example of checking for CPython GIL errors
2020-10-12: Added -Wanalyzer-write-to-const and -Wanalyzer-write-to-string-literal
2020-10-05: Posted RFC: add "deallocated_by" attribute for use by analyzer
2020-09-23: Added -fno-analyzer-feasibility debug option
2020-09-22: Added -fdump-analyzer-json debug option
2020-09-09: Generalized malloc/free checking to new/delete (to the extent that this can be done, given how much support for C++ is missing)
2020-08-25: Talk at GNU Tools Track of LPC 2020: "GCC’s -fanalyzer option"
2020-08-13: Major rewrite of how state is tracked within the analyzer, fixing numerous bugs and simplifying the implementation
GCC 10 (initial release, with 15 new warnings):
2020-04-28: Removal of -Wanalyzer-use-of-uninitialized-value for GCC 10
2020-04-21: First CVE found using -fanalyzer, CVE-2020-1967
2020-03-26: Blog post: Static analysis in GCC 10
- 2020-01-15:
Updated branch from dmalcolm/analyzer to devel/analyzer
- 2020-01-14:
Initial commit of analyzer to GCC master branch for GCC 10, adding 15 warnings:
- -Wanalyzer-double-fclose
- -Wanalyzer-double-free
- -Wanalyzer-exposure-through-output-file
- -Wanalyzer-file-leak
- -Wanalyzer-free-of-non-heap
- -Wanalyzer-malloc-leak
- -Wanalyzer-possible-null-argument
- -Wanalyzer-possible-null-dereference
- -Wanalyzer-null-argument
- -Wanalyzer-null-dereference
- -Wanalyzer-stale-setjmp-buffer
- -Wanalyzer-tainted-array-index
- -Wanalyzer-use-after-free
- -Wanalyzer-use-of-pointer-in-stale-stack-frame
- -Wanalyzer-use-of-uninitialized-value
2020-01-10: diagnostic_path support committed to trunk (r280142)
2020-01-09: v6 of analyzer patch kit
- 2020-01-08: v5 of patches:
2020-01-02: Avoid printing redundant data when printing diagnostic paths
2019-12-19: Add support for tracking sets of functions; add -Wanalyzer-use-of-closed-file
2019-12-18: CWE support committed to trunk
2019-12-17: Fixed false positives seen with reproducer for CVE-2005-1689
- 2019-12-13: v4 of patches:
- 2019-12-10:
"analyzer" component and "analyzer branch" version added to GCC bugzilla
2019-12-08: Converted the analyzer to be built in to the compiler, rather than a plugin
Git branch analyzer-v3-unsquashed
2019-12-04: Added check for unsafe calls within signal handlers (screenshot)
2019-11-27: Reworking of command-line options (from --analyzer to -fanalyzer and -fno-analyzer)
- 2019-11-19: v2 of patches (rebased; LTO fixes):
Git branch: dmalcolm/analyzer-v2
- 2019-11-15: Initial proof-of-concept posted to gcc-patches:
Implementation overview
This project introduces a static analysis pass for GCC that can diagnose various kinds of problems in C code at compile-time (e.g. double-free, use-after-free, etc).
The analyzer runs as an IPA pass on the gimple SSA representation. It associates state machines with data, with transitions at certain statements and edges. It finds "interesting" interprocedural paths through the user's code, in which bogus state transitions happen.
For example, given:
    free (ptr);
    free (ptr);
at the first call, ptr transitions to the "freed" state, and at the second call the analyzer complains, since ptr is already in the "freed" state (unless ptr is NULL, in which case it stays in the NULL state for both calls).
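The transitions described above can be sketched as a toy model in C. This is only an illustration of the idea of a per-pointer state machine: the state names, the `step_result` struct, and the functions `on_free` and `on_deref` are hypothetical and are not taken from the analyzer's sources.

```c
#include <stddef.h>

/* Hypothetical states for one tracked pointer; names are illustrative.  */
enum ptr_state { PS_START, PS_NULL, PS_NONNULL, PS_FREED };

/* Outcome of simulating one statement on a tracked pointer.  */
struct step_result
{
  enum ptr_state next;   /* state after the statement */
  const char *warning;   /* NULL if there is nothing to report */
};

/* Simulate "free (ptr)" for a pointer currently in state 'cur'.  */
struct step_result on_free (enum ptr_state cur)
{
  struct step_result r = { cur, NULL };
  switch (cur)
    {
    case PS_NULL:
      /* free (NULL) is a no-op, so the pointer stays in the NULL state.  */
      break;
    case PS_FREED:
      /* Already freed: this is the bogus transition.  */
      r.warning = "double-free";
      break;
    default:
      r.next = PS_FREED;
      break;
    }
  return r;
}

/* Simulate "*ptr" for a pointer currently in state 'cur'.  */
struct step_result on_deref (enum ptr_state cur)
{
  struct step_result r = { cur, NULL };
  if (cur == PS_FREED)
    r.warning = "use-after-free";
  else if (cur == PS_NULL)
    r.warning = "null-dereference";
  return r;
}
```

Feeding the two calls from the example through `on_free` moves the pointer to the freed state on the first call and yields a warning on the second, while a pointer in the NULL state passes through both calls silently.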
Specific state machines include:
- a checker for malloc/free (for detecting double-free, resource leaks, use-after-free, etc).
- a checker for stdio's FILE stream API (for detecting double-fclose, leaks, etc)
- a checker for detecting uses of async-signal-unsafe functions from within a signal handler (CWE-479).
A visualization of the malloc state machine can be seen at https://dmalcolm.fedorapeople.org/gcc/2019-11-22/sm-malloc.png
There are also two state-machine-based checkers that are just proof-of-concept at this stage:
- a checker for tracking exposure of sensitive data (e.g. writing passwords to log files aka CWE-532), and
- a checker for tracking "taint", where data potentially under an attacker's control is used without sanitization for things like array indices (CWE-129).
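As an illustration of the CWE-129 pattern that the taint checker targets, here is a minimal sketch in C. The table and both function names are made up for this example, and `idx` is simply assumed to come from an untrusted source; whether a given GCC release reports `lookup_unchecked` depends on how the analyzer learns that the value is attacker-controlled.

```c
#include <stddef.h>

#define TABLE_SIZE 4
static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* 'idx' is assumed to be attacker-controlled (e.g. parsed from input).
   Using it directly as an array index is the unsanitized pattern.  */
int lookup_unchecked (int idx)
{
  return table[idx];   /* out of bounds if idx < 0 or idx >= TABLE_SIZE */
}

/* Same lookup with both bounds checked before the index is used.
   Returns 0 and stores the value in *out, or -1 if idx is rejected.  */
int lookup_checked (int idx, int *out)
{
  if (idx < 0 || idx >= TABLE_SIZE)
    return -1;
  *out = table[idx];
  return 0;
}
```

The checked variant sanitizes the index on both sides; checking only the upper bound would still leave negative indices unsanitized.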
There's a separation between the state machines and the analysis engine, so it ought to be relatively easy to add new warnings.
For any given diagnostic emitted by a state machine, the analysis engine generates the simplest feasible interprocedural path of control flow for triggering the diagnostic. The patch kit adds support to GCC's diagnostic subsystem for associating such a "diagnostic_path" with a diagnostic.
More details of the internals can be seen in the documentation (prebuilt HTML)
Diagnostic Paths
The patch kit also expands GCC's diagnostic subsystem in various ways:
(a) adding the ability to associate a "diagnostic path" with a diagnostic, describing a sequence of events predicted by the compiler that lead to the problem occurring, with their locations in the user's source, and text descriptions.
For example, the following warning has a 6-event interprocedural path:
    malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
    malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
      'make_boxed_int': events 1-2
        |
        |   18 | make_boxed_int (int i)
        |      | ^~~~~~~~~~~~~~
        |      | |
        |      | (1) entry to 'make_boxed_int'
        |   19 | {
        |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
        |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        |      |                                    |
        |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
        |
        +--> 'wrapped_malloc': events 3-4
               |
               |    7 | void *wrapped_malloc (size_t size)
               |      |       ^~~~~~~~~~~~~~
               |      |       |
               |      |       (3) entry to 'wrapped_malloc'
               |    8 | {
               |    9 |   return malloc (size);
               |      |          ~~~~~~~~~~~~~
               |      |          |
               |      |          (4) this call could return NULL
               |
        <------+
        |
      'make_boxed_int': events 5-6
        |
        |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
        |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        |      |                                    |
        |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
        |   21 |   result->i = i;
        |      |   ~~~~~~~~~~~~~
        |      |   |
        |      |   (6) 'result' could be NULL: unchecked value from (4)
        |
The diagnostic-printing code has consolidated the path into 3 runs of events (where the events are near each other and within the same function), using ASCII art to show the interprocedural call and return.
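The testcase source is not reproduced here, but a reconstruction consistent with the lines quoted in the diagnostic might look like the following. The layout of `boxed_int` is an assumption (the output only shows that it has a field `i`), line numbers will not match the diagnostic, and `make_boxed_int_checked` is a hypothetical variant added to show one way of avoiding the warning.

```c
#include <stdlib.h>

/* Assumed definition: the diagnostic only reveals a field named 'i'.  */
typedef struct boxed_int
{
  int i;
} boxed_int;

/* Thin wrapper around malloc; its result may be NULL (event 4).  */
void *wrapped_malloc (size_t size)
{
  return malloc (size);
}

/* As quoted in the diagnostic: the result is used without a check.  */
boxed_int *make_boxed_int (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  result->i = i;   /* event 6: 'result' could be NULL here */
  return result;
}

/* Hypothetical variant: check the allocation before dereferencing.  */
boxed_int *make_boxed_int_checked (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  if (!result)
    return NULL;
  result->i = i;
  return result;
}
```

The call into `wrapped_malloc` and the return from it correspond to the second and third runs of events in the path, which is why the possibly-NULL value can be traced across the function boundary.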
A colorized version of the above can be seen at:
Other examples can be seen at:
and:
An example of detecting a historical double-free CVE can be seen at:
The support for associating diagnostic paths with a diagnostic was committed to trunk on 2020-01-10 as r280142.
(b) adding the ability to associate additional metadata with a diagnostic. The only such metadata added by the patch kit are CWE classifications (for the new warnings), so that we can emit e.g.:
    malloc-1.c: In function ‘test_42a’:
    malloc-1.c:466:1: warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
      463 |   void *p = malloc (1024);
          |             ^~~~~~~~~~~~~
          |             |
          |             (1) allocated here
    ......
      466 | }
          | ~
          | |
          | (2) ‘p’ leaks here; was allocated at (1)
The CWE support was committed to trunk as r279556 on 2019-12-18.
Scope
The analyzer itself is implemented as an interprocedural pass for GCC. It is off by default, and must be enabled via -fanalyzer. It can be disabled altogether at configure time when building GCC via --disable-analyzer.
Earlier versions of the patch kit implemented the analyzer via a GCC plugin and implemented support for "in-tree" plugins i.e. GCC plugins that would live in the GCC source tree and be shipped as part of the GCC tarball, but that idea was dropped in v3 to simplify things.
To mitigate feature creep, I've been focusing on implementing double-free detection, albeit with an eye to building something that can be developed into a more fully-featured static analyzer. For example, I haven't yet attempted to track buffer overflows in this version, but I believe that that could be added on top of this foundation.
Many projects implement some kind of wrapper around malloc and free, so there is enough interprocedural support to cope with that, but only very primitive support for summarizing larger functions and planning/performing an efficient interprocedural analysis on non-trivial functions that have state-machine effects.
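To make the wrapper case concrete, here is a small sketch in C. The names `my_alloc`, `my_release`, `process_buggy` and `process_ok` are hypothetical. In `process_buggy` no call to `free` is visible in the function itself, so the double release can only be noticed by an analysis that looks inside the wrapper.

```c
#include <stdlib.h>

/* Hypothetical project-specific wrappers around malloc and free.  */
void *my_alloc (size_t n)
{
  return malloc (n);
}

void my_release (void *p)
{
  free (p);
}

/* Buggy: 'buf' is released twice through the wrapper.
   Shown for illustration only; it is not meant to be called.  */
void process_buggy (void)
{
  char *buf = my_alloc (16);
  if (!buf)
    return;
  buf[0] = '\0';
  my_release (buf);
  my_release (buf);   /* second release of the same pointer */
}

/* Corrected: exactly one release per allocation.
   Returns the stored character, or -1 if allocation failed.  */
int process_ok (void)
{
  char *buf = my_alloc (16);
  if (!buf)
    return -1;
  buf[0] = 'x';
  int first = buf[0];
  my_release (buf);
  return first;
}
```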
In theory the analyzer can work with LTO, and perform cross-TU analysis. There's a bare-bones prototype of this in the testsuite, which finds a double-free spanning two TUs; see:
However this is just a proof-of-concept at this stage (see the internal docs for more notes on its limitations).
User interface
-fanalyzer turns on all the warnings (it also enables the expensive traversal that they rely on). All of the warnings are of the form -Wanalyzer-name-of-warning e.g. -Wanalyzer-malloc-leak. They can be disabled individually via -Wno-analyzer-name-of-warning e.g. -Wno-analyzer-malloc-leak.
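As a sketch of how these options are used, consider a hypothetical file `leak.c` containing the function below; the file name and the function are invented for this example. The comment at the top shows two possible invocations, and one would expect the early return to be the kind of path that -Wanalyzer-malloc-leak is meant to report, though the exact output is not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical file leak.c.  Possible invocations:
     gcc -fanalyzer -c leak.c
     gcc -fanalyzer -Wno-analyzer-malloc-leak -c leak.c
   The second form keeps the analyzer on but disables this one warning.  */

/* Returns the length of a private copy of 's', or -1 on failure.  */
int copy_length (const char *s)
{
  size_t len = strlen (s);
  char *copy = malloc (len + 1);
  if (!copy)
    return -1;
  memcpy (copy, s, len + 1);
  if (len == 0)
    return 0;          /* 'copy' is not freed on this path: a leak */
  free (copy);
  return (int) len;
}
```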
Rationale
There's benefit in integrating a checker directly into the compiler, so that:
- the programmer can see the diagnostics as he or she works on the code, rather than at some later point. I think that if the analyzer can be made sufficiently fast, many people would opt in to deeper but more expensive warnings. (I'm aiming for 2x compile time as my rough estimate of what's reasonable in exchange for being told up-front about various kinds of pointer snafu.)
- the analyzer is working with precisely the code that's being compiled (avoiding preprocessor issues, supporting exactly the dialect/extensions of the languages that GCC supports, etc)
Correctness
The analyzer is neither sound nor complete, but does attempt to explore "interesting" paths through the code. There are bugs... (see the xfails and TODOs in the testsuite, and the "Limitations" section of the internal docs).
Performance
Using -fanalyzer roughly doubles the compile time on various testcases I've tried (krb5, zlib), but also sometimes takes a lot longer (again, see the "Limitations" section of the internal docs; there are bugs...).