This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH 00/22] RFC: integrated 3rd-party static analysis support
- From: Eric Gallager <egall at gwmail dot gwu dot edu>
- To: David Malcolm <dmalcolm at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 4 Aug 2017 21:00:22 -0400
- Subject: Re: [PATCH 00/22] RFC: integrated 3rd-party static analysis support
- Authentication-results: sourceware.org; auth=none
- References: <email@example.com>
On 8/4/17, David Malcolm <firstname.lastname@example.org> wrote:
> This patch kit clearly isn't ready yet as-is (see e.g. the
> "known unknowns" below), but I'm posting it now in the hope of
> getting early feedback.
> This patch kit provides an easy way to make integrate 3rd-party static
> analysis tools into gcc, and have them:
> (a) report through gcc's diagnostic subsystem, and
> (b) "watermark" the generated binaries with queryable data on what checkers
> were run, and what the results were.
> Here's an example showing gcc running a bank of 3rd-party checkers on this
> source file:
> #include <stdlib.h>
> void test ()
> void *ptr_1;
> void *ptr_2;
> ptr_1 = malloc (64);
> if (!ptr_1)
> ptr_2 = malloc (64);
> if (!ptr_2)
> free (ptr_2);
> free (ptr_1);
> via a simple command-line:
> $ ./xgcc -B. -c conditional-leak.c -Wrun-analyzers=policy.json
> conditional-leak.c:13:5: error: Potential leak of memory pointed to by
> 'ptr_1' [clang-analyzer:Memory leak]
> conditional-leak.c:8:11: note: state 1 of 4: Memory is allocated
> ptr_1 = malloc (64);
> conditional-leak.c:9:7: note: state 2 of 4: Assuming 'ptr_1' is non-null
> if (!ptr_1)
> conditional-leak.c:12:7: note: state 3 of 4: Assuming 'ptr_2' is null
> if (!ptr_2)
> conditional-leak.c:13:5: note: state 4 of 4: Potential leak of memory
> pointed to by 'ptr_1'
> conditional-leak.c:13:0: error: Memory leak: ptr_1 [cppcheck:memleak]
> Of the checkers, clang's static analyzer and cppcheck both identify the
> memory leak; the former also identifies the control flow (the other
> checkers didn't report anything).
> The idea is to provide a mechanism to make it easy for developers and
> projects to impose policy on what checkers should be run, and to gate
> the build if certain tests fail.
> In this case, the results are treated as hard errors and block the build,
> but policy could allow them to be warnings.
> Extensive metadata is captured about what checkers were run, and what
> they emitted, using the "Firehose" interchange format:
> In the case where this doesn't block the build, this can be queried via a
> script, so e.g. you can verify that a setuid binary was indeed compiled
> using all the checkers that you expect it to be.
> This can also be used to embed data about the code into the watermark.
> For example, checkers/ianal.py embeds information about "Copyright"
> lines in the source code into the generated binaries, from where it
> can be queried (this example is intended as a proof-of-concept rather
> than as a real license-tracking solution...)
> Statement of the problem
> Static analysis is IMHO done too late, if at all: static analysis tools are
> as an optional extra, "on the side", rather than in developers' normal
> workflow, with some kind of "override the compiler and do extra work" hook,
> which may preclude running more than one analyzer at once. Analysis
> are reviewed (if at all) in some kind of on-the-side tool, rather than when
> code is being edited, or patches being prepared.
> It would be better to have an easy way for developers to run analyzer(s)
> as they're doing development, as part of their edit-compile-test cycle
> - analysis problems are reported immediately, and can be acted on
> immediately (e.g. by treating some checker tests as being hard errors).
> It would also be good to have a way to run analyzer(s) when packages are
> built, with a variety of precanned policies for analyzers. For example,
> setuid binaries and network-facing daemons could each be built with a
> higher strictness of checking.
> It would also be good to tag binaries with information on what analyzers
> were run, what options they were invoked with, etc.
> Potentially have "dump_file" information from optimization passes stored
> in the metadata also. Have a tool to query all of this.
> This way a distribution can perform a query like:
> "show me all setuid binaries that contain code that wasn't checked
> with $CHECKER with $TEST set to be a hard error"
> Can/should we break the build if there are issues?
> Yes: but have a way to opt-in easily: if the tool is well-integrated with
> compiler: e.g.
> then upstream developers and packagers can turn on the setting, and see
> breaks, and fix it naturally within an compile-edit-test cycle
> This gives a relatively painless way to opt-in to increasing levels of
> strictness (e.g. by an upstream project, or by an individual developer).
> Does this slow the build down?
> Yes: but you can choose which analyzers run, and can choose to turn them
> It ought to parallelize well. I believe users will prefer to turn them on,
> and have builders burn up the extra CPU cycles.
> This may make much more sense for binary distributions (e.g. Fedora,
> that it does for things like Gentoo.
> Example policy files/options might be:
> or whatnot.
> Idea is to provide mechanism, and for the distribution to decide on some
> standard policies.
> This may also allow us to sandbox a gcc plugin by running the plugin inside
> another cc1, for plugins that add warnings - if the plugin ICEs, then the
> cc1 isn't affected (useful for doing mass rebuilds of code using an
> experimental plugin).
> Known unknowns
> How does one suppress a specific false-positive site?
> Do we need a pragma for it? (though pragmas ought to already affect some
> the underlying checkers...)
> Do we really want .json for the policy format?
> If we're expecting users to edit this, we need great error messages,
> and probably support for comments. Would YAML or somesuch be better?
> Or have them as individual command-line flags, and the policy files are
> "@" files for gcc.
> How to mark which checkers are appropriate for which languages?
> (etc; see also all the FIXMEs in the code...)
> The "checkers" subdirectory uses Python 2 or 3, and has a few Python
> dependencies, including "firehose" and "gccinvocation".
> How it works
> If enabled, toplev.c starts each of the various checkers from separate
> threads from near the start of toplev.c, so that the checkers run in
> parallel with each other, and with the bulk of cc1. Near the end of
> toplev.c it waits for each thread to finish, and reads the stdout,
> which is expected to be in Firehose JSON format. This is then sent
> through the diagnostic subsystem.
> Each "checker" is a harness script, which "knows" how to invoke
> the particular 3rd-party tool, and coerce the output from the tool
> into the common JSON format.
> Some notes on the data model can be seen here:
> (though that's expressed as Python objects and XML, rather than
> the JSON format).
> Successfully bootstrapped®rtested the combination of the patches
> on x86_64-pc-linux-gnu (though the only testcases are selftest based
> unit-tests, rather than DejaGnu tests).
1. When bootstrapping, did you try adding the new -Wrun-analyzers to
the build system to see what it reports for GCC's own source code?
It'd be worthwhile to do some dogfooding to determine what kind of
results it produces
2. Since -Wrun-analyzers is a warning option, can I turn messages from
it into errors with -Werror=run-analyzers? Will it work with #pragma
GCC diagnostic push/pop?
3. Do we care about duplicated warnings between the analyzers and GCC
at all? I'm just thinking, if I put in a bug report requesting a new
warning for GCC after -Wrun-analyzers is added, will people just close
it saying "Oh you can already get that with -Wrun-analyzers" or
something? That would be disappointing.
4. Along those lines, how responsible exactly will GCC be for issues
with the analyzers it runs? For example, I know splint (from [20/22])
at least is pretty buggy, it crashed 5 times on me when running it
manually over my fork of gdb. Does GCC really want to encourage the
use of potentially buggy external tools that might lead to issues?
...I thought I had a 5th one but I forget... Anyways, I like the idea
overall! Keep up the good work!
> David Malcolm (22):
> Expose assert_loceq outside of input.c; add ASSERT_LOCEQ
> libcpp: add linemap_position_for_file_line_and_column
> Add JSON implementation
> Add firehose.h/cc
> diagnostic.c/h: add support for external tools
> Makefile.in: hack in -lpthread
> Add minimal version of Nick Clifton's annobin code
> Add GNU_BUILD_ATTRIBUTE_STATIC_ANALYSIS to annobin.h
> Add selftest::read_file (..., FILE *, ...)
> Add checkers.h/cc
> Add checkers/test-sources
> Add -Wrun-analyzers= to common.opt, toplev.c, and invoke.texi
> Add checkers/checker.py
> Add checkers/always_fails.py
> Add checkers/clang_analyzer.py
> Add checkers/coverity.py
> Add checkers/cppcheck.py
> Add checkers/flawfinder.py
> Add checkers/ianal.py
> Add checkers/splint.py
> Add checkers/Makefile
> Add contrib/get-static-analysis.py
> checkers/ChangeLog | 9 +
> checkers/Makefile | 23 +
> checkers/always_fails.py | 57 +
> checkers/checker.py | 367 ++++
> checkers/clang_analyzer.py | 145 ++
> checkers/coverity.py | 141 ++
> checkers/cppcheck.py | 138 ++
> checkers/flawfinder.py | 124 ++
> checkers/ianal.py | 79 +
> checkers/splint.py | 77 +
> checkers/test-sources/conditional-leak.c | 17 +
> checkers/test-sources/cpychecker-demo.c | 110 ++
> checkers/test-sources/divide-by-zero.c | 4 +
> checkers/test-sources/harmless.c | 9 +
> checkers/test-sources/multiple-1.c | 6 +
> checkers/test-sources/multiple-2.c | 9 +
> checkers/test-sources/out-of-bounds.c | 6 +
> checkers/test-sources/read-through-null.c | 4 +
> checkers/test-sources/return-of-stack-address.c | 6 +
> checkers/test-sources/unconditional-file-leak.c | 10 +
> contrib/get-static-analysis.py | 47 +
> gcc/Makefile.in | 7 +-
> gcc/annobin.cc | 185 ++
> gcc/annobin.h | 45 +
> gcc/checkers.cc | 736 ++++++++
> gcc/checkers.h | 26 +
> gcc/common.opt | 4 +
> gcc/diagnostic-show-locus.c | 29 +-
> gcc/diagnostic.c | 85 +-
> gcc/diagnostic.h | 5 +
> gcc/doc/invoke.texi | 8 +-
> gcc/firehose.cc | 709 ++++++++
> gcc/firehose.h | 199 ++
> gcc/input.c | 71 +-
> gcc/json.cc | 1914
> gcc/json.h | 214 +++
> gcc/selftest-diagnostic.h | 62 +
> gcc/selftest-input.h | 54 +
> gcc/selftest-run-tests.c | 3 +
> gcc/selftest.c | 16 +-
> gcc/selftest.h | 10 +
> .../checker-output/test-clang-analyzer.json | 122 ++
> .../selftests/checker-output/test-cppcheck.json | 50 +
> .../selftests/checker-output/test-failure.json | 38 +
> .../selftests/checker-policy/test-policy.json | 7 +
> gcc/toplev.c | 9 +
> libcpp/include/line-map.h | 9 +
> libcpp/line-map.c | 51 +
> 48 files changed, 6001 insertions(+), 55 deletions(-)
> create mode 100644 checkers/ChangeLog
> create mode 100644 checkers/Makefile
> create mode 100755 checkers/always_fails.py
> create mode 100755 checkers/checker.py
> create mode 100755 checkers/clang_analyzer.py
> create mode 100644 checkers/coverity.py
> create mode 100755 checkers/cppcheck.py
> create mode 100755 checkers/flawfinder.py
> create mode 100755 checkers/ianal.py
> create mode 100755 checkers/splint.py
> create mode 100644 checkers/test-sources/conditional-leak.c
> create mode 100644 checkers/test-sources/cpychecker-demo.c
> create mode 100644 checkers/test-sources/divide-by-zero.c
> create mode 100644 checkers/test-sources/harmless.c
> create mode 100644 checkers/test-sources/multiple-1.c
> create mode 100644 checkers/test-sources/multiple-2.c
> create mode 100644 checkers/test-sources/out-of-bounds.c
> create mode 100644 checkers/test-sources/read-through-null.c
> create mode 100644 checkers/test-sources/return-of-stack-address.c
> create mode 100644 checkers/test-sources/unconditional-file-leak.c
> create mode 100644 contrib/get-static-analysis.py
> create mode 100644 gcc/annobin.cc
> create mode 100644 gcc/annobin.h
> create mode 100644 gcc/checkers.cc
> create mode 100644 gcc/checkers.h
> create mode 100644 gcc/firehose.cc
> create mode 100644 gcc/firehose.h
> create mode 100644 gcc/json.cc
> create mode 100644 gcc/json.h
> create mode 100644 gcc/selftest-diagnostic.h
> create mode 100644 gcc/selftest-input.h
> create mode 100644
> create mode 100644
> create mode 100644
> create mode 100644 gcc/testsuite/selftests/checker-policy/test-policy.json