[PATCH 00/49] RFC: Add a static analysis framework to GCC

Mon Dec 2 03:20:00 GMT 2019

On 11/16/19, Thomas Schwinge <thomas@codesourcery.com> wrote:
> Hi David!
>
> On 2019-11-15T20:22:47-0500, David Malcolm <dmalcolm@redhat.com> wrote:
>> This patch kit
>
> (I have not looked at the patches.)  ;-)
>
>> introduces a static analysis pass for GCC that can diagnose
>> various kinds of problems in C code at compile-time (e.g. double-free,
>> use-after-free, etc).
>
> Sounds great from the description!
>
>
> Would it make sense to add to the wiki page
> <https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer> a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what their
> respective benefits are, what their design goals are, etc.  (Of course
> understanding that yours is much less mature at this point; talking about
> high-level design rather than current implementation status.)
>
> For example, why do we want that in GCC instead of an external tool -- in
> part covered in your Rationale.

There are a lot of bug reports open for requests for warnings that
this analyzer could solve. Users clearly want this in GCC, or else
they wouldn't keep making these requests.

> Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-level
> '#if' etc., which surely are problematic for external tools that are not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
>
> Just an idea, because I wondered about these things.
>
>
>> The analyzer runs as an IPA pass on the gimple SSA representation.
>> It associates state machines with data, with transitions at certain
>> statements and edges.  It finds "interesting" interprocedural paths
>> through the user's code, in which bogus state transitions happen.
>>
>> For example, given:
>>
>>    free (ptr);
>>    free (ptr);
>>
>> at the first call, "ptr" transitions to the "freed" state, and
>> at the second call the analyzer complains, since "ptr" is already in
>> the "freed" state (unless "ptr" is NULL, in which case it stays in
>> the NULL state for both calls).
>>
>> Specific state machines include:
>> - a checker for malloc/free, for detecting double-free, resource leaks,
>>   use-after-free, etc (sm-malloc.cc), and
>
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
>
> In addition to host-side 'malloc'/'free', there is device-side (separate
> memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
>
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
> 'acc_free' memory spaces that are still mapped.
>
> Then, there are functions like 'acc_create' (or equivalent directives
> like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
> (plus/depending on internal reference counting).  Static checking
> example: for a pointer returned by 'acc_create" etc., you must use
> 'acc_delete' etc. instead of 'acc_free', which first does
> 'acc_unmap_data' before interal 'acc_free' (..., and again all that
> depending on reference counting).  (Might be "interesting" to teach your
> checker about the reference counting -- if that is actually necessary;
> needs further thought.)
>
>
>> The checker is implemented as a GCC plugin.
>>
>> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
>> would live in the GCC source tree and be shipped as part of the GCC
>> tarball,
>> with a new:
>>   --enable-plugins=[LIST OF PLUGIN NAMES]
>> configure option, analogous to --enable-languages (the Makefile/configure
>> machinery for handling in-tree GCC plugins is adapted from how we support
>> frontends).
>
> I like that.  Implementing this as a plugin surely must help to either
> document the GCC plugin interface as powerful/mature for such a task.  Or
> make it so, if it isn't yet.  ;-)

Nick Clifton was bringing this up as a point in his talk on his
annobin plugin at Cauldron; this should make him happy.

>
>> The default is for no such plugins to be enabled, so the default would
>> be that the checker isn't built - you'd have to opt-in to building it,
>> with --enable-plugins=analyzer
>
> I'd favor a default of '--enable-plugins=default' which enables the
> "usable" plugins.

Agreed.

>
>
>> It's not clear to me whether I should focus on:
>>
>> (a) pruning the scope of the checker so that it works well on
>> *intra*procedural C examples (and bail on anything more complex), perhaps
>> targetting GCC 10 as an optional extra hidden behind
>> --enable-plugins=analyzer, or
>>
>> (b) work on deeper interprocedural analysis (and fixing efficiency issues
>> with this).
>
> I personally would be happy to see (a) happen now (without "optional
> extra hidden behind" configure-time flag but rather hidden behind GCC
> command-line flag, '-fanalyze'?), and then (b) later on.  As
> always, doing the incremental thing, (a) first, then (b) later, would
> give it more exposure in the wild, which should help to identify design
> issues etc. now instead of much later, for example.
>
>
>> Thoughts?
>
> One very high-level item: you're using the very generic name "analyzer"
> ('-Wanalyzer', 'gcc/analyzer/' filenames, for example, or the '-fanalyze'
> I just proposed).  Might there be potential for confusion (now, or in the
> future) what kind of analyzer that is, or is it safe to assume that in
> context of a compiler, an analyzer is always a compile-time, static code
> analyzer?  After all, the existing run-time ones are known as "checkers":
> '-fstack-check', or "sanitizers": '--fsanitize, or "verifiers":
> '-fvtable-verify'.
>
>
> Grüße
>  Thomas
>