GSoC: Working on the static analyzer

Mir Immad mirimnan017@gmail.com
Sat Jan 29 14:52:25 GMT 2022


Thank you for the detailed information.

I've been looking into the integer posix  file descriptor APIs and I
decided to write proof-of-concept  checker for them. (not caring about
errno). The checker tracks the fd returned by open(), warns if dup() is
called with closed fd otherwise tracks the fd returned by dup(), it also
warns if read() and write() functions were called on closed fd. I'm
attaching a text file that lists some c sources and warnings by the static
analyzer. I've used the diagnostic meta-data from sm-file. Is this
something that could also be added to the analyzer?

About the fd leak, that's the next thing I'll try to get working. Since
you've mentioned that it could be a GSoC project, this is what I'm going to
focus on.

Regards.



On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com> wrote:

> On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > Hi, sir.
> >
> > I've been trying to understand the static analyzer's code. I spent most
> > of
> > my time learning the state machine's API. I learned how state machine's
> > on_stmt is supposed to "recognize" specific functions and how
> > on_transition
> > takes a specific tree from one state to another, and how the captured
> > states are used by pending_diagnostics to report the errors.
> > Furthermore, I
> > was able to create a dummy checker that mimicked the behaviour of sm-
> > file's
> > double_fclose and compile GCC with these changes. Is this the right way
> > of
> > learning?
>
> This sounds great.
>
> >
> > As you've mentioned on the projects page that you would like to add
> > more
> > support for some POSIX APIs. Can you please write (or refer me to a) a
> > simple C program that uses such an API (and also what the analyzer
> > should
> > have done) so that I can attempt to add such a checker to the analyzer.
>
> A couple of project ideas:
>
> (i) treat data coming from a network connection as tainted, by somehow
> teaching the analyzer about networking APIs.  Ideally: look at some
> subset of historical CVEs involving network-facing attacks on user-
> space daemons, and find a way to detect them in the analyzer (need to
> find a way to mark the incoming data as tainted, so that the analyer
> "know" about the trust boundary - that the incoming data needs to be
> sanitized and treated with extra caution; see
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> for my attempts to do this for the Linux kernel).
>
> Obviously this is potentially a huge project, so maybe just picking a
> tiny subset and getting that working as a proof-of-concept would be a
> good GSoC project.  Maybe find an old CVE that someone has written a
> good write-up for, and think about "how could GCC/-fanalyzer have
> spotted it?"
>
> (ii) add leak-detection for POSIX file descriptors: i.e. the integer
> values returned by "open", "dup", etc.  It would be good to have a
> check that the user's code doesn't leak these values e.g. on error-
> handling paths, by failing to close a file-descriptor (and not storing
> it anywhere).  I think that much of this could be done by analogy with
> the sm-file.cc code.
>
>
> >
> > Also, I didn't realize the complexity of adding SARIF when I mentioned
> > it.
> > I'd rather work on adding more checkers.
>
> Fair enough.
>
> Hope this above is constructive.
>
> Dave
>
> >
> > Regards.
> >
> > Mir Immad
> >
> > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com> wrote:
> >
> > > Hi Sir,
> > >
> > > I've been trying to understand the static analyzer's code. I spent
> > > most of
> > > my time learning the state machine's API. I learned how state
> > > machine's
> > > on_stmt is supposed to "recognize" specific functions and takes a
> > > specific
> > > tree from one state to another, and how the captured states are used
> > > by
> > > pending_diagnostics to report the errors. Furthermore, I was able to
> > > create
> > > a dummy checker that mimicked the behaviour of sm-file's
> > > double_fclose and
> > > compile GCC with these changes. Is this the right way of learning?
> > >
> > > As you've mentioned on the projects page that you would like to add
> > > more
> > > support for some POSIX APIs. Can you please write (or refer me to a)
> > > a
> > > simple C program that uses such an API (and also what the analyzer
> > > should
> > > have done) so that I can attempt to add such a checker to the
> > > analyzer.
> > >
> > > Also, I didn't realize the complexity of adding SARIF when I
> > > mentioned it.
> > > I'd rather work on adding more checkers.
> > >
> > > Regards.
> > > Mir Immad
> > >
> > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalcolm@redhat.com>
> > > wrote:
> > >
> > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > HI David,
> > > > > I've been tinkering with the static analyzer for the last few
> > > > > days. I
> > > > > find
> > > > > the project of adding SARIF output to the analyzer intresting.
> > > > > I'm
> > > > > writing
> > > > > this to let you know that I'm trying to learn the codebase.
> > > > > Thank you.
> > > >
> > > > Excellent.
> > > >
> > > > BTW, I think adding SARIF output would involve working more with
> > > > GCC's
> > > > diagnostics subsystem than with the static analyzer, since (in
> > > > theory)
> > > > all of the static analyzer's output is passing through the
> > > > diagnostics
> > > > subsystem - though the static analyzer is probably the only GCC
> > > > component generating diagnostic paths.
> > > >
> > > > I'm happy to mentor such a project as I maintain both subsystems
> > > > and
> > > > SARIF output would benefit both - but it would be rather tangential
> > > > to
> > > > the analyzer - so if you had specifically wanted to be working on
> > > > the
> > > > guts of the analyzer itself, you may want to pick a different
> > > > subproject.
> > > >
> > > > The SARIF standard is rather long and complicated, and we would
> > > > want to
> > > > be compatible with clang's implementation.
> > > >
> > > > It would be very cool if gcc could also accept SARIF files as an
> > > > *input* format, and emit them as diagnostics; that might help with
> > > > debugging SARIF output.   (I have a old patch for adding JSON
> > > > parsing
> > > > support to GCC that could be used as a starting point for this).
> > > >
> > > > Hope the above makes sense
> > > > Dave
> > > >
> > > > >
> > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > > Hi everyone,
> > > > > >
> > > > > > Hi, and welcome.
> > > > > >
> > > > > > > I intend to work on the static analyzer. Are these documents
> > > > > > > enough to
> > > > > > > get
> > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > >
> > > > > >
> > > >
> > > >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > There are also some high-level notes here:
> > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > >
> > > > > > Also, given that the analyzer is part of GCC, the more general
> > > > > > introductions to hacking on GCC will be useful.
> > > > > >
> > > > > > I recommend creating a trivial C source file with a bug in it
> > > > > > (e.g.
> > > > > > a
> > > > > > 3-line function with a use-after-free), and stepping through
> > > > > > the
> > > > > > analyzer to get a sense of how it works.
> > > > > >
> > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > Dave
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
>
>
>
-------------- next part --------------
$ cat double-close.c

#include <fcntl.h>
#include <unistd.h>
void test()
{
  int fd = open("test.txt", O_RDONLY);
  close(fd);
  close(fd);
}

int main() {}

$ gcc-11.2.0 double-close.c -fanalyzer

double-close.c: In function ‘test’:
double-close.c:7:3: warning: double close on fd [-Wanalyzer-double-fclose]
    7 |   close(fd);
      |   ^~~~~~~~~
  ‘test’: events 1-3
    |
    |    5 |   int fd = open("test.txt", O_RDONLY);
    |      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |            |
    |      |            (1) opened here
    |    6 |   close(fd);
    |      |   ~~~~~~~~~ 
    |      |   |
    |      |   (2) first ‘close’ here
    |    7 |   close(fd);
    |      |   ~~~~~~~~~ 
    |      |   |
    |      |   (3) second ‘close’ was here; first ‘close’ was at (2)
    |

$ cat dup.c

#include <fcntl.h>
#include <unistd.h>

void test() {
	int fd = open("test.txt", O_RDONLY);
	close(fd);
	int fd2 = dup(fd);
}
int main(){}

$ gcc-11.2.0 dup.c -fanalyzer

dup.c: In function ‘test’:
dup.c:7:19: warning: dup on closed fd [-Wanalyzer-double-fclose]
    7 |         int fd2 = dup(fd);
      |                   ^~~~~~~
  ‘test’: events 1-3
    |
    |    5 |         int fd = open("test.txt", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    6 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |    7 |         int fd2 = dup(fd);
    |      |                   ~~~~~~~
    |      |                   |
    |      |                   (3) dup on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |

$ cat read-write.c

#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

void test ()
{
	int fd = open ("text", O_RDONLY);
	close(fd);
	write(fd, "hello", 5); //write in closed fd
	char * s = (char *) malloc(2);
	read(fd, s, 2); //read on closed fd
	free(s);
}

void main(){}

$ gcc-11.2.0 read-write.c -fanalyzer

read-write.c: In function ‘test’:
read-write.c:9:9: warning: write on closed fd [-Wanalyzer-double-fclose]
    9 |         write(fd, "hello", 5); //write in closed fd
      |         ^~~~~~~~~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |    9 |         write(fd, "hello", 5); //write in closed fd
    |      |         ~~~~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) write on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |
read-write.c:11:9: warning: read on closed fd [-Wanalyzer-double-fclose]
   11 |         read(fd, s, 2); //read on closed fd
      |         ^~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |......
    |   11 |         read(fd, s, 2); //read on closed fd
    |      |         ~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) read on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |


More information about the Gcc mailing list