[RFC] analyzer: allocation size warning

David Malcolm dmalcolm@redhat.com
Fri Jun 17 22:13:17 GMT 2022


On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:

[...snip...]

> > 
> 
> I have resent the patch using git send-email as a reply to my original
> message. 
> The new message looks properly formatted in the archive:
>     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html

Thanks; that's *much* more readable.


[...snip...]

> > 
> > 
> > 
> > On symbolic buffer sizes:
> > warning: Allocated buffer size is not a multiple of the pointee's
> > size 
> > [CWE-131] [-Wanalyzer-allocation-size]
> >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > */
> >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> >   ‘test_3’: event 1
> >     |
> >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > } 
> > */
> >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> >     | | |
> >     | | (1) Allocation is incompatible with ‘int *’; either the 
> > allocated size is bogus or the type on the left-hand side is wrong
> >     |
> > 
> > 
> > Is there location information for both the malloc and for the
> > assignment, here?
> 
> I'm not sure whether I understand your question but the warning is 
> emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> rhs.
> I think that is enough to split that up into "(1) n + sizeof(int) 
> allocated here" and "(2) Allocation at (1) is incompatible with..."? 

Probably, yes.

FWIW I wrote some more notes about the events in my reply to to your
reply to Prathamesh, here:
  https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html

[...snip...]

> > 
> > There are some things to discuss from my side:
> > * The tests with the "toy re-implementation of CPython's object 
> > model"[2] fail due to a extra warning emitted. Because the analyzer
> > can't know the calculation actually results in a correct buffer size 
> > when viewed as a string_obj later on, it emits a warning, e.g. at
> > line 
> > 61 in data-model-5.c. The only mitigation would be to disable the 
> > warning for structs entirely. Now, the question is to rather have
> > noise
> > on these cases or disable the warning for structs entirely?
> > 
> > Can you post the full warning please?
> 
> /path/to/data-model-5.c: In function ‘alloc_obj’:
> /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    61 |   base_obj *obj = (base_obj *)malloc (sz);
>       |                               ^~~~~~~~~~~
>   ‘new_string_obj’: events 1-2
>     |
>     |   69 | base_obj *new_string_obj (const char *str)
>     |      |           ^~~~~~~~~~~~~~
>     |      |           |
>     |      |           (1) entry to ‘new_string_obj’
>     |......
>     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> (string_obj) + len + 1);
>     |      |                    
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |                     |
>     |      |                     (2) calling ‘alloc_obj’ from
> ‘new_string_obj’
>     |
>     +--> ‘alloc_obj’: events 3-4
>            |
>            |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
>            |      |           ^~~~~~~~~
>            |      |           |
>            |      |           (3) entry to ‘alloc_obj’
>            |   60 | {
>            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
>            |      |                               ~~~~~~~~~~~
>            |      |                               |
>            |      |                               (4) Allocation is
> incompatible with ‘base_obj *’; either the allocated size is bogus or
> the type on the left-hand side is wrong
>            |
> 
> > 
> > These testcases exhibit a common way of faking inheritance in C, and
> > I
> > think it ought to be possible to support this in the warning.
> > 
> > I thing what's happening is we have
> > 
> > struct base
> > { 
> >   /* fields */
> > };
> > 
> > struct sub
> > {
> >   struct base m_base;
> >   /* extra fields.  */
> > };
> > 
> > struct base *construct_base (size_t sz)
> > {
> >   struct base *p = (struct base *) malloc (sz);
> > 
> >   /* set up fields of base in p  */
> > 
> >   return p;
> > }
> > 
> > Or is this on the interprocedural path as called with a specific
> > sizeof
> > for struct sub?
> 
> At (4), it does not know that base_obj is later used as a "base
> struct". 
> As it is called with sizeof(struct sub), my checker thinks the buffer
> is
> too large for one but too small for another base_obj.
> 
> > 
> > Maybe we can special-case these by detecting where struct sub's first
> > field is struct base, and hence where we expect this pattern?  (and
> > use
> > this to suppress the warning for such cases?)
> 
> I already excluded all structs with structs inside with 
> struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> help 
> in the case size for struct sub is allocated but casted as base. Maybe,
> we
> should do a special case for structs where we only warn when the sizeof
> is
> too small to hold the base struct together with supressing warnings
> when
> the first field is a struct? 

That sounds like it could work.

There are several things going on in the above example:
- fake inheritance
- the "trailing array idiom": struct string_obj's final field is:
     char str_buf[];
  meaning that the string_obj will have the char buffer trailing off
the end, and the allocation is expected to support this.

This is not uncommon in C; it occurs in CPython, see e.g.:
https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h

where CPython's PyBytesObject has the bytes in an ob_sval field
trailing off the end:

typedef struct {
    PyObject_VAR_HEAD
    Py_DEPRECATED(3.11) Py_hash_t ob_shash;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the byte string or -1 if not computed yet.
     */
} PyBytesObject;

so it would be good for the warning to handle it gracefully, which I
think your proposal above would.

I try to have plenty of idiomatic C code in the analyzer test suite to
try to catch this kind of thing (as well as more "unit test" kinds of
test coverage); we want the warnings to have a good signal:noise ratio.



> 
> 
> * I'm unable to emit a warning whenever the cast happens at an 
> assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
> This is because I'm unable to access a region_svalue for the returned
> value. Even in the new_program_state, the svalue of the lhs is still a 
> conjured_svalue. Maybe David can lead me to a place where I can access 
> the return value's region_svalue or do I have to adapt the engine?
> 
> Please can you try reposting the patch?  I tried to read it, but am
> having trouble with the mangled indentation.

See my inline answer above. Both, the test case and from where I want
to access the region_svalue are commented with // FIXME.

What does the dump of the state look like? e.g. via calling 

  (gdb) call m_region_model->debug()

from within gdb

A conjured_svalue represents the result of a call to an external
function (or a side-effect written out to a *out-style param of such a
function), but we have the body of create_buffer, so the call to
create_buffer should be analyzed interprocedurally, and we should have
a region_svalue pointing at a heap_allocated_region.

You might want to simplify things to just the functions of interest,
and then have a look at the output of -fdump-analyzer-exploded-graph in
your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).

I wonder if my idea from the other email of moving the test from sm-
malloc.cc to region-model.cc might affect this; the state machines run
at a slightly different time to the region model updates.



> 
> 
> * attr-malloc-6.c and pr96639.c did both contain structs without an
> implementation. Something in the analyzer must have triggered another
> warning about the usage of those without them having an implementation.
> I changed those structs to have an empty implementation, such that the 
> additional warning are gone. I think this shouldn't change the test
> case, so is this change okay?
> 
> What were the new warnings?

/path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
‘struct FILE’
  175 |     FILE *p = malloc (100);   // { dg-message "allocated here"
}
      |               ^~~~~~~~~~~~

All were like the one above. error: invalid use of undefined type
'struct XXX'

That error looks bogus; I'm guessing that something the new diagnostics
is calling is generating it.  You can probably track it down by using

  (gdb) break-on-diagnostic

in the debugger, and then seeing what the backtrace shows when the
breakpoint fires.

See:
  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html

break-on-diagnostic is one of the things in the support scripts
mentioned on that page.

Hope this is helpful

(BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)

Dave




More information about the Gcc mailing list