analyzer: Weekly update on extending C++ support (2)
Benjamin Priour
priour.be@gmail.com
Mon Aug 7 13:04:57 GMT 2023
Hi Dave,
I made some more progress on moving tests from gcc.dg/analyzer/ to
c-c++-common/analyzer.
I'll only detail the most noteworthy tests I encountered.
gcc.dg/analyzer/pr103892.c troubled me with an Excess error when compiled
with g++
analysis bailed out early (91 'after-snode' enodes; 314 enodes)
[-Wanalyzer-too-complex]
pr103892.c is compiled with optimization level -O2.
Analysis bails out when the number of "after-SN" enodes explodes, i.e.
exceeds a certain proportion of the total number of SG nodes.
limit(after-SN) = m_sg.num_nodes () * param-bb-explosion-factor.
The reason why C and C++ differs is simply due to what '-O2' does to each
of them.
Under -O0 or -O1, there is no failure whatsoever, under -O2 only C++ fails,
whereas with -O3 both gcc and g++ emits -Wanalyzer-too-complex.
With -O2, although GCC produces a greater total number of "after-SN" enodes
than G++, their proportion barely stays under limit(after-SN) as
the total number of SN is also bigger, hence no warning is emitted.
All in all, I don't believe there is a significant difference here between
C and C++, nor is there much we can do about this.
Therefore I'll simply add a { dg-warning "analysis bailed out early" "" {
target c++ } }
An interesting divergence between GCC and G++ was in the handling of
built-ins.
Tests gcc.dg/analyzer/{pr104062.c,sprintf-2.c} are failing when compiling
with G++.
FAIL: c-c++-common/analyzer/pr104062.c leak of ap7 at line 13 (test for
warnings, line 12)
The reason is neither realloc nor sprintf are considered by G++ as a
built-in, contrary to GCC.
C built-ins are mapped within the analyzer to known_functions using their
'enum combined_fn' code
(See kf.cc:register_known_functions), not their function name.
Therefore, we find a built-in known function by checking
tree.h:fndecl_built_in_p returns true
and calling known_function_manager::get_normal_builtin.
The former returns false for 'realloc' and 'sprintf' when using G++.
To fix that, I've derived builtin_known_function from known_function.
/* Subclass of known_function for builtin functions. */
class builtin_known_function : public known_function
{
public:
virtual enum built_in_function builtin_code () const = 0;
tree builtin_decl () const {
gcc_assert (builtin_code () < END_BUILTINS);
return builtin_info[builtin_code ()].decl;
}
virtual const builtin_known_function *
dyn_cast_builtin_kf () const { return this; }
virtual builtin_known_function *
dyn_cast_builtin_kf () { return this; }
};
And 'kfm.add (BUILT_IN_REALLOC, make_unique<kf_realloc> ());' becomes
kfm.add ("realloc", make_unique<kf_realloc> ());
kfm.add ("__builtin_realloc", make_unique<kf_realloc> ());
Unfortunately we have to register the built-ins using their function name
which is more apt to bugs,
and the double call to kfm.add is quite messy. Having two instances of
kf_realloc however is not that troubling,
as builtin_known_function objects are lightweight.
That GCC built-ins are not recognized by G++ doesn't only impede their
detection,
but also how we process them, and what information we have of them.
In gcc.dg/analyzer/sprintf-2.c, sprintf signature follows the manual and
cppreference.
Yet we expect sm-malloc to warn about use of NULL when either of the
argument is NULL,
although there is no attribute(nonnull) specified.
In fact, sm-malloc isn't using the signature of sprintf as specified in the
test case, but rather the one provided
by GCC in builtins.def
/* See e.g. https://en.cppreference.com/w/c/io/fprintf
and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */
extern int
sprintf(char* dst, const char* fmt, ...)
__attribute__((__nothrow__)); // No attribute(nonnull)
int
test_null_dst (void)
{
return sprintf (NULL, "hello world"); /* { dg-warning "use of NULL where
non-null expected" } */
}
int
test_null_fmt (char *dst)
{
return sprintf (dst, NULL); /* { dg-warning "use of NULL where non-null
expected" } */
}
Signature in builtins.def: DEF_LIB_BUILTIN (BUILT_IN_SPRINTF,
"sprintf", BT_FN_INT_STRING_CONST_STRING_VAR,
ATTR_NOTHROW_NONNULL_1_FORMAT_PRINTF_2_3)
My question is then : Should G++ behave as GCC and ignore the user's
signatures of built-ins, instead using the attributes specified by GCC ?
At the moment I went with a "yes". If the function is a builtin, I'm
operating on its builtin_known_function::builtin_decl () (see above) rather
than the callee_fndecl
deduced from a gimple call, therefore overriding the user's signature.
Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC and
G++.
Last test I'd like to discuss is analyzer/pr99193-1.c
/* { dg-additional-options "-Wno-analyzer-too-complex" } */
/* Verify absence of false positive from -Wanalyzer-mismatching-deallocation
on realloc(3).
Based on
https://github.com/libguestfs/libguestfs/blob/f19fd566f6387ce7e4d82409528c9dde374d25e0/daemon/command.c#L115
which is GPLv2 or later. */
typedef __SIZE_TYPE__ size_t;
typedef __builtin_va_list va_list;
#ifdef __cplusplus
#if __cplusplus >= 201103L
#define NULL nullptr
#else
#define NULL 0
#endif
#else
#define NULL ((void *)0)
#endif
extern void *malloc (size_t __size)
__attribute__ ((__nothrow__ , __leaf__))
__attribute__ ((__malloc__))
__attribute__ ((__alloc_size__ (1)));
extern void perror (const char *__s);
extern void *realloc (void *__ptr, size_t __size)
__attribute__ ((__nothrow__ , __leaf__))
__attribute__ ((__warn_unused_result__))
__attribute__ ((__alloc_size__ (2)));
extern void guestfs_int_cleanup_free (void *ptr);
extern int commandrvf (char **stdoutput, char **stderror, unsigned flags,
char const* const *argv);
#define CLEANUP_FREE __attribute__((cleanup(guestfs_int_cleanup_free)))
int
commandrf (char **stdoutput, char **stderror, unsigned flags,
const char *name, ...)
{
va_list args;
CLEANUP_FREE const char **argv = NULL;
char *s;
int i, r;
/* Collect the command line arguments into an array. */
i = 2;
argv = (const char **) malloc (sizeof (char *) * i);
if (argv == NULL) {
perror ("malloc");
return -1;
}
argv[0] = (char *) name;
argv[1] = NULL;
__builtin_va_start (args, name);
while ((s = __builtin_va_arg (args, char *)) != NULL) {
const char **p = (const char **) realloc (argv, sizeof (char *) *
(++i)); /* { dg-bogus "'free'" } */
if (p == NULL) {
perror ("realloc");
__builtin_va_end (args); // (*)
return -1;
}
argv = p;
argv[i-2] = s;
argv[i-1] = NULL;
}
__builtin_va_end (args);
r = commandrvf (stdoutput, stderror, flags, argv);
return r;
} // G++ emits Excess error here
Even without the prior changes to the processing of built-ins, va_end and
__builtin_va_end are recognized by G++.
Yet, G++ fails with an Excess error "warning: missing call to 'va_end'
[-Wanalyzer-va-list-leak]".
/home/vultkayn/Desktop/gcc_sources_perso/gcc/gcc/testsuite/c-c++-common/analyzer/pr99193-1.c:75:1:
warning: missing call to ‘va_end’ [-Wanalyzer-va-list-leak]
75 | }
| ^
‘int commandrf(char**, char**, unsigned int, const char*, ...)’: events
1-3
|
| 49 | if (argv == NULL) {
| | ^~
| | |
| | (1) following ‘false’ branch...
|......
| 53 | argv[0] = (char *) name;
| | ~
| | |
| | (2) ...to here
|......
| 56 | __builtin_va_start (args, name);
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (3) ‘va_start’ called here (state of ‘’:
‘start’ -> ‘started’, NULL origin, meaning: {verb: ‘acquire’, noun:
‘resource’})
|
‘int commandrf(char**, char**, unsigned int, const char*, ...)’: events
4-9
|
| 58 | while ((s = __builtin_va_arg (args, char *)) != NULL) {
| | ^
| | |
| | (4) following
‘true’ branch...
| 59 | const char **p = (const char **) realloc (argv, sizeof
(char *) * (++i)); /* { dg-bogus "'free'" } */
| |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
|
| | |
(5) ...to here
| | (6) when
‘realloc’ fails
| 60 | if (p == NULL) {
| | ~~
| | |
| | (7) following ‘true’ branch (when ‘p’ is NULL)...
| 61 | perror ("realloc");
| | ~~~~~~~~~~~~~~~~~~
| | |
| | (8) ...to here
|......
| 75 | }
| | ~
| | |
| | (9) missing call to ‘va_end’ to match ‘va_start’ at (3) (in
global state ‘started’)
|
The IPA doesn't differ much between C and C++ at this sec
// gcc -fdump-ipa-analyzer
<bb 5> :
i_44 = i_22 + 1;
_8 = (long unsigned int) i_44;
_9 = _8 * 8;
argv.3_10 = argv;
p_46 = realloc (argv.3_10, _9);
if (p_46 == 0B)
goto <bb 6>; [INV]
else
goto <bb 7>; [INV]
<bb 6> :
perror ("realloc");
__builtin_va_end (&args);
_52 = -1;
// predicted unlikely by early return (on trees) predictor.
goto <bb 10>; [INV]
// g++ -fdump-analyzer-ipa
<bb 6> :
i_48 = i_22 + 1;
_8 = (long unsigned int) i_48;
_9 = _8 * 8;
argv.3_10 = argv;
p_50 = realloc (argv.3_10, _9);
if (p_50 == 0B) // if p == NULL
goto <bb 7>; [INV]
else
goto <bb 9>; [INV]
<bb 7> :
perror ("realloc");
<bb 8> :
__builtin_va_end (&args);
_56 = -1;
// predicted unlikely by early return (on trees) predictor.
goto <bb 13>; [INV]
If I comment out the call to perror("realloc"), then G++ behaves as GCC.
If I replace perror ("realloc") by any other call, to a fully-defined
function or whatever, then the warning reappears.
The above SSA differ by an extra basic block in G++ that splits GCC's <bb
6> in two.
However, that doesn't account for much, as changing
if (p == NULL) {
perror ("realloc");
__builtin_va_end (args); // (*)
return -1;
}
to
if (p == NULL) {
int x;
perror("realloc");
if (x > 7)
x = 12;
__builtin_va_end (args);
return -1;
}
generates a similar basic block around __builtin_va_end, yet G++ keeps
emitting the false positive whereas GCC doesn't.
// GCC modified SSA
<bb 6> :
perror ("realloc");
if (x_51(D) > 7)
goto <bb 7>; [INV]
else
goto <bb 8>; [INV]
<bb 7> :
x_52 = 12;
<bb 8> :
__builtin_va_end (&args);
_54 = -1;
// predicted unlikely by early return (on trees) predictor.
goto <bb 12>; [INV]
// G++ modified SSA
<bb 7> :
perror ("realloc");
<bb 8> :
if (x_55(D) > 7)
goto <bb 9>; [INV]
else
goto <bb 10>; [INV]
<bb 9> :
x_56 = 12;
<bb 10> :
__builtin_va_end (&args);
_58 = -1;
// predicted unlikely by early return (on trees) predictor.
goto <bb 15>; [INV]
Debugging and glaring at the exploded graph gave me no clue towards fixing
this test.
If you have any hint, I welcome it.
I hope the above isn't too indigest.
Cheers,
Benjamin.
More information about the Gcc
mailing list