[PATCH 1/2] add -Wuse-after-free

Martin Sebor msebor@gmail.com
Tue Nov 23 21:16:48 GMT 2021


On 11/22/21 6:32 PM, Jeff Law wrote:
> 
> 
> On 11/1/2021 4:17 PM, Martin Sebor via Gcc-patches wrote:
>> Patch 1 in the series detects a small subset of uses of pointers
>> made indeterminate by calls to deallocation functions like free
>> or C++ operator delete.  To control the conditions the warnings
>> are issued under the new -Wuse-after-free= option provides three
>> levels.  At the lowest level the warning triggers only for
>> unconditional uses of freed pointers and doesn't warn for uses
>> in equality expressions.  Level 2 warns also for come conditional
>> uses, and level 3 also for uses in equality expressions.
>>
>> I debated whether to make level 2 or 3 the default included in
>> -Wall.  I decided on 3 for two reasons: 1) to raise awareness
>> of both the problem and GCC's new ability to detect it: using
>> a pointer after it's been freed, even only in principle, by
>> a successful call to realloc, is undefined, and 2) because
>> it's trivial to lower the level either globally, or locally
>> by suppressing the warning around such misuses.
>>
>> I've tested the patch on x86_64-linux and by building Glibc
>> and Binutils/GDB.  It triggers a number of times in each, all
>> due to comparing invalidated pointers for equality (i.e., level
>> 3).  I have suppressed these in GCC (libiberty) by a #pragma,
>> and will see how the Glibc folks want to deal with theirs (I
>> track them in BZ #28521).
>>
>> The tests contain a number of xfails due to limitations I'm
>> aware of.  I marked them pr?????? until the patch is approved.
>> I will open bugs for them before committing if I don't resolve
>> them in a followup.
>>
>> Martin
>>
>> gcc-63272-1.diff
>>
>> Add -Wuse-after-free.
>>
>> gcc/c-family/ChangeLog
>>
>> 	* c.opt (-Wuse-after-free): New options.
>>
>> gcc/ChangeLog:
>>
>> 	* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
>> 	OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
>> 	* diagnostic-spec.h (NW_DANGLING): New enumerator.
>> 	* doc/invoke.texi (-Wuse-after-free): Document new option.
>> 	* gimple-ssa-warn-access.cc (pass_waccess::check_call): Rename...
>> 	(pass_waccess::check_call_access): ...to this.
>> 	(pass_waccess::check): Rename...
>> 	(pass_waccess::check_block): ...to this.
>> 	(pass_waccess::check_pointer_uses): New function.
>> 	(pass_waccess::gimple_call_return_arg): New function.
>> 	(pass_waccess::warn_invalid_pointer): New function.
>> 	(pass_waccess::check_builtin): Handle free and realloc.
>> 	(gimple_use_after_inval_p): New function.
>> 	(get_realloc_lhs): New function.
>> 	(maybe_warn_mismatched_realloc): New function.
>> 	(pointers_related_p): New function.
>> 	(pass_waccess::check_call): Call check_pointer_uses.
>> 	(pass_waccess::execute): Compute and free dominance info.
>>
>> libcpp/ChangeLog:
>>
>> 	* files.c (_cpp_find_file): Substitute a valid pointer for
>> 	an invalid one to avoid -Wuse-0after-free.
>>
>> libiberty/ChangeLog:
>>
>> 	* regex.c: Suppress -Wuse-after-free.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
>> 	* gcc.dg/Wmismatched-dealloc-3.c: Same.
>> 	* gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
>> 	* gcc.dg/attr-alloc_size-7.c: Same.
>> 	* c-c++-common/Wuse-after-free-2.c: New test.
>> 	* c-c++-common/Wuse-after-free-3.c: New test.
>> 	* c-c++-common/Wuse-after-free-4.c: New test.
>> 	* c-c++-common/Wuse-after-free-5.c: New test.
>> 	* c-c++-common/Wuse-after-free-6.c: New test.
>> 	* c-c++-common/Wuse-after-free-7.c: New test.
>> 	* c-c++-common/Wuse-after-free.c: New test.
>> 	* g++.dg/warn/Wdangling-pointer.C: New test.
>> 	* g++.dg/warn/Wmismatched-dealloc-3.C: New test.
>> 	* g++.dg/warn/Wuse-after-free.C: New test.
>>
>> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
>> index 63fc27a1487..2065402a2b9 100644
>> --- a/gcc/gimple-ssa-warn-access.cc
>> +++ b/gcc/gimple-ssa-warn-access.cc
>>
>> @@ -3397,33 +3417,460 @@ pass_waccess::maybe_check_dealloc_call (gcall *call)
>>       }
>>   }
>>   
>> +/* Return true if either USE_STMT's basic block (that of a pointer's use)
>> +   is dominated by INVAL_STMT's (that of a pointer's invalidating statement,
>> +   which is either a clobber or a deallocation call), or if they're in
>> +   the same block, USE_STMT follows INVAL_STMT.  */
>> +
>> +static bool
>> +gimple_use_after_inval_p (gimple *inval_stmt, gimple *use_stmt,
>> +			  bool last_block = false)
>> +{
>> +  tree clobvar =
>> +    gimple_clobber_p (inval_stmt) ? gimple_assign_lhs (inval_stmt) : NULL_TREE;
>> +
>> +  basic_block inval_bb = gimple_bb (inval_stmt);
>> +  basic_block use_bb = gimple_bb (use_stmt);
>> +
>> +  if (inval_bb != use_bb)
>> +    {
>> +      if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
>> +	return true;
>> +
>> +      if (!clobvar || !last_block)
>> +	return false;
>> +
>> +      auto gsi = gsi_for_stmt (use_stmt);
>> +
>> +      auto_bitmap visited;
>> +
>> +      /* A use statement in the last basic block in a function or one that
>> +	 falls through to it is after any other prior clobber of the used
>> +	 variable unless it's followed by a clobber of the same variable. */
>> +      basic_block bb = use_bb;
>> +      while (bb != inval_bb
>> +	     && single_succ_p (bb)
>> +	     && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
>> +	{
>> +	  if (!bitmap_set_bit (visited, bb->index))
>> +	    /* Avoid cycles. */
>> +	    return true;
>> +
>> +	  for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
>> +	    {
>> +	      gimple *stmt = gsi_stmt (gsi);
>> +	      if (gimple_clobber_p (stmt))
>> +		{
>> +		  if (clobvar == gimple_assign_lhs (stmt))
>> +		    /* The use is followed by a clobber.  */
>> +		    return false;
>> +		}
>> +	    }
>> +
>> +	  bb = single_succ (bb);
>> +	  gsi = gsi_start_bb (bb);
>> +	}
>> +
>> +      return bb == EXIT_BLOCK_PTR_FOR_FN (cfun);
>> +    }
> ?!?  I would have thought the block dominance test plus checking UIDs if 
> the two statements are in the same block would be all you need.  Can you 
> elaborate more on what that hunk above is trying to do?

The loop is entered only for -Wdangling-pointer.  It looks for
the first clobber of the CLOBVAR variable (one whose clobber
statement has been seen during the CFG traversal and whose use
is being validated) in the successors along the single edge
from the use block.  If the search finds a clobber, the use
is valid.  If it doesn't, the use is one of a variable having
gone out of scope (the clobber must be before the use).

Among the cases the loop handles is the one in PR 63272
(the request for -Wdangling-pointer) where the use neither
follows the clobber in the same block nor dominated by it.

There may be a way to optimize it somehow but because it's
a search I don't think a simple UID check alone would be
enough.

>> +
>> +  for (auto si = gsi_for_stmt (inval_stmt); !gsi_end_p (si);
>> +       gsi_next_nondebug (&si))
>> +    {
>> +      gimple *stmt = gsi_stmt (si);
>> +      if (stmt == use_stmt)
>> +	return true;
>> +    }
>> +
>> +  return false;
>> +}
> So from a compile-time standpoint, would it be better to to assign UIDs 
> to each statement so that within a block you can just compare the UIDs?  
> That's a pretty standard way to deal with the problem of statement 
> domination within a block if we're going to be doing multiple queries.

I'd considered it but because statement UIDs don't exist at
the start of a pass, assigning them means either traversing all
statements in the whole CFG first, even in functions with no
deallocation calls or clobbers, or doing it lazily, after
the first such statement has been seen.  It might ultimately
be worthwhile if more warnings(*) end up relying on it but at
this point I'm not sure the optimization wouldn't end up slowing
things down on average.

For some data, in a GCC bootstrap, each statement visited by
this loop is visited on average twice (2.2 times), and
the average sequence of statements traversed by the loop is
2.65, with a maximum of 22 times and 18 statements, respectively.
So still not sure it would be a win.

Let me know if this is something you think I need to pursue at
this stage.

[*] I think simple memory/resource leak detection might perhaps
be one.

>> +
>> +/* Return true if P and Q point to the same object, and false if they
>> +   either don't or their relationship cannot be determined.  */
>> +
>> +static bool
>> +pointers_related_p (gimple *stmt, tree p, tree q, pointer_query &qry)
>> +{
>> +  if (!ptr_derefs_may_alias_p (p, q))
>> +    return false;
> Hmm, I guess that you don't need to worry about the case where P and Q 
> point to different elements within an array.  They point to different 
> final objects, though they do share a common enclosing object.  
> Similarly for P & Q pointing to different members within a structure.

Right.  The if statement is an optimization to avoid having to
determine the identity of the complete objects that P and Q
point to.  That's done by the calls to get_ref() below (for
complete objects; as you note, we don't care about subobjects
for this).

>> +
>> +/* For a STMT either a call to a deallocation function or a clobber, warn
>> +   for uses of the pointer PTR it was called with (including its copies
>> +   or others derived from it by pointer arithmetic).  */
>> +
>> +void
>> +pass_waccess::check_pointer_uses (gimple *stmt, tree ptr)
>> +{
>> +  gcc_assert (TREE_CODE (ptr) == SSA_NAME);
>> +
>> +  const bool check_dangling = !is_gimple_call (stmt);
>> +  basic_block stmt_bb = gimple_bb (stmt);
>> +
>> +  /* If the deallocation (or clobber) statement dominates more than
>> +     a single basic block issue a "maybe" k
> That seems wrong.   What you're looking for is a post-dominance 
> relationship I think.   If the sink (free/delete) is post-dominated by 
> the use, then it's a "must", if it's not post-dominated, then it's a 
> maybe.  Of course, that means you need to build post-dominators.

I'm sure you're right in general.  To avoid false positives
the warning is very simplistic and only considers straight
paths through the CFG, so I'm not sure this matters.  But
I'm fine with using the post-dominance test instead if you
thin it's worthwhile (it doesn't change any tests).

>> +
>> +	  if (check_dangling
>> +	      && gimple_code (use_stmt) == GIMPLE_RETURN
>> +	      && optimize && flag_isolate_erroneous_paths_dereference)
>> +	    /* Avoid interfering with -Wreturn-local-addr (which runs only
>> +	       with optimization enabled).  */
>> +	    continue;
> Umm, that looks like a hack.  I can't think of a good reason why removal 
> of erroneous paths should gate any of this code.  ISTM that you're 
> likely papering over a problem elsewhere.

This code avoids issuing -Wdangling-pointer for problems that
will later be diagnosed by -Wreturn-local-addr.  E.g., in this
case from Wreturn-local-addr-2.c:

   ATTR (noipa) void*
   return_array_plus_var (int i)
   {
       int a[32];
       void *p = a + i;
     sink (p);
     return p;         /* { dg-warning "function returns address of 
local" } */
   }

Without the test we'd end up with

   warning: using dangling pointer ‘p’ to ‘a’ [-Wdangling-pointer=]

in addition to -Wreturn-local-addr (and a whole slew of
failures in the -Wreturn-local-addr tests).

-Wreturn-local-addr only runs when
flag_isolate_erroneous_paths_dereference is nonzero, so
the conditional makes sure -Wdangling-pointer is issued when
either the flag or -Wreturn-local-addr is disabled.  I think
that works as expected (i.e., there's no problem elsewhere).

I could have the code issue -Wdangling-pointer and suppress
-Wreturn-local-addr but that doesn't seem right since
the pointer hasn't gone out of scope yet at the point it's
returned.

Alternatively, I could change this instance of
-Wdangling-pointer to -Wreturn-local-addr but that also
doesn't seem like good design since we have a whole pass
dedicated to the latter warning.

I can't think of any other more elegant solutions but I'm open
to suggestions.

Martin


More information about the Gcc-patches mailing list