PING: new pass to warn on questionable uses of alloca() and VLAs

Tue Oct 18 21:42:00 GMT 2016

On 09/13/2016 04:28 PM, Jeff Law wrote:
> On 08/19/2016 08:58 AM, Aldy Hernandez wrote:
>
>>> I'd just drop the /*strict_mode_p*/ comment in both places it appears in
>>> your patch's change to passes.def.  I think we've generally frowned on
>>> those embedded comments, even though some have snuck in.
>>
>> I've seen a lot of embedded comments throughout GCC, especially in
>> optional type arguments.  ISTM it makes things clearer for these
>> parameters.  But hey, I don't care that much.  Fixed.
> Yea.  They've crept in.   Long term this kind of stuff should have an
> enum or somesuch so that its obvious at both the call site and
> implementation site what those arguments to.
>>
>> Actually when the cast is from a known positive range we don't get a
>> VR_ANTI_RANGE, we get a proper VR_RANGE.
> Ah, in that case there's several of these things that get trivially
> cleaned up :-)
>
> Though I bet with some work I might be able to create an ANTI_RANGE that
> excludes all negative numbers.   But I don't think it's going to be that
> important in practice.  So we just have to make sure we don't
> abort/segfault.
>
>>
>> I've cleaned this code up a bit and merged some common conditionals.  In
>> the process, taking a subset of your advice, and cleaning up some things
>> I've managed to handle 2 cases where I was previously XFAILing.
>> So...less false positives.  More coverage.  Woo hoo!
> Always goodness.
>
>>
>>>
>>>> +{
>>>> +  gcc_assert (gimple_alloca_call_p (stmt));
>>>> +  tree len = gimple_call_arg (stmt, 0);
>>>> +  enum alloca_type w = ALLOCA_UNBOUNDED;
>>>> +  wide_int min, max;
>>>> +
>>>> +  gcc_assert (!is_vla || warn_vla_limit > 0);
>>>> +  gcc_assert (is_vla || warn_alloca_limit > 0);
>>>> +
>>>> +  // Adjust warn_alloca_max_size for VLAs, by taking the underlying
>>>> +  // type into account.
>>>> +  unsigned HOST_WIDE_INT max_size;
>>>> +  if (is_vla)
>>>> +    max_size = (unsigned HOST_WIDE_INT) warn_vla_limit;
>>>> +  else
>>>> +    max_size = (unsigned HOST_WIDE_INT) warn_alloca_limit;
>>>> +
>>>> +  // Check for the obviously bounded case.
>>>> +  if (TREE_CODE (len) == INTEGER_CST)
>>>> +    {
>>>> +      if (tree_to_uhwi (len) > max_size)
>>>> +    {
>>>> +      *assumed_limit = len;
>>>> +      return ALLOCA_BOUND_DEFINITELY_LARGE;
>>>> +    }
>>>> +      if (integer_zerop (len))
>>>> +    return ALLOCA_ARG_IS_ZERO;
>>>> +      w = ALLOCA_OK;
>>>> +    }
>>>> +  else if (TREE_CODE (len) != SSA_NAME)
>>>> +    return ALLOCA_UNBOUNDED;
>>> Hmm, other than INTEGER_CST and SSA_NAME, is there any other nodes we
>>> can get here?   Perhaps we get DECLs and such, particularly when not
>>> optimizing?!?
>>
>> Nope.  We don't even run without optimization (because we need VRP/range
>> info).  I added a gcc_unreachable() to make sure and added an
>> appropriate comment.  It doesn't get triggered on tests or
>> glibc/binutils builds.
> Yea, maybe I was conflating your work at Martin's (the latter of which
> can run without optimization).
>
>>> So this is an interesting tidbit that answers my questions about how we
>>> use alloca_call_type_by_arg.  Essentially this is meant to catch the
>>> merge point for flow control and give a conservative warning.  That's
>>> fine and good.  But ISTM it's really a bit of a hack.  What if we have
>>> something like this:
>>>
>>>    X   Y   Z
>>>     \  |  /
>>>       \|/
>>>        A
>>>       / \
>>>      /   \
>>>     B     C
>>>          / \
>>>         /   \
>>>        D     E
>>>
>>>
>>> Where the alloca call is in E and the incoming edges to A actually have
>>> useful information about the argument to the alloca call.
>>>
>>> ISTM you need to be doing something with the dominator tree here to find
>>> the merge point(s)  where we might know something useful.  And it's this
>>> kind of test that makes me wonder about re-purposing some of the path
>>> analysis code from tree-ssa-uninit.c.  It may be the case that the path
>>> Z->A->C->E is unfeasible, but left in the CFG because duplication to
>>> expose the unfeasible path was unprofitable.  If it turns out that the
>>> only argument that causes problems comes from the edge Z->A, then we
>>> wouldn't want to warn in this case.
>>>
>>>   I don't see Andrew's work necessarily being able to solve this
>>> problem.
>>
>> In my limited testing I've seen that 95% of all cases (I'm pulling this
>> number out of thin air ;-)) are relatively simple.  Just looking at the
>> definition of the SSA name in the alloca() call or the immediate
>> predecessors yields most of the information we need.  I haven't seen
>> much more complicated things with an actual range.
> Understood.  But it's this kind of thing that creates the false
> positives that drive people nuts :-)
>
> As I said, I don't necessarily think Andrew's work will resolve this nor
> do I think it ought to block this work.  I mostly pointed it out as an
> example of the kind of case we're not going to handle well today, but
> perhaps could with the tree-ssa-uninit.c framework.
>
>
>>
>>
>> gcc/
>>
>>     * Makefile.in (OBJS): Add gimple-ssa-warn-alloca.o.
>>     * passes.def: Add two instances of pass_walloca.
>>     * tree-pass.h (make_pass_walloca): New.
>>     * gimple-ssa-warn-alloca.c: New file.
>>     * doc/invoke.texi: Document -Walloca, -Walloca-larger-than=, and
>>     -Wvla-larger-than= options.
>>
>> gcc/c-family/
>>
>>     * c.opt (Walloca): New.
>>     (Walloca-larger-than=): New.
>>     (Wvla-larger-than=): New.
>>
>> @@ -4666,6 +4667,70 @@ annotations.
>>> +
>> +
>> +Unbounded uses, on the other hand, are uses of @code{alloca} with no
>> +controlling predicate verifying its size.  For example:
>> +
>> +@smallexample
>> +stuff ();
>> +p = alloca (n);
>> +@end smallexample
> Consider making this a full example.
>
>> diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
>> new file mode 100644
>> index 0000000..53b0b30
>> --- /dev/null
>> +++ b/gcc/gimple-ssa-warn-alloca.c
>
>> +// NOTE: When we get better range info, this entire function becomes
>> +// irrelevant, as it should be possible to get range info for an SSA
>> +// name at any point in the program.
>> +//
>> +// We have a few heuristics up our sleeve to determine if a call to
>> +// alloca() is within bounds.  Try them out and return the type of
>> +// alloca call with its assumed limit (if applicable).
>> +//
>> +// Given a known argument (ARG) to alloca() and an EDGE (E)
>> +// calculating said argument, verify that the last statement in the BB
>> +// in E->SRC is a gate comparing ARG to an acceptable bound for
>> +// alloca().  See examples below.
>> +//
>> +// If set, ARG_CASTED is the possible unsigned argument to which ARG
>> +// was casted to.  This is to handle cases where the controlling
>> +// predicate is looking at a casted value, not the argument itself.
>> +//    arg_casted = (size_t) arg;
>> +//    if (arg_casted < N)
>> +//      goto bb3;
>> +//    else
>> +//      goto bb5;
>> +//
>> +// MAX_SIZE is WARN_ALLOCA= adjusted for VLAs.  It is the maximum size
>> +// in bytes we allow for arg.
>> +
>> +static struct alloca_type_and_limit
>> +alloca_call_type_by_arg (tree arg, tree arg_casted, edge e, unsigned
>> max_size)
>> +{
>> +  basic_block bb = e->src;
>> +  gimple_stmt_iterator gsi = gsi_last_bb (bb);
>> +  gimple *last = gsi_stmt (gsi);
>> +  if (!last || gimple_code (last) != GIMPLE_COND)
>> +    return alloca_type_and_limit (ALLOCA_UNBOUNDED);
>> +
>> +
>> +  // Check for:
>> +  //   if (ARG .COND. N)
>> +  //     goto <bb 3>;
>> +  //   else
>> +  //     goto <bb 4>;
>> +  //   <bb 3>:
>> +  //   alloca(ARG);
>> +  // Similarly, but check for a comparison with an unknown LIMIT.
>> +  //   if (LIMIT .COND. ARG)
>> +  //     alloca(arg);
>> +  //
>> +  //   Where LIMIT has a bound of unknown range.
>> +  //
>> +  // Note: All conditions of the form (ARG .COND. XXXX) where covered
>> +  // by the previous check above, so we only need to look for (LIMIT
>> +  // .COND. ARG) here.
>> +  tree limit = gimple_cond_lhs (last);
>> +  if ((gimple_cond_rhs (last) == arg
>> +       || gimple_cond_rhs (last) == arg_casted)
>> +      && TREE_CODE (limit) == SSA_NAME)
>> +    {
>> +      wide_int min, max;
>> +      value_range_type range_type = get_range_info (limit, &min, &max);
>> +
>> +      if (range_type == VR_UNDEFINED || range_type == VR_VARYING)
>> +    return alloca_type_and_limit (ALLOCA_BOUND_UNKNOWN);
>> +
>> +      // ?? It looks like the above `if' is unnecessary, as we never
>> +      // get any VR_RANGE or VR_ANTI_RANGE here.  If we had a range
>> +      // for LIMIT, I suppose we would have taken care of it in
>> +      // alloca_call_type(), or handled above where we handle (ARG
>> .COND. N).
>> +      //
>> +      // If this ever triggers, figure out why and handle it, though
>> +      // it is likely to be just an ALLOCA_UNBOUNDED.
>> +      gcc_unreachable ();
> So this seems like a case of "we don't expect this, though it might in
> theory happen".  I think degrading gracefully to ALLOCA_UNBOUNDED is a
> better choice for this kind of situation than gcc_unreachable.  If we're
> generating a dump file, then perhaps saying something in the dump file
> about the unhandled case would be useful.
>
>
>
>> +
>> +// Analyze the alloca call in STMT and return the alloca type with its
>> +// corresponding limit (if applicable).  IS_VLA is set if the alloca
>> +// call is really a BUILT_IN_ALLOCA_WITH_ALIGN, signifying a VLA.
>> +//
>> +// If the alloca call may be too large because of a cast from a signed
>> +// type to an unsigned type, set *INVALID_CASTED_TYPE to the
>> +// problematic signed type.
>> +
>> +static struct alloca_type_and_limit
>> +alloca_call_type (gimple *stmt, bool is_vla, tree *invalid_casted_type)
>> +{
>> +  gcc_assert (gimple_alloca_call_p (stmt));
>> +  tree len = gimple_call_arg (stmt, 0);
>> +  tree len_casted = NULL;
>> +  wide_int min, max;
>> +  struct alloca_type_and_limit ret = alloca_type_and_limit
>> (ALLOCA_UNBOUNDED);
>> +
>> +  gcc_assert (!is_vla || warn_vla_limit > 0);
>> +  gcc_assert (is_vla || warn_alloca_limit > 0);
>> +
>> +  // Adjust warn_alloca_max_size for VLAs, by taking the underlying
>> +  // type into account.
>> +  unsigned HOST_WIDE_INT max_size;
>> +  if (is_vla)
>> +    max_size = (unsigned HOST_WIDE_INT) warn_vla_limit;
>> +  else
>> +    max_size = (unsigned HOST_WIDE_INT) warn_alloca_limit;
>> +
>> +  // Check for the obviously bounded case.
>> +  if (TREE_CODE (len) == INTEGER_CST)
>> +    {
>> +      if (tree_to_uhwi (len) > max_size)
>> +    return alloca_type_and_limit (ALLOCA_BOUND_DEFINITELY_LARGE, len);
>> +      if (integer_zerop (len))
>> +    return alloca_type_and_limit (ALLOCA_ARG_IS_ZERO);
>> +      ret = alloca_type_and_limit (ALLOCA_OK);
>> +    }
>> +  else if (TREE_CODE (len) != SSA_NAME)
>> +    {
>> +      // We only run with optimization and we have yet to trigger
>> +      // anything but an SSA_NAME here.  Fail here just in case we get
>> +      // a non SSA_NAME here in the future, though I doubt it will
>> +      // hold anything of value, since we need range information for
>> +      // anything to make sense.
>> +      gcc_unreachable ();
>> +      return alloca_type_and_limit (ALLOCA_UNBOUNDED);
>> +    }
> In the gimple world, the arguments to an alloca should only be constants
> and SSA_NAMES.  My concern WRT _DECL nodes was only an issue if not
> optimizing.  So I think this else clause can just go away.
>
> I think with the minor stuff noted above fixed, this is fine for the
> trunk.  I don't think it needs an additional review cycle.  Sorry for
> the long delay.

My apologies for the delay.  I have finally committed this patch. 
Before doing so, I merged with trunk and re-tested (x86-64 Linux) just 
in case.  There were no regressions.

Thanks for the review.
Aldy