PING: new pass to warn on questionable uses of alloca() and VLAs
Aldy Hernandez
aldyh@redhat.com
Tue Oct 18 21:42:00 GMT 2016
On 09/13/2016 04:28 PM, Jeff Law wrote:
> On 08/19/2016 08:58 AM, Aldy Hernandez wrote:
>
>>> I'd just drop the /*strict_mode_p*/ comment in both places it appears in
>>> your patch's change to passes.def. I think we've generally frowned on
>>> those embedded comments, even though some have snuck in.
>>
>> I've seen a lot of embedded comments throughout GCC, especially in
>> optional type arguments. ISTM it makes things clearer for these
>> parameters. But hey, I don't care that much. Fixed.
> Yea. They've crept in. Long term this kind of stuff should have an
> enum or somesuch so that its obvious at both the call site and
> implementation site what those arguments to.
>>
>> Actually when the cast is from a known positive range we don't get a
>> VR_ANTI_RANGE, we get a proper VR_RANGE.
> Ah, in that case there's several of these things that get trivially
> cleaned up :-)
>
> Though I bet with some work I might be able to create an ANTI_RANGE that
> excludes all negative numbers. But I don't think it's going to be that
> important in practice. So we just have to make sure we don't
> abort/segfault.
>
>>
>> I've cleaned this code up a bit and merged some common conditionals. In
>> the process, taking a subset of your advice, and cleaning up some things
>> I've managed to handle 2 cases where I was previously XFAILing.
>> So...less false positives. More coverage. Woo hoo!
> Always goodness.
>
>>
>>>
>>>> +{
>>>> + gcc_assert (gimple_alloca_call_p (stmt));
>>>> + tree len = gimple_call_arg (stmt, 0);
>>>> + enum alloca_type w = ALLOCA_UNBOUNDED;
>>>> + wide_int min, max;
>>>> +
>>>> + gcc_assert (!is_vla || warn_vla_limit > 0);
>>>> + gcc_assert (is_vla || warn_alloca_limit > 0);
>>>> +
>>>> + // Adjust warn_alloca_max_size for VLAs, by taking the underlying
>>>> + // type into account.
>>>> + unsigned HOST_WIDE_INT max_size;
>>>> + if (is_vla)
>>>> + max_size = (unsigned HOST_WIDE_INT) warn_vla_limit;
>>>> + else
>>>> + max_size = (unsigned HOST_WIDE_INT) warn_alloca_limit;
>>>> +
>>>> + // Check for the obviously bounded case.
>>>> + if (TREE_CODE (len) == INTEGER_CST)
>>>> + {
>>>> + if (tree_to_uhwi (len) > max_size)
>>>> + {
>>>> + *assumed_limit = len;
>>>> + return ALLOCA_BOUND_DEFINITELY_LARGE;
>>>> + }
>>>> + if (integer_zerop (len))
>>>> + return ALLOCA_ARG_IS_ZERO;
>>>> + w = ALLOCA_OK;
>>>> + }
>>>> + else if (TREE_CODE (len) != SSA_NAME)
>>>> + return ALLOCA_UNBOUNDED;
>>> Hmm, other than INTEGER_CST and SSA_NAME, is there any other nodes we
>>> can get here? Perhaps we get DECLs and such, particularly when not
>>> optimizing?!?
>>
>> Nope. We don't even run without optimization (because we need VRP/range
>> info). I added a gcc_unreachable() to make sure and added an
>> appropriate comment. It doesn't get triggered on tests or
>> glibc/binutils builds.
> Yea, maybe I was conflating your work at Martin's (the latter of which
> can run without optimization).
>
>>> So this is an interesting tidbit that answers my questions about how we
>>> use alloca_call_type_by_arg. Essentially this is meant to catch the
>>> merge point for flow control and give a conservative warning. That's
>>> fine and good. But ISTM it's really a bit of a hack. What if we have
>>> something like this:
>>>
>>> X Y Z
>>> \ | /
>>> \|/
>>> A
>>> / \
>>> / \
>>> B C
>>> / \
>>> / \
>>> D E
>>>
>>>
>>> Where the alloca call is in E and the incoming edges to A actually have
>>> useful information about the argument to the alloca call.
>>>
>>> ISTM you need to be doing something with the dominator tree here to find
>>> the merge point(s) where we might know something useful. And it's this
>>> kind of test that makes me wonder about re-purposing some of the path
>>> analysis code from tree-ssa-uninit.c. It may be the case that the path
>>> Z->A->C->E is unfeasible, but left in the CFG because duplication to
>>> expose the unfeasible path was unprofitable. If it turns out that the
>>> only argument that causes problems comes from the edge Z->A, then we
>>> wouldn't want to warn in this case.
>>>
>>> I don't see Andrew's work necessarily being able to solve this
>>> problem.
>>
>> In my limited testing I've seen that 95% of all cases (I'm pulling this
>> number out of thin air ;-)) are relatively simple. Just looking at the
>> definition of the SSA name in the alloca() call or the immediate
>> predecessors yields most of the information we need. I haven't seen
>> much more complicated things with an actual range.
> Understood. But it's this kind of thing that creates the false
> positives that drive people nuts :-)
>
> As I said, I don't necessarily think Andrew's work will resolve this nor
> do I think it ought to block this work. I mostly pointed it out as an
> example of the kind of case we're not going to handle well today, but
> perhaps could with the tree-ssa-uninit.c framework.
>
>
>>
>>
>> gcc/
>>
>> * Makefile.in (OBJS): Add gimple-ssa-warn-alloca.o.
>> * passes.def: Add two instances of pass_walloca.
>> * tree-pass.h (make_pass_walloca): New.
>> * gimple-ssa-warn-alloca.c: New file.
>> * doc/invoke.texi: Document -Walloca, -Walloca-larger-than=, and
>> -Wvla-larger-than= options.
>>
>> gcc/c-family/
>>
>> * c.opt (Walloca): New.
>> (Walloca-larger-than=): New.
>> (Wvla-larger-than=): New.
>>
>> @@ -4666,6 +4667,70 @@ annotations.
>>> +
>> +
>> +Unbounded uses, on the other hand, are uses of @code{alloca} with no
>> +controlling predicate verifying its size. For example:
>> +
>> +@smallexample
>> +stuff ();
>> +p = alloca (n);
>> +@end smallexample
> Consider making this a full example.
>
>> diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
>> new file mode 100644
>> index 0000000..53b0b30
>> --- /dev/null
>> +++ b/gcc/gimple-ssa-warn-alloca.c
>
>> +// NOTE: When we get better range info, this entire function becomes
>> +// irrelevant, as it should be possible to get range info for an SSA
>> +// name at any point in the program.
>> +//
>> +// We have a few heuristics up our sleeve to determine if a call to
>> +// alloca() is within bounds. Try them out and return the type of
>> +// alloca call with its assumed limit (if applicable).
>> +//
>> +// Given a known argument (ARG) to alloca() and an EDGE (E)
>> +// calculating said argument, verify that the last statement in the BB
>> +// in E->SRC is a gate comparing ARG to an acceptable bound for
>> +// alloca(). See examples below.
>> +//
>> +// If set, ARG_CASTED is the possible unsigned argument to which ARG
>> +// was casted to. This is to handle cases where the controlling
>> +// predicate is looking at a casted value, not the argument itself.
>> +// arg_casted = (size_t) arg;
>> +// if (arg_casted < N)
>> +// goto bb3;
>> +// else
>> +// goto bb5;
>> +//
>> +// MAX_SIZE is WARN_ALLOCA= adjusted for VLAs. It is the maximum size
>> +// in bytes we allow for arg.
>> +
>> +static struct alloca_type_and_limit
>> +alloca_call_type_by_arg (tree arg, tree arg_casted, edge e, unsigned
>> max_size)
>> +{
>> + basic_block bb = e->src;
>> + gimple_stmt_iterator gsi = gsi_last_bb (bb);
>> + gimple *last = gsi_stmt (gsi);
>> + if (!last || gimple_code (last) != GIMPLE_COND)
>> + return alloca_type_and_limit (ALLOCA_UNBOUNDED);
>> +
>> +
>> + // Check for:
>> + // if (ARG .COND. N)
>> + // goto <bb 3>;
>> + // else
>> + // goto <bb 4>;
>> + // <bb 3>:
>> + // alloca(ARG);
>> + // Similarly, but check for a comparison with an unknown LIMIT.
>> + // if (LIMIT .COND. ARG)
>> + // alloca(arg);
>> + //
>> + // Where LIMIT has a bound of unknown range.
>> + //
>> + // Note: All conditions of the form (ARG .COND. XXXX) where covered
>> + // by the previous check above, so we only need to look for (LIMIT
>> + // .COND. ARG) here.
>> + tree limit = gimple_cond_lhs (last);
>> + if ((gimple_cond_rhs (last) == arg
>> + || gimple_cond_rhs (last) == arg_casted)
>> + && TREE_CODE (limit) == SSA_NAME)
>> + {
>> + wide_int min, max;
>> + value_range_type range_type = get_range_info (limit, &min, &max);
>> +
>> + if (range_type == VR_UNDEFINED || range_type == VR_VARYING)
>> + return alloca_type_and_limit (ALLOCA_BOUND_UNKNOWN);
>> +
>> + // ?? It looks like the above `if' is unnecessary, as we never
>> + // get any VR_RANGE or VR_ANTI_RANGE here. If we had a range
>> + // for LIMIT, I suppose we would have taken care of it in
>> + // alloca_call_type(), or handled above where we handle (ARG
>> .COND. N).
>> + //
>> + // If this ever triggers, figure out why and handle it, though
>> + // it is likely to be just an ALLOCA_UNBOUNDED.
>> + gcc_unreachable ();
> So this seems like a case of "we don't expect this, though it might in
> theory happen". I think degrading gracefully to ALLOCA_UNBOUNDED is a
> better choice for this kind of situation than gcc_unreachable. If we're
> generating a dump file, then perhaps saying something in the dump file
> about the unhandled case would be useful.
>
>
>
>> +
>> +// Analyze the alloca call in STMT and return the alloca type with its
>> +// corresponding limit (if applicable). IS_VLA is set if the alloca
>> +// call is really a BUILT_IN_ALLOCA_WITH_ALIGN, signifying a VLA.
>> +//
>> +// If the alloca call may be too large because of a cast from a signed
>> +// type to an unsigned type, set *INVALID_CASTED_TYPE to the
>> +// problematic signed type.
>> +
>> +static struct alloca_type_and_limit
>> +alloca_call_type (gimple *stmt, bool is_vla, tree *invalid_casted_type)
>> +{
>> + gcc_assert (gimple_alloca_call_p (stmt));
>> + tree len = gimple_call_arg (stmt, 0);
>> + tree len_casted = NULL;
>> + wide_int min, max;
>> + struct alloca_type_and_limit ret = alloca_type_and_limit
>> (ALLOCA_UNBOUNDED);
>> +
>> + gcc_assert (!is_vla || warn_vla_limit > 0);
>> + gcc_assert (is_vla || warn_alloca_limit > 0);
>> +
>> + // Adjust warn_alloca_max_size for VLAs, by taking the underlying
>> + // type into account.
>> + unsigned HOST_WIDE_INT max_size;
>> + if (is_vla)
>> + max_size = (unsigned HOST_WIDE_INT) warn_vla_limit;
>> + else
>> + max_size = (unsigned HOST_WIDE_INT) warn_alloca_limit;
>> +
>> + // Check for the obviously bounded case.
>> + if (TREE_CODE (len) == INTEGER_CST)
>> + {
>> + if (tree_to_uhwi (len) > max_size)
>> + return alloca_type_and_limit (ALLOCA_BOUND_DEFINITELY_LARGE, len);
>> + if (integer_zerop (len))
>> + return alloca_type_and_limit (ALLOCA_ARG_IS_ZERO);
>> + ret = alloca_type_and_limit (ALLOCA_OK);
>> + }
>> + else if (TREE_CODE (len) != SSA_NAME)
>> + {
>> + // We only run with optimization and we have yet to trigger
>> + // anything but an SSA_NAME here. Fail here just in case we get
>> + // a non SSA_NAME here in the future, though I doubt it will
>> + // hold anything of value, since we need range information for
>> + // anything to make sense.
>> + gcc_unreachable ();
>> + return alloca_type_and_limit (ALLOCA_UNBOUNDED);
>> + }
> In the gimple world, the arguments to an alloca should only be constants
> and SSA_NAMES. My concern WRT _DECL nodes was only an issue if not
> optimizing. So I think this else clause can just go away.
>
> I think with the minor stuff noted above fixed, this is fine for the
> trunk. I don't think it needs an additional review cycle. Sorry for
> the long delay.
My apologies for the delay. I have finally committed this patch.
Before doing so, I merged with trunk and re-tested (x86-64 Linux) just
in case. There were no regressions.
Thanks for the review.
Aldy
More information about the Gcc-patches
mailing list