[PATCH] Fix PR/8268: implement compile time array subscript checking

Richard Guenther richard.guenther@gmail.com
Tue Dec 26 21:37:00 GMT 2006


On 12/21/06, Dirk Mueller <dmuell@gmx.net> wrote:
> On Tuesday, 12. December 2006 13:51, Richard Guenther wrote:
>
> > > Ok? Do we need the extra -Warray-bounds?
> > Yes, we need the extra -Warray-bounds for languages that don't include it
> > in -Wall and to turn the warning off.
>
> I don't really think it is a good idea to turn off the warning ;) The main
> reason I'm asking is because we'd need an extra flag for "maybe it overflows
> the array here" kind of warnings, as in my case I'd imagine that I want to
> Werror-out on the "always overflows" and only warn on the "maybe overflows"
> case.

I see.  We still should keep the flag.

> > > NULL_TREE) +      /* Accesses after the end of arrays of size 0 (gcc
> > > +         extension) and 1 are likely intentional ("struct
> > > +         hack").  */
> > > +      || compare_tree_int (ub, 1) <= 0)
> > > +    return;
> > The "struct hack" needs to include all arrays that are the last member
> > in a structure.
> > See tree-dfa.c:get_ref_base_and_extent - you can feed it the array_ref
> > and check if
> > *pmax_size is -1 (unbound or unknown).  This function should also deal with
> > flexible arrays (i.e. report them as unknown extent).
>
> Hmm, it does that when it can determine that the array subscript is accessing
> the array beyond a memory location that is inside the struct the array ref
> was declared. It doesn't make sense to disable an array overflow warning for
> the case that it overflows an array more than the size of the struct, does
> it? There might be many legal cases suppressed this way. Do you know any real
> life code that would trigger false positives here? Or do you only want the
> struct hack to be enabled if the array is at the end of a structure?

Yes, I only want the struct hack to be enabled if the array is at the
end of a structure.
You can find all of struct s { ... int c[]; }, { ... int c[0]; }, {
... int c[1]; } and even
{ ... char c[4]; } (use all padding explicitly).  The standard allows
treating all those
as flexible array members if they are last in the structure.

> > > +  ru = TREE_CODE (us) == INTEGER_CST ? tree_int_cst_compare (ub, us) :
> > > -2; +  rl = TREE_CODE (ls) == INTEGER_CST ? tree_int_cst_compare (ls, lb)
> > > : -2;
> > It looks like you can avoid adding 1 to ub by adjusting the checks
> > below to ru <= 0.
> > > +  else if (ru == -1 || (!ignore_off_by_one && ru == 0))
> > > +    warning (OPT_Warray_bounds, "%Harray subscript is above array
>
> No, because of the case quoted above. If the ARRAY_REF is inside an ADDR_EXPR,
> taking the address of the first element after the end of the array is
> welldefined.

Yes, but I was refering to the tree building (now you're using
int_const_binop which
also creates a new tree - at least now less frequently).  Looking
again there doesn't
seem a way to avoid it.

> if you're saying that int_const_binop is computationally more expensive than
> two tree_int_cst_compare calls.. then I see your point. Ok, I've done some
> profiling and the change below seems to be in your spirit, and is about
> factor 2.5 faster. The total runtime overhead for insn-recog.c, which has more
> than 30000 array subscripts, is now about 0,1%. Much of the speedup however
> comes from compare_tree_int() which I had to change to be cheaper.
>
> Bootstrapped, regtested against trunk on i686-suse-linux.
>
> Thanks,
> Dirk
>
> 2006-12-21  Dirk Mueller  <dmueller@suse.de>
>             Richard Guenther <rguenther@suse.de>
>
>         PR diagnostic/8268
>         * doc/invoke.texi (Warray-bounds): Document -Warray-bounds.
>         * common.opt (Warray-bounds): Add new warning option.
>         * c-opts.c (c_common_handle_option): Define -Warray-bounds
>         if -Wall is given.
>         * tree-vrp.c (vrp_finalize): Call check_array_refs if -Warray-bounds
>         is enabled.
>         * tree.c (simple_cst_equal): Manually inline tree_int_cst_sgn.

You changed compare_tree_int, not simple_cst_equal.

Ok with that change.

Thanks,
Richard.

>         (check_array_refs, check_array_bounds, check_array_ref): New.
>
>         * testsuite/gcc.dg/Warray-bounds.c: New testcase.
>
>
> --- doc/invoke.texi     (revision 119972)
> +++ doc/invoke.texi     (working copy)
> @@ -245,7 +245,7 @@ Objective-C and Objective-C++ Dialects}.
>  -Wreturn-type  -Wsequence-point  -Wshadow @gol
>  -Wsign-compare  -Wstack-protector @gol
>  -Wstrict-aliasing -Wstrict-aliasing=2 @gol
> --Wstring-literal-comparison @gol
> +-Wstring-literal-comparison -Warray-bounds @gol
>  -Wswitch  -Wswitch-default  -Wswitch-enum @gol
>  -Wsystem-headers  -Wtrigraphs  -Wundef  -Wuninitialized @gol
>  -Wunknown-pragmas  -Wno-pragmas -Wunreachable-code @gol
> @@ -2831,6 +2831,12 @@ compiler is using for optimization.  Thi
>  @option{-Wstrict-aliasing}, but it will also give a warning for some
> ambiguous
>  cases that are safe.
>
> +@item -Warray-bounds
> +@opindex Warray-bounds
> +This option is only active when @option{-ftree-vrp} is active
> +(default for -O2 and above). It warns about subscripts to arrays
> +that are always out of bounds.
> +
>  @item -Wall
>  @opindex Wall
>  All of the above @samp{-W} options combined.  This enables all the
> --- common.opt  (revision 119972)
> +++ common.opt  (working copy)
> @@ -61,6 +61,10 @@ Walways-true
>  Common Var(warn_always_true)
>  Warn about comparisons that always evaluate to true
>
> +Warray-bounds
> +Common Var(warn_array_bounds)
> +Warn if an array is accessed out of bounds
> +
>  Wattributes
>  Common Var(warn_attributes) Init(1)
>  Warn about inappropriate attribute usage
> --- tree.c      (revision 119972)
> +++ tree.c      (working copy)
> @@ -5003,7 +5003,7 @@ simple_cst_equal (tree t1, tree t2)
>  int
>  compare_tree_int (tree t, unsigned HOST_WIDE_INT u)
>  {
> -  if (tree_int_cst_sgn (t) < 0)
> +  if (!TYPE_UNSIGNED (TREE_TYPE (t)) && TREE_INT_CST_HIGH (t) < 0)
>      return -1;
>    else if (TREE_INT_CST_HIGH (t) != 0)
>      return 1;
> --- tree-vrp.c  (revision 119972)
> +++ tree-vrp.c  (working copy)
> @@ -32,6 +32,7 @@ Boston, MA 02110-1301, USA.  */
>  #include "tree-dump.h"
>  #include "timevar.h"
>  #include "diagnostic.h"
> +#include "toplev.h"
>  #include "cfgloop.h"
>  #include "tree-scalar-evolution.h"
>  #include "tree-ssa-propagate.h"
> @@ -3420,6 +3420,142 @@ insert_range_assertions (void)
>    BITMAP_FREE (need_assert_for);
>  }
>
> +/* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible arrays
> +   and "struct" hacks. If VRP can determine that the
> +   array subscript is a contant, check if it is outside valid
> +   range. If the array subscript is a RANGE, warn if it is
> +   non-overlapping with valid range.
> +   IGNORE_OFF_BY_ONE is true if the ARRAY_REF is inside a ADDR_EXPR.  */
> +
> +static void
> +check_array_ref (tree ref, location_t* locus, bool ignore_off_by_one)
> +{
> +  value_range_t* vr = NULL;
> +  tree low_sub, up_sub;
> +  tree low_bound, up_bound = array_ref_up_bound(ref);
> +
> +  low_sub = up_sub = TREE_OPERAND (ref, 1);
> +
> +  if (!up_bound || !locus || TREE_NO_WARNING (ref)
> +      || TREE_CODE (up_bound) != INTEGER_CST
> +      /* Can not check flexible arrays.  */
> +      || (TYPE_SIZE (TREE_TYPE (ref)) == NULL_TREE
> +          && TYPE_DOMAIN (TREE_TYPE (ref)) != NULL_TREE
> +          && TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (ref))) == NULL_TREE)
> +      /* Accesses after the end of arrays of size 0 (gcc
> +         extension) and 1 are likely intentional ("struct
> +         hack").  */
> +      || compare_tree_int (up_bound, 1) <= 0)
> +    return;
> +
> +  low_bound = array_ref_low_bound(ref);
> +
> +  if (TREE_CODE (low_sub) == SSA_NAME)
> +    {
> +      vr = get_value_range (low_sub);
> +      if (vr->type == VR_RANGE || vr->type == VR_ANTI_RANGE)
> +        {
> +          low_sub = vr->type == VR_RANGE ? vr->max : vr->min;
> +          up_sub = vr->type == VR_RANGE ? vr->min : vr->max;
> +        }
> +    }
> +
> +  TREE_NO_WARNING (ref) = 1;
> +
> +  if (vr && vr->type == VR_ANTI_RANGE)
> +    {
> +      if (TREE_CODE (up_sub) == INTEGER_CST
> +          && tree_int_cst_lt (up_bound, up_sub)
> +          && TREE_CODE (low_sub) == INTEGER_CST
> +          && tree_int_cst_lt (low_sub, low_bound))
> +        warning (OPT_Warray_bounds,
> +                 "%Harray subscript is outside array bounds", locus);
> +      else
> +        TREE_NO_WARNING (ref) = 0;
> +    }
> +  else if (TREE_CODE (up_sub) == INTEGER_CST
> +           && tree_int_cst_lt (up_bound, up_sub)
> +           && !tree_int_cst_equal (up_bound, up_sub)
> +           && (!ignore_off_by_one
> +               || !tree_int_cst_equal (int_const_binop (PLUS_EXPR,
> +                                                        up_bound,
> +                                                        integer_one_node,
> +                                                        0),
> +                                       up_sub)))
> +    warning (OPT_Warray_bounds, "%Harray subscript is above array bounds",
> +             locus);
> +  else if (TREE_CODE (low_sub) == INTEGER_CST
> +           && tree_int_cst_lt (low_sub, low_bound))
> +    warning (OPT_Warray_bounds, "%Harray subscript is below array bounds",
> +             locus);
> +  else
> +    TREE_NO_WARNING (ref) = 0;
> +}
> +
> +static bool array_bounds_already_done;
> +
> +/* walk_tree() callback that checks if *TP is
> +   an ARRAY_REF inside an ADDR_EXPR (in which an array
> +   subscript one outside the valid range is allowed). Call
> +   check_array_ref for each ARRAY_REF found. The location is
> +   passed in DATA.  */
> +
> +static tree
> +check_array_bounds (tree *tp, int *walk_subtree, void *data)
> +{
> + tree t = *tp;
> +
> + *walk_subtree = TRUE;
> +
> +  if (TREE_CODE (t) == ARRAY_REF)
> +    check_array_ref (t, (location_t*) data, false /*ignore_off_by_one*/);
> +  else if (TREE_CODE (t) == ADDR_EXPR)
> +    {
> +       t = TREE_OPERAND (t, 0);
> +       while (!array_bounds_already_done && handled_component_p (t))
> +         {
> +           if (TREE_CODE (t) == ARRAY_REF)
> +             check_array_ref (t, (location_t*) data,
> true /*ignore_off_by_one*/);
> +           t = TREE_OPERAND (t, 0);
> +         }
> +       *walk_subtree = FALSE;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> +/* Walk over all statements of all reachable BBs and call check_array_bounds
> +   on them.  */
> +
> +static void
> +check_all_array_refs (void)
> +{
> +  basic_block bb;
> +  block_stmt_iterator si;
> +
> +  FOR_EACH_BB (bb)
> +    {
> +      /* Skip bb's that are clearly unreachable.  */
> +      if (single_pred_p (bb))
> +      {
> +       basic_block pred_bb = EDGE_PRED (bb, 0)->src;
> +       tree ls = NULL_TREE;
> +
> +       if (!bsi_end_p (bsi_last (pred_bb)))
> +         ls = bsi_stmt (bsi_last (pred_bb));
> +
> +       if (ls && TREE_CODE (ls) == COND_EXPR
> +           && ((COND_EXPR_COND (ls) == boolean_false_node
> +                && (EDGE_PRED (bb, 0)->flags & EDGE_TRUE_VALUE))
> +               || (COND_EXPR_COND (ls) == boolean_true_node
> +                   && (EDGE_PRED (bb, 0)->flags & EDGE_FALSE_VALUE))))
> +         continue;
> +      }
> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> +       walk_tree (bsi_stmt_ptr (si), check_array_bounds,
> +                  EXPR_LOCUS (*bsi_stmt_ptr (si)), NULL);
> +    }
> +}
>
>  /* Convert range assertion expressions into the implied copies and
>     copy propagate away the copies.  Doing the trivial copy propagation
> @@ -4733,6 +4869,19 @@ vrp_finalize (void)
>
>    substitute_and_fold (single_val_range, true);
>
> +  if (warn_array_bounds)
> +      check_all_array_refs();
> +
> +  /* workaround for PR/26726. The problem here is that -fivopts
> +     sometimes shifts offsets so that arrays are accessed apparently
> +     out of bounds, but actually are not. therefore we do not warn
> +     about ARRAY_REF's inside ADDR_EXPR's anymore after the first run
> +     (which is before ivopts).
> +   */
> +
> +  array_bounds_already_done = true;
> +
> +
>    /* We must identify jump threading opportunities before we release
>       the datastructures built by VRP.  */
>    identify_jump_threads ();
> Index: testsuite/gcc.dg/Warray-bounds.c
> ===================================================================
> --- testsuite/gcc.dg/Warray-bounds.c    (revision 0)
> +++ testsuite/gcc.dg/Warray-bounds.c    (revision 0)
> @@ -0,0 +1,92 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Warray-bounds" } */
> +
> +int a[10];
> +
> +static inline int n(void) {
> +    __SIZE_TYPE__ strlen(const char *s);
> +    return strlen("12345");
> +}
> +
> +void g(int *p);
> +void h(int p);
> +
> +int* f(void) {
> +    int b[10];
> +    int i;
> +    struct {
> +       int c[10];
> +    } c;
> +
> +    a[-1] = 0;             /* { dg-warning "array subscript" } */
> +    a[ 0] = 0;
> +    a[ 1] = 0;
> +
> +
> +    a[ 9] = 0;
> +    a[10] = 0;             /* { dg-warning "array subscript" } */
> +    a[11] = 0;             /* { dg-warning "array subscript" } */
> +    a[2 * n() - 11] = 0;    /* { dg-warning "array subscript" } */
> +    a[2 * n() - 10] = 0;
> +    a[2 * n() -  1] = 0;
> +    a[2 * n() -  0] = 0;    /* { dg-warning "array subscript" } */
> +
> +    b[-1] = 0;             /* { dg-warning "array subscript" } */
> +    b[ 0] = 0;
> +    b[ 1] = 0;
> +    b[ 9] = 0;
> +    b[10] = 0;             /* { dg-warning "array subscript" } */
> +    b[11] = 0;             /* { dg-warning "array subscript" } */
> +    b[2 * n() - 11] = 0;    /* { dg-warning "array subscript" } */
> +    b[2 * n() - 10] = 0;
> +    b[2 * n() -  1] = 0;
> +    b[2 * n() -  0] = 0;    /* { dg-warning "array subscript" } */
> +
> +    c.c[-1] = 0;           /* { dg-warning "array subscript" } */
> +    c.c[ 0] = 0;
> +    c.c[ 1] = 0;
> +    c.c[ 9] = 0;
> +    c.c[10] = 0;           /* { dg-warning "array subscript" } */
> +    c.c[11] = 0;           /* { dg-warning "array subscript" } */
> +    c.c[2 * n() - 11] = 0;  /* { dg-warning "array subscript" } */
> +    c.c[2 * n() - 10] = 0;
> +    c.c[2 * n() -  1] = 0;
> +    c.c[2 * n() -  0] = 0;  /* { dg-warning "array subscript" } */
> +
> +    g(&a[8]);
> +    g(&a[9]);
> +    g(&a[10]);
> +    g(&a[11]);             /* { dg-warning "array subscript" } */
> +
> +    g(&b[10]);
> +    g(&c.c[10]);
> +    g(&a[11]);             /* { dg-warning "array subscript" } */
> +    g(&b[11]);             /* { dg-warning "array subscript" } */
> +    g(&c.c[11]);           /* { dg-warning "array subscript" } */
> +
> +    g(&a[0]);
> +    g(&b[0]);
> +    g(&c.c[0]);
> +
> +    g(&a[-1]);             /* { dg-warning "array subscript" } */
> +    g(&b[-1]);             /* { dg-warning "array subscript" } */
> +    h(sizeof a[-1]);
> +    h(sizeof a[10]);
> +    h(sizeof b[-1]);
> +    h(sizeof b[10]);
> +    h(sizeof c.c[-1]);
> +    h(sizeof c.c[10]);
> +
> +    if (10 < 10)
> +       a[10] = 0;
> +    if (10 < 10)
> +       b[10] = 0;
> +    if (-1 >= 0)
> +       c.c[-1] = 0;
> +
> +    for (i = 20; i < 30; ++i)
> +             a[i] = 1;       /* { dg-warning "array subscript" } */
> +
> +    return a;
> +}
> +
>
>



More information about the Gcc-patches mailing list