Created attachment 41856 [details]
reduced version of linux/net/ipv4/tcp_output.c

Compiling the Linux kernel with gcc-7.1.1 and ubsan, I get this warning:

net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
                                        ^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

I have manually reduced the file to the attached version (this can be reduced
further; I decided to leave a little more context for clarity).

The warning is about an array dereference after a range check:

  if (tp->chrono_type > TCP_CHRONO_UNSPEC)
          tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;

so it clearly cannot be below the bounds.

In the original version, this happens specifically when at least one of
-fsanitize=object-size, -fsanitize=alignment, or -fsanitize=null is set in
addition to "-O2 -Wall", but not when all three are disabled. In the reduced
version, I can also reproduce it with "-Os -Wall" (without ubsan).

I also see the problem with gcc-7.0.1 on all architectures I tried (arm,
arm64 and x86), but not with gcc-6.3.1.
Confirmed, started with r239421, which is the commit also mentioned in PR80202.
The -Warray-bounds warning has false positives, so you can run into them, and
in that case using -Werror isn't really a good idea.

The problem is that the same bitfield, tp->chrono_type, is accessed in 2
different ways in the IL before vrp1:

  _5 = tp_2(D)->chrono_type;
  if (_5 == 0)

in one spot, and

  _12 = BIT_FIELD_REF <*tp_2(D), 8, 128>;
  _13 = _12 & 3;
  if (_13 != 0)

in another. The optimizers don't treat those two as the same thing, so we
have _5 used in the first condition, _13 in the second one, and then _5
again. From the first condition it determines that _5 must be zero in that
branch (that is the if (type > tp->chrono_type) condition, where you do
nothing if tp->chrono_type is not 0); it doesn't know that this also implies
_13 is 0, which would let it optimize the second branch away as dead code.
Finally it sees an array index of [_5 - 1], and when _5 is known to be 0,
that means [-1], which is invalid. It is invalid, but in dead code.
The reason this doesn't trigger without -fsanitize= is that without the extra
UBSAN_* internal function calls we manage to find out earlier that this is
dead code and optimize it away completely (during cddce1).
Thanks for the analysis. I have now submitted a local workaround for the kernel, adding a temporary variable to avoid accessing the bitfield twice, see https://patchwork.kernel.org/patch/9869037/
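For reference, the general shape of such a workaround is roughly the
following self-contained sketch (not the actual kernel patch; the function
name chrono_update is made up here, and the field layout just mirrors the
snippet in the report -- see the patchwork link above for the real change):

struct tcp_sock {
        unsigned int chrono_stat[3];
        unsigned int chrono_start;
        unsigned char chrono_type:2;
};

void chrono_update(struct tcp_sock *tp, unsigned int now)
{
        /* Read the 2-bit field once and use the local for both the
           range check and the array index, so the two uses cannot be
           lowered differently before VRP.  */
        unsigned int old = tp->chrono_type;

        if (old > 0)
                tp->chrono_stat[old - 1] += now - tp->chrono_start;
}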
So what's the right way to fix this? To move optimize_bit_field_compare() from fold_binary to match.pd so that the conditions on
(In reply to Patrick Palka from comment #5)
> So what's the right way to fix this? To move optimize_bit_field_compare()
> from fold_binary to match.pd so that the conditions on

... so that conditions on tp->chrono_type get consistently transformed into
BIT_FIELD_REFs, or to remove optimize_bit_field_compare() altogether? It
seems like a rather low-level optimization to be done in GENERIC/GIMPLE.
optimize_bit_field_compare() is a premature red herring...
The two key blocks are:

bb2:
  _3 = __builtin_object_size (tp_2(D), 0);
  _4 = &tp_2(D)->D.2254;
  GIMPLE_NOP
  _5 = tp_2(D)->chrono_type;
  if (_5 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

bb3:
  now_6 = tcp_jiffies32;
  _7 = BIT_FIELD_REF <*tp_2(D), 8, 128>;
  _8 = _7 & 3;
  if (_8 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

The out of bounds access occurs in BB4, which can only be reached via BB3.
We essentially need to prove that _5 and _8 are equivalent.

The only good news is that the edge 2->3 dominates bb3, so this could (in
theory) be handled with good equivalence processing without jump threading.

Are we allowed to use types like this in a gimple conditional?

  <unnamed-unsigned:2> _5;

If so, then one approach would be to first focus on BB3. We'd want to combine
the BIT_FIELD_REF and masking into a single BIT_FIELD_REF and test the result
of that without conversion. Could forwprop handle that perhaps?

Once the BIT_FIELD_REF just reads two bits, we'd have a fighting chance of
realizing that the BIT_FIELD_REF is just a reference to tp_2->chrono_type,
which we could look up in the hash table as _5, which has a known constant
value of zero.

Not working on this, but figured I'd at least chime in with some thoughts on
how we might be able to approach it...
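For illustration only, the combined form described above might look roughly
like this (a sketch; the 2-bit BIT_FIELD_REF reuses the offset from the dump
above, and this is essentially what the later forwprop prototype in this
thread produces):

  _9 = BIT_FIELD_REF <*tp_2(D), 2, 128>;
  if (_9 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

Once the masking is folded into a 2-bit read like this, FRE has a chance to
see that it is the same value as tp_2(D)->chrono_type loaded in bb2.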
I'll take a look.
Further reduced testcase. Can be reproduced with -O2 -Wall.

struct tcp_sock {
  int chrono_stat[3];
  unsigned char chrono_type:2;
};

void tcp_chrono_set(struct tcp_sock *tp)
{
  int type = 1;

  if (type > tp->chrono_type) {
    if (tp->chrono_type > 0)
      tp->chrono_stat[tp->chrono_type - 1] = 1234;
  }
}
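For anyone trying to reproduce this: assuming the snippet is saved as
bug81601.c (the filename is arbitrary), a command line along the lines of
"gcc-7 -O2 -Wall -c bug81601.c" should show the -Warray-bounds warning with
an affected compiler, per the flags and versions quoted above.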
What do the dumps look like? In particular evrp & vrp1?
The warning occurs in vrp1, not evrp, but for the record...

evrp dump:

tcp_chrono_set (struct tcp_sock * tp)
{
  int type;
  <unnamed-unsigned:2> _1;
  int _2;
  unsigned char _3;
  unsigned char _4;

  <bb 2> :
  _1 = tp_11(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _3 = BIT_FIELD_REF <*tp_11(D), 8, 96>;
  _4 = _3 & 3;
  if (_4 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 4> :
  tp_11(D)->chrono_stat[-1] = 1234;

  <bb 5> :
  return;
}

vrp1 dump with ASSERT_EXPRs:

  <bb 2> [local count: 1073741825]:
  _1 = tp_6(D)->chrono_type;
  tp_8 = ASSERT_EXPR <tp_6(D), tp_6(D) != 0B>;
  if (_1 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 3> [local count: 536870912]:
  _2 = BIT_FIELD_REF <*tp_8, 8, 96>;
  _3 = _2 & 3;
  if (_3 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 268435456]:
  tp_8->chrono_stat[-1] = 1234;

vrp1 dump after ASSERT_EXPRs have been removed:

  <bb 2> [local count: 1073741825]:
  _1 = tp_6(D)->chrono_type;
  if (_1 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 3> [local count: 536870912]:
  _2 = BIT_FIELD_REF <*tp_6(D), 8, 96>;
  _3 = _2 & 3;
  if (_3 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 268435456]:
  tp_6(D)->chrono_stat[-1] = 1234;
I realize the warnings are happening in VRP1. EVRP and VRP1 use different
styles of analysis and it can be the case that one is better suited for
cleaning things up than the other. But I really should have re-familiarized
myself with the BZ first :-)

We need to prove that the reference to chrono_type and the BIT_FIELD_REF are
actually the same. That way the result of the first conditional can be used
to prove the second conditional is always false, making the out of bounds
reference unreachable.

It probably needs to be done in FRE since that's our best bet to expose the
relationship between the conditionals prior to VRP1.
I think the easiest fix is to move optimize_bit_field_compare from where it
is currently done and do it only in some late gimple pass, around
pass_fold_builtins/store merging or so? Dunno which exact one though.
(In reply to Jakub Jelinek from comment #14)
> I think the easiest fix is to move optimize_bit_field_compare from where it
> is currently done and do it only in some late gimple pass, around
> pass_fold_builtins/store merging or so? Dunno which exact one though.

Delaying optimize_bit_field_compare would definitely do the trick. For that
matter, disabling it, at least in this particular case, causes FRE to
sensibly realize the nonsense in:

  <bb 2> :
  _1 = tp_10(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _3 = tp_10(D)->chrono_type;
  if (_3 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

...and clean things up. By VRP we no longer have a problematic store, and the
warning is gone.

However, wouldn't optimizing bit field compares so late inhibit other
optimizations that come before store merging? Though come to think of it,
optimizing bit fields so early (in the FE currently) seems mighty early. I
don't know whether Richard was frowning upon optimize_bit_field_compare
twiddling in comment 7.

Also, doing it in gimple, instead of the FE, means we have to recreate the
bit field comparison that's been gimplified. We no longer have a comparison
but:

  _3 = tp_10(D)->chrono_type;
  if (_3 != 0)

or worse:

  _1 = tp_10(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)

I'm certainly willing to tackle this, but I want to make sure I don't throw
cycles away. Thoughts? Richard, Jeff, Jakub?
Well, my understanding of how other compilers have handled bitfields would
indicate that deferring optimization of them until later is advisable.
Essentially they pretended bitfields had word precision through most of the
early optimization passes, then lowered things to their actual size later.

I'm not entirely sure how to interpret Richi's c#7. He could be saying that
optimize_bit_field_compare is premature optimization and, if so, deferring it
until a suitable point in gimple would be a good thing. Or he could be saying
that he doesn't think that's the real issue, just a red herring. Though your
experiments tend to indicate that it is a real issue here.

ISTM that either we defer bitfield optimization (which may have fallout) or
we try to teach FRE to recognize that the COMPONENT_REF and the BIT_FIELD_REF
are the same. I'd say experiment with the former and see how bad the fallout
looks.

I wonder if this BZ argues that we ought to have a canonical form when a
COMPONENT_REF and a BIT_FIELD_REF express the same thing -- and if so, which
ought to be the canonical form?
Sure, for working on GIMPLE the code needs to be adjusted. On the other hand,
the advantage is that it will then be able to handle even cases that it
couldn't before. Right now it can handle:

struct S { int a:5; int b:3; int c:2; };
void bar (void);

void
foo (struct S s)
{
  if (s.a == 7 && s.b == 6 && s.c == 1)
    bar ();
}

but it could also handle:

void
baz (struct S s)
{
  int a = s.a;
  int b = s.b;
  int c = s.c;
  if (a == 7 && b == 6 && c == 1)
    bar ();
}

As to where exactly to implement it, as part of which other pass or as a new
pass, I'd like to defer to richi.
Note, it isn't just optimize_bit_field_compare, but also fold_truth_andor_1
that creates this stuff. Doing this at the gimple level might fit best in the
reassoc pass, because you need to look through conditions in multiple basic
blocks with no significant code in between, but also conditions bitwise ored
or anded within a single bb, i.e. the result of how multiple &&s or ||s are
lowered. The reassoc pass is the only one with this kind of framework, in
optimize_range_tests. It would probably need to be done only in the second
reassoc pass; while that is only shortly before vrp2, it is still before it,
and as an advantage there would be some time for gimple passes to clean stuff
up afterwards if needed.
Testcase where in f1/f3/f5 this optimization is done early (already in the
*.original dump), but in f2/f4/f6 it is not.

struct S { unsigned a:5; unsigned b:3; unsigned c:2; };
void bar (void);

void
f1 (struct S s)
{
  if (s.a == 7 && s.b == 6 && s.c == 1)
    bar ();
}

void
f2 (struct S s)
{
  unsigned a = s.a;
  unsigned b = s.b;
  unsigned c = s.c;
  if (a == 7 && b == 6 && c == 1)
    bar ();
}

void
f3 (struct S s, struct S t)
{
  if (s.a == t.a && s.b == t.b && s.c == t.c)
    bar ();
}

void
f4 (struct S s, struct S t)
{
  unsigned a = s.a;
  unsigned b = s.b;
  unsigned c = s.c;
  unsigned d = t.a;
  unsigned e = t.b;
  unsigned f = t.c;
  if (a == d && b == e && c == f)
    bar ();
}
Created attachment 43233 [details]
WIP that fixes PR, but causes other regressions

I am attaching a proof of concept hack that fixes this PR by moving the
optimize_bit_field_compare functionality from the front-end to the gimple
reassoc pass (picked at random; we could do it anywhere else). It recognizes
the following pattern:

  _1 = some.bit_field;
  if (_1 == 99)

...and converts it to a BIT_FIELD_REF + comparison.

Although it works for the PR, it creates various regressions, because it
seems fold_truth_andor can create optimized bit field comparisons as well,
which would be challenging to deconstruct later in gimple. For example,
imagine this:

enum output_type
{
  type_pde,
  type_relocatable,
  type_pie,
  type_dll,
};

struct bfd_link_info
{
  enum output_type type : 2;
};

void test_pic (struct bfd_link_info *info)
{
  if ((((info)->type == type_dll) || ((info)->type == type_pie)))
    blah();
}

Thanks to fold_truth_andor (fold_range_test specifically), if we were to
delay optimize_bit_field_compare(), we would generate the following coming
out of the front-end:

  if (info->type + 2 <= 1)
    blah();

That is, not only have we merged the set of bit field comparisons, we have
also introduced an unsigned overflow to make the comparison simpler. So now
our gimple optimize-bit-field-compare function would also have to deal with
this:

  _1 = info_7(D)->type;
  _2 = _1 + 2;
  if (_2 <= 1)

and perhaps convert it to:

  _5 = BIT_FIELD_REF <*info, 8, 0>
  _6 = (_5 + 2) & 3
  if (_6 <= 1)

My worry is that we can handle optimizations of the following quite easily
(attached patch):

  _1 = some.bit_field;
  if (_1 == 99)

but with things like the following, it starts getting harder and harder:

  _1 = info_7(D)->type;
  _2 = _1 + 2;
  if (_2 <= 1)

And I'm pretty sure fold_truth_andor() has a lot more tricks up its sleeve
that we'll have to mop up in gimple.

All in all, I doubt we can sensibly move optimize_bit_field_compare() to the
middle end without having to bend over backwards to reconstruct things or
port some of fold_truth_andor() to the middle end as well. I doubt this is
stage4 material.

I can certainly teach my WIP to handle the _2 = _1 + 2 case above without
porting fold_truth_andor*() to the middle end, but I'm afraid this is just
one of the many scenarios we'll have to deal with. I could be wrong, and
could certainly take a stab at it.

Oh, and I'm completely ignoring the reassociation suggestions Jakub makes in
comment 19. That would make even more work for stage4 :).

Thoughts?
GCC 7.3 is being released, adjusting target milestone.
FWIW, the C++ front-end would also need changes if we move
optimize_bit_field_compare to gimple, as the constexpr code
(cxx_eval_bit_field_ref()) can only handle optimized BIT_FIELD_REFs. For
instance, it cannot look into an array of bit-fields as it's expanding a
constructor, because it was expecting optimize_bit_field_compare() to have
flattened things out first. It would die on this:

(gdb) print debug_generic_stmt (whole)
{.u=0, .a={{.i=5, .k=0, .l=2}}, .b={{.j=6}}}
What would create those BIT_FIELD_REFs so early though? They should stay as COMPONENT_REFs.
(In reply to Jakub Jelinek from comment #23)
> What would create those BIT_FIELD_REFs so early though? They should stay as
> COMPONENT_REFs.

I thought you'd never ask... why, our friend fold_truth_andor_1 :).

  /* Make a mask that corresponds to both fields being compared.
     Do this for both items being compared.  If the operands are the
     same size and the bits being compared are in the same position
     then we can do this by masking both and comparing the masked
     results.  */
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
So I just prototyped a bit of code that might help with this BZ. This seems
better suited for match.pd, except that match.pd doesn't seem to want to
handle BIT_FIELD_REF nodes. So I did the prototype in forwprop.

Given a BIT_AND_EXPR which just masks off high bits and is fed by a
BIT_FIELD_REF, we can adjust the # of bits in the BIT_FIELD_REF and the type
of the reference. The BIT_AND turns into a nop-conversion.

So the key blocks start like this:

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 3, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _13 = __builtin_object_size (tp_11(D), 0);
  _14 = &tp_11(D)->D.2292;
  .UBSAN_OBJECT_SIZE (_14, 13, _13, 0);
  _1 = tp_11(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       3 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       2 (TRUE_VALUE,EXECUTABLE)
  _3 = BIT_FIELD_REF <*tp_11(D), 8, 96>;
  _4 = _3 & 3;
  if (_4 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       4 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

And the bits in block #3 turn into:

;;   basic block 3, loop depth 0, maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       2 (TRUE_VALUE,EXECUTABLE)
  _9 = BIT_FIELD_REF <*tp_11(D), 2, 96>;
  _4 = (unsigned char) _9;
  if (_9 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       4 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

That's good enough that FRE can see the redundancy between tp->chrono_type
and the BIT_FIELD_REF, and all the right things just happen from that point
onward.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.