Created attachment 41856 [details]
reduced version of linux/net/ipv4/tcp_output.c

Compiling the Linux kernel with gcc-7.1.1 and ubsan, I get this warning:

net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
                                        ^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

I have manually reduced the file to the attached version (this can be reduced
further; I decided to leave a little more context for clarity).

The warning is about an array dereference after a range check:

  if (tp->chrono_type > TCP_CHRONO_UNSPEC)
          tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;

so it clearly cannot be below the bounds.

In the original version, this happens specifically when at least one of
-fsanitize=object-size, -fsanitize=alignment, or -fsanitize=null is set in
addition to "-O2 -Wall", but not when all three are disabled. In the reduced
version, I can also reproduce it with "-Os -Wall" (without ubsan).

I also see the problem with gcc-7.0.1 on all architectures I tried (arm,
arm64 and x86), but not with gcc-6.3.1.
Confirmed, started with r239421, which is the commit also mentioned in PR80202.
The -Warray-bounds warning has false positives, so you can run into them, and
in that case using -Werror isn't really a good idea.

The problem is that the same bitfield, tp->chrono_type, is accessed in 2
different ways in the IL before vrp1:

  _5 = tp_2(D)->chrono_type;
  if (_5 == 0)

in one spot, and

  _12 = BIT_FIELD_REF <*tp_2(D), 8, 128>;
  _13 = _12 & 3;
  if (_13 != 0)

in another. The optimizers don't treat those two as the same thing, so we
have _5 used in the first condition, _13 in the second one, and then _5
again. From the first condition it determines that _5 must be zero in that
branch (that is the if (type > tp->chrono_type) condition, where you do
nothing if tp->chrono_type is not 0); it doesn't know that this also implies
_13 is 0, which would let it optimize the second branch away as dead code.
Finally it sees an array index of [_5 - 1], and when _5 is known to be 0,
that means [-1], which is invalid. It is invalid, but in dead code.
The reason this doesn't trigger without -fsanitize= is that without the extra
UBSAN_* internal function calls we manage to find out earlier that this is
dead code and optimize it away completely (during cddce1).
Thanks for the analysis. I have now submitted a local workaround for the kernel, adding a temporary variable to avoid accessing the bitfield twice, see https://patchwork.kernel.org/patch/9869037/
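For reference, the general shape of such a workaround is roughly the
following self-contained sketch (not the actual kernel patch; the function
name chrono_update is made up here, and the field layout just mirrors the
snippet in the report -- see the patchwork link above for the real change):

struct tcp_sock {
        unsigned int chrono_stat[3];
        unsigned int chrono_start;
        unsigned char chrono_type:2;
};

void chrono_update(struct tcp_sock *tp, unsigned int now)
{
        /* Read the 2-bit field once and use the local for both the
           range check and the array index, so the two uses cannot be
           lowered differently before VRP.  */
        unsigned int old = tp->chrono_type;

        if (old > 0)
                tp->chrono_stat[old - 1] += now - tp->chrono_start;
}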
So what's the right way to fix this? To move optimize_bit_field_compare() from fold_binary to match.pd so that the conditions on
(In reply to Patrick Palka from comment #5)
> So what's the right way to fix this? To move optimize_bit_field_compare()
> from fold_binary to match.pd so that the conditions on

... so that conditions on tp->chrono_type get consistently transformed into
BIT_FIELD_REFs, or to remove optimize_bit_field_compare() altogether? It
seems like a rather low-level optimization to be done in GENERIC/GIMPLE.
optimize_bit_field_compare() is a premature red herring...
The two key blocks are:

bb2:
  _3 = __builtin_object_size (tp_2(D), 0);
  _4 = &tp_2(D)->D.2254;
  GIMPLE_NOP
  _5 = tp_2(D)->chrono_type;
  if (_5 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

bb3:
  now_6 = tcp_jiffies32;
  _7 = BIT_FIELD_REF <*tp_2(D), 8, 128>;
  _8 = _7 & 3;
  if (_8 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

The out of bounds access occurs in BB4, which can only be reached via BB3.
We essentially need to prove that _5 and _8 are equivalent.

The only good news is that the edge 2->3 dominates bb3, so this could (in
theory) be handled with good equivalence processing without jump threading.

Are we allowed to use types like this in a gimple conditional?

  <unnamed-unsigned:2> _5;

If so, then one approach would be to first focus on BB3. We'd want to combine
the BIT_FIELD_REF and masking into a single BIT_FIELD_REF and test the result
of that without conversion. Could forwprop handle that perhaps?

Once the BIT_FIELD_REF just reads two bits, we'd have a fighting chance of
realizing that the BIT_FIELD_REF is just a reference to tp_2->chrono_type,
which we could look up in the hash table as _5, which has a known constant
value of zero.

Not working on this, but figured I'd at least chime in with some thoughts on
how we might be able to approach it...
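For illustration only, the combined form described above might look roughly
like this (a sketch; the 2-bit BIT_FIELD_REF reuses the offset from the dump
above, and this is essentially what the later forwprop prototype in this
thread produces):

  _9 = BIT_FIELD_REF <*tp_2(D), 2, 128>;
  if (_9 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

Once the masking is folded into a 2-bit read like this, FRE has a chance to
see that it is the same value as tp_2(D)->chrono_type loaded in bb2.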
I'll take a look.
Further reduced testcase. Can be reproduced with -O2 -Wall.

struct tcp_sock {
  int chrono_stat[3];
  unsigned char chrono_type:2;
};

void tcp_chrono_set(struct tcp_sock *tp)
{
  int type = 1;

  if (type > tp->chrono_type) {
    if (tp->chrono_type > 0)
      tp->chrono_stat[tp->chrono_type - 1] = 1234;
  }
}
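For anyone trying to reproduce this: assuming the snippet is saved as
bug81601.c (the filename is arbitrary), a command line along the lines of
"gcc-7 -O2 -Wall -c bug81601.c" should show the -Warray-bounds warning with
an affected compiler, per the flags and versions quoted above.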
What do the dumps look like? In particular evrp & vrp1?
The warning occurs in vrp1, not evrp, but for the record...

evrp dump:

tcp_chrono_set (struct tcp_sock * tp)
{
  int type;
  <unnamed-unsigned:2> _1;
  int _2;
  unsigned char _3;
  unsigned char _4;

  <bb 2> :
  _1 = tp_11(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _3 = BIT_FIELD_REF <*tp_11(D), 8, 96>;
  _4 = _3 & 3;
  if (_4 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 4> :
  tp_11(D)->chrono_stat[-1] = 1234;

  <bb 5> :
  return;
}

vrp1 dump with ASSERT_EXPRs:

  <bb 2> [local count: 1073741825]:
  _1 = tp_6(D)->chrono_type;
  tp_8 = ASSERT_EXPR <tp_6(D), tp_6(D) != 0B>;
  if (_1 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 3> [local count: 536870912]:
  _2 = BIT_FIELD_REF <*tp_8, 8, 96>;
  _3 = _2 & 3;
  if (_3 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 268435456]:
  tp_8->chrono_stat[-1] = 1234;

vrp1 dump after ASSERT_EXPRs have been removed:

  <bb 2> [local count: 1073741825]:
  _1 = tp_6(D)->chrono_type;
  if (_1 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 3> [local count: 536870912]:
  _2 = BIT_FIELD_REF <*tp_6(D), 8, 96>;
  _3 = _2 & 3;
  if (_3 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 268435456]:
  tp_6(D)->chrono_stat[-1] = 1234;
I realize the warnings are happening in VRP1. EVRP and VRP1 use different
styles of analysis and it can be the case that one is better suited for
cleaning things up than the other. But I really should have re-familiarized
myself with the BZ first :-)

We need to prove that the reference to chrono_type and the BIT_FIELD_REF are
actually the same. That way the result of the first conditional can be used
to prove the second conditional is always false, making the out of bounds
reference unreachable.

It probably needs to be done in FRE since that's our best bet to expose the
relationship between the conditionals prior to VRP1.
I think the easiest fix is to move optimize_bit_field_compare from where it
is currently done and do it only in some late gimple pass, around
pass_fold_builtins/store merging or so? Dunno which exact one though.
(In reply to Jakub Jelinek from comment #14)
> I think the easiest fix is to move optimize_bit_field_compare from where it
> is currently done and do it only in some late gimple pass, around
> pass_fold_builtins/store merging or so? Dunno which exact one though.

Delaying optimize_bit_field_compare would definitely do the trick. For that
matter, disabling it, at least in this particular case, causes FRE to
sensibly realize the nonsense in:

  <bb 2> :
  _1 = tp_10(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _3 = tp_10(D)->chrono_type;
  if (_3 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

...and clean things up. By VRP we no longer have a problematic store, and the
warning is gone.

However, wouldn't optimizing bit field compares so late inhibit other
optimizations that come before store merging? Though come to think of it,
optimizing bit fields so early (in the FE currently) seems mighty early. I
don't know whether Richard was frowning upon optimize_bit_field_compare
twiddling in comment 7.

Also, doing it in gimple, instead of the FE, means we have to recreate the
bit field comparison that's been gimplified. We no longer have a comparison
but:

  _3 = tp_10(D)->chrono_type;
  if (_3 != 0)

or worse:

  _1 = tp_10(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)

I'm certainly willing to tackle this, but I want to make sure I don't throw
cycles away. Thoughts? Richard, Jeff, Jakub?
Well, my understanding of how other compilers have handled bitfields would
indicate that deferring optimization of them until later is advisable.
Essentially they pretended bitfields had word precision through most of the
early optimization passes, then lowered things to their actual size later.

I'm not entirely sure how to interpret Richi's c#7. He could be saying that
optimize_bit_field_compare is premature optimization and, if so, deferring it
until a suitable point in gimple would be a good thing. Or he could be saying
that he doesn't think that's the real issue, just a red herring. Though your
experiments tend to indicate that it is a real issue here.

ISTM that either we defer bitfield optimization (which may have fallout) or
we try to teach FRE to recognize that the COMPONENT_REF and the BIT_FIELD_REF
are the same. I'd say experiment with the former and see how bad the fallout
looks.

I wonder if this BZ argues that we ought to have a canonical form when a
COMPONENT_REF and a BIT_FIELD_REF express the same thing -- and if so, which
ought to be the canonical form?
Sure, for working on GIMPLE the code needs to be adjusted. On the other hand,
the advantage is that it will then be able to handle even cases that it
couldn't before. Right now it can handle:

struct S { int a:5; int b:3; int c:2; };
void bar (void);

void
foo (struct S s)
{
  if (s.a == 7 && s.b == 6 && s.c == 1)
    bar ();
}

but it could also handle:

void
baz (struct S s)
{
  int a = s.a;
  int b = s.b;
  int c = s.c;
  if (a == 7 && b == 6 && c == 1)
    bar ();
}

As to where exactly to implement it, as part of which other pass or as a new
pass, I'd like to defer to richi.
Note, it isn't just optimize_bit_field_compare, but also fold_truth_andor_1
that creates this stuff. Doing this at the gimple level might fit best in the
reassoc pass, because you need to look through conditions in multiple basic
blocks with no significant code in between, but also conditions bitwise ored
or anded within a single bb, i.e. the result of how multiple &&s or ||s are
lowered. The reassoc pass is the only one with this kind of framework, in
optimize_range_tests. It would probably need to be done only in the second
reassoc pass; while that is only shortly before vrp2, it is still before it,
and as an advantage there would be some time for gimple passes to clean stuff
up afterwards if needed.
Testcase where in f1/f3/f5 this optimization is done early (already in the
*.original dump), but in f2/f4/f6 it is not.

struct S { unsigned a:5; unsigned b:3; unsigned c:2; };
void bar (void);

void
f1 (struct S s)
{
  if (s.a == 7 && s.b == 6 && s.c == 1)
    bar ();
}

void
f2 (struct S s)
{
  unsigned a = s.a;
  unsigned b = s.b;
  unsigned c = s.c;
  if (a == 7 && b == 6 && c == 1)
    bar ();
}

void
f3 (struct S s, struct S t)
{
  if (s.a == t.a && s.b == t.b && s.c == t.c)
    bar ();
}

void
f4 (struct S s, struct S t)
{
  unsigned a = s.a;
  unsigned b = s.b;
  unsigned c = s.c;
  unsigned d = t.a;
  unsigned e = t.b;
  unsigned f = t.c;
  if (a == d && b == e && c == f)
    bar ();
}
Created attachment 43233 [details]
WIP that fixes PR, but causes other regressions

I am attaching a proof of concept hack that fixes this PR by moving the
optimize_bit_field_compare functionality from the front-end to the gimple
reassoc pass (picked at random; we could do it anywhere else). It recognizes
the following pattern:

  _1 = some.bit_field;
  if (_1 == 99)

...and converts it to a BIT_FIELD_REF + comparison.

Although it works for the PR, it creates various regressions, because it
seems fold_truth_andor can create optimized bit field comparisons as well,
which would be challenging to deconstruct later in gimple. For example,
imagine this:

enum output_type
{
  type_pde,
  type_relocatable,
  type_pie,
  type_dll,
};

struct bfd_link_info
{
  enum output_type type : 2;
};

void test_pic (struct bfd_link_info *info)
{
  if ((((info)->type == type_dll) || ((info)->type == type_pie)))
    blah();
}

Thanks to fold_truth_andor (fold_range_test specifically), if we were to
delay optimize_bit_field_compare(), we would generate the following coming
out of the front-end:

  if (info->type + 2 <= 1)
    blah();

That is, not only have we merged the set of bit field comparisons, we have
also introduced an unsigned overflow to make the comparison simpler. So now
our gimple optimize-bit-field-compare function would also have to deal with
this:

  _1 = info_7(D)->type;
  _2 = _1 + 2;
  if (_2 <= 1)

and perhaps convert it to:

  _5 = BIT_FIELD_REF <*info, 8, 0>
  _6 = (_5 + 2) & 3
  if (_6 <= 1)

My worry is that we can handle optimizations of the following quite easily
(attached patch):

  _1 = some.bit_field;
  if (_1 == 99)

but with things like the following, it starts getting harder and harder:

  _1 = info_7(D)->type;
  _2 = _1 + 2;
  if (_2 <= 1)

And I'm pretty sure fold_truth_andor() has a lot more tricks up its sleeve
that we'll have to mop up in gimple.

All in all, I doubt we can sensibly move optimize_bit_field_compare() to the
middle end without having to bend over backwards to reconstruct things or
port some of fold_truth_andor() to the middle end as well. I doubt this is
stage4 material.

I can certainly teach my WIP to handle the _2 = _1 + 2 case above without
porting fold_truth_andor*() to the middle end, but I'm afraid this is just
one of the many scenarios we'll have to deal with. I could be wrong, and
could certainly take a stab at it.

Oh, and I'm completely ignoring the reassociation suggestions Jakub makes in
comment 19. That would make even more work for stage4 :).

Thoughts?
GCC 7.3 is being released, adjusting target milestone.
FWIW, the C++ front-end would also need changes if we move
optimize_bit_field_compare to gimple, as the constexpr code
(cxx_eval_bit_field_ref()) can only handle optimized BIT_FIELD_REFs. For
instance, it cannot look into an array of bit-fields as it's expanding a
constructor, because it was expecting optimize_bit_field_compare() to have
flattened things out first. It would die on this:

(gdb) print debug_generic_stmt (whole)
{.u=0, .a={{.i=5, .k=0, .l=2}}, .b={{.j=6}}}
What would create those BIT_FIELD_REFs so early though? They should stay as COMPONENT_REFs.
(In reply to Jakub Jelinek from comment #23)
> What would create those BIT_FIELD_REFs so early though? They should stay as
> COMPONENT_REFs.

I thought you'd never ask... why, our friend fold_truth_andor_1 :).

  /* Make a mask that corresponds to both fields being compared.
     Do this for both items being compared.  If the operands are the
     same size and the bits being compared are in the same position
     then we can do this by masking both and comparing the masked
     results.  */
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
So I just prototyped a bit of code that might help with this BZ. This seems
better suited for match.pd, except that match.pd doesn't seem to want to
handle BIT_FIELD_REF nodes. So I did the prototype in forwprop.

Given a BIT_AND_EXPR which just masks off high bits and is fed by a
BIT_FIELD_REF, we can adjust the # of bits in the BIT_FIELD_REF and the type
of the reference. The BIT_AND turns into a nop-conversion.

So the key blocks start like this:

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 3, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _13 = __builtin_object_size (tp_11(D), 0);
  _14 = &tp_11(D)->D.2292;
  .UBSAN_OBJECT_SIZE (_14, 13, _13, 0);
  _1 = tp_11(D)->chrono_type;
  _2 = (int) _1;
  if (_1 == 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       3 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       2 (TRUE_VALUE,EXECUTABLE)
  _3 = BIT_FIELD_REF <*tp_11(D), 8, 96>;
  _4 = _3 & 3;
  if (_4 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       4 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

And the bits in block #3 turn into:

;;   basic block 3, loop depth 0, maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       2 (TRUE_VALUE,EXECUTABLE)
  _9 = BIT_FIELD_REF <*tp_11(D), 2, 96>;
  _4 = (unsigned char) _9;
  if (_9 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]
;;    succ:       4 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

That's good enough that FRE can see the redundancy between tp->chrono_type
and the BIT_FIELD_REF, and all the right things just happen from that point
onward.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.