cat test.c static long a; static unsigned b; void foo(void); int main() { long c, e; c = b = a; e = c ? 2 / (c + 1) : 0; if (e && !b) foo(); a = 0; } 11.2.0 at -O3 can eliminate the call to foo but trunk at -O3 cannot: gcc-11 -v Target: x86_64-pc-linux-gnu Configured with: ../configure --disable-multilib --disable-bootstrap --enable-languages=c,c++ Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.2.0 (GCC) gcc-11 test.c -S -O3 cat test.s .file "test.c" .text .section .text.startup,"ax",@progbits .p2align 4 .globl main .type main, @function main: .LFB0: .cfi_startproc movq $0, a(%rip) xorl %eax, %eax ret .cfi_endproc .LFE0: .size main, .-main .local a .comm a,8,8 .ident "GCC: (GNU) 11.2.0" .section .note.GNU-stack,"",@progbits gcc-trunk -v Target: x86_64-pc-linux-gnu Configured with: ../configure --disable-multilib --disable-bootstrap --enable-languages=c,c++ Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.0.0 20210930 (experimental) (GCC) gcc-trunk test.c -S -O3 cat test.s .file "test.c" .text .section .text.startup,"ax",@progbits .p2align 4 .globl main .type main, @function main: .LFB0: .cfi_startproc movq a(%rip), %rcx movq %rcx, %rsi andl $4294967295, %esi je .L13 movl $2, %eax addq $1, %rsi cqto idivq %rsi testb $1, %al je .L13 testl %ecx, %ecx je .L17 .L13: movq $0, a(%rip) xorl %eax, %eax ret .L17: pushq %rax .cfi_def_cfa_offset 16 call foo xorl %edx, %edx xorl %eax, %eax movq %rdx, a(%rip) popq %rcx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE0: .size main, .-main .local a .comm a,8,8 .ident "GCC: (GNU) 12.0.0 20210930 (experimental)" .section .note.GNU-stack,"",@progbits
Started with r12-476-gd846f225c25c5885.
FRE1 has the following difference, simplifying the (unsigned int) truncation. <bb 2> : a.0_1 = a; _2 = (unsigned int) a.0_1; b = _2; - c_10 = (long int) _2; + _6 = a.0_1 & 4294967295; + c_10 = _6; if (c_10 != 0) goto <bb 3>; [INV] else where the EVRP which now uses ranger retains (diff from GCC 11 to trunk): <bb 2> : a.0_1 = a; _2 = (unsigned int) a.0_1; b = _2; - c_10 = (long int) _2; + _6 = a.0_1 & 4294967295; + c_10 = _6; if (c_10 != 0) goto <bb 3>; [INV] else - goto <bb 4>; [INV] + goto <bb 6>; [INV] <bb 3> : _4 = c_10 + 1; iftmp.2_12 = 2 / _4; + if (iftmp.2_12 != 0) + goto <bb 4>; [INV] + else + goto <bb 6>; [INV] <bb 4> : + if (_2 == 0) + goto <bb 5>; [INV] + else + goto <bb 6>; [INV] + + <bb 5> : + foo (); + + <bb 6> : a = 0; return 0; after EVRP we have + # RANGE [0, 4294967295] NONZERO 4294967295 c_10 = _6; ... + # RANGE [2, 4294967296] NONZERO 8589934591 _4 = c_10 + 1; + # RANGE [0, 1] NONZERO 1 iftmp.2_12 = 2 / _4; if (iftmp.2_12 != 0) what we did in GCC 11 is simplified the following check <bb 4> : if (_2 == 0) goto <bb 5>; [INV] else goto <bb 6>; [INV] based on iftmp.2_12 == [1, 1] via ranger and evrp visiting BB4 Visiting controlling predicate if (iftmp.2_12 != 0) Adding assert for iftmp.2_12 from iftmp.2_12 != 0 Intersecting long int ~[0, 0] EQUIVALENCES: { iftmp.2_12 } (1 elements) and long int [0, 1] to long int [1, 1] EQUIVALENCES: { iftmp.2_12 } (1 elements) Intersecting long int [0, 1] and long int [1, 1] to long int [1, 1] pushing new range for iftmp.2_12: long int [1, 1] EQUIVALENCES: { iftmp.2_12 } (1 elements) evrp visiting stmt if (_2 == 0) Folding statement: if (_2 == 0) Visiting conditional with predicate: if (_2 == 0) With known ranges _2: unsigned int [1, +INF] EQUIVALENCES: { _2 } (1 elements) Predicate evaluates to: 0 Folded into: if (0 != 0) that's now missing, somehow due to the folded IL if the bisect is correct.
To elide the foo(), _2 must be non-zero on the 2->3 edge dominating the call. Interestingly, a.0_1 is non-zero on the 2->3 edge, and we have: _2 = (unsigned int) a.0_1 but somehow we have no knowledge of _2. Andrew, wanna take a stab at this? =========== BB 2 ============ Imports: a.0_1 Exports: a.0_1 _6 c_10 _2 : a.0_1(I) _6 : a.0_1(I) c_10 : a.0_1(I) _6 Equivalence set : [] Equivalence set : [_6, c_10] <bb 2> : a.0_1 = a; _2 = (unsigned int) a.0_1; b = _2; _6 = a.0_1 & 4294967295; c_10 = _6; if (c_10 != 0) goto <bb 3>; [INV] else goto <bb 6>; [INV] _6 : long int [0, 4294967295] c_10 : long int [0, 4294967295] 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] 2->3 (T) _6 : long int [1, 4294967295] 2->3 (T) c_10 : long int [1, 4294967295] 2->6 (F) a.0_1 : long int [-INF, -4294967296][0, +INF] 2->6 (F) _6 : long int [0, 0] 2->6 (F) c_10 : long int [0, 0]
(In reply to Richard Biener from comment #2) > FRE1 has the following difference, simplifying the (unsigned int) truncation. > > <bb 2> : > a.0_1 = a; > _2 = (unsigned int) a.0_1; > b = _2; > - c_10 = (long int) _2; > + _6 = a.0_1 & 4294967295; > + c_10 = _6; > if (c_10 != 0) > goto <bb 3>; [INV] > else > Why does FRE make this transformation/simplification? It removes a relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0 is because the sequence is now: a.0_1 = a; _2 = (unsigned int) a.0_1; b = _2; _6 = a.0_1 & 4294967295; c_10 = _6; if (c_10 != 0) goto <bb 3>; [INV] We do not find _2 is non-zero on the outgoing edge because _2 is not related to the calculation in the condition. (ie c_10 no longer has a dependency on _2) We do recalculate _2 based on the outgoing range of a.0_1, but with it being a 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is non-zero.. we dont track any of the upper bits... 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] And when we recalculate _2 using that value, we still get varying because 0xFFFF0000 in not zero, but can still produce a zero in _2. The problem is that the condition c_10 != 0 no longer related to the value of _2 in the IL... so ranger never sees it. and we cant represent the 2^16 subranges that end in [1,0xFFFF]. Before that transformation, _2 = (unsigned int) a.0_1; b = _2; c_10 = (long int) _2; The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no problem.
On Fri, 1 Oct 2021, amacleod at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540 > > --- Comment #4 from Andrew Macleod <amacleod at redhat dot com> --- > > > (In reply to Richard Biener from comment #2) > > FRE1 has the following difference, simplifying the (unsigned int) truncation. > > > > <bb 2> : > > a.0_1 = a; > > _2 = (unsigned int) a.0_1; > > b = _2; > > - c_10 = (long int) _2; > > + _6 = a.0_1 & 4294967295; > > + c_10 = _6; > > if (c_10 != 0) > > goto <bb 3>; [INV] > > else > > > > Why does FRE make this transformation/simplification? It's a match.pd transform that transforms a zero-extend from a smaller precision via two NOP_EXPRs to a single BIT_AND_EXPR which is better and more canonical on GIMPLE. > It removes a > relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0 > is because the sequence is now: > > a.0_1 = a; > _2 = (unsigned int) a.0_1; > b = _2; > _6 = a.0_1 & 4294967295; > c_10 = _6; > if (c_10 != 0) > goto <bb 3>; [INV] > > We do not find _2 is non-zero on the outgoing edge because _2 is not related to > the calculation in the condition. (ie c_10 no longer has a dependency on _2) > > We do recalculate _2 based on the outgoing range of a.0_1, but with it being a > 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is > non-zero.. we dont track any of the upper bits... > 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] > And when we recalculate _2 using that value, we still get varying because > 0xFFFF0000 in not zero, but can still produce a zero in _2. > > The problem is that the condition c_10 != 0 no longer related to the value of > _2 in the IL... so ranger never sees it. and we cant represent the 2^16 > subranges that end in [1,0xFFFF]. > > Before that transformation, > _2 = (unsigned int) a.0_1; > b = _2; > c_10 = (long int) _2; > The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no > problem. I see - too bad. Note the transform made the dependence chain of _6 one instruction shorter without increasing the number of instructions so it's a profitable transform. Btw, the relation is still there but only indirectly via a.0_1. The old (E)VRP had this find_asserts(?) that produced assertions based on the definitions - sth that now range-ops does(?), so it would eventually have built assertions for a.0_1 for both conditions and allow relations based on that? I can't seem to find my way around the VRP code now - pieces moved all over the place and so my mind fails me on the searching task :/
> > > It removes a > > relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0 > > is because the sequence is now: > > > > a.0_1 = a; > > _2 = (unsigned int) a.0_1; > > b = _2; > > _6 = a.0_1 & 4294967295; > > c_10 = _6; > > if (c_10 != 0) > > goto <bb 3>; [INV] > > > > We do not find _2 is non-zero on the outgoing edge because _2 is not related to > > the calculation in the condition. (ie c_10 no longer has a dependency on _2) > > > > We do recalculate _2 based on the outgoing range of a.0_1, but with it being a > > 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is > > non-zero.. we dont track any of the upper bits... > > 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] > > And when we recalculate _2 using that value, we still get varying because > > 0xFFFF0000 in not zero, but can still produce a zero in _2. > > > > The problem is that the condition c_10 != 0 no longer related to the value of > > _2 in the IL... so ranger never sees it. and we cant represent the 2^16 > > subranges that end in [1,0xFFFF]. > > > > Before that transformation, > > _2 = (unsigned int) a.0_1; > > b = _2; > > c_10 = (long int) _2; > > The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no > > problem. > > I see - too bad. Note the transform made the dependence chain of _6 > one instruction shorter without increasing the number of instructions > so it's a profitable transform. > > Btw, the relation is still there but only indirectly via a.0_1. The > old (E)VRP had this find_asserts(?) that produced assertions based > on the definitions - sth that now range-ops does(?), so it would > eventually have built assertions for a.0_1 for both conditions and > allow relations based on that? I can't seem to find my way around > the VRP code now - pieces moved all over the place and so my mind > fails me on the searching task :/ We do know that a.0_1 is non-zero on that edge: 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] the problem is that we can't currently represent that the bitmask operation causes all patterns ending in 0x00000000 to not occur.. we just leave it at ~[0,0]. which isn't sufficient for this use case. we don't currently track any equivalences between values of different precision.. (even though ranger once did). Handling it as a general equivalence was fraught with issues. We might be able to add a new equivalence class "slice" or something.. I had considered it but hadn't seen a great need case. This would make _6 a 32 bit slice of a.0_1 with range [1, 0xffffffff]. Then when we are querying for the cast _2 = (unsigned int) a.0_1; we could also query the 32 bit equivalence slices of a.0_1, find _6, and get the outgoing range of [1,0xffffffff].. and apply that value. It would probably resolve an entire class of things where we don't recognize an equivalence between a cast and a bitmask of equivalent precision. This would also mean the reverse would apply.. ie if we instead branched on _2 != 0 we would also understand that _6 will be non-zero.
On Mon, 4 Oct 2021, amacleod at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540 > > --- Comment #6 from Andrew Macleod <amacleod at redhat dot com> --- > > > > > > It removes a > > > relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0 > > > is because the sequence is now: > > > > > > a.0_1 = a; > > > _2 = (unsigned int) a.0_1; > > > b = _2; > > > _6 = a.0_1 & 4294967295; > > > c_10 = _6; > > > if (c_10 != 0) > > > goto <bb 3>; [INV] > > > > > > We do not find _2 is non-zero on the outgoing edge because _2 is not related to > > > the calculation in the condition. (ie c_10 no longer has a dependency on _2) > > > > > > We do recalculate _2 based on the outgoing range of a.0_1, but with it being a > > > 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is > > > non-zero.. we dont track any of the upper bits... > > > 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] > > > And when we recalculate _2 using that value, we still get varying because > > > 0xFFFF0000 in not zero, but can still produce a zero in _2. > > > > > > The problem is that the condition c_10 != 0 no longer related to the value of > > > _2 in the IL... so ranger never sees it. and we cant represent the 2^16 > > > subranges that end in [1,0xFFFF]. > > > > > > Before that transformation, > > > _2 = (unsigned int) a.0_1; > > > b = _2; > > > c_10 = (long int) _2; > > > The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no > > > problem. > > > > I see - too bad. Note the transform made the dependence chain of _6 > > one instruction shorter without increasing the number of instructions > > so it's a profitable transform. > > > > Btw, the relation is still there but only indirectly via a.0_1. The > > old (E)VRP had this find_asserts(?) that produced assertions based > > on the definitions - sth that now range-ops does(?), so it would > > eventually have built assertions for a.0_1 for both conditions and > > allow relations based on that? I can't seem to find my way around > > the VRP code now - pieces moved all over the place and so my mind > > fails me on the searching task :/ > > We do know that a.0_1 is non-zero on that edge: > 2->3 (T) a.0_1 : long int [-INF, -1][1, +INF] > > the problem is that we can't currently represent that the bitmask operation > causes all patterns ending in 0x00000000 to not occur.. we just leave it at > ~[0,0]. which isn't sufficient for this use case. Hmm, but we do have nonzero bits on SSA where we also store global ranges, so there is a way to store the info and you could intersect the sliced range produced from it with the range you got? > we don't currently track any equivalences between values of different > precision.. (even though ranger once did). Handling it as a general > equivalence was fraught with issues. > > We might be able to add a new equivalence class "slice" or something.. I had > considered it but hadn't seen a great need case. This would make _6 a 32 bit > slice of a.0_1 with range [1, 0xffffffff]. > Then when we are querying for the cast > _2 = (unsigned int) a.0_1; > we could also query the 32 bit equivalence slices of a.0_1, find _6, and get > the outgoing range of [1,0xffffffff].. and apply that value. > > It would probably resolve an entire class of things where we don't recognize an > equivalence between a cast and a bitmask of equivalent precision. > > This would also mean the reverse would apply.. ie if we instead branched on _2 > != 0 we would also understand that _6 will be non-zero. I believe tracking known zero/one bits in addition to a range is more useful - would that help in this case?
> > > > It would probably resolve an entire class of things where we don't recognize an > > equivalence between a cast and a bitmask of equivalent precision. > > > > This would also mean the reverse would apply.. ie if we instead branched on _2 > > != 0 we would also understand that _6 will be non-zero. > > I believe tracking known zero/one bits in addition to a range is more > useful - would that help in this case? Thats in plan and what I first thought would fix it. Reflecting on this problem however, it would only help on the zero side where all the bits are known zero, but the non-zero property we are looking for can't be reflected by known ones or zeros.. Unfortunately I don't see how we can solve this particular problem by tracking known bit values.. there wont be any known 1s or 0s in a.0_1... just a particular bit pattern that cannot occur. Another option would be if I can get a cheap enough low-opt pass of evrp (also on my list) we could run it really early before things get rearranged and then run the higher levels later. Anyway, I'll keep thinking about this...
I think the GCC 12 IL would require tracking equivalences on parts of registers, in this case that _2 is equal to the low part of a.0_1. That is, one would need to extend what CCP does with bit propagation to include equivalences. There's also the if ((long)unsigned-var != 0) -> if (unsigned-var != 0) transform that could save us here but we have that guarded with single_use () and the converted value is used after the branching but not the source so we'd have increased register pressure. I do not see a good way to fix this issue, slightly altering the testcase to do the undesired transform on the source level static long a; static unsigned b; void foo(void); int main() { long c, e; b = a; c = (unsigned)a; e = c ? 2 / (c + 1) : 0; if (e && !b) foo(); a = 0; } causes the issue to appear also with GCC 11.
(In reply to Richard Biener from comment #9) > I think the GCC 12 IL would require tracking equivalences on parts of > registers, > in this case that _2 is equal to the low part of a.0_1. That is, one would > need to extend what CCP does with bit propagation to include equivalences. > I have in my workqueue some thoughts for GCC13 where we can track partial equivalences so we can see through different casts and masks. a.0_1 = a; _2 = (unsigned int) a.0_1; b = _2; _6 = a.0_1 & 4294967295; c_10 = _6; if (c_10 != 0) goto <bb 3>; [INV] _2 would be a 32 bit equivalence with a.0_1 generated by op_cast, and _6 would also be a 32 bit equivalence with a.0_1 generated op_bitwise_and. They would all end up in the same partial equivalence set. and when a range for anything in that set falls into the bitrange of the equivalence size, like the [0,0] , then it can be applied (with some restrictions), to other members of the set. Its an extension of some thought I had for trying to relates different size casts to each other, just extended to include masking. There is a series of other PRs which would be fixed by this sort of ability. it has become prevalent enough to warrant the work I think. (Also 79191 91881 102705 102872)
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
GCC 12.2 is being released, retargeting bugs to GCC 12.3.
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>: https://gcc.gnu.org/g:6cc3394507a2303a18891d34222c53f679256c37 commit r13-3281-g6cc3394507a2303a18891d34222c53f679256c37 Author: Andrew MacLeod <amacleod@redhat.com> Date: Wed Oct 5 10:42:07 2022 -0400 propagate partial equivs in the cache. Adjust on-entry cache propagation to look for and propagate both full and partial equivalences. gcc/ PR tree-optimization/102540 PR tree-optimization/102872 * gimple-range-cache.cc (ranger_cache::fill_block_cache): Handle partial equivs. (ranger_cache::range_from_dom): Cleanup dump output. gcc/testsuite/ * gcc.dg/pr102540.c: New. * gcc.dg/pr102872.c: New.
fixed.