Created attachment 36661 [details] experimental-patch r226850 will cause gcc generate more ASSERT_EXPR thus trigger some latent issues in gcc tree-vrp pass. For example, we are generating sub-optimal code for gcc.c-torture/compile/20121027-1.c since r226850. For ARM target: ./cc1 -O3 -nostdinc 20121027-1.c -march=armv8-a -mthumb -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -nostdinc before r226850: "bl/64" turned into "bl >> 6" == .L3: asrs r3, r4, #6 add r2, sp, #1024 adds r4, r4, #1 add r3, r2, r3, lsl #3 subw r3, r3, #1019 vld1.64 {d16}, [r3] vmov r0, r1, d16 @ int bl ff ldr r3, [r5] cmp r3, r4 bgt .L3 after ("bl/64" is not turned into "bl >> 6") === .L3: add r3, r4, #63 add r2, sp, #1024 ands r3, r3, r4, asr #32 it cc movcc r3, r4 adds r4, r4, #1 asrs r3, r3, #6 add r3, r2, r3, lsl #3 subw r3, r3, #1019 vld1.64 {d16}, [r3] vmov r0, r1, d16 @ int bl ff ldr r3, [r5] cmp r3, r4 bgt .L3 For mips target: ./cc1 -O2 -march=mips32r2 -mabi=32 20121027-1.c -nostdinc before === $L8: addiu $16,$16,1 sll $2,$2,3 addu $2,$17,$2 lwl $5,9($2) lwl $4,5($2) lwr $5,12($2) lwr $4,8($2) jal ff lw $2,0($18) slt $2,$16,$2 bne $2,$0,$L8 sra $2,$16,6 after === $L9: slt $2,$16,0 movz $3,$16,$2 addiu $16,$16,1 sra $2,$3,6 sll $2,$2,3 addu $2,$17,$2 lwl $5,9($2) lwl $4,5($2) lwr $5,12($2) lwr $4,8($2) jal ff lw $2,0($18) slt $2,$16,$2 bne $2,$0,$L9 addiu $3,$16,63 This is because previously gcc can deduct the value range for the variable "bl", and conclude it will be positive, then later optimization can turn the signed division to right shift thus we can avoid runtime overhead for signed division. After r226850, gcc can't deduct this. The initial cause if r226850 will introduce more ASSERT_EXPR which is fine but it caused problem for tree-vrp. From .vrp1 dump, tree-vrp pass is too conservative at three places: 1. When handling ASSERT_EXPR Visiting statement: c_13 = ASSERT_EXPR <c_1, c_1 < nc.0_4>; Intersecting [-INF, nc.0_4 + -1] and [0, 0] to [-INF, nc.0_4 + -1] the range info should be [0, nc.0_4 + -1]. my understanding is for ASSERT_EXPR <var, var < limit>, if var is SSA_NAME and have valid range info, then it's minimum should be honored. 2. When handling PLUS_EXPR with symbolic range After fixed issue 1, the range of c_13 will be [0, nc.0_4 + -1], but the following PLUS_EXPR range b1_11 to be VARYING which is wrong. Visiting statement: bl_11 = c_13 + 1; Found new range for bl_11: VARYING The range of bl_11 should be [1, nc.0_4]. looks to me the following code at the bottom of extract_range_from_binary_expr_1 is overkilling. At least for PLUS_EXPR/MINUS_EXPR. "cmp == -2" which means there is symbolic range, should be allowed for both, otherwise why there are lots of code in PLUS_EXPR/MINUS_EXPR hunk deliberately handling symbolic range? cmp = compare_values (min, max); if (cmp == -2 || cmp == 1) set_value_range_to_varying (vr); So I think we should relax the condition to not included PLUS_EXPR/MINUS_EXPR when cmp == -2. 3. When handling phi node Even after both 1 and 2 fixed, there still be another issue for phi node. Given, bl_11 now with the range [1, nc.0_4], I found it's range info is not honored when visiting PHI node, looks to me, the following code in vrp_visit_phi_node is overkilling, and I don't know how to relax it properly, if we simply remove this block of code, then gcc can finally get correct range info for the testcase 20121027-1.c and generate optimized instruction sequences. /* Do not allow equivalences or symbolic ranges to leak in from backedges. That creates invalid equivalencies. See PR53465 and PR54767. */ if (e->flags & EDGE_DFS_BACK) { if (vr_arg.type == VR_RANGE || vr_arg.type == VR_ANTI_RANGE) { vr_arg.equiv = NULL; if (symbolic_range_p (&vr_arg)) { vr_arg.type = VR_VARYING; vr_arg.min = NULL_TREE; vr_arg.max = NULL_TREE; } Visiting PHI node: c_1 = PHI <0(2), bl_11(3)> Argument #0 (2 -> 4 executable) 0: [0, 0] Argument #1 (3 -> 4 executable) bl_11: VARYING <--- bl_11 is with VR_RANGE, but forced to VARYING because of it's symbolic range Meeting [0, 0] and VARYING to VARYING attachment is experiment patch to show the bug places from my understanding. Toughts? -- Regards, Jiong
(In reply to Jiong Wang from comment #0) > Created attachment 36661 [details] > experimental-patch > > r226850 will cause gcc generate more ASSERT_EXPR thus trigger some latent > issues in gcc tree-vrp pass. For example, we are generating sub-optimal > code for gcc.c-torture/compile/20121027-1.c since r226850. > > For ARM target: > > ./cc1 -O3 -nostdinc 20121027-1.c -march=armv8-a -mthumb > -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -nostdinc > > before r226850: > > "bl/64" turned into "bl >> 6" > == > .L3: > asrs r3, r4, #6 > add r2, sp, #1024 > adds r4, r4, #1 > add r3, r2, r3, lsl #3 > subw r3, r3, #1019 > vld1.64 {d16}, [r3] > vmov r0, r1, d16 @ int > bl ff > ldr r3, [r5] > cmp r3, r4 > bgt .L3 > > > after ("bl/64" is not turned into "bl >> 6") > === > .L3: > add r3, r4, #63 > add r2, sp, #1024 > ands r3, r3, r4, asr #32 > it cc > movcc r3, r4 > adds r4, r4, #1 > asrs r3, r3, #6 > add r3, r2, r3, lsl #3 > subw r3, r3, #1019 > vld1.64 {d16}, [r3] > vmov r0, r1, d16 @ int > bl ff > ldr r3, [r5] > cmp r3, r4 > bgt .L3 > > For mips target: > > ./cc1 -O2 -march=mips32r2 -mabi=32 20121027-1.c -nostdinc > > before > === > $L8: > addiu $16,$16,1 > sll $2,$2,3 > addu $2,$17,$2 > lwl $5,9($2) > lwl $4,5($2) > lwr $5,12($2) > lwr $4,8($2) > jal ff > lw $2,0($18) > slt $2,$16,$2 > bne $2,$0,$L8 > sra $2,$16,6 > after > === > $L9: > slt $2,$16,0 > movz $3,$16,$2 > addiu $16,$16,1 > sra $2,$3,6 > sll $2,$2,3 > addu $2,$17,$2 > lwl $5,9($2) > lwl $4,5($2) > lwr $5,12($2) > lwr $4,8($2) > jal ff > lw $2,0($18) > slt $2,$16,$2 > bne $2,$0,$L9 > addiu $3,$16,63 > > This is because previously gcc can deduct the value range for the variable > "bl", and conclude it will be positive, then later optimization can turn the > signed division to right shift thus we can avoid runtime overhead for signed > division. After r226850, gcc can't deduct this. The initial cause if r226850 > will introduce more ASSERT_EXPR which is fine but it caused problem for > tree-vrp. > > From .vrp1 dump, tree-vrp pass is too conservative at three places: > > 1. When handling ASSERT_EXPR > > Visiting statement: > c_13 = ASSERT_EXPR <c_1, c_1 < nc.0_4>; > Intersecting > [-INF, nc.0_4 + -1] > and > [0, 0] > to > [-INF, nc.0_4 + -1] > > the range info should be [0, nc.0_4 + -1]. No, the intersection is [0,0]. Basically the code fails to properly intersect and can choose either original range. We at the moment choose the first because always choosing a non-symbolic range regresses symbolic range handling (AFAIR). Note that [0, nc.0_4 + 1] is not correct as nc.0_4 + 1 could be less than zero (in which case the intersection result would be UNDEFINED). So eventually choosing [0, nc.0_4 + 1] is ok. > > my understanding is for ASSERT_EXPR <var, var < limit>, if var is SSA_NAME > and > have valid range info, then it's minimum should be honored. > > 2. When handling PLUS_EXPR with symbolic range > > After fixed issue 1, the range of c_13 will be [0, nc.0_4 + -1], but the > following PLUS_EXPR range b1_11 to be VARYING which is wrong. > > Visiting statement: > bl_11 = c_13 + 1; > Found new range for bl_11: VARYING > > The range of bl_11 should be [1, nc.0_4]. If the type is a wrapping type then nc.0_4 -1 + 1 might wrap around and the range become invalid (an anti-range). Note that symbolic range handling is _very_ limited right now. > looks to me the following code at the bottom of > extract_range_from_binary_expr_1 is overkilling. At least for > PLUS_EXPR/MINUS_EXPR. "cmp == -2" which means there is symbolic range, > should be allowed for both, otherwise why there are lots of code in > PLUS_EXPR/MINUS_EXPR hunk deliberately handling symbolic range? > > cmp = compare_values (min, max); > if (cmp == -2 || cmp == 1) > set_value_range_to_varying (vr); > > So I think we should relax the condition to not included > PLUS_EXPR/MINUS_EXPR when cmp == -2. See above for the overflow issues. For example with using x = [1, nc.0_4] we'd conclude that x > 0 is true but if nc.0_4 is zero this is not true. > 3. When handling phi node > > Even after both 1 and 2 fixed, there still be another issue for phi node. > > Given, bl_11 now with the range [1, nc.0_4], I found it's range info is > not honored when visiting PHI node, looks to me, the following code in > vrp_visit_phi_node is overkilling, and I don't know how to relax it > properly, if we simply remove this block of code, then gcc can finally get > correct range info for the testcase 20121027-1.c and generate optimized > instruction sequences. This case is the hardest and I can't see an easy way out (we've tried). > /* Do not allow equivalences or symbolic ranges to leak in from > backedges. That creates invalid equivalencies. > See PR53465 and PR54767. */ > if (e->flags & EDGE_DFS_BACK) > { > if (vr_arg.type == VR_RANGE > || vr_arg.type == VR_ANTI_RANGE) > { > vr_arg.equiv = NULL; > if (symbolic_range_p (&vr_arg)) > { > vr_arg.type = VR_VARYING; > vr_arg.min = NULL_TREE; > vr_arg.max = NULL_TREE; > } > > > Visiting PHI node: c_1 = PHI <0(2), bl_11(3)> > Argument #0 (2 -> 4 executable) > 0: [0, 0] > Argument #1 (3 -> 4 executable) > bl_11: VARYING <--- bl_11 is with VR_RANGE, but forced to > VARYING because of it's symbolic range > Meeting > [0, 0] > and > VARYING > to > VARYING > > > attachment is experiment patch to show the bug places from my understanding. > > Toughts? I think the issue is that we insert the assert in the first place or that we intersect to a symbolic range this causes us to not use SCEV / loop analysis to get at the range for c_1. That is, in vrp_visit_phi_node the early outs to varying: shouldn't skip /* If we dropped either bound to +-INF then if this is a loop PHI node SCEV may known more about its value-range. */ if ((cmp_min > 0 || cmp_min < 0 || cmp_max < 0 || cmp_max > 0) && (l = loop_containing_stmt (phi)) && l->header == gimple_bb (phi)) adjust_range_with_scev (&vr_result, l, phi, lhs); sth like Index: gcc/tree-vrp.c =================================================================== --- gcc/tree-vrp.c (revision 229804) +++ gcc/tree-vrp.c (working copy) @@ -8827,6 +8805,24 @@ update_range: /* No match found. Set the LHS to VARYING. */ varying: + + /* If we dropped either bound to +-INF then if this is a loop + PHI node SCEV may known more about its value-range. */ + if ((l = loop_containing_stmt (phi)) + && l->header == gimple_bb (phi)) + { + adjust_range_with_scev (&vr_result, l, phi, lhs); + + /* If we will end up with a (-INF, +INF) range, set it to + VARYING. Same if the previous max value was invalid for + the type and we end up with vr_result.min > vr_result.max. */ + if (!((vrp_val_is_max (vr_result.max) + && vrp_val_is_min (vr_result.min)) + || compare_values (vr_result.min, + vr_result.max) > 0)) + goto update_range; + } + set_value_range_to_varying (lhs_vr); return SSA_PROP_VARYING; } which ends up with Value ranges after VRP: c_1: [0, +INF] as desired. Maybe you can take the above and put it to testing. > -- > Regards, > Jiong
(In reply to Richard Biener from comment #1) > I think the issue is that we insert the assert in the first place or > that we intersect to a symbolic range this causes us to not use SCEV / loop > analysis to get at the range for c_1. That is, in vrp_visit_phi_node > the early outs to varying: shouldn't skip > > /* If we dropped either bound to +-INF then if this is a loop > PHI node SCEV may known more about its value-range. */ > if ((cmp_min > 0 || cmp_min < 0 > || cmp_max < 0 || cmp_max > 0) > && (l = loop_containing_stmt (phi)) > && l->header == gimple_bb (phi)) > adjust_range_with_scev (&vr_result, l, phi, lhs); > > sth like > > Index: gcc/tree-vrp.c > =================================================================== > --- gcc/tree-vrp.c (revision 229804) > +++ gcc/tree-vrp.c (working copy) > @@ -8827,6 +8805,24 @@ update_range: > > /* No match found. Set the LHS to VARYING. */ > varying: > + > + /* If we dropped either bound to +-INF then if this is a loop > + PHI node SCEV may known more about its value-range. */ > + if ((l = loop_containing_stmt (phi)) > + && l->header == gimple_bb (phi)) > + { > + adjust_range_with_scev (&vr_result, l, phi, lhs); > + > + /* If we will end up with a (-INF, +INF) range, set it to > + VARYING. Same if the previous max value was invalid for > + the type and we end up with vr_result.min > vr_result.max. */ > + if (!((vrp_val_is_max (vr_result.max) > + && vrp_val_is_min (vr_result.min)) > + || compare_values (vr_result.min, > + vr_result.max) > 0)) > + goto update_range; > + } > + > set_value_range_to_varying (lhs_vr); > return SSA_PROP_VARYING; > } > > which ends up with > > Value ranges after VRP: > > c_1: [0, +INF] > > as desired. Maybe you can take the above and put it to testing. Thanks for the explanation on those issues. The "if" check needs to be guarded by vr_result.type == VR_RANGE, otherwise above draft patch passed my testing. bootstrapping on x86-64 and AArch64 OK. no regresson on both.
On Mon, 9 Nov 2015, jiwang at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234 > > Jiong Wang <jiwang at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Summary|tree-vrp pass need to be |tree-vrp pass need to be > |improved when handling |improved when handling > |ASSERT/PLUS/MINUS/_EXPR and |ASSERT_EXPR > |Phi node | > > --- Comment #2 from Jiong Wang <jiwang at gcc dot gnu.org> --- > (In reply to Richard Biener from comment #1) > > > I think the issue is that we insert the assert in the first place or > > that we intersect to a symbolic range this causes us to not use SCEV / loop > > analysis to get at the range for c_1. That is, in vrp_visit_phi_node > > the early outs to varying: shouldn't skip > > > > /* If we dropped either bound to +-INF then if this is a loop > > PHI node SCEV may known more about its value-range. */ > > if ((cmp_min > 0 || cmp_min < 0 > > || cmp_max < 0 || cmp_max > 0) > > && (l = loop_containing_stmt (phi)) > > && l->header == gimple_bb (phi)) > > adjust_range_with_scev (&vr_result, l, phi, lhs); > > > > sth like > > > > Index: gcc/tree-vrp.c > > =================================================================== > > --- gcc/tree-vrp.c (revision 229804) > > +++ gcc/tree-vrp.c (working copy) > > @@ -8827,6 +8805,24 @@ update_range: > > > > /* No match found. Set the LHS to VARYING. */ > > varying: > > + > > + /* If we dropped either bound to +-INF then if this is a loop > > + PHI node SCEV may known more about its value-range. */ > > + if ((l = loop_containing_stmt (phi)) > > + && l->header == gimple_bb (phi)) > > + { > > + adjust_range_with_scev (&vr_result, l, phi, lhs); > > + > > + /* If we will end up with a (-INF, +INF) range, set it to > > + VARYING. Same if the previous max value was invalid for > > + the type and we end up with vr_result.min > vr_result.max. */ > > + if (!((vrp_val_is_max (vr_result.max) > > + && vrp_val_is_min (vr_result.min)) > > + || compare_values (vr_result.min, > > + vr_result.max) > 0)) > > + goto update_range; > > + } > > + > > set_value_range_to_varying (lhs_vr); > > return SSA_PROP_VARYING; > > } > > > > which ends up with > > > > Value ranges after VRP: > > > > c_1: [0, +INF] > > > > as desired. Maybe you can take the above and put it to testing. > > Thanks for the explanation on those issues. > > The "if" check needs to be guarded by vr_result.type == VR_RANGE, otherwise > above draft patch passed my testing. bootstrapping on x86-64 and AArch64 OK. no > regresson on both. Great. Can you add a testcase and post the patch?
(In reply to rguenther@suse.de from comment #3) > On Mon, 9 Nov 2015, jiwang at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234 > > > > Jiong Wang <jiwang at gcc dot gnu.org> changed: > > > > What |Removed |Added > > ---------------------------------------------------------------------------- > > Summary|tree-vrp pass need to be |tree-vrp pass need to be > > |improved when handling |improved when handling > > |ASSERT/PLUS/MINUS/_EXPR and |ASSERT_EXPR > > |Phi node | > > > > --- Comment #2 from Jiong Wang <jiwang at gcc dot gnu.org> --- > > (In reply to Richard Biener from comment #1) > > > > > I think the issue is that we insert the assert in the first place or > > > that we intersect to a symbolic range this causes us to not use SCEV / loop > > > analysis to get at the range for c_1. That is, in vrp_visit_phi_node > > > the early outs to varying: shouldn't skip > > > > > > /* If we dropped either bound to +-INF then if this is a loop > > > PHI node SCEV may known more about its value-range. */ > > > if ((cmp_min > 0 || cmp_min < 0 > > > || cmp_max < 0 || cmp_max > 0) > > > && (l = loop_containing_stmt (phi)) > > > && l->header == gimple_bb (phi)) > > > adjust_range_with_scev (&vr_result, l, phi, lhs); > > > > > > sth like > > > > > > Index: gcc/tree-vrp.c > > > =================================================================== > > > --- gcc/tree-vrp.c (revision 229804) > > > +++ gcc/tree-vrp.c (working copy) > > > @@ -8827,6 +8805,24 @@ update_range: > > > > > > /* No match found. Set the LHS to VARYING. */ > > > varying: > > > + > > > + /* If we dropped either bound to +-INF then if this is a loop > > > + PHI node SCEV may known more about its value-range. */ > > > + if ((l = loop_containing_stmt (phi)) > > > + && l->header == gimple_bb (phi)) > > > + { > > > + adjust_range_with_scev (&vr_result, l, phi, lhs); > > > + > > > + /* If we will end up with a (-INF, +INF) range, set it to > > > + VARYING. Same if the previous max value was invalid for > > > + the type and we end up with vr_result.min > vr_result.max. */ > > > + if (!((vrp_val_is_max (vr_result.max) > > > + && vrp_val_is_min (vr_result.min)) > > > + || compare_values (vr_result.min, > > > + vr_result.max) > 0)) > > > + goto update_range; > > > + } > > > + > > > set_value_range_to_varying (lhs_vr); > > > return SSA_PROP_VARYING; > > > } > > > > > > which ends up with > > > > > > Value ranges after VRP: > > > > > > c_1: [0, +INF] > > > > > > as desired. Maybe you can take the above and put it to testing. > > > > Thanks for the explanation on those issues. > > > > The "if" check needs to be guarded by vr_result.type == VR_RANGE, otherwise > > above draft patch passed my testing. bootstrapping on x86-64 and AArch64 OK. no > > regresson on both. > > Great. Can you add a testcase and post the patch? OK. now I understand this patch actually done the range correction not in vrp1 but in vrp2, as after vrp1 those trouble maker ASSERT_EXPR are removed, then in vrp2, the tree is quite simplified that we invoke the scalar evolution to do a final calculation to make sure we find as accurate information as possible. I feel this is what we should have done originally, it is a general improvement to range info calcuation. Will combine this block of code with above similar code then send out the patch with testcase.
Author: jiwang Date: Wed Nov 11 10:51:31 2015 New Revision: 230150 URL: https://gcc.gnu.org/viewcvs?rev=230150&root=gcc&view=rev Log: [Patch] PR tree-optimization/68234 Improve range info for loop Phi node 2015-11-11 Richard Biener <rguenth@gcc.gnu.org> Jiong Wang <jiong.wang@arm.com> gcc/ PR tree-optimization/68234 * tree-vrp.c (vrp_visit_phi_node): Extend SCEV check to those loop PHI node which estimiated to be VR_VARYING initially. gcc/testsuite/ * gcc.dg/tree-ssa/pr68234.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vrp.c
fixed by r230150.