Summary: | 0 to limit signed range checks don't always use unsigned compare | ||
---|---|---|---|
Product: | gcc | Reporter: | Peter Cordes <pcordes> |
Component: | tree-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | jakub, rguenth |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 5.3.0 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2016-02-02 00:00:00 |
Description
Peter Cordes
2016-02-02 07:30:41 UTC
GCC relies on some fold() routine to do this and we end up with ;; Function r0_to_imax_2 (null) ;; enabled by -tree-original { if ((unsigned int) x <= 2147483645) { ext (); } } ;; Function r0_to_imax_1 (null) ;; enabled by -tree-original { if (x >= 0 && x != 2147483647) { ext (); } } ;; Function r0_to_imax (null) ;; enabled by -tree-original { if (x >= 0) { ext (); } so it seems we are confused by the trick that triggers first, replacing the <= INT_MAX-1 compare with a != INT_MAX compare and that not being handled in the range construction code. Looks like ifcombine doesn't handle it either (maybe_fold_and_comparisons). void ext(void); void r0_to_imax_1(int x){ if (x>=0 && x<=(__INT_MAX__-1)) ext(); } Shouldn't be hard to teach the range code in reassoc about this, but stage1 material. @Richard and Jakub: That's just addressing the first part of my report, the problem with x <= (INT_MAX-1), right? You may have missed the second part of the problem, since I probably buried it under too much detail with the first: In the case where the limit is variable, but can easily be proven to itself be in the range [0 .. INT_MAX-1) or much smaller: // gcc always fails to optimize this to an unsigned compare, but clang succeeds void rangecheck_var(int64_t x, int64_t lim2) { //lim2 >>= 60; lim2 &= 0xf; // let the compiler figure out the limited range of limit if (x>=0 && x<lim2) ext(); } --- I noticed after I submitted the report that I maybe should have tagged it tree-optimization, thanks for fixing. I don't really know gcc internals; I just file reports when it makes less than optimal code. I suppose even that is doable in the reassoc framework, or it could be done in match.pd just using the recorded value ranges, like richi has handled PR69595, or both. Update: https://godbolt.org/g/ZQDY1G gcc7/8 optimizes this to and / cmp / jb, while gcc6.3 doesn't. void rangecheck_var(int64_t x, int64_t lim2) { //lim2 >>= 60; lim2 &= 0xf; // let the compiler figure out the limited range of limit if (x>=0 && x<lim2) ext(); } gcc7 also fixed the -mtune=haswell bug where we got `movl %edi, %eax; subl $0, %eax` instead of `test %edi,%edi`. gcc8.1 still has the missed optimization with a compile-time constant 0..INT_MAX-1 range: void r0_to_imax_1(int x){ if (x>=0 && x<=(INT_MAX-1)) ext(); } // clang and gcc use 2 branches For r0_to_imax_1 the following works for me: --- gcc/fold-const.c.jj 2018-05-31 20:53:33.000000000 +0200 +++ gcc/fold-const.c 2018-06-02 18:36:23.795635887 +0200 @@ -5084,6 +5084,35 @@ merge_ranges (int *pin_p, tree *plow, tr tem = high0, high0 = high1, high1 = tem; } + /* If the second range is != high1 where high1 is the type maximum of + the type, try first merging with < high1 range. */ + if (low1 + && high1 + && TREE_CODE (low1) == INTEGER_CST + && (TREE_CODE (TREE_TYPE (low1)) == INTEGER_TYPE + || (TREE_CODE (TREE_TYPE (low1)) == ENUMERAL_TYPE + && known_eq (TYPE_PRECISION (TREE_TYPE (low1)), + GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (low1)))))) + && tree_int_cst_equal (low1, TYPE_MAX_VALUE (TREE_TYPE (low1))) + && operand_equal_p (low1, high1, 0) + && merge_ranges (pin_p, plow, phigh, in0_p, low0, high0, + !in1_p, NULL_TREE, range_predecessor (low1))) + return true; + + /* Similarly for first range != low0 where low0 is the type minimum of + the type, try first merging with > low0 range. */ + if (low0 + && high0 + && (TREE_CODE (TREE_TYPE (low0)) == INTEGER_TYPE + || (TREE_CODE (TREE_TYPE (low0)) == ENUMERAL_TYPE + && known_eq (TYPE_PRECISION (TREE_TYPE (low0)), + GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (low0)))))) + && tree_int_cst_equal (low0, TYPE_MIN_VALUE (TREE_TYPE (low0))) + && operand_equal_p (low0, high0, 0) + && merge_ranges (pin_p, plow, phigh, !in0_p, range_successor (low0), + NULL_TREE, in1_p, low1, high1)) + return true; + /* Now flag two cases, whether the ranges are disjoint or whether the second range is totally subsumed in the first. Note that the tests below are simplified by the ones above. */ The thing is that we canonicalize < INT_MAX (and <= INT_MAX-1) to != INT_MAX-1 and in the range code we'd better try the first form instead. Author: jakub Date: Mon Jun 4 07:37:56 2018 New Revision: 261139 URL: https://gcc.gnu.org/viewcvs?rev=261139&root=gcc&view=rev Log: PR tree-optimization/69615 * fold-const.c (merge_ranges): If range1 is - [x, x] and x is the maximum or minimum of the type, try to merge it also as if range1 is + [-, x - 1] or + [x + 1, -]. * gcc.dg/pr69615.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr69615.c Modified: trunk/gcc/ChangeLog trunk/gcc/fold-const.c trunk/gcc/testsuite/ChangeLog Author: jakub Date: Thu Jun 7 07:41:18 2018 New Revision: 261264 URL: https://gcc.gnu.org/viewcvs?rev=261264&root=gcc&view=rev Log: PR tree-optimization/69615 * tree-ssa-reassoc.c (optimize_range_tests_var_bound): If rhs2 is lhs of a cast from a same precision integral SSA_NAME in a bb dominated by first_bb, retry with rhs2 set to the rhs1 of the cast. Don't emit cast to utype if rhs2 has already a compatible type. * gcc.dg/tree-ssa/pr69615.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr69615.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-reassoc.c Jakub: Can you please update Known to work? Fixed for GCC9+. |