This is the test code: int foo(unsigned int i) { if (i > 0xffff || i*4 > 0xffff) baz(); } gcc -O2 generates a cmp, a shift, and another cmp. Why does this not generate a single cmp with 0x3fff?
0x3fff is wrong as 0x3fff * 4 is just 0xfffc. So proper optimization here is i > 0x3fffu. That is a missed opportunity in VRP.
I don't think this has anything to do with VRP. VRP does not propagate "backwards", that is, optimize away the first compare in if (i > 0xffff) if (i*4 > 0xffff) from ranges derived from a compare following it. This is a missed optimization in fold instead. Not sure if practically relevant though.
@comment 1: maybe it's me but that does not make any sense. 3fff is wrong and the correct value is 3fff? Huh? @comment 2: I extracted this code from a piece of commercial production software compiled with gcc. Not sure where you draw the line but to me that makes it relevant :-)
(In reply to comment #3) > @comment 2: I extracted this code from a piece of commercial production > software compiled with gcc. Not sure where you draw the line but to me that > makes it relevant :-) Did it occur in this simplified form, that is, as a single if statement?
Yes. However I'd hope that fixing this case would mean that gcc also catches the case where it is split to multiple if statements. I think this statement came about because they had a range check and someone pointed out that checking foo*4>0xffff could be circumvented via an integer overflow if foo is untrusted and very large. There are smarter ways to do this but it's not completely mind-bogglingly incomprehensible why this code would come about. I have in fact been advocating for a while that programmers should rather spell out their security checks as plainly as possible and let the compiler optimize them and remove superfluous checks. See http://www.fefe.de/source-code-optimization.pdf if you are interested.
This actually isn't about optimizing away the first compare, but about merging the two conditions into one that is equivalent to those two ored together. The first condition is for range of i [0x10000U, 0xffffffffU] while the latter for ranges [0x4000U, 0x3fffffffU] or [0x40004000U, 0x7fffffffU] or [0x80004000U, 0xbfffffffU] or [0xc0004000U, 0xfffffffU], and all the 5 ranges together are [0x4000U, 0xffffffffU]. Perhaps optimize_range_tests (or its fold-const.c counterpart) could both do it, but the really ugly thing is that either we'd need to expand i*4 into 4 range tests and teach the code that those 4 are really already represented by one range tests and thus an optimization would be only if we can even fewer range tests than that (with some cap on number of ranges we'd generate, like 8-16 or so), or have some way to mark some range fuzzy (i.e. in that range we don't know if it is in the range or out of it), and represent i*4 > 0xffffU as [0x4000, 0x3fffffffU] range ored with fuzzy range [0x40004000U, 0xffffffffU]. Fuzzy range would then be treated for | as only optimizable if other non-fuzzy ranges together completely cover that range (and for & that non-fuzzy ranges anded together don't cover any of the values in the fuzzy range). Anyway, I agree with Richard that it is questionable how often this would actually hit in real-world code, i.e. whether this really is something to spend lots of work on.
I filed this bug because I was under the impression that gcc was already supposed to optimize this out as part of the value range optimizations. You probably know better than me whether the required effort would be disproportionate. I'd still vote for supporting this case because then I can go around and tell people to worry about writing readable code instead of worrying about code that the compiler will compile well.
On the test code clang since 3.5 and before 9.0 does something very surprising. It optimizes (A > 0xffff || B > 0xffff) into (A | B) > 0xffff. I don't think this is what the reporter expected, but still is a potential optimization for GCC. See https://godbolt.org/z/WqPhbW
Created attachment 49853 [details] gcc11-pr56719.patch Untested patch to implement the #c8 optimization.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:d96b8556e569a1ccce36ef990e167031d07a661a commit r11-6374-gd96b8556e569a1ccce36ef990e167031d07a661a Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Dec 31 10:19:06 2020 +0100 reassoc: Optimize x > 0x1fff || y > 0x1fff into (x | y) > 0x1fff [PR56719] The following patch adds an optimization mentioned in PR56719 #c8. We already have the x != 0 && y != 0 && z != 0 into (x | y | z) != 0 and x != -1 && y != -1 && y != -1 into (x & y & z) != -1 optimizations, this patch just extends that to x < C && y < C && z < C for power of two constants C into (x | y | z) < C (for unsigned comparisons). I didn't want to create too many buckets (there can be TYPE_PRECISION such constants), so the patch instead just uses one buckets for all such constants and loops over that bucket up to TYPE_PRECISION times. 2020-12-31 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/56719 * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize x < C && y < C && z < C when C is a power of two constant into (x | y | z) < C. * gcc.dg/tree-ssa/pr56719.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:3ab7a91f36c898b9da665e5e36318a1d9ff12946 commit r11-6382-g3ab7a91f36c898b9da665e5e36318a1d9ff12946 Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Jan 1 00:03:35 2021 +0100 testsuite: Fix up pr56719.c testcase [PR98489] On some targets, there are no < 8191; and >= 8191; strings, but < 8191) and >= 8191), so just remove the ; from the regexps. 2021-01-01 Jakub Jelinek <jakub@redhat.com> PR testsuite/98489 PR tree-optimization/56719 * gcc.dg/tree-ssa/pr56719.c: Remove semicolon from scan-tree-dump-times regexps.
*** Bug 94651 has been marked as a duplicate of this bug. ***
Fixed.