Bug 56719 - missed optimization: i > 0xffff || i*4 > 0xffff
Summary: missed optimization: i > 0xffff || i*4 > 0xffff
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.2
: P3 enhancement
Target Milestone: 11.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 94651 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-03-25 11:27 UTC by felix-gcc
Modified: 2024-02-22 22:56 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-03-25 00:00:00


Attachments
gcc11-pr56719.patch (1.91 KB, patch)
2020-12-29 12:05 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description felix-gcc 2013-03-25 11:27:43 UTC
This is the test code:

int foo(unsigned int i) {
  if (i > 0xffff || i*4 > 0xffff)
    baz();
}

gcc -O2 generates a cmp, a shift, and another cmp.

Why does this not generate a single cmp with 0x3fff?
Comment 1 Kai Tietz 2013-03-25 11:41:15 UTC
0x3fff is wrong as 0x3fff * 4 is just 0xfffc.  So proper optimization here is i > 0x3fffu.  That is a missed opportunity in VRP.
Comment 2 Richard Biener 2013-03-25 12:21:21 UTC
I don't think this has anything to do with VRP.  VRP does not propagate
"backwards", that is, optimize away the first compare in

  if (i > 0xffff)
    if (i*4 > 0xffff)

from ranges derived from a compare following it.

This is a missed optimization in fold instead.  Not sure if practically
relevant though.
Comment 3 felix-gcc 2013-03-25 14:41:10 UTC
@comment 1: maybe it's me but that does not make any sense.  3fff is wrong and the correct value is 3fff?  Huh?

@comment 2: I extracted this code from a piece of commercial production software compiled with gcc.  Not sure where you draw the line but to me that makes it relevant :-)
Comment 4 Richard Biener 2013-03-25 14:55:45 UTC
(In reply to comment #3)
> @comment 2: I extracted this code from a piece of commercial production
> software compiled with gcc.  Not sure where you draw the line but to me that
> makes it relevant :-)

Did it occur in this simplified form, that is, as a single if statement?
Comment 5 felix-gcc 2013-03-25 15:06:02 UTC
Yes.  However I'd hope that fixing this case would mean that gcc also catches the case where it is split to multiple if statements.

I think this statement came about because they had a range check and someone pointed out that checking foo*4>0xffff could be circumvented via an integer overflow if foo is untrusted and very large.

There are smarter ways to do this but it's not completely mind-bogglingly incomprehensible why this code would come about.

I have in fact been advocating for a while that programmers should rather spell out their security checks as plainly as possible and let the compiler optimize them and remove superfluous checks.  See

  http://www.fefe.de/source-code-optimization.pdf

if you are interested.
Comment 6 Jakub Jelinek 2013-03-25 15:25:53 UTC
This actually isn't about optimizing away the first compare, but about merging the two conditions into one that is equivalent to those two ored together.
The first condition is for range of i [0x10000U, 0xffffffffU] while the latter for ranges [0x4000U, 0x3fffffffU] or [0x40004000U, 0x7fffffffU] or [0x80004000U, 0xbfffffffU] or [0xc0004000U, 0xfffffffU], and all the 5 ranges together are
[0x4000U, 0xffffffffU].
Perhaps optimize_range_tests (or its fold-const.c counterpart) could both do it, but the really ugly thing is that either we'd need to expand i*4 into 4 range tests and teach the code that those 4 are really already represented by one range tests and thus an optimization would be only if we can even fewer range tests than that (with some cap on number of ranges we'd generate, like 8-16 or so), or have some way to mark some range fuzzy (i.e. in that range we don't know if it is in the range or out of it), and represent i*4 > 0xffffU as
[0x4000, 0x3fffffffU] range ored with fuzzy range [0x40004000U, 0xffffffffU].
Fuzzy range would then be treated for | as only optimizable if other non-fuzzy ranges together completely cover that range (and for & that non-fuzzy ranges anded together don't cover any of the values in the fuzzy range).

Anyway, I agree with Richard that it is questionable how often this would actually hit in real-world code, i.e. whether this really is something to spend lots of work on.
Comment 7 felix-gcc 2013-03-25 16:01:14 UTC
I filed this bug because I was under the impression that gcc was already supposed to optimize this out as part of the value range optimizations.

You probably know better than me whether the required effort would be disproportionate.  I'd still vote for supporting this case because then I can go around and tell people to worry about writing readable code instead of worrying about code that the compiler will compile well.
Comment 8 Ivan Sorokin 2020-12-25 12:23:09 UTC
On the test code clang since 3.5 and before 9.0 does something very surprising. It optimizes (A > 0xffff || B > 0xffff) into (A | B) > 0xffff. I don't think this is what the reporter expected, but still is a potential optimization for GCC.

See https://godbolt.org/z/WqPhbW
Comment 9 Jakub Jelinek 2020-12-29 12:05:51 UTC
Created attachment 49853 [details]
gcc11-pr56719.patch

Untested patch to implement the #c8 optimization.
Comment 10 GCC Commits 2020-12-31 09:19:50 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:d96b8556e569a1ccce36ef990e167031d07a661a

commit r11-6374-gd96b8556e569a1ccce36ef990e167031d07a661a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Dec 31 10:19:06 2020 +0100

    reassoc: Optimize x > 0x1fff || y > 0x1fff into (x | y) > 0x1fff [PR56719]
    
    The following patch adds an optimization mentioned in PR56719 #c8.
    We already have the x != 0 && y != 0 && z != 0 into (x | y | z) != 0
    and x != -1 && y != -1 && y != -1 into (x & y & z) != -1
    optimizations, this patch just extends that to
    x < C && y < C && z < C for power of two constants C into
    (x | y | z) < C (for unsigned comparisons).
    
    I didn't want to create too many buckets (there can be TYPE_PRECISION such
    constants), so the patch instead just uses one buckets for all such
    constants and loops over that bucket up to TYPE_PRECISION times.
    
    2020-12-31  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/56719
            * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize
            x < C && y < C && z < C when C is a power of two constant into
            (x | y | z) < C.
    
            * gcc.dg/tree-ssa/pr56719.c: New test.
Comment 11 GCC Commits 2020-12-31 23:04:44 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:3ab7a91f36c898b9da665e5e36318a1d9ff12946

commit r11-6382-g3ab7a91f36c898b9da665e5e36318a1d9ff12946
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Jan 1 00:03:35 2021 +0100

    testsuite: Fix up pr56719.c testcase [PR98489]
    
    On some targets, there are no < 8191; and >= 8191; strings,
    but < 8191) and >= 8191), so just remove the ; from the regexps.
    
    2021-01-01  Jakub Jelinek  <jakub@redhat.com>
    
            PR testsuite/98489
            PR tree-optimization/56719
            * gcc.dg/tree-ssa/pr56719.c: Remove semicolon from
            scan-tree-dump-times regexps.
Comment 12 Andrew Pinski 2021-08-01 18:10:51 UTC
*** Bug 94651 has been marked as a duplicate of this bug. ***
Comment 13 Andrew Pinski 2021-08-01 18:12:14 UTC
Fixed.