Bug 98703 - Failure to optimize out non-zero check after multiplication overflow check
Summary: Failure to optimize out non-zero check after multiplication overflow check
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2021-01-16 03:22 UTC by Gabriel Ravier
Modified: 2024-11-11 04:02 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-10-02 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gabriel Ravier 2021-01-16 03:22:58 UTC
bool f1(unsigned x, unsigned y, unsigned *res)
{
    return __builtin_mul_overflow(x, y, res) && x;
}

This can be optimized to `return __builtin_mul_overflow(x, y, res);`. This transformation is done by LLVM, but not by GCC.

PS: I originally found this while looking at the code generation for this code (from https://gcc.gnu.org/PR95852) :

bool f2(unsigned x, unsigned y, unsigned *res)
{
    *res = x * y;
    return x && ((*res / x) != y);
}

f2 is equivalent to `return __builtin_mul_overflow(x, y, res);`, but the code emitted is more like f1.
Comment 1 Richard Biener 2021-01-18 09:30:19 UTC
Guess VRP should infer a range for x then after __builtin_mul_overflow is true?
Comment 2 Jakub Jelinek 2021-01-18 09:36:24 UTC
I guess so, and even for the case where it returned false.
If it returned true, we can at least infer that both operands are non-zero (and for operands with the same type and same as the return type's element type also not 1, 1 * x won't overflow either, but e.g. 1 * x might overflow if stored into unsigned and x is negative), if it returns false and say one argument is constant, we can easily compute range of the other one.  Perhaps from range of one operand we could also infer the range of the other operand, but perhaps that might be too dangerous.
Comment 3 Andrew Pinski 2021-10-02 22:16:55 UTC
Confirmed.

Note the released version of GCC 11, f2 does not produce the extra compare.
Comment 4 Aldy Hernandez 2021-10-03 17:30:07 UTC
I can't remember what Andrew's plan was for complex numbers, possibly for next release when we handle floats.  We have somewhat of a kludge to know that the result of REALPART_EXPR and IMAGPART_EXPR are [0,1] for this case (gimple_range_adjustment), but not much else.

We are interested in the 2->3 edge for this IL:

=========== BB 2 ============
Imports: _2  
Exports: _2  _3  
         _3 : _2(I)  
x_5(D)	unsigned int VARYING
    <bb 2> :
    _7 = .MUL_OVERFLOW (x_5(D), y_6(D));
    _1 = REALPART_EXPR <_7>;
    *res_9(D) = _1;
    _2 = IMAGPART_EXPR <_7>;
    _3 = (bool) _2;
    if (_3 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 4>; [INV]

_2 : unsigned int [0, 1]
2->3  (T) _2 : 	unsigned int [1, 1]
2->3  (T) _3 : 	bool [1, 1]
2->4  (F) _2 : 	unsigned int [0, 0]
2->4  (F) _3 : 	bool [0, 0]

On the 2->3 edge we know that _2 is [1,1], which means thee multiplication doesn't overflow and that IMAGPART_EXPR<_7> is [1,1].  If we could handle complex ints, we could unwind to

    _7 = .MUL_OVERFLOW (x_5(D), y_6(D));

and register that x_5 is not 0 plus all the goodies Jakub mentions in comment 2 (*).  We don't do any of this, as we don't handle complex numbers, which _7 is one.

(*) Clearly, Jakub is the person to contribute copious knowledge to range-op.cc, since he always has such great ideas in this space.  And it's so easy to do so ;-).
Comment 5 Jakub Jelinek 2021-10-03 17:40:27 UTC
Yeah, the IMAGPART_EXPR of .*_OVERFLOW is a hand-written forward pattern matching, for this PR one would need pattern matching in the backwards computation (like operator*::op1_range) that would not go through a single stmt, but through the whole sequence.  And propagate that [1, 1] IMAGPART_EXPR implies that .MUL_OVERFLOW are ~[0, 0] (for .ADD_OVERFLOW it is that at least one has to be ~[0, 0]).