int foo (long long int a, int b) { if (a >= 0 || b) baz (); } int bar (long long int a, int b) { if (a < 0 || b) baz (); } on x86_64-linux with -O2 is compiled into following starting with r146817: testq %rdi, %rdi jns .L4 testl %esi, %esi jne .L4 for foo, which looks reasonable, and shrq $63, %rdi testb %dil, %dil jne .L9 testl %esi, %esi jne .L9 in bar, where I'd expect and we used to emit in the past testq %rdi, %rdi js .L9 testl %esi, %esi jne .L9 or something similar. I wonder if we want a combine pattern that would transform right shift by precision - 1 followed by comparison of the result against 0 if the result of the right shift is dead into just signed comparison.
The question is why we expand a < 0 to shift and test in the first place.
Exactly. Whatever makes it currently work for >=0 should be made to work for <0 as well.
Well, generally if we want to store a < 0 or a >= 0 into a bool/int variable etc., then the right shift is desirable, because we avoid branching or sets/setns, but not when the result is used in a conditional jump.
Actually, it's merely a deficiency in current combine not simplifying intermediate expressions enough. One of the things that need to happen is the following transformation: (compare:CCZ (subreg:QI (lshiftrt:DI (reg:DI 95) (const_int 63 [0x3f])) 0) (const_int 0 [0])) --> (remove subreg) (compare:CCZ (lshiftrt:DI (reg:DI 95) (const_int 63 [0x3f])) (const_int 0 [0])) --> (recognize that it's a sign bit extract) (ge (reg:DI 95) (const_int 0)) (or 'lt', doesn't matter, depends on the outer code by which the compare:CCZ result itself is compared). In current combine.c this requires two steps, the removal of the irrelevant subreg (irrelevant in this specific context), and then the recognition of the sign bit extraction. With the first function combine has the chance for two attempts of simplification because we have a sequence of three isnstructions to start with: t1 = ~a t2 = 255 & (t1 >> 63) flags = t2 != 0 The intermediate expression is combine_simplify_rtx'ed twice, so both steps above happen, and we get good code. In the second function we only have two instructions to start with (the not is missing): t2 = 255 & (a >> 63) flags = t2 == 0 Only the first step above happens, we're left with the (lshiftrt(subreg)) and nothing simplifies this further before it tries to recognize the insn (which ultimately fails). The same can also be seen when artificially forcing the expression to be a tad more complicated: int bar2 (long long int a, long long int a2, int b) { if ((a+a2) < 0 || b) baz(); } Here, we also get good code again, simply because combine can have two attempts at the intermediate expression. After some amount of tracing combine I've come up with the below patch. The simplify_comparison function already contains loops that effectively retry simplification after a change occured. But the code that actually removes a useless subreg is after that loop. Putting it into the loop as well fixes the problem. It can't be removed from the old place because between the loop and subreg removal it might actually change the expression further due to make_compound_operation (though I don't know if that really creates subregs often). combines normal facilities of not doing combines when intermediate results are really used outside will take care of not creating useless code. Index: combine.c =================================================================== --- combine.c (revision 235171) +++ combine.c (working copy) @@ -11891,6 +11891,27 @@ simplify_comparison (enum rtx_code code, if (subreg_lowpart_p (op0) && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width) /* Fall through */ ; + else if (subreg_lowpart_p (op0) + && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT + && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT + && (code == NE || code == EQ) + && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) + <= HOST_BITS_PER_WIDE_INT) + && !paradoxical_subreg_p (op0) + && (nonzero_bits (SUBREG_REG (op0), + GET_MODE (SUBREG_REG (op0))) + & ~GET_MODE_MASK (GET_MODE (op0))) == 0) + { + tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1); + + if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0))) + & ~GET_MODE_MASK (GET_MODE (op0))) == 0) + { + op0 = SUBREG_REG (op0), op1 = tem; + continue; + } + break; + } else break;
That looks good, please submit to gcc-patches?
Author: matz Date: Tue Nov 15 14:02:28 2016 New Revision: 242414 URL: https://gcc.gnu.org/viewcvs?rev=242414&root=gcc&view=rev Log: PR missed-optimization/77881 * combine.c (simplify_comparison): Remove useless subregs also inside the loop, not just after it. (make_compound_operation): Recognize some subregs as being masking as well. testsuite/ * gcc.target/i386/pr77881.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr77881.c Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c trunk/gcc/testsuite/ChangeLog
Fixed for gcc 7. Not planning backports myself , it's a minor code quality regression only. But keeping it open in case anyone else wants to.
Author: segher Date: Wed Nov 23 14:33:13 2016 New Revision: 242757 URL: https://gcc.gnu.org/viewcvs?rev=242757&root=gcc&view=rev Log: combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390) r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). It all has the same root cause: that patch makes combine convert every lowpart subreg of a logical shift right to a zero_extract. This cannot work at all if it is not a constant shift, and it has to be a bit more careful exactly which bits it extracts. PR target/77881 PR bootstrap/78390 PR target/78438 PR bootstrap/78477 * combine.c (make_compound_operation_int): Do not convert a subreg of a non-constant logical shift right to a zero_extract. Handle the case where some zero bits have been shifted into the range covered by that subreg. Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c
GCC 5 branch is being closed
(In reply to Michael Matz from comment #7) > Fixed for gcc 7. Not planning backports myself , it's a minor code quality > regression only. But keeping it open in case anyone else wants to. Well, if anyone else wants to, now's their last chance...
GCC 6 branch is being closed, fixed in 7.x.