Starting from 4.8.3, gcc does not optimize integer overflow and underflow checks inserting a redundant cmp instruction in both 32-bit and 64-bit modes. Having git-bisected the revisions, I found out r204088 (by pr58779) had added such behavior. --------------- example --------------- extern void underflow(void) __attribute__((noreturn)); unsigned sub(unsigned a, unsigned b) { unsigned r = a - b; if (r > a) underflow(); return r; } ------------ gcc-4.8.2 -O1 ------------ sub(unsigned int, unsigned int): movl %edi, %eax subl %esi, %eax /* CF is set on overflow */ jae .L4 /* jae = jnb = jnc = 73h */ subq $8, %rsp call underflow() .L4: rep; ret ------------ gcc-4.8.3 -O1 ------------ sub(unsigned int, unsigned int): movl %edi, %eax subl %esi, %eax cmpl %eax, %edi /* absolutely redundant */ jnb .L4 subq $8, %rsp call underflow() .L4: rep ret ---------------------------------------
(In reply to Mike from comment #0) > Starting from 4.8.3, gcc does not optimize integer overflow and underflow > checks inserting a redundant cmp instruction in both 32-bit and 64-bit > modes. Having git-bisected the revisions, I found out r204088 (by pr58779) > had added such behavior. It was removed for a reason, as explained in the referred PR. > ------------ gcc-4.8.3 -O1 ------------ > sub(unsigned int, unsigned int): > movl %edi, %eax > subl %esi, %eax > cmpl %eax, %edi /* absolutely redundant */ > jnb .L4 > subq $8, %rsp > call underflow() > .L4: > rep ret > --------------------------------------- The sequence is actually: (insn 7 4 8 2 (parallel [ (set (reg/v:SI 87 [ r ]) (minus:SI (reg/v:SI 89 [ a ]) (reg/v:SI 90 [ b ]))) (clobber (reg:CC 17 flags)) ]) pr67089.c:4 259 {*subsi_1} (expr_list:REG_DEAD (reg/v:SI 90 [ b ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 8 7 9 2 (set (reg:CC 17 flags) (compare:CC (reg/v:SI 89 [ a ]) (reg/v:SI 87 [ r ]))) pr67089.c:5 7 {*cmpsi_1} (expr_list:REG_DEAD (reg/v:SI 89 [ a ]) (nil))) (jump_insn 9 8 10 2 (set (pc) (if_then_else (geu (reg:CC 17 flags) (const_int 0 [0])) (label_ref 13) (pc))) pr67089.c:5 609 {*jcc_1} (expr_list:REG_DEAD (reg:CC 17 flags) (int_list:REG_BR_PROB 9996 (nil))) -> 13) and combine tries to simplify this sequence with: Trying 7 -> 8: Failed to match this instruction: (parallel [ (set (reg:CC 17 flags) (compare:CC (minus:SI (reg/v:SI 89 [ a ]) (reg/v:SI 90 [ b ])) (reg/v:SI 89 [ a ]))) (set (reg/v:SI 87 [ r ]) (minus:SI (reg/v:SI 89 [ a ]) (reg/v:SI 90 [ b ]))) ]) and further to: Trying 7, 8 -> 9: Failed to match this instruction: (parallel [ (set (pc) (if_then_else (leu (minus:SI (reg/v:SI 89 [ a ]) (reg/v:SI 90 [ b ])) (reg/v:SI 89 [ a ])) (label_ref 13) (pc))) (set (reg/v:SI 87 [ r ]) (minus:SI (reg/v:SI 89 [ a ]) (reg/v:SI 90 [ b ]))) ]) LEU condition was created from GEU as the operands in (insn 8) were swapped. As can be seen from config/i386.c, ix86_cc_mode, LEU can't be implemented using only Carry flag: case GTU: /* CF=0 & ZF=0 */ case LEU: /* CF=1 | ZF=1 */ return CCmode; The removed code compensated this with *reversion* of the flag check, e.g. the same condition was emitted for LEU with CCCmode as for GEU (and in a similar for GTU in CCCmode vs. LTU, as was the case in PR58779). We shouldn't do this, and it reflected in PR58779.
(In reply to Uroš Bizjak from comment #1) > We shouldn't do this, and it reflected in PR58779. But can't we improve the logic to satisfy both pr58779 and pr67089? The absolute metric is code quality, and with all due respect, the code can be better in this very case.
(In reply to Mike from comment #2) > (In reply to Uroš Bizjak from comment #1) > > We shouldn't do this, and it reflected in PR58779. > But can't we improve the logic to satisfy both pr58779 and pr67089? The > absolute metric is code quality, and with all due respect, the code can be > better in this very case. I don't see a way. pr58779 will generate: (set (pc) (if_then_else (gtu (plus:SI (reg:SI 86 [ D.1769 ]) (const_int -1 [0xffffffffffffffff])) (reg:SI 86 [ D.1769 ])) (label_ref 22) (pc))) which can be also represented as: (set (pc) (if_then_else (gtu (minus:SI (reg:SI 86 [ D.1769 ]) (const_int 1 [0x1])) (reg:SI 86 [ D.1769 ])) (label_ref 22) (pc))) and before pr58779 was fixed, these two equivalent insns would result in swapped condition, as evident from the diff: case GTU: /* CF=0 & ZF=0 */ case LEU: /* CF=1 | ZF=1 */ - /* Detect overflow checks. They need just the carry flag. */ - if (GET_CODE (op0) == MINUS - && rtx_equal_p (op1, XEXP (op0, 0))) - return CCCmode; - else - return CCmode; + return CCmode; and the part where GTU is emitted: @@ -14103,8 +14103,6 @@ Those same assemblers have the same but opposite lossage on cmov. */ if (mode == CCmode) suffix = fp ? "nbe" : "a"; - else if (mode == CCCmode) - suffix = "b"; else gcc_unreachable (); break;
Note this affects even code generated with __builtin_sub_overflow.
Author: rth Date: Mon Nov 23 15:55:58 2015 New Revision: 230767 URL: https://gcc.gnu.org/viewcvs?rev=230767&root=gcc&view=rev Log: Add uaddv4_optab and usubv4_optab PR target/67089 * optabs.def (uaddv4_optab, usubv4_optab): New. * internal-fn.c (expand_addsub_overflow): Use uaddv4_optab and usubv4_optab in the u +- u -> u case. * doc/md.texi (Standard Names): Document addv{m}4, subv{m}4, mulv{m}4, uaddv{m}4, usubv{m}4, umulv{m}4. * config/i386/i386.md (uaddv<SWI>4, usubv<SWI>4): New. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.md trunk/gcc/doc/md.texi trunk/gcc/internal-fn.c trunk/gcc/optabs.def
Created attachment 36830 [details] gcc6-pr67089.patch Untested patch for matching hand written unsigned addition/subtraction overflow checks as {ADD,SUB}_OVERFLOW if target has u{add,sub}v4 optabs.
Author: jakub Date: Wed Nov 25 08:58:32 2015 New Revision: 230856 URL: https://gcc.gnu.org/viewcvs?rev=230856&root=gcc&view=rev Log: PR target/67089 * tree-ssa-math-opts.c (uaddsub_overflow_check_p, match_uaddsub_overflow): New functions. (pass_optimize_widening_mul::execute): Call match_uaddsub_overflow. * gcc.dg/pr67089-1.c: New test. * gcc.dg/pr67089-2.c: New test. * gcc.dg/pr67089-3.c: New test. * gcc.dg/pr67089-4.c: New test. * gcc.dg/pr67089-5.c: New test. * gcc.dg/pr67089-6.c: New test. * gcc.dg/pr67089-7.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr67089-1.c trunk/gcc/testsuite/gcc.dg/pr67089-2.c trunk/gcc/testsuite/gcc.dg/pr67089-3.c trunk/gcc/testsuite/gcc.dg/pr67089-4.c trunk/gcc/testsuite/gcc.dg/pr67089-5.c trunk/gcc/testsuite/gcc.dg/pr67089-6.c trunk/gcc/testsuite/gcc.dg/pr67089-7.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-math-opts.c
Fixed on the trunk so far. Wonder if this is desirable to backport.
GCC 4.9 branch is being closed
GCC 5 branch has been closed, should be fixed in GCC 6 and later.