Bug 67089 - [5 Regression] Integer overflow checks not optimized on x86/x86_64
Summary: [5 Regression] Integer overflow checks not optimized on x86/x86_64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.2.1
: P2 minor
Target Milestone: 6.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-08-01 10:59 UTC by Mike
Modified: 2017-10-10 14:57 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-10-16 00:00:00


Attachments
gcc6-pr67089.patch (4.81 KB, patch)
2015-11-24 18:43 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mike 2015-08-01 10:59:51 UTC
Starting from 4.8.3, gcc does not optimize integer overflow and underflow checks inserting a redundant cmp instruction in both 32-bit and 64-bit modes. Having git-bisected the revisions, I found out r204088 (by pr58779) had added such behavior.

--------------- example ---------------
extern void underflow(void) __attribute__((noreturn));
unsigned sub(unsigned a, unsigned b)
{
    unsigned r = a - b;
    if (r > a) underflow();
    return r;
}
------------ gcc-4.8.2 -O1 ------------
sub(unsigned int, unsigned int):
    movl    %edi, %eax
    subl    %esi, %eax /* CF is set on overflow */
    jae .L4            /* jae = jnb = jnc = 73h */
    subq    $8, %rsp
    call    underflow()
.L4:
    rep; ret
------------ gcc-4.8.3 -O1 ------------
sub(unsigned int, unsigned int):
    movl    %edi, %eax
    subl    %esi, %eax
    cmpl    %eax, %edi  /* absolutely redundant */
    jnb .L4
    subq    $8, %rsp
    call    underflow()
.L4:
    rep ret
---------------------------------------
Comment 1 Uroš Bizjak 2015-08-03 08:21:11 UTC
(In reply to Mike from comment #0)
> Starting from 4.8.3, gcc does not optimize integer overflow and underflow
> checks inserting a redundant cmp instruction in both 32-bit and 64-bit
> modes. Having git-bisected the revisions, I found out r204088 (by pr58779)
> had added such behavior.

It was removed for a reason, as explained in the referred PR.

> ------------ gcc-4.8.3 -O1 ------------
> sub(unsigned int, unsigned int):
>     movl    %edi, %eax
>     subl    %esi, %eax
>     cmpl    %eax, %edi  /* absolutely redundant */
>     jnb .L4
>     subq    $8, %rsp
>     call    underflow()
> .L4:
>     rep ret
> ---------------------------------------

The sequence is actually:

(insn 7 4 8 2 (parallel [
            (set (reg/v:SI 87 [ r ])
                (minus:SI (reg/v:SI 89 [ a ])
                    (reg/v:SI 90 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) pr67089.c:4 259 {*subsi_1}
     (expr_list:REG_DEAD (reg/v:SI 90 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 8 7 9 2 (set (reg:CC 17 flags)
        (compare:CC (reg/v:SI 89 [ a ])
            (reg/v:SI 87 [ r ]))) pr67089.c:5 7 {*cmpsi_1}
     (expr_list:REG_DEAD (reg/v:SI 89 [ a ])
        (nil)))
(jump_insn 9 8 10 2 (set (pc)
        (if_then_else (geu (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref 13)
            (pc))) pr67089.c:5 609 {*jcc_1}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (int_list:REG_BR_PROB 9996 (nil)))
 -> 13)

and combine tries to simplify this sequence with:

Trying 7 -> 8:
Failed to match this instruction:
(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (minus:SI (reg/v:SI 89 [ a ])
                    (reg/v:SI 90 [ b ]))
                (reg/v:SI 89 [ a ])))
        (set (reg/v:SI 87 [ r ])
            (minus:SI (reg/v:SI 89 [ a ])
                (reg/v:SI 90 [ b ])))
    ])

and further to:

Trying 7, 8 -> 9:
Failed to match this instruction:
(parallel [
        (set (pc)
            (if_then_else (leu (minus:SI (reg/v:SI 89 [ a ])
                        (reg/v:SI 90 [ b ]))
                    (reg/v:SI 89 [ a ]))
                (label_ref 13)
                (pc)))
        (set (reg/v:SI 87 [ r ])
            (minus:SI (reg/v:SI 89 [ a ])
                (reg/v:SI 90 [ b ])))
    ])

LEU condition was created from GEU as the operands in (insn 8) were swapped. As can be seen from config/i386.c, ix86_cc_mode, LEU can't be implemented using only Carry flag:

    case GTU:			/* CF=0 & ZF=0 */
    case LEU:			/* CF=1 | ZF=1 */
      return CCmode;

The removed code compensated this with *reversion* of the flag check, e.g. the same condition was emitted for LEU with CCCmode as for GEU (and in a similar for GTU in CCCmode vs. LTU, as was the case in PR58779).

We shouldn't do this, and it reflected in PR58779.
Comment 2 Mike 2015-08-03 08:37:22 UTC
(In reply to Uroš Bizjak from comment #1)
> We shouldn't do this, and it reflected in PR58779.
But can't we improve the logic to satisfy both pr58779 and pr67089? The absolute metric is code quality, and with all due respect, the code can be better in this very case.
Comment 3 Uroš Bizjak 2015-08-03 09:36:04 UTC
(In reply to Mike from comment #2)
> (In reply to Uroš Bizjak from comment #1)
> > We shouldn't do this, and it reflected in PR58779.
> But can't we improve the logic to satisfy both pr58779 and pr67089? The
> absolute metric is code quality, and with all due respect, the code can be
> better in this very case.

I don't see a way. pr58779 will generate:

(set (pc)
    (if_then_else (gtu (plus:SI (reg:SI 86 [ D.1769 ])
                (const_int -1 [0xffffffffffffffff]))
            (reg:SI 86 [ D.1769 ]))
        (label_ref 22)
        (pc)))

which can be also represented as:

(set (pc)
    (if_then_else (gtu (minus:SI (reg:SI 86 [ D.1769 ])
                (const_int 1 [0x1]))
            (reg:SI 86 [ D.1769 ]))
        (label_ref 22)
        (pc)))

and before pr58779 was fixed, these two equivalent insns would result in swapped condition, as evident from the diff:

     case GTU:			/* CF=0 & ZF=0 */
     case LEU:			/* CF=1 | ZF=1 */
-      /* Detect overflow checks.  They need just the carry flag.  */
-      if (GET_CODE (op0) == MINUS
-	  && rtx_equal_p (op1, XEXP (op0, 0)))
-	return CCCmode;
-      else
-	return CCmode;
+      return CCmode;

and the part where GTU is emitted:

@@ -14103,8 +14103,6 @@
 	 Those same assemblers have the same but opposite lossage on cmov.  */
       if (mode == CCmode)
 	suffix = fp ? "nbe" : "a";
-      else if (mode == CCCmode)
-	suffix = "b";
       else
 	gcc_unreachable ();
       break;
Comment 4 Jakub Jelinek 2015-11-19 17:34:46 UTC
Note this affects even code generated with __builtin_sub_overflow.
Comment 5 Richard Henderson 2015-11-23 15:56:30 UTC
Author: rth
Date: Mon Nov 23 15:55:58 2015
New Revision: 230767

URL: https://gcc.gnu.org/viewcvs?rev=230767&root=gcc&view=rev
Log:
Add uaddv4_optab and usubv4_optab

	PR target/67089
    	* optabs.def (uaddv4_optab, usubv4_optab): New.
    	* internal-fn.c (expand_addsub_overflow): Use uaddv4_optab
    	and usubv4_optab in the u +- u -> u case.
    	* doc/md.texi (Standard Names): Document addv{m}4, subv{m}4,
    	mulv{m}4, uaddv{m}4, usubv{m}4, umulv{m}4.

    	* config/i386/i386.md (uaddv<SWI>4, usubv<SWI>4): New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/doc/md.texi
    trunk/gcc/internal-fn.c
    trunk/gcc/optabs.def
Comment 6 Jakub Jelinek 2015-11-24 18:43:26 UTC
Created attachment 36830 [details]
gcc6-pr67089.patch

Untested patch for matching hand written unsigned addition/subtraction overflow checks as {ADD,SUB}_OVERFLOW if target has u{add,sub}v4 optabs.
Comment 7 Jakub Jelinek 2015-11-25 08:59:06 UTC
Author: jakub
Date: Wed Nov 25 08:58:32 2015
New Revision: 230856

URL: https://gcc.gnu.org/viewcvs?rev=230856&root=gcc&view=rev
Log:
	PR target/67089
	* tree-ssa-math-opts.c (uaddsub_overflow_check_p,
	match_uaddsub_overflow): New functions.
	(pass_optimize_widening_mul::execute): Call match_uaddsub_overflow.

	* gcc.dg/pr67089-1.c: New test.
	* gcc.dg/pr67089-2.c: New test.
	* gcc.dg/pr67089-3.c: New test.
	* gcc.dg/pr67089-4.c: New test.
	* gcc.dg/pr67089-5.c: New test.
	* gcc.dg/pr67089-6.c: New test.
	* gcc.dg/pr67089-7.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr67089-1.c
    trunk/gcc/testsuite/gcc.dg/pr67089-2.c
    trunk/gcc/testsuite/gcc.dg/pr67089-3.c
    trunk/gcc/testsuite/gcc.dg/pr67089-4.c
    trunk/gcc/testsuite/gcc.dg/pr67089-5.c
    trunk/gcc/testsuite/gcc.dg/pr67089-6.c
    trunk/gcc/testsuite/gcc.dg/pr67089-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c
Comment 8 Jakub Jelinek 2015-11-25 21:14:36 UTC
Fixed on the trunk so far.  Wonder if this is desirable to backport.
Comment 9 Richard Biener 2016-08-03 11:49:07 UTC
GCC 4.9 branch is being closed
Comment 10 Jakub Jelinek 2017-10-10 13:50:29 UTC
GCC 5 branch has been closed, should be fixed in GCC 6 and later.