Bug 86438 - [8 Regression] wrong code at -Os
Summary: [8 Regression] wrong code at -Os
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 9.0
: P2 normal
Target Milestone: 9.0
Assignee: Alexandre Oliva
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2018-07-09 11:51 UTC by Zdenek Sojka
Modified: 2021-05-14 10:51 UTC (History)
6 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work: 9.0
Known to fail: 8.1.1
Last reconfirmed: 2019-03-13 00:00:00


Attachments
reduced testcase (246 bytes, text/plain)
2018-07-09 11:51 UTC, Zdenek Sojka
Details
candidate patch (390 bytes, patch)
2018-11-08 03:45 UTC, Alexandre Oliva
Details | Diff
another candidate patch (883 bytes, patch)
2018-11-08 09:08 UTC, Alexandre Oliva
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2018-07-09 11:51:15 UTC
Created attachment 44365 [details]
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -Os testcase.c
$ ./a.out 
Aborted

Diff between -O2 -> -Os shows:
@@ -77,15 +76,15 @@
 # testcase.c:13:   u32 f = __builtin_sub_overflow_p (d, (u128) d, (u64)0);
        mov     ebx, 0  # _2,
 # testcase.c:12:   u64 d = (g ? 5 : 4);
-       setne   al      #, iftmp.0_9
-       movzx   eax, al # iftmp.0_9, iftmp.0_9
-       add     rax, 4  # iftmp.0_9,
+       cmp     rax, 1  # tmp100,
+       sbb     rax, rax        # iftmp.0_9
+       add     rax, 5  # iftmp.0_9,
 # testcase.c:13:   u32 f = __builtin_sub_overflow_p (d, (u128) d, (u64)0);
        mov     rcx, rax        # _2, iftmp.0_9
        setc    al      #, _11

The - (-O2) never sets CF (thus the last setc always sets al=0).
The + (-Os) set CF if g==0 (thus the last setc sets al=(g==0)).

The __builtin_sub_overflow_p() in fact never overflows, because it does (u64)d-(u128)d == 0 in all cases.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-262509-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-262509-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
gcc version 9.0.0 20180709 (experimental) (GCC)
Comment 1 Uroš Bizjak 2018-07-09 14:16:36 UTC
Looks like postreload cmp elimitation pass is at fault.

It converts:

(insn 64 63 76 2 (parallel [
            (set (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
                (plus:DI (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
                    (const_int 5 [0x5])))
            (clobber (reg:CC 17 flags))
        ]) "pr86438.c":17 232 {*adddi_1}
     (nil))
(insn 76 64 77 2 (set (reg:DI 4 si [orig:88 _2 ] [88])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 77 76 14 2 (set (reg:DI 5 di [ _2+8 ])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 14 77 15 2 (set (reg:DI 4 si [orig:88 _2 ] [88])
        (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 15 14 66 2 (set (reg:DI 5 di [ _2+8 ])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 66 15 67 2 (set (reg:CC 17 flags)
        (compare:CC (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
            (reg:DI 4 si [orig:88 _2 ] [88]))) "pr86438.c":18 12 {*cmpdi_1}
     (nil))
(note 67 66 78 2 NOTE_INSN_DELETED)
(insn 78 67 79 2 (set (reg:QI 2 cx [orig:96 _11+8 ] [96])
        (ltu:QI (reg:CC 17 flags)
            (const_int 0 [0]))) "pr86438.c":18 700 {*setcc_qi}
     (nil))

to:

(insn 64 63 76 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:DI (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
                        (const_int 5 [0x5]))
                    (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])))
            (set (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
                (plus:DI (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])
                    (const_int 5 [0x5])))
        ]) "pr86438.c":17 346 {*adddi3_cc_overflow_1}
     (nil))
(insn 76 64 77 2 (set (reg:DI 4 si [orig:88 _2 ] [88])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 77 76 14 2 (set (reg:DI 5 di [ _2+8 ])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 14 77 15 2 (set (reg:DI 4 si [orig:88 _2 ] [88])
        (reg:DI 0 ax [orig:94 iftmp.0_9 ] [94])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(insn 15 14 67 2 (set (reg:DI 5 di [ _2+8 ])
        (const_int 0 [0])) "pr86438.c":18 85 {*movdi_internal}
     (nil))
(note 67 15 78 2 NOTE_INSN_DELETED)
(insn 78 67 79 2 (set (reg:QI 2 cx [orig:96 _11+8 ] [96])
        (ltu:QI (reg:CCC 17 flags)
            (const_int 0 [0]))) "pr86438.c":18 700 {*setcc_qi}
     (nil))

Please note that (insn 66) is deleted and (insn 64) gets converted to an insn that sets flags reg in CCCmode (where only carry flag is valid). The original sequence sets flags reg in CCmode (where all flags are valid), so these two sequences are not identical.

Adding -fno-compare-elim fixes the test.
Comment 2 Uroš Bizjak 2018-07-09 18:52:38 UTC
Here is what happens:

compare operators in (insn 66) are substituted with their defs from (insn 64) and (insn 14). The CC mode is calculated from SELECT_CC_MODE, which really returns CCCmode. The flags reg clobber is substituted with the compare, and insn then matches {*adddi3_cc_overflow_1}.

Unfortunately, the compare is not correct. %rax from (insn 14) is already updated, and should not be compared with its previous value from (insn 64).
Comment 3 Alexandre Oliva 2018-11-08 03:16:47 UTC
Mine
Comment 4 Alexandre Oliva 2018-11-08 03:45:53 UTC
Created attachment 44970 [details]
candidate patch
Comment 5 Alexandre Oliva 2018-11-08 07:48:45 UTC
The candidate patch regresses the regression test added along with the fix for bug 49095.  At first I thought it should, but there seems to be logic in place to adjust the compare for the flags-clobbering insn.
Comment 6 Alexandre Oliva 2018-11-08 09:08:53 UTC
Created attachment 44971 [details]
another candidate patch

This one does not regress pr49095 with -m32, but -m64 scan-assembler-times fails before and after.
Comment 8 Alexandre Oliva 2018-11-09 10:16:40 UTC
Author: aoliva
Date: Fri Nov  9 10:16:09 2018
New Revision: 265957

URL: https://gcc.gnu.org/viewcvs?rev=265957&root=gcc&view=rev
Log:
[PR86438] compare-elim: cope with set of in_b

When in_a resolves to a register set in the prev_clobber insn, we may
use the SET_SRC for the compare instead.  However, when in_b so
resolves, we proceed to use the reg with its earlier value.  When both
resolve to the same register and prev_clobber is an insn that modifies
the register, this arrangement may cause the compare to match (when it
shouldn't) and the elimination of the compare to incorrectly succeed.

(set (reg 1) (plus (reg 1) (const_int N)))
(set (reg 2) (reg 1))
(set (reg flags) (compare (reg 1) (reg 2)))

in_a: (reg 1)            --> (plus (reg 1) (const_int N))
in_b: (reg 2) -> (reg 1) -/> oops

(parallel [
 (set (reg flags) (compare (plus (reg 1) (const_int N))
                           (reg 1))) ;; should be (plus...)
 (set (reg 1) (plus (reg 1) (const_int N)))])
(set (reg 2) (reg 1))

This patch arranges for in_b to also undergo SET_SRC substitution
when appropriate, with a shortcut for when in_a and in_b are the same
rtx.


for  gcc/ChangeLog

	PR rtl-optimization/86438
	* compare-elim.c (try_eliminate_compare): Use SET_SRC instead
	of in_b for the compare if in_b is SET_DEST.

for  gcc/testsuite/ChangeLog

	PR rtl-optimization/86438
	* gcc.dg/torture/pr86438.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr86438.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/compare-elim.c
    trunk/gcc/testsuite/ChangeLog
Comment 9 Andrew Stubbs 2018-11-16 10:18:20 UTC
The testcase is undefined for targets that have no 128-bit type (i.e. long long is 64-bit, and __int128 doesn't exist).

For (not yet in tree) GCN port I get "warning: right shift count >= width of type [-Wshift-count-overflow]", and then the execution test aborts.
Comment 10 Jakub Jelinek 2019-02-22 15:20:00 UTC
GCC 8.3 has been released.
Comment 11 John David Anglin 2019-03-13 21:08:29 UTC
Fails on hppa64-hp-hpux11.11 with error shown in comment 9.
Comment 12 Alexandre Oliva 2019-04-16 12:45:33 UTC
Author: aoliva
Date: Tue Apr 16 12:44:46 2019
New Revision: 270388

URL: https://gcc.gnu.org/viewcvs?rev=270388&root=gcc&view=rev
Log:
[PR86438] avoid too-long shift in test

The test fell back to long long and long when __int128 is not
available, but it assumed sizeof(long) < sizeof(long long) because of
a shift count that would be out of range for a long long if their
widths are the same.  Fixed by splitting it up into two shifts.


for  gcc/testsuite/ChangeLog

	PR rtl-optimization/86438
	* gcc.dg/torture/pr86438.c: Split up too-wide shift.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/torture/pr86438.c
Comment 13 Jakub Jelinek 2020-03-04 09:31:33 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 14 Jakub Jelinek 2021-05-14 10:51:16 UTC
The GCC 8 branch is being closed, fixed in GCC 9.1.