Bug 69305 - [5 Regression] wrong code with -O and int128 @ aarch64
Summary: [5 Regression] wrong code with -O and int128 @ aarch64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: 5.4
Assignee: Richard Henderson
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2016-01-15 22:35 UTC by Zdenek Sojka
Modified: 2016-02-01 07:11 UTC (History)
3 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: aarch64-unknown-linux-gnu
Build:
Known to work: 4.8.5, 4.9.4
Known to fail: 5.3.1, 6.0
Last reconfirmed: 2016-01-18 00:00:00


Attachments
reduced testcase (490 bytes, text/plain)
2016-01-15 22:35 UTC, Zdenek Sojka
Details
Extended adddi3_compare pattern (446 bytes, patch)
2016-01-19 10:07 UTC, Nick Clifton
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2016-01-15 22:35:31 UTC
Created attachment 37365 [details]
reduced testcase

Output (qemu userspace emulation):
$ aarch64-unknown-linux-gnu-gcc -v                    
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-232232-checking-yes-rtl-df-nographite-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-checking=yes,rtl,df --without-cloog --without-ppl --without-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --with-sysroot=/usr/aarch64-unknown-linux-gnu --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-232232-checking-yes-rtl-df-nographite-aarch64
Thread model: posix
gcc version 6.0.0 20160111 (experimental) (GCC) 

$ aarch64-unknown-linux-gnu-gcc -O testcase.c
$ ./a.out 
46000000000000000100000000000000
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

Tested revisions:
trunk r232232 - FAIL
5-branch r231950 - FAIL
4_9-branch r231949 - OK
4_8-branch r224828 - OK
Comment 1 H.J. Lu 2016-01-15 23:06:06 UTC
Dup.

*** This bug has been marked as a duplicate of bug 69306 ***
Comment 2 Zdenek Sojka 2016-01-15 23:10:59 UTC
(In reply to H.J. Lu from comment #1)
> typedef unsigned long int uint64_t;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It should be long long.

"unsigned long int" is 64bit on aarch64 (contrary to x32)
Comment 3 Jakub Jelinek 2016-01-18 12:52:48 UTC
Looks like aarch64 backend bug to me.
unsigned __int128 f1 (unsigned __int128 x, unsigned __int128 y) { return x + y; }
unsigned __int128 f2 (unsigned __int128 x, unsigned __int128 y) { return x - y; }
unsigned __int128 f3 (unsigned __int128 x, unsigned long long y) { return x + y; }
unsigned __int128 f4 (unsigned __int128 x, unsigned long long y) { return x - y; }
unsigned __int128 f5 (unsigned __int128 x, unsigned long long y) { return x + (unsigned long long) -y; }

This case is f5 from the above, where the value should be negated first in DImode and then added to the TImode value.  But combiner sees
(insn 7 4 10 2 (set (reg:DI 79)
        (neg:DI (reg/v:DI 77 [ y ]))) pr69305-1.c:5 331 {negdi2}
     (expr_list:REG_DEAD (reg/v:DI 77 [ y ])
        (nil)))
(insn 10 7 11 2 (parallel [
            (set (reg:CC_NZ 66 cc)
                (compare:CC_NZ (plus:DI (reg:DI 79)
                        (reg:DI 85 [ x ]))
                    (const_int 0 [0])))
            (set (reg:DI 81)
                (plus:DI (reg:DI 79)
                    (reg:DI 85 [ x ])))
        ]) pr69305-1.c:5 100 {adddi3_compare0}
     (expr_list:REG_DEAD (reg:DI 85 [ x ])
        (expr_list:REG_DEAD (reg:DI 79)
            (nil))))
(insn 11 10 25 2 (set (reg:DI 82)
        (plus:DI (geu:DI (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 86 [ x+8 ]))) pr69305-1.c:5 452 {*csinc2di_insn}
     (expr_list:REG_DEAD (reg:DI 86 [ x+8 ])
        (expr_list:REG_DEAD (reg:CC 66 cc)
            (nil))))
and attempts to merge insn 7 into insn 10 by having a minus in there instead, and that pattern is then recognized as subdi3_compare0.  But probably csinc2di_insn actually relies on the preceeding insn being result of adddi3_compare instead of subdi3_compare.
Seems on arm32 or x86_64 we have a DImode (arm32) or TImode pattern for the addition that is only lowered later on, so this is something that really can't happen on those targets.
Comment 4 Jakub Jelinek 2016-01-18 13:36:24 UTC
Reduced testcase:
__attribute__((noinline, noclone)) unsigned __int128
foo (unsigned __int128 x, unsigned long long y)
{
  return x + (unsigned long long) -y;
}

int
main ()
{
  if (foo (70, 0) != 70)
    __builtin_abort ();
  return 0;
}
Comment 5 ktkachov 2016-01-18 13:40:40 UTC
Confirmed as well.
If combine changed the plus-compare into a minus-compare, shouldn't it also go into the condition code usage and update that too though?
Comment 6 ktkachov 2016-01-18 13:43:35 UTC
(In reply to ktkachov from comment #5)
> Confirmed as well.
> If combine changed the plus-compare into a minus-compare, shouldn't it also
> go into the condition code usage and update that too though?

scratch that, I was thinking of something else
Comment 7 Jakub Jelinek 2016-01-18 14:02:05 UTC
The patterns are just weird.
(insn 10 7 11 2 (parallel [
            (set (reg:CC_NZ 66 cc)
                (compare:CC_NZ (plus:DI (reg:DI 79)
                        (reg:DI 85 [ x ]))
                    (const_int 0 [0])))
            (set (reg:DI 81)
                (plus:DI (reg:DI 79)
                    (reg:DI 85 [ x ])))
        ]) pr69305-1.c:5 100 {adddi3_compare0}
     (expr_list:REG_DEAD (reg:DI 85 [ x ])
        (expr_list:REG_DEAD (reg:DI 79)
            (nil))))
(insn 11 10 25 2 (set (reg:DI 82)
        (plus:DI (geu:DI (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 86 [ x+8 ]))) pr69305-1.c:5 452 {*csinc2di_insn}
     (expr_list:REG_DEAD (reg:DI 86 [ x+8 ])
        (expr_list:REG_DEAD (reg:CC 66 cc)
            (nil))))
This suggests that that you compare (a + b) cmp 0 and add 1 in the second insn if (a + b) u>= 0, which is always, while what you are actually interested in is whether (a + b) u< a (or equivalently (a + b) u< b), i.e. if there is an overflow in the addition.  E.g. on x86_64 (though, only after split2) it is represented that way:
(insn 33 13 34 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:DI (reg:DI 0 ax [97])
                        (reg:DI 38 r9 [orig:90 x ] [90]))
                    (reg:DI 0 ax [97])))
            (set (reg:DI 0 ax [95])
                (plus:DI (reg:DI 0 ax [97])
                    (reg:DI 38 r9 [orig:90 x ] [90])))
        ]) pr69305-1.c:5 306 {*adddi3_cc_overflow_1}
     (nil))
(insn 34 33 20 2 (parallel [
            (set (reg:DI 1 dx [+8 ])
                (plus:DI (plus:DI (ltu:DI (reg:CC 17 flags)
                            (const_int 0 [0]))
                        (reg:DI 1 dx [+8 ]))
                    (reg:DI 39 r10 [ x+8 ])))
            (clobber (reg:CC 17 flags))
        ]) pr69305-1.c:5 284 {adddi3_carry}
     (nil))
Comment 8 ktkachov 2016-01-18 14:04:55 UTC
(In reply to Jakub Jelinek from comment #7)
> The patterns are just weird.

All that comes from the addti3 expander in aarch64.md
If I delete it the testcase doesn't abort.
I'll have a closer look there.
Comment 9 ktkachov 2016-01-18 15:16:40 UTC
(In reply to ktkachov from comment #8)
> (In reply to Jakub Jelinek from comment #7)
> > The patterns are just weird.
> 
> All that comes from the addti3 expander in aarch64.md
> If I delete it the testcase doesn't abort.
> I'll have a closer look there.

Every approach I've tried to implement addti3 properly (that is, comparing (a + b) u< a instead of (a + b) u>= 0) doesn't give better code than 4.9 which didn't have the custom expanders.

Richard, you added these expanders. Any ideas on how to implement them properly and better than the fallback?
Or should we just delete them?
Comment 10 Richard Henderson 2016-01-18 21:01:01 UTC
Mine.
Comment 11 Nick Clifton 2016-01-19 10:07:46 UTC
Created attachment 37391 [details]
Extended adddi3_compare pattern

(In reply to Richard Henderson from comment #10)
> Mine.

oops - I had just started looking at this one too. :-)

I have uploaded a possible patch, although I have not got very far in testing it yet.  Maybe it will be of some use ?

Cheers
  Nick
Comment 12 Richard Henderson 2016-01-24 20:39:34 UTC
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01829.html
Comment 13 Richard Henderson 2016-01-28 17:48:55 UTC
Author: rth
Date: Thu Jan 28 17:48:22 2016
New Revision: 232936

URL: https://gcc.gnu.org/viewcvs?rev=232936&root=gcc&view=rev
Log:
PR target/69305

  * config/aarch64/aarch64-modes.def (CC_Cmode): New
  * config/aarch64/aarch64-protos.h: Update.
  * config/aarch64/aarch64.c (aarch64_zero_extend_const_eq): New.
  (aarch64_select_cc_mode): Add check for use of CC_Cmode.
  (aarch64_get_condition_code_1): Handle CC_Cmode.
  * config/aarch64/aarch64.md (addti3): Use adddi3_compareC.
  (*add<mode>3_compareC_cconly_imm): New.
  (*add<mode>3_compareC_cconly): New.
  (*add<mode>3_compareC_imm): New.
  (add<mode>3_compareC): New.
  (add<mode>3_carryin, *addsi3_carryin_uxtw): Sort compare operand
  to be first.  Use aarch64_carry_operation.
  (*add<mode>3_carryin_alt1, *addsi3_carryin_alt1_uxtw): Remove.
  (*add<mode>3_carryin_alt2, *addsi3_carryin_alt2_uxtw): Remove.
  (*add<mode>3_carryin_alt3, *addsi3_carryin_alt3_uxtw): Remove.
  (subti3): Use subdi3_compare1.
  (*sub<mode>3_compare0): Rename from sub<mode>3_compare0.
  (sub<mode>3_compare1): New.
  (*sub<mode>3_carryin0, *subsi3_carryin_uxtw): New.
  (*sub<mode>3_carryin): Use aarch64_borrow_operation.
  (*subsi3_carryin_uxtw): Likewise.
  (*ngc<mode>, *ngcsi_uxtw): Likewise.
  (*sub<mode>3_carryin_alt, *subsi3_carryin_alt_uxtw): New.
  * config/aarch64/iterators.md (DWI): New.
  * config/aarch64/predicates.md (aarch64_carry_operation): New.
  (aarch64_borrow_operation): New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64-modes.def
    trunk/gcc/config/aarch64/aarch64-protos.h
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/aarch64/aarch64.md
    trunk/gcc/config/aarch64/iterators.md
    trunk/gcc/config/aarch64/predicates.md
    trunk/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/tst_3.c
Comment 14 Richard Henderson 2016-01-29 19:43:50 UTC
Fixed for gcc-6.
Comment 15 Richard Henderson 2016-02-01 07:07:25 UTC
Author: rth
Date: Mon Feb  1 07:06:53 2016
New Revision: 233031

URL: https://gcc.gnu.org/viewcvs?rev=233031&root=gcc&view=rev
Log:
PR target/69305

  * config/aarch64/aarch64-modes.def (CC_Cmode): New
  * config/aarch64/aarch64-protos.h: Update.
  * config/aarch64/aarch64.c (aarch64_zero_extend_const_eq): New.
  (aarch64_select_cc_mode): Add check for use of CC_Cmode.
  (aarch64_get_condition_code_1): Handle CC_Cmode.
  * config/aarch64/aarch64.md (addti3): Use adddi3_compareC.
  (*add<mode>3_compareC_cconly_imm): New.
  (*add<mode>3_compareC_cconly): New.
  (*add<mode>3_compareC_imm): New.
  (add<mode>3_compareC): New.
  (add<mode>3_carryin, *addsi3_carryin_uxtw): Sort compare operand
  to be first.  Use aarch64_carry_operation.
  (*add<mode>3_carryin_alt1, *addsi3_carryin_alt1_uxtw): Remove.
  (*add<mode>3_carryin_alt2, *addsi3_carryin_alt2_uxtw): Remove.
  (*add<mode>3_carryin_alt3, *addsi3_carryin_alt3_uxtw): Remove.
  (subti3): Use subdi3_compare1.
  (*sub<mode>3_compare0): Rename from sub<mode>3_compare0.
  (sub<mode>3_compare1): New.
  (*sub<mode>3_carryin0, *subsi3_carryin_uxtw): New.
  (*sub<mode>3_carryin): Use aarch64_borrow_operation.
  (*subsi3_carryin_uxtw): Likewise.
  (*ngc<mode>, *ngcsi_uxtw): Likewise.
  (*sub<mode>3_carryin_alt, *subsi3_carryin_alt_uxtw): New.
  * config/aarch64/iterators.md (DWI): New.
  * config/aarch64/predicates.md (aarch64_carry_operation): New.
  (aarch64_borrow_operation): New.

Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/aarch64/aarch64-modes.def
    branches/gcc-5-branch/gcc/config/aarch64/aarch64-protos.h
    branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
    branches/gcc-5-branch/gcc/config/aarch64/aarch64.md
    branches/gcc-5-branch/gcc/config/aarch64/iterators.md
    branches/gcc-5-branch/gcc/config/aarch64/predicates.md
Comment 16 Richard Henderson 2016-02-01 07:11:08 UTC
Fixed for gcc-5 too.