This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/68381] New: [6 Regression] wrong code and quality regression with __builtin_mul_overflow() @ aarch64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68381

            Bug ID: 68381
           Summary: [6 Regression] wrong code and quality regression with
                    __builtin_mul_overflow() @ aarch64
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---

Created attachment 36733
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36733&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O -fexpensive-optimizations -fno-tree-bit-ccp testcase.c
$ ./a.out
Aborted

The function foo() is miscompiled.
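
For readers without access to the attachment, here is a minimal sketch of the kind of check the assembly below suggests. It is not the attached testcase. The names product_overflows_int(), abort_on_overflow() and mul_overflow_self_check() are hypothetical, and the sketch assumes a 32-bit int and a compiler that provides __builtin_mul_overflow() (GCC 5 or later).

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if the mathematically exact product x * y does not fit in an
   int, 0 otherwise.  Assumes int is 32 bits wide. */
int product_overflows_int(unsigned short x, unsigned short y)
{
    int result;
    return __builtin_mul_overflow(x, y, &result) ? 1 : 0;
}

/* Hypothetical stand-in for foo(): abort when the product overflows. */
void abort_on_overflow(unsigned short x, unsigned short y)
{
    if (product_overflows_int(x, y))
        abort();
}

/* A correctly compiled build should pass all of these checks. */
void mul_overflow_self_check(void)
{
    assert(!product_overflows_int(2, 3));          /* 6 fits in int */
    assert(product_overflows_int(0xffff, 0xffff)); /* 0xfffe0001 > INT_MAX */
    abort_on_overflow(100, 200);                   /* 20000 fits, no abort */
}
```

Calling mul_overflow_self_check() from main() is enough to exercise the sketch. With 16-bit unsigned operands the product is always below 2^32, so overflow of the int result can only show up as bit 31 of the product being set.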

5-branch output:
foo:
        uxth    w0, w0
        uxth    w1, w1
        umull   x0, w0, w1
        tbz     w0, #31, .L7
        stp     x29, x30, [sp, -16]!
        add     x29, sp, 0
        bl      abort
.L7:
        ret

trunk output:
foo:
        tbnz    w0, #31, .L10
        ret
.L10:
        stp     x29, x30, [sp, -16]!
        add     x29, sp, 0
        bl      abort


Things seem to break in .combine if -fexpensive-optimizations is enabled.
Before .combine, there is:

(insn 2 5 3 2 (set (reg/v:SI 80 [ xD.2712 ])
        (zero_extend:SI (reg:HI 0 x0 [ xD.2712 ]))) testcase.c:3 82 {*zero_extendhisi2_aarch64}
     (expr_list:REG_DEAD (reg:HI 0 x0 [ xD.2712 ])
        (nil)))
(insn 3 2 4 2 (set (reg/v:SI 81 [ yD.2713 ])
        (zero_extend:SI (reg:HI 1 x1 [ yD.2713 ]))) testcase.c:3 82 {*zero_extendhisi2_aarch64}
     (expr_list:REG_DEAD (reg:HI 1 x1 [ yD.2713 ])
        (nil)))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:SI 76 [ _5+4 ])
        (const_int 0 [0])) testcase.c:5 39 {*movsi_aarch64}
     (nil))
(insn 8 7 9 2 (set (reg:DI 82)
        (mult:DI (zero_extend:DI (reg/v:SI 80 [ xD.2712 ]))
            (zero_extend:DI (reg/v:SI 81 [ yD.2713 ])))) testcase.c:5 360 {umulsidi3}
     (expr_list:REG_DEAD (reg/v:SI 81 [ yD.2713 ])
        (expr_list:REG_DEAD (reg/v:SI 80 [ xD.2712 ])
            (nil))))
(insn 9 8 10 2 (set (reg:DI 83)
        (lshiftrt:DI (reg:DI 82)
            (const_int 32 [0x20]))) testcase.c:5 614 {*aarch64_lshr_sisd_or_int_di3}
     (nil))
(insn 10 9 39 2 (set (reg:CC 66 cc)
        (compare:CC (subreg:SI (reg:DI 83) 0)
            (const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
     (expr_list:REG_UNUSED (reg:CC 66 cc)
        (nil)))
...
(insn 43 42 44 2 (set (reg:CC 66 cc)
        (compare:CC (subreg:SI (reg:DI 82) 0)
            (const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
     (nil))


and .combine shows:

Trying 2, 8, 9 -> 10:
Successfully matched this instruction:
(set (reg:DI 83)
    (const_int 0 [0]))
(const_int 0 [0])

which seems to drop the parallel set of reg 82, even though insn 43 still
uses reg 82.



The performance regression is at -O3:
5-branch output:
foo:
        uxth    x0, w0  // xD.2664, xD.2664
        uxth    x1, w1  // yD.2665, yD.2665
        mul     x0, x0, x1      // tmp84, xD.2664, yD.2665
        cmp     x0, x0, sxtw    // tmp84, tmp84
        bne     .L9     //,
        ret
.L9:
        stp     x29, x30, [sp, -16]!    //,,,
        add     x29, sp, 0      //,,
        bl      abort   //

trunk output:
foo:
        uxth    w0, w0  // xD.2712, xD.2712
        uxth    w1, w1  // yD.2713, yD.2713
        umull   x0, w0, w1      // tmp81, xD.2712, yD.2713
        tbnz    w0, #31, .L6    // tmp81,
        mov     w2, 0   // _5,
        cbnz    w2, .L6 // _5,
        ret
.L6:
        stp     x29, x30, [sp, -16]!    //,,,
        add     x29, sp, 0      //,,
        bl      abort   //


The code:
        mov     w2, 0   // _5,
        cbnz    w2, .L6 // _5,
seems to be entirely unnecessary: w2 is set to 0 immediately before the cbnz,
so that branch can never be taken.
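
As a reading aid, the two -O3 sequences above appear to test the same condition in different ways: the 5-branch code compares the 64-bit product with the sign extension of its low 32 bits (cmp x0, x0, sxtw), while trunk tests bit 31 of the product (tbnz w0, #31). The sketch below models both tests in C and checks that they agree for 16-bit unsigned operands. It reflects one interpretation of the assembly, and all function names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of v to 64 bits, as the sxtw operand does. */
int64_t sign_extend_low32(uint64_t v)
{
    uint32_t low = (uint32_t)v;
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* Model of "cmp x0, x0, sxtw; bne": overflow if the product differs from
   the sign extension of its own low word. */
int overflow_by_sxtw_compare(uint16_t x, uint16_t y)
{
    uint64_t wide = (uint64_t)x * (uint64_t)y;
    return (int64_t)wide != sign_extend_low32(wide);
}

/* Model of "tbnz w0, #31": overflow if bit 31 of the product is set. */
int overflow_by_bit31(uint16_t x, uint16_t y)
{
    uint64_t wide = (uint64_t)x * (uint64_t)y;
    return (int)((wide >> 31) & 1u);
}

/* Compare the two models on a handful of boundary values. */
void overflow_models_self_check(void)
{
    static const uint16_t samples[] = {
        0, 1, 2, 0x7fff, 0x8000, 0x8001, 0xfffe, 0xffff
    };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(overflow_by_sxtw_compare(samples[i], samples[j]) ==
                   overflow_by_bit31(samples[i], samples[j]));
}
```

Because the product of two 16-bit values is always below 2^32, the bit-31 test alone decides the result. That fits the observation that the extra mov/cbnz pair contributes nothing.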


I don't know whether the wrong-code and the missed-optimization issues are related.



$ aarch64-unknown-linux-gnu-gcc -v                                        
Using built-in specs.
COLLECT_GCC=/mnt/svn/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /mnt/svn/gcc-trunk//configure --enable-checking=yes,rtl,df
--enable-languages=c,c++
--prefix=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/
--without-cloog --without-ppl --without-isl --host=x86_64-pc-linux-gnu
--target=aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--with-sysroot=/home/aarch64-chroot
--with-as=/usr/libexec/gcc/aarch64-unknown-linux-gnu/as
--with-ld=/usr/libexec/gcc/aarch64-unknown-linux-gnu/ld
Thread model: posix
gcc version 6.0.0 20151116 (experimental) (GCC) 


Tested revisions:
trunk r230409 - FAIL
5-branch r229483 - OK
