This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/68381] New: [6 Regression] wrong code and quality regression with __builtin_mul_overflow() @ aarch64
- From: "zsojka at seznam dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 17 Nov 2015 12:46:21 +0000
- Subject: [Bug target/68381] New: [6 Regression] wrong code and quality regression with __builtin_mul_overflow() @ aarch64
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68381
Bug ID: 68381
Summary: [6 Regression] wrong code and quality regression with
__builtin_mul_overflow() @ aarch64
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---
Created attachment 36733
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36733&action=edit
reduced testcase
Output:
$ aarch64-unknown-linux-gnu-gcc -O -fexpensive-optimizations -fno-tree-bit-ccp testcase.c
$ ./a.out
Aborted
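The function foo() is miscompiled: in the trunk output below, the uxth/umull
sequence is gone, so tbnz tests bit 31 of the incoming argument instead of the
product.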
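The attachment itself is not reproduced in this message. Purely as an illustration, a function of roughly the following shape would be consistent with the assembly listings in this report. The names foo_sketch and check_foo_sketch, the signature and the test values are assumptions, not the contents of attachment 36733.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the reduced testcase (an assumption, not the
   attachment): multiply two 16-bit unsigned values into an int and abort
   if the result does not fit.  */
__attribute__((noinline))
int foo_sketch(unsigned short x, unsigned short y)
{
    int r;
    /* __builtin_mul_overflow computes x * y in infinite precision and
       returns nonzero if the result cannot be represented in *(&r).  */
    if (__builtin_mul_overflow(x, y, &r))
        abort();
    return r;
}

/* With a correct compiler these products fit in int, so neither call
   aborts and the returned values match plain multiplication.  */
void check_foo_sketch(void)
{
    assert(foo_sketch(1, 2) == 2);
    assert(foo_sketch(300, 300) == 90000);
}
```

Calling check_foo_sketch() from main() should return silently with a correct compiler. If the multiplication were dropped, as in the trunk listing, the returned value would be wrong.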
5-branch output:
foo:
uxth w0, w0
uxth w1, w1
umull x0, w0, w1
tbz w0, #31, .L7
stp x29, x30, [sp, -16]!
add x29, sp, 0
bl abort
.L7:
ret
trunk output:
foo:
tbnz w0, #31, .L10
ret
.L10:
stp x29, x30, [sp, -16]!
add x29, sp, 0
bl abort
Things seem to break in .combine if -fexpensive-optimizations is enabled.
Before .combine, there is:
(insn 2 5 3 2 (set (reg/v:SI 80 [ xD.2712 ])
(zero_extend:SI (reg:HI 0 x0 [ xD.2712 ]))) testcase.c:3 82
{*zero_extendhisi2_aarch64}
(expr_list:REG_DEAD (reg:HI 0 x0 [ xD.2712 ])
(nil)))
(insn 3 2 4 2 (set (reg/v:SI 81 [ yD.2713 ])
(zero_extend:SI (reg:HI 1 x1 [ yD.2713 ]))) testcase.c:3 82
{*zero_extendhisi2_aarch64}
(expr_list:REG_DEAD (reg:HI 1 x1 [ yD.2713 ])
(nil)))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:SI 76 [ _5+4 ])
(const_int 0 [0])) testcase.c:5 39 {*movsi_aarch64}
(nil))
(insn 8 7 9 2 (set (reg:DI 82)
(mult:DI (zero_extend:DI (reg/v:SI 80 [ xD.2712 ]))
(zero_extend:DI (reg/v:SI 81 [ yD.2713 ])))) testcase.c:5 360
{umulsidi3}
(expr_list:REG_DEAD (reg/v:SI 81 [ yD.2713 ])
(expr_list:REG_DEAD (reg/v:SI 80 [ xD.2712 ])
(nil))))
(insn 9 8 10 2 (set (reg:DI 83)
(lshiftrt:DI (reg:DI 82)
(const_int 32 [0x20]))) testcase.c:5 614
{*aarch64_lshr_sisd_or_int_di3}
(nil))
(insn 10 9 39 2 (set (reg:CC 66 cc)
(compare:CC (subreg:SI (reg:DI 83) 0)
(const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
(expr_list:REG_UNUSED (reg:CC 66 cc)
(nil)))
...
(insn 43 42 44 2 (set (reg:CC 66 cc)
(compare:CC (subreg:SI (reg:DI 82) 0)
(const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
(nil))
and .combine shows:
Trying 2, 8, 9 -> 10:
Successfully matched this instruction:
(set (reg:DI 83)
(const_int 0 [0]))
(const_int 0 [0])
which seems to miss the parallel set of reg 82.
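To illustrate why the constant 0 for reg 83 is plausible on its own while dropping reg 82 is not, here is a minimal C model of insns 8 and 9 from the dump above. The function names are hypothetical, and the model assumes the inputs are 16-bit values that have been zero-extended, as insns 2 and 3 do.

```c
#include <assert.h>
#include <stdint.h>

/* Models insn 8: reg 82 = zero_extend(x) * zero_extend(y) in 64 bits.  */
uint64_t model_reg82(uint16_t x, uint16_t y)
{
    return (uint64_t)x * (uint64_t)y;
}

/* Models insn 9: reg 83 = reg 82 >> 32 (logical shift).  */
uint64_t model_reg83(uint16_t x, uint16_t y)
{
    return model_reg82(x, y) >> 32;
}

/* The largest product of two 16-bit values is 0xffff * 0xffff =
   0xfffe0001, which is below 2^32, so the high word (reg 83) is always
   zero.  The low word (reg 82) still carries the product and is read
   later by insn 43, so it must remain set.  */
void check_model(void)
{
    assert(model_reg83(0xffff, 0xffff) == 0);
    assert(model_reg82(0xffff, 0xffff) == 0xfffe0001u);
}
```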
The performance regression is at -O3:
5-branch output:
foo:
uxth x0, w0 // xD.2664, xD.2664
uxth x1, w1 // yD.2665, yD.2665
mul x0, x0, x1 // tmp84, xD.2664, yD.2665
cmp x0, x0, sxtw // tmp84, tmp84
bne .L9 //,
ret
.L9:
stp x29, x30, [sp, -16]! //,,,
add x29, sp, 0 //,,
bl abort //
trunk output:
foo:
uxth w0, w0 // xD.2712, xD.2712
uxth w1, w1 // yD.2713, yD.2713
umull x0, w0, w1 // tmp81, xD.2712, yD.2713
tbnz w0, #31, .L6 // tmp81,
mov w2, 0 // _5,
cbnz w2, .L6 // _5,
ret
.L6:
stp x29, x30, [sp, -16]! //,,,
add x29, sp, 0 //,,
bl abort //
The code:
mov w2, 0 // _5,
cbnz w2, .L6 // _5,
seems to be absolutely unneeded.
I don't know whether the wrong code and the missed optimization are related.
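For comparison, the two -O3 overflow checks can be written out in C. This is only a sketch under the assumption that both operands are 16-bit unsigned values. The function names are made up for illustration, and neither function is taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of v to 64 bits without relying on
   implementation-defined narrowing conversions.  */
static int64_t sext32(uint64_t v)
{
    uint32_t low = (uint32_t)v;
    return (low & 0x80000000u) ? (int64_t)low - INT64_C(0x100000000)
                               : (int64_t)low;
}

/* 5-branch style ("cmp x0, x0, sxtw"): overflow if the 64-bit product
   differs from the sign extension of its low 32 bits.  */
int overflows_sxtw(uint16_t x, uint16_t y)
{
    uint64_t p = (uint64_t)x * y;
    return (int64_t)p != sext32(p);
}

/* trunk style: overflow if bit 31 of the product is set or the high
   word is nonzero.  For 16-bit operands the high word is always 0.  */
int overflows_bit31(uint16_t x, uint16_t y)
{
    uint64_t p = (uint64_t)x * y;
    uint32_t high = (uint32_t)(p >> 32);
    return ((p >> 31) & 1) != 0 || high != 0;
}

/* Samples the 16-bit input space and checks that both tests agree.  */
void check_equivalence(void)
{
    for (uint32_t x = 0; x <= 0xffff; x += 257)
        for (uint32_t y = 0; y <= 0xffff; y += 257)
            assert(overflows_sxtw((uint16_t)x, (uint16_t)y)
                   == overflows_bit31((uint16_t)x, (uint16_t)y));
}
```

Since the high word cannot be nonzero for these operand widths, the bit 31 test alone decides the result, which matches the observation that the mov/cbnz pair is redundant.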
$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/mnt/svn/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /mnt/svn/gcc-trunk//configure --enable-checking=yes,rtl,df
--enable-languages=c,c++
--prefix=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/
--without-cloog --without-ppl --without-isl --host=x86_64-pc-linux-gnu
--target=aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--with-sysroot=/home/aarch64-chroot
--with-as=/usr/libexec/gcc/aarch64-unknown-linux-gnu/as
--with-ld=/usr/libexec/gcc/aarch64-unknown-linux-gnu/ld
Thread model: posix
gcc version 6.0.0 20151116 (experimental) (GCC)
Tested revisions:
trunk r230409 - FAIL
5-branch r229483 - OK