This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/86891] [9 Regression] wrong code with -O -frerun-cse-after-loop -fno-tree-dominator-opts -fno-tree-fre
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 14 Nov 2018 16:03:51 +0000
- Subject: [Bug target/86891] [9 Regression] wrong code with -O -frerun-cse-after-loop -fno-tree-dominator-opts -fno-tree-fre
- Auto-submitted: auto-generated
- References: <bug-86891-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86891
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |collison at gcc dot gnu.org,
| |jakub at gcc dot gnu.org,
| |ktkachov at gcc dot gnu.org,
| |rth at gcc dot gnu.org
Component|tree-optimization |target
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r262890.
I believe the bug is in the RTL of the *sub<mode>3_carryinCV* patterns (though
I admit I don't know exactly what those insns do on aarch64).
Through -O and disabling of some GIMPLE opts we end up with arguments to
SUB_OVERFLOW that aren't constant during expansion, but will be turned into
constants during RTL optimizations.
We effectively are checking if unsigned subtraction 0xfff9U -
0xfffffffffffffffffffffffffffffff8U overflows.
The aarch64 expander has a pattern for usubvti4, and on input we have
that 0xfff9U constant in (reg:TI 111) and the -(unsigned __int128) 8 constant
in (reg:TI 117). The expander emits:
(insn 23 22 24 2 (parallel [
(set (reg:CC 66 cc)
(compare:CC (subreg:DI (reg:TI 111) 0)
(subreg:DI (reg:TI 117) 0)))
(set (reg:DI 121)
(minus:DI (subreg:DI (reg:TI 111) 0)
(subreg:DI (reg:TI 117) 0)))
]) "pr86891.c":9 298 {subdi3_compare1}
(nil))
(insn 24 23 25 2 (set (reg:DI 123)
(subreg:DI (reg:TI 111) 8)) "pr86891.c":9 47 {*movdi_aarch64}
(nil))
(insn 25 24 26 2 (parallel [
(set (reg:CC 66 cc)
(compare:CC (sign_extend:TI (reg:DI 123))
(plus:TI (sign_extend:TI (subreg:DI (reg:TI 117) 8))
(ltu:TI (reg:CC 66 cc)
(const_int 0 [0])))))
(set (reg:DI 122)
(minus:DI (minus:DI (reg:DI 123)
(subreg:DI (reg:TI 117) 8))
(ltu:DI (reg:CC 66 cc)
(const_int 0 [0]))))
]) "pr86891.c":9 367 {*subdi3_carryinCV}
(nil))
(insn 26 25 27 2 (set (subreg:DI (reg:TI 120) 0)
(reg:DI 121)) "pr86891.c":9 47 {*movdi_aarch64}
(nil))
(insn 27 26 28 2 (set (subreg:DI (reg:TI 120) 8)
(reg:DI 122)) "pr86891.c":9 47 {*movdi_aarch64}
(nil))
(jump_insn 28 27 71 2 (set (pc)
(if_then_else (ltu (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 31)
(pc))) "pr86891.c":9 9 {condjump}
(int_list:REG_BR_PROB 536868 (nil))
-> 31)
I assume these instructions DTRT at runtime.
Now consider what happens when you propagate those constants into that RTL.
insn 23 sets pseudo 121 to 0xfff9ULL - (-8ULL), i.e. 0x10001ULL with overflow
(first argument is smaller than second argument), so I expect carry is set.
Now, insn 25 I'd hope performs r122 = 0ULL - (-1ULL) - 1 (the last being
carry), which is I think modelled properly by the (set (reg:DI 122) (minus
...)) part of the insn.
What is incorrect is the compare part of insn 25. Because it is compare 0
(-(unsigned __int128)1 + carry), i.e. comparison of 0 and 0, so no CC is set,
even when I hope
the instruction actually should set carry (if it doesn't it couldn't be used
even at runtime, because 0xfff9ULL - (-(unsigned __int128)8) does overflow.
Note, the i386.md pattern which I'd think would be similar in what it does at
runtime, is:
(define_insn "sub<mode>3_carry_ccc"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extend:<DWI> (match_operand:DWIH 1 "register_operand" "0"))
(plus:<DWI>
(ltu:<DWI> (reg:CC FLAGS_REG) (const_int 0))
(zero_extend:<DWI>
(match_operand:DWIH 2 "x86_64_sext_operand" "rmWe")))))
(clobber (match_scratch:DWIH 0 "=r"))]
""
"sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
[(set_attr "type" "alu")
(set_attr "mode" "<MODE>")])
i.e. instead of comparing (sign_extend:TI (operand 1)) with (plus:TI
(sign_extend:TI (operand 2)) (ltu:TI (cc) (const_int 0))) in aarch64 backend it
compares
(zero_extend:TI (operand 1)) with (plus:TI (ltu:TI (cc) (const_int 0))
(zero_extend:TI (operand 2)). Not really sure if the order of carry and
extension of operand 2 matters that much
(needs to be checked in combine log if it should be matches by combiner in some
cases), but I think the zero_extend vs. sign_extend is significant.
Now, looking at what aarch64 does for add with carry, there are separate
patterns like add<mode>3_carryinC which set CC_C mode and use zero_extend and
add<mode>3_carryinV which sets CC_V mode and uses sign_extend.
So, shouldn't sub<mode>3_carryin{C,V} be split similarly and if we check carry
flag, we should use subdi3_carryinC?