This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug target/86891] [9 Regression] wrong code with -O -frerun-cse-after-loop -fno-tree-dominator-opts -fno-tree-fre


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86891

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |collison at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |ktkachov at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org
          Component|tree-optimization           |target

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r262890.
I believe the bug is in the RTL of the *sub<mode>3_carryinCV* patterns (though
I admit I don't know exactly what those insns do on aarch64).

Through -O and disabling of some GIMPLE opts we end up with arguments to
SUB_OVERFLOW that aren't constant during expansion, but will be turned into
constants during RTL optimizations.

We effectively are checking if unsigned subtraction 0xfff9U -
0xfffffffffffffffffffffffffffffff8U overflows.
The aarch64 expander has a pattern for usubvti4, and on input we have
that 0xfff9U constant in (reg:TI 111) and the -(unsigned __int128) 8 constant
in (reg:TI 117).  The expander emits:
(insn 23 22 24 2 (parallel [
            (set (reg:CC 66 cc)
                (compare:CC (subreg:DI (reg:TI 111) 0)
                    (subreg:DI (reg:TI 117) 0)))
            (set (reg:DI 121)
                (minus:DI (subreg:DI (reg:TI 111) 0)
                    (subreg:DI (reg:TI 117) 0)))
        ]) "pr86891.c":9 298 {subdi3_compare1}
     (nil))
(insn 24 23 25 2 (set (reg:DI 123)
        (subreg:DI (reg:TI 111) 8)) "pr86891.c":9 47 {*movdi_aarch64}
     (nil))
(insn 25 24 26 2 (parallel [
            (set (reg:CC 66 cc)
                (compare:CC (sign_extend:TI (reg:DI 123))
                    (plus:TI (sign_extend:TI (subreg:DI (reg:TI 117) 8))
                        (ltu:TI (reg:CC 66 cc)
                            (const_int 0 [0])))))
            (set (reg:DI 122)
                (minus:DI (minus:DI (reg:DI 123)
                        (subreg:DI (reg:TI 117) 8))
                    (ltu:DI (reg:CC 66 cc)
                        (const_int 0 [0]))))
        ]) "pr86891.c":9 367 {*subdi3_carryinCV}
     (nil))
(insn 26 25 27 2 (set (subreg:DI (reg:TI 120) 0)
        (reg:DI 121)) "pr86891.c":9 47 {*movdi_aarch64}
     (nil))
(insn 27 26 28 2 (set (subreg:DI (reg:TI 120) 8)
        (reg:DI 122)) "pr86891.c":9 47 {*movdi_aarch64}
     (nil))
(jump_insn 28 27 71 2 (set (pc)
        (if_then_else (ltu (reg:CC 66 cc)
                (const_int 0 [0]))
            (label_ref 31)
            (pc))) "pr86891.c":9 9 {condjump}
     (int_list:REG_BR_PROB 536868 (nil))
 -> 31)

I assume these instructions DTRT at runtime.
Now consider what happens when you propagate those constants into that RTL.
insn 23 sets pseudo 121 to 0xfff9ULL - (-8ULL), i.e. 0x10001ULL with overflow
(first argument is smaller than second argument), so I expect carry is set.

Now, insn 25 I'd hope performs r122 = 0ULL - (-1ULL) - 1 (the last being
carry), which is I think modelled properly by the (set (reg:DI 122) (minus
...)) part of the insn.
What is incorrect is the compare part of insn 25.  Because it is compare 0
(-(unsigned __int128)1 + carry), i.e. comparison of 0 and 0, so no CC is set,
even when I hope
the instruction actually should set carry (if it doesn't it couldn't be used
even at runtime, because 0xfff9ULL - (-(unsigned __int128)8) does overflow.

Note, the i386.md pattern which I'd think would be similar in what it does at
runtime, is:
(define_insn "sub<mode>3_carry_ccc"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extend:<DWI> (match_operand:DWIH 1 "register_operand" "0"))
          (plus:<DWI>
            (ltu:<DWI> (reg:CC FLAGS_REG) (const_int 0))
            (zero_extend:<DWI>
              (match_operand:DWIH 2 "x86_64_sext_operand" "rmWe")))))
   (clobber (match_scratch:DWIH 0 "=r"))]
  ""
  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "<MODE>")])

i.e. instead of comparing (sign_extend:TI (operand 1)) with (plus:TI
(sign_extend:TI (operand 2)) (ltu:TI (cc) (const_int 0))) in aarch64 backend it
compares
(zero_extend:TI (operand 1)) with (plus:TI (ltu:TI (cc) (const_int 0))
(zero_extend:TI (operand 2)).  Not really sure if the order of carry and
extension of operand 2 matters that much
(needs to be checked in combine log if it should be matches by combiner in some
cases), but I think the zero_extend vs. sign_extend is significant.

Now, looking at what aarch64 does for add with carry, there are separate
patterns like add<mode>3_carryinC which set CC_C mode and use zero_extend and
add<mode>3_carryinV which sets CC_V mode and uses sign_extend.
So, shouldn't sub<mode>3_carryin{C,V} be split similarly and if we check carry
flag, we should use subdi3_carryinC?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]