[Bug rtl-optimization/93565] Combine duplicates count trailing zero instructions
segher at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Feb 4 23:38:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565
Segher Boessenkool <segher at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org
--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Well, on power9 I get just
        cmpdi 0,3,0
        beq 0,.L2
        cnttzd 3,3
        sldi 9,3,2
        lwzx 9,4,9
        or 3,9,3
        stw 3,0(4)
.L2:
        li 3,0
        blr
so it is more than just CTZ_DEFINED_VALUE_AT_ZERO = 2.
(The same happens on power7 and power8, but those don't have that neat ctz insn.)
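For reference, the assembly above is consistent with C source along the following lines. This is only a sketch read back from the generated code, not the testcase attached to the PR: the names f and p are assumptions (x appears in the RTL dumps), and __builtin_ctzl is the GCC builtin for count-trailing-zeros on unsigned long.

```c
/* Hypothetical reconstruction from the power9 assembly; the function
   name f and the parameter name p are assumptions, not from the PR.  */
int f(unsigned long x, int *p)
{
    if (x != 0) {
        /* ctz is only computed when x is nonzero (the beq skips it).  */
        int i = __builtin_ctzl(x);
        /* Load p[i], OR in the index, store to p[0].  */
        p[0] = p[i] | i;
    }
    return 0;
}
```

Note that the ctz result is used twice, once as an array index (sign-extended to 64 bits) and once as a 32-bit OR operand.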
On aarch64, combine starts with
insn_cost 4 for    43: r106:DI=x0:DI
      REG_DEAD x0:DI
insn_cost 4 for     2: r98:DI=r106:DI
      REG_DEAD r106:DI
insn_cost 4 for    44: r107:DI=x1:DI
      REG_DEAD x1:DI
insn_cost 4 for     3: r99:DI=r107:DI
      REG_DEAD r107:DI
insn_cost 4 for     7: cc:CC=cmp(r98:DI,0)
insn_cost 4 for     8: pc={(cc:CC==0)?L17:pc}
      REG_DEAD cc:CC
      REG_BR_PROB 536870916
insn_cost 4 for    10: r100:DI=ctz(r98:DI)
      REG_DEAD r98:DI
insn_cost 4 for    12: r101:DI=sign_extend(r100:DI#0)
insn_cost 16 for    14: r104:SI=[r101:DI*0x4+r99:DI]
      REG_DEAD r101:DI
insn_cost 4 for    15: r103:SI=r104:SI|r100:DI#0
      REG_DEAD r104:SI
      REG_DEAD r100:DI
insn_cost 4 for    16: [r99:DI]=r103:SI
      REG_DEAD r103:SI
      REG_DEAD r99:DI
insn_cost 4 for    23: x0:DI=0
insn_cost 0 for    24: use x0:DI
r100 (set in 10) is used later, just like r101 (set in 12).
Trying 10 -> 12:
   10: r100:DI=ctz(r98:DI)
      REG_DEAD r98:DI
   12: r101:DI=sign_extend(r100:DI#0)
Successfully matched this instruction:
(set (reg:DI 100)
    (ctz:DI (reg/v:DI 98 [ x ])))
Successfully matched this instruction:
(set (reg:DI 101 [ _9 ])
    (ctz:DI (reg/v:DI 98 [ x ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
So, it is *not* duplicating the ctz: the duplicate was already there to start
with, in some sense.
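One plausible reading of why insn 12 can be rewritten as a second ctz: the ctz of a nonzero 64-bit value lies in 0..63, so sign-extending its low 32 bits gives back the same value, and sign_extend(r100:DI#0) is just another copy of ctz(r98). The sketch below checks that identity in C. The function name is made up for illustration, and __builtin_ctzll is the GCC builtin; this is not a description of what combine does internally.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if sign-extending the low 32 bits of
   ctz(x) reproduces ctz(x) for every possible result 0..63, else 0.  */
int sign_extend_is_noop_for_ctz(void)
{
    for (int k = 0; k < 64; k++) {
        uint64_t x = (uint64_t)1 << k;              /* ctz(x) == k */
        int64_t wide = __builtin_ctzll(x);          /* like insn 10 */
        int64_t extended = (int64_t)(int32_t)wide;  /* like insn 12 */
        if (extended != wide)
            return 0;
    }
    return 1;
}
```

Only single-bit inputs are tried because ctz depends solely on the position of the lowest set bit, so these 64 values cover every result.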