[Bug target/81456] [7/8 Regression] x86-64 optimizer makes wrong decision when optimizing for size
jgreenhalgh at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jul 17 16:57:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81456
--- Comment #2 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #1)
> Confirmed, started with r238594.
The cost model relies on the target giving a reasonable approximation for an
instruction size through ix86_rtx_costs.
The basic branch structure looks like:
  t = mod
  if (a / b % 2)
    t = b - mod
In RTL, this looks like:
(insn 14 13 15 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 99)
            (const_int 0 [0]))) "foo.c":5 3 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 99)
        (nil)))
(jump_insn 15 14 16 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 22)
            (pc))) "foo.c":5 617 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 20000 (nil)))
 -> 22)
(note 16 15 17 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 17 16 22 3 (parallel [
            (set (reg/v:SI 93 [ <retval> ])
                (minus:SI (reg/v:SI 95 [ b ])
                    (reg/v:SI 93 [ <retval> ])))
            (clobber (reg:CC 17 flags))
        ]) "foo.c":5 273 {*subsi_1}
     (expr_list:REG_DEAD (reg/v:SI 95 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(code_label 22 17 25 4 1 (nil) [1 uses])
That is to say, we're starting with a comparison, a branch and a subtract. We
want to know if that sequence is cheaper than a subtract and a conditional
select.
In the cost model, we take an approximation for the branch and comparison of
COSTS_N_INSNS(2), and the backend tells us the cost of a subtract is
COSTS_N_INSNS(1). Thus, the cost before transformation is COSTS_N_INSNS(3) ==
12.
After the transformation, we create this RTL:
(insn 31 0 32 (set (reg:SI 102)
        (reg/v:SI 93 [ <retval> ])) 82 {*movsi_internal}
     (nil))
(insn 32 31 33 (parallel [
            (set (reg:SI 101)
                (minus:SI (reg/v:SI 95 [ b ])
                    (reg/v:SI 93 [ <retval> ])))
            (clobber (reg:CC 17 flags))
        ]) 273 {*subsi_1}
     (nil))
(insn 33 32 34 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 99)
            (const_int 0 [0]))) 3 {*cmpsi_ccno_1}
     (nil))
(insn 34 33 0 (set (reg/v:SI 93 [ <retval> ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:SI 101)
            (reg:SI 102))) 966 {*movsicc_noc}
     (nil))
That is a set to protect the "false" value, the same subtract, a comparison to
set the flags, and a conditional move. When we ask the backend to give us costs
for this, it gives us COSTS_N_INSNS(1) for the set, COSTS_N_INSNS(1) for the
subtract, COSTS_N_INSNS(1) for the comparison, and COSTS_N_INSNS(2) for the
conditional move. That's a total cost of COSTS_N_INSNS(5) == 20 for the whole
sequence. 20 > 12, so from the perspective of the ifcvt cost model this is a
bad transformation.
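To make the arithmetic concrete, here is a rough sketch of the comparison in
plain C. It is not the ifcvt code itself: the macro is a local stand-in for
GCC's COSTS_N_INSNS (which scales an instruction count by 4), and the helper
names seq_cost_sum, cost_before, cost_after and conversion_is_profitable are
made up for illustration. The exact comparison ifcvt performs is also
simplified here to "new cost must not exceed the old cost".

```c
/* Local stand-in for GCC's COSTS_N_INSNS; defined here so the sketch
   compiles on its own. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: add up the per-instruction costs of a sequence. */
static int seq_cost_sum(const int *costs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += costs[i];
    return total;
}

/* Original sequence: compare + branch approximated as 2 insns,
   plus the subtract as reported by the backend. */
static int cost_before(void)
{
    const int costs[] = { COSTS_N_INSNS(2), COSTS_N_INSNS(1) };
    return seq_cost_sum(costs, 2);
}

/* Converted sequence: set, subtract, compare, conditional move. */
static int cost_after(void)
{
    const int costs[] = {
        COSTS_N_INSNS(1),  /* set protecting the "false" value */
        COSTS_N_INSNS(1),  /* subtract */
        COSTS_N_INSNS(1),  /* compare */
        COSTS_N_INSNS(2),  /* conditional move */
    };
    return seq_cost_sum(costs, 4);
}

/* Simplified decision (an assumption, not the real predicate):
   accept the conversion only if it is no more expensive. */
static int conversion_is_profitable(void)
{
    return cost_after() <= cost_before();
}
```

With these numbers cost_before() is 12 and cost_after() is 20, so the sketch
rejects the conversion, matching the reasoning above.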
Note that ifcvt is not aware that an extra set will be introduced after the
original subtract, nor does it care about the final movl %edx, %eax as that is
unconditional. It thinks it is being asked to trade test, branch, subtract for
set, subtract, test, conditional move - when you spell it out like that it
should be clear why it makes the decision it does.
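For readers who prefer C to RTL, the two shapes being compared can be sketched
roughly as below. This is only an illustration: the function names with_branch
and with_select and the explicit cond parameter (standing in for the value
tested in reg 99) are assumptions, not taken from the test case in the bug.

```c
/* Branchy form: test, branch, subtract. */
static int with_branch(int cond, int b, int t)
{
    if (cond)
        t = b - t;
    return t;
}

/* If-converted form, loosely mirroring insns 31-34. */
static int with_select(int cond, int b, int t)
{
    int saved = t;      /* insn 31: copy protecting the "false" value */
    int diff = b - t;   /* insn 32: subtract, now unconditional */
    /* insns 33-34: compare against zero, then conditional move */
    return cond != 0 ? diff : saved;
}
```

Both functions return the same value for any inputs where b - t does not
overflow; the difference is purely in how many instructions each shape is
costed at.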
I can't reproduce your comment about -m32 - I still see branches at -Os.