This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RFC - COST of const_double for x86 prevents constant copy propagation in cse
- From: Fariborz Jahanian <fjahanian at apple dot com>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Cc: Ian Lance Taylor <ian at airs dot com>
- Date: Thu, 25 Aug 2005 11:09:13 -0700
- Subject: RFC - COST of const_double for x86 prevents constant copy propagation in cse
(Note: I am starting a new thread because the old thread was
corrupted, which prevented me from responding.)
The following test case:
struct S {
    double d1, d2, d3;
};
struct S ms()
{
    struct S s = {0,0,0};
    return s;
}
Compiled with -O1 -mdynamic-no-pic -march=pentium4 produces:
pxor %xmm0, %xmm0
movsd %xmm0, 16(%eax)
movsd %xmm0, 8(%eax)
movsd %xmm0, (%eax)
But the following code yields a 7% performance gain in eon, as
reported by one of Apple's performance people:
movl $0, 16(%eax)
movl $0, 20(%eax)
movl $0, 8(%eax)
movl $0, 12(%eax)
movl $0, (%eax)
movl $0, 4(%eax)
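The two sequences are interchangeable because, assuming IEEE 754 doubles, +0.0 is represented by all-zero bits, so six 32-bit zero stores write the same bytes as three 64-bit stores of 0.0. A minimal sketch of that equivalence follows; the helper names (store_as_doubles, store_as_words, stores_are_equivalent) are made up for illustration and are not part of GCC or of the test case:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S { double d1, d2, d3; };

/* Assumptions of this sketch: 64-bit doubles and no padding in struct S. */
static_assert(sizeof(double) == 2 * sizeof(uint32_t), "expects a 64-bit double");
static_assert(sizeof(struct S) == 3 * sizeof(double), "expects no padding in struct S");

/* Fill *s the way the movsd sequence does: three 8-byte stores of 0.0. */
static void store_as_doubles(struct S *s)
{
    s->d1 = 0.0;
    s->d2 = 0.0;
    s->d3 = 0.0;
}

/* Fill *s the way the movl sequence does: six 4-byte stores of integer 0. */
static void store_as_words(struct S *s)
{
    const uint32_t zero = 0;
    unsigned char *p = (unsigned char *)s;

    for (size_t i = 0; i < sizeof *s / sizeof zero; i++)
        memcpy(p + i * sizeof zero, &zero, sizeof zero);
}

/* Returns 1 when both strategies leave identical bytes in memory. */
static int stores_are_equivalent(void)
{
    struct S a, b;

    /* Start from a non-zero pattern so that every byte must be overwritten. */
    memset(&a, 0xff, sizeof a);
    memset(&b, 0xff, sizeof b);
    store_as_doubles(&a);
    store_as_words(&b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

This only shows that the integer stores are a legal replacement; it says nothing about which sequence is faster on a given x86 core.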
The first sequence is generated because cse does not propagate the
constant in the following RTL (note that cse is capable of picking up
a constant from a REG_EQUAL note):
(insn 12 7 13 0 (set (reg:DF 59)
        (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) 64 {*movdf_nointeger} (nil)
    (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
        (nil)))
(insn 13 12 15 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 16 [0x10])) [0 <result>.d3+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
(insn 15 13 17 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 8 [0x8])) [0 <result>.d2+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
(insn 17 15 20 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1470 ]) [0 <result>.d1+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
The reason cse does not propagate the constant is the definition of
the COST macro, which returns a higher cost for a const_double than
for the same constant when it is available in a register. On x86,
this cost is computed by ix86_rtx_costs, which returns 1 or 2 here. I
had a lengthy conversation with Ian Lance Taylor, who suggested
lowering the const_double cost to 0. Indeed, with that change the
const_double wins the COST comparison. But this cost appears to have
been chosen carefully in ix86_rtx_costs, so I am cautious that
changing it may hurt performance on other flavors of the x86
architecture and/or on other benchmarks. Any comments from those
familiar with this cost function (or suggestions for another way to
let cse do its job, such as a special new cost function) are
appreciated.
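To make the cost argument concrete, here is a toy model of the decision. It is not GCC's code: the names (operand_kind, toy_cost, toy_choose) are hypothetical, and it assumes that a register costs 0 and that the constant is substituted only when it is no more expensive than the register.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; none of these are GCC's real definitions. */
enum operand_kind { OPERAND_REG, OPERAND_CONST_DOUBLE };

/* Toy cost: a register is free (an assumption of this sketch); a
   const_double costs whatever the target cost hook reports. */
static int toy_cost(enum operand_kind kind, int const_double_cost)
{
    assert(const_double_cost >= 0);
    return kind == OPERAND_REG ? 0 : const_double_cost;
}

/* Assumption: the constant replaces the register only when it is no
   more expensive, so a tie goes to the constant. */
static enum operand_kind toy_choose(int const_double_cost)
{
    int reg_cost = toy_cost(OPERAND_REG, const_double_cost);
    int cst_cost = toy_cost(OPERAND_CONST_DOUBLE, const_double_cost);

    return cst_cost <= reg_cost ? OPERAND_CONST_DOUBLE : OPERAND_REG;
}
```

Under these assumptions, toy_choose(1) and toy_choose(2) keep the register, which matches the behavior described above, while toy_choose(0) selects the constant, which is the effect of the suggested change.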
- Thanks, fariborz (fjahanian@apple.com).