This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RFC - COST of const_double for x86 prevents constant copy propagation in cse


(Note: I am starting a new thread for an old discussion because the old thread was corrupted, which prevented me from responding.)

Consider the following test case:

struct S {
        double d1, d2, d3;
};

struct S ms()
{
        struct S s = {0,0,0};
        return s;
}

When compiled with -O1 -mdynamic-no-pic -march=pentium4, it produces:

        pxor    %xmm0, %xmm0
        movsd   %xmm0, 16(%eax)
        movsd   %xmm0, 8(%eax)
        movsd   %xmm0, (%eax)

However, the following code results in a 7% performance gain in eon, as reported by one of Apple's performance people:

        movl    $0, 16(%eax)
        movl    $0, 20(%eax)
        movl    $0, 8(%eax)
        movl    $0, 12(%eax)
        movl    $0, (%eax)
        movl    $0, 4(%eax)

The pxor/movsd sequence is produced because cse does not perform constant propagation on the following RTL, even though cse is capable of picking up a constant from a REG_EQUAL note:

(insn 12 7 13 0 (set (reg:DF 59)
        (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) 64 {*movdf_nointeger} (nil)
    (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
        (nil)))

(insn 13 12 15 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 16 [0x10])) [0 <result>.d3+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))

(insn 15 13 17 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 8 [0x8])) [0 <result>.d2+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))

(insn 17 15 20 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1470 ]) [0 <result>.d1+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))


The reason it does not is the definition of the COST macro, which returns a higher cost for the const_double than for the register that already holds the constant. On x86 this cost is computed by ix86_rtx_costs, which returns 1 or 2. I had a lengthy conversation with Ian Lance Taylor, who suggested lowering the const_double cost to 0. Indeed, this lowers the COST of the const_double enough that the constant wins.

However, the costs in ix86_rtx_costs appear to have been chosen carefully, which makes me cautious that this change may hurt performance on other x86 variants and/or on other benchmarks. Comments from those familiar with this cost function, or suggestions for another way to let cse do its job (such as a new, special-purpose cost function), are appreciated.

- Thanks, fariborz (fjahanian@apple.com).




