[PATCH, i386]: Fix PR target/13958: Conversion from unsigned to double is painfully slow on P4

Fri Mar 21 20:52:00 GMT 2008

Hello!

Due to the store forwarding penalty (as partial memory access is 
called nowadays), the code from the PR runs "painfully" slow:

--cut here--
unsigned a[2]={1,2};

inline unsigned foo1(int i) { return a[i]; }

int main()
{
    double x=0;
    int    i;

    for ( i=0; i<100000000; ++i )
        x+=foo1(i%2);

    return (int)x;
}
--cut here--

The inner loop is compiled (-O2 -march=pentium4 -malign-double) to:

.L4:
        movl    %ecx, %eax
        andl    $1, %eax
        movl    a(,%eax,4), %eax
        xorl    %edx, %edx
(*)    pushl   %edx
(*)    pushl   %eax
(*)    fildll  (%esp)
        addl    $8, %esp
        faddp   %st, %st(1)
        addl    $1, %ecx
        cmpl    $100000000, %ecx
        jne     .L4

The instructions marked with (*) form the partial memory access: two 
32-bit stores (pushl) followed by one 64-bit load (fildll) from the 
same location.
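
For readers who prefer C to assembly, here is a rough model of what the 
marked sequence does. It is only a sketch: the function name is made up 
for illustration and is not part of the patch, and it assumes a 
little-endian layout as on x86. The memcpy calls stand in for the two 
pushl stores and the fildll load; an optimizing compiler is free to 
generate something entirely different from this source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch.  Models the unpatched
   sequence: the value is zero-extended by writing the high and low
   32-bit halves separately, then the slot is read back as a single
   signed 64-bit integer.  Assumes little-endian layout, as on x86.  */
double uns_to_double_split_stores(uint32_t u)
{
    unsigned char slot[8];
    uint32_t high = 0;
    int64_t wide;

    memcpy(slot + 4, &high, sizeof high);  /* pushl %edx: upper half = 0 */
    memcpy(slot, &u, sizeof u);            /* pushl %eax: lower half = u */
    memcpy(&wide, slot, sizeof wide);      /* fildll (%esp): 64-bit load */

    /* The upper half is zero, so the signed value is never negative
       and the signed conversion yields the unsigned value exactly.  */
    assert(wide >= 0);
    return (double) wide;
}
```

The load covers both stores, so it can not be satisfied by forwarding 
either one of them, and this is where the penalty comes from.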

Runtime:

time ./a.out

real    0m0.794s
user    0m0.724s
sys     0m0.000s

Patched gcc creates:

.L4:
        movl    %edx, %eax
        andl    $1, %eax
        movd    a(,%eax,4), %xmm0
        movq    %xmm0, -16(%ebp)
        fildll  -16(%ebp)
        faddp   %st, %st(1)
        addl    $1, %edx
        cmpl    $100000000, %edx
        jne     .L4

time ./a.out

real    0m0.123s
user    0m0.124s
sys     0m0.000s
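
The idea behind the patched sequence can be sketched in C as well. 
Again, this is an illustration under assumptions, not the code from the 
patch: the function name is invented, and the SSE2 intrinsics are used 
only to mirror the movd/movq pair (with a plain fallback when SSE2 is 
not available). The compiler may choose other instructions for this 
source.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not from the patch.  Models the patched
   sequence: the zero-extended value is written to memory with ONE
   64-bit store, so the following 64-bit load matches it exactly.  */
double uns_to_double_single_store(uint32_t u)
{
    int64_t slot;

#if defined(__SSE2__)
    __m128i v = _mm_cvtsi32_si128((int) u);  /* movd: bits 32..127 are zero */
    _mm_storel_epi64((__m128i *) &slot, v);  /* movq: one 64-bit store */
#else
    uint64_t wide = u;                       /* portable fallback */
    memcpy(&slot, &wide, sizeof slot);
#endif

    return (double) slot;                    /* 64-bit signed load + convert */
}
```

The store and the load now have the same size and address, which is the 
case that store forwarding handles well.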

This represents a speedup of more than 5.8x (in user time) on what is 
claimed to be a common case:

--quote--

Btw, such conversions are quite common in numerical codes that deal
with uniform grids: the array index can be used as a coordinate (usually
after some trivial scaling). Given that the indices used in libstdc++
are usually of the type size_t the slow conversion can have quite a
negative performance impact.

--unquote--

I guess that such a speedup comes in quite handy. This code prefers DImode 
aligned to 8, since we are dealing with real DImode values. H.J. - 
should we align DImode values to 8 for TARGET_MMX/TARGET_SSE?

2008-03-21  Uros Bizjak  <ubizjak@gmail.com>

        PR target/13958
        * config/i386/i386.md ("*floatunssi<mode>2_1"): New pattern with
        corresponding post-reload splitters.
        ("floatunssi<mode>2"): Expand to unsigned_float x87 insn pattern
        when x87 FP math is selected.
        * config/i386/i386-protos.h (ix86_expand_convert_uns_sixf_sse):
        New function prototype.
        * config/i386/i386.c (ix86_expand_convert_uns_sixf_sse): New
        unreachable function to ease macroization of insn patterns.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu 
{,-m32} and has been committed to SVN.

RMs, do we want this patch in 4.3.1, although it isn't strictly a 
regression?

Uros.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20080321/c922e916/attachment.txt>