This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Fix PR target/13958: Conversion from unsigned to double is painfully slow on P4


On Fri, Mar 21, 2008 at 8:26 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
>  Due to store forwarding penalty (this is how partial memory access is
>  called nowadays), the code from PR runs "painfully" slow:
>
>  --cut here--
>  unsigned a[2]={1,2};
>
>  inline unsigned foo1(int i) { return a[i]; }
>
>  int main()
>  {
>     double x=0;
>     int    i;
>
>     for ( i=0; i<100000000; ++i )
>         x+=foo1(i%2);
>
>     return (int)x;
>  }
>  --cut here--
>
>  The inner loop is compiled (-O2 -march=pentium4 -malign-double) to:
>
>  .L4:
>         movl    %ecx, %eax
>         andl    $1, %eax
>         movl    a(,%eax,4), %eax
>         xorl    %edx, %edx
>  (*)    pushl   %edx
>  (*)    pushl   %eax
>  (*)    fildll  (%esp)
>         addl    $8, %esp
>         faddp   %st, %st(1)
>         addl    $1, %ecx
>         cmpl    $100000000, %ecx
>         jne     .L4
>
>  Instructions marked with (*) form partial memory access.
>
>  Runtime:
>
>  time ./a.out
>
>  real    0m0.794s
>  user    0m0.724s
>  sys     0m0.000s
>
>  Patched gcc creates:
>
>  .L4:
>         movl    %edx, %eax
>         andl    $1, %eax
>         movd    a(,%eax,4), %xmm0
>         movq    %xmm0, -16(%ebp)
>         fildll  -16(%ebp)
>         faddp   %st, %st(1)
>         addl    $1, %edx
>         cmpl    $100000000, %edx
>         jne     .L4
>
>  time ./a.out
>
>  real    0m0.123s
>  user    0m0.124s
>  sys     0m0.000s
>
>  This represents more than 5.8x speedup on what is claimed as:
>
>  --quote--
>
>  Btw, such conversions are quite common in numerical codes that deal
>  with uniform grids: the array index can be used as a coordinate (usually
>  after some trivial scaling). Given that the indices used in libstdc++
>  are usually of the type size_t the slow conversion can have quite a
>  negative performance impact.
>
>  --unquote--
>
>  I guess that such a speedup comes in quite handy. This code prefers DImode
>  aligned to 8, since we are dealing with real DImode values. H.J. -
>  should we align DImode values to 8 for TARGET_MMX/TARGET_SSE?
>
>  2008-03-21  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR target/13958
>         * config/i386/i386.md ("*floatunssi<mode>2_1"): New pattern with
>         corresponding post-reload splitters.
>         ("floatunssi<mode>2"): Expand to unsigned_float x87 insn pattern
>         when x87 FP math is selected.
>         * config/i386/i386-protos.h (ix86_expand_convert_uns_sixf_sse):
>         New function prototype.
>         * config/i386/i386.c (ix86_expand_convert_uns_sixf_sse): New
>         unreachable function to ease macroization of insn patterns.
>
>  The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
>  {,-m32} and is committed to SVN.
>
>  RMs, do we want this patch in 4.3.1, although it isn't strictly a
>  regression?

Does this only affect P4 as the PR states?  Does this have a measurable positive
impact on SPEC?

Otherwise in general no, not without overwhelming benefit.

Thanks,
Richard.
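
For readers following the two listings quoted above, here is a rough C-level
sketch of what both instruction sequences compute. It is not part of the
patch, and the function names are made up for this illustration. x87 fild
only loads signed integers, so an unsigned 32-bit value is first zero-extended
to 64 bits and then loaded with the signed 64-bit fildll. The old sequence
builds that 64-bit value with two 32-bit pushes, while the patched one writes
it with a single 64-bit movq store before the load.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: convert an unsigned 32-bit
   value to double the way the fildll-based sequences do. */
double uns32_to_double_via_i64(uint32_t u)
{
    /* Zero-extension: the upper 32 bits of the 64-bit value are 0,
       so the signed 64-bit value is never negative. */
    int64_t wide = (int64_t)u;

    /* Signed 64-bit to double. Every 32-bit value fits in the 53-bit
       mantissa of a double, so this conversion is exact. */
    return (double)wide;
}

/* Hypothetical counter-example: a signed 32-bit conversion (what a plain
   32-bit fild would do). Converting an out-of-range value to int32_t is
   implementation-defined in C; on typical x86 compilers it wraps, so
   values of 2^31 and above come out negative. */
double uns32_to_double_wrong(uint32_t u)
{
    return (double)(int32_t)u;
}
```

Under these assumptions, uns32_to_double_via_i64(u) agrees with a plain
(double)u cast for every 32-bit input, which is the result the loop in the
test case relies on.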

