[PATCH, i386]: Fix PR target/26915: -1 should be loaded as fld1;fchs

Uros Bizjak ubizjak@gmail.com
Sat Nov 4 19:49:00 GMT 2006


Roger Sayle wrote:

>>        PR target/26915
>>        * config/i386/i386.c (standard_80387_constant_p): When optimizing
>>        for size, treat -1 as a valid 80387 constant.
>>    
>>
>Cool.  PR26915 was one of the PRs on my list.  This is Ok for mainline.
>
>Two minor things though.  The first is that I believe that a near
>identical fix should also allow us to handle -0.0 (not very common
>but should keep the IEEE conscious people happy).
>
Indeed! In the revised patch, handling of -0.0 is also included.

>  I'll pre-approve
>either a follow-up or a revised version of your patch with that
>change.  Secondly, have you done any benchmarking with this change
>enabled for !optimize_size?  Although it's clearly a size win, there's
>also the possibility that avoiding the memory traffic, and reducing
>code size may be faster on some cores.  If the numbers are in the noise,
>we should leave the optimize_size tests in, but we should at least
>check.  Perhaps H.J., Evandro or Honza might know the answer?
>  
>
Yes, I have some surprising numbers. Consider this testcase:

--cut here--
#include <stdio.h>

int main()
{
  int i;
  double sum = 0.0;

  for (i = 0; i < 1000000000; i++)
    sum += (i & 0x01) ? 1.0 : -1.0;

  printf("%f\n", sum);

  return 0;
}
--cut here--

When the test is compiled with unpatched gcc (-m32 -Os):
user    0m3.069s
and when this test is compiled with patched gcc (-m32 -Os):
user    0m2.511s

This is on:
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 47
model name      : AMD Athlon(tm) 64 Processor 3000+
stepping        : 2
cpu MHz         : 1809.287
cache size      : 512 KB

The difference between assembler dumps is:
--- bench.s_    2006-11-04 20:03:02.000000000 +0100
+++ bench.s     2006-11-04 20:03:20.000000000 +0100
@@ -18,7 +18,8 @@
 .L2:
        testb   $1, %al
        jne     .L3
-       flds    .LC1
+       fld1
+       fchs
        jmp     .L5
 .L3:
        fld1
@@ -37,9 +38,5 @@
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
-       .section        .rodata.cst4,"aM",@progbits,4
-       .align 4
-.LC1:
-       .long   3212836864
-       .ident  "GCC: (GNU) 4.3.0 20061103 (experimental)"
+       .ident  "GCC: (GNU) 4.3.0 20061104 (experimental)"
        .section        .note.GNU-stack,"",@progbits
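
For reference, the removed .LC1 entry is the IEEE 754 single-precision encoding of -1.0: 3212836864 is 0xBF800000 (sign bit set, biased exponent 127, zero mantissa), so the old `flds .LC1` loaded -1.0 from memory. A minimal sketch that checks this, assuming float is the 32-bit IEEE format as on i386; the helper name float_from_bits is hypothetical and is not part of the patch:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes float is the 32-bit IEEE 754 single-precision format. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type punning */
    return f;
}
```

Calling float_from_bits(3212836864u) gives the value that the constant-pool load used to produce.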

I think that we can safely enable this optimization for other 
optimization levels.

Attached is a new revision of the patch. It was bootstrapped on 
x86_64-pc-linux-gnu; regression testing is in progress.
If it is still OK, I'll commit this patch to mainline as soon as 
regression testing finishes (a couple of hours).

2006-11-04  Uros Bizjak  <ubizjak@gmail.com>

        PR target/26915
        * config/i386/i386.c (standard_80387_constant_p): Treat -0.0 and
        -1.0 as a valid 80387 constant.
        (standard_80387_constant_opcode): Return "#" for -0.0 and -1.0.
        * config/i386/i386.md (unnamed splitter): Split the load of
        constant -0.0 or -1.0  into the load of 0.0 or 1.0, followed
        by negation.

testsuite/ChangeLog:

2006-11-04  Uros Bizjak  <ubizjak@gmail.com>

        PR target/26915
        * gcc.target/i386/387-12.c: New test.
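
The classification described in the ChangeLog above can be sketched as a standalone C function. This is only an illustration of the idea and not the code in i386.c: the enum, its values and the function name are hypothetical. The one subtle point is that -0.0 compares equal to 0.0, so the sign bit has to be tested explicitly to tell the two apart.

```c
#include <math.h>

/* Hypothetical classification, not GCC's actual return values. */
enum x87_const_kind {
    X87_NONE,        /* needs a constant-pool load */
    X87_FLDZ,        /*  0.0: fldz */
    X87_FLD1,        /*  1.0: fld1 */
    X87_FLDZ_FCHS,   /* -0.0: fldz; fchs */
    X87_FLD1_FCHS    /* -1.0: fld1; fchs */
};

static enum x87_const_kind classify_x87_constant(double x)
{
    if (x == 0.0)    /* also true for -0.0, so check the sign bit */
        return signbit(x) ? X87_FLDZ_FCHS : X87_FLDZ;
    if (x == 1.0)
        return X87_FLD1;
    if (x == -1.0)
        return X87_FLD1_FCHS;
    return X87_NONE;
}
```

In this sketch the two FCHS kinds correspond to the constants that the splitter turns into a load of 0.0 or 1.0 followed by negation.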

BTW: Unfortunately, we split too late for gcse-after-reload to 
eliminate one fld1:
.L2:
        testb   $1, %al
        jne     .L3
        fld1
        fchs
        jmp     .L5
.L3:
        fld1
.L5:

Nothing can be done in this case other than to move the post-reload 
split before the gcse-after-reload pass. This would result in:
.L2:
        testb   $1, %al
        fld1
        jne     .L3
        fchs
.L3:
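
At the C level, the hoisted sequence above corresponds roughly to loading 1.0 unconditionally and negating it only on the even path. A minimal sketch of that equivalence; both function names are hypothetical, and the comments map statements to the instructions shown above only approximately:

```c
/* Form used in the testcase: select between two constants. */
static double select_constant(int i)
{
    return (i & 0x01) ? 1.0 : -1.0;
}

/* Shape of the hoisted sequence: one load, conditional negation. */
static double load_then_negate(int i)
{
    double v = 1.0;       /* fld1, executed on both paths */
    if (!(i & 0x01))      /* testb $1, %al ; jne .L3 */
        v = -v;           /* fchs, even i only */
    return v;
}
```

Both functions return the same value for every i, which is why a single fld1 is sufficient.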

Uros.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i386-fldm1-2.diff
Type: text/x-patch
Size: 3011 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20061104/df275332/attachment.bin>

