[PATCH, i386]: Fix PR target/26915: -1 should be loaded as fld1;fchs
Uros Bizjak
ubizjak@gmail.com
Sat Nov 4 19:49:00 GMT 2006
Roger Sayle wrote:
>> PR target/26915
>> * config/i386/i386.c (standard_80387_constant_p): When optimizing
>> for size, treat -1 as a valid 80387 constant.
>>
>>
>Cool. PR26915 was one of the PRs on my list. This is Ok for mainline.
>
>Two minor things though. The first is that I believe that a near
>identical fix should also allow us to handle -0.0 (not very common
>but should keep the IEEE conscious people happy).
>
Indeed! In the revised patch, handling of -0.0 is also included.
>I'll pre-approve
>either a follow-up or a revised version of your patch with that
>change. Secondly, have you done any benchmarking with this change
>enabled for !optimize_size? Although it's clearly a size win, there's
>also the possibility that avoiding the memory traffic, and reducing
>code size may be faster on some cores. If the numbers are in the noise,
>we should leave the optimize_size tests in, but we should at least
>check. Perhaps H.J., Evandro or Honza might know the answer?
>
>
Yes, I have some surprising numbers. Consider this testcase:
--cut here--
#include <stdio.h>	/* for printf */

int main()
{
  int i;
  double sum = 0.0;

  /* Alternate between adding -1.0 (even i) and 1.0 (odd i).  */
  for (i = 0; i < 1000000000; i++)
    sum += (i & 0x01) ? 1.0 : -1.0;

  printf("%f\n", sum);
  return 0;
}
--cut here--
When the test is compiled with unpatched gcc (-m32 -Os):
user 0m3.069s
and when this test is compiled with patched gcc (-m32 -Os):
user 0m2.511s
This is on:
vendor_id : AuthenticAMD
cpu family : 15
model : 47
model name : AMD Athlon(tm) 64 Processor 3000+
stepping : 2
cpu MHz : 1809.287
cache size : 512 KB
The difference between assembler dumps is:
--- bench.s_ 2006-11-04 20:03:02.000000000 +0100
+++ bench.s 2006-11-04 20:03:20.000000000 +0100
@@ -18,7 +18,8 @@
.L2:
testb $1, %al
jne .L3
- flds .LC1
+ fld1
+ fchs
jmp .L5
.L3:
fld1
@@ -37,9 +38,5 @@
leal -4(%ecx), %esp
ret
.size main, .-main
- .section .rodata.cst4,"aM",@progbits,4
- .align 4
-.LC1:
- .long 3212836864
- .ident "GCC: (GNU) 4.3.0 20061103 (experimental)"
+ .ident "GCC: (GNU) 4.3.0 20061104 (experimental)"
.section .note.GNU-stack,"",@progbits
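For readers decoding the removed constant: .long 3212836864 is 0xBF800000, the single-precision encoding of -1.0, which is why the flds from .LC1 can be replaced by fld1; fchs. A minimal sketch to check this, assuming float is a 32-bit IEEE 754 type (the helper name bits_to_float is made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes float is IEEE 754 binary32, as on i386 and x86_64.  */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type punning */
    return f;
}
```

With this helper, bits_to_float(3212836864u) compares equal to -1.0f.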
I think that we can safely enable this optimization for other
optimization levels.
Attached is a new revision of the patch. It was bootstrapped on
x86_64-pc-linux-gnu; regression testing is in progress.
If it is still OK, I'll commit this patch to mainline as soon as the
regression tests finish (a couple of hours).
2006-11-04 Uros Bizjak <ubizjak@gmail.com>
PR target/26915
* config/i386/i386.c (standard_80387_constant_p): Treat -0.0 and
-1.0 as a valid 80387 constant.
(standard_80387_constant_opcode): Return "#" for -0.0 and -1.0.
* config/i386/i386.md (unnamed splitter): Split the load of
constant -0.0 or -1.0 into the load of 0.0 or 1.0, followed
by negation.
testsuite/ChangeLog:
2006-11-04 Uros Bizjak <ubizjak@gmail.com>
PR target/26915
* gcc.target/i386/387-12.c: New test.
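As a rough illustration of the idea in the ChangeLog (not the code in i386.c), the classification could be sketched as below. The enum and function names are hypothetical, and the other constants the 80387 can load directly are ignored. The one subtle point is that -0.0 compares equal to 0.0, so the sign has to be tested explicitly:

```c
#include <math.h>

/* Hypothetical classification for illustration only; these are not
   the values returned by standard_80387_constant_p.  */
enum x87_load {
    X87_LOAD_NONE,       /* load from the constant pool */
    X87_LOAD_FLDZ,       /* fldz */
    X87_LOAD_FLD1,       /* fld1 */
    X87_LOAD_FLDZ_FCHS,  /* fldz; fchs */
    X87_LOAD_FLD1_FCHS   /* fld1; fchs */
};

static enum x87_load classify_x87_constant(double x)
{
    /* -0.0 == 0.0 is true, so check the sign bit to tell them apart.  */
    if (x == 0.0)
        return signbit(x) ? X87_LOAD_FLDZ_FCHS : X87_LOAD_FLDZ;
    if (x == 1.0)
        return X87_LOAD_FLD1;
    if (x == -1.0)
        return X87_LOAD_FLD1_FCHS;
    return X87_LOAD_NONE;
}
```

The two *_FCHS cases correspond to the constants that the splitter turns into a load of 0.0 or 1.0 followed by negation.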
BTW: Unfortunately, we split too late for gcse-after-reload to
eliminate one fld1:
.L2:
testb $1, %al
jne .L3
fld1
fchs
jmp .L5
.L3:
fld1
.L5:
Nothing can be done in this case, short of moving the post-reload
split before the gcse-after-reload pass. This would result in:
.L2:
testb $1, %al
fld1
jne .L3
fchs
.L3:
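In C terms, the two code shapes correspond roughly to the following sketch. The function names are made up for illustration, and a compiler is of course free to emit something different for either one:

```c
/* Shape of the current code: each arm materializes its own constant.  */
static double term_branchy(int i)
{
    if (i & 1)
        return 1.0;     /* fld1 */
    return -1.0;        /* fld1; fchs */
}

/* Shape of the desired code: one shared fld1, then a conditional
   negation for even i.  */
static double term_hoisted(int i)
{
    double t = 1.0;     /* fld1, hoisted above the branch */
    if (!(i & 1))
        t = -t;         /* fchs */
    return t;
}
```

Both functions return the same value for every i; only the placement of the load of 1.0 differs.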
Uros.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i386-fldm1-2.diff
Type: text/x-patch
Size: 3011 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20061104/df275332/attachment.bin>