This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Committed] Don't hoist FP constants on x87
- From: "Uros Bizjak" <ubizjak at gmail dot com>
- To: "GCC Patches" <gcc-patches at gcc dot gnu dot org>
- Cc: "Roger Sayle" <roger at eyesopen dot com>
- Date: Thu, 17 Aug 2006 15:09:07 +0200
- Subject: Re: [Committed] Don't hoist FP constants on x87
Hello Roger!
This is a follow-up on your commit from Feb. 2006,
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01584.html, which is
based on your proposal "[RFC/RFT] Should we hoist FP constants on
x87?", http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01859.html.
There is a performance regression, PR target/21676
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676). Comment #8 of
that PR traces the regression to this part of your ChangeLog entry:
* gcse.c (want_to_gcse_p): On STACK_REGS targets, look through
constant pool references to identify stack mode constants.
For the testcase in the PR, the initialization of the "sum" variable is
no longer CSE'd out of both arms of the if statement.
For reference, this is the relevant tree code:
<L1>:;
r.0 = (unsigned int) r;
D.1556 = r.0 * 4;
rowR = *((int *) D.1556 + row);
rowRp1 = *((int *) D.1556 + row + 4B);
if (rowR < rowRp1) goto <L41>; else goto <L42>;
<L42>:;
sum = 0.0;
goto <bb 5> (<L4>);
<L41>:;
i = rowR;
sum = 0.0;
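To make the tree dump easier to read, here is a hypothetical C reduction with the same loop shape. It is not the actual PR 21676 testcase: only the names rowR, rowRp1, sum and i come from the dump, and the CSR sparse matrix-vector product framing (val, row, col, x, y, M, spmv_csr) is an assumption. Presumably, after the loop header is copied, the `sum = 0.0` initialization ends up on both sides of the `rowR < rowRp1` test, which is the pair of assignments in the dump.

```c
#include <stddef.h>

/* Hypothetical reduction (not the PR testcase): y = A * x for a matrix A
   stored in compressed-row (CSR) form. Names rowR, rowRp1, sum and i
   mirror the tree dump; the rest are illustrative assumptions. */
void spmv_csr(int M, const double *val, const int *row, const int *col,
              const double *x, double *y)
{
    for (int r = 0; r < M; r++) {
        double sum = 0.0;          /* FP constant that GCSE may hoist */
        int rowR = row[r];         /* index of first nonzero in row r */
        int rowRp1 = row[r + 1];   /* one past the last nonzero in row r */
        for (int i = rowR; i < rowRp1; i++)
            sum += x[col[i]] * val[i];
        y[r] = sum;                /* empty rows store the initial 0.0 */
    }
}
```

Compiling such a function with -mfpmath=sse and -mfpmath=387 should be one way to compare where the 0.0 load lands, though the exact code will depend on the GCC version and flags.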
Compiling with -mfpmath=sse (a non-STACK_REGS target), we get:
.L8:
movl 20(%ebp), %edx
movapd %xmm2, %xmm1
movl (%edx,%ebx,4), %eax
movl 4(%edx,%ebx,4), %ecx
cmpl %ecx, %eax
jge .L11
movl %eax, %edx
.p2align 4,,7
.L12:
However, due to the change to gcse.c, -mfpmath=387 generates suboptimal code:
.L8:
movl 20(%ebp), %edx
fldz
movl (%edx,%ebx,4), %eax
movl 4(%edx,%ebx,4), %ecx
cmpl %ecx, %eax
jge .L11
fstp %st(0) <<<< here
movl %eax, %edx
fldz <<<< and here
.p2align 4,,7
.L12:
With the change to gcse.c backed out, the following code is produced:
fldz
.L8:
movl 20(%ebp), %edx
fld %st(0)
movl (%edx,%ebx,4), %eax
movl 4(%edx,%ebx,4), %ecx
cmpl %ecx, %eax
jge .L11
movl %eax, %edx
.p2align 4,,7
.L12:
Timings show a consistent gain in the latter case: 0m2.557s vs. 0m2.613s
for the testcase from PR 21676.
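For scale, the relative improvement implied by these two timings works out as follows (simple arithmetic on the figures above, not an additional measurement):

```latex
\frac{2.613\,\mathrm{s} - 2.557\,\mathrm{s}}{2.613\,\mathrm{s}}
  = \frac{0.056}{2.613} \approx 0.021 \approx 2.1\%
```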
Uros.