This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Committed] Don't hoist FP constants on x87


Hello Roger!

This is a follow-up on your commit from Feb. 2006,
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01584.html, which is
based on your proposal "[RFC/RFT] Should we hoist FP constants on
x87?", http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01859.html

There is a performance regression, PR target/21676
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676), where comment
#8 shows a regression caused by this part of the change:

	* gcse.c (want_to_gcse_p): On STACK_REGS targets, look through
	constant pool references to identify stack mode constants.

For the testcase in the PR, the initialization of the "sum" variable is
no longer CSE'd from both arms of the if statement.

For reference, this is the relevant tree code:

<L1>:;
 r.0 = (unsigned int) r;
 D.1556 = r.0 * 4;
 rowR = *((int *) D.1556 + row);
 rowRp1 = *((int *) D.1556 + row + 4B);
 if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
 sum = 0.0;
 goto <bb 5> (<L4>);

<L41>:;
 i = rowR;
 sum = 0.0;
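
For readers without the PR at hand: the testcase is not reproduced in
this message, but a loop of roughly the following shape would give a
tree dump like the one above. This is only a sketch. The names row, r,
rowR, rowRp1, i and sum come from the dump; the function name and the
val, col, x, y and M parameters are assumptions made for illustration.

```c
/* Hypothetical reconstruction of the loop shape, NOT the PR testcase.
   Sparse matrix (compressed row storage) times vector: y = A * x. */
void sparse_row_mult(int M, double *y, const double *val,
                     const int *row, const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        double sum = 0.0;           /* the FP constant under discussion */
        int rowR = row[r];          /* first entry of row r */
        int rowRp1 = row[r + 1];    /* one past the last entry of row r */

        /* Once the compiler copies the loop header, the test
           rowR < rowRp1 guards loop entry, and "sum = 0.0" ends up
           in both arms of that test, as in the dump above. */
        for (int i = rowR; i < rowRp1; i++)
            sum += x[col[i]] * val[i];

        y[r] = sum;
    }
}
```

With such a loop, an empty row (rowR == rowRp1) takes the <L42> arm and
stores 0.0 directly, while a non-empty row takes the <L41> arm and
enters the inner loop.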

Compiling with -mfpmath=sse (a non-STACK_REGS target), we get:

.L8:
       movl    20(%ebp), %edx
       movapd  %xmm2, %xmm1
       movl    (%edx,%ebx,4), %eax
       movl    4(%edx,%ebx,4), %ecx
       cmpl    %ecx, %eax
       jge     .L11
       movl    %eax, %edx
       .p2align 4,,7
.L12:

However, due to the change to gcse.c, -mfpmath=387 generates unoptimized code:

.L8:
       movl    20(%ebp), %edx
       fldz
       movl    (%edx,%ebx,4), %eax
       movl    4(%edx,%ebx,4), %ecx
       cmpl    %ecx, %eax
       jge     .L11
       fstp    %st(0)                    <<<< here
       movl    %eax, %edx
       fldz                                    <<<< and here
       .p2align 4,,7
.L12:

With the change to gcse.c backed out, the following code is produced:

       fldz
.L8:
       movl    20(%ebp), %edx
       fld     %st(0)
       movl    (%edx,%ebx,4), %eax
       movl    4(%edx,%ebx,4), %ecx
       cmpl    %ecx, %eax
       jge     .L11
       movl    %eax, %edx
       .p2align 4,,7
.L12:

Timings show a consistent gain in the latter case, 0m2.557s vs. 0m2.613s
(about 2%), for the testcase from PR 21676.

Uros.

