[Committed] Don't hoist FP constants on x87

Uros Bizjak ubizjak@gmail.com
Thu Aug 17 13:31:00 GMT 2006


Hello Roger!

This is a follow-up to your commit from Feb. 2006,
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01584.html, which is
based on your proposal "[RFC/RFT] Should we hoist FP constants on
x87?", http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01859.html

There is a performance regression, PR target/21676
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676). Comment #8
traces it to this part of your change:

	* gcse.c (want_to_gcse_p): On STACK_REGS targets, look through
	constant pool references to identify stack mode constants.

For the testcase in the PR, the "sum" variable is no longer CSE'd
from both arms of an if statement.

For reference, this is the relevant tree code:

<L1>:;
  r.0 = (unsigned int) r;
  D.1556 = r.0 * 4;
  rowR = *((int *) D.1556 + row);
  rowRp1 = *((int *) D.1556 + row + 4B);
  if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
  sum = 0.0;
  goto <bb 5> (<L4>);

<L41>:;
  i = rowR;
  sum = 0.0;
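The variable names in the dump (r, row, rowR, rowRp1, sum, i) suggest a compressed-row (CSR) sparse matrix-vector product. The following is a hypothetical C sketch of a loop with that shape, not the actual testcase attached to the PR. The function name csr_matvec and its parameter list are assumptions. The inner loop's entry test, rowR < rowRp1, is the branch whose two arms both receive "sum = 0.0" in the dump above.

```c
#include <assert.h>

/* Hypothetical sketch (not the PR testcase): y = A*x for a matrix A in
   compressed-row (CSR) form. row[r]..row[r+1]-1 index the nonzeros of
   row r in val[] and col[]. Names r, rowR, rowRp1, sum, i follow the
   tree dump; csr_matvec and its parameters are assumptions. */
void csr_matvec(int m, double *y, const double *val, const int *row,
                const int *col, const double *x)
{
    for (int r = 0; r < m; r++) {
        /* In the tree dump, this initialization appears in both arms
           of the rowR < rowRp1 test. */
        double sum = 0.0;
        int rowR = row[r];
        int rowRp1 = row[r + 1];

        /* CSR row pointers must be non-decreasing. */
        assert(rowR <= rowRp1);

        /* Skipped entirely for an empty row (rowR == rowRp1). */
        for (int i = rowR; i < rowRp1; i++)
            sum += x[col[i]] * val[i];

        y[r] = sum;
    }
}
```

An empty row (row[r] == row[r + 1]) skips the inner loop, which corresponds to the else arm in the dump; sum is still 0.0 on that path, so the constant is needed on both sides of the branch.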

Compiling with -mfpmath=sse (FP values are not kept in STACK_REGS), we get:

.L8:
        movl 20(%ebp), %edx
        movapd  %xmm2, %xmm1
        movl (%edx,%ebx,4), %eax
        movl 4(%edx,%ebx,4), %ecx
        cmpl %ecx, %eax
        jge .L11
        movl %eax, %edx
        .p2align 4,,7
.L12:

However, due to the change to gcse.c, -mfpmath=387 generates suboptimal code:

.L8:
        movl    20(%ebp), %edx
        fldz
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        fstp    %st(0)                    <<<< here
        movl    %eax, %edx
        fldz                                    <<<< and here
        .p2align 4,,7
.L12:

With the change to gcse.c backed out, the following code is produced:

        fldz
.L8:
        movl    20(%ebp), %edx
        fld     %st(0)
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:

Timings show a consistent gain in the latter case: 0m2.557s vs. 0m2.613s
for the testcase from PR 21676.
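Taking the two quoted timings at face value, the relative reduction in run time works out to roughly 2%:

```latex
\frac{2.613\,\mathrm{s} - 2.557\,\mathrm{s}}{2.613\,\mathrm{s}}
  = \frac{0.056\,\mathrm{s}}{2.613\,\mathrm{s}} \approx 0.021 \approx 2.1\%,
\qquad
\frac{2.613\,\mathrm{s}}{2.557\,\mathrm{s}} \approx 1.022
```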

Uros.


