29042 – [4.2 Regression] Useless floating-point stores and loads on x86


Summary: [4.2 Regression] Useless floating-point stores and loads on x86

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.2.0

Importance:	P2 normal
Target Milestone:	4.3.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2006-09-12 20:28 UTC by Guillaume Melquiond
Modified:	2009-03-30 19:37 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Known to work:	4.3.0
Known to fail:	4.0.4 4.2.5
Last reconfirmed:	2007-01-15 19:42:39


Description Guillaume Melquiond 2006-09-12 20:28:23 UTC

This is the same testcase as PR26778. That bug is marked as resolved, and its patch does prevent GCC from using useless MMX registers. For the integer operations, the generated assembly is now even better than with GCC 3.4: the values are incremented directly in memory instead of being loaded and stored, and no callee-saved register is used anymore. Unfortunately, there is still a regression with respect to the floating-point stack.

Testcase compiled with -march=pentium3 -O3:

typedef union {
  long long l;
  double d;
} db_number;

double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}

It may be clearer with a unified diff between the assembly code generated by 3.4.6 and the code generated by 4.2.0 (svn 2006-09-12). Note: the assembly code for 3.4 was edited by hand to reduce the noise due to mismatched integer registers.

        pushl   %ebp
        movl    %esp, %ebp
        subl    $16, %esp
        movl    8(%ebp), %eax
        fldl    8(%eax)
        fldl    16(%eax)
        fld     %st(1)
        fadd    %st(1), %st
+       fstl    -8(%ebp)
        fsub    %st, %st(2)
        fxch    %st(1)
        fucomip %st(2), %st
        fstp    %st(1)
        jp      .L7
-       je      .L2
+       je      .L9
 .L7:
        fstpl   -16(%ebp)
        addl    $1, -16(%ebp)
        adcl    $0, -12(%ebp)
        fldl    -16(%ebp)
+       fstpl   -8(%ebp)
+       jmp     .L2
+       .p2align 4,,7
+.L9:
+       fstp    %st(0)
+       .p2align 4,,15
 .L2:
+       fldl    -8(%ebp)
        faddl   (%eax)
        leave
        ret

The 3.4 code never spills th: it stays at the top of the floating-point stack, and only the .L7 path, which needs the bit pattern for the integer increment, goes through memory. In my opinion, this is optimal. This is no longer the case with the 4.2 code. The value of th is stored to -8(%ebp). Then, on one branch (.L7), it is overwritten with the content of -16(%ebp). On the other branch (.L9), the value is popped from the top of the stack and then immediately (.L2) reloaded from memory. Every line prefixed by + is useless: without them the code still behaves correctly and contains five fewer instructions (fstl, fstpl, jmp, fstp and fldl).

Comment 1 Andrew Pinski 2006-10-30 02:17:03 UTC

Confirmed, this regression was caused by the removal of ADDRESSOF.

Comment 2 Gabriel Dos Reis 2007-02-03 20:05:19 UTC

Won't fix in GCC-4.0.x.  Adjusting milestone.

Comment 3 Andrew Pinski 2007-06-18 05:29:37 UTC

This has now been fixed on the trunk, I think by the dataflow branch merge.

Comment 4 Joseph S. Myers 2008-07-04 21:31:10 UTC

Closing 4.1 branch.

Comment 5 Joseph S. Myers 2009-03-30 19:37:28 UTC

Closing 4.2 branch, fixed in 4.3.