This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations


Hi,

Improved canonization after r216728 causes GCC to more often generate poor code due to suboptimal ordering of operand of commutative libcalls. Indeed, if one of the operands of a commutative operation is the result of a previous operation, both being implemented by libcall, the wrong ordering of the operands in the second operation can lead to extra mov. Consider the following case on softfloat targets:

double
test1 (double x, double y)
{
  return x * (x + y);
}

If x + y is put in the operand using the same register as the result of the libcall for x + y then no mov is generated, otherwise mov is needed. The following happens on arm softfloat with the right ordering:

bl      __aeabi_dadd
ldr     r2, [sp]
ldr     r3, [sp, #4]
/* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
bl      __aeabi_dmul

With the wrong ordering one gets:

bl      __aeabi_dadd
mov     r2, r0
mov     r3, r1
ldr     r0, [sp]
ldr     r1, [sp, #4]
bl      __aeabi_dmul

This patch extend the patch written by Yuri Rumyantsev in r219646 to also deal with the case of only one of the operand being the result of an operation. ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63743
        * cfgexpand.c (reorder_operands): Also reorder if only second operand
        had its definition forwarded by TER.

*** gcc/testsuite/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63743
        * gcc.dg/pr63743.c: New test.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7dfe1f6..4fbc037 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
 	continue;
       /* Swap operands if the second one is more expensive.  */
       def0 = get_gimple_for_ssa_name (op0);
-      if (!def0)
-	continue;
       def1 = get_gimple_for_ssa_name (op1);
       if (!def1)
 	continue;
       swap = false;
-      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
+      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
 	swap = true;
       if (swap)
 	{
@@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
 	      fprintf (dump_file, "Swap operands in stmt:\n");
 	      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 	      fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
-		       lattice[gimple_uid (def0)],
+		       def0 ? lattice[gimple_uid (def0)] : 0,
 		       lattice[gimple_uid (def1)]);
 	    }
 	  swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
new file mode 100644
index 0000000..87254ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr63743.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-expand-details" } */
+
+double
+libcall_dep (double x, double y)
+{
+  return x * (x + y);
+}
+
+/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */


Testsuite was run in QEMU when compiled by an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC without any regression. 
CSiBE sees a -0.5034% code size decrease on arm-none-eabi and a 0.0058% code size increase on x86_64-linux-gnu.

Is it ok for trunk (since it fixes a code size regression in 5.0)?

Best regards,

Thomas




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]