This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

[PATCH] [Bug middle-end/33187] Missed cmove opportunity

From: Uros Bizjak <ubizjak at gmail dot com>
To: GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 27 Aug 2007 22:20:01 +0200
Subject: [PATCH] [Bug middle-end/33187] Missed cmove opportunity

Hello!

The attached patch teaches the combine pass to handle load + float_extend sequences produced by the compress_float_constant optimization. Combine tried to fully simplify the float_extend of a constant load into a direct constant load, but failed, because a direct constant load is usually not supported. The resulting unsimplified load + float_extend sequences blocked the cmove ifcvt conversion.
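The condition the patch adds can be illustrated outside of GCC with a toy model. The sketch below is only an approximation under stated assumptions: `toy_rtx`, `toy_code`, `toy_keep_extended_load` and the `in_constant_pool` flag are hypothetical stand-ins invented for this illustration, not GCC data structures or functions. The flag models the case where avoid_constant_pool_reference would return something different from its argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_FLOAT_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  bool readonly;             /* assumption: models MEM_READONLY_P */
  bool in_constant_pool;     /* assumption: models "resolvable via the
                                constant pool" */
  const struct toy_rtx *op0; /* models XEXP (x, 0) */
};

/* Return true when X is a float extension of a read-only constant pool
   load, i.e. the shape that the patch leaves unsimplified.  */
bool
toy_keep_extended_load (const struct toy_rtx *x)
{
  return x->code == TOY_FLOAT_EXTEND
         && x->op0 != NULL
         && x->op0->code == TOY_MEM
         && x->op0->readonly
         && x->op0->in_constant_pool;
}

/* Exercise the three interesting shapes.  */
void
toy_self_check (void)
{
  struct toy_rtx pool_mem = { TOY_MEM, true, true, NULL };
  struct toy_rtx plain_mem = { TOY_MEM, false, false, NULL };
  struct toy_rtx reg = { TOY_REG, false, false, NULL };

  struct toy_rtx ext_pool = { TOY_FLOAT_EXTEND, false, false, &pool_mem };
  struct toy_rtx ext_plain = { TOY_FLOAT_EXTEND, false, false, &plain_mem };
  struct toy_rtx ext_reg = { TOY_FLOAT_EXTEND, false, false, &reg };

  assert (toy_keep_extended_load (&ext_pool));   /* kept as is */
  assert (!toy_keep_extended_load (&ext_plain)); /* simplified as before */
  assert (!toy_keep_extended_load (&ext_reg));   /* simplified as before */
}
```

Only the first shape is kept as it is; every other expression still goes through the usual simplification.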

In the following testcase (-O2 -march=i686 -ffast-math):

--cut here--
double sgn (double __x)
{
 return __x >= 0.0 ? 1.0 : -1.0;
}
--cut here--

combine tried to combine:

(insn 11 10 12 3 cmov7.c:14 (set (reg:SF 63) (mem/u/c/i:SF (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [3 S4 A32])) 97 {* movsf_1} (expr_list:REG_EQUAL (const_double:SF -1.0e+0 [-0x0.8p+1]) (nil)))

(insn 12 11 36 3 cmov7.c:14 (set (reg:DF 58 [ D.1658 ]) (float_extend:DF (reg:SF 63))) 134 {*extendsfdf2_i387} (expr_list:REG_DEAD (reg:SF 63) (expr_list:REG_EQUAL (const_double:DF -1.0e+0 [-0x0.8p+1]) (nil))))

directly into:

Failed to match this instruction:
(set (reg:DF 58 [ D.1658 ])
   (const_double:DF -1.0e+0 [-0x0.8p+1]))

With the attached patch, combine generates:

Successfully matched this instruction: (set (reg:DF 61) (float_extend:DF (mem/u/c/i:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [3 S4 A32])))

This insn is further simplified in the post-reload split pass into:

(insn 51 50 41 2 cmov7.c:14 (set (reg:DF 9 st(1) [67])
       (const_double:DF 1.0e+0 [0x0.8p+1])) 101 {*movdf_nointeger} (nil))

and generates the expected fld1 x87 asm insn. In addition, in the ifcvt pass after the combine pass, this patch enables generation of the fcmov insn, producing:

sgn:
       fldl    4(%esp)
       fldz
       fcomip  %st(1), %st
       fstp    %st(0)
       fld1
       fchs
       fld1
       fcmovnbe        %st(1), %st
       fstp    %st(1)
       ret
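As a sanity check that the branchless sequence still implements the source semantics, the testcase can be compiled with the same flags together with a small harness along these lines. The `check_sgn` helper and the chosen inputs are illustrative assumptions and not part of the patch or the testsuite. NaN inputs are deliberately left out, since -ffast-math makes no promises about them.

```c
#include <assert.h>

/* Same function as in the testcase above.  */
double
sgn (double __x)
{
  return __x >= 0.0 ? 1.0 : -1.0;
}

/* Illustrative check only: verify both arms of the conditional,
   plus the boundary value.  */
void
check_sgn (void)
{
  assert (sgn (2.5) == 1.0);
  assert (sgn (0.0) == 1.0);   /* 0.0 >= 0.0 holds */
  assert (sgn (-3.0) == -1.0);
}
```

Calling check_sgn from main and running the binary built with and without the patch should give the same result in both cases.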

In further testing, this patch reduced the size of povray-3.6.1 by 2728 bytes, and the test execution time was 0.38% shorter (using -O3 -mfpmath=387 -ffast-math on x86_64). The number of fcmov insns rose from 1583 to 1679.

2007-08-27 Uros Bizjak <ubizjak@gmail.com>

       PR middle-end/33187
       * combine.c (subst): Do not try to simplify X if it represents load
       of FP constant from the constant pool via float extension.

testsuite/ChangeLog:

2007-08-27 Uros Bizjak <ubizjak@gmail.com>

       PR middle-end/33187
       * gcc.target/i386/cmov7.c: New file.

The patch was bootstrapped and regression tested on x86_64 with and without -m32. OK for mainline?

Uros.

Index: testsuite/gcc.target/i386/cmov7.c
===================================================================
--- testsuite/gcc.target/i386/cmov7.c	(revision 0)
+++ testsuite/gcc.target/i386/cmov7.c	(revision 0)
@@ -0,0 +1,15 @@
+/* PR middle-end/33187 */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=k8 -ffast-math -mfpmath=387" } */
+/* { dg-final { scan-assembler "fcmov" } } */
+
+/* compress_float_constant generates load + float_extend
+   sequence which combine pass failed to combine into
+   (set (reg:DF) (float_extend:DF (mem:SF (symbol_ref...)))).  */
+
+double
+sgn (double __x)
+{
+  return __x >= 0.0 ? 1.0 : -1.0;
+}
Index: combine.c
===================================================================
--- combine.c	(revision 127835)
+++ combine.c	(working copy)
@@ -4478,6 +4478,18 @@ subst (rtx x, rtx from, rtx to, int in_d
 	}
     }
 
+  /* Check if we are loading something from the constant pool via float
+     extension; in this case we would undo compress_float_constant
+     optimization and degenerate constant load to an immediate value.  */
+  if (GET_CODE (x) == FLOAT_EXTEND
+      && MEM_P (XEXP (x, 0))
+      && MEM_READONLY_P (XEXP (x, 0)))
+    {
+      rtx tmp = avoid_constant_pool_reference (x);
+      if (x != tmp)
+	return x;
+    }
+
   /* Try to simplify X.  If the simplification changed the code, it is likely
      that further simplification will help, so loop, but limit the number
      of repetitions that will be performed.  */

Follow-Ups:
- Re: [PATCH] [Bug middle-end/33187] Missed cmove opportunity
  - From: Andrew Pinski
