This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[PATCH] Skip double->float conversions on x87 with -ffast-math


The following patch modifies the i386 backend's truncdfsf2, truncxfsf2
and truncxfdf2 expanders so that, when compiling with
flag_unsafe_math_optimizations, we implement these narrowing conversions
(double to float in the common case) as a no-op move between FP
registers.  During most of GCC's RTL passes, the truncation is
represented by a new set of trunc?f?f2_noop patterns, which are then
completely eliminated during GCC's reg-stack pass.  This is identical to
the way extendsfdf2 is implemented, and should explain why the
get_true_reg tweak is needed.
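
For readers unfamiliar with reg-stack, the idea behind the get_true_reg
tweak can be sketched with a toy model.  This is not GCC's code: the
toy_rtx structure, the TOY_* codes and toy_true_reg are hypothetical
stand-ins invented for illustration.  The point is only that a pass which
wants the underlying register has to look through the conversion wrapper,
so a new wrapper code has to be added to the list it knows how to peel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions.  */
enum toy_code { TOY_REG, TOY_FLOAT_EXTEND, TOY_FLOAT_TRUNCATE, TOY_PLUS };

struct toy_rtx {
    enum toy_code code;
    int regno;                /* meaningful only when code == TOY_REG */
    struct toy_rtx *operand;  /* the single operand of a unary wrapper */
};

/* Peel conversion wrappers until a register is reached.
   Returns NULL if something other than a wrapped register is found.  */
static struct toy_rtx *toy_true_reg(struct toy_rtx *pat)
{
    while (pat != NULL) {
        switch (pat->code) {
        case TOY_REG:
            return pat;
        case TOY_FLOAT_EXTEND:
        case TOY_FLOAT_TRUNCATE:  /* without this case the wrapper is opaque */
            assert(pat->operand != NULL);
            pat = pat->operand;
            break;
        default:
            return NULL;
        }
    }
    return NULL;
}
```

In this model, a TOY_FLOAT_TRUNCATE node wrapped around a register
resolves to that register, while a node the function does not know about
(TOY_PLUS here) yields NULL.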

As mentioned earlier today in the "x87 float truncation/accuracy (gcc vs.
icc/msvc)" thread on the gcc list, ignoring double to float rounding
(except when storing/spilling to memory) is the default behaviour of
Intel's and Microsoft's compilers.
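
To make concrete what "ignoring double to float rounding" means, here is a
minimal sketch in portable C.  It does not depend on x87 at all; it simply
models the two behaviours explicitly.  The helper names (convert_strict,
convert_noop, check_conversions) are hypothetical, and the volatile store
is just one way of standing in for a round trip through a four-byte
memory slot.

```c
#include <assert.h>

/* Strict semantics: the value is rounded to float precision.  The
   volatile store stands in for the round trip through a memory slot.  */
static double convert_strict(double d)
{
    volatile float f = (float) d;
    return (double) f;
}

/* Modelled fast-math behaviour: the conversion is a no-op, so the
   value keeps whatever extra precision it already had.  */
static double convert_noop(double d)
{
    return d;
}

/* Small self-check: values exactly representable as float are unaffected,
   while others lose precision only under the strict conversion.  */
static void check_conversions(void)
{
    assert(convert_strict(0.5) == convert_noop(0.5));
    assert(convert_strict(0.1) != convert_noop(0.1));
}
```

The two functions agree on any value that is exactly representable as a
float, and differ otherwise.  That difference is what is being given up
under -ffast-math.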

For the test case below, we now generate code that is significantly
faster than mainline when compiled with "-O2 -ffast-math", as we no
longer store the accumulator, "y", to memory and reload it back in on
each loop iteration.

float foo(float *x)
{
  int i;
  float y = 0.0;
  for (i=0; i<10; i++)
    y += 2.0*x[i];
  return y;
}
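
To illustrate what is traded for that speed, the sketch below models the
two evaluation strategies for this loop in portable C.  It is only a
model under stated assumptions: foo_rounded and foo_unrounded are
hypothetical names, the volatile float stands in for the store and reload
on each iteration, and a double accumulator stands in for a value kept in
an x87 register (which in reality holds extended precision).

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator rounded to float on every iteration, as when "y" is
   stored to memory and reloaded each time round the loop.  */
static float foo_rounded(const float *x)
{
    volatile float y = 0.0f;
    int i;
    assert(x != NULL);
    for (i = 0; i < 10; i++)
        y = (float) ((double) y + 2.0 * x[i]);
    return y;
}

/* Accumulator kept in a wider type and rounded once at the end,
   modelling a truncation that is a no-op inside the loop.  */
static float foo_unrounded(const float *x)
{
    double acc = 0.0;
    int i;
    assert(x != NULL);
    for (i = 0; i < 10; i++)
        acc += 2.0 * x[i];
    return (float) acc;
}
```

On a typical IEEE 754 target with round-to-nearest-even, an input whose
first element is 8388608.0f and whose remaining nine elements are 0.5f
gives 16777216.0f from foo_rounded but 16777224.0f from foo_unrounded,
because each small addend is lost when the sum is rounded to float after
every step.  For inputs where every partial sum is exactly representable
as a float, the two agree.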


This patch also cures another GCC "feature" that's annoyed me for
some time.  Consider the following code:

float a;
double b;

void foo() {
  a = b;
}

With "-O2 -fomit-frame-pointer", mainline currently generates this:


foo:    pushl   %edx
        fldl    b
        fstps   a
        popl    %eax
        ret

Note the curious push and pop.  It transpires that these instructions
exist because the i386's regular truncdfsf2 expander always requests a
four-byte stack slot.  This stack slot (frame) is never cleaned up, even
if we subsequently discover that it isn't actually required because the
floating point truncation has been folded into a store to memory.

As a side-effect of this patch, we no longer request the stack slot
with -ffast-math, so "-O2 -fomit-frame-pointer -ffast-math" instead
generates the much prettier code:

foo:    fldl    b
        fstps   a
        ret


The following patch has been tested on i686-pc-linux-gnu with a complete
"make bootstrap", all languages except treelang, and regression tested
with a top-level "make -k check" (including acats) with no new failures.
Many thanks to Jan for confirming this was a reasonable approach.


Ok for mainline?


2004-03-18  Roger Sayle  <roger@eyesopen.com>

	* reg-stack.c (get_true_reg): Handle FLOAT_TRUNCATE.
	* config/i386/i386.md (truncdfsf2): If flag_unsafe_math_optimizations
	and TARGET_80387 expand using truncdfsf2_noop pattern.
	(truncxfsf2): Likewise using truncxfsf2_noop.
	(truncxfdf2): Likewise using truncxfdf2_noop.
	(truncdfsf2_noop, truncxfsf2_noop, truncxfdf2_noop): New patterns.


Index: reg-stack.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reg-stack.c,v
retrieving revision 1.144
diff -c -3 -p -r1.144 reg-stack.c
*** reg-stack.c	3 Mar 2004 08:34:33 -0000	1.144
--- reg-stack.c	19 Mar 2004 01:00:30 -0000
*************** get_true_reg (rtx *pat)
*** 573,578 ****
--- 573,579 ----
        case FLOAT:
        case FIX:
        case FLOAT_EXTEND:
+       case FLOAT_TRUNCATE:
  	pat = & XEXP (*pat, 0);
        }
  }
Index: config/i386/i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.519
diff -c -3 -p -r1.519 i386.md
*** config/i386/i386.md	10 Mar 2004 22:36:13 -0000	1.519
--- config/i386/i386.md	19 Mar 2004 01:00:37 -0000
***************
*** 3653,3667 ****
  	      (clobber (match_dup 2))])]
    "TARGET_80387 || TARGET_SSE2"
    "
!    if (TARGET_80387)
!      operands[2] = assign_386_stack_local (SFmode, 0);
!    else
       {
  	emit_insn (gen_truncdfsf2_sse_only (operands[0], operands[1]));
  	DONE;
       }
  ")

  (define_insn "*truncdfsf2_1"
    [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
  	(float_truncate:SF
--- 3653,3681 ----
  	      (clobber (match_dup 2))])]
    "TARGET_80387 || TARGET_SSE2"
    "
!    if (!TARGET_80387)
       {
  	emit_insn (gen_truncdfsf2_sse_only (operands[0], operands[1]));
  	DONE;
       }
+    else if (flag_unsafe_math_optimizations)
+      {
+ 	rtx reg = REG_P (operands[0]) ? operands[0] : gen_reg_rtx (SFmode);
+ 	emit_insn (gen_truncdfsf2_noop (reg, operands[1]));
+ 	if (reg != operands[0])
+ 	  emit_move_insn (operands[0], reg);
+ 	DONE;
+      }
+    else
+      operands[2] = assign_386_stack_local (SFmode, 0);
  ")

+ (define_insn "truncdfsf2_noop"
+   [(set (match_operand:SF 0 "register_operand" "=f")
+ 	(float_truncate:SF (match_operand:DF 1 "register_operand" "f")))]
+   "TARGET_80387 && flag_unsafe_math_optimizations"
+   "#")
+
  (define_insn "*truncdfsf2_1"
    [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
  	(float_truncate:SF
***************
*** 3886,3892 ****
  		    (match_operand:XF 1 "register_operand" "")))
  	      (clobber (match_dup 2))])]
    "TARGET_80387"
!   "operands[2] = assign_386_stack_local (SFmode, 0);")

  (define_insn "*truncxfsf2_1"
    [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
--- 3900,3923 ----
  		    (match_operand:XF 1 "register_operand" "")))
  	      (clobber (match_dup 2))])]
    "TARGET_80387"
!   "
!   if (flag_unsafe_math_optimizations)
!     {
!       rtx reg = REG_P (operands[0]) ? operands[0] : gen_reg_rtx (SFmode);
!       emit_insn (gen_truncxfsf2_noop (reg, operands[1]));
!       if (reg != operands[0])
! 	emit_move_insn (operands[0], reg);
!       DONE;
!     }
!   else
!     operands[2] = assign_386_stack_local (SFmode, 0);
!   ")
!
! (define_insn "truncxfsf2_noop"
!   [(set (match_operand:SF 0 "register_operand" "=f")
! 	(float_truncate:SF (match_operand:XF 1 "register_operand" "f")))]
!   "TARGET_80387 && flag_unsafe_math_optimizations"
!   "#")

  (define_insn "*truncxfsf2_1"
    [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
***************
*** 3948,3954 ****
  		    (match_operand:XF 1 "register_operand" "")))
  	      (clobber (match_dup 2))])]
    "TARGET_80387"
!   "operands[2] = assign_386_stack_local (DFmode, 0);")

  (define_insn "*truncxfdf2_1"
    [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#rY,?r#fY,?Y#rf")
--- 3979,4002 ----
  		    (match_operand:XF 1 "register_operand" "")))
  	      (clobber (match_dup 2))])]
    "TARGET_80387"
!   "
!   if (flag_unsafe_math_optimizations)
!     {
!       rtx reg = REG_P (operands[0]) ? operands[0] : gen_reg_rtx (DFmode);
!       emit_insn (gen_truncxfdf2_noop (reg, operands[1]));
!       if (reg != operands[0])
! 	emit_move_insn (operands[0], reg);
!       DONE;
!     }
!   else
!     operands[2] = assign_386_stack_local (DFmode, 0);
!   ")
!
! (define_insn "truncxfdf2_noop"
!   [(set (match_operand:DF 0 "register_operand" "=f")
! 	(float_truncate:DF (match_operand:XF 1 "register_operand" "f")))]
!   "TARGET_80387 && flag_unsafe_math_optimizations"
!   "#")

  (define_insn "*truncxfdf2_1"
    [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#rY,?r#fY,?Y#rf")


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833

