This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Disable only appropriate x87 builtins for -mfpmath=sse


Roger Sayle wrote:

My long term thoughts, though it shouldn't affect this patch, are that
we eventually want to represent all x87 math as XFmode operations,
but my guess is that this will be implemented via the RTL/md expanders.
Your approach of disabling the SFmode define_expands fits this well.



I have already implemented x87 fancy math in XFmode. The attached patch (trigonometric functions only) has been bootstrapped on pentium4-pc-linux-gnu and regtested for C and C++. PovRay was built and its output was checked for correctness. The patch moves the fancy math patterns into their own file; basically, only the ChangeLog is missing ...

However, there are some problems with the all-XFmode approach. Consider this testcase:

===cut here===
#include <math.h>
#include <stdio.h>

int testf (float a, float b) {
 printf("==FLOAT==\n");
 printf("tan = %f\n", tanf(a));
 printf("sin = %f\n", sinf(a));
 printf("cos = %f\n", cosf(a));

 printf("atan2 = %f\n", atan2f(b, a));
 printf("atan = %f\n", atanf(b));
 printf("asin = %f\n", asinf(b));
 printf("acos = %f\n", acosf(b));
 return 0;
}
===cut here===
The problem here is that the optimizers (CSE in particular) produce this sequence:

===cut here===
(insn:HI 16 14 133 (set (reg:XF 8 st)
        (float_extend:XF (mem/i:SF (plus:SI (reg/f:SI 6 bp)
                    (const_int 8 [0x8])) [3 a+0 S4 A32]))) 91 {*extendsfxf2_i387} (insn_list:REG_DEP_TRUE 6 (nil))
    (nil))


(insn 133 16 105 (set (reg:XF 8 st)
       (reg:XF 8 st)) 72 {*movxf_integer} (nil)
   (nil))

(insn 105 133 17 (set (mem:XF (plus:SI (reg/f:SI 6 bp)
               (const_int -24 [0xffffffe8])) [11 S12 A8])
       (reg:XF 8 st)) 72 {*movxf_integer} (nil)
   (expr_list:REG_DEAD (reg:XF 8 st)
       (nil)))

(insn:HI 17 105 134 (parallel [
           (set (reg:XF 8 st)
               (unspec:XF [
                       (reg:XF 8 st)
                   ] 82))
           (set (reg:XF 9 st(1))
               (unspec:XF [
                       (reg:XF 8 st)
                   ] 83))
       ]) 421 {*tanxf3} (insn_list:REG_DEP_TRUE 16 (nil))
   (nil))
===cut here===

The first problem is with XFmode temporaries. Saving one to XFmode memory requires the sequence of (insn 133) and (insn 105), because there is no non-popping XFmode store to memory: fstp accepts an 80-bit memory operand, but fst does not. This can be solved by avoiding memory temporaries altogether (this is the actual problem in PR rtl-optimization/8126). Also note that XFmode memory accesses are more expensive than {D,S}Fmode accesses.

However, the extend* patterns are more problematic. CSE does not know that extending an {S,D}Fmode value to XFmode costs nothing, since the extension happens automatically during the load. So it merges all SFmode -> XFmode extends into one (insn 16 in the example above) and then saves the resulting XFmode temporary to memory. This results in rather poorly optimized code:
...
flds 8(%ebp) # a <- load in SFmode, extended to XFmode by the load itself
movl $.LC1, (%esp) #,
fld %st(0) # <- duplicate st(0) to compensate for the following popping store
fstpt -24(%ebp) # <- popping store in XFmode
fptan
fstp %st(0) # <- pop the 1.0 pushed by fptan
fstpl 4(%esp) # <- tan result, stored in DFmode for printf
call printf #
fldt -24(%ebp) # <- load the CSE'd XFmode temporary
movl $.LC2, (%esp) #,
fsincos
fstpt -40(%ebp) # <- save the cos result in XFmode
fstpl 4(%esp) # <- store sin in DFmode, as needed by printf
call printf #
fldt -40(%ebp) # <- reload the cos result in XFmode
movl $.LC3, (%esp) #,
fstpl 4(%esp) # <- but only DFmode is needed
call printf #
...
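The extension that CSE treats as a separate operation (insn 16) is exact: C guarantees that every float and double value is also a long double value, which is why flds and fldl can perform the widening as part of the load. The small check below shows only that value-level half of the argument, assuming long double is the x87 extended type; the cost half is a property of the hardware that C cannot express. The function names are hypothetical.

```c
#include <stdbool.h>

/* Returns true if widening F to long double and narrowing it back
   reproduces F exactly; this holds for every non-NaN float, because
   the values of float are a subset of the values of long double.  */
bool
sf_to_xf_is_exact (float f)
{
  long double wide = f;        /* SFmode -> XFmode, as flds does */
  return (float) wide == f;
}

/* Same check for DFmode -> XFmode, as fldl does.  */
bool
df_to_xf_is_exact (double d)
{
  long double wide = d;
  return (double) wide == d;
}
```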


These problems could be solved by some kind of "mode propagation" pass that avoids storing XFmode values when the following insns in fact need only DFmode values. On the other hand, the RTL produced is in fact a correct model of what the x87 really calculates.
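A toy sketch of the "mode propagation" idea, under stated assumptions: a temporary's spill slot only needs the widest mode that any of its consumers reads, so the cos result in the listing above, whose only consumer is printf, could be stored with fstpl instead of fstpt. The enum, struct and function names are hypothetical and are not GCC internals. The sketch also ignores double rounding: if one consumer reads SFmode and another DFmode, a DFmode spill followed by an SFmode read rounds twice, which a real pass would have to weigh.

```c
#include <stddef.h>

/* Hypothetical float modes, ordered from narrowest to widest.  */
enum fmode { MODE_SF = 0, MODE_DF = 1, MODE_XF = 2 };

/* Hypothetical description of one consumer of a temporary:
   the mode in which that consumer actually reads the value.  */
struct use
{
  enum fmode needed;
};

/* Return the widest mode required by any use; a narrower spill would
   drop bits that some consumer reads.  With no uses, MODE_SF is
   returned as a convention, since nothing needs to be stored.  */
enum fmode
spill_mode (const struct use *uses, size_t n)
{
  enum fmode m = MODE_SF;
  for (size_t i = 0; i < n; i++)
    if (uses[i].needed > m)
      m = uses[i].needed;
  return m;
}
```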

Uros.


;; GCC machine description for i387 fancy math instructions
;; Copyright (C) 2005
;; Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING.  If not, write to
;; the Free Software Foundation, 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Trigonometric patterns
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(define_insn "*tanxf3"
  [(set (match_operand:XF 0 "register_operand" "=f")
	(unspec:XF [(match_operand:XF 2 "register_operand" "0")]
		   UNSPEC_TAN_ONE))
   (set (match_operand:XF 1 "register_operand" "=u")
        (unspec:XF [(match_dup 2)] UNSPEC_TAN_TAN))]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "fptan"
  [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])

(define_expand "tanxf2"
  [(parallel [(set (match_dup 2)
		   (unspec:XF [(match_operand:XF 1 "register_operand" "")]
			      UNSPEC_TAN_ONE))
	      (set (match_operand:XF 0 "register_operand" "")
		   (unspec:XF [(match_dup 1)] UNSPEC_TAN_TAN))])]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
{
  operands[2] = gen_reg_rtx (XFmode);
})

(define_expand "tandf2"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op1, operands[1]));
  emit_insn (gen_tanxf2 (op0, op1));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "tansf2"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op1, operands[1]));
  emit_insn (gen_tanxf2 (op0, op1));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  DONE;
})

(define_insn "*sinxf2"
  [(set (match_operand:XF 0 "register_operand" "=f")
	(unspec:XF [(match_operand:XF 1 "register_operand" "0")] UNSPEC_SIN))]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "fsin"
  [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])

(define_insn "*cosxf2"
  [(set (match_operand:XF 0 "register_operand" "=f")
	(unspec:XF [(match_operand:XF 1 "register_operand" "0")] UNSPEC_COS))]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "fcos"
  [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])

;; When the sincos pattern is defined, the sin and cos builtin functions
;; will be expanded to the sincos pattern with one of its outputs left unused.
;; The CSE pass will detect whether two sincos patterns can be combined;
;; otherwise the sincos pattern will be split back into a sin or cos pattern,
;; depending on which of its outputs is unused.

(define_insn "sincosxf3"
  [(set (match_operand:XF 0 "register_operand" "=f")
	(unspec:XF [(match_operand:XF 2 "register_operand" "0")]
		   UNSPEC_SINCOS_COS))
   (set (match_operand:XF 1 "register_operand" "=u")
        (unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "fsincos"
  [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])

(define_split
  [(set (match_operand:XF 0 "register_operand" "")
	(unspec:XF [(match_operand:XF 2 "register_operand" "")]
		   UNSPEC_SINCOS_COS))
   (set (match_operand:XF 1 "register_operand" "")
	(unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))]
  "find_regno_note (insn, REG_UNUSED, REGNO (operands[0]))
   && !(reload_completed || reload_in_progress)"
  [(set (match_dup 1) (unspec:XF [(match_dup 2)] UNSPEC_SIN))]
  "")

(define_split
  [(set (match_operand:XF 0 "register_operand" "")
	(unspec:XF [(match_operand:XF 2 "register_operand" "")]
		   UNSPEC_SINCOS_COS))
   (set (match_operand:XF 1 "register_operand" "")
	(unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))]
  "find_regno_note (insn, REG_UNUSED, REGNO (operands[1]))
   && !(reload_completed || reload_in_progress)"
  [(set (match_dup 0) (unspec:XF [(match_dup 2)] UNSPEC_COS))]
  "")

(define_expand "sincosdf3"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))
   (use (match_operand:DF 2 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);
  rtx op2 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op2, operands[2]));
  emit_insn (gen_sincosxf3 (op0, op1, op2));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  emit_insn (gen_truncxfdf2_i387_noop (operands[1], op1));
  DONE;
})

(define_expand "sincossf3"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))
   (use (match_operand:SF 2 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);
  rtx op2 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op2, operands[2]));
  emit_insn (gen_sincosxf3 (op0, op1, op2));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  emit_insn (gen_truncxfsf2_i387_noop (operands[1], op1));
  DONE;
})

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Inverse trigonometric patterns
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(define_insn "*atan2xf3_1"
  [(set (match_operand:XF 0 "register_operand" "=f")
        (unspec:XF [(match_operand:XF 1 "register_operand" "0")
	            (match_operand:XF 2 "register_operand" "u")]
	           UNSPEC_FPATAN))
   (clobber (match_scratch:XF 3 "=2"))]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "fpatan"
  [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])

(define_expand "atan2xf3"
  [(parallel [(set (match_operand:XF 0 "register_operand" "")
		   (unspec:XF [(match_operand:XF 2 "register_operand" "")
			       (match_operand:XF 1 "register_operand" "")]
			      UNSPEC_FPATAN))
	      (clobber (match_scratch:XF 3 ""))])]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
  "")

(define_expand "atan2df3"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))
   (use (match_operand:DF 2 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);
  rtx op2 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op1, operands[1]));
  emit_insn (gen_extenddfxf2 (op2, operands[2]));
  emit_insn (gen_atan2xf3 (op0, op1, op2));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "atan2sf3"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))
   (use (match_operand:SF 2 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);
  rtx op2 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op1, operands[1]));
  emit_insn (gen_extendsfxf2 (op2, operands[2]));
  emit_insn (gen_atan2xf3 (op0, op1, op2));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "atanxf2"
  [(parallel [(set (match_operand:XF 0 "register_operand" "")
		   (unspec:XF [(match_dup 2)
			       (match_operand:XF 1 "register_operand" "")]
			      UNSPEC_FPATAN))
   	      (clobber (match_scratch:XF 3 ""))])]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
{
  operands[2] = gen_reg_rtx (XFmode);

  emit_move_insn (operands[2], CONST1_RTX (XFmode));  /* fld1 */
})

(define_expand "atandf2"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op1, operands[1]));
  emit_insn (gen_atanxf2 (op0, op1));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "atansf2"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op1, operands[1]));
  emit_insn (gen_atanxf2 (op0, op1));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "asinxf2"
  [(set (match_dup 2)
	(mult:XF (match_operand:XF 1 "register_operand" "")
		 (match_dup 1)))
   (set (match_dup 4) (minus:XF (match_dup 3) (match_dup 2)))
   (set (match_dup 5) (sqrt:XF (match_dup 4)))
   (parallel [(set (match_operand:XF 0 "register_operand" "")
        	   (unspec:XF [(match_dup 5) (match_dup 1)]
			      UNSPEC_FPATAN))
   	      (clobber (match_scratch:XF 6 ""))])]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
{
  int i;

  for (i = 2; i < 6; i++)
    operands[i] = gen_reg_rtx (XFmode);

  emit_move_insn (operands[3], CONST1_RTX (XFmode));  /* fld1 */
})

(define_expand "asindf2"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op1, operands[1]));
  emit_insn (gen_asinxf2 (op0, op1));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "asinsf2"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op1, operands[1]));
  emit_insn (gen_asinxf2 (op0, op1));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "acosxf2"
  [(set (match_dup 2)
	(mult:XF (match_operand:XF 1 "register_operand" "")
		 (match_dup 1)))
   (set (match_dup 4) (minus:XF (match_dup 3) (match_dup 2)))
   (set (match_dup 5) (sqrt:XF (match_dup 4)))
   (parallel [(set (match_operand:XF 0 "register_operand" "")
        	   (unspec:XF [(match_dup 1) (match_dup 5)]
			      UNSPEC_FPATAN))
   	      (clobber (match_scratch:XF 6 ""))])]
  "TARGET_USE_FANCY_MATH_387
   && flag_unsafe_math_optimizations"
{
  int i;

  for (i = 2; i < 6; i++)
    operands[i] = gen_reg_rtx (XFmode);

  emit_move_insn (operands[3], CONST1_RTX (XFmode));  /* fld1 */
})

(define_expand "acosdf2"
  [(use (match_operand:DF 0 "register_operand" ""))
   (use (match_operand:DF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extenddfxf2 (op1, operands[1]));
  emit_insn (gen_acosxf2 (op0, op1));

  emit_insn (gen_truncxfdf2_i387_noop (operands[0], op0));
  DONE;
})

(define_expand "acossf2"
  [(use (match_operand:SF 0 "register_operand" ""))
   (use (match_operand:SF 1 "register_operand" ""))]
  "TARGET_USE_FANCY_MATH_387
   && (!TARGET_SSE_MATH || TARGET_MIX_SSE_I387)
   && flag_unsafe_math_optimizations"
{
  rtx op0 = gen_reg_rtx (XFmode);
  rtx op1 = gen_reg_rtx (XFmode);

  emit_insn (gen_extendsfxf2 (op1, operands[1]));
  emit_insn (gen_acosxf2 (op0, op1));

  emit_insn (gen_truncxfsf2_i387_noop (operands[0], op0));
  DONE;
})
