This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[PATCH 14851] Partial fix for suboptimal fp division with -ffast-math

From: Uros Bizjak <uros at kss-loka dot si>
To: gcc-patches at gcc dot gnu dot org
Date: Mon, 20 Sep 2004 17:14:10 +0200
Subject: [PATCH 14851] Partial fix for suboptimal fp division with -ffast-math

Hello!

The testcase in rtl-optimization/14851 [1]:

double test(double x) {return x/x;}

currently produces something like x * (1.0  / x).
       pushl   %ebp
       movl    %esp, %ebp
       fldl    8(%ebp)
       fld1
       fdiv    %st(1), %st
       popl    %ebp
       fmulp   %st, %st(1)
       ret

This code is emitted because in expr.c, around line 7676:
     /* Emit a/b as a*(1/b).  Later we may manage CSE the reciprocal saving
        expensive divide.  If not, combine will rebuild the original
        computation.  */

Unfortunatelly, combine doesn't rebuild the original computation. Two bugs are present in current combine phase:

a) Missing "code = GET_CODE (op);"in commutative_operand_precedence() in rtlanal.c. If constants are not "nice", then avoid_constant_pool_reference() will obtain naked constants. Because non constant-pool operators will remain untouched, switch should operate on naked constants, not on referenced constants!

These testcases then produce following code:
double test(double x) {return x/x;}
test:
       pushl   %ebp
       movl    %esp, %ebp
       fldl    8(%ebp)
       fdiv    %st(0), %st
       popl    %ebp
       ret

double test(double x) {return 1.0/x;}
test:
       pushl   %ebp
       movl    %esp, %ebp
       fld1
       fdivl   8(%ebp)
       popl    %ebp
       ret

b) simplify_binary_operation() in simplify-rtx.c can simplify two CONST_DOUBLEs with: if (GET_MODE_CLASS (mode) == MODE_FLOAT && GET_CODE (trueop0) == CONST_DOUBLE && GET_CODE (trueop1) == CONST_DOUBLE && mode == GET_MODE (op0) && mode == GET_MODE (op1))

where trueop0 and trueop1 are naked constants out of constant pool. Around line 1314, function is returning simplified CONST_DOUBLE, but it should return memory reference to const double from constant pool when simplifying memory references to constant pool. In i686 case, there is no "div: (reg) (double)" optab, but "div: (reg)(mem)" is OK, and accepted by combine. It should be noted here, that IMHO many other functions should return memory reference, not just naked constant. Look for "return CONST_DOUBLE_FROM_REAL_VALUE (...)".

Simple testcase, which will produce unoptimized code because of this bug:

double test(double x) {return 1.1/x;}

With the fix, this testcase will produce following asm:
test:
     pushl    %ebp
     fldl     .LC1
     movl     %esp, %ebp
     fdivl    8(%ebp)
     popl     %ebp
     ret

Both fixes together now handle functions as stated in original comment in expr.c. This example: double test (double x) {return 1.1 / x + sin (1.2 / x);}

now produces:
test:
     fld1
     fdivl    4(%esp)
     fld      %st(0)
     fmull    .LC2
     fxch     %st(1)
     fmull    .LC1
     fxch     %st(1)
     fsin
     faddp    %st, %st(1)
     ret

Which is quite good, considering that fmul has latency 7 and fdiv has latency 43 on P4. One fdiv less, but two additional fmuls.

This testcase shows, that fmuls are gone if not needed:
double test (double x, double y) {return 5.0 / x + 5.2 / y;}

test:
     flds     .LC1
     fldl     .LC2
     fxch     %st(1)
     fdivl    4(%esp)
     fxch     %st(1)
     fdivl    12(%esp)
     faddp    %st, %st(1)
     ret

_However_ , strange things happens with similar code, where both constants are DFmode: double test (double x, double y) {return 5.1 / x + 5.2 / y;}

test:
     fld1
     fld      %st(0)
     fdivl    4(%esp)
     fxch     %st(1)
     fdivl    12(%esp)
     fxch     %st(1)
     fmull    .LC1
     fxch     %st(1)
     fmull    .LC2
     faddp    %st, %st(1)
     ret

This is the same unoptimized code as without patch. The diff between test.c.16.life of the last two sources is only: --cut here-- --- div1.c.16.life 2004-09-20 16:26:18.000000000 +0200 +++ div2.c.16.life 2004-09-20 16:27:07.000000000 +0200 @@ -69,8 +69,8 @@ try_optimize_cfg iteration 1 (nil)))

(insn 14 13 15 0 (set (reg:DF 64) - (float_extend:DF (mem/u/i:SF (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [4 S4 A32]))) 89 {*extendsfdf2_1} (nil) - (expr_list:REG_EQUAL (const_double:DF -1610612736 [0xa0000000] 5.0e+0 [0x0.ap+3]) + (mem/u/i:DF (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [3 S8 A64])) 67 {*movdf_nointeger} (nil) + (expr_list:REG_EQUAL (const_double:DF -1556925645 [0xa3333333] 5.09999999999999964472863211994990706443786621094e+0 [0x0.a333333333333p+3]) (nil)))

(insn 15 14 17 0 (set (reg:DF 65)
--cut here--

I'm a little out of the ideas here... In second case, 1.0 and 5.1 are simplified to 5.1, but second operator (5.2) is never simplified with its 1.0 and it looks that the whole combination is aborted. It is interesting, that previous testcase doesn't touch b) case, but it gets simplified in

     /* In IEEE floating point, x*1 is not equivalent to x for
        signalling NaNs.  */

     if (!HONOR_SNANS (mode)
         && trueop1 == CONST1_RTX (mode))
       return op0;

Everytime, we are simplifying in this section (in combine.c):
--cut here--
     /* Try simplify a*(b/c) as (a*b)/c.  */
     if (FLOAT_MODE_P (mode) && flag_unsafe_math_optimizations
     && GET_CODE (XEXP (x, 0)) == DIV)
   {
        GET_CODE (XEXP (XEXP (x, 0), 1)),
        GET_CODE (XEXP (x, 1)));

     rtx tem = simplify_binary_operation (MULT, mode,
                          XEXP (XEXP (x, 0), 0),
                          XEXP (x, 1));
     if (tem)
       return gen_binary (DIV, mode, tem, XEXP (XEXP (x, 0), 1));
--cut here--

Another strange case, perhaps related to case above is:
double test (double x) {return 1.0 + 1.3 / x;}

which produces:
test:
     fld1
     fld      %st(0)
     fdivl    4(%esp)
     fmull    .LC1
     faddp    %st, %st(1)
     ret

Again, optimization was dropped!

I would appreciate some help here. Actually, with -ffast-math, every x / y division is emitted as x * 1.0 / y, which means one extra fp multiplication and one extra constant load. This "-ffast-math" is killing performance!

The patch was bootstrapped and regtested c, c++, no new regressions.

2004-09-20 Uros Bizjak <uros@kss-loka.si>

   * rtlanal.c (commutative_operand_precedence): Calculate
   precedence of stripped constant.
   * simplify-rtx.c (simplify_binary_operation): Return
   memory reference to constant pool if operands are constant
   pool references.

Uros.
--

Index: rtlanal.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/rtlanal.c,v
retrieving revision 1.200
diff -u -p -r1.200 rtlanal.c
--- rtlanal.c	9 Sep 2004 17:19:12 -0000	1.200
+++ rtlanal.c	20 Sep 2004 14:03:51 -0000
@@ -2980,6 +2980,7 @@ commutative_operand_precedence (rtx op)
   if (code == CONST_DOUBLE)
     return -6;
   op = avoid_constant_pool_reference (op);
+  code = GET_CODE (op);
 
   switch (GET_RTX_CLASS (code))
     {
Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.204
diff -u -p -r1.204 simplify-rtx.c
--- simplify-rtx.c	9 Sep 2004 17:19:16 -0000	1.204
+++ simplify-rtx.c	20 Sep 2004 14:03:51 -0000
@@ -1309,7 +1309,10 @@ simplify_binary_operation (enum rtx_code
 	  REAL_ARITHMETIC (value, rtx_to_tree_code (code), f0, f1);
 
 	  value = real_value_truncate (mode, value);
-	  return CONST_DOUBLE_FROM_REAL_VALUE (value, mode);
+	  tem = CONST_DOUBLE_FROM_REAL_VALUE (value, mode);
+	  if (MEM_P (op0) || MEM_P (op1))
+	    tem = validize_mem (force_const_mem (mode, tem));
+	  return tem;
 	}
     }

Follow-Ups:
- Re: [PATCH 14851] Partial fix for suboptimal fp division with-ffast-math
  - From: Roger Sayle

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]