This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH 14851] Partial fix for suboptimal fp division with -ffast-math
- From: Uros Bizjak <uros at kss-loka dot si>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 20 Sep 2004 17:14:10 +0200
- Subject: [PATCH 14851] Partial fix for suboptimal fp division with -ffast-math
Hello!
The testcase in rtl-optimization/14851 [1]:
double test(double x) {return x/x;}
currently produces something like x * (1.0 / x).
pushl %ebp
movl %esp, %ebp
fldl 8(%ebp)
fld1
fdiv %st(1), %st
popl %ebp
fmulp %st, %st(1)
ret
This code is emitted because in expr.c, around line 7676:
/* Emit a/b as a*(1/b). Later we may manage CSE the reciprocal saving
expensive divide. If not, combine will rebuild the original
computation. */
Unfortunatelly, combine doesn't rebuild the original computation. Two
bugs are present in current combine phase:
a) Missing "code = GET_CODE (op);"in commutative_operand_precedence() in
rtlanal.c. If constants are not "nice", then
avoid_constant_pool_reference() will obtain naked constants. Because non
constant-pool operators will remain untouched,
switch should operate on naked constants, not on referenced constants!
These testcases then produce following code:
double test(double x) {return x/x;}
test:
pushl %ebp
movl %esp, %ebp
fldl 8(%ebp)
fdiv %st(0), %st
popl %ebp
ret
double test(double x) {return 1.0/x;}
test:
pushl %ebp
movl %esp, %ebp
fld1
fdivl 8(%ebp)
popl %ebp
ret
b) simplify_binary_operation() in simplify-rtx.c can simplify two
CONST_DOUBLEs with:
if (GET_MODE_CLASS (mode) == MODE_FLOAT
&& GET_CODE (trueop0) == CONST_DOUBLE
&& GET_CODE (trueop1) == CONST_DOUBLE
&& mode == GET_MODE (op0) && mode == GET_MODE (op1))
where trueop0 and trueop1 are naked constants out of constant pool.
Around line 1314, function is returning simplified CONST_DOUBLE, but it
should return memory reference to const double from constant pool when
simplifying memory references to constant pool. In i686 case, there is
no "div: (reg) (double)" optab, but "div: (reg)(mem)" is OK, and
accepted by combine. It should be noted here, that IMHO many other
functions should return memory reference, not just naked constant. Look
for "return CONST_DOUBLE_FROM_REAL_VALUE (...)".
Simple testcase, which will produce unoptimized code because of this bug:
double test(double x) {return 1.1/x;}
With the fix, this testcase will produce following asm:
test:
pushl %ebp
fldl .LC1
movl %esp, %ebp
fdivl 8(%ebp)
popl %ebp
ret
Both fixes together now handle functions as stated in original comment
in expr.c. This example:
double test (double x) {return 1.1 / x + sin (1.2 / x);}
now produces:
test:
fld1
fdivl 4(%esp)
fld %st(0)
fmull .LC2
fxch %st(1)
fmull .LC1
fxch %st(1)
fsin
faddp %st, %st(1)
ret
Which is quite good, considering that fmul has latency 7 and fdiv has
latency 43 on P4. One fdiv less, but two additional fmuls.
This testcase shows, that fmuls are gone if not needed:
double test (double x, double y) {return 5.0 / x + 5.2 / y;}
test:
flds .LC1
fldl .LC2
fxch %st(1)
fdivl 4(%esp)
fxch %st(1)
fdivl 12(%esp)
faddp %st, %st(1)
ret
_However_ , strange things happens with similar code, where both
constants are DFmode:
double test (double x, double y) {return 5.1 / x + 5.2 / y;}
test:
fld1
fld %st(0)
fdivl 4(%esp)
fxch %st(1)
fdivl 12(%esp)
fxch %st(1)
fmull .LC1
fxch %st(1)
fmull .LC2
faddp %st, %st(1)
ret
This is the same unoptimized code as without patch. The diff between
test.c.16.life of the last two sources is only:
--cut here--
--- div1.c.16.life 2004-09-20 16:26:18.000000000 +0200
+++ div2.c.16.life 2004-09-20 16:27:07.000000000 +0200
@@ -69,8 +69,8 @@ try_optimize_cfg iteration 1
(nil)))
(insn 14 13 15 0 (set (reg:DF 64)
- (float_extend:DF (mem/u/i:SF (symbol_ref/u:SI ("*.LC1") [flags
0x2]) [4 S4 A32]))) 89 {*extendsfdf2_1} (nil)
- (expr_list:REG_EQUAL (const_double:DF -1610612736 [0xa0000000]
5.0e+0 [0x0.ap+3])
+ (mem/u/i:DF (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [3 S8
A64])) 67 {*movdf_nointeger} (nil)
+ (expr_list:REG_EQUAL (const_double:DF -1556925645 [0xa3333333]
5.09999999999999964472863211994990706443786621094e+0 [0x0.a333333333333p+3])
(nil)))
(insn 15 14 17 0 (set (reg:DF 65)
--cut here--
I'm a little out of the ideas here... In second case, 1.0 and 5.1 are
simplified to 5.1, but second operator (5.2) is never simplified with
its 1.0 and it looks that the whole combination is aborted. It is
interesting, that previous testcase doesn't touch b) case, but it gets
simplified in
/* In IEEE floating point, x*1 is not equivalent to x for
signalling NaNs. */
if (!HONOR_SNANS (mode)
&& trueop1 == CONST1_RTX (mode))
return op0;
Everytime, we are simplifying in this section (in combine.c):
--cut here--
/* Try simplify a*(b/c) as (a*b)/c. */
if (FLOAT_MODE_P (mode) && flag_unsafe_math_optimizations
&& GET_CODE (XEXP (x, 0)) == DIV)
{
GET_CODE (XEXP (XEXP (x, 0), 1)),
GET_CODE (XEXP (x, 1)));
rtx tem = simplify_binary_operation (MULT, mode,
XEXP (XEXP (x, 0), 0),
XEXP (x, 1));
if (tem)
return gen_binary (DIV, mode, tem, XEXP (XEXP (x, 0), 1));
--cut here--
Another strange case, perhaps related to case above is:
double test (double x) {return 1.0 + 1.3 / x;}
which produces:
test:
fld1
fld %st(0)
fdivl 4(%esp)
fmull .LC1
faddp %st, %st(1)
ret
Again, optimization was dropped!
I would appreciate some help here. Actually, with -ffast-math, every x /
y division is emitted as x * 1.0 / y, which means one extra fp
multiplication and one extra constant load. This "-ffast-math" is
killing performance!
The patch was bootstrapped and regtested c, c++, no new regressions.
2004-09-20 Uros Bizjak <uros@kss-loka.si>
* rtlanal.c (commutative_operand_precedence): Calculate
precedence of stripped constant.
* simplify-rtx.c (simplify_binary_operation): Return
memory reference to constant pool if operands are constant
pool references.
Uros.
--
Index: rtlanal.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/rtlanal.c,v
retrieving revision 1.200
diff -u -p -r1.200 rtlanal.c
--- rtlanal.c 9 Sep 2004 17:19:12 -0000 1.200
+++ rtlanal.c 20 Sep 2004 14:03:51 -0000
@@ -2980,6 +2980,7 @@ commutative_operand_precedence (rtx op)
if (code == CONST_DOUBLE)
return -6;
op = avoid_constant_pool_reference (op);
+ code = GET_CODE (op);
switch (GET_RTX_CLASS (code))
{
Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.204
diff -u -p -r1.204 simplify-rtx.c
--- simplify-rtx.c 9 Sep 2004 17:19:16 -0000 1.204
+++ simplify-rtx.c 20 Sep 2004 14:03:51 -0000
@@ -1309,7 +1309,10 @@ simplify_binary_operation (enum rtx_code
REAL_ARITHMETIC (value, rtx_to_tree_code (code), f0, f1);
value = real_value_truncate (mode, value);
- return CONST_DOUBLE_FROM_REAL_VALUE (value, mode);
+ tem = CONST_DOUBLE_FROM_REAL_VALUE (value, mode);
+ if (MEM_P (op0) || MEM_P (op1))
+ tem = validize_mem (force_const_mem (mode, tem));
+ return tem;
}
}