This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH 14851] Final fix for suboptimal fp division with -ffast-math
- From: Uros Bizjak <uros at kss-loka dot si>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 22 Sep 2004 13:01:42 +0200
- Subject: [PATCH 14851] Final fix for suboptimal fp division with -ffast-math
- References: <Pine.LNX.4.44.0409201838440.13777-100000@www.eyesopen.com>
Hello!
place to fix this. I think a better strategy would be to fix this
deep in try_combine, where if an attempt to recognize a simplified
expression fails, we check for CONST_DOUBLE operands, force them
to memory and then try to recognize the instruction again. This
should catch all the CONST_DOUBLE_FROM_REAL_VALUE cases that you're
worried about, without adversely affecting further optimizations or
other platforms which may suffer by tweaking simplify-rtx directly.
Does this seem like a reasonable strategy? Anyone else agree or
disagree?
I agree with this approach. But please note, that we cannot return
CONST_DOUBLE in register with force_reg(). So basically, we try to
combine to CONST_DOUBLE and if this fails, the only remaining choice is
to try with a memory reference to CONST_DOUBLE. The problem is, that
CONST_DOUBLE can not be returned in register, and this fact will make
following troubles:
Consider a testcase:
double test (double x, double y) {
return 5.1 / x + 5.2 / y;
}
compiled with -ffast-math -O2, this calculation will be expanded to:
(1.0 / x) * 5.1 + (1.0 / y) * 5.2;
So in i686 case, combiner will combine:
- division with 'two linked insn' into (div:DF (reg)(mem [x]))
- multiplication with 'two linked insn' into (mult:DF (reg)(mem [5.1]))
After 'two linked insn' pass, a 'three linked insn' combine pass
follows, which tries to further simplify multiplication to:
- multiplication with 'three linked insn' into (mult:DF (div:DF (mem
[1.0])(mem[x])) (mem [5.1])) [here, combiner traversed the 'div' path]
This instruction is simplified with MULT/DIV simplification as (div:DF
(mult:DF (mem[1.0])(mem[5.1])) (mem[x])), and when multiplication is
further simplified to (mem[5.1]), resulting simplified insn becomes
(div:DF (mem[5.1])(mem[x]). Unfortunatelly, this insn is _rejected_ by
insn check, because there is no correspondig pattern.
If a 'three linked insn' pass is moved before the 'two linked insn'
pass, combiner has considerably more freedom to simplify instructions.
In this case, combine pass will combine insn this way:
- division with 'two linked insn' into (div:DF (reg)(mem [x]))
- multiplication with 'three linked insn' into (mult:DF (div:DF
(mem[1.0])(mem[x])) (reg)). This time, multiplication is simplified to
(mult:DF (mem[1.0])(reg)), which is simplified to (reg). Resulting
simplified insn is now (div:DF (reg)(mem[x])), where (reg) holds a value
of 5.1. In this case, insn is recognized and emitted as valid insn.
Patch was bootstrapped and regtested on i686-pc-linux-gnu, for c and
c++. It should be noted that it successfully removed multiplications
from all testcases of PR 14851. If this patch is approved, the PR could
be closed.
Patch was benchmarked with scimark 2.0 and didn't show any noticeable
difference (benchmark, compiled with patched gcc was 0.2% faster). On
artifical testcases (like one above) it is considerably faster, because
there is no multiplication any more.
2004-09-22 Uros Bizjak <uros@kss-loka.si>
PR rtl-optimization/14851
* combine.c (combine_instructions): Move 'three linked insns'
pass before 'two linked insns' pass.
Uros.
Index: combine.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/combine.c,v
retrieving revision 1.454
diff -u -p -r1.454 combine.c
--- combine.c 9 Sep 2004 21:57:50 -0000 1.454
+++ combine.c 22 Sep 2004 09:48:42 -0000
@@ -732,13 +732,6 @@ combine_instructions (rtx f, unsigned in
insn based upon SUBREG flags. */
check_promoted_subreg (insn, PATTERN (insn));
- /* Try this insn with each insn it links back to. */
-
- for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
- if ((next = try_combine (insn, XEXP (links, 0),
- NULL_RTX, &new_direct_jump_p)) != 0)
- goto retry;
-
/* Try each sequence of three linked insns ending with this one. */
for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
@@ -759,6 +752,13 @@ combine_instructions (rtx f, unsigned in
goto retry;
}
+ /* Try this insn with each insn it links back to. */
+
+ for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
+ if ((next = try_combine (insn, XEXP (links, 0),
+ NULL_RTX, &new_direct_jump_p)) != 0)
+ goto retry;
+
#ifdef HAVE_cc0
/* Try to combine a jump insn that uses CC0
with a preceding insn that sets CC0, and maybe with its