This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH 14851] Final fix for suboptimal fp division with -ffast-math


Hello!

place to fix this.  I think a better strategy would be to fix this
deep in try_combine, where if an attempt to recognize a simplified
expression fails, we check for CONST_DOUBLE operands, force them
to memory and then try to recognize the instruction again.  This
should catch all the CONST_DOUBLE_FROM_REAL_VALUE cases that you're
worried about, without adversely affecting further optimizations or
other platforms which may suffer by tweaking simplify-rtx directly.

Does this seem like a reasonable strategy?  Anyone else agree or
disagree?



I agree with this approach. But please note, that we cannot return CONST_DOUBLE in register with force_reg(). So basically, we try to combine to CONST_DOUBLE and if this fails, the only remaining choice is to try with a memory reference to CONST_DOUBLE. The problem is, that CONST_DOUBLE can not be returned in register, and this fact will make following troubles:

Consider a testcase:
double test (double x, double y) {
       return 5.1 / x + 5.2 / y;
}

compiled with -ffast-math -O2, this calculation will be expanded to:

(1.0 / x) * 5.1 + (1.0 / y) * 5.2;

So in i686 case, combiner will combine:
- division with 'two linked insn' into (div:DF (reg)(mem [x]))
- multiplication with 'two linked insn' into (mult:DF (reg)(mem [5.1]))

After 'two linked insn' pass, a 'three linked insn' combine pass follows, which tries to further simplify multiplication to:
- multiplication with 'three linked insn' into (mult:DF (div:DF (mem [1.0])(mem[x])) (mem [5.1])) [here, combiner traversed the 'div' path]


This instruction is simplified with MULT/DIV simplification as (div:DF (mult:DF (mem[1.0])(mem[5.1])) (mem[x])), and when multiplication is further simplified to (mem[5.1]), resulting simplified insn becomes (div:DF (mem[5.1])(mem[x]). Unfortunatelly, this insn is _rejected_ by insn check, because there is no correspondig pattern.

If a 'three linked insn' pass is moved before the 'two linked insn' pass, combiner has considerably more freedom to simplify instructions. In this case, combine pass will combine insn this way:
- division with 'two linked insn' into (div:DF (reg)(mem [x]))
- multiplication with 'three linked insn' into (mult:DF (div:DF (mem[1.0])(mem[x])) (reg)). This time, multiplication is simplified to (mult:DF (mem[1.0])(reg)), which is simplified to (reg). Resulting simplified insn is now (div:DF (reg)(mem[x])), where (reg) holds a value of 5.1. In this case, insn is recognized and emitted as valid insn.


Patch was bootstrapped and regtested on i686-pc-linux-gnu, for c and c++. It should be noted that it successfully removed multiplications from all testcases of PR 14851. If this patch is approved, the PR could be closed.

Patch was benchmarked with scimark 2.0 and didn't show any noticeable difference (benchmark, compiled with patched gcc was 0.2% faster). On artifical testcases (like one above) it is considerably faster, because there is no multiplication any more.

2004-09-22 Uros Bizjak <uros@kss-loka.si>

       PR rtl-optimization/14851
       * combine.c (combine_instructions): Move 'three linked insns'
       pass before 'two linked insns' pass.

Uros.


Index: combine.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/combine.c,v
retrieving revision 1.454
diff -u -p -r1.454 combine.c
--- combine.c	9 Sep 2004 21:57:50 -0000	1.454
+++ combine.c	22 Sep 2004 09:48:42 -0000
@@ -732,13 +732,6 @@ combine_instructions (rtx f, unsigned in
 		 insn based upon SUBREG flags.  */
 	      check_promoted_subreg (insn, PATTERN (insn));
 
-	      /* Try this insn with each insn it links back to.  */
-
-	      for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
-		if ((next = try_combine (insn, XEXP (links, 0),
-					 NULL_RTX, &new_direct_jump_p)) != 0)
-		  goto retry;
-
 	      /* Try each sequence of three linked insns ending with this one.  */
 
 	      for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
@@ -759,6 +752,13 @@ combine_instructions (rtx f, unsigned in
 		      goto retry;
 		}
 
+	      /* Try this insn with each insn it links back to.  */
+
+	      for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
+		if ((next = try_combine (insn, XEXP (links, 0),
+					 NULL_RTX, &new_direct_jump_p)) != 0)
+		  goto retry;
+
 #ifdef HAVE_cc0
 	      /* Try to combine a jump insn that uses CC0
 		 with a preceding insn that sets CC0, and maybe with its

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]