This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Improved insn splitting in try_combine


The following patch tweaks the method used to split binary operator
instructions in combine.  One of the transformations performed by
combine is to "smoosh" together up to three related instructions,
simplify the resulting RTL and, for the three instruction case, if
the result isn't recognized by the backend, split the combined insn
into two instructions to see if they can be both recognized,
i.e. a 3 insn -> 2 insn optimization.

Given an instruction such as (set (dest) (op (src1) (src2))), where
src1 and src2 may be complex expressions, the default splitting
strategy used by combine, is to decompose this as:

	(set (temp) (src1))
	(set (dest) (op (temp) (src2)))

So given a complex instruction such as:

	(set (dest) (mult (plus X X) (plus X X)))

we currently decompose this into:

	(set (temp) (plus X X))
	(set (dest) (mult (temp) (plus X X)))

And hope that the second instruction is recognised.  The tweak proposed
below is to spot that in this case, an often better decomposition that's
applicable to all binary operators whose operands have no side-effects
is:

	(set (temp) (plus X X))
	(set (dest) (mult (temp) (temp)))

This is effectively CSE within a single instruction, by noticing that
op0 and op1 are rtx_equal_p when we split.


It turns out that we can be even cleverer still for commutative
associative binary operators, such as addition and multiplication.
Consider the following code below:

  double foo(double x, double y)
  {
    double p = x+x;
    double q = y+y;
    return p+q;
  }

With mainline using "-O2 -ffast-math -fomit-frame-pointer" on x86
we generate the following code requiring three additions:

foo:    fldl    4(%esp)
        fldl    12(%esp)
        fxch    %st(1)
        fadd    %st(0), %st
        fxch    %st(1)
        fadd    %st(0), %st
        faddp   %st, %st(1)
        ret

With the patch below, which teaches combine that (X+X)+(Y+Y) is
equivalent to "(X+Y)+(X+Y)" and therefore to "T=X+Y; T+T", we
now generate only two additions:

foo:    fldl    12(%esp)
        faddl   4(%esp)
        fadd    %st(0), %st
        ret


This approach to local (intra-instruction) CES/reassociation in combine
is complementary to Dan's proposed tree-ssa reassociation work.  It's
effectively a target independent peephole that allows poorly associated
operations to be reorganized.  This optimization isn't implemented in
the version of LLVM available as an on-line demo, and I suspect not by
Dan's pass (but I haven't looked too closely).


The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", and regression tested with a top-level "make -k
check" with no new failures.  These optimizations trigger about 70
times during stage2, stage3 and building the run-time libraries
during a GCC bootstrap.


Ok for mainline?  Any objections?



2005-12-12  Roger Sayle  <roger@eyesopen.com>

	* combine.c (try_combine): Improve splitting of binary operators
	by taking advantage of reassociative transformations.


Index: combine.c
===================================================================
*** combine.c	(revision 108280)
--- combine.c	(working copy)
*************** try_combine (rtx i3, rtx i2, rtx i1, int
*** 2526,2531 ****
--- 2526,2533 ----
  	  rtx newdest = i2dest;
  	  enum rtx_code split_code = GET_CODE (*split);
  	  enum machine_mode split_mode = GET_MODE (*split);
+ 	  bool subst_done = false;
+ 	  newi2pat = NULL_RTX;

  	  /* Get NEWDEST as a register in the proper mode.  We have already
  	     validated that we can do this.  */
*************** try_combine (rtx i3, rtx i2, rtx i1, int
*** 2571,2578 ****
  	    }
  #endif

! 	  newi2pat = gen_rtx_SET (VOIDmode, newdest, *split);
! 	  SUBST (*split, newdest);
  	  i2_code_number = recog_for_combine (&newi2pat, i2, &new_i2_notes);

  	  /* recog_for_combine might have added CLOBBERs to newi2pat.
--- 2573,2641 ----
  	    }
  #endif

! 	  /* Attempt to split binary operators using arithmetic identities.  */
! 	  if (BINARY_P (SET_SRC (newpat))
! 	      && split_mode == GET_MODE (SET_SRC (newpat))
! 	      && ! side_effects_p (SET_SRC (newpat)))
! 	    {
! 	      rtx setsrc = SET_SRC (newpat);
! 	      enum machine_mode mode = GET_MODE (setsrc);
! 	      enum rtx_code code = GET_CODE (setsrc);
! 	      rtx src_op0 = XEXP (setsrc, 0);
! 	      rtx src_op1 = XEXP (setsrc, 1);
!
! 	      /* Split "X = Y op Y" as "Z = Y; X = Z op Z".  */
! 	      if (rtx_equal_p (src_op0, src_op1))
! 		{
! 		  newi2pat = gen_rtx_SET (VOIDmode, newdest, src_op0);
! 		  SUBST (XEXP (setsrc, 0), newdest);
! 		  SUBST (XEXP (setsrc, 1), newdest);
! 		  subst_done = true;
! 		}
! 	      /* Split "((P op Q) op R) op S" where op is PLUS or MULT.  */
! 	      else if ((code == PLUS || code == MULT)
! 		       && GET_CODE (src_op0) == code
! 		       && GET_CODE (XEXP (src_op0, 0)) == code
! 		       && (INTEGRAL_MODE_P (mode)
! 			   || (FLOAT_MODE_P (mode)
! 			       && flag_unsafe_math_optimizations)))
! 		{
! 		  rtx p = XEXP (XEXP (src_op0, 0), 0);
! 		  rtx q = XEXP (XEXP (src_op0, 0), 1);
! 		  rtx r = XEXP (src_op0, 1);
! 		  rtx s = src_op1;
!
! 		  /* Split both "((X op Y) op X) op Y" and
! 		     "((X op Y) op Y) op X" as "T op T" where T is
! 		     "X op Y".  */
! 		  if ((rtx_equal_p (p,r) && rtx_equal_p (q,s))
! 		       || (rtx_equal_p (p,s) && rtx_equal_p (q,r)))
! 		    {
! 		      newi2pat = gen_rtx_SET (VOIDmode, newdest,
! 					      XEXP (src_op0, 0));
! 		      SUBST (XEXP (setsrc, 0), newdest);
! 		      SUBST (XEXP (setsrc, 1), newdest);
! 		      subst_done = true;
! 		    }
! 		  /* Split "((X op X) op Y) op Y)" as "T op T" where
! 		     T is "X op Y".  */
! 		  else if (rtx_equal_p (p,q) && rtx_equal_p (r,s))
! 		    {
! 		      rtx tmp = simplify_gen_binary (code, mode, p, r);
! 		      newi2pat = gen_rtx_SET (VOIDmode, newdest, tmp);
! 		      SUBST (XEXP (setsrc, 0), newdest);
! 		      SUBST (XEXP (setsrc, 1), newdest);
! 		      subst_done = true;
! 		    }
! 		}
! 	    }
!
! 	  if (!subst_done)
! 	    {
! 	      newi2pat = gen_rtx_SET (VOIDmode, newdest, *split);
! 	      SUBST (*split, newdest);
! 	    }
!
  	  i2_code_number = recog_for_combine (&newi2pat, i2, &new_i2_notes);

  	  /* recog_for_combine might have added CLOBBERs to newi2pat.


Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]