This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Try to coalesce for unary and binary ops


The patch below increases the number of coalescs we attempt
to also cover unary and binary operations.  This improves
initial code generation for code like

int foo (int i, int j, int k, int l)
{
  int res = i;
  res += j;
  res += k;
  res += l;
  return res;
}

from

;; res_3 = i_1(D) + j_2(D);

(insn 9 8 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 87 [ i ])
                    (reg/v:SI 88 [ j ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (nil))

;; res_5 = res_3 + k_4(D);

(insn 10 9 0 (parallel [
            (set (reg/v:SI 84 [ res ])
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg/v:SI 89 [ k ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:5 -1
     (nil))
...

to

;; res_3 = i_1(D) + j_2(D);

(insn 9 8 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 85 [ i ])
                    (reg/v:SI 86 [ j ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (nil))

;; res_5 = res_3 + k_4(D);

(insn 10 9 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg/v:SI 87 [ k ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:5 -1
     (nil))

re-using the same pseudo for the LHS.

Expansion has special code to improve coalescing of op1 with
target thus this is what we try to match here.

Overall there are positive and negative size effects during
a bootstrap on x86_64, but overall it seems to be a loss
- stage3 cc1 text size is 18261647 bytes without the patch
compared to 18265751 bytes with the patch.

Now the question is what does this tell us?  Not re-using
the same pseudo as op and target is always better?

Btw, I tried this to find a convincing metric for a intra-BB
scheduling pass (during out-of-SSA) on GIMPLE (to be able
to kill that odd scheduling code we now have in reassoc).
And to have sth that TER not immediately un-does we have
to disable TER which conveniently happens for coalesced
SSA names.  Thus -> schedule for "register pressure", and thus
reduce SSA name lifetime - with the goal that out-of-SSA can
do more coalescing.  But it won't even try to coalesce
anything else than PHI copies (not affected by scheduling)
or plain SSA name copies (shouldn't happen anyway due to
copy propagation).

So - any ideas?  Or is the overall negative for cc1 just
an artifact to ignore and we _should_ coalesce as much
as possible (even if it doesn't avoid copies - thus the
"cost" of 0 used in the patch)?

Otherwise the patch bootstraps and tests fine on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2014-04-17  Richard Biener  <rguenther@suse.de>

	* tree-ssa-coalesce.c (create_outofssa_var_map): Try to
	coalesce SSA name uses with SSA name results in all unary
	and binary operations.

Index: gcc/tree-ssa-coalesce.c
===================================================================
*** gcc/tree-ssa-coalesce.c	(revision 209469)
--- gcc/tree-ssa-coalesce.c	(working copy)
*************** create_outofssa_var_map (coalesce_list_p
*** 991,1007 ****
  	    case GIMPLE_ASSIGN:
  	      {
  		tree lhs = gimple_assign_lhs (stmt);
  		tree rhs1 = gimple_assign_rhs1 (stmt);
! 		if (gimple_assign_ssa_name_copy_p (stmt)
  		    && gimple_can_coalesce_p (lhs, rhs1))
  		  {
  		    v1 = SSA_NAME_VERSION (lhs);
  		    v2 = SSA_NAME_VERSION (rhs1);
! 		    cost = coalesce_cost_bb (bb);
! 		    add_coalesce (cl, v1, v2, cost);
  		    bitmap_set_bit (used_in_copy, v1);
  		    bitmap_set_bit (used_in_copy, v2);
  		  }
  	      }
  	      break;
  
--- 993,1031 ----
  	    case GIMPLE_ASSIGN:
  	      {
  		tree lhs = gimple_assign_lhs (stmt);
+ 		if (TREE_CODE (lhs) != SSA_NAME)
+ 		  break;
+ 
+ 		/* Expansion handles target == op1 properly and also
+ 		   target == op2 for commutative binary ops.  */
  		tree rhs1 = gimple_assign_rhs1 (stmt);
! 		enum tree_code code = gimple_assign_rhs_code (stmt);
! 		enum gimple_rhs_class klass = get_gimple_rhs_class (code);
! 		if (TREE_CODE (rhs1) == SSA_NAME
  		    && gimple_can_coalesce_p (lhs, rhs1))
  		  {
  		    v1 = SSA_NAME_VERSION (lhs);
  		    v2 = SSA_NAME_VERSION (rhs1);
! 		    add_coalesce (cl, v1, v2,
! 				  klass == GIMPLE_SINGLE_RHS
! 				  ? coalesce_cost_bb (bb) : 0);
  		    bitmap_set_bit (used_in_copy, v1);
  		    bitmap_set_bit (used_in_copy, v2);
  		  }
+ 		if (klass == GIMPLE_BINARY_RHS
+ 		    && commutative_tree_code (code))
+ 		  {
+ 		    tree rhs2 = gimple_assign_rhs2 (stmt);
+ 		    if (TREE_CODE (rhs2) == SSA_NAME
+ 			&& gimple_can_coalesce_p (lhs, rhs2))
+ 		      {
+ 			v1 = SSA_NAME_VERSION (lhs);
+ 			v2 = SSA_NAME_VERSION (rhs2);
+ 			add_coalesce (cl, v1, v2, 0);
+ 			bitmap_set_bit (used_in_copy, v1);
+ 			bitmap_set_bit (used_in_copy, v2);
+ 		      }
+ 		  }
  	      }
  	      break;
  


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]