This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC] Fix PR71632, remove parts of TER


On Thu, 30 Jun 2016, Richard Biener wrote:

> 
> The following patch fixes PR71632 by removing delayed expansion of
> TERed defs.  Instead it adds code to apply the scheduling effect
> to the GIMPLE IL (so you also get better interleaved GIMPLE stmt
> / generated RTL dumps in .expand).
> 
> This removes the quadratic re-expansion of TERed RHS as seen in
> the testcase.
> 
> It affects initial RTL generation in following ways:
> 
>  1) it may expand stmts with unused LHS
>  2) it may expand dead code when use stmts are expanded and those
>     use get_gimple_for_ssa, expanding its operands (again).
>  3) it no longer automatically tries to combine with defs doing
>     memory loads (in case we TERed the load and expansion of the
>     RHS resulted in a MEM).
>  4) the depth-first, left-to-right "expansion" of ops might not
>     be a 100% match of what expand currently does
> 
> I expect that 1) and 2) are mostly a non-issue (and not worse than
> the effect seen in the PR).  I also expect that 3) is "quickly"
> recovered by fwprop or combine.
> 
> While I'm quite sure 3) exists I'm not 100% sure we never return
> a non-constant/reg for SSA name expansion (considering expand
> modifiers like EXPAND_SUM).
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> I think this is also an important step towards eventually doing
> a meaningful GIMPLE-level scheduling (for register pressure,
> also to remove this awkward code-placement code from reassoc
> for example).  Without this TER would disrupt the result quite
> a bit (so the scheduling part of TER would be disabled when scheduling
> on GIMPLE).
> 
> Comments?  I'll throw this on one of our spec testers tonight any
> testing on non-x86 archs appreciated.

So testing works ok apart from a few fails:

x86_64/-m64

FAIL: gcc.target/i386/pr70155-22.c scan-assembler-not \\\\*movdi_internal
FAIL: gcc.target/i386/xorps-sse2.c scan-assembler xorps[ \\t]
FAIL: gcc.target/i386/xorps-sse2.c scan-assembler-not pxor

x86_64/-m32

XPASS: gcc.target/i386/addr-sel-1.c scan-assembler b\\\\+1
FAIL: gcc.target/i386/movbe-2.c scan-assembler-times movbe[ \\t] 4
FAIL: gcc.target/i386/movq-2.c scan-assembler movzbl[ \\t]*123
FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
FAIL: gcc.target/i386/xorps-sse2.c scan-assembler xorps[ \\t]
FAIL: gcc.target/i386/xorps-sse2.c scan-assembler-not pxor

I so far looked at gcc.target/i386/pr70155-22.c and without the patch
RTL expansion creates

(insn 7 6 0 2 (parallel [
            (set (mem/c:TI (symbol_ref:DI ("c") [flags 0x40]  <var_decl 
0x7f370f13be10 c>) [1 c+0 S16 A128])
                (plus:TI (mem/c:TI (symbol_ref:DI ("c") [flags 0x40]  
<var_decl 0x7f370f13be10 c>) [1 c+0 S16 A128])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 
/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.target/i386/pr70155-22.c:10 
210 {*addti3_doubleword}
     (nil))

directly while with the patch we get the load and store separated from the
add but then at combine-time:

Trying 7, 8 -> 9:
Successfully matched this instruction:
(set (mem/c:TI (symbol_ref:DI ("c") [flags 0x40]  <var_decl 0x7f0d8f0d4e10 
c>) [1 c+0 S16 A128])
    (plus:TI (mem/c:TI (symbol_ref:DI ("c") [flags 0x40]  <var_decl 
0x7f0d8f0d4e10 c>) [1 c+0 S16 A128])
        (const_int 1 [0x1])))
rejecting combination of insns 7, 8 and 9
original costs 8 + 8 + 4 = 20
replacement cost 24

so for some reason RTX costs in the backend reject this variant.  Still
the testcase wants it to happen.

I suspect the other testcases run into similar issues.  Like
gcc.target/i386/xorps-sse2.c somehow magically wanting the
vector integer xor operation in the GIMPLE IL be rewritten to vector float
which probably happens through some simplify-rtx tricks at RTL
expansion time (not sure if it is a good idea to do that, the
vector int constant could be a NaN/denormal when loaded as vector float
and thus trigger a similar issue as the testcase wants to avoid,
re-interpret of int vector as float causing slowdowns).

Richard.

> Thanks,
> Richard.
> 
> PS: avoid_deep_ter_for_debug could go away as well I think.
> 
> 2016-06-30  Richard Biener  <rguenther@suse.de>
> 
> 	PR middle-end/71632
> 	* expr.c (expand_expr_real_1): Do not expand TERed SSA name
> 	def rhs.
> 	* cfgexpand.c (expand_gimple_basic_block): Expand defs of
> 	TERed SSA names.  Move debug stmt use handling ...
> 	* tree-outof-ssa.c (move_uses_r): ... here.  New helper
> 	for ...
> 	(rewrite_trees): ... code to apply the scheduling effect of TER
> 	to the GIMPLE IL.
> 	(remove_ssa_form): Move rewrite_trees call.
> 
> 	* gcc.dg/torture/pr71632.c: New testcase.
> 
> Index: gcc/expr.c
> ===================================================================
> *** gcc/expr.c	(revision 237873)
> --- gcc/expr.c	(working copy)
> *************** expand_expr_real_1 (tree exp, rtx target
> *** 9695,9704 ****
>   			      LAST_VIRTUAL_REGISTER + 1);
>   	}
>   
> !       g = get_gimple_for_ssa_name (exp);
> !       /* For EXPAND_INITIALIZER try harder to get something simpler.  */
> !       if (g == NULL
> ! 	  && modifier == EXPAND_INITIALIZER
>   	  && !SSA_NAME_IS_DEFAULT_DEF (exp)
>   	  && (optimize || !SSA_NAME_VAR (exp)
>   	      || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
> --- 9677,9685 ----
>   			      LAST_VIRTUAL_REGISTER + 1);
>   	}
>   
> !       g = NULL;
> !       /* For EXPAND_INITIALIZER try to get something simpler.  */
> !       if (modifier == EXPAND_INITIALIZER
>   	  && !SSA_NAME_IS_DEFAULT_DEF (exp)
>   	  && (optimize || !SSA_NAME_VAR (exp)
>   	      || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
> Index: gcc/cfgexpand.c
> ===================================================================
> *** gcc/cfgexpand.c	(revision 237873)
> --- gcc/cfgexpand.c	(working copy)
> *************** expand_gimple_basic_block (basic_block b
> *** 5510,5611 ****
>         basic_block new_bb;
>   
>         stmt = gsi_stmt (gsi);
> - 
> -       /* If this statement is a non-debug one, and we generate debug
> - 	 insns, then this one might be the last real use of a TERed
> - 	 SSA_NAME, but where there are still some debug uses further
> - 	 down.  Expanding the current SSA name in such further debug
> - 	 uses by their RHS might lead to wrong debug info, as coalescing
> - 	 might make the operands of such RHS be placed into the same
> - 	 pseudo as something else.  Like so:
> - 	   a_1 = a_0 + 1;   // Assume a_1 is TERed and a_0 is dead
> - 	   use(a_1);
> - 	   a_2 = ...
> -            #DEBUG ... => a_1
> - 	 As a_0 and a_2 don't overlap in lifetime, assume they are coalesced.
> - 	 If we now would expand a_1 by it's RHS (a_0 + 1) in the debug use,
> - 	 the write to a_2 would actually have clobbered the place which
> - 	 formerly held a_0.
> - 
> - 	 So, instead of that, we recognize the situation, and generate
> - 	 debug temporaries at the last real use of TERed SSA names:
> - 	   a_1 = a_0 + 1;
> -            #DEBUG #D1 => a_1
> - 	   use(a_1);
> - 	   a_2 = ...
> -            #DEBUG ... => #D1
> - 	 */
> -       if (MAY_HAVE_DEBUG_INSNS
> - 	  && SA.values
> - 	  && !is_gimple_debug (stmt))
> - 	{
> - 	  ssa_op_iter iter;
> - 	  tree op;
> - 	  gimple *def;
> - 
> - 	  location_t sloc = curr_insn_location ();
> - 
> - 	  /* Look for SSA names that have their last use here (TERed
> - 	     names always have only one real use).  */
> - 	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
> - 	    if ((def = get_gimple_for_ssa_name (op)))
> - 	      {
> - 		imm_use_iterator imm_iter;
> - 		use_operand_p use_p;
> - 		bool have_debug_uses = false;
> - 
> - 		FOR_EACH_IMM_USE_FAST (use_p, imm_iter, op)
> - 		  {
> - 		    if (gimple_debug_bind_p (USE_STMT (use_p)))
> - 		      {
> - 			have_debug_uses = true;
> - 			break;
> - 		      }
> - 		  }
> - 
> - 		if (have_debug_uses)
> - 		  {
> - 		    /* OP is a TERed SSA name, with DEF its defining
> - 		       statement, and where OP is used in further debug
> - 		       instructions.  Generate a debug temporary, and
> - 		       replace all uses of OP in debug insns with that
> - 		       temporary.  */
> - 		    gimple *debugstmt;
> - 		    tree value = gimple_assign_rhs_to_tree (def);
> - 		    tree vexpr = make_node (DEBUG_EXPR_DECL);
> - 		    rtx val;
> - 		    machine_mode mode;
> - 
> - 		    set_curr_insn_location (gimple_location (def));
> - 
> - 		    DECL_ARTIFICIAL (vexpr) = 1;
> - 		    TREE_TYPE (vexpr) = TREE_TYPE (value);
> - 		    if (DECL_P (value))
> - 		      mode = DECL_MODE (value);
> - 		    else
> - 		      mode = TYPE_MODE (TREE_TYPE (value));
> - 		    DECL_MODE (vexpr) = mode;
> - 
> - 		    val = gen_rtx_VAR_LOCATION
> - 			(mode, vexpr, (rtx)value, VAR_INIT_STATUS_INITIALIZED);
> - 
> - 		    emit_debug_insn (val);
> - 
> - 		    FOR_EACH_IMM_USE_STMT (debugstmt, imm_iter, op)
> - 		      {
> - 			if (!gimple_debug_bind_p (debugstmt))
> - 			  continue;
> - 
> - 			FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> - 			  SET_USE (use_p, vexpr);
> - 
> - 			update_stmt (debugstmt);
> - 		      }
> - 		  }
> - 	      }
> - 	  set_curr_insn_location (sloc);
> - 	}
> - 
>         currently_expanding_gimple_stmt = stmt;
>   
>         /* Expand this statement, then evaluate the resulting RTL and
> --- 5510,5515 ----
> *************** expand_gimple_basic_block (basic_block b
> *** 5732,5749 ****
>   	    }
>   	  else
>   	    {
> - 	      def_operand_p def_p;
> - 	      def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
> - 
> - 	      if (def_p != NULL)
> - 		{
> - 		  /* Ignore this stmt if it is in the list of
> - 		     replaceable expressions.  */
> - 		  if (SA.values
> - 		      && bitmap_bit_p (SA.values,
> - 				       SSA_NAME_VERSION (DEF_FROM_PTR (def_p))))
> - 		    continue;
> - 		}
>   	      last = expand_gimple_stmt (stmt);
>   	      maybe_dump_rtl_for_gimple_stmt (stmt, last);
>   	    }
> --- 5636,5641 ----
> Index: gcc/tree-outof-ssa.c
> ===================================================================
> *** gcc/tree-outof-ssa.c	(revision 237873)
> --- gcc/tree-outof-ssa.c	(working copy)
> *************** along with GCC; see the file COPYING3.
> *** 42,47 ****
> --- 42,49 ----
>   #include "tree-ssa-coalesce.h"
>   #include "tree-outof-ssa.h"
>   #include "dojump.h"
> + #include "tree-ssa.h"
> + #include "gimple-walk.h"
>   
>   /* FIXME: A lot of code here deals with expanding to RTL.  All that code
>      should be in cfgexpand.c.  */
> *************** eliminate_useless_phis (void)
> *** 870,875 ****
> --- 872,941 ----
>   }
>   
>   
> + /* walk_gimple_op callback to schedule TERed defs before uses.  */
> + 
> + static tree
> + move_uses_r (tree *tp, int *, void *data)
> + {
> +   walk_stmt_info *wi = (walk_stmt_info *)data;
> + 
> +   if (TREE_CODE (*tp) == SSA_NAME
> +       && bitmap_bit_p ((bitmap)wi->info, SSA_NAME_VERSION (*tp)))
> +     {
> +       gimple *def = SSA_NAME_DEF_STMT (*tp);
> +       /* We are only interested in uses.  */
> +       if (def == wi->stmt)
> + 	return NULL_TREE;
> + 
> +       /* Avoid useless work.  We should enhance gsi_move_before
> + 	 so it can avoid the redundant stmt updating it
> + 	 performs as it uses gsi_remove/gsi_insert_before.  */
> +       if (wi->gsi.ptr->prev == def)
> + 	{
> + 	  gsi_prev (&wi->gsi);
> + 	  return NULL_TREE;
> + 	}
> + 
> +       /* If this statement is a non-debug one, and we generate
> + 	 debug insns, then this one might be the last real use of
> + 	 a TERed SSA_NAME, but where there are still some debug
> + 	 uses further down.  Expanding the current SSA name in
> + 	 such further debug uses by their RHS might lead to wrong
> + 	 debug info, as coalescing might make the operands of
> + 	 such RHS be placed into the same pseudo as something
> + 	 else.  Like so:
> + 	   a_1 = a_0 + 1;   // Assume a_1 is TERed and a_0 is dead
> + 	   use(a_1);
> + 	   a_2 = ...
> + 	   #DEBUG ... => a_1
> + 	 As a_0 and a_2 don't overlap in lifetime, assume they
> + 	 are coalesced.
> + 	 If we now would expand a_1 by it's RHS (a_0 + 1) in the
> + 	 debug use, the write to a_2 would actually have
> + 	 clobbered the place which formerly held a_0.
> + 
> + 	 So, instead of that, we recognize the situation, and
> + 	 generate debug temporaries at the last real use of
> + 	 TERed SSA names:
> + 	   a_1 = a_0 + 1;
> + 	   #DEBUG #D1 => a_1
> + 	   use(a_1);
> + 	   a_2 = ...
> + 	   #DEBUG ... => #D1  */
> +       insert_debug_temp_for_var_def (NULL, *tp);
> + 
> +       /* wi->gsi starts at 'stmt' and the walker visits operands
> +          left-to-right.  Implement depth-first and left-to-right
> + 	 evaluation (the depth step is performed by walking the
> + 	 here inserted stmts backwards).  */
> +       gimple_stmt_iterator gsi2 = gsi_for_stmt (def);
> +       gsi_remove (&gsi2, false);
> +       gsi_insert_before (&wi->gsi, def, GSI_NEW_STMT);
> +     }
> + 
> +   return NULL_TREE;
> + }
> + 
>   /* This function will rewrite the current program using the variable mapping
>      found in MAP.  If the replacement vector VALUES is provided, any
>      occurrences of partitions with non-null entries in the vector will be
> *************** eliminate_useless_phis (void)
> *** 877,888 ****
>      variable.  */
>   
>   static void
> ! rewrite_trees (var_map map)
>   {
>     if (!flag_checking)
>       return;
>   
> -   basic_block bb;
>     /* Search for PHIs where the destination has no partition, but one
>        or more arguments has a partition.  This should not happen and can
>        create incorrect code.  */
> --- 943,979 ----
>      variable.  */
>   
>   static void
> ! rewrite_trees (var_map map, struct ssaexpand *sa)
>   {
> +   basic_block bb;
> + 
> +   /* Apply TER stmt re-ordering.  */
> +   if (sa->values)
> +     FOR_EACH_BB_FN (bb, cfun)
> +       {
> + 	/* Re-materialize replaceable expressions before stmts using
> + 	   them.  We walk the BBs backwards so we simulate the way
> + 	   delayed expansion worked by expanding things depth-first
> + 	   and left-to-right (in most cases).  */
> + 	for (gimple_stmt_iterator gsi = gsi_last_bb (bb);
> + 	     !gsi_end_p (gsi); gsi_prev (&gsi))
> + 	  {
> + 	    gimple *stmt = gsi_stmt (gsi);
> + 	    if (is_gimple_debug (stmt))
> + 	      continue;
> + 
> + 	    walk_stmt_info wi;
> + 	    memset (&wi, 0, sizeof (walk_stmt_info));
> + 	    wi.stmt = stmt;
> + 	    wi.gsi = gsi;
> + 	    wi.info = SA.values;
> + 	    walk_gimple_op (stmt, move_uses_r, &wi);
> + 	  }
> +       }
> + 
>     if (!flag_checking)
>       return;
>   
>     /* Search for PHIs where the destination has no partition, but one
>        or more arguments has a partition.  This should not happen and can
>        create incorrect code.  */
> *************** remove_ssa_form (bool perform_ter, struc
> *** 994,1004 ****
>   	dump_replaceable_exprs (dump_file, values);
>       }
>   
> -   rewrite_trees (map);
> - 
>     sa->map = map;
>     sa->values = values;
>     sa->partitions_for_parm_default_defs = get_parm_default_def_partitions (map);
>   }
>   
>   
> --- 1085,1095 ----
>   	dump_replaceable_exprs (dump_file, values);
>       }
>   
>     sa->map = map;
>     sa->values = values;
>     sa->partitions_for_parm_default_defs = get_parm_default_def_partitions (map);
> + 
> +   rewrite_trees (map, sa);
>   }
>   
>   
> 
> Index: gcc/testsuite/gcc.dg/torture/pr71632.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/torture/pr71632.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/torture/pr71632.c	(working copy)
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +
> +void
> +foo (double d, double *p, double *q)
> +{
> +  int i;
> +  for (i = 0; i < 64; i++)
> +    {
> +      double t1 = d > p[0] ? 1.0 : 0.0;
> +      double t2 = t1 > p[1] ? 1.0 : 0.0;
> +      double t3 = t2 > p[2] ? 1.0 : 0.0;
> +      double t4 = t3 > p[3] ? 1.0 : 0.0;
> +      double t5 = t4 > p[4] ? 1.0 : 0.0;
> +      double t6 = t5 > p[5] ? 1.0 : 0.0;
> +      double t7 = t6 > p[6] ? 1.0 : 0.0;
> +      double t8 = t7 > p[7] ? 1.0 : 0.0;
> +      double t9 = t8 > p[8] ? 1.0 : 0.0;
> +      double t10 = t9 > p[9] ? 1.0 : 0.0;
> +      double t11 = t10 > p[10] ? 1.0 : 0.0;
> +      double t12 = t11 > p[11] ? 1.0 : 0.0;
> +      double t13 = t12 > p[12] ? 1.0 : 0.0;
> +      double t14 = t13 > p[13] ? 1.0 : 0.0;
> +      double t15 = t14 > p[14] ? 1.0 : 0.0;
> +      double t16 = t15 > p[15] ? 1.0 : 0.0;
> +      double t17 = t16 > p[16] ? 1.0 : 0.0;
> +      double t18 = t17 > p[17] ? 1.0 : 0.0;
> +      double t19 = t18 > p[18] ? 1.0 : 0.0;
> +      double t20 = t19 > p[19] ? 1.0 : 0.0;
> +      double t21 = t20 > p[20] ? 1.0 : 0.0;
> +      double t22 = t21 > p[21] ? 1.0 : 0.0;
> +      double t23 = t22 > p[22] ? 1.0 : 0.0;
> +      double t24 = t23 > p[23] ? 1.0 : 0.0;
> +      double t25 = t24 > p[24] ? 1.0 : 0.0;
> +      double t26 = t25 > p[25] ? 1.0 : 0.0;
> +      double t27 = t26 > p[26] ? 1.0 : 0.0;
> +      double t28 = t27 > p[27] ? 1.0 : 0.0;
> +      double t29 = t28 > p[28] ? 1.0 : 0.0;
> +      double t30 = t29 > p[29] ? 1.0 : 0.0;
> +      double t31 = t30 > p[30] ? 1.0 : 0.0;
> +      double t32 = t31 > p[31] ? 1.0 : 0.0;
> +      double t33 = t32 > p[32] ? 1.0 : 0.0;
> +      double t34 = t33 > p[33] ? 1.0 : 0.0;
> +      double t35 = t34 > p[34] ? 1.0 : 0.0;
> +      double t36 = t35 > p[35] ? 1.0 : 0.0;
> +      double t37 = t36 > p[36] ? 1.0 : 0.0;
> +      double t38 = t37 > p[37] ? 1.0 : 0.0;
> +      double t39 = t38 > p[38] ? 1.0 : 0.0;
> +      *q = t39;
> +      p += 39;
> +      q++;
> +    }
> +}
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]