This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Make vectorizer to skip loops with small iteration estimate


> > So SOC cancels out in the runtime check.
> > I still think we need two formulas - one determining if vectorization is
> > profitable, other specifying the threshold for scalar path at runtime (that
> > will generally give lower values).
> 
> True, we want two values.  But part of the scalar path right now
> is all the computation required for alias and alignment runtime checks
> (because the way all the conditions are combined).
> 
> I'm not much into the details of what we account for in SOC (I suppose
> it's everything we insert in the preheader of the vector loop).

Yes, it seems contain everything we insert prior the loop in unfolded form.
> 
> +      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +        fprintf (vect_dump, "not vectorized: estimated iteration count 
> too small.");
> +      if (vect_print_dump_info (REPORT_DETAILS))
> +        fprintf (vect_dump, "not vectorized: estimated iteration count 
> smaller than "
> +                 "user specified loop bound parameter or minimum "
> +                 "profitable iterations (whichever is more 
> conservative).");
> 
> this won't work anymore btw - dumping infrastructure changed.

Ah, will update that.
> 
> I suppose your patch is a step in the right direction, but to really
> make progress we need to re-organize the loop and predicate structure
> produced by the vectorizer.

This reminds me what I did for string functions on x86. It gets very hard
to get all the paths right when one starts to be really cureful to not
output too much cruft on the short paths + do not consume too many registers.

In fact I want to re-think this for the SSE string ops patch, so I may try to
look into that incrementally.
> 
> So, please update your patch, re-test and then it's ok.

Thanks.
> > I tested enabling loop_ch in early passes with -fprofile-feedback and it is SPEC
> > neutral.  Given that it improves loop count estimates, I would still like mainline
> > doing that.  I do not like these quite important estimates to be wrong most of time.
> 
> I agree.  It also helps getting rid of once rolling loops I think.

I am attaching the patch for early-ch.  Will commit it tomorrow.

Concerning jump threading, it would help to make some of it during early passes
so the profile estiamte do not get invalided.  I tried to move VRP early but now it
makes compiler to hang during bootstrap.  I will debug that.
> 
> > > 
> > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > 
> > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > >       || ((max_niter = max_stmt_executions_int (loop)) != -1
> > >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > >     {
> > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >                          "not vectorized: iteration count too small.");
> > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >                          "not vectorized: iteration count smaller than "
> > >                          "vectorization factor.");
> > >       return false;
> > >     }
> > > 
> > > maybe you simply need to update that to also consider the profile?
> > 
> > Hmm, I am still getting familiar wth the code. Later we later have
> >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >       && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> >     {
> >       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> >         fprintf (vect_dump, "not vectorized: vectorization not "
> >                  "profitable.");
> >       if (vect_print_dump_info (REPORT_DETAILS))
> >         fprintf (vect_dump, "not vectorized: iteration count smaller than "
> >                  "user specified loop bound parameter or minimum "
> >                  "profitable iterations (whichever is more conservative).");
> >       return false;
> >     }
> > 
> > where th is always greater or equal than vectorization_factor from the cost model.
> > So this test seems redundant if the max_stmt_executions_int was pushed down
> > to the second conditoinal?
> 
> Yes, sort of.  The new check was supposed to be crystal clear, and
> even with the cost model disabled we want to not vectorize in this
> case.  But yes, the whole cost-model stuff needs TLC.

Ah yes, without cost model we would skip it.  I suppose we do not need to
brother  witht he profile estiamte in the case anyway. They are kind of aprt of
the cost models.

	* passes.c (init_optimization_passes): Schedule early CH.
	* tree-pass.h (pass_early_ch): Declare it.
	* tree-ssa-loop-ch.c (gate_early_ch): New function.
	(pass_early_ch): New pass.
Index: passes.c
===================================================================
--- passes.c	(revision 191852)
+++ passes.c	(working copy)
@@ -1335,6 +1336,7 @@ init_optimization_passes (void)
           NEXT_PASS (pass_cleanup_eh);
           NEXT_PASS (pass_profile);
           NEXT_PASS (pass_local_pure_const);
+          NEXT_PASS (pass_early_ch);
 	  /* Split functions creates parts that are not run through
 	     early optimizations again.  It is thus good idea to do this
 	     late.  */
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 191852)
+++ tree-pass.h	(working copy)
@@ -291,6 +291,7 @@ extern struct gimple_opt_pass pass_loop_
 extern struct gimple_opt_pass pass_iv_optimize;
 extern struct gimple_opt_pass pass_tree_loop_done;
 extern struct gimple_opt_pass pass_ch;
+extern struct gimple_opt_pass pass_early_ch;
 extern struct gimple_opt_pass pass_ccp;
 extern struct gimple_opt_pass pass_phi_only_cprop;
 extern struct gimple_opt_pass pass_build_ssa;
Index: tree-ssa-loop-ch.c
===================================================================
--- tree-ssa-loop-ch.c	(revision 191852)
+++ tree-ssa-loop-ch.c	(working copy)
@@ -275,3 +275,34 @@ struct gimple_opt_pass pass_ch =
     | TODO_verify_flow			/* todo_flags_finish */
  }
 };
+
+/* We duplicate loop headers early to get better profile feedback:
+   the first iteration of loop is special, because the duplicated loop header
+   test will usually pass.  */
+
+static bool
+gate_early_ch (void)
+{
+  return flag_tree_ch != 0 && (flag_branch_probabilities || profile_arc_flag);
+}
+
+struct gimple_opt_pass pass_early_ch =
+{
+ {
+  GIMPLE_PASS,
+  "early_ch",				/* name */
+  gate_early_ch,			/* gate */
+  copy_loop_headers,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_CH,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_cleanup_cfg
+    | TODO_verify_ssa
+    | TODO_verify_flow			/* todo_flags_finish */
+ }
+};


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]