This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

From: Richard Guenther <rguenther at suse dot de>
To: Jan Hubicka <hubicka at ucw dot cz>
Cc: gcc-patches at gcc dot gnu dot org
Date: Thu, 4 Oct 2012 18:16:23 +0200 (CEST)
Subject: Re: [RFC] Make vectorizer to skip loops with small iteration estimate
References: <20120930111610.GA7097@kam.mff.cuni.cz> <alpine.LNX.2.00.1210011409560.4063@zhemvz.fhfr.qr> <20121001175757.GB11298@kam.mff.cuni.cz> <alpine.LNX.2.00.1210021055170.4063@zhemvz.fhfr.qr> <20121004143910.GB7301@atrey.karlin.mff.cuni.cz>

On Thu, 4 Oct 2012, Jan Hubicka wrote:

> > > So SOC cancels out in the runtime check.
> > > I still think we need two formulas - one determining if vectorization is
> > > profitable, other specifying the threshold for scalar path at runtime (that
> > > will generally give lower values).
> > 
> > True, we want two values.  But part of the scalar path right now
> > is all the computation required for alias and alignment runtime checks
> > (because the way all the conditions are combined).
> > 
> > I'm not much into the details of what we account for in SOC (I suppose
> > it's everything we insert in the preheader of the vector loop).
> 
> Yes, it seems contain everything we insert prior the loop in unfolded form.
> > 
> > +      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count 
> > too small.");
> > +      if (vect_print_dump_info (REPORT_DETAILS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count 
> > smaller than "
> > +                 "user specified loop bound parameter or minimum "
> > +                 "profitable iterations (whichever is more 
> > conservative).");
> > 
> > this won't work anymore btw - dumping infrastructure changed.
> 
> Ah, will update that.
> > 
> > I suppose your patch is a step in the right direction, but to really
> > make progress we need to re-organize the loop and predicate structure
> > produced by the vectorizer.
> 
> This reminds me what I did for string functions on x86. It gets very hard
> to get all the paths right when one starts to be really cureful to not
> output too much cruft on the short paths + do not consume too many registers.
> 
> In fact I want to re-think this for the SSE string ops patch, so I may try to
> look into that incrementally.
> > 
> > So, please update your patch, re-test and then it's ok.
> 
> Thanks.
> > > I tested enabling loop_ch in early passes with -fprofile-feedback and it is SPEC
> > > neutral.  Given that it improves loop count estimates, I would still like mainline
> > > doing that.  I do not like these quite important estimates to be wrong most of time.
> > 
> > I agree.  It also helps getting rid of once rolling loops I think.
> 
> I am attaching the patch for early-ch.  Will commit it tomorrow.
> 
> Concerning jump threading, it would help to make some of it during early passes
> so the profile estiamte do not get invalided.  I tried to move VRP early but now it
> makes compiler to hang during bootstrap.  I will debug that.
> > 
> > > > 
> > > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > > 
> > > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > > >       || ((max_niter = max_stmt_executions_int (loop)) != -1
> > > >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > > >     {
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count too small.");
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count smaller than "
> > > >                          "vectorization factor.");
> > > >       return false;
> > > >     }
> > > > 
> > > > maybe you simply need to update that to also consider the profile?
> > > 
> > > Hmm, I am still getting familiar wth the code. Later we later have
> > >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >       && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> > >     {
> > >       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > >         fprintf (vect_dump, "not vectorized: vectorization not "
> > >                  "profitable.");
> > >       if (vect_print_dump_info (REPORT_DETAILS))
> > >         fprintf (vect_dump, "not vectorized: iteration count smaller than "
> > >                  "user specified loop bound parameter or minimum "
> > >                  "profitable iterations (whichever is more conservative).");
> > >       return false;
> > >     }
> > > 
> > > where th is always greater or equal than vectorization_factor from the cost model.
> > > So this test seems redundant if the max_stmt_executions_int was pushed down
> > > to the second conditoinal?
> > 
> > Yes, sort of.  The new check was supposed to be crystal clear, and
> > even with the cost model disabled we want to not vectorize in this
> > case.  But yes, the whole cost-model stuff needs TLC.
> 
> Ah yes, without cost model we would skip it.  I suppose we do not need to
> brother  witht he profile estiamte in the case anyway. They are kind of aprt of
> the cost models.
> 
> 	* passes.c (init_optimization_passes): Schedule early CH.
> 	* tree-pass.h (pass_early_ch): Declare it.
> 	* tree-ssa-loop-ch.c (gate_early_ch): New function.
> 	(pass_early_ch): New pass.

Instead of an extra ch pass please move the existing one.  And
benchmark that, of course.

Richard.

> Index: passes.c
> ===================================================================
> --- passes.c	(revision 191852)
> +++ passes.c	(working copy)
> @@ -1335,6 +1336,7 @@ init_optimization_passes (void)
>            NEXT_PASS (pass_cleanup_eh);
>            NEXT_PASS (pass_profile);
>            NEXT_PASS (pass_local_pure_const);
> +          NEXT_PASS (pass_early_ch);
>  	  /* Split functions creates parts that are not run through
>  	     early optimizations again.  It is thus good idea to do this
>  	     late.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h	(revision 191852)
> +++ tree-pass.h	(working copy)
> @@ -291,6 +291,7 @@ extern struct gimple_opt_pass pass_loop_
>  extern struct gimple_opt_pass pass_iv_optimize;
>  extern struct gimple_opt_pass pass_tree_loop_done;
>  extern struct gimple_opt_pass pass_ch;
> +extern struct gimple_opt_pass pass_early_ch;
>  extern struct gimple_opt_pass pass_ccp;
>  extern struct gimple_opt_pass pass_phi_only_cprop;
>  extern struct gimple_opt_pass pass_build_ssa;
> Index: tree-ssa-loop-ch.c
> ===================================================================
> --- tree-ssa-loop-ch.c	(revision 191852)
> +++ tree-ssa-loop-ch.c	(working copy)
> @@ -275,3 +275,34 @@ struct gimple_opt_pass pass_ch =
>      | TODO_verify_flow			/* todo_flags_finish */
>   }
>  };
> +
> +/* We duplicate loop headers early to get better profile feedback:
> +   the first iteration of loop is special, because the duplicated loop header
> +   test will usually pass.  */
> +
> +static bool
> +gate_early_ch (void)
> +{
> +  return flag_tree_ch != 0 && (flag_branch_probabilities || profile_arc_flag);
> +}
> +
> +struct gimple_opt_pass pass_early_ch =
> +{
> + {
> +  GIMPLE_PASS,
> +  "early_ch",				/* name */
> +  gate_early_ch,			/* gate */
> +  copy_loop_headers,			/* execute */
> +  NULL,					/* sub */
> +  NULL,					/* next */
> +  0,					/* static_pass_number */
> +  TV_TREE_CH,				/* tv_id */
> +  PROP_cfg | PROP_ssa,			/* properties_required */
> +  0,					/* properties_provided */
> +  0,					/* properties_destroyed */
> +  0,					/* todo_flags_start */
> +  TODO_cleanup_cfg
> +    | TODO_verify_ssa
> +    | TODO_verify_flow			/* todo_flags_finish */
> + }
> +};
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Follow-Ups:
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Jan Hubicka
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Jan Hubicka

References:
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Richard Guenther
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Jan Hubicka
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Richard Guenther
- Re: [RFC] Make vectorizer to skip loops with small iteration estimate
  - From: Jan Hubicka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]