This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Add vector cost model density heuristic


On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> On Fri, 8 Jun 2012, William J. Schmidt wrote:
> 
> > This patch adds a heuristic to the vectorizer when estimating the
> > minimum profitable number of iterations.  The heuristic is
> > target-dependent, and is currently disabled for all targets except
> > PowerPC.  However, the intent is to make it general enough to be useful
> > for other targets that want to opt in.
> > 
> > A previous patch addressed some PowerPC SPEC degradations by modifying
> > the vector cost model values for vec_perm and vec_promote_demote.  The
> > values were set a little higher than their natural values because the
> > natural values were not sufficient to prevent a poor vectorization
> > choice.  However, this is not the right long-term solution, since it can
> > unnecessarily constrain other vectorization choices involving permute
> > instructions.
> > 
> > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > problem was overcommitment of vector resources -- too many vector
> > instructions issued without enough non-vector instructions available to
> > cover the delays.  The vector cost model assumes that instructions
> > always have a constant cost, and doesn't have a way of judging this kind
> > of "density" of vector instructions.
> > 
> > The present patch adds a heuristic to recognize when a loop is likely to
> > overcommit resources, and adds a small penalty to the inside-loop cost
> > to account for the expected stalls.  The heuristic is parameterized with
> > three target-specific values:
> > 
> >  * Density threshold: The heuristic will apply only when the
> >    percentage of inside-loop cost attributable to vectorized
> >    instructions exceeds this value.
> > 
> >  * Size threshold: The heuristic will apply only when the
> >    inside-loop cost exceeds this value.
> > 
> >  * Penalty: The inside-loop cost will be increased by this
> >    percentage value when the heuristic applies.
> > 
> > Thus only reasonably large loop bodies that are mostly vectorized
> > instructions will be affected.
> > 
> > By applying only a small percentage bump to the inside-loop cost, the
> > heuristic will only turn off vectorization for loops that were
> > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > So the heuristic is quite conservative and should not affect the vast
> > majority of vectorization decisions.
> > 
> > Together with the new heuristic, this patch reduces the vec_perm and
> > vec_promote_demote costs for PowerPC to their natural values.
> > 
> > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > and verified that no performance regressions occur on SPEC cpu2006.  Is
> > this ok for trunk?
> 
> Hmm.  I don't like this patch or its general idea too much.  Instead
> I'd like us to move more of the cost model detail to the target, giving
> it a chance to look at the whole loop before deciding on a cost.  ISTR
> posting the overall idea at some point, but let me repeat it here instead
> of trying to find that e-mail.
> 
> The basic interface of the cost model should be, in targetm.vectorize:
> 
>   /* Tell the target to start cost analysis of a loop or a basic-block
>      (if the loop argument is NULL).  Returns an opaque pointer to
>      target-private data.  */
>   void *init_cost (struct loop *loop);
> 
>   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
>   void add_stmt_cost (void *data, unsigned n,
>                       vectorized-stmt-kind,
>                       enum machine_mode vector_mode);
> 
>   /* Tell the target to compute and return the cost of the accumulated
>      statements and free any target-private data.  */
>   unsigned finish_cost (void *data);
> 
> with possibly slightly different signatures for add_stmt_cost
> (for example, passing in the original scalar stmt?).
> 
> It allows the target, at finish_cost time, to evaluate things like
> register pressure and resource utilization.

OK, I'm trying to understand how you would want this built into the
present structure.  Taking just the loop case for now:

Judging by your suggested API, we would have to call add_stmt_cost ()
everywhere that we now call stmt_vinfo_set_inside_of_loop_cost ().  For
now this would be an additional call, not a replacement, though maybe
the other goes away eventually.  This allows the target to save more
data about the vectorized instructions than just an accumulated cost
number (order and quantity of various kinds of instructions can be
maintained for better modeling).  Presumably the call to finish_cost
would be done within vect_estimate_min_profitable_iters () to produce
the final value of inside_cost for the loop.

The default target hook for add_stmt_cost would duplicate what we
currently do for calculating the inside_cost of a statement, and the
default target hook for finish_cost would just return the sum.
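
A minimal sketch of what such default hooks might look like, assuming the
target-private data is nothing more than a running sum.  The enum
stmt_kind, the kind_cost () table, and the default_* names are hypothetical
stand-ins (the proposal leaves "vectorized-stmt-kind" unspecified), and the
vector_mode argument is omitted to keep the example short:

```c
#include <assert.h>
#include <stdlib.h>

struct loop;  /* Opaque here; the default hooks never look inside it.  */

/* Hypothetical stand-in for the "vectorized-stmt-kind" placeholder.  */
enum stmt_kind
{
  KIND_VECTOR_STMT,
  KIND_VECTOR_LOAD,
  KIND_VECTOR_STORE,
  KIND_VEC_PERM
};

/* Made-up per-kind costs; a real target would supply its own.  */
unsigned
kind_cost (enum stmt_kind kind)
{
  switch (kind)
    {
    case KIND_VEC_PERM:
      return 3;
    case KIND_VECTOR_LOAD:
    case KIND_VECTOR_STORE:
      return 2;
    default:
      return 1;
    }
}

/* Start cost analysis.  The private data is just an accumulator.
   Returns NULL if allocation fails.  */
void *
default_init_cost (struct loop *loop)
{
  unsigned *sum = malloc (sizeof *sum);
  (void) loop;
  if (sum)
    *sum = 0;
  return sum;
}

/* Add the cost of N statements of the given kind to the running sum.  */
void
default_add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  unsigned *sum = data;
  assert (sum != NULL);
  *sum += n * kind_cost (kind);
}

/* Return the accumulated cost and free the private data.  */
unsigned
default_finish_cost (void *data)
{
  unsigned *sum = data;
  unsigned total;
  assert (sum != NULL);
  total = *sum;
  free (sum);
  return total;
}
```

With these made-up costs, adding 4 ordinary vector statements and 2
permutes and then calling default_finish_cost () would give
4 * 1 + 2 * 3 = 10.  A target wanting better modeling could store
per-kind counts or statement order in the private data instead of a sum.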

I'll have to go hunting for where the similar code would fit for SLP in a
basic block.

If I read you correctly, you don't object to a density heuristic such as
the one I implemented here, but you want to see such heuristics confined
to a specific target rather than parameterized for all targets.
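
For illustration, here is a rough sketch of how the density heuristic from
the quoted patch description could be evaluated inside a single target's
finish_cost-style routine.  The struct, the function name, and the three
threshold values are assumptions made up for this example; the thread does
not give the actual numbers:

```c
#include <assert.h>

/* Hypothetical tuning values, chosen only for illustration.  */
#define DENSITY_PCT_THRESHOLD 85  /* percent of cost from vector stmts */
#define DENSITY_SIZE_THRESHOLD 70 /* minimum inside-loop cost */
#define DENSITY_PENALTY 10        /* percent added when heuristic fires */

/* Hypothetical target-private data accumulated while adding stmt costs.  */
struct density_costs
{
  unsigned vec_cost;     /* inside-loop cost of vectorized statements */
  unsigned not_vec_cost; /* inside-loop cost of everything else */
};

/* Return the inside-loop cost, bumped by DENSITY_PENALTY percent when the
   loop body is both large and dominated by vector instructions.  */
unsigned
density_adjusted_cost (const struct density_costs *c)
{
  unsigned inside_cost;
  unsigned density_pct;

  assert (c != 0);
  inside_cost = c->vec_cost + c->not_vec_cost;
  if (inside_cost == 0)
    return 0;

  density_pct = c->vec_cost * 100 / inside_cost;
  if (density_pct > DENSITY_PCT_THRESHOLD
      && inside_cost > DENSITY_SIZE_THRESHOLD)
    inside_cost += inside_cost * DENSITY_PENALTY / 100;

  return inside_cost;
}
```

With these assumed values, a body with vector cost 90 and other cost 10
(density 90%, size 100) is bumped from 100 to 110, while a body with costs
50 and 50 stays at 100 because it is not dense enough, and one with costs
45 and 5 stays at 50 because it is too small.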

How'm I doing?  I want to be sure we're on the same page before I delve
into this.

Thanks,
Bill

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Bill

<patch snipped>

