[PATCH 1v2/3][vect] Add main vectorized loop unrolling

Thu Oct 21 12:14:39 GMT 2021

On Wed, 20 Oct 2021, Andre Vieira (lists) wrote:

> On 15/10/2021 09:48, Richard Biener wrote:
> > On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richi,
> >>
> >> I think this is what you meant, I now hide all the unrolling cost
> >> calculations
> >> in the existing target hooks for costs. I did need to adjust 'finish_cost'
> >> to
> >> take the loop_vinfo so the target's implementations are able to set the
> >> newly
> >> renamed 'suggested_unroll_factor'.
> >>
> >> Also added the checks for the epilogue's VF.
> >>
> >> Is this more like what you had in mind?
> > Not exactly (sorry..).  For the target hook I think we don't want to
> > pass vec_info but instead another output parameter like the existing
> > ones.
> >
> > vect_estimate_min_profitable_iters should then via
> > vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
> > suggestion to vect_analyze_loop which should then, if the suggestion
> > was > 1, instead of iterating to the next vector mode run again
> > with a fixed VF (old VF times suggested unroll factor - there's
> > min_vf in vect_analyze_loop_2 which we should adjust to
> > the old VF times two for example and maybe store the suggested
> > factor as hint) - if it succeeds the result will end up in the
> > list of considered modes (where we now may have more than one
> > entry for the same mode but a different VF), we probably want to
> > only consider more unrolling once.
> >
> > For simplicity I'd probably set min_vf = max_vf = old VF * suggested
> > factor, thus take the targets request literally.
> >
> > Richard.
> 
> Hi,
> 
> I now pass an output parameter to finish_costs and route it through the
> various calls up to vect_analyze_loop.  I tried to rework
> vect_determine_vectorization_factor and noticed that merely setting min_vf and
> max_vf is not enough, we only use these to check whether the vectorization
> factor is within range, well actually we only use max_vf at that stage. We
> only seem to use 'min_vf' to make sure the data_references are valid.  I am
> not sure my changes are the most appropriate here, for instance I am pretty
> sure the checks for max and min vf I added in
> vect_determine_vectorization_factor are currently superfluous as they will
> pass by design, but thought they might be good future proofing?

Ah, ok - well, max_vf is determined by dependence analysis so we do
have to check that.  I think that

+         && known_le (unrolled_vf, min_vf))

is superfluous.  So if we use min_vf as passed to vect_analyze_loop_2
to carry the suggested unroll factor then the changes look reasonable.

What's

   if (max_vf != MAX_VECTORIZATION_FACTOR
-      && maybe_lt (max_vf, min_vf))
+      && loop_vinfo->suggested_unroll_factor == 1
+      && max_vf < estimated_poly_value (min_vf, POLY_VALUE_MAX))
     return opt_result::failure_at (vect_location, "bad data 
dependence.\n");
   LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf;

supposed to be though?  Why can we allow the unroll case to
get past this point?  I suppose initializing max_vf to
MAX_VECTORIZATION_FACTOR doesn't play well with poly-ints?
That is, how does estimated_poly_value (min_vf, POLY_VALUE_MAX)
differ here?  I would have expected that we can drop the
max_vf != MAX_VECTORIZATION_FACTOR part but otherwise leave
the test the same?

Note you adjust the vectorization factor in 
vect_determine_vectorization_factor but I think you have to delay
that until after vect_update_vf_for_slp which means doing it
before the start_over: label.

@@ -3038,6 +3203,18 @@ vect_analyze_loop (class loop *loop, 
vec_info_shared *shared)

       if (res)
        {
+         /* Only try unrolling main loops.  */
+         if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+           {
+             opt_loop_vec_info unrolled_vinfo =
+               vect_try_unrolling (loop_vinfo, &n_stmts,
+                                   suggested_unroll_factor);
+             if (unrolled_vinfo)
+               loop_vinfo = unrolled_vinfo;
+             /* Reset suggested_unroll_factor for next loop_vinfo.  */
+             suggested_unroll_factor = 1;
+           }

so it looks like this eventually will leak 'loop_vinfo' but it
also looks like you throw away the not unrolled analysis and costing.
I was suggesting to simply add the unrolled variant to the
alternatives considered - the flow of vect_analyze_loop is already
quite complicated and the changes following this hunk do not help :/
(and I'm not sure what vect_reanalyze_as_main_loop is supposed to do
or why we need to consider unrolling there as well).  If we
don't want to globally consider the unrolled variant maybe
we can at least decide between the unrolled and not unrolled
variant in vect_try_unrolling?  But of course only the whole
series, (unrolled) main + vectorized epilogues, determine the
final cost.

That said, the overall flow is OK now, some details about the
max_vf check and where to compute the unrolled VF needs to be
fleshed out.  And then there's the main analysis loop which,
frankly, is a mess right now, even before your patch :/

I'm thinking of rewriting the analysis loop in vect_analyze_loop
to use a worklist initially seeded by the vector_modes[] but
that we can push things like as-main-loop, unrolled and
epilogue analysis to.  Maybe have the worklist specify
pairs of mode and kind or tuples of mode, min-VF and kind where
'kind' is as-main/epilogue/unroll (though maybe 'kind' is
redundant there).  Possibly as preparatory step.

> Also I changed how we compare against max_vf, rather than relying on the
> 'MAX_VECTORIZATION' I decided to use the estimated_poly_value with
> POLY_VALUE_MAX, to be able to bound it further in case we have knowledge of
> the VL. I am not entirely about the validity of this change, maybe we are
> better off keeping the MAX_VECTORIZATION in place and not making any changes
> to max_vf for unrolling.

Yeah, I guess that would simplify things.  The only bit we really
want to make sure is that we don't re-analyze with exactly the same
VF, so

+  if (vect_analyze_loop_2 (unrolled_vinfo, unrolling_fatal, n_stmts,
+                          &suggested_unroll_factor,
+                          unrolled_vf)
+      && known_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+                  LOOP_VINFO_VECT_FACTOR (unrolled_vinfo)))

the VF check here should be always true.  I was initially considering
to take a target hit of unrolling by 4 as to also allow two times
unrolling if max_vf constrains us this way but of course never unroll
more than 4 times.  But we can play with adjustments here later
(also considering #pragma unroll, etc.)

Richard, do you have any ideas on how to improve the iteration
mess we have?

Richard.