[PATCH 1v2/3][vect] Add main vectorized loop unrolling

Wed Nov 24 09:46:19 GMT 2021

On 22/11/2021 12:39, Richard Biener wrote:
> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
> +    {
> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Re-trying analysis with first vector mode"
> +                            " %s for epilogue with partial vectors of"
> +                            " unrolled first loop.\n",
> +                            GET_MODE_NAME (vector_modes[0]));
> +         mode_i = 0;
>
> and the later done check for bigger VF than main loop - why would
> we re-start at 0 rather than at the old mode?  Maybe we want to
> remember the iterator value we started at when arriving at the
> main loop mode?  So if we analyzed successfully with mode_i == 2,
> then successfully at mode_i == 4 which suggested an unroll of 2,
> re-start at the mode_i we continued after the mode_i == 2
> successful analysis?  To just consider the "simple" case of
> AVX vs SSE it IMHO doesn't make much sense to succeed with
> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
> but get a suggestion of 2 times unroll and then re-try AVX V4DF
> just to re-compute that yes, it's worse than SSE V2DF?  You
> are probably thinking of SVE vs ADVSIMD here but do we need to
> start at 0?  Adding a comment to the code would be nice.
>
> Thanks,

I was indeed thinking of SVE vs Advanced SIMD, where we end up having to
compare different vectorization strategies whose costs can differ
depending on the loop. The hypothetical case (I don't think I've
actually come across one) is where we decide to vectorize the main loop
with V8QI and unroll it 2x, yielding a VF of 16, and may then want to
use a predicated VNx16QI epilogue. The question, though, is whether it
is possible for an Advanced SIMD V8QI vectorization to beat V16QI while
an SVE predicated VNx16QI beats VNx8QI for the same loop. It might be
good to get Sandiford's opinion on this.
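To make the VF arithmetic in that hypothetical concrete, here is a toy C
model (not GCC code). It assumes SVE's minimum 128-bit vector length, so
VNx16QI is counted as 16 lanes, and it reduces the "bigger VF than main
loop" check to a plain lane-count comparison. The struct and both helpers
are made up for illustration.

```c
#include <stdbool.h>

/* Toy model only.  Lane counts are taken at the minimum vector length:
   128 bits is assumed for SVE, so VNx16QI counts as 16 lanes.  */
struct vec_mode
{
  const char *name;
  unsigned min_lanes;
};

/* Effective VF of the main loop: lanes of its mode times the unroll
   factor suggested by the cost model.  */
static unsigned
main_loop_vf (const struct vec_mode *m, unsigned unroll)
{
  return m->min_lanes * unroll;
}

/* Hypothetical stand-in for the check that rejects epilogue modes whose
   VF is bigger than the main loop's VF.  */
static bool
epilogue_candidate_p (const struct vec_mode *epi, unsigned main_vf)
{
  return epi->min_lanes <= main_vf;
}

static const struct vec_mode v8qi = { "V8QI", 8 };
static const struct vec_mode vnx16qi = { "VNx16QI", 16 };
```

Without unrolling, the V8QI main loop has a VF of 8, so VNx16QI is
filtered out; with the suggested 2x unroll the VF is 16 and VNx16QI
becomes an epilogue candidate.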

I do think that initially I was more concerned about skipping a VNx8QI
after selecting a V8QI, but I just checked and Advanced SIMD modes are
listed before SVE modes, for this reason among others.
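A similarly hypothetical sketch of the two restart policies discussed
above, with the candidate list ordered Advanced SIMD first as just
described. The mode list, the remembered index and both helpers are made
up for illustration; the patch itself works on vector_modes and mode_i.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical candidate list: Advanced SIMD modes before SVE modes.  */
static const char *const modes[] = { "V16QI", "V8QI", "VNx16QI", "VNx8QI" };
enum { N_MODES = (int) (sizeof modes / sizeof modes[0]) };

/* Number of modes the epilogue re-analysis visits if it starts at START_I.
   START_I == 0 models the patch's "mode_i = 0"; a remembered index models
   the suggestion in the review.  */
static int
modes_revisited (int start_i)
{
  return N_MODES - start_i;
}

/* True if a re-analysis starting at START_I reaches mode NAME.  */
static bool
visits_mode_p (int start_i, const char *name)
{
  for (int i = start_i; i < N_MODES; i++)
    if (strcmp (modes[i], name) == 0)
      return true;
  return false;
}
```

If the main loop settled on V8QI (index 1) and that index is remembered,
re-analysis visits 3 modes instead of 4, skips re-checking V16QI, and
still reaches VNx16QI because the SVE modes come later in the list.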

Regards,
Andre