[PATCH 1v2/3][vect] Add main vectorized loop unrolling

Richard Biener rguenther@suse.de
Wed Nov 24 11:00:20 GMT 2021


On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 22/11/2021 12:39, Richard Biener wrote:
> > +  if (first_loop_vinfo->suggested_unroll_factor > 1)
> > +    {
> > +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> > +       {
> > +         if (dump_enabled_p ())
> > +           dump_printf_loc (MSG_NOTE, vect_location,
> > +                            "***** Re-trying analysis with first vector mode"
> > +                            " %s for epilogue with partial vectors of"
> > +                            " unrolled first loop.\n",
> > +                            GET_MODE_NAME (vector_modes[0]));
> > +         mode_i = 0;
> >
> > and the later done check for bigger VF than main loop - why would
> > we re-start at 0 rather than at the old mode?  Maybe we want to
> > remember the iterator value we started at when arriving at the
> > main loop mode?  So if we analyzed successfully with mode_i == 2,
> > then successfully at mode_i == 4 which suggested an unroll of 2,
> > re-start at the mode_i we continued after the mode_i == 2
> > successful analysis?  To just consider the "simple" case of
> > AVX vs SSE it IMHO doesn't make much sense to succeed with
> > AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
> > but get a suggestion of 2 times unroll and then re-try AVX V4DF
> > just to re-compute that yes, it's worse than SSE V2DF?  You
> > are probably thinking of SVE vs ADVSIMD here but do we need to
> > start at 0?  Adding a comment to the code would be nice.
> >
> > Thanks,
> 
> I was indeed thinking SVE vs Advanced SIMD where we end up having to compare
> different vectorization strategies, which will have different costs depending.
> The hypothetical case (I don't think I've come across one) is where we
> decide to vectorize the main loop for V8QI and unroll 2x, yielding a VF of
> 16, and may then want to use a predicated VNx16QI epilogue.

But this isn't the epilogue handling ...

> Though the
> question here is whether it is possible for an Advanced SIMD V8QI
> vectorization to beat V16QI but an SVE predicated VNx16QI to beat a VNx8QI for
> the same loop.  Might be good to get Sandiford's opinion on this.
> 
> I do think that initially I was more concerned with skipping a VNx8QI after
> selecting a V8QI but I just checked and Advanced SIMD modes are listed before
> SVE for (among others) this reason.
> 
> Regards,
> Andre
> 
> 
> 
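
For illustration only, the "remember the iterator value" idea discussed above could be sketched roughly as follows. This is a toy model, not the vectorizer's code: `Candidate`, `Choice`, `pick_main_mode`, the cost numbers and the unroll hints are all hypothetical stand-ins, and the real mode iteration in GCC skips and orders modes in ways this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one candidate vector mode; not a GCC type.
struct Candidate {
  const char *name;
  bool vectorizable;     // would analysis succeed for this mode?
  unsigned cost;         // toy cost, lower is better
  unsigned unroll_hint;  // unroll factor the toy cost model suggests
};

// Result of the walk over the candidate modes.
struct Choice {
  std::size_t main_mode;   // index of the chosen main-loop mode
  std::size_t restart_at;  // iterator value we had when we reached it
  unsigned unroll;         // suggested unroll factor for main_mode
};

// Walk the modes in order and keep the cheapest successful one.
// Alongside it, remember the iterator value that followed the previous
// successful analysis, so a later re-analysis (triggered by unroll > 1)
// can resume there instead of restarting at index 0.
Choice pick_main_mode(const std::vector<Candidate> &modes) {
  Choice best{modes.size(), 0, 1};  // main_mode == size() means "none yet"
  std::size_t resume = 0;           // value of the iterator after last success
  for (std::size_t i = 0; i < modes.size(); ++i) {
    if (!modes[i].vectorizable)
      continue;
    bool better = best.main_mode == modes.size()
                  || modes[i].cost < modes[best.main_mode].cost;
    if (better)
      best = Choice{i, resume, modes[i].unroll_hint};
    resume = i + 1;
  }
  return best;
}
```

With successful analyses at index 2 and index 4, where index 4 wins and suggests unrolling by 2, the sketch reports a restart point of 3, so the earlier, already-rejected mode at index 2 is not analyzed again.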

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

