[PATCH 1v2/3][vect] Add main vectorized loop unrolling

Andre Vieira (lists) andre.simoesdiasvieira@arm.com
Thu Nov 25 10:40:14 GMT 2021


On 24/11/2021 11:00, Richard Biener wrote:
> On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
>
>> On 22/11/2021 12:39, Richard Biener wrote:
>>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
>>> +    {
>>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>>> +       {
>>> +         if (dump_enabled_p ())
>>> +           dump_printf_loc (MSG_NOTE, vect_location,
>>> +                            "***** Re-trying analysis with first vector
>>> mode"
>>> +                            " %s for epilogue with partial vectors of"
>>> +                            " unrolled first loop.\n",
>>> +                            GET_MODE_NAME (vector_modes[0]));
>>> +         mode_i = 0;
>>>
>>> and the later done check for bigger VF than main loop - why would
>>> we re-start at 0 rather than at the old mode?  Maybe we want to
>>> remember the iterator value we started at when arriving at the
>>> main loop mode?  So if we analyzed successfully with mode_i == 2,
>>> then sucessfully at mode_i == 4 which suggested an unroll of 2,
>>> re-start at the mode_i we continued after the mode_i == 2
>>> successful analysis?  To just consider the "simple" case of
>>> AVX vs SSE it IMHO doesn't make much sense to succeed with
>>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
>>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
>>> just to re-compute that yes, it's worse than SSE V2DF?  You
>>> are probably thinking of SVE vs ADVSIMD here but do we need to
>>> start at 0?  Adding a comment to the code would be nice.
>>>
>>> Thanks,
>> I was indeed thinking SVE vs Advanced SIMD where we end up having to compare
>> different vectorization strategies, which will have different costs depending.
>> The hypothetical case, as in I don't think I've come across one, is where if
>> we decide to vectorize the main loop for V8QI and unroll 2x, yielding a VF of
>> 16, we may then want to then use a predicated VNx16QI epilogue.
> But this isn't the epilogue handling ...
Am I misunderstanding the code here? To me it looks like this is picking 
what mode_i to start the 'while (1)' loop does the loop analysis for the 
epilogues?


More information about the Gcc-patches mailing list