This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR60505


On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 12 Mar 2014, Cong Hou wrote:
>
>> Thank you for pointing it out. I didn't realized that alias analysis
>> has influences on this issue.
>>
>> The current problem is that the epilogue may be unnecessary if the
>> loop bound cannot be larger than the number of iterations of the
>> vectorized loop multiplied by VF when the vectorized loop is supposed
>> to be executed. My method is incorrect because I assume the vectorized
>> loop will be executed which is actually guaranteed by loop bound check
>> (and also alias checks). So if the alias checks exist, my method is
>> fine as both conditions are met.
>
> But there is still the loop bound check which, if it fails, uses
> the epilogue loop as fallback, not the scalar versioned loop.

The loop bound check is already performed together with alias checks
(assume we need alias checks). Actually, I did observe that the loop
bound check in the true body of alias checks may be unnecessary. For
example, for the following loop

  for(i=0; i < num ; ++i)
    out[i] = (ovec[i] = in[i]);

GCC now generates the following GIMPLE code after vectorization:



  <bb 3>: // loop bound check (with cost model) and alias checks
  _29 = (unsigned int) num_5(D);
  _28 = _29 > 15;
  _24 = in_9(D) + 16;
  _23 = out_7(D) >= _24;
  _2 = out_7(D) + 16;
  _1 = _2 <= in_9(D);
  _32 = _1 | _23;
  _31 = _28 & _32;
  if (_31 != 0)
    goto <bb 4>;
  else
    goto <bb 12>;

  <bb 4>:
  niters.3_44 = (unsigned int) num_5(D);
  _46 = niters.3_44 + 4294967280;
  _47 = _46 >> 4;
  bnd.4_45 = _47 + 1;
  ratio_mult_vf.5_48 = bnd.4_45 << 4;
  _59 = (unsigned int) num_5(D);
  _60 = _59 + 4294967295;
  if (_60 <= 14)                      <================ is this necessary?
    goto <bb 10>;
  else
    goto <bb 5>;


The check _60<=14 should be unnecessary because it is implied by the
fact _29 > 15 in bb3.

Consider this fact and if there are alias checks, we can safely remove
the epilogue if the maximum trip count of the loop is less than or
equal to the calculated threshold.


Cong



>
>> If there is no alias checks, I must
>> consider the possibility that the vectorized loop may not be executed
>> at runtime and then the epilogue should not be eliminated. The warning
>> appears on epilogue, and with loop bound checks (and without alias
>> checks) the warning will be gone. So I think the key is alias checks:
>> my method only works if there is no alias checks.
>>
>> How about adding one more condition that checks if alias checks are
>> needed, as the code shown below?
>>
>>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
>>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
>>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
>>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
>>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> >> This patch is fixing PR60505 in which the vectorizer may produce
>> >> unnecessary epilogues.
>> >>
>> >> Bootstrapped and tested on a x86_64 machine.
>> >>
>> >> OK for trunk?
>> >
>> > That looks wrong.  Consider the case where the loop isn't versioned,
>> > if you disable generation of the epilogue loop, you end up only with
>> > a vector loop.
>> >
>> > Say:
>> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
>> > void
>> > foo (char *__restrict in, char *__restrict out, int num)
>> > {
>> >   int i;
>> >
>> >   in = __builtin_assume_aligned (in, 16);
>> >   out = __builtin_assume_aligned (out, 16);
>> >   for (i = 0; i < num; ++i)
>> >     out[i] = (ovec[i] = in[i]);
>> >   out[num] = ovec[num / 2];
>> > }
>> > -O2 -ftree-vectorize.  Now, consider if this function is called
>> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
>> > valid and your patch will cause a wrong-code in this case).
>> >
>> >         Jakub
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]