This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[patch] [4.3 projects] outer-loop vectorization patch 3/n

From: Dorit Nuzman <DORIT at il dot ibm dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Sat, 11 Aug 2007 20:12:24 +0300
Subject: [patch] [4.3 projects] outer-loop vectorization patch 3/n

Hi,

This part is not specific to outer-loop vectorization, but may be needed
even more when doing outer-loop vectorization. It goes back to the issue
raised here: http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00136.html. In
cases where the compiler can't prove that the loop count is non-zero, it
returns "scev_not_known", and vectorization fails because it thinks that
the loop is not countable. Instead, with this patch, we have the
number-of-iterations analysis return both the loop-bound expression and the
may_be_zero expression, and from them build a COND_EXPR that represents the
loop count, i.e.: "lb = may_be_zero ? zero : lb".
The one limitation is that currently the compiler does not let us generate
a COND_EXPR as a rhs (at least until this patch is committed:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html), so we have to
prevent situations in which the vectorizer would actually generate stmts
using this expression (e.g. peeling the loop).
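
To make the idea concrete, here is a minimal plain-C sketch of folding the
two results into one conditional loop count. It is not GCC code: the struct
and function names are hypothetical, and the values are already-evaluated
integers, whereas in the patch both parts are tree expressions and the
result is a COND_EXPR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for what the number-of-iterations analysis
   hands back: a bound that is only meaningful when may_be_zero is
   false.  */
struct niter_result
{
  bool may_be_zero;    /* true if the loop count may be zero */
  unsigned int bound;  /* loop-bound value, valid if !may_be_zero */
};

/* Models "lb = may_be_zero ? zero : lb".  */
unsigned int
loop_count (struct niter_result r)
{
  return r.may_be_zero ? 0u : r.bound;
}

/* Small self-check of the two cases.  */
void
check_loop_count (void)
{
  struct niter_result zero_trip = { true, 123u };
  struct niter_result normal = { false, 15u };

  assert (loop_count (zero_trip) == 0u);
  assert (loop_count (normal) == 15u);
}
```

Calling check_loop_count () from a test main exercises both branches of the
conditional.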

Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
passed full regression testing on both platforms.

This part needs approval.

thanks,
dorit

ChangeLog:

        * tree-scalar-evolutions.c (number_of_latch_executions_1): New.
        Contains code that was factored out from
        number_of_latch_executions.
        (number_of_latch_executions): Call number_of_latch_executions_1
        instead of code that was factored out.
        (number_of_exit_cond_executions): Takes additional argument
        may_be_zero. Call number_of_latch_executions_1 instead of
        number_of_latch_executions.
        * tree-scalar-evolutions.h (number_of_exit_cond_executions): Takes
        additional argument.
        * tree-vect-analyze.c (vect_analyze_operations): Avoid
        vectorization in case the loop-bound is a COND_EXPR and peeling
        needs to be done.
        (vect_get_loop_niters): Call number_of_exit_cond_executions with
        additional argument may_be_zero. Use it to build a (potentially)
        conditional expression for the loop niters.

testsuite/ChangeLog:

      * gcc.dg/vect/vect-outer-fir-lb.c: First loop now gets vectorized.

P.S.1 One example where this helps is in this case:

 for (k = 0; k < 4; k++) {
  for (i = 0; i < N; i++) {
    diff = 0;
    j = k;
    do {
      diff += in[j+i]*coeff[j];
      j+=4;
    } while (j < M);
    out[i] += diff;
  }
 }

(taken from vect-outer-fir-lb.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00742.html).

Without the patch we get:
"
Analyzing # of iterations of loop 4
  exit condition [k_45 + 4, + , 4](no_overflow) <= 63
  bounds on difference of bases: -2147483584 ... 2147483707
  result:
    zero if k_45 > 63
    # of iterations (63 - (unsigned int) k_45) /[fl] 4, bounded by 536870927
  (set_nb_iterations_in_loop = scev_not_known))
(get_loop_exit_condition
  if (j_19 <= 63))

vect-outer-fir-lb.c:33: note: not vectorized: number of iterations cannot be computed.
"

With the patch, the loop gets vectorized.

Of course, what would really be good is if range propagation could help
the number-of-iterations analysis realize that since k<4 it holds that
k<127 and so the loop-bound is never zero.
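
As a sanity check on the expression in the dump above, the following
plain-C sketch compares the closed form, guarded by the "zero if k > 63"
condition, against simply running the do-while loop and counting how often
the back edge is taken. The function names are hypothetical, and M is
assumed to be 64, which is what the "<= 63" exit condition implies.

```c
#include <assert.h>
#include <stdbool.h>

#define M 64  /* assumption: matches the "j <= 63" exit condition */

/* Count latch executions of "do { ...; j += 4; } while (j < M);"
   by actually stepping through it.  */
unsigned int
latch_executions_by_running (int k)
{
  unsigned int latch = 0;
  int j = k;

  for (;;)
    {
      j += 4;          /* body runs at least once */
      if (!(j < M))
        break;         /* exit condition fails, loop ends */
      latch++;         /* back edge taken */
    }
  return latch;
}

/* Closed form from the dump, (63 - (unsigned int) k) /[fl] 4,
   wrapped in the may_be_zero condition "k > 63".  */
unsigned int
latch_executions_closed_form (int k)
{
  bool may_be_zero = k > M - 1;

  return may_be_zero
         ? 0u
         : ((unsigned int) (M - 1) - (unsigned int) k) / 4u;
}

void
check_closed_form (void)
{
  /* The guarded closed form agrees with the real loop, including
     start values for which the unguarded form would wrap around.  */
  for (int k = -16; k <= 80; k++)
    assert (latch_executions_by_running (k)
            == latch_executions_closed_form (k));

  /* For the start values the outer loop really uses (0 <= k < 4),
     the may_be_zero condition never holds.  */
  for (int k = 0; k < 4; k++)
    assert (!(k > M - 1));
}
```

The second loop in check_closed_form () is the fact that range propagation
would have to supply: with k restricted to 0..3, the guard is dead.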

P.S.2 When the above loop is written this way:

 for (k = 0; k < 4; k++) {
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = k; j < M; j+=4) {
      diff += in[j+i]*coeff[j];
    }
    out[i] += diff;
  }
 }

(i.e. the inner-loop is a for-loop instead of a do-while loop; taken from
vect-outer-fir.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00729.html), the compiler
returns the following expression:
"
Analyzing # of iterations of loop 4
  exit condition [k_45 + 4, + , 4](no_overflow) <= 127
  bounds on difference of bases: -4 ... 2147483771
  result:
    # of iterations (127 - (unsigned int) k_45) /[fl] 4, bounded by 536870943
  (set_nb_iterations_in_loop = (127 - (unsigned int) k_45) /[fl] 4))

vect-outer-fir.c:29: note: ==> get_loop_niters:(127 - (unsigned int) k_45) /[fl] 4 + 1
"
...but although there isn't a may_be_zero part, the compiler still thinks
that the inner-loop may iterate 0 times, and so it generates guard code
around it to skip the inner-loop in case its loop-bound is 0 (which the
outer-loop vectorizer doesn't like...):

outer-loop-header:
      ...
      if (k_45 <= 127)
            then goto inner-loop
      else
            goto outer-loop tail
inner-loop:
      ...
outer-loop tail:
      ...
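
At the source level, the control flow sketched above corresponds roughly to
rewriting the for-loop as a guarded do-while loop. The plain-C sketch below
(hypothetical function names, with the bound passed as a parameter m) checks
that the two forms execute the body the same number of times; it only
illustrates the shape of the guard and is not what the compiler literally
emits.

```c
#include <assert.h>

/* for-loop form of the inner loop; returns the number of body
   executions.  */
unsigned int
body_count_for_loop (int k, int m)
{
  unsigned int n = 0;

  for (int j = k; j < m; j += 4)
    n++;
  return n;
}

/* Guarded form: a test that skips the loop entirely, followed by a
   do-while loop that runs at least once when entered.  */
unsigned int
body_count_guarded (int k, int m)
{
  unsigned int n = 0;

  if (k < m)             /* guard: "if (k_45 <= 127)" when m is 128 */
    {
      int j = k;
      do
        {
          n++;
          j += 4;
        }
      while (j < m);
    }
  return n;
}

void
check_guard (void)
{
  for (int k = -8; k <= 140; k++)
    assert (body_count_for_loop (k, 128) == body_count_guarded (k, 128));
}
```

With m fixed at 128 and k in 0..3 the guard is always true, which is why it
is only an obstacle to the outer-loop vectorizer and not something the loop
actually needs.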


(See attached file: p4.txt)

Attachment: p4.txt
Description: Text document
