This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

This part is not specific to outer-loop vectorization, but may be needed even more when doing outer-loop vectorization. This goes back to the issue raised here:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00136.html.

In cases where the compiler can't prove that the loop count is non-zero, it returns "scev_not_known", and vectorization fails because it thinks that the loop is not countable. Instead, with this patch, we have the number-of-iterations analysis return both the loop-bound expression and the may_be_zero expression, and from them build a COND_EXPR that represents the loop count, i.e.: "lb = may_be_zero ? zero : lb".

The one limitation is that currently the compiler does not let us generate a COND_EXPR as a rhs (at least until this patch is committed:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html),
so we have to prevent situations in which the vectorizer would actually generate stmts using this expression (e.g. peeling the loop).

Bootstrapped on powerpc64-linux, bootstrapped with vectorization enabled on i386-linux, passed full regression testing on both platforms.

This part needs approval.

thanks,
dorit

ChangeLog:

        * tree-scalar-evolution.c (number_of_latch_executions_1): New.
        Contains code that was factored out from number_of_latch_executions.
        (number_of_latch_executions): Call number_of_latch_executions_1
        instead of code that was factored out.
        (number_of_exit_cond_executions): Takes additional argument
        may_be_zero. Call number_of_latch_executions_1 instead of
        number_of_latch_executions.
        * tree-scalar-evolution.h (number_of_exit_cond_executions): Takes
        additional argument.
        * tree-vect-analyze.c (vect_analyze_operations): Avoid vectorization
        in case the loop-bound is a COND_EXPR and peeling needs to be done.
        (vect_get_loop_niters): Call number_of_exit_cond_executions with
        additional argument may_be_zero. Use it to build a (potentially)
        conditional expression for the loop niters.
testsuite/ChangeLog:

        * gcc.dg/vect/vect-outer-fir-lb.c: First loop now gets vectorized.

P.S.1
One example where this helps is in this case:

    for (k = 0; k < 4; k++) {
      for (i = 0; i < N; i++) {
        diff = 0;
        j = k;
        do {
          diff += in[j+i]*coeff[j];
          j+=4;
        } while (j < M);
        out[i] += diff;
      }
    }

(taken from vect-outer-fir-lb.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00742.html).

Without the patch we get:

    Analyzing # of iterations of loop 4
      exit condition [k_45 + 4, + , 4](no_overflow) <= 63
      bounds on difference of bases: -2147483584 ... 2147483707
      result:
        zero if k_45 > 63
        # of iterations (63 - (unsigned int) k_45) /[fl] 4, bounded by 536870927
    (set_nb_iterations_in_loop = scev_not_known))
    (get_loop_exit_condition
      if (j_19 <= 63))
    vect-outer-fir-lb.c:33: note: not vectorized: number of iterations cannot be computed.

With the patch, the loop gets vectorized. Of course, what would really be good is if range propagation could help the number-of-iterations analysis realize that since k<4 it holds that k<127 and so the loop-bound is never zero.

P.S.2
When the above loop is written this way:

    for (k = 0; k < 4; k++) {
      for (i = 0; i < N; i++) {
        diff = 0;
        for (j = k; j < M; j+=4) {
          diff += in[j+i]*coeff[j];
        }
        out[i] += diff;
      }
    }

(i.e. the inner-loop is a for-loop instead of a do-while loop; taken from vect-outer-fir.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00729.html),
the compiler returns the following expression:

    Analyzing # of iterations of loop 4
      exit condition [k_45 + 4, + , 4](no_overflow) <= 127
      bounds on difference of bases: -4 ... 2147483771
      result:
        # of iterations (127 - (unsigned int) k_45) /[fl] 4, bounded by 536870943
    (set_nb_iterations_in_loop = (127 - (unsigned int) k_45) /[fl] 4))
    vect-outer-fir.c:29: note: ==> get_loop_niters:(127 - (unsigned int) k_45) /[fl] 4 + 1

...but although there isn't a may_be_zero part, the compiler still thinks that the inner-loop may iterate 0 times, and so it generates guard code around it to skip the inner-loop in case its loop-bound is 0 (which the outer-loop vectorizer doesn't like...):

    outer-loop-header:
      ...
      if (k_45 <= 127)
        then goto inner-loop
        else goto outer-loop tail
    inner-loop:
      ...
    outer-loop tail:
      ...

(See attached file: p4.txt)
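To make the two P.S. cases concrete, here is a minimal standalone C sketch (not code from the patch) that mimics the arithmetic in the dumps above. The struct and all function names are hypothetical, chosen only for illustration; in particular, applying the "+ 1" after the may_be_zero select is my assumption about how the latch count becomes the loop niters, and `last` stands for the constant in the exit condition (63 or 127 in the dumps).

```c
#include <assert.h>

/* Hypothetical stand-in for what the number-of-iterations analysis
   reports: a latch-count expression plus a may_be_zero condition.  */
struct niter_result {
  int may_be_zero;     /* nonzero when the niter expression must not be used */
  unsigned int niter;  /* latch executions; meaningful only if !may_be_zero */
};

/* P.S.1 shape: j = k; do { ...; j += 4; } while (j <= last);
   Mirrors "zero if k > last" and "(last - (unsigned int) k) /[fl] 4".  */
struct niter_result
analyze_do_while (int k, int last)
{
  struct niter_result r;
  r.may_be_zero = k > last;
  r.niter = ((unsigned int) last - (unsigned int) k) / 4u;
  return r;
}

/* The select from the mail, "lb = may_be_zero ? zero : lb", followed by
   the "+ 1" that turns latch executions into exit-condition executions.
   Where the "+ 1" is applied is an assumption of this sketch.  */
unsigned int
loop_niters (struct niter_result r)
{
  unsigned int latch = r.may_be_zero ? 0u : r.niter;
  return latch + 1u;
}

/* Reference: count body executions by actually running the do-while.  */
unsigned int
run_do_while (int k, int last)
{
  unsigned int n = 0;
  int j = k;
  do { n++; j += 4; } while (j <= last);
  return n;
}

/* P.S.2 shape: the inner loop written as a for-loop.  */
unsigned int
run_for (int k, int last)
{
  unsigned int n = 0;
  for (int j = k; j <= last; j += 4)
    n++;
  return n;
}

/* The same for-loop expressed as a guard around a do-while loop,
   which is the control flow described in P.S.2.  */
unsigned int
run_guarded (int k, int last)
{
  if (k <= last)
    return run_do_while (k, last);
  return 0;
}

/* Returns 1 if the formulas agree with the executed loops for every
   start value k in [0, 200).  */
int
check_all (int last)
{
  for (int k = 0; k < 200; k++)
    {
      if (loop_niters (analyze_do_while (k, last)) != run_do_while (k, last))
        return 0;
      if (run_for (k, last) != run_guarded (k, last))
        return 0;
    }
  return 1;
}
```

Calling `check_all (63)` or `check_all (127)` compares the closed-form counts against the executed loops. Without the select, a start value above `last` makes the unsigned subtraction wrap, so the raw `niter` expression is a huge value rather than zero; that is the case the may_be_zero condition guards against.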
Attachment:
p4.txt
Description: Text document