This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[patch] Fix loop bound comparison in the vectorizer
- From: Ira Rosen <IRAR at il dot ibm dot com>
- To: gcc-patches at gnu dot org
- Cc: "Jagasia, Harsha" <harsha dot jagasia at amd dot com>
- Date: Tue, 11 Sep 2007 14:33:24 +0300
- Subject: [patch] Fix loop bound comparison in the vectorizer
This part of the x86 vect cost model patch

  2007-09-10  Harsha Jagasia  <harsha.jagasia@amd.com>
              Jan Sjodin  <jan.sjodin@amd.com>

          * tree-vect-analyze.c (vect_analyze_operations): Change
          comparison of loop iterations with threshold to less than
          or equal to instead of less than.  Reduce
          min_scalar_loop_bound by one.

makes the threshold negative in the default case where
PARAM_MIN_VECT_LOOP_BOUND is 0.
  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
                            * vectorization_factor) - 1);
  ...
  th = (unsigned) min_scalar_loop_bound;
  ...
and, because th is unsigned, the resulting -1 wraps around to UINT_MAX,
which makes the following condition always true (at least on
x86_64-linux):
  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
    {
      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
        fprintf (vect_dump, "not vectorized: vectorization not "
                 "profitable.");
      ...
      return false;
    }
and no loop can get vectorized now.
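The arithmetic above can be reproduced outside GCC with a minimal standalone C sketch. The function names vect_threshold and vect_not_profitable are hypothetical stand-ins, not GCC functions, and the types (int for min_scalar_loop_bound, long long for the known iteration count) are assumptions chosen to mirror the snippets quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the threshold computation in
   vect_analyze_operations.  The real code reads
   PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) and the loop's
   vectorization factor; here both are plain arguments.  */
unsigned
vect_threshold (int min_vect_loop_bound, int vectorization_factor)
{
  assert (vectorization_factor > 0);
  /* min_scalar_loop_bound = (PARAM * VF) - 1, as quoted above.  */
  int min_scalar_loop_bound = min_vect_loop_bound * vectorization_factor - 1;
  /* th = (unsigned) min_scalar_loop_bound: with a PARAM of 0 this is
     (unsigned) -1, which the C conversion rules define as UINT_MAX.  */
  return (unsigned) min_scalar_loop_bound;
}

/* Hypothetical model of the "niters <= th" profitability test.
   long long is an assumed stand-in for the HOST_WIDE_INT value
   returned by LOOP_VINFO_INT_NITERS.  */
bool
vect_not_profitable (long long niters, unsigned th)
{
  return niters <= th;
}
```

For example, vect_threshold (0, 4) yields UINT_MAX, so vect_not_profitable reports even a loop with 1024 known iterations as not profitable.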
I suggest changing the default and the minimum value of
PARAM_MIN_VECT_LOOP_BOUND to 1. In addition to fixing the above problem, it
seems reasonable to vectorize only loops with at least one iteration.
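With the proposed minimum of 1, min_scalar_loop_bound becomes VF - 1 and can no longer go negative, so the check rejects exactly the loops with fewer than VF known scalar iterations. A minimal C sketch of that behaviour, assuming the same formula and comparison as quoted above; vect_loop_rejected is a hypothetical helper, not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the profitability check after the proposed
   change.  With min_vect_loop_bound >= 1 the threshold is at least
   VF - 1 >= 0, so the unsigned conversion no longer wraps.  */
bool
vect_loop_rejected (int min_vect_loop_bound, int vectorization_factor,
                    long long niters)
{
  assert (min_vect_loop_bound >= 1 && vectorization_factor > 0);
  int min_scalar_loop_bound = min_vect_loop_bound * vectorization_factor - 1;
  unsigned th = (unsigned) min_scalar_loop_bound;
  /* Same "niters <= th" comparison as in vect_analyze_operations.  */
  return niters <= th;
}
```

With a vectorization factor of 4, a loop with 3 known iterations is still rejected (3 <= 3), while loops with 4 or more iterations, i.e. at least one full vector iteration, pass the check.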
Bootstrapping with vectorization enabled and testing on x86_64-linux are
in progress. O.K. for mainline once the testing completes?
Thanks,
Ira
ChangeLog:
          * params.def (PARAM_MIN_VECT_LOOP_BOUND): Change default and minimum
          to 1.
Index: params.def
===================================================================
--- params.def (revision 128363)
+++ params.def (working copy)
@@ -148,7 +148,7 @@ DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,
 DEFPARAM (PARAM_MIN_VECT_LOOP_BOUND,
           "min-vect-loop-bound",
           "If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization",
-          0, 0, 0)
+          1, 1, 0)
 
 /* The maximum number of instructions to consider when looking for an
    instruction to fill a delay slot.  If more than this arbitrary