This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 10 Apr 2015 11:16:53 +0000
- Subject: [Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance
- Auto-submitted: auto-generated
- References: <bug-57796-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another use-case for gathers is that of strided loads where we do

  for (j = 0; ; j += VF*stride)
    tmp1 = array[j];
    tmp2 = array[j + stride];
    ...
    vectemp = {tmp1, tmp2, ...}

but could as well do

  off = { 0, stride, ..., stride * N };
  for (j = 0; ; j += VF*stride)
    vectemp = gather (&array[j], off, -1);

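The two loop forms above can be compared with a small portable C sketch. This is only an illustration under stated assumptions: VF is fixed at 4, vec_t and gather_emul are hypothetical stand-ins for a vector register and a hardware gather (they are not GCC or AVX2 names), the mask argument is dropped because all lanes are active, and elements left over after the last full vector are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* assumed vectorization factor: lanes per vector */

/* Hypothetical stand-in for a vector register with VF int lanes.  */
typedef struct { int lane[VF]; } vec_t;

/* Emulated gather: lane k loads base[off[k]].  All lanes are active,
   which corresponds to the all-ones mask (-1) in the pseudocode.  */
static vec_t gather_emul (const int *base, const size_t off[VF])
{
  vec_t v;
  for (size_t k = 0; k < VF; k++)
    v.lane[k] = base[off[k]];
  return v;
}

/* Helper for the illustration: add up the lanes of a vector.  */
static long sum_lanes (vec_t v)
{
  long s = 0;
  for (size_t k = 0; k < VF; k++)
    s += v.lane[k];
  return s;
}

/* First form: VF scalar loads, then build the vector from them.  */
static long sum_strided_scalar (const int *array, size_t n, size_t stride)
{
  assert (stride > 0);
  long sum = 0;
  for (size_t j = 0; j + (VF - 1) * stride < n; j += VF * stride)
    {
      int tmp[VF];
      for (size_t k = 0; k < VF; k++)
        tmp[k] = array[j + k * stride];   /* tmp1, tmp2, ... */
      vec_t vectemp;
      for (size_t k = 0; k < VF; k++)
        vectemp.lane[k] = tmp[k];         /* vectemp = {tmp1, tmp2, ...} */
      sum += sum_lanes (vectemp);
    }
  return sum;
}

/* Second form: one loop-invariant offset vector, one gather per
   iteration.  j is still advanced as its own induction variable.  */
static long sum_strided_gather (const int *array, size_t n, size_t stride)
{
  assert (stride > 0);
  size_t off[VF];
  for (size_t k = 0; k < VF; k++)
    off[k] = k * stride;
  long sum = 0;
  for (size_t j = 0; j + (VF - 1) * stride < n; j += VF * stride)
    sum += sum_lanes (gather_emul (&array[j], off));
  return sum;
}
```

Both functions visit the same elements, so they should return the same sum for any array and positive stride; the only difference is whether the vector is assembled from scalar temporaries or loaded in one gather step.
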
The gather variant would still need a separate IV.  Currently the cost of strided loads is

  /* N scalar loads plus gathering them into a vector.  */
  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
  inside_cost += record_stmt_cost (body_cost_vec,
                                   ncopies * TYPE_VECTOR_SUBPARTS (vectype),
                                   scalar_load, stmt_info, 0, vect_body);
  inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
                                   stmt_info, 0, vect_body);

where a good(?) approximation for gather loads could be just omitting
the vec_construct cost? (well, a new target cost for gather would be
most appropriate I guess)
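
To make the arithmetic of that suggestion concrete, here is a toy model in C. It is not GCC code: struct toy_costs, strided_load_cost and gather_load_cost_approx are hypothetical names, the per-statement costs are made-up inputs rather than real target values, and it assumes each recorded cost is simply the statement count times a per-statement cost.

```c
#include <assert.h>

/* Hypothetical per-statement costs.  In GCC these would come from the
   target's cost hooks; here they are plain inputs.  */
struct toy_costs
{
  unsigned scalar_load;
  unsigned vec_construct;
};

/* Model of the current strided-load cost: ncopies * nunits scalar
   loads plus one vector construction per copy.  */
static unsigned
strided_load_cost (const struct toy_costs *c, unsigned ncopies,
                   unsigned nunits)
{
  assert (ncopies > 0 && nunits > 0);
  return ncopies * nunits * c->scalar_load + ncopies * c->vec_construct;
}

/* Model of the suggested approximation for gathers: keep the
   scalar-load term and omit the vec_construct term.  */
static unsigned
gather_load_cost_approx (const struct toy_costs *c, unsigned ncopies,
                         unsigned nunits)
{
  assert (ncopies > 0 && nunits > 0);
  return ncopies * nunits * c->scalar_load;
}
```

With these assumptions the two estimates differ by exactly ncopies * vec_construct, so the approximation only favours gathers by the cost of building the vector. A dedicated target cost for gather would replace the scalar-load term as well.
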