This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another use-case for gathers is that of strided loads where we do

           for (j = 0; ; j += VF*stride)
             tmp1 = array[j];
             tmp2 = array[j + stride];
             ...
             vectemp = {tmp1, tmp2, ...}

but could as well do

           off = { 0, stride, ..., stride * N };
           for (j = 0; ; j += VF*stride)
             vectemp = gather (&array[j], off, -1);

This would still need a separate IV.  Currently the cost of strided loads is

      /* N scalar loads plus gathering them into a vector.  */
      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
      inside_cost += record_stmt_cost (body_cost_vec,
                                       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
                                       scalar_load, stmt_info, 0, vect_body);
      inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
                                       stmt_info, 0, vect_body);

where a good(?) approximation for gather loads could be to simply omit
the vec_construct cost?  (Well, a new target cost for gather would be
most appropriate, I guess.)
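
To make the cost suggestion above concrete, here is a minimal standalone
sketch of the arithmetic only.  It is not GCC code: the struct, the function
names and the numeric costs are hypothetical, and real scalar_load and
vec_construct costs come from the target's cost tables.  It assumes the
gather approximation keeps the per-element load cost and drops only the
vec_construct term.

```c
/* Hypothetical per-statement costs; real values are target-specific.  */
struct cost_params
{
  unsigned scalar_load;    /* assumed cost of one scalar load */
  unsigned vec_construct;  /* assumed cost of building a vector from scalars */
};

/* Current model from the excerpt above: ncopies * nunits scalar loads
   plus one vec_construct per copy.  */
static unsigned
strided_load_cost (const struct cost_params *p, unsigned ncopies,
                   unsigned nunits)
{
  return ncopies * nunits * p->scalar_load + ncopies * p->vec_construct;
}

/* Suggested approximation for gathers: same loads, no vec_construct.  */
static unsigned
gather_load_cost_approx (const struct cost_params *p, unsigned ncopies,
                         unsigned nunits)
{
  return ncopies * nunits * p->scalar_load;
}
```

With the made-up values scalar_load = 1, vec_construct = 8, ncopies = 2 and
8 elements per vector, the first function gives 2*8*1 + 2*8 = 32 and the
second gives 16.  How close that is to real gather cost on a given target
is exactly the open question in the comment.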

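The two ways of doing a strided load described earlier in this comment can
be sketched in portable C as follows.  This is an illustration under
assumptions, not vectorizer output: VF is fixed at 4, gather_emulated is a
hypothetical scalar stand-in for a hardware gather with all lanes enabled
(the -1 mask), and the offsets are taken to be { 0, stride, ...,
(VF-1)*stride }.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor, for illustration only */

/* Hypothetical stand-in for a hardware gather: out[k] = base[off[k]].  */
static void
gather_emulated (const int *base, const size_t off[VF], int out[VF])
{
  for (size_t k = 0; k < VF; k++)
    out[k] = base[off[k]];
}

/* Form 1: VF scalar loads per iteration, then build the vector.  */
static long
sum_strided_scalar (const int *array, size_t n_vecs, size_t stride)
{
  long sum = 0;
  for (size_t i = 0, j = 0; i < n_vecs; i++, j += VF * stride)
    {
      int vectemp[VF];
      for (size_t k = 0; k < VF; k++)
        vectemp[k] = array[j + k * stride];
      for (size_t k = 0; k < VF; k++)
        sum += vectemp[k];
    }
  return sum;
}

/* Form 2: loop-invariant offset vector; only the base &array[j] advances,
   so j is still needed as a separate induction variable.  */
static long
sum_strided_gather (const int *array, size_t n_vecs, size_t stride)
{
  size_t off[VF];
  for (size_t k = 0; k < VF; k++)
    off[k] = k * stride;

  long sum = 0;
  for (size_t i = 0, j = 0; i < n_vecs; i++, j += VF * stride)
    {
      int vectemp[VF];
      gather_emulated (&array[j], off, vectemp);
      for (size_t k = 0; k < VF; k++)
        sum += vectemp[k];
    }
  return sum;
}
```

Both functions read the same elements, so they return the same sum for any
array that holds at least (n_vecs - 1) * VF * stride + (VF - 1) * stride + 1
elements.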
