This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 10 Apr 2015 11:16:53 +0000
Subject: [Bug target/57796] AVX2 gather vectorization: code bloat and reduction of performance
Auto-submitted: auto-generated
References: <bug-57796-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another use-case for gathers is that of strided loads where we do

           for (j = 0; ; j += VF*stride)
             {
               tmp1 = array[j];
               tmp2 = array[j + stride];
               ...
               vectemp = {tmp1, tmp2, ...}
             }

but could as well do

           off = { 0, stride, ..., stride * (VF - 1) };
           for (j = 0; ; j += VF*stride)
             vectemp = gather (&array[j], off, -1);
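
As a rough illustration of why the two forms load the same elements, here is a
minimal C sketch.  It is not GCC code: VF, the scalar gather () helper (a
stand-in for a hardware gather, with the mask argument dropped) and
strided_forms_agree () are all hypothetical names made up for this example.

```c
#include <assert.h>

#define VF 4  /* assumed vectorization factor (lanes per vector) */

/* Hypothetical scalar stand-in for a hardware gather:
   lane i loads base[off[i]].  No mask, every lane is active.  */
void gather(const int *base, const int off[VF], int out[VF])
{
    for (int i = 0; i < VF; i++)
        out[i] = base[off[i]];
}

/* First form: VF scalar loads, then build the vector from the temporaries.  */
void strided_scalar(const int *array, int j, int stride, int out[VF])
{
    assert(stride > 0);
    for (int i = 0; i < VF; i++)
        out[i] = array[j + i * stride];
}

/* Returns 1 if both forms produce identical vectors on a small test array.  */
int strided_forms_agree(void)
{
    enum { STRIDE = 3, ITERS = 2, LEN = VF * STRIDE * ITERS };
    int array[LEN];
    for (int i = 0; i < LEN; i++)
        array[i] = 10 * i;

    /* Loop-invariant offset vector { 0, stride, ..., stride * (VF - 1) }.  */
    int off[VF];
    for (int i = 0; i < VF; i++)
        off[i] = i * STRIDE;

    /* j is the separate IV, advancing by VF * stride per vector iteration.  */
    for (int j = 0; j < LEN; j += VF * STRIDE) {
        int a[VF], b[VF];
        strided_scalar(array, j, STRIDE, a);
        gather(&array[j], off, b);
        for (int i = 0; i < VF; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

The offset vector is computed once outside the loop; only the base address
&array[j] changes per iteration.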

The gather form still needs a separate IV.  Currently the cost of strided loads is

      /* N scalar loads plus gathering them into a vector.  */
      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
      inside_cost += record_stmt_cost (body_cost_vec,
                                       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
                                       scalar_load, stmt_info, 0, vect_body);
      inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
                                       stmt_info, 0, vect_body);

where a good(?) approximation for gather loads could be to simply omit
the vec_construct cost?  (Well, a new target cost for gathers would be
most appropriate, I guess.)
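
To make the arithmetic of that approximation concrete, here is a toy C sketch
of the two cost formulas.  It does not use the vectorizer's record_stmt_cost ()
machinery: struct toy_costs and both functions are hypothetical, and the
per-statement cost numbers would in reality come from the target.

```c
#include <assert.h>

/* Hypothetical per-statement costs; real values are target-specific.  */
struct toy_costs {
    unsigned scalar_load;
    unsigned vec_construct;
};

/* Current strided-load model: ncopies * nunits scalar loads
   plus one vec_construct per copy.  */
unsigned strided_load_cost(const struct toy_costs *c,
                           unsigned ncopies, unsigned nunits)
{
    return ncopies * nunits * c->scalar_load
           + ncopies * c->vec_construct;
}

/* Suggested approximation for gathers: the same scalar-load term,
   with the vec_construct term omitted.  */
unsigned gather_load_cost_approx(const struct toy_costs *c,
                                 unsigned ncopies, unsigned nunits)
{
    return ncopies * nunits * c->scalar_load;
}
```

Under this approximation a gather is always cheaper than the strided sequence
by exactly ncopies * vec_construct, which is one reason a dedicated target cost
for gathers would model real hardware better.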
