This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/52056] Code optimization sensitive to trivial changes


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52056

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-01-30 23:16:03 UTC ---
The signed vs. unsigned long right shift is quite significant, because Intel
chips don't support signed (arithmetic) quadword vector right shifts, only
unsigned (logical) quadword right shifts and left shifts; AMD chips with -mxop
do support the signed form.
So, with the unsigned long right shift the loop is vectorized, while with the
signed long right shift it is not.  Clearly in this case the vectorization
(only two elements at a time) isn't beneficial, but the cost model doesn't
figure that out.  So the faster times are the ones without vectorization; you
can get the same speed with -O3 -fno-tree-vectorize even with the unsigned
shift.
Even AVX can't process more than two elements at a time; only AVX2 will be
able to.  How fast that loop is on AVX2-capable chips compared to the
non-vectorized version remains to be seen.
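
For illustration only, here is a minimal sketch of the kind of loop pair being
discussed.  It is not the test case from the bug report: the function names,
the shift count of 3, and the use of int64_t/uint64_t are all assumptions made
for the example.  The two loops differ only in the signedness of the element
type, which is what decides whether a plain quadword vector shift is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernels, not taken from the bug report. */

/* Signed 64-bit elements: each >> is an arithmetic shift under GCC,
   which has no direct quadword vector instruction in SSE2/AVX. */
void shift_signed(int64_t *dst, const int64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] >> 3;
}

/* Unsigned 64-bit elements: each >> is a logical shift, which does
   have a quadword vector form, so the vectorizer can use it. */
void shift_unsigned(uint64_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] >> 3;
}
```

Compiling such a file with -O3 and then again with -O3 -fno-tree-vectorize,
and timing both, is one way to compare the vectorized and non-vectorized
variants.  On recent GCC releases -fopt-info-vec reports which loops were
vectorized; what you see will depend on the GCC version and the target flags.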

