This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/63271] New: Should commute arithmetic with vector load


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

            Bug ID: 63271
           Summary: Should commute arithmetic with vector load
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zackw at panix dot com

Consider

    #include <emmintrin.h>

    __m128i foo(char C)
    {
      return _mm_set_epi8(   0,    C,  2*C,  3*C,
                           4*C,  5*C,  6*C,  7*C,
                           8*C,  9*C, 10*C, 11*C,
                          12*C, 13*C, 14*C, 15*C);
    }

    __m128i bar(char C)
    {
      __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                               8, 9,10,11,12,13,14,15);
      v *= C;
      return v;
    }

I *believe* these functions compute the same value and should therefore
generate identical code, but GCC 4.9.1 generates considerably larger and
slower code for foo() than for bar().
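
One way to check that belief, sketched with GCC's generic vector extension so
that it does not depend on x86. As far as I can tell, GCC's <emmintrin.h>
declares __m128i as a vector of two long long, so `v *= C` in bar() would be a
multiply of two 64-bit lanes rather than of sixteen 8-bit ones. The type names
and lane_widths_agree() below are made up for this sketch, and unsigned types
are assumed throughout so that wrap-around is well defined.

```c
#include <string.h>

/* Stand-ins for the types involved; the names are hypothetical.
   GCC's <emmintrin.h> uses signed long long lanes for __m128i; unsigned
   lanes are assumed here so that overflow wraps in a defined way. */
typedef unsigned char      bytes16 __attribute__((vector_size(16)));
typedef unsigned long long quads2  __attribute__((vector_size(16)));

/* Returns 1 if multiplying the constant bytes as sixteen 8-bit lanes gives
   the same 16 bytes as multiplying them as two 64-bit lanes, else 0. */
int lane_widths_agree(unsigned char c)
{
    bytes16 b, per_byte, back;
    quads2 q;

    for (int i = 0; i < 16; i++) {
        b[i] = (unsigned char)(15 - i);               /* constants from bar() */
        per_byte[i] = (unsigned char)((15 - i) * c);  /* what foo() asks for  */
    }

    memcpy(&q, &b, sizeof q);        /* reinterpret as two 64-bit lanes */
    q *= (quads2){ c, c };           /* 64-bit lane multiply            */
    memcpy(&back, &q, sizeof back);  /* view the result as bytes again  */

    return memcmp(&per_byte, &back, sizeof back) == 0;
}
```

In this sketch lane_widths_agree(17) holds, because every product still fits
in a byte, while lane_widths_agree(100) does not, because carries spill into
the neighbouring byte. If that reading of the header is right, bar() would
need a vector type with byte elements for the comparison with foo() to be
exact.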

The test case is written with x86 <emmintrin.h> intrinsics, but I have no
reason to think the problem is x86-specific; it looks like a generic missed
optimization in the tree-level vectorizer.
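
A target-independent variant of the test case might look like the sketch
below, which uses GCC's generic vector extension instead of <emmintrin.h>.
The names v16u8, foo_generic, bar_generic and generic_variants_match are
hypothetical, and unsigned char elements are assumed so that each lane wraps
modulo 256. The initializers list elements from index 0 upwards, which is the
reverse of the argument order of _mm_set_epi8.

```c
#include <string.h>

/* Sixteen 8-bit lanes; a hypothetical stand-in for the x86 type. */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Builds the vector from sixteen scalar products, like foo(). */
v16u8 foo_generic(unsigned char c)
{
    v16u8 v = { 15*c, 14*c, 13*c, 12*c, 11*c, 10*c,  9*c,  8*c,
                 7*c,  6*c,  5*c,  4*c,  3*c,  2*c,    c,    0 };
    return v;
}

/* Loads a constant vector and multiplies every lane by c, like bar(). */
v16u8 bar_generic(unsigned char c)
{
    v16u8 v = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
    v16u8 k = { c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c };
    return v * k;  /* element-wise multiply, truncated to 8 bits per lane */
}

/* Returns 1 if both variants produce the same 16 bytes for this c. */
int generic_variants_match(unsigned char c)
{
    v16u8 a = foo_generic(c);
    v16u8 b = bar_generic(c);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Comparing the code generated for foo_generic() and bar_generic() at -O2 on a
non-x86 target would show whether the same gap appears there.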

