This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/51492] New: vectorizer generates unnecessary code


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

             Bug #: 51492
           Summary: vectorizer generates unnecessary code
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com
             Build: x86_64-linux


Compile this code with gcc 4.6.2 on an x86-64 machine with -O3:

#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}

The result I see is this:

0000000000000000 <f>:
   0:    66 0f ef d2              pxor   %xmm2,%xmm2
   4:    b8 00 00 00 00           mov    $0x0,%eax
            5: R_X86_64_32    head
   9:    66 0f 6f 25 00 00 00     movdqa 0x0(%rip),%xmm4        # 11 <f+0x11>
  10:    00 
            d: R_X86_64_PC32    .LC0-0x4
  11:    66 0f 6f 1d 00 00 00     movdqa 0x0(%rip),%xmm3        # 19 <f+0x19>
  18:    00 
            15: R_X86_64_PC32    .LC1-0x4
  19:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  20:    66 0f 6f 00              movdqa (%rax),%xmm0
  24:    66 0f 6f c8              movdqa %xmm0,%xmm1
  28:    66 0f d9 c4              psubusw %xmm4,%xmm0
  2c:    66 0f 75 c2              pcmpeqw %xmm2,%xmm0
  30:    66 0f fd cb              paddw  %xmm3,%xmm1
  34:    66 0f df c1              pandn  %xmm1,%xmm0
  38:    66 0f 7f 00              movdqa %xmm0,(%rax)
  3c:    48 83 c0 10              add    $0x10,%rax
  40:    48 3d 00 00 00 00        cmp    $0x0,%rax
            42: R_X86_64_32S    head+0x20000
  46:    75 d8                    jne    20 <f+0x20>
  48:    f3 c3                    repz retq 


Most of this code is unnecessary.  The psubusw instruction alone is
sufficient: it implements unsigned saturating subtraction, which is exactly
what the source expression computes.  Why does gcc create all this extra
code?  The loop body should just be

   movdqa (%rax), %xmm0
   psubusw %xmm1, %xmm0
   movdqa %xmm0, (%rax)

where %xmm1 holds WSIZE in each of its eight 16-bit elements.
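
For illustration, here is a minimal sketch of the same loop body written
directly with SSE2 intrinsics.  It is not part of the original test case:
the helper names sub_or_zero and sat_sub_wsize are hypothetical, and
unaligned loads/stores are used so the helper works on any buffer (the
report's array is 64-byte aligned, so movdqa would do there).  The
_mm_subs_epu16 intrinsic corresponds to psubusw; the scalar tail reuses the
expression from the report, so both paths should agree.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define WSIZE 64

/* Scalar reference: the expression from the loop body in the report. */
static uint16_t sub_or_zero(uint16_t m)
{
    return (uint16_t)(m >= WSIZE ? m - WSIZE : 0);
}

/* Hypothetical helper: subtract WSIZE from each of the n elements,
   clamping at zero (unsigned saturating subtraction). */
static void sat_sub_wsize(uint16_t *p, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* WSIZE replicated into all eight 16-bit elements. */
    const __m128i w = _mm_set1_epi16(WSIZE);
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        v = _mm_subs_epu16(v, w);               /* psubusw */
        _mm_storeu_si128((__m128i *)(p + i), v);
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < n; ++i)
        p[i] = sub_or_zero(p[i]);
}
```

Calling sat_sub_wsize(head, SIZE) on the array from the report would
perform the same update as f().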

