This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Missing optimizations


On GCC trunk (Cygwin, Core2) I observe the following behaviour.

g++ -std=gnu++0x -O2 -m32 -march=native -msse -msse2 -msse3 -Wall
-Werror -Wno-unused -Wno-strict-aliasing -march=native
-fomit-frame-pointer -Wno-pmf-conversions -g main.cpp

-----------8<--------------

#include <x86intrin.h>

int test1(__m128i v) {

    return _mm_cvtsi128_si32(v);
}

-----------8<--------------

emits:

004012e0 <__Z5test1U8__vectorx>:
  4012e0:	83 ec 0c             	sub    $0xc,%esp
  4012e3:	66 0f 7e c0          	movd   %xmm0,%eax
  4012e7:	83 c4 0c             	add    $0xc,%esp
  4012ea:	c3                   	ret

which shows that the stack pointer is adjusted (sub/add of 0xc)
for no purpose: the function never touches the stack.

GCC also seems to lose track of the condition codes,
as shown here:

  4011a0:	66 0f df 01          	pandn  (%ecx),%xmm0
  4011a4:	39 d9                	cmp    %ebx,%ecx
  4011a6:	66 0f 7f 0c 24       	movdqa %xmm1,(%esp)
  4011ab:	75 04                	jne    4011b1 <__Z8popcountPKU8__vectorxjj+0x61>
  4011ad:	66 0f db c1          	pand   %xmm1,%xmm0
  4011b1:	66 0f 6f 1d 90 28 40 	movdqa 0x402890,%xmm3
  4011b8:	00
  4011b9:	66 0f 6f 15 a0 28 40 	movdqa 0x4028a0,%xmm2
  4011c0:	00
  4011c1:	66 0f 6f f3          	movdqa %xmm3,%xmm6
  4011c5:	66 0f 6f fb          	movdqa %xmm3,%xmm7
  4011c9:	66 0f db f0          	pand   %xmm0,%xmm6
  4011cd:	66 0f df f8          	pandn  %xmm0,%xmm7
  4011d1:	66 0f 6f ca          	movdqa %xmm2,%xmm1
  4011d5:	66 0f 6f c7          	movdqa %xmm7,%xmm0
  4011d9:	66 0f 38 00 ce       	pshufb %xmm6,%xmm1
  4011de:	66 0f 71 d0 04       	psrlw  $0x4,%xmm0
  4011e3:	66 0f 6f f1          	movdqa %xmm1,%xmm6
  4011e7:	66 0f 6f fa          	movdqa %xmm2,%xmm7
  4011eb:	39 d9                	cmp    %ebx,%ecx
  4011ed:	66 0f 38 00 f8       	pshufb %xmm0,%xmm7
  4011f2:	66 0f fc f7          	paddb  %xmm7,%xmm6
  4011f6:	66 0f ef ff          	pxor   %xmm7,%xmm7
  4011fa:	66 0f f6 f7          	psadbw %xmm7,%xmm6
  4011fe:	0f 84 be 00 00 00    	je     4012c2 <__Z8popcountPKU8__vectorxjj+0x172>

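The second cmp (at 4011eb) is superfluous: it repeats the one at 4011a4,
and neither the jne nor the SSE instructions in between modify the
condition codes or the registers being compared.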
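
For illustration, here is a source shape that could plausibly lead to such
code. It is a hypothetical reduction, not the original popcount source: the
name masked_byte_sum, its parameters, and the byte-sum computation (used in
place of the pshufb-based popcount so that plain SSE2 suffices) are all
assumptions. The point is only that the same comparison, p == last, is tested
twice with nothing but SSE operations in between. Whether a given GCC version
emits one cmp or two for it has not been checked here.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical reduced test case (names and semantics are assumptions).
// Sums all bytes of the blocks in [first, last]; the final block is
// masked with tail_mask first. Requires first <= last.
unsigned masked_byte_sum(const __m128i* first, const __m128i* last,
                         __m128i tail_mask) {
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    for (const __m128i* p = first; ; ++p) {
        __m128i v = _mm_loadu_si128(p);
        if (p == last)                        // first test of p == last
            v = _mm_and_si128(v, tail_mask);  // pand on the last block only

        // Only SSE work here: psadbw and paddq do not write EFLAGS.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));

        if (p == last)                        // second test, same operands
            break;
    }
    // Fold the two 64-bit partial sums into the low lane and extract it.
    const __m128i folded = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return static_cast<unsigned>(_mm_cvtsi128_si32(folded));
}
```

Compiling this with the command line above and inspecting the objdump -d
output should show whether the comparison is repeated before the loop exit.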

Are these known issues?

Best regards
Piotr Wyderski

