[Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Nov 25 11:08:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Here is a kind-of testcase for SSE2.  It at least has a matching
BIT_FIELD_REF, but it still "fails" at the vector source.  Skia seems to
pun to __int128 before doing the extracts somehow (maybe that's our
intrinsics, who knows).

typedef unsigned short v8hi __attribute__((vector_size(16)));
typedef unsigned int v4si __attribute__((vector_size(16)));

void foo (v4si *dst, v8hi src)
{
  unsigned int tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v4si *)tem;
  dst[1] = *(v4si *)&tem[4];
}
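
For comparison, the element-wise copy through tem[] amounts to
zero-extending the eight unsigned short lanes to unsigned int.  Below is a
minimal sketch of the same result written as one widening conversion.  It
assumes a compiler that provides __builtin_convertvector (GCC 9 or later,
or Clang); foo_widen and the 32-byte v8si type are hypothetical names that
do not appear in the bug report.  This is only an equivalent formulation
to compare against, not a claim about what GCC should emit.

```c
#include <string.h>

typedef unsigned short v8hi __attribute__((vector_size(16)));
typedef unsigned int v4si __attribute__((vector_size(16)));
/* Hypothetical 32-byte helper type, not part of the original testcase. */
typedef unsigned int v8si __attribute__((vector_size(32)));

/* Hypothetical counterpart of foo(): same stores to dst[0] and dst[1],
   but expressed as a single lane-wise widening conversion.  */
void foo_widen (v4si *dst, v8hi src)
{
  /* Zero-extend each unsigned short lane to unsigned int.  */
  v8si wide = __builtin_convertvector (src, v8si);
  /* Lanes 0-3 land in dst[0], lanes 4-7 in dst[1].  */
  memcpy (dst, &wide, sizeof wide);
}
```

Compiling foo and foo_widen at -O2 and comparing the generated assembly
is one way to see whether the extract-and-reload pattern is recognized.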

