[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled

kretz at kde dot org gcc-bugzilla@gcc.gnu.org
Tue Feb 25 10:12:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919

--- Comment #4 from Matthias Kretz (Vir) <kretz at kde dot org> ---
Yes, this is the same issue.

FWIW, a vectorization with SSE4.1 could do:
  pxor xmm0, xmm0
  pinsrw xmm0, WORD PTR in[rip], 0
  pmovsxbw xmm0, xmm0
  movd DWORD PTR out[rip], xmm0

Whether that's faster than
  movsx eax, BYTE PTR in[rip]
  mov WORD PTR out[rip], ax
  movsx eax, BYTE PTR in[rip+1]
  mov WORD PTR out[rip+2], ax

probably depends on whether the load/store ports are limiting the performance
on this section of code. Without SSE4.1 I don't think it's worth vectorizing
this conversion.
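
For readers without the testcase at hand, here is a minimal C++ sketch of the two code shapes compared above. It is an illustration under assumptions, not the reproducer from the bug: the function names are hypothetical, the input is taken to be two signed bytes (which is what the movsx in the scalar sequence implies), and the packed variant assumes a little-endian target such as x86. It models the pinsrw/pmovsxbw/movd sequence in plain integer code instead of using intrinsics, so it builds without -msse4.1.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar version: two byte loads, each sign-extended to
// 16 bits and stored separately (the movsx/mov pairs above).
inline void widen2_scalar(const signed char in[2], char16_t out[2]) {
    out[0] = static_cast<char16_t>(static_cast<std::int16_t>(in[0]));
    out[1] = static_cast<char16_t>(static_cast<std::int16_t>(in[1]));
}

// Hypothetical model of the SSE4.1 sequence: one 16-bit load, a
// per-byte sign extension into 16-bit lanes, and one 32-bit store.
// Assumes little-endian byte order.
inline void widen2_packed(const signed char in[2], char16_t out[2]) {
    std::uint16_t pair;
    std::memcpy(&pair, in, sizeof pair);            // like pinsrw: load 2 bytes

    // like pmovsxbw: sign-extend each byte into its own 16-bit lane
    const auto lane0 = static_cast<std::uint16_t>(
        static_cast<std::int16_t>(static_cast<std::int8_t>(pair & 0xFF)));
    const auto lane1 = static_cast<std::uint16_t>(
        static_cast<std::int16_t>(static_cast<std::int8_t>(pair >> 8)));

    const std::uint32_t packed =
        static_cast<std::uint32_t>(lane0) |
        (static_cast<std::uint32_t>(lane1) << 16);
    std::memcpy(out, &packed, sizeof packed);       // like movd: store 4 bytes
}
```

With a destination of two char16_t elements, the single 4-byte store in the packed variant covers exactly the destination array, and both functions should produce the same output for any input pair.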

In any case, my analysis that there's an out-of-bounds store was wrong. Please
disregard.