[Bug middle-end/93919] [10 Regression] vectorization of 18 char to char16_t conversion is miscompiled
kretz at kde dot org
gcc-bugzilla@gcc.gnu.org
Tue Feb 25 10:12:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919
--- Comment #4 from Matthias Kretz (Vir) <kretz at kde dot org> ---
Yes, this is the same issue.
FWIW, a vectorization with SSE4.1 could do:
  pxor     xmm0, xmm0
  pinsrw   xmm0, WORD PTR in[rip], 0
  pmovsxbw xmm0, xmm0
  movd     DWORD PTR out[rip], xmm0
Whether that's faster than
  movsx eax, BYTE PTR in[rip]
  mov   WORD PTR out[rip], ax
  movsx eax, BYTE PTR in[rip+1]
  mov   WORD PTR out[rip+2], ax
probably depends on whether the load/store ports are limiting the performance
on this section of code. Without SSE4.1 I don't think it's worth vectorizing
this conversion.
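
For anyone who wants to measure the two variants, here is a rough C sketch of
both sequences using intrinsics. The function names (widen2_scalar,
widen2_sse41, widen2) are made up for illustration, and I'm assuming the
source elements are signed char, since the scalar code uses movsx. The
target attribute and __builtin_cpu_supports are GCC/Clang extensions. The
compiler is free to pick different instructions than the ones named in the
comments.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define WIDEN2_HAVE_X86 1
#endif

/* Scalar variant: two sign-extending byte loads and two 16-bit stores. */
static void widen2_scalar(const signed char *in, int16_t *out)
{
    out[0] = in[0];
    out[1] = in[1];
}

#ifdef WIDEN2_HAVE_X86
/* SSE4.1 variant: one 16-bit load, one widening op, one 32-bit store. */
__attribute__((target("sse4.1")))
static void widen2_sse41(const signed char *in, int16_t *out)
{
    uint16_t pair;
    int32_t lo;
    __m128i v;

    memcpy(&pair, in, sizeof pair);      /* load both chars at once */
    v = _mm_setzero_si128();             /* pxor */
    v = _mm_insert_epi16(v, pair, 0);    /* pinsrw */
    v = _mm_cvtepi8_epi16(v);            /* pmovsxbw */
    lo = _mm_cvtsi128_si32(v);           /* movd */
    memcpy(out, &lo, sizeof lo);         /* store both 16-bit results */
}
#endif

/* Hypothetical dispatcher: use SSE4.1 only where the CPU reports it. */
static void widen2(const signed char *in, int16_t *out)
{
#ifdef WIDEN2_HAVE_X86
    if (__builtin_cpu_supports("sse4.1")) {
        widen2_sse41(in, out);
        return;
    }
#endif
    widen2_scalar(in, out);
}
```

The vector variant touches memory twice instead of four times, which is the
trade-off against the extra ALU work mentioned above.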
In any case, my analysis that there's an out-of-bounds store was wrong. Please
disregard.