[Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)
hjl dot tools at gmail dot com
gcc-bugzilla@gcc.gnu.org
Fri Aug 27 16:16:00 GMT 2010
------- Comment #5 from hjl dot tools at gmail dot com 2010-08-27 16:16 -------
(In reply to comment #4)
> Created an attachment (id=21576)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
> Patch to remove special (vec_duplicate ...) insn RTXes
>
> This patch removes the special (vec_duplicate ...) forms of the zero/sign
> extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
> which accesses the full 128 bits of memory even if only the low 64 bits are used.
>
> Also, current gcc generates:
>
> movdqa (%rdi), %xmm0 # 6 *movv16qi_internal/2 [length = 4]
> pmovzxbd %xmm0, %xmm0 # 7 sse4_1_zero_extendv4qiv4si2
>
> which also accesses a full 128-bit, 16-byte-aligned value. This is no better than:
>
> pmovzxbd (%rdi), %xmm0 # 7 sse4_1_zero_extendv4qiv4si2
>
> The patch is untested, since I don't have the required HW.
>
I tested it on Linux/ia32 and Linux/Intel64 with SSE4.1. There are no
regressions. Thanks.
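
For reference, a minimal scalar sketch of what pmovzxbd computes, in case it
helps when reading the two sequences quoted above. The helper name
zero_extend_v4qi_v4si is hypothetical and not part of GCC or the patch; it
only models the instruction's result. Architecturally the memory form takes
a 32-bit operand, so only the four low bytes matter, whichever way the
operand is loaded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from GCC or the patch): scalar model of
   pmovzxbd.  The four low bytes of the source are each zero-extended
   to a 32-bit element of the destination.  */
void zero_extend_v4qi_v4si(const uint8_t *src, uint32_t dst[4])
{
    for (size_t i = 0; i < 4; i++)
        dst[i] = (uint32_t)src[i];   /* high 24 bits become zero */
}
```

Bytes 4..15 of a 16-byte source never influence the result, which is why
the movdqa + register form and the memory form produce the same value.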
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484