[Bug target/41484] Please add memory forms of pmovzx* (SSE4.1)

hjl dot tools at gmail dot com gcc-bugzilla@gcc.gnu.org
Fri Aug 27 16:16:00 GMT 2010



------- Comment #5 from hjl dot tools at gmail dot com  2010-08-27 16:16 -------
(In reply to comment #4)
> Created an attachment (id=21576)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
> Patch to remove special (vec_duplicate ...) insn RTXes
> 
> This patch removes the special (vec_duplicate ...) forms of the zero/sign
> extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
> which accesses the full 128 bits of memory even if only the low 64 bits are used.
> 
> Also, current gcc generates:
> 
>         movdqa  (%rdi), %xmm0   # 6     *movv16qi_internal/2    [length = 4]
>         pmovzxbd        %xmm0, %xmm0    # 7     sse4_1_zero_extendv4qiv4si2     
> 
> which also accesses a full 128-bit, 16-byte-aligned value. This is no better than:
> 
>         pmovzxbd        (%rdi), %xmm0   # 7     sse4_1_zero_extendv4qiv4si2     
> 
> The patch is untested, since I don't have the required hardware.
> 

I tested it on Linux/ia32 and Linux/Intel64 with SSE4.1. There are no
regressions. Thanks.
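
For readers following the bug, here is a minimal C sketch of the operation
under discussion. It is only an illustration, not part of the patch or its
testsuite: the function names zext_v4qi_v4si_ref and zext_v4qi_v4si_sse41 are
hypothetical. The first function is a scalar model of what pmovzxbd computes
(the four low bytes zero-extended into four 32-bit lanes). The second, built
only when SSE4.1 is enabled (e.g. with -msse4.1), expresses the same thing with
intrinsics. Whether a given gcc folds the load into the memory form
"pmovzxbd (%rdi), %xmm0" depends on the compiler version and on the insn
patterns this bug is about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 intrinsics; available with -msse4.1 */
#endif

/* Scalar reference model of pmovzxbd (hypothetical helper name):
   zero-extend the four low bytes at src into four 32-bit lanes.
   Only 4 bytes of src are read.  */
void zext_v4qi_v4si_ref(const uint8_t *src, uint32_t dst[4])
{
    assert(src != NULL && dst != NULL);
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];  /* uint8_t -> uint32_t conversion zero-extends */
}

#ifdef __SSE4_1__
/* The same operation with intrinsics (hypothetical helper name).
   The 4-byte load goes through memcpy, so src needs no alignment.  */
void zext_v4qi_v4si_sse41(const uint8_t *src, uint32_t dst[4])
{
    int32_t low;
    assert(src != NULL && dst != NULL);
    memcpy(&low, src, sizeof low);         /* load the 4 source bytes */
    __m128i v = _mm_cvtsi32_si128(low);    /* place them in the low lane */
    __m128i r = _mm_cvtepu8_epi32(v);      /* pmovzxbd */
    _mm_storeu_si128((__m128i *)dst, r);   /* unaligned 16-byte store */
}
#endif
```

Comparing the assembly for the second function (gcc -O2 -msse4.1 -S) before
and after the patch is one way to see whether the separate load has been
merged into pmovzxbd.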


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484


