This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1)
- From: "sgunderson at bigfoot dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 27 Sep 2009 21:03:43 -0000
- Subject: [Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1)
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
Hi,
SSE4.1 introduced zero-extending and sign-extending loads, such as
pmovzxbd (%rax), %xmm0
which takes four bytes from (%rax), zero-extends them to four 32-bit dwords,
and puts them into %xmm0. However, GCC's intrinsics support only the form
pmovzxbd %xmm1, %xmm0
which takes the lower 32 bits from %xmm1 and does the same. This is reflected in
the definition of the intrinsic (from the GCC 4.4.1 manual):
v4si __builtin_ia32_pmovzxbd128 (v16qi)
This makes it rather hard and indirect to load, say, 32 bits from an unaligned
char* -- especially if you're not sure that the next 96 bits are readable.
(Just casting the char* pointer to a v16qi* and dereferencing it in the
intrinsic's argument causes GCC to emit an aligned load to a register, followed
by a pmovzxbd reg/reg, at least in my program.)
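For illustration, here is a minimal sketch of one possible workaround with the
existing register-only intrinsic. It copies exactly four bytes with memcpy, so
nothing past the fourth byte is read and alignment does not matter, and then
zero-extends in a register. The helper name load4_u8_to_u32 is hypothetical and
not part of GCC; the target attribute and the scalar fallback are assumptions
made so the sketch builds without -msse4.1 and on non-x86 hosts. Whether the
compiler folds the load into a single memory-operand pmovzxbd depends on the
compiler version and optimization level, and this sketch makes no claim there.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_cvtepu8_epi32 */

/* Hypothetical helper (not part of GCC): zero-extend the four bytes at p
   into four 32-bit values without reading past p[3]. */
__attribute__((target("sse4.1")))
void load4_u8_to_u32(const unsigned char *p, uint32_t out[4])
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);        /* unaligned-safe 4-byte read */
    __m128i v = _mm_cvtsi32_si128(bits);  /* low 32 bits set, upper 96 zeroed */
    v = _mm_cvtepu8_epi32(v);             /* register form of pmovzxbd */
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned 16-byte store */
}
#else
/* Scalar fallback with the same result, for non-x86 targets. */
void load4_u8_to_u32(const unsigned char *p, uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = p[i];
}
#endif
```

Calling load4_u8_to_u32(buf + 1, out) on a char buffer is valid for any
alignment of buf, as long as the four bytes starting at buf + 1 are readable.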
Could you please add the forms that take v2qi/v4qi/v8qi/v2hi/v4hi/v2si as well,
for the entire pmovzx* and pmovsx* family?
--
Summary: Please add memory forms of pmovzx* (SSE4.1)
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sgunderson at bigfoot dot com
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484