This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug c/41484] New: Please add memory forms of pmovzx* (SSE4.1)


Hi,

SSE4.1 introduced zero-extending and sign-extending loads, such as

  pmovzxbd (%rax), %xmm0

which takes four bytes from (%rax), zero-extends them to four 32-bit dwords,
and puts them into %xmm0. However, GCC's intrinsics support only the form

  pmovzxbd %xmm1, %xmm0

which takes the lower 32 bits of %xmm1 and does the same. This is reflected in
the definition of the intrinsic (from the GCC 4.4.1 manual):

  v4si __builtin_ia32_pmovzxbd128 (v16qi)

This makes it rather awkward to load, say, 32 bits from an unaligned
char* -- especially if you're not sure that the next 96 bits are readable.
(Just casting the char* pointer to a v16qi* and dereferencing it in the
intrinsic's argument causes GCC to emit an aligned 128-bit load to a register,
followed by a pmovzxbd reg/reg, at least in my program.)
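
For reference, here is a minimal sketch of the kind of indirect workaround
this implies, written with the <smmintrin.h> intrinsics rather than the raw
builtins. The helper name load4_u8_to_u32 is made up for illustration, and the
target attribute is only there so the snippet builds without -msse4.1 on a
reasonably recent GCC. It reads exactly four bytes with memcpy (so nothing
beyond them is touched), moves them into a vector register, and then applies
the reg/reg form. Whether the compiler folds this into the memory form of
pmovzxbd is not guaranteed and probably depends on version and optimization
level.

```c
#include <stdint.h>
#include <string.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics (_mm_cvtepu8_epi32) */

/* Hypothetical helper: zero-extend the four bytes at p into four
 * 32-bit lanes and store them to out. Only p[0..3] are read. */
__attribute__((target("sse4.1")))
static void load4_u8_to_u32(const unsigned char *p, uint32_t out[4])
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);        /* alignment-safe 32-bit read */
    __m128i v = _mm_cvtsi32_si128(bits);  /* low 32 bits set, rest zeroed */
    v = _mm_cvtepu8_epi32(v);             /* reg/reg pmovzxbd */
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned 128-bit store */
}
```

The sign-extending variant would presumably look the same with
_mm_cvtepi8_epi32 in place of _mm_cvtepu8_epi32.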

Could you please add the forms that take v2qi/v4qi/v8qi/v2hi/v4hi/v2si as well,
for the entire pmovzx* and pmovsx* family?


-- 
           Summary: Please add memory forms of pmovzx* (SSE4.1)
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484

