Bug 45294 - pextrw, redundant zero (or otherwise) extension
Status: RESOLVED DUPLICATE of bug 41323
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-16 04:20 UTC by tbp
Modified: 2010-08-16 14:52 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description tbp 2010-08-16 04:20:45 UTC
This is a friendly reminder there's still no way to enjoy pextrw without undue zero/sign extension unless inline asm is used; there's even a gradient of ignominy from intrinsic to builtins, as exemplified by:
$ cat pextrw.cc
#include <smmintrin.h>
long unsigned int foo1(__m128i x) { return _mm_extract_epi16(x, 3); }
long unsigned int foo2(__v8hi x) { return __builtin_ia32_vec_ext_v8hi((__v8hi) x, 3); }
int main() { return 0; }
$ /usr/local/gcc-4.6-20100811/bin/g++ -O3 -march=native pextrw.cc
00000000004004a0 <_Z4foo1Dv2_x>:
  4004a0:       66 0f c5 c0 03          pextrw $0x3,%xmm0,%eax
  4004a5:       98                      cwtl   
  4004a6:       48 98                   cltq   
  4004a8:       c3                      retq   

00000000004004b0 <_Z4foo2Dv8_s>:
  4004b0:       66 0f c5 c0 03          pextrw $0x3,%xmm0,%eax
  4004b5:       48 0f bf c0             movswq %ax,%rax
  4004b9:       c3                      retq   

That's on x86-64, on an Intel i7, which, incidentally, is much faster at that whole pextrw business than previous generations.

This report may or may not be construed as a duplicate of the long forgotten PR 41323.
Comment 1 Uroš Bizjak 2010-08-16 10:19:36 UTC
(In reply to comment #0)
> This is a friendly reminder there's still no way to enjoy pextrw without undue
> zero/sign extension unless inline asm is used; there's even a gradient of
> ignominy from intrinsic to builtins, as exemplified by:

GCC does not simplify the following instruction:

Trying 8 -> 9:
Failed to match this instruction:
(set (reg:DI 65 [ D.6814 ])
    (sign_extend:DI (sign_extend:SI (reg:HI 64))))

IMO, this RTX should simplify to:

(set (reg:DI 65 [ D.6814 ])
    (sign_extend:DI (reg:HI 64)))
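
The equivalence behind the proposed simplification can be checked at the source level. The following is only a sketch in plain C++, not GCC's RTL machinery; the helper names are hypothetical and the fixed-width types stand in for the HI, SI and DI modes.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen for this sketch only.

// HI -> SI -> DI: sign-extend 16 bits to 32, then 32 bits to 64,
// mirroring (sign_extend:DI (sign_extend:SI (reg:HI))).
constexpr std::int64_t extend_in_two_steps(std::int16_t v) {
    return static_cast<std::int64_t>(static_cast<std::int32_t>(v));
}

// HI -> DI: sign-extend 16 bits straight to 64,
// mirroring (sign_extend:DI (reg:HI)).
constexpr std::int64_t extend_in_one_step(std::int16_t v) {
    return static_cast<std::int64_t>(v);
}

// Both forms agree on negative, zero and boundary values.
static_assert(extend_in_two_steps(-1) == extend_in_one_step(-1), "all bits set");
static_assert(extend_in_two_steps(-32768) == extend_in_one_step(-32768), "INT16_MIN");
static_assert(extend_in_two_steps(32767) == extend_in_one_step(32767), "INT16_MAX");
static_assert(extend_in_two_steps(0) == extend_in_one_step(0), "zero");
```

Widening a signed value preserves it at every step, so the intermediate 32-bit stop cannot change the final 64-bit result.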

Comment 2 Richard Biener 2010-08-16 10:20:19 UTC
The sign extension is because the builtin returns a signed quantity (unlike
the machine instruction, which zero-extends), so the conversion is inserted
by the language frontend.
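
A small sketch of the language rule at work, in plain C++ without any intrinsics. The function names are hypothetical; the signed 16-bit parameter models a builtin that returns a signed quantity, and the unsigned one models what the instruction itself produces.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen for this sketch only.

// Signed 16-bit source widened to an unsigned 64-bit result:
// the language requires the value to be sign-extended first.
constexpr std::uint64_t widen_from_signed(std::int16_t v) {
    return static_cast<std::uint64_t>(v);
}

// Unsigned 16-bit source widened to an unsigned 64-bit result:
// the upper bits are simply zero.
constexpr std::uint64_t widen_from_unsigned(std::uint16_t v) {
    return static_cast<std::uint64_t>(v);
}

// The same 16-bit pattern (0xFFFF) gives different 64-bit results.
static_assert(widen_from_signed(-1) == 0xFFFFFFFFFFFFFFFFull, "sign-extended");
static_assert(widen_from_unsigned(0xFFFF) == 0x000000000000FFFFull, "zero-extended");
```

Converting the extracted value to unsigned short before widening therefore asks for zero extension at the source level. Whether a particular GCC version then drops the redundant extension instruction is a separate question that this sketch does not answer.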
Comment 3 H.J. Lu 2010-08-16 14:52:57 UTC

*** This bug has been marked as a duplicate of 41323 ***