__builtin_bswap16 is supported on Powerpc, but is missing on x86. We can use __builtin_bswap32 (x << 16). But it it is less efficient: [hjl@gnu-6 tmp]$ cat b.c #include <byteswap.h> short b1 (short x) { return __bswap_16 (x); } short b2 (short x) { return __builtin_bswap32 (x << 16); } [hjl@gnu-6 tmp]$ gcc -S -O b.c [hjl@gnu-6 tmp]$ cat b.s .file "b.c" .text .globl b1 .type b1, @function b1: .LFB0: .cfi_startproc movl %edi, %eax #APP # 6 "b.c" 1 rorw $8, %ax # 0 "" 2 #NO_APP ret .cfi_endproc .LFE0: .size b1, .-b1 .globl b2 .type b2, @function b2: .LFB1: .cfi_startproc movl %edi, %eax sall $16, %eax bswap %eax ret .cfi_endproc .LFE1: .size b2, .-b2 .ident "GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)" .section .note.GNU-stack,"",@progbits [hjl@gnu-6 tmp]$
static inline unsigned short __builtin_bswap16(unsigned short a) { return (a<<8)|(a>>8); }
This is IMO a valid request given that PowerPC has it (and x86 has rolw/xchg).
Created attachment 26946 [details] Patch that implements __builtin_bswap16 Untested patch.
I think that it should be available on all architectures, like the 32-bit and 64-bit flavors. And, for x86, you don't really need to add new patterns.
(In reply to comment #4) > I think that it should be available on all architectures, like the 32-bit and > 64-bit flavors. And, for x86, you don't really need to add new patterns. I agree. Regarding new patterns - we need at least named expander, and the existing ones are strict_low_part types. They model the fact that higpart of the register is preserved, so ideal to implement bswap32. Can you please take the middle-end part of the generic implementation?
> Regarding new patterns - we need at least named expander, and the existing ones > are strict_low_part types. They model the fact that higpart of the register is > preserved, so ideal to implement bswap32. I think that the builtin should be expanded into a rotate (left or right) if they are available. On x86 this works out of the box since the rotates are there. > Can you please take the middle-end part of the generic implementation? Yes, will do. In fact, I already have a sketch of an implementation because of an internal project I'm working on.
(In reply to comment #6) > I think that the builtin should be expanded into a rotate (left or right) if > they are available. On x86 this works out of the box since the rotates are > there. Please note that the rotate is split back into bswap, so we can avoid rotate by using "xchg %rh, %rl" on P4.
> Please note that the rotate is split back into bswap, so we can avoid rotate by > using "xchg %rh, %rl" on P4. Sure, 16-bit rotates are already emitted as xchg when appropriate.
Do we need to optimize for partial register stall?
(In reply to comment #9) > Do we need to optimize for partial register stall? xchg is enabled only for Pentium4, and this is not partial reg stall target. BTW: According to the docs, rol/ror on P4 has latency of 4 cycles + false flags dependency, where xchg has latency of 1.5 cycles.
Author: ebotcazou Date: Wed Apr 11 11:13:39 2012 New Revision: 186308 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186308 Log: PR target/52624 * doc/extend.texi (Other Builtins): Document __builtin_bswap16. (PowerPC AltiVec/VSX Built-in Functions): Remove it. * doc/md.texi (Standard Names): Add bswap. * builtin-types.def (BT_UINT16): New primitive type. (BT_FN_UINT16_UINT16): New function type. * builtins.def (BUILT_IN_BSWAP16): New. * builtins.c (expand_builtin_bswap): Add TARGET_MODE argument. (expand_builtin) <BUILT_IN_BSWAP16>: New case. Pass TARGET_MODE to expand_builtin_bswap. (fold_builtin_bswap): Add BUILT_IN_BSWAP16 case. (fold_builtin_1): Likewise. (is_inexpensive_builtin): Likewise. * optabs.c (expand_unop): Deal with bswap in HImode specially. Add missing bits for bswap to libcall code. * tree.c (build_common_tree_nodes): Build uint16_type_node. * tree.h (enum tree_index): Add TI_UINT16_TYPE. (uint16_type_node): New define. * config/rs6000/rs6000-builtin.def (RS6000_BUILTIN_BSWAP_HI): Delete. * config/rs6000/rs6000.c (rs6000_expand_builtin): Remove handling of above builtin. (rs6000_init_builtins): Likewise. * config/rs6000/rs6000.md (bswaphi2): Add TARGET_POWERPC predicate. c-family/ * c-common.h (uint16_type_node): Rename into... (c_uint16_type_node): ...this. * c-common.c (c_common_nodes_and_builtins): Adjust for above renaming. * c-cppbuiltin.c (builtin_define_stdint_macros): Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/builtin-bswap-4.c Modified: trunk/gcc/ChangeLog trunk/gcc/builtin-types.def trunk/gcc/builtins.c trunk/gcc/builtins.def trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.c trunk/gcc/c-family/c-common.h trunk/gcc/c-family/c-cppbuiltin.c trunk/gcc/config/rs6000/rs6000-builtin.def trunk/gcc/config/rs6000/rs6000.c trunk/gcc/config/rs6000/rs6000.md trunk/gcc/doc/extend.texi trunk/gcc/doc/md.texi trunk/gcc/optabs.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/builtin-bswap-1.c trunk/gcc/testsuite/gcc.dg/builtin-bswap-4.c trunk/gcc/testsuite/gcc.dg/builtin-bswap-5.c trunk/gcc/tree.c trunk/gcc/tree.h
On mainline.