__builtin_bswap16 is supported on PowerPC, but is missing on x86.
We can use __builtin_bswap32 (x << 16) instead, but it is less efficient:
[hjl@gnu-6 tmp]$ cat b.c
b1 (short x)
{
  return __bswap_16 (x);
}
b2 (short x)
{
  return __builtin_bswap32 (x << 16);
}
[hjl@gnu-6 tmp]$ gcc -S -O b.c
[hjl@gnu-6 tmp]$ cat b.s
.type b1, @function
movl %edi, %eax
# 6 "b.c" 1
rorw $8, %ax
# 0 "" 2
.size b1, .-b1
.type b2, @function
movl %edi, %eax
sall $16, %eax
.size b2, .-b2
.ident "GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)"
static inline unsigned short __builtin_bswap16(unsigned short a)
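The signature above points at a hand-written fallback. The following is only a minimal sketch of what such a fallback could look like, not the code from the report: the names `fallback_bswap16`, `bswap16_via_bswap32` and `check_bswap16_fallbacks` are hypothetical, and the second helper assumes a compiler that provides `__builtin_bswap32` (GCC or Clang). It also casts to an unsigned 32-bit type before shifting, so the shift stays well defined for every input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fallback, not part of GCC: swap the two bytes of a
   16-bit value with plain shifts. */
static inline uint16_t fallback_bswap16(uint16_t a)
{
    return (uint16_t)((a << 8) | (a >> 8));
}

/* The workaround from the report: place the value in the upper half of
   a 32-bit word and let __builtin_bswap32 reverse all four bytes, which
   leaves the swapped 16-bit value in the lower half. */
static inline uint16_t bswap16_via_bswap32(uint16_t a)
{
    return (uint16_t)__builtin_bswap32((uint32_t)a << 16);
}

/* Sanity check: both forms agree on every 16-bit input. */
void check_bswap16_fallbacks(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(fallback_bswap16((uint16_t)v) ==
               bswap16_via_bswap32((uint16_t)v));
}
```

Both helpers compute the same result; the difference discussed in this report is only in the code the compiler generates for them.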
This is IMO a valid request given that PowerPC has it (and x86 has rolw/xchg).
Created attachment 26946
Patch that implements __builtin_bswap16
I think that it should be available on all architectures, like the 32-bit and 64-bit flavors. And, for x86, you don't really need to add new patterns.
(In reply to comment #4)
> I think that it should be available on all architectures, like the 32-bit and
> 64-bit flavors. And, for x86, you don't really need to add new patterns.
Regarding new patterns - we need at least a named expander, and the existing ones are strict_low_part types. They model the fact that the high part of the register is preserved, so ideal to implement bswap32.
Can you please take the middle-end part of the generic implementation?
> Regarding new patterns - we need at least a named expander, and the existing ones
> are strict_low_part types. They model the fact that the high part of the register is
> preserved, so ideal to implement bswap32.
I think that the builtin should be expanded into a rotate (left or right) if they are available. On x86 this works out of the box since the rotates are there.
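To illustrate why a rotate is enough: for a 16-bit value, rotating by 8 bits in either direction exchanges the two bytes. The sketch below is plain C written for this illustration, not GCC's expander code; `rotl16`, `rotr16` and `check_rotate_is_bswap16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value left by n bits (n is reduced modulo 16). */
static inline uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x << n) | (x >> ((16 - n) & 15)));
}

/* Rotate a 16-bit value right by n bits (n is reduced modulo 16). */
static inline uint16_t rotr16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x >> n) | (x << ((16 - n) & 15)));
}

/* For every 16-bit input, a rotate by 8 in either direction equals the
   byte swap computed with explicit shifts. */
void check_rotate_is_bswap16(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t x = (uint16_t)v;
        uint16_t swapped = (uint16_t)((x << 8) | (x >> 8));
        assert(rotl16(x, 8) == swapped);
        assert(rotr16(x, 8) == swapped);
    }
}
```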
> Can you please take the middle-end part of the generic implementation?
Yes, will do. In fact, I already have a sketch of an implementation because of an internal project I'm working on.
(In reply to comment #6)
> I think that the builtin should be expanded into a rotate (left or right) if
> they are available. On x86 this works out of the box since the rotates are
> there.
Please note that the rotate is split back into bswap, so we can avoid rotate by using "xchg %rh, %rl" on P4.
> Please note that the rotate is split back into bswap, so we can avoid rotate by
> using "xchg %rh, %rl" on P4.
Sure, 16-bit rotates are already emitted as xchg when appropriate.
Do we need to optimize for partial register stall?
(In reply to comment #9)
> Do we need to optimize for partial register stall?
xchg is enabled only for Pentium 4, which is not a partial register stall target.
BTW: According to the docs, rol/ror on P4 have a latency of 4 cycles plus a false flags dependency, whereas xchg has a latency of 1.5 cycles.
Date: Wed Apr 11 11:13:39 2012
New Revision: 186308
* doc/extend.texi (Other Builtins): Document __builtin_bswap16.
(PowerPC AltiVec/VSX Built-in Functions): Remove it.
* doc/md.texi (Standard Names): Add bswap.
* builtin-types.def (BT_UINT16): New primitive type.
(BT_FN_UINT16_UINT16): New function type.
* builtins.def (BUILT_IN_BSWAP16): New.
* builtins.c (expand_builtin_bswap): Add TARGET_MODE argument.
(expand_builtin) <BUILT_IN_BSWAP16>: New case. Pass TARGET_MODE to
(fold_builtin_bswap): Add BUILT_IN_BSWAP16 case.
* optabs.c (expand_unop): Deal with bswap in HImode specially. Add
missing bits for bswap to libcall code.
* tree.c (build_common_tree_nodes): Build uint16_type_node.
* tree.h (enum tree_index): Add TI_UINT16_TYPE.
(uint16_type_node): New define.
* config/rs6000/rs6000-builtin.def (RS6000_BUILTIN_BSWAP_HI): Delete.
* config/rs6000/rs6000.c (rs6000_expand_builtin): Remove handling of
* config/rs6000/rs6000.md (bswaphi2): Add TARGET_POWERPC predicate.
* c-common.h (uint16_type_node): Rename into...
* c-common.c (c_common_nodes_and_builtins): Adjust for above renaming.
* c-cppbuiltin.c (builtin_define_stdint_macros): Likewise.
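
With this change the builtin is no longer PowerPC-specific. The following is a hedged sketch of how user code might pick it up while still building with compilers that lack it. It assumes the compiler offers the `__has_builtin` preprocessor check (recent GCC and Clang do; older releases simply take the fallback path). The macro `HAVE_BUILTIN_BSWAP16` and the functions `swap16` and `check_swap16` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Detect the builtin where the compiler lets us ask for it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bswap16)
#    define HAVE_BUILTIN_BSWAP16 1
#  endif
#endif

/* Hypothetical wrapper: use the builtin when detected, otherwise fall
   back to explicit shifts. */
static inline uint16_t swap16(uint16_t x)
{
#ifdef HAVE_BUILTIN_BSWAP16
    return __builtin_bswap16(x);
#else
    return (uint16_t)((x << 8) | (x >> 8));
#endif
}

/* Swapping twice must give back the original value. */
void check_swap16(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(swap16(swap16((uint16_t)v)) == (uint16_t)v);
}
```

For example, `swap16(0xaabb)` yields `0xbbaa` on either path.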