Bug 52624 - missing __builtin_bswap16
Summary: missing __builtin_bswap16
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: 4.8.0
Assignee: Eric Botcazou
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-19 17:24 UTC by H.J. Lu
Modified: 2012-04-11 11:19 UTC (History)
2 users (show)

See Also:
Host:
Target: x86
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-03-21 00:00:00


Attachments
Patch that implements __builtin_bswap16 (1.04 KB, patch)
2012-03-21 23:25 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2012-03-19 17:24:01 UTC
__builtin_bswap16 is supported on Powerpc, but is missing on x86.
We can use __builtin_bswap32 (x << 16).  But it it is less
efficient:

[hjl@gnu-6 tmp]$ cat b.c
#include <byteswap.h>

short
b1 (short x)
{
  return  __bswap_16 (x);
}

short
b2 (short x)
{
  return  __builtin_bswap32 (x << 16);
}
[hjl@gnu-6 tmp]$ gcc -S -O b.c
[hjl@gnu-6 tmp]$ cat b.s
	.file	"b.c"
	.text
	.globl	b1
	.type	b1, @function
b1:
.LFB0:
	.cfi_startproc
	movl	%edi, %eax
#APP
# 6 "b.c" 1
	rorw $8, %ax
# 0 "" 2
#NO_APP
	ret
	.cfi_endproc
.LFE0:
	.size	b1, .-b1
	.globl	b2
	.type	b2, @function
b2:
.LFB1:
	.cfi_startproc
	movl	%edi, %eax
	sall	$16, %eax
	bswap	%eax
	ret
	.cfi_endproc
.LFE1:
	.size	b2, .-b2
	.ident	"GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 tmp]$
Comment 1 Andrew Pinski 2012-03-19 18:26:58 UTC
static inline unsigned short __builtin_bswap16(unsigned short a)
{
  return (a<<8)|(a>>8);
}
Comment 2 Eric Botcazou 2012-03-21 16:09:40 UTC
This is IMO a valid request given that PowerPC has it (and x86 has rolw/xchg).
Comment 3 Uroš Bizjak 2012-03-21 23:25:37 UTC
Created attachment 26946 [details]
Patch that implements __builtin_bswap16

Untested patch.
Comment 4 Eric Botcazou 2012-03-22 07:41:25 UTC
I think that it should be available on all architectures, like the 32-bit and 64-bit flavors.  And, for x86, you don't really need to add new patterns.
Comment 5 Uroš Bizjak 2012-03-22 07:51:20 UTC
(In reply to comment #4)
> I think that it should be available on all architectures, like the 32-bit and
> 64-bit flavors.  And, for x86, you don't really need to add new patterns.

I agree.

Regarding new patterns - we need at least named expander, and the existing ones are strict_low_part types. They model the fact that higpart of the register is preserved, so ideal to implement bswap32.

Can you please take the middle-end part of the generic implementation?
Comment 6 Eric Botcazou 2012-03-22 08:05:52 UTC
> Regarding new patterns - we need at least named expander, and the existing ones
> are strict_low_part types. They model the fact that higpart of the register is
> preserved, so ideal to implement bswap32.

I think that the builtin should be expanded into a rotate (left or right) if they are available.  On x86 this works out of the box since the rotates are there.

> Can you please take the middle-end part of the generic implementation?

Yes, will do.  In fact, I already have a sketch of an implementation because of an internal project I'm working on.
Comment 7 Uroš Bizjak 2012-03-22 09:56:18 UTC
(In reply to comment #6)

> I think that the builtin should be expanded into a rotate (left or right) if
> they are available.  On x86 this works out of the box since the rotates are
> there.

Please note that the rotate is split back into bswap, so we can avoid rotate by using "xchg %rh, %rl" on P4.
Comment 8 Eric Botcazou 2012-03-22 10:04:44 UTC
> Please note that the rotate is split back into bswap, so we can avoid rotate by
> using "xchg %rh, %rl" on P4.

Sure, 16-bit rotates are already emitted as xchg when appropriate.
Comment 9 H.J. Lu 2012-03-22 16:24:41 UTC
Do we need to optimize for partial register stall?
Comment 10 Uroš Bizjak 2012-03-22 16:59:43 UTC
(In reply to comment #9)
> Do we need to optimize for partial register stall?

xchg is enabled only for Pentium4, and this is not partial reg stall target.

BTW: According to the docs, rol/ror on P4 has latency of 4 cycles + false flags dependency, where xchg has latency of 1.5 cycles.
Comment 11 Eric Botcazou 2012-04-11 11:13:43 UTC
Author: ebotcazou
Date: Wed Apr 11 11:13:39 2012
New Revision: 186308

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186308
Log:
	PR target/52624
	* doc/extend.texi (Other Builtins): Document __builtin_bswap16.
	(PowerPC AltiVec/VSX Built-in Functions): Remove it.
	* doc/md.texi (Standard Names): Add bswap.
	* builtin-types.def (BT_UINT16): New primitive type.
	(BT_FN_UINT16_UINT16): New function type.
	* builtins.def (BUILT_IN_BSWAP16): New.
	* builtins.c (expand_builtin_bswap): Add TARGET_MODE argument.
	(expand_builtin) <BUILT_IN_BSWAP16>: New case.  Pass TARGET_MODE to
	expand_builtin_bswap.
	(fold_builtin_bswap): Add BUILT_IN_BSWAP16 case.
	(fold_builtin_1): Likewise.
	(is_inexpensive_builtin): Likewise.
	* optabs.c (expand_unop): Deal with bswap in HImode specially.  Add
	missing bits for bswap to libcall code.
	* tree.c (build_common_tree_nodes): Build uint16_type_node.
	* tree.h (enum tree_index): Add TI_UINT16_TYPE.
	(uint16_type_node): New define.
	* config/rs6000/rs6000-builtin.def (RS6000_BUILTIN_BSWAP_HI): Delete.
	* config/rs6000/rs6000.c (rs6000_expand_builtin): Remove handling of
	above builtin.
	(rs6000_init_builtins): Likewise.
	* config/rs6000/rs6000.md (bswaphi2): Add TARGET_POWERPC predicate.
c-family/
	* c-common.h (uint16_type_node): Rename into...
	(c_uint16_type_node): ...this.
	* c-common.c (c_common_nodes_and_builtins): Adjust for above renaming.
	* c-cppbuiltin.c (builtin_define_stdint_macros): Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/builtin-bswap-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtin-types.def
    trunk/gcc/builtins.c
    trunk/gcc/builtins.def
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-common.c
    trunk/gcc/c-family/c-common.h
    trunk/gcc/c-family/c-cppbuiltin.c
    trunk/gcc/config/rs6000/rs6000-builtin.def
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/rs6000.md
    trunk/gcc/doc/extend.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/optabs.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/builtin-bswap-1.c
    trunk/gcc/testsuite/gcc.dg/builtin-bswap-4.c
    trunk/gcc/testsuite/gcc.dg/builtin-bswap-5.c
    trunk/gcc/tree.c
    trunk/gcc/tree.h
Comment 12 Eric Botcazou 2012-04-11 11:19:03 UTC
On mainline.