This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [RFC PATCH, i386]: Generate bit test (bt) instructions


This optimization might be advantageous on AMD platforms as well, though
we haven't tested this patch on an AMD platform yet (I'll test it on a
Barcelona machine within a couple of days). According to the AMD
optimization guide for AMDFAM10 (aka Barcelona) and k8, the
register/register form of BT is a single-cycle DirectPath instruction (the
memory form is slower because it is a VectorPath instruction). It could
also be enabled for generic tuning, given that it is fast on currently
shipping AMD and Intel systems.

It should also be emitted when optimizing with -Os, since it eliminates
the move instruction that loads the constant 1.

-Dwarak
 


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
> On Behalf Of Uros Bizjak
> Sent: Monday, June 09, 2008 1:18 PM
> To: GCC Patches
> Cc: H.J. Lu; Meissner, Michael
> Subject: [RFC PATCH, i386]: Generate bit test (bt) instructions
> 
> Hello!
> 
> According to Intel Technology Journal [1], page 270, the bt instruction
> runs 20% faster on Core2 Duo than equivalent generic code.
> 
> ---Quote from p.270---
> The bit test instruction bt was introduced in the i386(tm)
> processor. In some implementations, including the Intel
> NetBurst(r) micro-architecture, the instruction has a high
> latency. The Intel Core micro-architecture executes bt in
> a single cycle, when the bit base operand is a register.
> Therefore, the Intel C++/Fortran compiler uses the bt
> instruction to implement a common bit test idiom when
> optimizing for the Intel Core micro-architecture. The
> optimized code runs about 20% faster than the generic
> version on an Intel Core 2 Duo processor. Both of these
> versions are shown below:
> 
> C source code
> int x, n;
> ...
> if (x & (1 << n)) ...
> 
> Generic code generation
> ; edx contains x, ecx contains n.
> mov eax, 1
> shl eax, cl
> test edx, eax
> je taken
> 
> Intel Core micro-architecture code generation
> ; edx contains x, eax contains n.
> bt edx, eax
> jae taken
> ---/Quote---
> 
> GCC compiles the following code:
> 
> --cut here--
> void foo (void);
> 
> int test (int x, int n)
> {
>   if (x & (1 << n))
>     foo ();
> 
>   return 0;
> }
> --cut here--
> 
> using -O2 to:
> 
> test:
>         subl    $12, %esp
>         movl    16(%esp), %eax
>         movl    20(%esp), %ecx
>         sarl    %cl, %eax
>         testb   $1, %al
>         je      .L2
>         call    foo
> .L2:
>         xorl    %eax, %eax
>         addl    $12, %esp
>         ret
> 
> With the attached patch, -O2 -mtune=core2 produces:
> 
> test:
>         subl    $12, %esp
>         movl    20(%esp), %edx
>         movl    16(%esp), %eax
>         btl     %edx, %eax
>         jnc     .L2
>         call    foo
> .L2:
>         xorl    %eax, %eax
>         addl    $12, %esp
>         ret
> 
> The patch without TARGET_USE_BT insn predicates was used to bootstrap
> gcc on i686-pc-linux-gnu and x86_64-pc-linux-gnu, where it converts
> more than 1800 shift-and-test sequences into equivalent bt instructions.
> 
> The attached patch adds TARGET_USE_BT insn predicates and adds core2 to
> the TARGET_USE_BT group.
> 
> 
> 2008-06-09 Uros Bizjak <ubizjak@gmail.com>
> 
> PR target/36473
> * config/i386/i386.c (ix86_tune_features) [TUNE_USE_BT]: Add m_CORE2.
> * config/i386/predicates.md (bt_comparison_operator): New predicate.
> * config/i386/i386.md (*btdi_rex64): New instruction pattern.
> (*btsi): Ditto.
> (*jcc_btdi_rex64): New instruction and split pattern.
> (*jcc_btsi): Ditto.
> (*jcc_btsi_1): Ditto.
> (*btsq): Fix Intel asm dialect operand order.
> (*btrq): Ditto.
> (*btcq): Ditto.
> 
> 
> The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> as well as i686-pc-linux-gnu, with and without TARGET_USE_BT insn
> predicates.
> 
> [1] Inside the Intel(r) 10.1 Compilers: New Threadizer and New Vectorizer
> for Intel(r) Core(tm)2 Processors, Intel Technology Journal, Vol. 11,
> Issue 4, November 15, 2007,
> http://download.intel.com/technology/itj/2007/v11i4/1-inside/1-Inside_the_Intel_Compilers.pdf
> 
> Uros.
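
For readers following the thread, the bit test idiom discussed in the quoted
message can be sketched in plain C. This is only an illustration under stated
assumptions, not code from the patch: the function names are hypothetical,
unsigned operands are used (rather than the int operands of the original
test case) so that testing bit 31 is well defined, and unsigned is assumed to
be 32 bits wide. The third function is a rough model of how a
register-operand bt treats its bit offset (reduced modulo the operand
size); it is not what GCC emits.

```c
#include <assert.h>

/* Mask form, as in the quoted C source: x & (1 << n).
   Unsigned operands are an assumption made here for illustration. */
int bit_set_mask(unsigned x, unsigned n)
{
    return (x & (1u << n)) != 0;
}

/* Shift form, mirroring the sarl/testb sequence in the quoted -O2 output:
   move bit n down to bit 0, then test bit 0. */
int bit_set_shift(unsigned x, unsigned n)
{
    return (int)((x >> n) & 1u);
}

/* Hypothetical model of a 32-bit register-operand bt: the bit offset is
   reduced modulo 32 before the bit is selected. */
int bit_set_bt_model(unsigned x, unsigned n)
{
    return (int)((x >> (n & 31u)) & 1u);
}

/* Check that all three forms agree for every in-range bit position. */
void self_check(void)
{
    static const unsigned samples[] = {
        0u, 1u, 5u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    unsigned i, n;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (n = 0; n < 32; n++) {
            int expected = bit_set_mask(samples[i], n);
            assert(bit_set_shift(samples[i], n) == expected);
            assert(bit_set_bt_model(samples[i], n) == expected);
        }
    }
}
```

For n in the range 0..31 the three forms give the same answer, which is why
a shift-and-test sequence can be replaced by a single bt followed by a
branch on the carry flag. For n outside that range the C shift is undefined,
so the forms are not comparable there.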

