Bug 36473 - Generate bit test (bt) instructions
Summary: Generate bit test (bt) instructions
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.0
: P3 enhancement
Target Milestone: ---
Assignee: Uroš Bizjak
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Depends on:
Reported: 2008-06-09 09:53 UTC by Uroš Bizjak
Modified: 2008-07-13 08:44 UTC (History)
2 users (show)

See Also:
Target: x86
Known to work:
Known to fail:
Last reconfirmed: 2008-06-09 11:41:11

Proposed patch (1.05 KB, patch)
2008-06-09 11:39 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2008-06-09 09:53:49 UTC
According to Intel Technology Journal [1], page 270, bt instruction runs 20% faster on Core2 Duo than equivalent generic code.

---Qoute from p.270---
The bit test instruction bt was introduced in the i386™
processor. In some implementations, including the Intel
NetBurst® micro-architecture, the instruction has a high
latency. The Intel Core micro-architecture executes bt in
a single cycle, when the bit base operand is a register.
Therefore, the Intel C++/Fortran compiler uses the bt
instruction to implement a common bit test idiom when
optimizing for the Intel Core micro-architecture. The
optimized code runs about 20% faster than the generic
version on an Intel Core 2 Duo processor. Both of these
versions are shown below:

C source code
  int x, n;
  if (x & (1 << n)) ...

Generic code generation
  ; edx contains x, ecx contains n.
  mov eax, 1
  shl eax, cl
  test edx, eax
  je taken

Intel Core micro-architecture code generation
  ; edx contains x, eax contains n.
  bt edx, eax
  jae taken

I have a patch in testing that implements suggested optimization for TARGET_USE_BT (including core2) targets.

[1] Inside the Intel® 10.1 Compilers: New Threadizer and New
Vectorizer for Intel® Core™2 Processors, Intel Technology Journal, Vol. 11, Issue 4, November 15, 2007, http://download.intel.com/technology/itj/2007/v11i4/1-inside/1-Inside_the_Intel_Compilers.pdf
Comment 1 Uroš Bizjak 2008-06-09 11:39:16 UTC
Created attachment 15741 [details]
Proposed patch

Following code:

--cut here--
void foo (void);

int test (int x, int n)
  if (x & (1 << n))
    foo ();

  return 0;
--cut here--

compiles using -O2 to:

        subl    $12, %esp
        movl    16(%esp), %eax
        movl    20(%esp), %ecx
        sarl    %cl, %eax
        testb   $1, %al
        je      .L2
        call    foo
        xorl    %eax, %eax
        addl    $12, %esp

With attached patch, following code is produced:

        subl    $12, %esp
        movl    20(%esp), %edx
        movl    16(%esp), %eax
        btl     %edx, %eax
        jnc     .L2
        call    foo
        xorl    %eax, %eax
        addl    $12, %esp

Attached patch doesn't have TARGET_USE_BT insn predicates, and it generates bt instructions by default. It was used to bootstrap gcc on i686-pc-linux-gnu, where it converts >1800 shift-and-test sequences into eqivalent bt instruction.

The patch as is was bootstrapped and regression tested on x86_64-pc-linux-gnu as well as i686-pc-linux-gnu.
Comment 2 Uroš Bizjak 2008-06-09 11:41:11 UTC
Mine, the patch needs testcases.

2008-06-09  Uros Bizjak  <ubizjak@gmail.com>

        * config/i386/predicates.md (bt_comparison_operator): New predicate.
        * config/i386/i386.md (*btdi_rex64): New instruction pattern.
        (*btsi): Ditto.
        (*jcc_btdi_rex64): New instruction and split pattern.
        (*jcc_btsi): Ditto.
        (*jcc_btsi_1): Ditto.
Comment 3 uros 2008-06-10 10:30:13 UTC
Subject: Bug 36473

Author: uros
Date: Tue Jun 10 10:29:36 2008
New Revision: 136615

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136615
	PR target/36473
	* config/i386/i386.c (ix86_tune_features) [TUNE_USE_BT]:
	Add m_CORE2 and m_GENERIC.
	* config/i386/predicates.md (bt_comparison_operator): New predicate.
	* config/i386/i386.md (*btdi_rex64): New instruction pattern.
	(*btsi): Ditto.
	(*jcc_btdi_rex64): New instruction and split pattern.
	(*jcc_btsi): Ditto.
	(*jcc_btsi_1): Ditto.
	(*btsq): Fix Intel asm dialect operand order.
	(*btrq): Ditto.
	(*btcq): Ditto.


	PR target/36473
	* testsuite/gcc.target/i386/bt-1.c: New test.
	* testsuite/gcc.target/i386/bt-2.c: Ditto.


Comment 4 Uroš Bizjak 2008-07-13 08:44:05 UTC
This is now implemented in mainline.