[Bug target/36473] New: Generate bit test (bt) instructions
ubizjak at gmail dot com
gcc-bugzilla@gcc.gnu.org
Mon Jun 9 09:54:00 GMT 2008
According to the Intel Technology Journal [1], page 270, code using the bt
instruction runs about 20% faster on Core 2 Duo than equivalent generic code.
---Quote from p.270---
The bit test instruction bt was introduced in the i386
processor. In some implementations, including the Intel
NetBurst® micro-architecture, the instruction has a high
latency. The Intel Core micro-architecture executes bt in
a single cycle, when the bit base operand is a register.
Therefore, the Intel C++/Fortran compiler uses the bt
instruction to implement a common bit test idiom when
optimizing for the Intel Core micro-architecture. The
optimized code runs about 20% faster than the generic
version on an Intel Core 2 Duo processor. Both of these
versions are shown below:
C source code
int x, n;
...
if (x & (1 << n)) ...
Generic code generation
; edx contains x, ecx contains n.
mov eax, 1
shl eax, cl
test edx, eax
je taken
Intel Core micro-architecture code generation
; edx contains x, eax contains n.
bt edx, eax
jae taken
---/Quote---
I have a patch in testing that implements the suggested optimization for
TARGET_USE_BT (including core2) targets.
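
A minimal test-case sketch for the idiom quoted above. It assumes a 32-bit
unsigned int and 0 <= n < 32; the function names are illustrative only and
are not taken from the patch. Unsigned operands are used so that testing
bit 31 does not shift into the sign bit.

```c
#include <assert.h>

/* The idiom from the quoted C source: nonzero if bit n of x is set.
   Assumption: 0 <= n < 32 and unsigned int is 32 bits wide. */
static int bit_is_set_mask(unsigned int x, unsigned int n)
{
    return (x & (1u << n)) != 0;
}

/* The same test written as a right shift, for comparison. */
static int bit_is_set_shift(unsigned int x, unsigned int n)
{
    return (int)((x >> n) & 1u);
}

/* Hypothetical helper: returns 1 if both forms agree for every
   bit position of x, 0 otherwise. */
static int forms_agree(unsigned int x)
{
    unsigned int n;

    for (n = 0; n < 32; n++)
        if (bit_is_set_mask(x, n) != bit_is_set_shift(x, n))
            return 0;
    return 1;
}

/* Small self-check that can be called from a test driver. */
static void check_bit_tests(void)
{
    assert(bit_is_set_mask(5u, 0));
    assert(!bit_is_set_mask(5u, 1));
    assert(forms_agree(0xdeadbeefu));
}
```

When compiled with optimization for a TARGET_USE_BT target, the first
function is where one would hope to see bt in place of the mov/shl/test
sequence.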
[1] Inside the Intel® 10.1 Compilers: New Threadizer and New
Vectorizer for Intel® Core 2 Processors, Intel Technology Journal, Vol. 11,
Issue 4, November 15, 2007,
http://download.intel.com/technology/itj/2007/v11i4/1-inside/1-Inside_the_Intel_Compilers.pdf
--
Summary: Generate bit test (bt) instructions
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC target triplet: x86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473