This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH i386] Allow cltd/cqto etc on modern CPUs
- From: Xinliang David Li <davidxl at google dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 30 Nov 2012 21:50:59 -0800
- Subject: [PATCH i386] Allow cltd/cqto etc on modern CPUs
Compiling the following code with -O2:
typedef unsigned long ulong;
typedef __SIZE_TYPE__ size_t;
long woo_i(long a, long b) { return a/b; }
GCC generates:
.LFB0:
.cfi_startproc
movq %rdi, %rdx
movq %rdi, %rax
sarq $63, %rdx
idivq %rsi
ret
but both ICC and LLVM generate a smaller and faster version:
movq %rdi, %rax
cqto
idivq %rsi
ret
For reference, see
http://www.agner.org/optimize/instruction_tables.pdf. On Pentium, the
sign-extension instruction has a latency of 3 cycles, while on modern
CPUs it is a single uOp with 1-cycle latency.
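To make the equivalence of the two sequences concrete: before idivq,
%rdx has to hold the sign extension of %rax, that is, all zeros or all
ones. The sketch below models both ways of producing that value in C.
The function names are made up for illustration, and the shift version
assumes that >> on a negative signed value is an arithmetic shift,
which GCC documents but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Models "movq %rdi, %rdx; sarq $63, %rdx": copy the dividend and
   shift it right arithmetically so only copies of the sign bit remain.
   Assumption: >> on a negative int64_t is an arithmetic shift (true
   for GCC, implementation-defined in ISO C). */
int64_t high_half_by_shift(int64_t a)
{
    return a >> 63;
}

/* Models "cqto": sign-extend %rax into %rdx:%rax. The high half is
   -1 (all ones) for a negative value and 0 otherwise. */
int64_t high_half_by_sign_extend(int64_t a)
{
    return a < 0 ? -1 : 0;
}
```

Both functions return the same value for every input, which is why the
single cqto can stand in for the movq/sarq pair.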
The following proposed patch fixes the problem. Note that for Atom,
only the CWD instruction is slow, with 5-cycle latency; the remaining
sign-extension instructions are fast. The fix for Atom needs
finer-grained control and can be done separately.
Ok to install after testing?
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 193861)
+++ config/i386/i386.c (working copy)
@@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
m_K6,
/* X86_TUNE_USE_CLTD */
- ~(m_PENT | m_CORE2I7 | m_ATOM | m_K6 | m_GENERIC),
+ ~(m_PENT | m_ATOM | m_K6),
/* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx. */
m_PENT4,
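For readers unfamiliar with the tuning table, here is a simplified
model of what the mask change does. Each entry is a bitmask of the
processors for which the tuning is enabled, so removing a processor
from the complemented set turns the tuning on for it. The bit values,
the upper-case M_* names and tune_enabled() are hypothetical stand-ins;
the real m_* masks in i386.c are defined differently in detail (for
example, m_CORE2I7 covers several processors).

```c
/* Hypothetical one-bit-per-processor masks, for illustration only. */
#define M_PENT    (1u << 0)
#define M_K6      (1u << 1)
#define M_ATOM    (1u << 2)
#define M_CORE2I7 (1u << 3)
#define M_GENERIC (1u << 4)

/* X86_TUNE_USE_CLTD entry before and after the patch. */
static const unsigned use_cltd_old =
    ~(M_PENT | M_CORE2I7 | M_ATOM | M_K6 | M_GENERIC);
static const unsigned use_cltd_new =
    ~(M_PENT | M_ATOM | M_K6);

/* A tuning is enabled when the bit of the processor being tuned for
   is set in the feature's mask. */
int tune_enabled(unsigned feature_mask, unsigned cpu_bit)
{
    return (feature_mask & cpu_bit) != 0;
}
```

In this model, tune_enabled(use_cltd_new, M_CORE2I7) and
tune_enabled(use_cltd_new, M_GENERIC) are nonzero, while Pentium, Atom
and K6 stay disabled as before.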
2012-11-30 Xinliang David Li <davidxl@google.com>
* config/i386/i386.c: Allow sign extend instructions (cltd etc)
on modern CPUs.
thanks,
David