This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH i386] Allow cltd/cqto etc on modern CPUs


Compiling the following code with -O2:

typedef unsigned long ulong;
typedef __SIZE_TYPE__ size_t;
long woo_i(long a, long b) { return a/b; }

GCC generates:

.LFB0:
        .cfi_startproc
        movq    %rdi, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdx
        idivq   %rsi
        ret

but both ICC and LLVM generate a smaller and faster version:

        movq      %rdi, %rax
        cqto
        idivq     %rsi
        ret

For reference, see
http://www.agner.org/optimize/instruction_tables.pdf.  On Pentium, the
latency of the sign-extension instruction is 3 cycles, while on modern
CPUs it is a single uop with 1-cycle latency.
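
To make the equivalence of the two sequences concrete, here is a small
C sketch (not part of the patch). It compares the high half that the
movq/sarq $63 pair computes with the high half that cqto is defined to
produce. The function names are made up for illustration, and the code
assumes that >> on a negative signed value is an arithmetic shift, which
is what GCC does but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High half as computed by "movq %rdi,%rdx; sarq $63,%rdx".
   Assumption: >> on a negative int64_t is an arithmetic shift. */
static int64_t high_via_sar(int64_t a)
{
    return a >> 63;
}

/* High half as cqto defines it: all ones if rax is negative, else zero. */
static int64_t high_via_cqto(int64_t a)
{
    return a < 0 ? -1 : 0;
}

/* Check that both forms agree on a few representative dividends. */
static void check_sign_extension(void)
{
    const int64_t samples[] = { 0, 1, -1, 42, -42, INT64_MAX, INT64_MIN };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(high_via_sar(samples[i]) == high_via_cqto(samples[i]));
}
```

Since both forms set up the same %rdx:%rax pair for idivq, the choice
between them is purely a question of code size and speed.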

The following proposed patch fixes the problem. Note that for Atom,
only the CWD instruction is slow, with 5-cycle latency; the rest of the
sign-extension instructions are fast. The fix for Atom needs
finer-grained control and can be done separately.
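
For readers not familiar with the tuning tables, the following is a
rough C sketch of how such a per-CPU bitmask behaves. It is not the
actual i386.c code: the enum, the M() macro, and tune_enabled() are
hypothetical stand-ins, and the processor list is cut down to the five
CPUs named in the patch. The idea is that each tuning entry is a mask
with one bit per processor, and the entry is active when the bit of the
CPU being tuned for is set.

```c
#include <assert.h>

/* Hypothetical processor indices; the real list in GCC is longer
   and the values differ. */
enum cpu { CPU_PENT, CPU_ATOM, CPU_K6, CPU_CORE2I7, CPU_GENERIC };

/* One bit per processor (stand-in for the m_* masks). */
#define M(c) (1u << (c))

/* X86_TUNE_USE_CLTD before the patch, limited to the CPUs above. */
static const unsigned use_cltd_old =
    ~(M(CPU_PENT) | M(CPU_CORE2I7) | M(CPU_ATOM) | M(CPU_K6) | M(CPU_GENERIC));

/* X86_TUNE_USE_CLTD after the patch. */
static const unsigned use_cltd_new =
    ~(M(CPU_PENT) | M(CPU_ATOM) | M(CPU_K6));

/* A tuning is enabled when the selected CPU's bit is set in the mask. */
static int tune_enabled(unsigned mask, enum cpu c)
{
    return (mask & M(c)) != 0;
}
```

In this sketch, tune_enabled(use_cltd_old, CPU_CORE2I7) is 0 and
tune_enabled(use_cltd_new, CPU_CORE2I7) is 1, while Pentium, Atom, and
K6 stay disabled under both masks.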

Ok to install after testing?

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 193861)
+++ config/i386/i386.c (working copy)
@@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
   m_K6,

   /* X86_TUNE_USE_CLTD */
-  ~(m_PENT | m_CORE2I7 | m_ATOM | m_K6 | m_GENERIC),
+  ~(m_PENT | m_ATOM | m_K6),

   /* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
   m_PENT4,

2010-11-30  Xinliang David Li  <davidxl@google.com>

        * config/i386/i386.c: Allow sign-extension instructions (cltd etc.)
        on modern CPUs.


thanks,

David

