This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
x86 paired 32-bit load/stores
- From: Will Cohen <wcohen at redhat dot com>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 17 Jun 2002 12:13:14 -0400
- Subject: x86 paired 32-bit load/stores
- Organization: Red Hat, Inc.
Looking through some of the assembly language code generated for the
whetstone benchmark, I found that gcc generates paired 32-bit
loads/stores to implement 64-bit loads/stores for the Pentium Pro/II/III
processors. According to the Intel optimization manual for the Pentium
III, this is undesirable because the hardware cannot forward the stored
data to a later load. The result is that the processor stalls waiting
for the paired operations to complete. I noticed that the compiler
avoids generating these paired 32-bit operations for the Athlon and
Pentium 4. The same should be done for the Pentium Pro/II/III
processors. The following change avoids producing the paired 32-bit
loads/stores on the Pentium Pro/II/III:
[wcohen@litespeed wcohen]$ diff gcc31/gcc/gcc/config/i386/i386.c \
    gcc31a/gcc/gcc/config/i386/i386.c
388c388
< const int x86_integer_DFmode_moves = ~(m_ATHLON | m_PENT4);
---
> const int x86_integer_DFmode_moves = ~(m_ATHLON | m_PENT4 | m_PPRO);
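For readers unfamiliar with this kind of tuning mask, here is a minimal,
self-contained sketch of how a per-CPU bit mask like the one in the diff
can be tested. The enum values, macro names, and the tuning_enabled()
helper are illustrative assumptions, not copies of the definitions in
i386.c or i386.h; only the shape of the two mask expressions is taken
from the diff.

```c
/* Hypothetical, simplified model of a per-CPU tuning mask.
   The CPU numbering and names are assumptions made for this sketch. */
enum cpu { CPU_I386, CPU_PENTIUM, CPU_PPRO, CPU_ATHLON, CPU_PENT4 };

#define M_PPRO   (1 << CPU_PPRO)
#define M_ATHLON (1 << CPU_ATHLON)
#define M_PENT4  (1 << CPU_PENT4)

/* Before the change: integer DFmode moves allowed everywhere except
   the Athlon and Pentium 4. */
static const int integer_dfmode_moves_old = ~(M_ATHLON | M_PENT4);

/* After the change: the Pentium Pro family is excluded as well. */
static const int integer_dfmode_moves_new = ~(M_ATHLON | M_PENT4 | M_PPRO);

/* Returns 1 when the tuning bit for CPU c is set in mask, else 0. */
static int tuning_enabled(int mask, enum cpu c)
{
    return (mask & (1 << c)) != 0;
}
```

With these definitions, tuning_enabled(integer_dfmode_moves_old, CPU_PPRO)
is 1 and tuning_enabled(integer_dfmode_moves_new, CPU_PPRO) is 0, while
the result for the other CPUs is unchanged.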
This change produced about a 3% improvement in the whetstone benchmark.
It particularly improved the code in the P0 function, which just
shuffles 64-bit quantities around in memory. The P0 function went
from 8.9% of total runtime to 5%.
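To illustrate the kind of code affected, here is a small sketch of a
routine that only moves doubles between memory locations, in the spirit
of P0. The array name e1 and the function name shuffle3 are made up for
this example and are not taken from the benchmark source.

```c
/* Illustrative only: a routine that copies 64-bit values between
   memory locations without doing any arithmetic on them. */
static double e1[4];

static void shuffle3(int j, int k, int l)
{
    e1[j] = e1[k];   /* each assignment is a 64-bit load plus store */
    e1[k] = e1[l];
    e1[l] = e1[j];   /* reloads the value stored by the first line */
}
```

Since no arithmetic is performed, the compiler is free to do each copy
either as one 64-bit FP load/store or as a pair of 32-bit integer
moves, which is the choice the mask above controls.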
I expect this change will also help in situations where a C return
statement copies a 64-bit value from a local variable to another
location in memory and the value is used for a 64-bit FP computation
immediately after returning from the function.
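One possible reading of that situation, as a sketch: a function returns
a small struct holding a double, so the return statement copies the
value through memory, and the caller uses it in an FP computation right
away. The struct and function names here are hypothetical, and whether
a given compiler version actually emits paired 32-bit moves for this
code is an assumption I have not verified.

```c
/* Hypothetical example: the returned struct is copied through memory,
   and the caller immediately uses the double in an FP operation. */
struct sample { double value; };

static struct sample latest(const struct sample *src)
{
    struct sample local = *src;   /* 64-bit copy into a local */
    return local;                 /* 64-bit copy to the caller's storage */
}

static double doubled(const struct sample *src)
{
    struct sample s = latest(src);
    return s.value * 2.0;         /* 64-bit FP use right after the copy */
}
```

If the copy is done with two 32-bit stores and the multiply then reads
the value with a single 64-bit load, the load has to wait for both
stores to complete.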
-Will