This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



x86 paired 32-bit load/stores


Looking through some of the assembly language code generated for the whetstone benchmark, I found that gcc generates paired 32-bit loads/stores to implement 64-bit loads/stores for the Pentium Pro/II/III processors. According to the Intel optimization manual for the Pentium III this is undesirable because the hardware cannot forward the data. As a result, the processor stalls waiting for the paired operations to complete. I noticed that the compiler avoids generating these paired 32-bit operations for the Athlon and Pentium 4. The same should be done for the Pentium Pro/II/III processors. The following change avoids producing the paired 32-bit loads/stores on the Pentium Pro/II/III:


[wcohen@litespeed wcohen]$ diff gcc31/gcc/gcc/config/i386/i386.c gcc31a/gcc/gcc/config/i386/i386.c
388c388
< const int x86_integer_DFmode_moves = ~(m_ATHLON | m_PENT4);
---
> const int x86_integer_DFmode_moves = ~(m_ATHLON | m_PENT4 | m_PPRO);
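
For readers unfamiliar with these tuning masks, here is a minimal sketch of how I read the change, assuming each mask holds one bit per processor and a set bit enables the behavior for that processor. The enum values, macro names, and the tuning_enabled helper below are hypothetical stand-ins for illustration, not the actual definitions in the gcc sources.

```c
/* Hypothetical processor indices; the real ones live in gcc's i386
   headers and their values may differ. */
enum processor { CPU_I386, CPU_PENTIUM, CPU_PPRO, CPU_ATHLON, CPU_PENT4 };

/* One bit per processor, mirroring the m_* masks in the diff above
   (names here are illustrative). */
#define M_PPRO   (1 << CPU_PPRO)
#define M_ATHLON (1 << CPU_ATHLON)
#define M_PENT4  (1 << CPU_PENT4)

/* Before the change: integer DFmode moves allowed everywhere except
   Athlon and Pentium 4. */
static const int integer_dfmode_moves_old = ~(M_ATHLON | M_PENT4);

/* After the change: the Pentium Pro family is excluded as well. */
static const int integer_dfmode_moves_new = ~(M_ATHLON | M_PENT4 | M_PPRO);

/* Returns nonzero when the mask has the bit for the given processor set. */
static int tuning_enabled(int mask, enum processor cpu)
{
    return (mask & (1 << cpu)) != 0;
}
```

Under these assumptions, tuning_enabled(integer_dfmode_moves_old, CPU_PPRO) is nonzero while tuning_enabled(integer_dfmode_moves_new, CPU_PPRO) is zero, which is the whole effect of the one-line change.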

This change produced about a 3% improvement in the whetstone benchmark. It particularly improved the code in the P0 function, which just shuffles 64-bit quantities around in memory. The P0 function went from 8.9% of total runtime to 5%.
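
To make the P0 case concrete, here is a small sketch of that kind of code, modeled on the benchmark's element shuffle. The function name and signature are my own for illustration, and the body is a paraphrase rather than a copy of the benchmark source. Every statement is a pure 64-bit memory-to-memory copy with no arithmetic, so whether each copy is done with one 64-bit move or a pair of 32-bit moves dominates the generated code.

```c
/* Moves three doubles around inside an array without doing any
   floating-point arithmetic on them.  Note that the last statement
   reads e[j] right after it was stored, so a load closely follows a
   store to the same 64-bit location. */
void shuffle3(double *e, int j, int k, int l)
{
    e[j] = e[k];
    e[k] = e[l];
    e[l] = e[j];
}
```

For example, starting from {1.0, 2.0, 3.0} and calling shuffle3(e, 0, 1, 2) leaves {2.0, 3.0, 2.0} in the array.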

I expect this change will also help in situations where a C return statement copies a 64-bit value from a local variable to another location in memory and the value is used for a 64-bit FP computation immediately after returning from the function.
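
A hedged sketch of the pattern I have in mind follows. All names are hypothetical, and an optimizing compiler may well inline or fold this particular example away, so treat it only as an illustration of the shape of the code: a 64-bit value is copied from a local to another memory location, and the caller performs an FP operation on it right after the call returns.

```c
/* Stands in for some value that already lives in memory. */
static double reading = 1.5;

/* Copies a 64-bit value from a local variable to caller-provided memory. */
void latest_sample(double *out)
{
    double local = reading;
    *out = local;           /* 64-bit store just before returning */
}

/* Uses the value in an FP computation immediately after the return. */
double use_sample(void)
{
    double slot;
    latest_sample(&slot);
    return slot * 2.0;      /* 64-bit FP load of the freshly stored value */
}
```

If the store in latest_sample is emitted as two 32-bit stores, the 64-bit FP load in use_sample is the kind of access that cannot be forwarded on the Pentium Pro/II/III.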

-Will


