#define SLOW_BYTE_ACCESS in i386.h
H. J. Lu
hjl@lucon.org
Fri Sep 1 20:47:00 GMT 2006
On Fri, Sep 01, 2006 at 12:24:18PM -0700, Hui-May Chang wrote:
>
> On Sep 1, 2006, at 11:26 AM, H. J. Lu wrote:
>
> >On Fri, Sep 01, 2006 at 11:03:31AM -0700, Eric Christopher wrote:
> >>Hui-May Chang wrote:
> >>>I have a question regarding "#define SLOW_BYTE_ACCESS" in i386.h. It is
> >>>used in the "get_best_mode" routine, which finds the best mode to use
> >>>when referencing a bit field. It is currently set to 0. If it is set to
> >>>1, it means "accessing less than a word of memory is no faster than
> >>>accessing a word of memory". I experimented with it and observed a great
> >>>performance improvement. It is set to 1 for some other configurations
> >>>(e.g., rs6000, pa, ia64). Is it always a win to set it? Is it better to
> >>>set it for certain i386 architectures?
> >>>
> >>
> >>I'll bet that it's probably advantageous to set it for a couple of
> >>reasons in the new chips at least:
> >>
> >>1) You avoid the problem that got you here of large bitfields needing
> >>shift/insert operations
> >>
> >>2) You avoid length changing since you're mostly operating on things in
> >>word mode.
> >>
> >>However, I'm not an expert on the chip so I'd suggest posting a small
> >>testcase that shows #1 for people and the resultant code differences so
> >>they can see the difference. Hopefully someone with more Intel
> >>experience (like HJ or Jan) can comment on whether or not this is a good
> >>general idea for the processor.
> >>
> >
> >I tried this patch and enabled it for Conroe and Nocona. It doesn't
> >have much impact on SPEC CPU 2000 on Conroe and it seems bad on Nocona.
> >Maybe we should add -mslow-byte-access to investigate it further.
> >
> >
> >H.J.
> >---
> >--- gcc/config/i386/i386.c.slow 2006-08-23 17:15:14.000000000 -0700
> >+++ gcc/config/i386/i386.c 2006-08-24 12:10:25.000000000 -0700
> >@@ -831,6 +831,8 @@ const int x86_cmpxchg16b = m_NOCONA;
> > const int x86_xadd = ~m_386;
> > const int x86_pad_returns = m_ATHLON_K8 | m_GENERIC;
> >
> >+const int x86_slow_byte_access = 0;
> >+
> > /* In case the average insn count for single function invocation is
> > lower than this constant, emit fast (but longer) prologue and
> > epilogue code. */
> >--- gcc/config/i386/i386.h.slow 2006-08-23 17:15:14.000000000 -0700
> >+++ gcc/config/i386/i386.h 2006-08-24 12:01:49.000000000 -0700
> >@@ -164,6 +164,7 @@ extern const int x86_use_bt;
> > extern const int x86_cmpxchg, x86_cmpxchg8b, x86_cmpxchg16b, x86_xadd;
> > extern const int x86_use_incdec;
> > extern const int x86_pad_returns;
> >+extern const int x86_slow_byte_access;
> > extern int x86_prefetch_sse;
> >
> > #define TARGET_USE_LEAVE (x86_use_leave & TUNEMASK)
> >@@ -219,6 +220,7 @@ extern int x86_prefetch_sse;
> > #define TARGET_USE_INCDEC (x86_use_incdec & TUNEMASK)
> > #define TARGET_PAD_RETURNS (x86_pad_returns & TUNEMASK)
> > #define TARGET_EXT_80387_CONSTANTS (x86_ext_80387_constants & TUNEMASK)
> >+#define TARGET_SLOW_BYTE_ACCESS (x86_slow_byte_access & TUNEMASK)
> >
> > #define ASSEMBLER_DIALECT (ix86_asm_dialect)
> >
> >@@ -1840,7 +1842,7 @@ do { \
> > subsequent accesses occur to other fields in the same word of the
> > structure, but to different bytes. */
> >
> >-#define SLOW_BYTE_ACCESS 0
> >+#define SLOW_BYTE_ACCESS TARGET_SLOW_BYTE_ACCESS
> >
> > /* Nonzero if access to memory by shorts is slow and undesirable. */
> > #define SLOW_SHORT_ACCESS 0
>
> We got the following request from a customer:
>
> When accessing a 32-bit bitfield on x86, gcc automatically allocates an
> 8-bit or 16-bit register to manipulate the portion of the bitfield
> being modified rather than using a whole 32-bit register. This leads to
> poor performance when multiple updates to that 32-bit bitfield are
> performed, because each modified portion is always written to memory
> before a read of another portion is performed. If a sequence
> contains N modifications, there will be N loads and N stores of 8- or
> 16-bit values rather than a single 32-bit load and 32-bit store.
>
It is pretty bad.
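
For anyone who wants to reproduce it, here is a rough sketch of the kind
of code the customer describes. The struct, its field names and the bit
positions in update_word() are made up for illustration; they are not
from the customer's code. update_fields() is the field-by-field form,
and update_word() spells out by hand the single-load/single-store
sequence one would hope for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: three fields sharing one 32-bit word. */
struct status {
    unsigned int mode  : 8;
    unsigned int level : 8;
    unsigned int count : 16;
};

/* Field-by-field updates. With SLOW_BYTE_ACCESS defined as 0 the
   compiler is free to pick a narrow (8- or 16-bit) mode for each
   store, which is the N-loads/N-stores pattern described above. */
void update_fields(struct status *s)
{
    s->mode  = 1;
    s->level = 2;
    s->count = 3;
}

/* Insert the low `width` bits of `v` into `w` at bit `shift`. */
static uint32_t insert_bits(uint32_t w, unsigned shift, unsigned width,
                            uint32_t v)
{
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    assert(shift + width <= 32 && (v & ~mask) == 0);
    return (w & ~(mask << shift)) | (v << shift);
}

/* Hand-written word-wide equivalent: one 32-bit load, all inserts done
   in a register, one 32-bit store. The bit positions are an assumption
   (mode in bits 0-7, level in bits 8-15, count in bits 16-31). */
void update_word(uint32_t *word)
{
    uint32_t w = *word;            /* single 32-bit load  */
    w = insert_bits(w, 0, 8, 1);   /* mode  = 1 */
    w = insert_bits(w, 8, 8, 2);   /* level = 2 */
    w = insert_bits(w, 16, 16, 3); /* count = 3 */
    *word = w;                     /* single 32-bit store */
}
```

Comparing the assembly for update_fields() with SLOW_BYTE_ACCESS at 0
and at 1 should show whether the narrow accesses go away.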
> I am interested to see which CPU 2000 benchmark got affected.
>
Benchmark                -O2   -O2 -mslow-byte-access        Change
164.gzip                 998                      998            0%
175.vpr                 1118                     1102     -1.43113%
176.gcc                 1535                     1534   -0.0651466%
181.mcf                  819                      820       0.1221%
186.crafty              1543                     1541    -0.129618%
197.parser               966                      965     -0.10352%
252.eon                 1712                     1713    0.0584112%
253.perlbmk             1629                     1626    -0.184162%
254.gap                 1680                     1679   -0.0595238%
255.vortex              1701                     1700   -0.0587889%
256.bzip2               1289                     1285    -0.310318%
300.twolf               1650                     1589     -3.69697%
Est. SPECint_base2000   1346                     1340    -0.445765%
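
The percentage figures are the relative change of the
-mslow-byte-access score against the plain -O2 score. A minimal sketch
of that arithmetic, with percent_change() being a name chosen here for
illustration:

```c
#include <assert.h>

/* Relative change of `test` against `base`, in percent.
   Negative means the -mslow-byte-access score is lower. */
double percent_change(int base, int test)
{
    assert(base != 0);
    return 100.0 * (test - base) / base;
}
```

For example, 300.twolf is percent_change(1650, 1589), i.e.
100 * (1589 - 1650) / 1650.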
H.J.