This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Remove SLOW_BYTE_ACCESS from get_best_mode


On Wed, Sep 13, 2006 at 08:57:54AM +0200, Paolo Bonzini wrote:
> 
> >I don't understand why we use byte instructions to access a 32-bit field.
> >There may be some hardware where four byte accesses are faster than one
> >32-bit access, but I am not aware of any x86 one. I tried this patch on
> >Prescott, Nocona, Core and Core 2. There are no regressions in the gcc
> >testsuite and no negative performance impact on SPEC CPU 2K.
> 
> First, the patch should have subject "Rename ..." rather than "Remove". 
>  As it is, it conveys the idea that there is no differentiation anymore 
> in get_best_mode.
> 
> Second, I don't like the name FAST_BYTE_ACCESS_BITFIELD.  It could be 
> named SLOW_BYTE_ACCESS_BITFIELD (switching the default definition and 
> use to remove the !), or WORD_ACCESS_BITFIELD.
> 
> Third, documentation is needed for the new macro.
> 
> Fourth, the default value for the new target macro should be in 
> defaults.h rather than in stor-layout.c.
> 
> Fifth, it seems plausible to me that SLOW_BYTE_ACCESS would still win if 
> there was only one bitfield accessed.  dojump.c is using byte 
> instructions when jumping, i.e. when obviously you access only one 
> field, and it is a win there.  I think the changes in twolf should be 
> analyzed more carefully before proposing a patch, to identify the 
> pessimization there and whether it could happen with the proposed patch 
> on similar code.
> 

Here is the updated patch. I ran a microbenchmark whose code difference
is similar to the one in twolf:

-       movl    (%rdi), %eax
-       xorb    %ah, %ah
-       subl    $1, %eax
+       movq    (%rdi), %rax
+       andl    $4294902015, %eax
+       subq    $1, %rax
        jne     .L8

There is no speed difference on Nocona. I also checked my SPEC results:
twolf isn't very stable on my Nocona. For gcc 4.2 2006-06-20, two runs gave

                         #1              #2      change
164.gzip                 1014            1014    0%
175.vpr                  1112            1031    -7.28417%
176.gcc                  1521            1526    0.328731%
181.mcf                  820             821     0.121951%
186.crafty               1521            1521    0%
197.parser               964             963     -0.103734%
252.eon                  1736            1743    0.403226%
253.perlbmk              1610            1604    -0.372671%
254.gap                  1673            1673    0%
255.vortex               1703            1705    0.11744%
256.bzip2                1289            1290    0.0775795%
300.twolf                1586            1650    4.03531%
Est. SPECint_base2000    1340            1337    -0.223881%

Both runs are from the same binary.

BTW, get_best_mode is also used for accessing normal fields, so
FAST_WORD_ACCESS or WORD_ACCESS_FIELD may be a better name than
WORD_ACCESS_BITFIELD.



H.J.
----
2006-09-13  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.h (SLOW_BYTE_ACCESS): Update comment.
	(WORD_ACCESS_BITFIELD): New.
	(SLOW_SHORT_ACCESS): Remove.

	* doc/tm.texi (SLOW_BYTE_ACCESS): Update.
	(WORD_ACCESS_BITFIELD): Document.

	* defaults.h (WORD_ACCESS_BITFIELD): New. Default to
	SLOW_BYTE_ACCESS.

	* stor-layout.c (SLOW_BYTE_ACCESS): Rename to ...
	(WORD_ACCESS_BITFIELD): This.

--- gcc/config/i386/i386.h.slow	2006-09-13 07:01:23.000000000 -0700
+++ gcc/config/i386/i386.h	2006-09-13 10:42:43.000000000 -0700
@@ -1834,7 +1834,7 @@ do {							\
    require more than one instruction or if there is no difference in
    cost between byte and (aligned) word loads.
 
-   When this macro is not defined, the compiler will access a field by
+   When this macro is zero, the compiler will access a field by
    finding the smallest containing object; when it is defined, a
    fullword load will be used if alignment permits.  Unless bytes
    accesses are faster than word accesses, using word accesses is
@@ -1844,8 +1844,10 @@ do {							\
 
 #define SLOW_BYTE_ACCESS 0
 
-/* Nonzero if access to memory by shorts is slow and undesirable.  */
-#define SLOW_SHORT_ACCESS 0
+/* Define this macro as a C expression which is nonzero if accessing a
+   word of memory is no slower than accessing less than a word of memory
+   (i.e. a `char' or a `short').  */
+#define WORD_ACCESS_BITFIELD 1
 
 /* Define this macro to be the value 1 if unaligned accesses have a
    cost many times greater than aligned accesses, for example if they
--- gcc/defaults.h.slow	2006-03-22 07:17:20.000000000 -0800
+++ gcc/defaults.h	2006-09-13 10:35:15.000000000 -0700
@@ -895,4 +895,10 @@ Software Foundation, 51 Franklin Street,
 #define INCOMING_FRAME_SP_OFFSET 0
 #endif
 
+/* Determines whether we should use word access for bitfield.  Default
+   to SLOW_BYTE_ACCESS if not specified.  */
+#ifndef WORD_ACCESS_BITFIELD
+#define WORD_ACCESS_BITFIELD SLOW_BYTE_ACCESS
+#endif
+
 #endif  /* ! GCC_DEFAULTS_H */
--- gcc/doc/tm.texi.slow	2006-08-29 08:38:33.000000000 -0700
+++ gcc/doc/tm.texi	2006-09-13 10:29:03.000000000 -0700
@@ -5595,14 +5595,23 @@ faster than accessing a word of memory, 
 require more than one instruction or if there is no difference in cost
 between byte and (aligned) word loads.
 
-When this macro is not defined, the compiler will access a field by
-finding the smallest containing object; when it is defined, a fullword
+When this macro is zero, the compiler will access a field by
+finding the smallest containing object; when it is nonzero, a fullword
 load will be used if alignment permits.  Unless bytes accesses are
 faster than word accesses, using word accesses is preferable since it
 may eliminate subsequent memory access if subsequent accesses occur to
 other fields in the same word of the structure, but to different bytes.
 @end defmac
 
+@defmac WORD_ACCESS_BITFIELD
+Define this macro as a C expression which is nonzero if accessing a
+word of memory is no slower than accessing less than a word of memory
+(i.e.@: a @code{char} or a @code{short}).
+
+When this macro is not defined, it will be defined as
+@code{SLOW_BYTE_ACCESS}.
+@end defmac
+
 @defmac SLOW_UNALIGNED_ACCESS (@var{mode}, @var{alignment})
 Define this macro to be the value 1 if memory accesses described by the
 @var{mode} and @var{alignment} parameters have a cost many times greater
--- gcc/stor-layout.c.slow	2006-09-07 11:13:09.000000000 -0700
+++ gcc/stor-layout.c	2006-09-13 10:13:30.000000000 -0700
@@ -2113,10 +2113,10 @@ fixup_unsigned_type (tree type)
 
    If no mode meets all these conditions, we return VOIDmode.
 
-   If VOLATILEP is false and SLOW_BYTE_ACCESS is false, we return the
+   If VOLATILEP is false and WORD_ACCESS_BITFIELD is false, we return the
    smallest mode meeting these conditions.
 
-   If VOLATILEP is false and SLOW_BYTE_ACCESS is true, we return the
+   If VOLATILEP is false and WORD_ACCESS_BITFIELD is true, we return the
    largest mode (but a mode no wider than UNITS_PER_WORD) that meets
    all the conditions.
 
@@ -2151,7 +2151,7 @@ get_best_mode (int bitsize, int bitpos, 
       || (largest_mode != VOIDmode && unit > GET_MODE_BITSIZE (largest_mode)))
     return VOIDmode;
 
-  if ((SLOW_BYTE_ACCESS && ! volatilep)
+  if ((WORD_ACCESS_BITFIELD && ! volatilep)
       || (volatilep && !targetm.narrow_volatile_bitfield()))
     {
       enum machine_mode wide_mode = VOIDmode, tmode;

