This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[Committed] store_fixed_bit_field improvements
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 27 Apr 2006 10:56:21 -0600 (MDT)
- Subject: [Committed] store_fixed_bit_field improvements
This is the next in the series of patches to address the meta-bug
PR middle-end/19466, and more specifically my investigations of
PR middle-end/18041. After my earlier patch, we currently generate
the following x86 code for a simple bit-field copy.
foo: movl 8(%esp), %edx
movl 4(%esp), %eax
movzbl (%edx), %edx
andb $-5, (%eax) <- exhibit A
andl $4, %edx
orb %dl, (%eax) <- exhibit B
ret
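The original test case isn't shown in the message, but a C source that plausibly produces code of this shape is a one-bit field copy (the struct layout and names below are my illustration, not taken from the PRs):

```c
#include <assert.h>

/* Hypothetical layout: bit 2 (mask 4 in the assembly above) plays
   the role of the field being copied.  */
struct s { unsigned a : 1; unsigned b : 1; unsigned f : 1; unsigned d : 5; };

void copy_field (struct s *dst, const struct s *src)
{
  /* The middle-end expands this store as: clear the field's bits in
     *dst (the andb $-5), then OR in the masked source bits (the orb).  */
  dst->f = src->f;
}
```
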
The subtle quirk demonstrated by the instructions labelled "exhibit A"
and "exhibit B" above is that the middle-end tries hard to keep the
bitfield intermediate resident in memory for targets that support
logical operations whose destination is a memory location. It's so
effective at this that for "exhibit A", the processor reads the byte
at location (%eax), modifies it and writes it back to memory. Likewise
for "exhibit B". Even with the advent of caches, this is a lot of
memory traffic, and in typical bitfield code it prevents the reads
and writes from being optimized across consecutive bitfield operations.
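As a concrete illustration (my example, not from the message): two consecutive stores to bitfields in the same word. When each store is expanded as a memory read-modify-write, the load feeding the second store can't be combined with the store from the first; with a register intermediate, CSE/GCSE can forward the value and emit a single store.

```c
#include <assert.h>

struct flags { unsigned x : 1; unsigned y : 1; unsigned rest : 30; };

void set_two (struct flags *p)
{
  p->x = 1;   /* load *p, OR in bit 0, store *p  */
  p->y = 1;   /* load *p again, OR in bit 1, store *p again */
}
```
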
The patch below identifies the case where a write to a bitfield
requires both a bitwise-AND and a bitwise-IOR, and if so explicitly
forces the intermediate into a new pseudo. For the example above,
we now generate:
foo: movl 8(%esp), %eax
movl 4(%esp), %ecx
movzbl (%eax), %edx
movzbl (%ecx), %eax
andl $4, %edx
andl $-5, %eax
orl %edx, %eax
movb %al, (%ecx)
ret
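In C terms, the new sequence computes the merged byte in a register and stores it once. A minimal sketch of the AND/IOR merge (the names here are mine, not from expmed.c):

```c
#include <assert.h>
#include <stdint.h>

/* Merge the masked bits of src into dst in a register:
   clear the field in dst, isolate it in src, OR them together.
   The single result is then written back with one store.  */
static uint8_t
merge_bits (uint8_t dst, uint8_t src, uint8_t mask)
{
  return (uint8_t) ((dst & (uint8_t) ~mask) | (src & mask));
}
```
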
which is both friendlier to the memory system and allows CSE and GCSE
to eliminate duplicate loads and stores. The downside, however, is
that the above sequence is slightly larger. To get a better handle on the
size aspects, I evaluated this change on CSiBE.
object file        before   after   delta
catdvi,fontinfo      3692    3660     -32
sed,compile          7669    7675      +6
sed,fmt              1696    1726     +30
sed,regex             587     591      +4
It looks like code size isn't a major factor, with only four object
files changing size with this patch, for a net increase of eight bytes.
Being paranoid that the impact may be more significant on other
bit-manipulation-heavy code, I guarded this transformation with
!optimize_size just in case.
Unfortunately, I'm planning yet more improvements in this area, so
writing a test case to scan i386 assembler would be fragile. Not
all targets support arithmetic instructions with memory destinations,
so this patch shouldn't affect most/many/some backends.
The following patch was tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages including Ada, and regression
tested with a top-level "make -k check" with no new failures.
Committed to mainline as revision 113318.
2006-04-27 Roger Sayle <roger@eyesopen.com>
* expmed.c (store_fixed_bit_field): If we're not optimizing for
size, force the intermediate into a new pseudo rather than
performing both a bitwise AND and a bitwise IOR in memory.
Index: expmed.c
===================================================================
*** expmed.c (revision 113265)
--- expmed.c (working copy)
*************** store_fixed_bit_field (rtx op0, unsigned
*** 924,930 ****
if (! all_one)
{
! temp = expand_binop (mode, and_optab, op0,
mask_rtx (mode, bitpos, bitsize, 1),
subtarget, 1, OPTAB_LIB_WIDEN);
subtarget = temp;
--- 924,935 ----
if (! all_one)
{
! /* Don't try and keep the intermediate in memory, if we need to
! perform both a bit-wise AND and a bit-wise IOR (except when
! we're optimizing for size). */
! if (MEM_P (subtarget) && !all_zero && !optimize_size)
! subtarget = force_reg (mode, subtarget);
! temp = expand_binop (mode, and_optab, subtarget,
mask_rtx (mode, bitpos, bitsize, 1),
subtarget, 1, OPTAB_LIB_WIDEN);
subtarget = temp;
Roger
--