This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[Committed] store_fixed_bit_field improvements
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 27 Apr 2006 10:56:21 -0600 (MDT)
- Subject: [Committed] store_fixed_bit_field improvements
This is the next in the series of patches to address the meta-bug
PR middle-end/19466, and more specifically my investigations of
PR middle-end/18041. After my earlier patch, we currently generate
the following x86 code for a simple bit-field copy.
foo: movl 8(%esp), %edx
movl 4(%esp), %eax
movzbl (%edx), %edx
andb $-5, (%eax) <- exhibit A
andl $4, %edx
orb %dl, (%eax) <- exhibit B
ret
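The original test case isn't shown in the message, but a C source that plausibly produces code of this shape is a one-bit field copy (the struct layout and names below are my illustration, not taken from the PRs):

```c
#include <assert.h>

/* Hypothetical layout: bit 2 (mask 4 in the assembly above) plays
   the role of the field being copied.  */
struct s { unsigned a : 1; unsigned b : 1; unsigned f : 1; unsigned d : 5; };

void copy_field (struct s *dst, const struct s *src)
{
  /* The middle-end expands this store as: clear the field's bits in
     *dst (the andb $-5), then OR in the masked source bits (the orb).  */
  dst->f = src->f;
}
```
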
The subtle quirk demonstrated by the instructions labelled "exhibit A"
and "exhibit B" above is that the middle-end tries hard to keep the
bitfield intermediate resident in memory for targets that support
logical operations whose destination is a memory location. It's so
effective at this that for "exhibit A", the processor reads the byte
at location (%eax), modifies it and writes it back to memory. Likewise
for "exhibit B". Even with the advent of caches, this is a lot of
memory traffic, and in typical bitfield code it prevents the reads
and writes from being optimized across consecutive bitfield operations.
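As a concrete illustration (my example, not from the message): two consecutive stores to bitfields in the same word. When each store is expanded as a memory read-modify-write, the load feeding the second store can't be combined with the store from the first; with a register intermediate, CSE/GCSE can forward the value and emit a single store.

```c
#include <assert.h>

struct flags { unsigned x : 1; unsigned y : 1; unsigned rest : 30; };

void set_two (struct flags *p)
{
  p->x = 1;   /* load *p, OR in bit 0, store *p  */
  p->y = 1;   /* load *p again, OR in bit 1, store *p again */
}
```
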
The patch below identifies the case where a write to a bitfield
requires both a bitwise-AND and a bitwise-IOR, and if so explicitly
forces the intermediate into a new pseudo. For the example above,
we now generate:
foo: movl 8(%esp), %eax
movl 4(%esp), %ecx
movzbl (%eax), %edx
movzbl (%ecx), %eax
andl $4, %edx
andl $-5, %eax
orl %edx, %eax
movb %al, (%ecx)
ret
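In C terms, the new sequence computes the merged byte in a register and stores it once. A minimal sketch of the AND/IOR merge (the names here are mine, not from expmed.c):

```c
#include <assert.h>
#include <stdint.h>

/* Merge the masked bits of src into dst in a register:
   clear the field in dst, isolate it in src, OR them together.
   The single result is then written back with one store.  */
static uint8_t
merge_bits (uint8_t dst, uint8_t src, uint8_t mask)
{
  return (uint8_t) ((dst & (uint8_t) ~mask) | (src & mask));
}
```
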
which is both friendlier to the memory system and allows CSE and GCSE
to eliminate duplicate loads and stores. The downside, however, is
that the above sequence is slightly larger. To get a better handle on the
size aspects, I evaluated this change on CSiBE.
object file        before   after   delta
catdvi,fontinfo      3692    3660     -32
sed,compile          7669    7675      +6
sed,fmt              1696    1726     +30
sed,regex             587     591      +4
It looks like code size isn't a major factor, with only four object
files changing size with this patch, for a net increase of eight bytes.
Being paranoid that the impact may be more significant on other
bit-manipulation-heavy code, I guarded this transformation with
!optimize_size just in case.
Unfortunately, I'm planning yet more improvements in this area, so
writing a test case to scan i386 assembler would be fragile. Not
all targets support arithmetic instructions with memory destinations,
so this patch shouldn't affect most/many/some backends.
The following patch was tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages including Ada, and regression
tested with a top-level "make -k check" with no new failures.
Committed to mainline as revision 113318.
2006-04-27 Roger Sayle <roger@eyesopen.com>
* expmed.c (store_fixed_bit_field): If we're not optimizing for
size, force the intermediate into a new pseudo rather than
performing both a bitwise AND and a bitwise IOR in memory.
Index: expmed.c
===================================================================
*** expmed.c (revision 113265)
--- expmed.c (working copy)
*************** store_fixed_bit_field (rtx op0, unsigned
*** 924,930 ****
if (! all_one)
{
! temp = expand_binop (mode, and_optab, op0,
mask_rtx (mode, bitpos, bitsize, 1),
subtarget, 1, OPTAB_LIB_WIDEN);
subtarget = temp;
--- 924,935 ----
if (! all_one)
{
! /* Don't try and keep the intermediate in memory, if we need to
! perform both a bit-wise AND and a bit-wise IOR (except when
! we're optimizing for size). */
! if (MEM_P (subtarget) && !all_zero && !optimize_size)
! subtarget = force_reg (mode, subtarget);
! temp = expand_binop (mode, and_optab, subtarget,
mask_rtx (mode, bitpos, bitsize, 1),
subtarget, 1, OPTAB_LIB_WIDEN);
subtarget = temp;
Roger
--