
Re: [PATCH] Merge adjacent stores of constants (PR middle-end/22141)


Jakub Jelinek <jakub@redhat.com> writes:
> Unfortunately, when I said for PR37135 that my patch transforms that
> bug into a dup of PR22141, it isn't the case, as that transformation
> needs both DSE1 (with my PR37135 change) and combine.  But DSE2 is
> run after reload and thus merging of adjacent stores is much harder
> (not done in the patch below).  As the code already calls 
> cselib_expand_value_rtx for rhs (not needed for i386/x86_64, but needed
> e.g. for ppc/ppc64), I wonder if it just shouldn't also special case
> common bitfield setting patterns (i.e. read from memory, and with a
> constant, optionally or with some constant, store into the same memory;
> as a follow-up patch).
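
For reference, here is a hypothetical C example (not taken from the PR) of
the bitfield-setting pattern described above; each assignment is typically
expanded as a load of the containing unit, an AND with a constant mask to
clear the field, optionally an OR with a shifted constant value, and a store
back to the same memory:

struct flags
{
  unsigned int a : 3;
  unsigned int b : 5;
};

void
set_flags (struct flags *p)
{
  p->a = 0;  /* load, AND with a mask clearing the bits of A, store */
  p->b = 6;  /* load, AND to clear B, OR in the shifted constant, store */
}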

Isn't this something that is better done at the tree level, in the bitfield
folder?

> --- gcc/dse.c.jj	2008-11-24 12:05:40.000000000 +0100
> +++ gcc/dse.c	2008-11-27 13:18:42.000000000 +0100
> @@ -188,6 +188,10 @@ along with GCC; see the file COPYING3.  
>       does, assuming that the alias sets can be manipulated in the same
>       way.  */
>  
> +#ifndef SLOW_UNALIGNED_ACCESS
> +#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
> +#endif

...

> +  for (mode = GET_MODE_WIDER_MODE (GET_MODE (store_info->mem));
> +       GET_MODE_SIZE (mode) <= UNITS_PER_WORD;
> +       mode = GET_MODE_WIDER_MODE (mode))
> +    {
> +      unsigned HOST_WIDE_INT mask, offset;
> +
> +      offset = ((store_info->begin & (UNITS_PER_WORD - 1))
> +		- (store_info->begin & (GET_MODE_SIZE (mode) - 1)));
> +      mask = (((HOST_WIDE_INT) 1 << GET_MODE_SIZE (mode)) - 1) << offset;
> +      if ((adjacent_store->mask & mask) == mask
> +	  && !SLOW_UNALIGNED_ACCESS (mode, adjacent_store->alignment[offset]))
> +	{
> +	  chosen_mode = mode;
> +	  chosen_offset = offset;
> +	}
> +    }
> +  if (chosen_mode == VOIDmode)
> +    return;
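
As I read the loop, adjacent_store->mask has one bit per byte of the
containing word, and the loop keeps the widest mode whose bytes are all
covered by constant stores and whose alignment is acceptable.  A standalone
sketch of the mask/offset arithmetic (my reading, assuming an 8-byte word,
stores starting at byte 4, and a 4-byte candidate mode):

#include <stdio.h>

#define UNITS_PER_WORD 8

int
main (void)
{
  unsigned long begin = 4;      /* byte position of the first store */
  unsigned long mode_size = 4;  /* GET_MODE_SIZE of the candidate mode */

  unsigned long offset = (begin & (UNITS_PER_WORD - 1))
                         - (begin & (mode_size - 1));
  unsigned long mask = ((1UL << mode_size) - 1) << offset;

  /* Prints offset = 4, mask = 0xf0: bits 4..7, one bit for each byte
     the candidate mode would cover.  */
  printf ("offset = %lu, mask = %#lx\n", offset, mask);
  return 0;
}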

I might be misreading this, but it seems that for strict-alignment targets we
will never perform this optimization.  I think the cost model should be more
flexible than what the SLOW_UNALIGNED_ACCESS/STRICT_ALIGNMENT flags allow.
For instance, shouldn't this still be optimized on a strict-alignment target
when SLOW_BYTE_ACCESS is defined and the original accesses are byte sized?

It probably should be on MIPS, for example, where unaligned accesses are
typically twice as slow as aligned ones, so optimizing a series of byte
accesses into an unaligned store of 32 bits or more is a clear win.  Maybe
relying on TARGET_RTX_COSTS would give targets more flexibility to fine-tune
this optimization.
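
As a very rough, self-contained sketch of what I mean (hypothetical helper,
not the patch and not GCC's actual cost interface): compare an estimated cost
of the merged, possibly unaligned, wide store against the cost of the narrow
stores it replaces, and only merge when it wins.  With the MIPS-like
assumption that an unaligned store costs about twice an aligned one, four
byte stores cost 4 units against 2 for a single unaligned 32-bit store:

#include <stdio.h>

/* MIPS-like assumption: an unaligned store costs about twice an aligned
   one, regardless of width.  A real implementation would query the
   target via TARGET_RTX_COSTS instead of hard-coding this.  */
static int
store_cost (int aligned)
{
  return aligned ? 1 : 2;
}

/* Merge N narrow aligned stores into one wide (possibly unaligned)
   store only when the estimate says it is cheaper.  */
static int
merge_is_profitable (int n_narrow_stores, int wide_is_aligned)
{
  return store_cost (wide_is_aligned) < n_narrow_stores * store_cost (1);
}

int
main (void)
{
  /* Four byte stores vs. one unaligned word store: 4 vs. 2, so merge.  */
  printf ("%d\n", merge_is_profitable (4, 0));
  return 0;
}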

Adam

