This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Merge adjacent stores of constants (PR middle-end/22141)

On Thu, Nov 27, 2008 at 10:15:59AM -0800, Adam Nemet wrote:
> Jakub Jelinek <> writes:
> > Unfortunately, when I said for PR37135 that my patch transforms that
> > bug into a dup of PR22141, it isn't the case, as that transformation
> > needs both DSE1 (with my PR37135 change) and combine.  But DSE2 is
> > run after reload and thus merging of adjacent stores is much harder
> > (not done in the patch below).  As the code already calls 
> > cselib_expand_value_rtx for rhs (not needed for i386/x86_64, but needed
> > e.g. for ppc/ppc64), I wonder if it just shouldn't also special case
> > common bitfield setting patterns (i.e. read from memory, and with a
> > constant, optionally or with some constant, store into the same memory;
> > as a follow-up patch).
> Isn't this something that is better done at the tree level in the bitfield
> folder?

Given that the BIT_FIELD_REF restrictions were tightened considerably in 4.4,
this really isn't something you can do at the tree level, unless we come up
with a new representation of this stuff (and that wouldn't be 4.4 material).
It can't be done in the folder, given that you want to optimize say:
struct A { unsigned int i : 4; unsigned int j : 28; } u;

void foo (void)
{
  u.i = 8;
  ... // u isn't touched here
  u.j = 16; // the u.i and u.j writes together overwrite the whole SImode
	    // location, so they can be emitted as a single SImode store
	    // of one constant
}

void bar (void)
{
  u.i |= 3;
  u.j |= 12;
}
etc.; it would have to be a special, very late pass (during TER or around
that time).  Some things can't be done at the RTL level (e.g. we need to
resurrect optimize_bit_field_compare in some way for PR37248): if
BIT_FIELD_REF isn't appropriate, then for very small structs by using
VIEW_CONVERT_EXPR to some integer type and masking, and if the struct
is large, by taking its address, casting it to a ref-all pointer
and reading the bitfield bits together that way.

> I might be misreading this but it seems that for strict-aligned targets we
> will never perform this optimization.  I think the cost model should be more

The intent was certainly to optimize even on STRICT_ALIGNMENT targets
when the memory is sufficiently aligned, which still happens very often,
but apparently I should have written:
      if ((adjacent_store->mask & mask) == mask
	  && (GET_MODE_BITSIZE (mode) <= adjacent_store->alignment[offset]
	      || !SLOW_UNALIGNED_ACCESS (mode,
					 adjacent_store->alignment[offset])))
	{
	  chosen_mode = mode;
	  chosen_offset = offset;

> It probably should on MIPS for example where unaligned accesses are typically
> twice as slow as aligned, so optimizing a series of byte accesses into an
> unaligned store of 32 bits or more is a clear win.  Maybe relying on
> TARGET_RTX_COSTS would give targets more flexibility to fine-tune this
> optimization.

If MIPS supports unaligned accesses and they are just twice as slow as
aligned, I'd say it shouldn't have defined STRICT_ALIGNMENT.

In any case, I of course appreciate suggestions on how to do a reasonable
cost model for this, but I'd say that can be done incrementally once the main
patch is in (if it is approved).

