This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Merge adjacent stores of constants (PR middle-end/22141)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Adam Nemet <anemet at caviumnetworks dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 27 Nov 2008 21:31:47 +0100
- Subject: Re: [PATCH] Merge adjacent stores of constants (PR middle-end/22141)
- References: <20081127153824.GN17496@tyan-ft48-01.lab.bos.redhat.com> <o7abblt4ps.fsf@foo.home>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Nov 27, 2008 at 10:15:59AM -0800, Adam Nemet wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > Unfortunately, when I said for PR37135 that my patch transforms that
> > bug into a dup of PR22141, it isn't the case, as that transformation
> > needs both DSE1 (with my PR37135 change) and combine. But DSE2 is
> > run after reload and thus merging of adjacent stores is much harder
> > (not done in the patch below). As the code already calls
> > cselib_expand_value_rtx for rhs (not needed for i386/x86_64, but needed
> > e.g. for ppc/ppc64), I wonder if it just shouldn't also special case
> > common bitfield setting patterns (i.e. read from memory, and with a
> > constant, optionally or with some constant, store into the same memory;
> > as a follow-up patch).
>
> Isn't this something that is better done at the tree level in the bitfield
> folder?
Given that the BIT_FIELD_REF restrictions were tightened a lot in 4.4, this
really isn't something you can do at the tree level, unless we come up
with a new representation for this stuff (and that wouldn't be 4.4 material).
It can't be done in the folder, given that you want to optimize say:
struct A { unsigned int i : 4; unsigned int j : 28; } u;

void foo ()
{
  u.i = 8;
  ... // u isn't touched here
  u.j = 16; // u.i and u.j writes together overwrite the whole SImode
            // location, so it can be simply written as one constant using
            // SImode store
}
or
void bar ()
{
  u.i |= 3;
  u.j |= 12;
}
etc., so it would have to be a special very late pass (during TER or around
that time).  Some things can't be done at the RTL level (e.g. we need to
resurrect optimize_bit_field_compare in some way for PR37248; if
BIT_FIELD_REF isn't appropriate, then for very small structs by using
VIEW_CONVERT_REF to some integer type and masking, and if the struct
is large, by taking its address, casting it to a REF_ALL pointer
and reading the bitfield bits together that way).
> I might be misreading this but it seems that for strict-aligned targets we
> will never perform this optimization. I think the cost model should be more
The intent was certainly to optimize even on STRICT_ALIGNMENT targets
when the memory is sufficiently aligned, which still happens very often,
but apparently I should have written:
  if ((adjacent_store->mask & mask) == mask
      && (GET_MODE_BITSIZE (mode) <= adjacent_store->alignment[offset]
          || !SLOW_UNALIGNED_ACCESS (mode,
                                     adjacent_store->alignment[offset])))
    {
      chosen_mode = mode;
      chosen_offset = offset;
    }
> It probably should on MIPS, for example, where unaligned accesses are typically
> twice as slow as aligned ones, so optimizing a series of byte accesses into an
> unaligned store of 32 bits or more is a clear win.  Maybe relying on
> TARGET_RTX_COSTS would give targets more flexibility to fine-tune this
> optimization.
If MIPS supports unaligned accesses and they are just twice as slow as
aligned, I'd say it shouldn't have defined STRICT_ALIGNMENT.
In any case, I of course appreciate suggestions on how to do a reasonable
cost model for this, but I'd say that can be done incrementally once the main
patch is in (if it is approved).
Jakub