This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFA][PATCH 1b/4] [PR tree-optimization/33562] Improve DSE of complex stores
- From: Richard Sandiford <richard dot sandiford at linaro dot org>
- To: Jeff Law <law at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 05 May 2017 13:13:13 +0100
- References: <96ad6c5e-7c9a-946b-1b2a-b08c832d799d@redhat.com>
Hi Jeff,
Jeff Law <law@redhat.com> writes:
> +/* Compute the number of elements that we can trim from the head and
> + tail of ORIG resulting in a bitmap that is a superset of LIVE.
> +
> + Store the number of elements trimmed from the head and tail in
> + TRIM_HEAD and TRIM_TAIL. */
> +
> +static void
> +compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
> +{
> + /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
> + extends through ref->size. So we know that in the original bitmap
> + bits 0..ref->size were true. We don't actually need the bitmap, just
> + the REF to compute the trims. */
> +
> + /* Now identify how much, if any of the tail we can chop off. */
> + *trim_tail = 0;
> + int last_orig = (ref->size / BITS_PER_UNIT) - 1;
> + int last_live = bitmap_last_set_bit (live);
> + *trim_tail = (last_orig - last_live) & ~0x1;
> +
> + /* Identify how much, if any of the head we can chop off. */
> + int first_orig = 0;
> + int first_live = bitmap_first_set_bit (live);
> + *trim_head = (first_live - first_orig) & ~0x1;
> +}
Can you remember why you needed to force the lengths to be even (the & ~0x1s)?
I was wondering whether it might have been because trimming single bytes
interferes with the later strlen optimisations, which the patch I just
posted should fix.
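For reference, here is a rough standalone sketch of what the mask does to the trim amounts. It is not the code from tree-ssa-dse.c: the function name, the struct and the force_even flag are made up for illustration, and the size is passed in bytes rather than being derived from ref->size and BITS_PER_UNIT.

```c
#include <assert.h>

/* Hypothetical result type; the real function stores through two
   int pointers instead.  */
struct sketch_trims
{
  int head;
  int tail;
};

/* Sketch of the trim computation for a store of SIZE_BYTES bytes whose
   first and last live bytes are FIRST_LIVE and LAST_LIVE (zero-based).
   FORCE_EVEN is an illustrative switch: nonzero applies the "& ~0x1"
   rounding from the quoted code, zero trims every dead byte.  */
static struct sketch_trims
sketch_compute_trims (int size_bytes, int first_live, int last_live,
                      int force_even)
{
  assert (0 <= first_live && first_live <= last_live
          && last_live < size_bytes);

  /* ~0x1 clears the low bit, rounding an odd count down to an even one;
     ~0 leaves the count unchanged.  */
  int mask = force_even ? ~0x1 : ~0;

  struct sketch_trims t;
  int last_orig = size_bytes - 1;
  t.tail = (last_orig - last_live) & mask;
  t.head = first_live & mask;
  return t;
}
```

With a 16-byte store whose live bytes are 3..10, the masked variant gives head 2 and tail 4, leaving one dead byte at each end in place, while the unmasked variant gives head 3 and tail 5.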
I guess there's also a risk that trimming a byte from a memcpy that has
a "nice" length could make things less efficient, but that could go both
ways: changing a memcpy of 9 bytes to a memcpy of 8 bytes would be good,
while changing from 8 to 7 might not be. The same goes for even lengths
too though, like 10->8 (good) and 16->14 (maybe not a win). FWIW, it
looks like the strlen pass uses:
  /* Don't adjust the length if it is divisible by 4, it is more efficient
     to store the extra '\0' in that case.  */
  if ((tree_to_uhwi (len) & 3) == 0)
    return;
for that.
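To make the comparison with the lengths above concrete, the same test can be written as a standalone predicate. This is only a sketch: sketch_may_shorten is a made-up name, and it takes a plain integer rather than the tree that tree_to_uhwi operates on in the strlen pass.

```c
#include <stdbool.h>

/* Hypothetical predicate modelled on the strlen-pass check: a length
   that is a multiple of 4 is left alone, anything else may be
   shortened.  */
static bool
sketch_may_shorten (unsigned long len)
{
  return (len & 3) != 0;
}
```

Under that rule 9 and 10 may be shortened (matching the 9->8 and 10->8 cases), while 8 and 16 are left alone (so no 8->7 or 16->14).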
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK if the strlen
patch is OK?
Thanks,
Richard
2017-05-05 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-dse.c (compute_trims): Remove restriction that the
trimmed amount must be even.
Index: gcc/tree-ssa-dse.c
===================================================================
--- gcc/tree-ssa-dse.c 2017-04-18 19:52:34.024592656 +0100
+++ gcc/tree-ssa-dse.c 2017-05-05 13:01:51.793723330 +0100
@@ -229,12 +229,12 @@ compute_trims (ao_ref *ref, sbitmap live
/* Now identify how much, if any of the tail we can chop off. */
int last_orig = (ref->size / BITS_PER_UNIT) - 1;
int last_live = bitmap_last_set_bit (live);
- *trim_tail = (last_orig - last_live) & ~0x1;
+ *trim_tail = last_orig - last_live;
/* Identify how much, if any of the head we can chop off. */
int first_orig = 0;
int first_live = bitmap_first_set_bit (live);
- *trim_head = (first_live - first_orig) & ~0x1;
+ *trim_head = first_live - first_orig;
if ((*trim_head || *trim_tail)
&& dump_file && (dump_flags & TDF_DETAILS))