[RFA][PATCH 1b/4] [PR tree-optimization/33562] Improve DSE of complex stores

Jeff Law <law@redhat.com>
Fri May 5 14:04:00 GMT 2017


On 05/05/2017 06:13 AM, Richard Sandiford wrote:
> Hi Jeff,
> 
> Jeff Law <law@redhat.com> writes:
>> +/* Compute the number of elements that we can trim from the head and
>> +   tail of ORIG resulting in a bitmap that is a superset of LIVE.
>> +
>> +   Store the number of elements trimmed from the head and tail in
>> +   TRIM_HEAD and TRIM_TAIL.  */
>> +
>> +static void
>> +compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
>> +{
>> +  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
>> +     extends through ref->size.  So we know that in the original bitmap
>> +     bits 0..ref->size were true.  We don't actually need the bitmap, just
>> +     the REF to compute the trims.  */
>> +
>> +  /* Now identify how much, if any of the tail we can chop off.  */
>> +  *trim_tail = 0;
>> +  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
>> +  int last_live = bitmap_last_set_bit (live);
>> +  *trim_tail = (last_orig - last_live) & ~0x1;
>> +
>> +  /* Identify how much, if any of the head we can chop off.  */
>> +  int first_orig = 0;
>> +  int first_live = bitmap_first_set_bit (live);
>> +  *trim_head = (first_live - first_orig) & ~0x1;
>> +}
> 
> Can you remember why you needed to force the lengths to be even (the & ~0x1s)?
> I was wondering whether it might have been because trimming single bytes
> interferes with the later strlen optimisations, which the patch I just
> posted should fix.
> 
> I guess there's also a risk that trimming a byte from a memcpy that has
> a "nice" length could make things less efficient, but that could go both
> ways: changing a memcpy of 9 bytes to a memcpy of 8 bytes would be good,
> while changing from 8 to 7 might not be.  The same goes for even lengths
> too though, like 10->8 (good) and 16->14 (maybe not a win).  FWIW, it
> looks like the strlen pass uses:
> 
>        /* Don't adjust the length if it is divisible by 4, it is more efficient
>           to store the extra '\0' in that case.  */
>        if ((tree_to_uhwi (len) & 3) == 0)
>          return;
> 
> for that.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK if the strlen
> patch is OK?
It was primarily to avoid mucking up the alignment of the start of the 
copy or leaving residuals at the end of the copy.  It's an idea I saw 
while scanning the LLVM implementation of DSE.  The fact that it avoids 
mucking things up for tree-ssa-strlen was an unplanned side effect.
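
To make that concrete, here's a small standalone sketch (my own 
illustration, not part of the patch; the helper name and the numbers are 
made up) of the rounding the & ~0x1 masks perform and why it helps with 
alignment:

#include <stdio.h>

/* Mimic compute_trims' rounding: trim counts are rounded down to an
   even number of bytes, so a store that started 2-byte aligned stays
   at least 2-byte aligned after the head trim, and the tail trim never
   leaves a single odd byte to write.  */
static void
example_trims (int first_live, int last_live, int size)
{
  int trim_head = first_live & ~0x1;                /* round down to even */
  int trim_tail = ((size - 1) - last_live) & ~0x1;  /* round down to even */
  printf ("trim_head=%d trim_tail=%d -> store %d bytes at offset %d\n",
	  trim_head, trim_tail, size - trim_head - trim_tail, trim_head);
}

int
main (void)
{
  /* A 16-byte store where only bytes 3..12 are live: without the
     masking we would trim 3 from each end (10 bytes at offset 3);
     with it we trim 2 from each end (12 bytes at offset 2), keeping
     the start 2-byte aligned.  */
  example_trims (3, 12, 16);
  return 0;
}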

I never did any real benchmarking either way.  If you've got any hard 
data showing it's a bad idea, then let's remove it and deal with the 
tree-ssa-strlen fallout (as I noted you'd already done this morning).
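
Just so we're looking at the same kind of case, here's a hand-written 
example (not from the patch or the testsuite) of the situation the 
trimming is after:

#include <string.h>

struct s { char buf[16]; };

void
f (struct s *p)
{
  memset (p->buf, 0, 16);	/* head and tail bytes become dead below */
  p->buf[0] = 1;		/* kills byte 0 */
  p->buf[1] = 2;		/* kills byte 1 */
  p->buf[14] = 3;		/* kills byte 14 */
  p->buf[15] = 4;		/* kills byte 15 */
  /* With head/tail trimming the memset can be narrowed to the 12 live
     bytes at offset 2 rather than being left untouched.  */
}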

jeff
