Make tree-ssa-strlen.c handle partial unterminated strings

Richard Sandiford rdsandiford@googlemail.com
Sun May 7 10:18:00 GMT 2017


Jakub Jelinek <jakub@redhat.com> writes:
> On Fri, May 05, 2017 at 01:01:08PM +0100, Richard Sandiford wrote:
>> tree-ssa-strlen.c looks for cases in which a string is built up using
>> operations like:
>> 
>>     memcpy (a, "foo", 4);
>>     memcpy (a + 3, "bar", 4);
>>     int x = strlen (a);
>> 
>> As a side-effect, it optimises the non-final memcpys so that they don't
>> include the nul terminator.
>> 
>> However, after removing some "& ~0x1"s from tree-ssa-dse.c, the DSE pass
>> does this optimisation itself (because it can tell that later memcpys
>> overwrite the terminators).  The strlen pass wasn't able to handle these
>> pre-optimised calls in the same way as the unoptimised ones.
>> 
>> This patch adds support for tracking unterminated strings.
>
> I'm not sure I like the terminology (terminated vs. !terminated), I wonder
> if it wouldn't be better to add next to length field minimum_length field,
> length would be what it is now, tree representing the string length,
> while minimum_length would be just a guarantee that strlen (ptr) >=
> minimum_length, i.e. that in the first minimum_length bytes (best would be
> to guarantee that it is just a constant if non-NULL) are non-zero.
> It shouldn't be handled just by non-zero terminated memcpy, but e.g. even if
> you e.g. construct it byte by byte, etc.
>   a[0] = 'a';
>   a[1] = 'b';
>   a[2] = 'c';
>   a[3] = 'd';
>   a[4] = '\0';
>   x = strlen (a);
> etc., or
>   strcpy (a, "abcdefg");
>   strcpy (a + 8, "hijk");
>   a[7] = 'q';
>   x = strlen (a);
> or say by storing 4 non-zero bytes at a time...

I've got most of the way through a version that uses min_length instead.
But one thing that the terminated flag allows that a constant min_length
doesn't is:

  size_t
  f1 (char *a1)
  {
    size_t x = strlen (a1);
    char *a3 = a1 + x;
    a3[0] = '1';  // a1 length x + 1, unterminated  (min length x + 1)
    a3[1] = '2';  // a1 length x + 2, unterminated  (min length x + 2)
    a3[2] = '3';  // a1 length x + 3, unterminated  (min length x + 3)
    a3[3] = 0;    // a1 length x + 3, terminated
    return strlen (a1);
  }

For the a3[3] = 0, we know a3's min_length is 3 and so it's obvious
that we can convert its min_length to a length.  But even if we allow
a1's min_length to be nonconstant, it seems a bit risky to assume that
we can convert its min_length to a length as well.  It would only work
if the min_lengths in related strinfos are kept in sync, whereas it
ought to be safe to say that the minimum length of something is 0.

Having two separate fields could allow us to say that a1's length is x
and its minimum length is 3 in:

  a1[0] = '1';
  a1[1] = '2';
  a1[2] = '3';
  size_t x = strlen (a1);

which in turn would allow the final length of a1 in:

  a1[0] = '1';
  a1[1] = '2';
  a1[2] = '3';
  size_t x = strlen (a1);
  a1[3] = '4';   // drop a1's length here
  a1[4] = 0;

to be optimised to 4.  But we'd then have to drop the constant minimum for:

  a1[0] = '1';
  a1[1] = '2';
  a1[2] = '3';
  size_t x = strlen (a1);
  char *a3 = a1 + x;
  a3[0] = '4';  // a1's min_length is x + 1, not 4

in order to support:

  a3[1] = 0;    // a1's length is x + 1

even though the minimum of 4 would be correct.  So there will be very
few cases in which length is not null and not equal to min_length.

Maybe we could keep both pieces of information around if we had both a
terminated flag and a constant mininum length.

So I think that gives four possiblities:

  (1) Keep the terminated flag, but extend the original patch to handle
      strings built up a character at a time.  This would handle f1 above.

  (2) Replace the terminated flag with a constant minimum length, don't
      handle f1 above.

  (3) Replace the terminated flag with an arbitrary minimum length and
      ensure that it's always valid to copy the minimum length to the
      length when we do so for the final strinfo in a chain.

  (4) Keep the terminated flag and also add a constant minimum length.
      This would handle all the cases above.

Thanks,
Richard



More information about the Gcc-patches mailing list