[PATCH] handle local aggregate initialization in strlen (PR 83821)

Fri May 25 16:32:00 GMT 2018

On 05/25/2018 12:06 AM, Marc Glisse wrote:
> On Thu, 24 May 2018, Martin Sebor wrote:
>
>> On 05/24/2018 03:40 PM, Marc Glisse wrote:
>>> On Wed, 23 May 2018, Martin Sebor wrote:
>>>
>>>> On 05/23/2018 08:57 AM, Jeff Law wrote:
>>>>> On 05/10/2018 04:05 PM, Marc Glisse wrote:
>>>>>> On Thu, 10 May 2018, Martin Sebor wrote:
>>>>>>
>>>>>>> Can you please comment/respond to Jeff's question below and
>>>>>>> confirm whether my understanding of the restriction (below)
>>>>>>> is correct?
>>>>>>
>>>>>> I don't remember it at all, I really should have expanded that
>>>>>> comment...
>>>>>>
>>>>>> The documentation of nonzero_chars seems to indicate that, unless
>>>>>> full_string_p, it is only a lower bound on the length of the
>>>>>> string, so
>>>>>> not suitable for this kind of alias check. I don't know if we also
>>>>>> have
>>>>>> easy access to some upper bound.
>>>>>>
>>>>>> (I noticed while looking at this pass that it could probably use
>>>>>> POINTER_DIFF_EXPR more)
>>>>> So ISTM that we'd need to guard the code that uses
>>>>> si->nonzero_chars in
>>>>> maybe_invalidate to also check FULL_STRING_P since it appears we're
>>>>> using si->nonzero_chars as a string length.
>>>>
>>>> I'm not sure I see why.  Can you explain?
>>>>
>>>> Here's my explanation of the current approach.  si->nonzero_chars
>>>> is the lower bound on si's length.  maybe_invalidate() invalidates
>>>> the string length which is only necessary when something overwrites
>>>> one of the first si->nonzero_chars of the array.  It doesn't matter
>>>> whether si is nul-terminated at that point.
>>>
>>> When you say "invalidates the string length", does that mean it only
>>> invalidates nonzero_chars (the lower bound), or does that also determine
>>> if you can CSE a call to strlen before and after this instruction? Your
>>> argument only applies in the first case.
>>
>> "invalidates the string length" means that maybe_invalidate(STMT)
>> removes the length information record for a string whose length
>> might be modified by STMT.
>>
>> If the exact string length is known (full_string_p is true) then
>> its length can only be modified by writing a nul into one of
>> the leading nonzero_chars.  If the exact string length is not
>> known (i.e., full_string_p is false), then nonzero_chars is not
>> constant
>
> Why couldn't nonzero_chars be constant in that case?
>
> void f(const char*s){
>   s[0]='a';
>   // I know that strlen(s) is at least 1 here
> }

I was responding specifically to your question about the strlen()
CSE.  Above there is no call to strlen().  What I understood you
were asking about is something like:

   int f (char *s)
   {
     int n0 = strlen (s);
     s[7]='a';
     int n1 = strlen (s);   // can this be replaced by n0?
     return n0 == n1;
   }

At the point of the assignment, nonzero_chars for s is known but
not constant and so s's strinfo is invalidated/removed.

I should correct one thing: when a non-constant string length
has been computed and stored in nonzerro_chars full_string_p
is set to true (by handle_builtin_strlen).  But because
nonzero_chars is non-constant, writing into the string at
any offset invalidates the string length (the non-constant
nonzero_chars is effectively treated as PTRDIFF_MAX).

>> and writes into the string are assumed to change its
>> length, and thus invalidate it.
>>
>> But I'm not sure I understand from your brief description what
>> use case you have in mind.  If you have an example I'll test it.
>
> The example may be completely off, but just to try and describe what I
> am wondering about:
>
> int f(const char*s,char*restrict t,char c){
> // we know nothing about s for now.
> s[0]='a'; // nonzero_chars should now be 1
> strcpy(t,s);
> s[2]=c; // this is after nonzero_chars
> strcpy(t,s);
> }
>
> If we assume that writing to s[2] is after the end of the original
> string s, we can remove one of the strcpy. But that's not a valid
> transformation in general.

Above, s[0] creates strinfo for s with nonzero_chars == 1 and
full_string_p == false.  Then, after the first call to strcpy,
that same strinfo is invalidated/removed because the length of
s is unknown.  The next assignment to s[2] doesn't do anything
(in the strlen pass) because the right operand is not constant).
In the second strcpy there is no strinfo for s.

>
> I originally wanted to see if we could CSE calls to strlen(s) before and
> after the write, but the lhs of the first call to strlen would replace
> nonzero_chars. And I don't know so well what we use strinfo for and thus
> how dangerous it is not to invalidate it.

The pass does track non-constant strlen() results so I think
the optimization you describe should be possible.  For instance,
here it should be safe to eliminate the second strlen:

   int f (char *s)
   {
     s[0] = 'a';
     s[1] = 'b';
     int n0 = __builtin_strlen (s);

     s[0] = 'x';
     int n1 = __builtin_strlen (s);

     return n0 == n1;
   }

It isn't optimized today because (as I explained above) the first
strlen() replaces the constant nonzero_chars == 2 with the non-
constant result of strlen(s).

I have been thinking about enhancing the strlen pass to deal
with ranges (not just the VRP kind but ranges of string lengths).
With that, it would be possible to optimize the above because
in addition to replacing nonzero_chars with strlen(s), the first
strlen would also set s's length range to [2, PTRDIFF_MAX].

Martin