[PATCH] Make strlen range computations more conservative

Jeff Law law@redhat.com
Fri Aug 3 07:19:00 GMT 2018


On 07/25/2018 01:23 AM, Richard Biener wrote:
> On Tue, 24 Jul 2018, Bernd Edlinger wrote:
> 
>> On 07/24/18 23:46, Jeff Law wrote:
>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>> Hi!
>>>>
>>>> This patch makes strlen range computations more conservative.
>>>>
>>>> Firstly, if there is a visible type cast from type A to B before passing
>>>> the value to strlen, don't expect the type layout of B to restrict the
>>>> possible return value range of strlen.
>>> Why do you think this is the right thing to do?  ie, is there language
>>> in the standards that makes you think the code as it stands today is
>>> incorrect from a conformance standpoint?  Is there a significant body of
>>> code that is affected in an adverse way by the current code?  If so,
>>> what code?
>>>
>>>
>>
>> I think if you have an object of an effective type A, say char[100], then
>> you can cast the address of A to B, say typedef char (*B)[2] for instance,
>> and then to const char *, say for use in strlen.  I may be wrong, but I think
>> that we should at least try not to pick up char[2] from B, but instead
>> use A for strlen ranges, or leave the range open.  Currently the range
>> info for strlen is [0..1] in this case, even if we see the type cast
>> in the generic tree.
> 
> You raise a valid point - namely that the middle-end allows
> any object (including storage with a declared type) to change
> its dynamic type (even of a piece of it).  So unless you can
> prove that the dynamic type of the thing you are looking at
> matches your idea of that type you may not derive any string
> lengths (or ranges) from it.
> 
> BUT - for the string_constant and c_strlen functions we are, in all
> cases where we return something interesting, able to look at an
> initializer which then determines that type.  Hopefully.  I think the
> strlen() folding code, when it sets SSA ranges, now looks at types ...?
I'm leaning towards a similar conclusion, namely that we can only rely
on type information for the pointer that actually gets passed to strlen,
which 99.9% of the time is (char *), potentially with const qualifiers.

It's tempting to look back through the cast to find a cast from a char
array but I'm more and more concerned that it's not safe unless we can
walk back to an initializer.
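Bernd's scenario can be made concrete with a small runnable sketch (the typedef follows his example; the string contents are arbitrary): the object's effective type is char[100], yet the pointer that reaches strlen has passed through char (*)[2], and the actual result is not bounded by the narrower type.

```c
#include <assert.h>
#include <string.h>

typedef char (*B)[2];           /* the narrower view from Bernd's example */

size_t len_through_cast(void)
{
    static char a[100] = "a string longer than two chars";  /* effective type A */
    B b = (B)&a;                      /* cast the address of A to B ... */
    const char *p = (const char *)b;  /* ... and then to const char * */
    return strlen(p);                 /* 30, nowhere near [0..1] */
}
```

A range of [0..1] derived from char[2] would therefore be wrong here; only the initializer (or the declared type char[100]) justifies a bound.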

What this might argue is that we need to distinguish between a known
range and a likely range.  I really dislike doing that again.  We may
have to see more real world cases where the likely range allows us to
improve the precision of the sprintf warnings (since that's really the
goal of improved string length ranges).
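One way to picture the known-versus-likely split is a pair of ranges carried side by side.  Everything below, including the struct and field names, is a hypothetical illustration of the idea, not GCC's actual representation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a bound the optimizer may rely on for transformations,
 * next to a heuristic bound used only for diagnostics such as the
 * sprintf overflow warnings discussed above. */
struct len_range { size_t min, max; };

struct strlen_info {
    struct len_range known;   /* provable; safe to optimize on */
    struct len_range likely;  /* type-derived guess; warn only */
};

/* Warn if even the likely length could overflow the buffer, but never
 * transform code based on it. */
int may_warn_overflow(struct strlen_info li, size_t bufsz)
{
    return li.likely.max + 1 > bufsz;   /* +1 for the terminating NUL */
}
```

The point of the split is that a wrong `likely` range costs at most a spurious warning, while a wrong `known` range miscompiles code.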



> 
> Consider
> 
> struct X { int i; char c[4]; int j;};
> struct Y { char c[16]; };
> 
> void foo (struct X *p, struct Y *q)
> {
>   memcpy (p, q, sizeof (struct Y));
>   if (strlen ((char *)(struct Y *)p + 4) < 7)
>     abort ();
> }
> 
> here the GIMPLE IL looks like
> 
>   const char * _1;
> 
>   <bb 2> [local count: 1073741825]:
>   _5 = MEM[(char * {ref-all})q_4(D)];
>   MEM[(char * {ref-all})p_6(D)] = _5;
>   _1 = p_6(D) + 4;
>   _2 = __builtin_strlen (_1);
> 
> and I guess Martin would argue that since p is of type struct X
> + 4 gets you to c[4] and thus strlen of that cannot be larger
> than 3.
But _1 is of type const char * and that's what's passed to strlen.  The
types of p and q are irrelevant ISTM.
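Richard's example can be exercised at runtime with a small sketch (heap storage stands in for p so that all 16 copied bytes fit; the string contents are arbitrary): the length found at offset 4 is not bounded by the c[4] member of struct X.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Storage is written through struct Y and then read as struct X; strlen
 * at offset 4 runs past X's c[4] into where j would be. */
size_t len_at_offset_4(void)
{
    struct Y q;
    memset(&q, 0, sizeof q);
    memcpy(q.c, "abcdlonger!", 12);          /* bytes 4..10 = "longer!" */
    struct X *p = malloc(sizeof(struct Y));  /* room for all 16 bytes */
    assert(p);
    memcpy(p, &q, sizeof(struct Y));
    size_t n = strlen((char *)p + 4);        /* 7, not <= 3 */
    free(p);
    return n;
}
```

With these contents strlen is 7, so the (strlen (...) < 7) test in Richard's foo is false; a [0..3] range inferred from X's c[4] would wrongly fold it to true and make the function always abort.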

Jeff



