[PATCH] Make strlen range computations more conservative

Martin Sebor msebor@gmail.com
Wed Aug 8 02:33:00 GMT 2018


On 08/07/2018 11:44 AM, Richard Biener wrote:
> On August 7, 2018 4:37:00 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>> On 08/07/2018 02:51 AM, Richard Biener wrote:
>>> On August 7, 2018 4:24:42 AM GMT+02:00, Martin Sebor
>> <msebor@gmail.com> wrote:
>>>> On 07/26/2018 02:55 AM, Richard Biener wrote:
>>>>> On Wed, 25 Jul 2018, Martin Sebor wrote:
>>>>>
>>>>>>> BUT - for the string_constant and c_strlen functions we are,
>>>>>>> in all cases we return something interesting, able to look
>>>>>>> at an initializer which then determines that type.  Hopefully.
>>>>>>> I think the strlen() folding code when it sets SSA ranges
>>>>>>> now looks at types ...?
>>>>>>>
>>>>>>> Consider
>>>>>>>
>>>>>>> struct X { int i; char c[4]; int j;};
>>>>>>> struct Y { char c[16]; };
>>>>>>>
>>>>>>> void foo (struct X *p, struct Y *q)
>>>>>>> {
>>>>>>>   memcpy (p, q, sizeof (struct Y));
>>>>>>>   if (strlen ((char *)(struct Y *)p + 4) < 7)
>>>>>>>     abort ();
>>>>>>> }
>>>>>>>
>>>>>>> here the GIMPLE IL looks like
>>>>>>>
>>>>>>>   const char * _1;
>>>>>>>
>>>>>>>   <bb 2> [local count: 1073741825]:
>>>>>>>   _5 = MEM[(char * {ref-all})q_4(D)];
>>>>>>>   MEM[(char * {ref-all})p_6(D)] = _5;
>>>>>>>   _1 = p_6(D) + 4;
>>>>>>>   _2 = __builtin_strlen (_1);
>>>>>>>
>>>>>>> and I guess Martin would argue that since p is of type struct X
>>>>>>> + 4 gets you to c[4] and thus strlen of that cannot be larger
>>>>>>> than 3.  But of course the middle-end doesn't work like that
>>>>>>> and luckily we do not try to draw such conclusions or we
>>>>>>> are somehow lucky that for the testcase as written above we do
>> not
>>>>>>> (I'm not sure whether Martins changes in this area would derive
>>>>>>> such conclusions in principle).
>>>>>>
>>>>>> Only if the strlen argument were p->c.
>>>>>>
>>>>>>> NOTE - we do not know the dynamic type here since we do not know
>>>>>>> the dynamic type of the memory pointed-to by q!  We can only
>>>>>>> derive that at q+4 there must be some object that we can
>>>>>>> validly call strlen on (where Martin again thinks strlen
>>>>>>> imposes constrains that memchr does not - sth I do not agree
>>>>>>> with from a QOI perspective)
>>>>>>
>>>>>> The dynamic type is a murky area.
>>>>>
>>>>> It's well-specified in the middle-end.  A store changes the
>>>>> dynamic type of the stored-to object.  If that type is
>>>>> compatible with the surrounding objects dynamic type that one
>>>>> is not affected, if not then the surrounding objects dynamic
>>>>> type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
>>>>> to somewhat control "compatibility" of subobjects.
>>>>
>>>> I never responded to this.  Using a dynamic (effective) type as
>>>> you describe it would invalidate the aggressive loop optimization
>>>> in the following:
>>>>
>>>>   void foo (struct X *p)
>>>>   {
>>>>       struct Y y = { "12345678" };
>>>>       memcpy (p, &y, sizeof (struct Y));
>>>>
>>>>       // *p effective type is now struct Y
>>>>
>>>>       int n = 0;
>>>>       while (p->c[n])
>>>>         ++n;
>>>>
>>>>       if (n < 7)
>>>>         abort ();
>>>>   }
>>>>
>>>> GCC unconditionally aborts, just as it does with strlen(p->c).
>>>> Why is that not wrong (in either case)?
>>>>
>>>> Because the code is invalid either way, for two reasons:
>>>
>>> No, because the storage has only 4 non-null characters starting at
>> offset 4?
>>
>> No, for the reasons below.  I made a mistake of making
>> the initializer string too short.  If we make it longer it
>> still aborts.  Say with this
>>
>>       struct Y y = { "123456789012345" };
>>
>> we end up with this DCE:
>>
>>  struct Y y;
>>
>>   <bb 2> :
>>   MEM[(char * {ref-all})p_6(D)] = 0x353433323130393837363534333231;
>>   __builtin_abort ();
>>
>> With -fdump-tree-cddce1-details (and a patch to show the upper
>> bound) we see:
>>
>>   Found loop 1 to be finite: upper bound found: 3.
>>
>> With -fno-aggressive-loop-optimizations the abort becomes
>> conditional because the array bound isn't considered. I would
>> expect you to know this since you implemented the feature.
>
> Honza added the array indexing part and it may very well be too aggressive. I have to take a closer look after vacation to tell. Can you open a PR and CC me there?

I opened bug 86884.

Martin

>
> Richard.
>
>>
>> Martin
>>>
>>>> 1) it accesses an object of (effective) type struct Y via
>>>>    an lvalue of type struct X (specifically via (*p).c)
>>>> 2) it relies on p->c
>>>>
>>>> The loop optimization relies on the exact same requirement
>>>> as the strlen one.  Either they are both valid or neither is.
>>>>
>>>> Martin
>>>
>



More information about the Gcc-patches mailing list