[PATCH] Make strlen range computations more conservative

Martin Sebor msebor@gmail.com
Thu Aug 2 17:00:00 GMT 2018


As an alternate approach I have been thinking about, if
there is a strong feeling that allowing strlen to iterate
past the subobject boundary is necessary (I don't believe
it is.)

Rather than indiscriminately expanding the provenance of
the subobject regardless of what members follow it in
the enclosing structure, only consider doing that if
the next member is an array of the same type.  E.g.,

   struct S { char a[4], b[3], c[2], d; };
   extern struct S *p;

   strlen (p->a);   // consider p->a's bounds to be char[9]

I.e., treat p->a, p->b, and p->c as one array but exclude
from it d because it's not an array.

(This wouldn't solve the warning problem below -- a separate
computation would still be necessary to determine the tighter
bound of the member itself.)

On 08/02/2018 09:42 AM, Martin Sebor wrote:
> On 08/02/2018 04:22 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is an update of the patch to prevent unsafe optimizations due to
>> strlen range assuming
>> always zero-terminated char arrays.
>>
>> Since the previous version I do no longer try to locate the outermost
>> char array,
>> but just bail out if there is a typecast, that means, supposed we have
>> a 2-dimensional
>> array, char a[x][y], strlen (s.a[x]) may assume a[x] is
>> zero-terminated, if the optimization
>> is enabled.  strlen ((char*)s.a) involves a type cast and should
>> assume nothing.
>>
>> Additionally due to the discussion, I came to the conclusion that this
>> strlen range optimization
>> should only be used when enabled with
>> -fassume-zero-terminated-char-arrays or -Ofast.
>>
>> Note that I included the test case with the removed assertion as
>> gcc/testsuite/gcc.dg/strlenopt-57.c
>>
>> Initially it would only miscompile with -Ofast, but with r263018 aka
>> PR tree-optimization/86043, PR tree-optimization/86042 it was
>> miscompiled with -O3 as well.
>>
>> I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length)
>> which did not
>> check if the string constant is in fact zero terminated:
>>
>> @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string
>>        && TREE_READONLY (rhs))
>>      rhs = DECL_INITIAL (rhs);
>>
>> -  if (rhs && TREE_CODE (rhs) == STRING_CST)
>> +  if (rhs && TREE_CODE (rhs) == STRING_CST
>> +      && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)),
>> +                          TREE_STRING_LENGTH (rhs)) >= 0)
>>      {
>>        *full_string_p = true;
>>        return strlen (TREE_STRING_POINTER (rhs));
>>
>> Fortunately this also shows a way how to narrow strlen return value
>> expectations when
>> we are able to positively prove that a string must be zero terminated.
>>
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>
> The warning bits are definitely not okay by me.  The purpose
> of the warnings (-W{format,sprintf}-{overflow,truncation} is
> to detect buffer overflows.  When a warning doesn't have access
> to string length information for dynamically created strings
> (like the strlen pass does) it uses array sizes as a proxy.
> This is useful both to detect possible buffer overflows and
> to prevent false positives for overflows that cannot happen
> in correctly written programs.
>
> By changing the logic to not consider the bounds of the array
> the warnkngs will become prone to many false positives as is
> evident from your changes to the tests (you had to explicitly
> enable the new option).
>
> In effect, the patch undoes a couple of years worth of fine
> tuning of the warnings to strike a balance between true and
> false positives.  That's completely unacceptable to me, and,
> frankly, proposals along these lines are exceedingly
> disruptive.
>
> Beyond that, I don't have a problem with adding a new option
> but I continue to strongly disagree with disabling the strlen
> optimization by default.  It implies that by default GCC
> targets invalid code.  If there is consensus that some new
> option is necessary, the right setting is on by default,
> along with warnings for programs that pass non-terminated
> arrays to string functions.  I have already submitted a patch
> that effect for strlen and const arrays and am working on
> extending it to other built-in functions and dynamically
> created strings (when I have a chance between defending
> my past work).
>
> Either way, if a new option is introduced, the array bound
> computation for sprintf (and all other buffer overflow
> warnings, likely including -Wrestrict) will need to be
> decoupled from the option so the current behavior is
> preserved.
>
> As I have also said, the area of the provenance of subobject
> vs enclosing object pointers is under discussion in WG14 and
> the C Object Model study group.  The informal consensus so
> far is to maintain the provenance of subobjects and introduce
> some mechanism to make it possible to extend the provenance
> to the enclosing object.  This would be a pervasive change,
> not one just limited to strings.  So if introducing some new
> option at this stage is thought to be necessary this should
> be taken into consideration.
>
> Martin



More information about the Gcc-patches mailing list