This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative

On 08/02/18 19:00, Martin Sebor wrote:
> As an alternate approach I have been thinking about, if
> there is a strong feeling that allowing strlen to iterate
> past the subobject boundary is necessary (I don't believe
> it is.)
> Rather than indiscriminately expanding the provenance of
> the subobject regardless of what members follow it in
> the enclosing structure, only consider doing that if
> the next member is an array of the same type.  E.g.,
>    struct S { char a[4], b[3], c[2], d; };
>    extern struct S *p;
>    strlen (p->a);   // consider p->a's bounds to be char[9]

No, initially I thought in the same direction,
but looking at the way how X-server is broken,
I realized that will probably not be sufficient.

Maybe it would be good to have one set of optimistic range infos
that follow the standards, and can be used to control the warnings,
and another set of pessimistic range infos, that control optimizations.

extern char garbage[10];
char x[10];
memcpy(x, garbage, 10);
x[9] = 0;

strlen(x) will be 0..9 no matter what was in garbage.
That information can be safely used for optimizations.

But if we have
extern char garbage[10];
char x[10];
memcpy(x, garbage, 10);
char *y = x;
if (strlen(y) < 10) <= may not be removed pessimistically

char z[10];
sprintf(z, "%s", y); <= omitting warning would be okay optimistically.

It is not really easy to do but possible.

In the moment I would like to concentrate exclusively on wrong code issues
and not new warnings, even some regressions on the warnings look
acceptable to me.


> I.e., treat p->a, p->b, and p->c as one array but exclude
> from it d because it's not an array.
> (This wouldn't solve the warning problem below -- a separate
> computation would still be necessary to determine the tighter
> bound of the member itself.)
> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>> On 08/02/2018 04:22 AM, Bernd Edlinger wrote:
>>> Hi,
>>> this is an update of the patch to prevent unsafe optimizations due to
>>> strlen range assuming
>>> always zero-terminated char arrays.
>>> Since the previous version I do no longer try to locate the outermost
>>> char array,
>>> but just bail out if there is a typecast, that means, supposed we have
>>> a 2-dimensional
>>> array, char a[x][y], strlen (s.a[x]) may assume a[x] is
>>> zero-terminated, if the optimization
>>> is enabled.  strlen ((char*)s.a) involves a type cast and should
>>> assume nothing.
>>> Additionally due to the discussion, I came to the conclusion that this
>>> strlen range optimization
>>> should only be used when enabled with
>>> -fassume-zero-terminated-char-arrays or -Ofast.
>>> Note that I included the test case with the removed assertion as
>>> gcc/testsuite/gcc.dg/strlenopt-57.c
>>> Initially it would only miscompile with -Ofast, but with r263018 aka
>>> PR tree-optimization/86043, PR tree-optimization/86042 it was
>>> miscompiled with -O3 as well.
>>> I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length)
>>> which did not
>>> check if the string constant is in fact zero terminated:
>>> @@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string
>>>        && TREE_READONLY (rhs))
>>>      rhs = DECL_INITIAL (rhs);
>>> -  if (rhs && TREE_CODE (rhs) == STRING_CST)
>>> +  if (rhs && TREE_CODE (rhs) == STRING_CST
>>> +      && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)),
>>> +                          TREE_STRING_LENGTH (rhs)) >= 0)
>>>      {
>>>        *full_string_p = true;
>>>        return strlen (TREE_STRING_POINTER (rhs));
>>> Fortunately this also shows a way how to narrow strlen return value
>>> expectations when
>>> we are able to positively prove that a string must be zero terminated.
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>> The warning bits are definitely not okay by me.  The purpose
>> of the warnings (-W{format,sprintf}-{overflow,truncation} is
>> to detect buffer overflows.  When a warning doesn't have access
>> to string length information for dynamically created strings
>> (like the strlen pass does) it uses array sizes as a proxy.
>> This is useful both to detect possible buffer overflows and
>> to prevent false positives for overflows that cannot happen
>> in correctly written programs.
>> By changing the logic to not consider the bounds of the array
>> the warnkngs will become prone to many false positives as is
>> evident from your changes to the tests (you had to explicitly
>> enable the new option).
>> In effect, the patch undoes a couple of years worth of fine
>> tuning of the warnings to strike a balance between true and
>> false positives.  That's completely unacceptable to me, and,
>> frankly, proposals along these lines are exceedingly
>> disruptive.
>> Beyond that, I don't have a problem with adding a new option
>> but I continue to strongly disagree with disabling the strlen
>> optimization by default.  It implies that by default GCC
>> targets invalid code.  If there is consensus that some new
>> option is necessary, the right setting is on by default,
>> along with warnings for programs that pass non-terminated
>> arrays to string functions.  I have already submitted a patch
>> that effect for strlen and const arrays and am working on
>> extending it to other built-in functions and dynamically
>> created strings (when I have a chance between defending
>> my past work).
>> Either way, if a new option is introduced, the array bound
>> computation for sprintf (and all other buffer overflow
>> warnings, likely including -Wrestrict) will need to be
>> decoupled from the option so the current behavior is
>> preserved.
>> As I have also said, the area of the provenance of subobject
>> vs enclosing object pointers is under discussion in WG14 and
>> the C Object Model study group.  The informal consensus so
>> far is to maintain the provenance of subobjects and introduce
>> some mechanism to make it possible to extend the provenance
>> to the enclosing object.  This would be a pervasive change,
>> not one just limited to strings.  So if introducing some new
>> option at this stage is thought to be necessary this should
>> be taken into consideration.
>> Martin

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]