This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
On 08/24/18 08:36, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>> On 08/02/18 04:44, Martin Sebor wrote:
>>> Since the foundation of the patch is detecting and avoiding
>>> the overly aggressive folding of unterminated char arrays,
>>> besides issuing a warning for such arguments to strlen,
>>> the patch also fixes pr86711 - wrong folding of memchr, and
>>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>>
>>> The substance of the attached updated patch is unchanged,
>>> I have just added test cases for the two additional bugs.
>>>
>>> Bernd, as I mentioned Wednesday, the patch supersedes
>>> yours here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>
>>
>> No problem, but I hope you understand, that I still uphold
>> my patch.
>>
>> So we have two patches now:
>> - mine, fixing a wrong code bug,
>> - yours, implementing a new warning and fixing a wrong
>> code bug at the same time.
>>
>> I will add a few comments to your patch below.
> [ ... ]
>
> So a lot of the comments are out of date, presumably because Martin
> fixed the issues you pointed out in his second version of the patch.
> But there's still some useful nuggets in your comments that are still
> relevant.
>
Yes, possible, and therefore I would like updated patches summarize
the changes.
> FYI it appears that sometimes you comment above a chunk of code, and
> other times below. That makes it exceedingly difficult to figure out
> the issue you're trying to raise.
>
Okydoky.
>
>>
>>> Martin
>>>
>>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>>> Attached is an updated version of the patch that handles more
>>>> instances of calling strlen() on a constant array that is not
>>>> a nul-terminated string.
>>>>
>>>> No other functions except strlen are explicitly handled yet,
>>>> and neither are constant arrays with braced-initializer lists
>>>> like const char a[] = { 'a', 'b', 'c' }; I am testing
>>>> an independent solution for those (bug 86552). Once those
>>>> are handled the warning will be able to detect those as well.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>>
>>>>> The fix for bug 86532 has been checked in so this enhancement
>>>>> can now be applied on top of it (with only minor adjustments).
>>>>>
>>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>>> GCC silently accepts constant character arrays with no
>>>>>> terminating nul as arguments to strlen (and other string
>>>>>> functions).
>>>>>>
>>>>>> The attached patch is a first step in detecting these kinds
>>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>>> The next step is to modify all other handlers of built-in
>>>>>> functions to detect the same problem (not part of this patch).
>>>>>> Yet another step is to detect these problems in arguments
>>>>>> initialized using the non-string form:
>>>>>>
>>>>>> const char a[] = { 'a', 'b', 'c' };
>>>>>>
>>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>>> (I tested it with an earlier version of that patch so there
>>>>>> is code in the context that does not appear in the latest
>>>>>> version of the other diff).
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>
>>>>
>>>
>>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>>> PR tree-optimization/86711 - wrong folding of memchr
>>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>>
>>> gcc/ChangeLog:
>>>
>>> PR tree-optimization/86714
>>> PR tree-optimization/86711
>>> PR tree-optimization/86552
>>> * builtins.h (warn_string_no_nul): Declare..
>>> (c_strlen): Add argument.
>>> * builtins.c (warn_string_no_nul): New function.
>>> (fold_builtin_strlen): Add argument. Detect missing nul.
>>> (fold_builtin_1): Adjust.
>>> (string_length): Add argument and use it.
>>> (c_strlen): Same.
>>> (expand_builtin_strlen): Detect missing nul.
>>> * expr.c (string_constant): Add arguments. Detect missing nul
>>> terminator and outermost declaration it's missing in.
>>> * expr.h (string_constant): Add argument.
>>> * fold-const.c (c_getstr): Change argument to bool*, rename
>>> other arguments.
>>> * fold-const-call.c (fold_const_call): Detect missing nul.
>>> * gimple-fold.c (get_range_strlen): Add argument.
>>> (get_maxval_strlen): Adjust.
>>> * gimple-fold.h (get_range_strlen): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR tree-optimization/86714
>>> PR tree-optimization/86711
>>> PR tree-optimization/86552
>>> * gcc.c-torture/execute/memchr-1.c: New test.
>>> * gcc.c-torture/execute/pr86714.c: New test.
>>> * gcc.dg/warn-strlen-no-nul.c: New test.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index aa3e0d8..f4924d5 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>> static rtx expand_builtin_expect (tree, rtx);
>>> static tree fold_builtin_constant_p (tree);
>>> static tree fold_builtin_classify_type (tree);
>>> -static tree fold_builtin_strlen (location_t, tree, tree);
>>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>> static tree fold_builtin_inf (location_t, tree, int);
>>> static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>> static bool validate_arg (const_tree, enum tree_code code);
>>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>> return n;
>>> }
>>>
>>> +/* For a call expression EXP to a function that expects a string argument,
>>> + issue a diagnostic due to it being a called with an argument NONSTR
>>> + that is a character array with no terminating NUL. */
>>> +
>>> +void
>>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>>> +{
>>> + loc = expansion_point_location_if_in_system_header (loc);
>>> +
>>> + bool warned;
>>> + if (exp)
>>> + {
>>> + if (!fndecl)
>>> + fndecl = get_callee_fndecl (exp);
>>> + warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> + "%K%qD argument missing terminating nul",
>>> + exp, fndecl);
>>> + }
>>> + else
>>> + {
>>> + gcc_assert (fndecl);
>>> + warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> + "%qD argument missing terminating nul",
>>> + fndecl);
>>> + }
>>> +
>>> + if (warned && DECL_P (nonstr))
>>> + inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>>> +}
>>> +
>>> /* Compute the length of a null-terminated character string or wide
>>> character string handling character sizes of 1, 2, and 4 bytes.
>>> TREE_STRING_LENGTH is not the right way because it evaluates to
>>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>> accesses. Note that this implies the result is not going to be emitted
>>> into the instruction stream.
>>>
>>> + When ARR is non-null and the string is not properly nul-terminated,
>>> + set *ARR to the declaration of the outermost constant object whose
>>> + initializer (or one of its elements) is not nul-terminated.
>>> +
>>> The value returned is of type `ssizetype'.
>>>
>>> Unfortunately, string_constant can't access the values of const char
>>> arrays with initializers, so neither can we do so here. */
>>>
>>> tree
>>> -c_strlen (tree src, int only_value)
>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>> {
>>> STRIP_NOPS (src);
>>> +
>>> + /* Used to detect non-nul-terminated strings in subexpressions
>>> + of a conditional expression. When ARR is null, point it at
>>> + one of the elements for simplicity. */
>>> + tree arrs[] = { NULL_TREE, NULL_TREE };
>>> + if (!arr)
>>> + arr = arrs;
>>
>> now arr is always != NULL
> Right. It's renamed in the follow-up patch from Martin, but yes, it's
> always non-null. Note in this case your comment is after the code
> you're referring to.
>
>>
>>> +
>>> if (TREE_CODE (src) == COND_EXPR
>>> && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>> {
>>> - tree len1, len2;
>>> -
>>> - len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>> - len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>> + tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>> + tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>> if (tree_int_cst_equal (len1, len2))
>>> - return len1;
>>> + {
>>
>> funny, if called with NULL *arr and arrs[0] alias each other.
>
>>
>>> + *arr = arrs[0] ? arrs[0] : arrs[1];
>>> + return len1;
>>> + }
>>> }
>
> And in this case it looks like your comment is before the code you're
> commenting about. It was fairly obvious in this case because the code
> prior to your "funny, if called..." comment didn't reference arr or arrs
> at all.
>
> And more importantly, why are you concerned about the aliasing?
>
It is just *arr = arrs[0] does nothing, but it looks like the author
was not aware of it. It may be okay, but causes head-scratching.
If you don't have the context you will think this does something different.
>
>
>
>
>>>
>>> if (TREE_CODE (src) == COMPOUND_EXPR
>>> && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>> - return c_strlen (TREE_OPERAND (src, 1), only_value);
>>> + return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>>
>>> location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>>
>>> /* Offset from the beginning of the string in bytes. */
>>> tree byteoff;
>>> - src = string_constant (src, &byteoff);
>>> - if (src == 0)
>>> - return NULL_TREE;
>>> + /* Set if array is nul-terminated, false otherwise. */
>>> + bool nulterm;
>>
>> note arr is always != null or pointing to arrs[0].
>>
>>> + src = string_constant (src, &byteoff, &nulterm, arr);
>>> + if (!src)
>>> + {
>>> + *arr = arrs[0] ? arrs[0] : arrs[1];
>>> + return NULL_TREE;
>>> + }
>>> +
>>> + /* Clear *ARR when the string is nul-terminated. It should be
>>> + of no interest to callers. */
>>> + if (nulterm)
>>> + *arr = NULL_TREE;
>>>
>>> /* Determine the size of the string element. */
>>> unsigned eltsize
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>> maxelts = maxelts / eltsize - 1;
>>> }
>>>
>>> + /* Unless the caller is prepared to handle it by passing in a non-null
>>> + ARR, fail if the terminating nul doesn't fit in the array the string
>>> + is stored in (as in const char a[3] = "123"; */
>>
>> note arr is always != NULL, thus this if is never taken.
>>
>>> + if (!arr && maxelts < strelts)
>>> + return NULL_TREE;
>>> +
> Right. And I think that check is gone in the second version of Martin's
> patch.
>
>
>>> /* PTR can point to the byte representation of any string type, including
>>> char* and wchar_t*. */
>>> const char *ptr = TREE_STRING_POINTER (src);
>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>> offsave = fold_convert (ssizetype, offsave);
>>> tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>> build_int_cst (ssizetype, len * eltsize));
>>
>> this computation is wrong, it computes not in units of eltsize,
>> I am however not sure if it is really good that this function tries to
>> compute strlen of wide character strings.
> Note that in the second version of Martin's patch eltsize will always be
> 1 when we get here. Furthermore, the multiplication by eltsize is gone.
> It looks like you switched back to commenting after the code for that
> comment, but then immediately thereafter...
>
Acutally I had no idea that the second patch did resolve some of my comments
and which those were.
I had the impression that it is just splitting up of a large patch into
several smaller without reworking at the same time.
Once again, a summary what was changed would be helpful.
>
>>
>> That said, please fix this computation first, in a different patch
>> instead of just fixing the indentation. (I know I pointed that lines are too
>> long here, but that was before I realized that the whole length computation
>> here is wrong).
>>
>>> - tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>>> + tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>>> + offsave);
>>> return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>> build_zero_cst (ssizetype));
>>> }
>>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>> Since ELTOFF is our starting index into the string, no further
>>> calculation is needed. */
> The comment shown above appears to refer to the code below the comment.
> Again, this makes it exceedingly confusing to understand your comments
> and take appropriate action.
>
>
>
>
>>
>> What are you fixing here, I think that was another bug.
>> If this fixes something then it should be in a different patch,
>> just handling this.
>>
>>> unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>> - maxelts - eltoff);
>>> + strelts - eltoff);
>>>
>>> return ssize_int (len);
>>> }
> I'm guessing the comment in this case refers to the code after.
> Presumably questioning the change from maxelts to strelts.
>
>
Yes. I was thinking that could be a patch of its own.
>
>
>>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>>
>>> struct expand_operand ops[4];
>>> rtx pat;
>>> - tree len;
>>> tree src = CALL_EXPR_ARG (exp, 0);
>>> rtx src_reg;
>>> rtx_insn *before_strlen;
>>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>> unsigned int align;
>>>
>>> /* If the length can be computed at compile-time, return it. */
>>> - len = c_strlen (src, 0);
>>> + tree array;
>>> + tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
> So I think this is fixed by your change which restored the default
> behavior of c_strlen to count bytes. Which restores the behavior of
> c_strlen to match that of the strlen library call.
>
> So for something like L"abc", the right value is 1 or 0 for LE and BE
> targets respectively.
>
Well, yes, although I changed my mind on strlen(L"abc") meanwhile.
This may appear as if I contradict myself but, it is more like I learn.
The installed patch for the ELTSIZE parameter was in part inspired by the
thought about "not folding invalid stuff" that was forming in my mind
at that time, even before I wrote it down and sent it to this list.
So the patch does reject to give a value for strlen(L"abc") since
it is an invalid call.
Previously that was counting bytes in a wide char string, but I doubt
that was the original intention though.
>
>
>>
>>> if (len)
>>> - return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>>> + {
>>> + if (array)
>>> + {
>>> + /* Array refers to the non-nul terminated constant array
>>> + whose length is attempted to be computed. */
>>
>> I really wonder if it would not make more sense to have a
>> nonterminated_string_constant_p instead.
> Rather than a boolean I think we want a tree * so that we can bubble up
> more information to the caller WRT non-terminated strings.
>
>>
>> Last time I wanted to implement a warning in expand I faced the
>> problem that inlined functions will get one warning per invocation?
> Yea, but that's just the way things are. I don't think it's really an
> issue for the patches we're looking at right now.
>
No.
>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>> return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>> }
>>>
>>> -/* Fold a call to __builtin_strlen with argument ARG. */
>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG. */
>>>
>>> static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>> {
>>> if (!validate_arg (arg, POINTER_TYPE))
>>> return NULL_TREE;
>>> else
>>> {
>>> - tree len = c_strlen (arg, 0);
>>> -
>>> + tree arr = NULL_TREE;
>>> + tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
> Given your fix to have c_strlen count bytes by default, I think we're OK
> here.
>
Yes. But my fix was still incomplete.
So incremental progress is necessary here.
>>
>>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>> return NULL_TREE;
>>>
>>> tree offset;
>>> - if (tree str = string_constant (arg0, &offset))
>>> + if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>> {
>>> /* Avoid pointers to arrays (see bug 86622). */
>>> if (POINTER_TYPE_P (TREE_TYPE (arg))
>>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>> if (TREE_CODE (array) == STRING_CST)
>>
>> Well, actually I think there _are_ STING_CSTs which are not null terminated.
>> Maybe not in C. But Fortran, Ada, Go...
>>
>>> {
>>> *ptr_offset = fold_convert (sizetype, offset);
>>> + if (nulterm)
>>> + *nulterm = true;
>>> + if (decl)
>>> + *decl = NULL_TREE;
>>> return array;
>>> }
> So your comment seems to be referring to the code above the comment as
> well as below. Confusing. Consistency really helps.
>
> Are we at a point where we're ready to declare STRING_CSTs as always
> being properly terminated? If not the hunk of code above needs some
> rethinking since we can't guarantee the string is properly terminated.
>
No. That is still in the flux.
And I am not sure if the "properly terminated" will survive.
I am digging deep to the ground now.
And the correctness of the code in questing depends heavily on the
results. Therefore I would like to fix this from the ground.
Adding lots of new code that is based on wrong assumptions
will not help.
>
>>>
>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>> if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>> return NULL_TREE;
>>>
>>> + unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>> +
>>
>> I don't understand why this is necessary at all.
>> It looks way too complicated, to say the least.
>>
>> TREE_TYPE (init) has already the type of the member.
>>
>>> + /* When ARG refers to an aggregate (of arrays) try to determine
>>> + the size of the character array within the aggregate. */
>>> + tree ref = arg;
>>> + tree reftype = TREE_TYPE (arg);
>>> +
>>> + if (TREE_CODE (ref) == MEM_REF)
>>> + {
>>> + ref = TREE_OPERAND (ref, 0);
>>> + if (TREE_CODE (ref) == ADDR_EXPR)
>>> + {
>>> + ref = TREE_OPERAND (ref, 0);
>>> + reftype = TREE_TYPE (ref);
>>> + }
>>> + }
>>> + else
>>> + while (TREE_CODE (ref) == ARRAY_REF)
>>> + {
>>> + reftype = TREE_TYPE (ref);
>>> + ref = TREE_OPERAND (ref, 0);
>>> + }
>>> +
>>> + if (TREE_CODE (ref) == COMPONENT_REF)
>>> + reftype = TREE_TYPE (ref);
>>> +
>>> + while (TREE_CODE (reftype) == ARRAY_TYPE)
>>> + {
>>> + tree next = TREE_TYPE (reftype);
>>> + if (TREE_CODE (next) == INTEGER_TYPE)
>>> + {
>>> + if (tree size = TYPE_SIZE_UNIT (reftype))
>>> + if (tree_fits_uhwi_p (size))
>>> + array_elts = tree_to_uhwi (size);
>>
>> so array_elts is measued in bytes.
> So I'm guessing your comment about the code looking way too complicated
> is referring to the code *after* your comment. That code is not in the
> v2 patch. At least not 01/06 which addresses the codegen/opt issues. I
> haven't checked the full kit to see if this code appears in a subsequent
> patch.
>
No I meant the code between
/* When ARG refers to an aggregate (of arrays) try to determine
the size of the character array within the aggregate. */
and here. It is wrong (assuming C semantic on GIMPLE).
>
>>
>>> + break;
>>> + }
>>> +
>>> + reftype = TREE_TYPE (reftype);
>>> + }
>>> +
>>> + if (decl)
>>> + *decl = array;
>>> +
>>> /* Avoid returning a string that doesn't fit in the array
>>> it is stored in, like
>>> const char a[4] = "abcde";
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>> unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>> length = string_length (TREE_STRING_POINTER (init), charsize,
>>> length / charsize);
>>
>> Some callers especially those where the wrong code happens, expect
>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>> But using string_length assume thy stop at the first nul char.
> Example please?
>
tree-ssa-forwprop.c:
str1 = string_constant (src1, &off1);
if (str1 == NULL_TREE)
break;
if (!tree_fits_uhwi_p (off1)
|| compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
|| compare_tree_int (len1, TREE_STRING_LENGTH (str1)
- tree_to_uhwi (off1)) > 0
Bernd.