This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )


On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> On 08/02/18 04:44, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
>> Bernd, as I mentioned Wednesday, the patch supersedes
>> yours here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>
> 
> No problem, but I hope you understand, that I still uphold
> my patch.
> 
> So we have two patches now:
> - mine, fixing a wrong code bug,
> - yours, implementing a new warning and fixing a wrong
> code bug at the same time.
> 
> I will add a few comments to your patch below.
[ ... ]

So a lot of the comments are out of date, presumably because Martin
fixed the issues you pointed out in his second version of the patch.
But there's still some useful nuggets in your comments that are still
relevant.

FYI it appears that sometimes you comment above a chunk of code, and
other times below.  That makes it exceedingly difficult to figure out
the issue you're trying to raise.


> 
>> Martin
>>
>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>> Attached is an updated version of the patch that handles more
>>> instances of calling strlen() on a constant array that is not
>>> a nul-terminated string.
>>>
>>> No other functions except strlen are explicitly handled yet,
>>> and neither are constant arrays with braced-initializer lists
>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>> an independent solution for those (bug 86552).  Once those
>>> are handled the warning will be able to detect those as well.
>>>
>>> Tested on x86_64-linux.
>>>
>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>
>>>> The fix for bug 86532 has been checked in so this enhancement
>>>> can now be applied on top of it (with only minor adjustments).
>>>>
>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>> GCC silently accepts constant character arrays with no
>>>>> terminating nul as arguments to strlen (and other string
>>>>> functions).
>>>>>
>>>>> The attached patch is a first step in detecting these kinds
>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>> The next step is to modify all other handlers of built-in
>>>>> functions to detect the same problem (not part of this patch).
>>>>> Yet another step is to detect these problems in arguments
>>>>> initialized using the non-string form:
>>>>>
>>>>>   const char a[] = { 'a', 'b', 'c' };
>>>>>
>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>> (I tested it with an earlier version of that patch so there
>>>>> is code in the context that does not appear in the latest
>>>>> version of the other diff).
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>
>>
>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>> PR tree-optimization/86711 - wrong folding of memchr
>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* builtins.h (warn_string_no_nul): Declare..
>> 	(c_strlen): Add argument.
>> 	* builtins.c (warn_string_no_nul): New function.
>> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
>> 	(fold_builtin_1): Adjust.
>> 	(string_length): Add argument and use it.
>> 	(c_strlen): Same.
>> 	(expand_builtin_strlen): Detect missing nul.
>> 	* expr.c (string_constant): Add arguments.  Detect missing nul
>> 	terminator and outermost declaration it's missing in.
>> 	* expr.h (string_constant): Add argument.
>> 	* fold-const.c (c_getstr): Change argument to bool*, rename
>> 	other arguments.
>> 	* fold-const-call.c (fold_const_call): Detect missing nul.
>> 	* gimple-fold.c (get_range_strlen): Add argument.
>> 	(get_maxval_strlen): Adjust.
>> 	* gimple-fold.h (get_range_strlen): Add argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* gcc.c-torture/execute/memchr-1.c: New test.
>> 	* gcc.c-torture/execute/pr86714.c: New test.
>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index aa3e0d8..f4924d5 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>  static rtx expand_builtin_expect (tree, rtx);
>>  static tree fold_builtin_constant_p (tree);
>>  static tree fold_builtin_classify_type (tree);
>> -static tree fold_builtin_strlen (location_t, tree, tree);
>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>  static tree fold_builtin_inf (location_t, tree, int);
>>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>  static bool validate_arg (const_tree, enum tree_code code);
>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>    return n;
>>  }
>>  
>> +/* For a call expression EXP to a function that expects a string argument,
>> +   issue a diagnostic due to it being a called with an argument NONSTR
>> +   that is a character array with no terminating NUL.  */
>> +
>> +void
>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>> +{
>> +  loc = expansion_point_location_if_in_system_header (loc);
>> +
>> +  bool warned;
>> +  if (exp)
>> +    {
>> +      if (!fndecl)
>> +	fndecl = get_callee_fndecl (exp);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%K%qD argument missing terminating nul",
>> +			   exp, fndecl);
>> +    }
>> +  else
>> +    {
>> +      gcc_assert (fndecl);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%qD argument missing terminating nul",
>> +			   fndecl);
>> +    }
>> +
>> +  if (warned && DECL_P (nonstr))
>> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>> +}
>> +
>>  /* Compute the length of a null-terminated character string or wide
>>     character string handling character sizes of 1, 2, and 4 bytes.
>>     TREE_STRING_LENGTH is not the right way because it evaluates to
>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>     accesses.  Note that this implies the result is not going to be emitted
>>     into the instruction stream.
>>  
>> +   When ARR is non-null and the string is not properly nul-terminated,
>> +   set *ARR to the declaration of the outermost constant object whose
>> +   initializer (or one of its elements) is not nul-terminated.
>> +
>>     The value returned is of type `ssizetype'.
>>  
>>     Unfortunately, string_constant can't access the values of const char
>>     arrays with initializers, so neither can we do so here.  */
>>  
>>  tree
>> -c_strlen (tree src, int only_value)
>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>  {
>>    STRIP_NOPS (src);
>> +
>> +  /* Used to detect non-nul-terminated strings in subexpressions
>> +     of a conditional expression.  When ARR is null, point it at
>> +     one of the elements for simplicity.  */
>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>> +  if (!arr)
>> +    arr = arrs;
> 
> now arr is always != NULL
Right.  It's renamed in the follow-up patch from Martin, but yes, it's
always non-null.  Note in this case your comment is after the code
you're referring to.

> 
>> +
>>    if (TREE_CODE (src) == COND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>      {
>> -      tree len1, len2;
>> -
>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>        if (tree_int_cst_equal (len1, len2))
>> -	return len1;
>> +	{
> 
> funny, if called with NULL *arr and arrs[0] alias each other.

> 
>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>> +	  return len1;
>> +	}
>>      }

And in this case it looks like your comment is before the code you're
commenting about.  It was fairly obvious in this case because the code
prior to your "funny, if called..." comment didn't reference arr or arrs
at all.

And more importantly, why are you concerned about the aliasing?





>>  
>>    if (TREE_CODE (src) == COMPOUND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
>> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>  
>>    location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>  
>>    /* Offset from the beginning of the string in bytes.  */
>>    tree byteoff;
>> -  src = string_constant (src, &byteoff);
>> -  if (src == 0)
>> -    return NULL_TREE;
>> +  /* Set if array is nul-terminated, false otherwise.  */
>> +  bool nulterm;
> 
> note arr is always != null or pointing to arrs[0].
> 
>> +  src = string_constant (src, &byteoff, &nulterm, arr);
>> +  if (!src)
>> +    {
>> +      *arr = arrs[0] ? arrs[0] : arrs[1];
>> +      return NULL_TREE;
>> +    }
>> +
>> +  /* Clear *ARR when the string is nul-terminated.  It should be
>> +     of no interest to callers.  */
>> +  if (nulterm)
>> +    *arr = NULL_TREE;
>>  
>>    /* Determine the size of the string element.  */
>>    unsigned eltsize
>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>  	maxelts = maxelts / eltsize - 1;
>>        }
>>  
>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>> +     is stored in (as in const char a[3] = "123";  */
> 
> note arr is always != NULL, thus this if is never taken.
> 
>> +  if (!arr && maxelts < strelts)
>> +    return NULL_TREE;
>> +
Right.  And I think that check is gone in the second version of Martin's
patch.


>>    /* PTR can point to the byte representation of any string type, including
>>       char* and wchar_t*.  */
>>    const char *ptr = TREE_STRING_POINTER (src);
>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>        offsave = fold_convert (ssizetype, offsave);
>>        tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>  				      build_int_cst (ssizetype, len * eltsize));
> 
> this computation is wrong, it computes not in units of eltsize,
> I am however not sure if it is really good that this function tries to
> compute strlen of wide character strings.
Note that in the second version of Martin's patch eltsize will always be
1 when we get here.  Furthermore, the multiplication by eltsize is gone.
 It looks like you switched back to commenting after the code for that
comment, but then immediately thereafter...


> 
> That said, please fix this computation first, in a different patch
> instead of just fixing the indentation. (I know I pointed that lines are too
> long here, but that was before I realized that the whole length computation
> here is wrong).
> 
>> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>> +				     offsave);
>>        return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>  			      build_zero_cst (ssizetype));
>>      }
>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>       Since ELTOFF is our starting index into the string, no further
>>       calculation is needed.  */
The comment shown above appears to refer to the code below the comment.
Again, this makes it exceedingly confusing to understand your comments
and take appropriate action.




> 
> What are you fixing here, I think that was another bug.
> If this fixes something then it should be in a different patch,
> just handling this.
> 
>>    unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>> -				maxelts - eltoff);
>> +				strelts - eltoff);
>>  
>>    return ssize_int (len);
>>  }
I'm guessing the comment in this case refers to the code after.
Presumably questioning the change from maxelts to strelts.




>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>  
>>    struct expand_operand ops[4];
>>    rtx pat;
>> -  tree len;
>>    tree src = CALL_EXPR_ARG (exp, 0);
>>    rtx src_reg;
>>    rtx_insn *before_strlen;
>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>    unsigned int align;
>>  
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, &array);
> 
> You know the c_strlen tries to compute wide character sizes,
> but strlen does not do that, strlen (L"abc") should give 1
> (or 0 on a BE machine)
> I wonder if that is correct.
So I think this is fixed by your change which restored the default
behavior of c_strlen to count bytes.  Which restores the behavior of
c_strlen to match that of the strlen library call.

So for something like L"abc", the right value is 1 or 0 for LE and BE
targets respectively.



> 
>>    if (len)
>> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>> +    {
>> +      if (array)
>> +	{
>> +	  /* Array refers to the non-nul terminated constant array
>> +	     whose length is attempted to be computed.  */
> 
> I really wonder if it would not make more sense to have a
> nonterminated_string_constant_p instead.
Rather than a boolean I think we want a tree * so that we can bubble up
more information to the caller WRT non-terminated strings.

> 
> Last time I wanted to implement a warning in expand I faced the
> problem that inlined functions will get one warning per invocation?
Yea, but that's just the way things are.  I don't think it's really an
issue for the patches we're looking at right now.

>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>    return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>  }
>>  
>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>  
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>      return NULL_TREE;
>>    else
>>      {
>> -      tree len = c_strlen (arg, 0);
>> -
>> +      tree arr = NULL_TREE;
>> +      tree len = c_strlen (arg, 0, &arr);
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
Given your fix to have c_strlen count bytes by default, I think we're OK
here.

> 
>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>  	return NULL_TREE;
>>  
>>        tree offset;
>> -      if (tree str = string_constant (arg0, &offset))
>> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>  	{
>>  	  /* Avoid pointers to arrays (see bug 86622).  */
>>  	  if (POINTER_TYPE_P (TREE_TYPE (arg))
>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (TREE_CODE (array) == STRING_CST)
> 
> Well, actually I think there _are_ STING_CSTs which are not null terminated.
> Maybe not in C. But Fortran, Ada, Go...
> 
>>      {
>>        *ptr_offset = fold_convert (sizetype, offset);
>> +      if (nulterm)
>> +	*nulterm = true;
>> +      if (decl)
>> +	*decl = NULL_TREE;
>>        return array;
>>      }
So your comment seems to be referring to the code above the comment as
well as below.  Confusing.  Consistency really helps.

Are we at a point where we're ready to declare STRING_CSTs as always
being properly terminated?  If not the hunk of code above needs some
rethinking since we can't guarantee the string is properly terminated.


>>  
>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>      return NULL_TREE;
>>  
>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>> +
> 
> I don't understand why this is necessary at all.
> It looks way too complicated, to say the least.
> 
> TREE_TYPE (init) has already the type of the member.
> 
>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>> +     the size of the character array within the aggregate.  */
>> +  tree ref = arg;
>> +  tree reftype = TREE_TYPE (arg);
>> +
>> +  if (TREE_CODE (ref) == MEM_REF)
>> +    {
>> +      ref = TREE_OPERAND (ref, 0);
>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>> +	{
>> +	  ref = TREE_OPERAND (ref, 0);
>> +	  reftype = TREE_TYPE (ref);
>> +	}
>> +    }
>> +  else
>> +    while (TREE_CODE (ref) == ARRAY_REF)
>> +      {
>> +	reftype = TREE_TYPE (ref);
>> +	ref = TREE_OPERAND (ref, 0);
>> +      }
>> +
>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>> +    reftype = TREE_TYPE (ref);
>> +
>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>> +    {
>> +      tree next = TREE_TYPE (reftype);
>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>> +	{
>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>> +	    if (tree_fits_uhwi_p (size))
>> +	      array_elts = tree_to_uhwi (size);
> 
> so array_elts is measued in bytes.
So I'm guessing your comment about the code looking way too complicated
is referring to the code *after* your comment.  That code is not in the
v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
haven't checked the full kit to see if this code appears in a subsequent
patch.


> 
>> +	  break;
>> +	}
>> +
>> +      reftype = TREE_TYPE (reftype);
>> +    }
>> +
>> +  if (decl)
>> +    *decl = array;
>> +
>>    /* Avoid returning a string that doesn't fit in the array
>>       it is stored in, like
>>       const char a[4] = "abcde";
>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>    unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>    length = string_length (TREE_STRING_POINTER (init), charsize,
>>  			  length / charsize);
> 
> Some callers especially those where the wrong code happens, expect
> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
> But using string_length assume thy stop at the first nul char.
Example please?


> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> 
> but here you compare bytes with length which is measued un chars.
> 
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>      return NULL_TREE;
>>  
>>    *ptr_offset = offset;
Seems wrong.  But my eyes are glazing over badly.  I'm going to have to
look at this and your subsequent comments tomorrow.

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]