[PATCH] detect missing nuls in address of const char (PR 87756)

Martin Sebor msebor@gmail.com
Wed Nov 7 00:05:00 GMT 2018


Jeff, I'd like to go ahead and commit the patch as is.  I believe
the use of the default argument is appropriate and in line with
GCC practice.  Please let me know if you have strong objections.
If I don't hear any I will proceed later this week

Thanks
Martin

On 10/30/2018 10:38 AM, Martin Sebor wrote:
> On 10/30/2018 09:54 AM, Jeff Law wrote:
>> On 10/30/18 9:44 AM, Martin Sebor wrote:
>>> On 10/30/2018 09:27 AM, Jeff Law wrote:
>>>> On 10/29/18 5:51 PM, Martin Sebor wrote:
>>>>> The missing nul detection fails when the argument of the %s or
>>>>> similar sprintf directive is the address of a non-nul character
>>>>> constant such as in:
>>>>>
>>>>>   const char c = 'a';
>>>>>   int f (void)
>>>>>   {
>>>>>     return snprintf (0, 0, "%s", &c);
>>>>>   }
>>>>>
>>>>> This is because the string_constant function only succeeds for
>>>>> arguments that refer to STRRING_CSTs, not to individual characters.
>>>>>
>>>>> For the same reason, calls to memchr() such as the one below aren't
>>>>> folded into constants:
>>>>>
>>>>>   const char d = '\0';
>>>>>   void* g (void)
>>>>>   {
>>>>>     return memchr (&d, 0, 1);
>>>>>   }
>>>>>
>>>>> To detect and diagnose the missing nul in the first example and
>>>>> to fold the second, the attached patch modifies string_constant
>>>>> to return a synthesized STRING_CST object for such references
>>>>> (while also indicating whether such an object is properly
>>>>> nul-terminated).
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> gcc-87756.diff
>>>>>
>>>>> PR tree-optimization/87756 - missing unterminated argument warning
>>>>> using address of a constant character
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>     PR tree-optimization/87756
>>>>>     * expr.c (string_constant): Synthesize a string literal from
>>>>>     the address of a constant character.
>>>>>     * tree.c (build_string_literal): Add an argument.
>>>>>     * tree.h (build_string_literal): Same.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>     PR tree-optimization/87756
>>>>>     * gcc.dg/builtin-memchr-2.c: New test.
>>>>>     * gcc.dg/builtin-memchr-3.c: Same.
>>>>>     * gcc.dg/warn-sprintf-no-nul-2.c: Same.
>>>>>
>>>>> Index: gcc/expr.c
>>>>> ===================================================================
>>>>> --- gcc/expr.c    (revision 265496)
>>>>> +++ gcc/expr.c    (working copy)
>>>>> @@ -11484,18 +11484,40 @@ string_constant (tree arg, tree
>>>>> *ptr_offset, tree
>>>>>      offset = off;
>>>>>      }
>>>>>
>>>>> -  if (!init || TREE_CODE (init) != STRING_CST)
>>>>> +  if (!init)
>>>>>      return NULL_TREE;
>>>>>
>>>>> +  *ptr_offset = offset;
>>>>> +
>>>>> +  tree eltype = TREE_TYPE (init);
>>>>> +  tree initsize = TYPE_SIZE_UNIT (eltype);
>>>>>    if (mem_size)
>>>>> -    *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (init));
>>>>> +    *mem_size = initsize;
>>>>> +
>>>>>    if (decl)
>>>>>      *decl = array;
>>>>>
>>>>> -  gcc_checking_assert (tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE
>>>>> (init)))
>>>>> -               >= TREE_STRING_LENGTH (init));
>>>>> +  if (TREE_CODE (init) == INTEGER_CST)
>>>>> +    {
>>>>> +      /* For a reference to (address of) a single constant character,
>>>>> +     store the native representation of the character in
>>>>> CHARBUF.   */
>>>>> +      unsigned char charbuf[MAX_BITSIZE_MODE_ANY_MODE /
>>>>> BITS_PER_UNIT];
>>>>> +      int len = native_encode_expr (init, charbuf, sizeof charbuf,
>>>>> 0);
>>>>> +      if (len > 0)
>>>>> +    {
>>>>> +      /* Construct a string literal with elements of ELTYPE and
>>>>> +         the representation above.  Then strip
>>>>> +         the ADDR_EXPR (ARRAY_REF (...)) around the STRING_CST.  */
>>>>> +      init = build_string_literal (len, (char *)charbuf, eltype);
>>>>> +      init = TREE_OPERAND (TREE_OPERAND (init, 0), 0);
>>>>> +    }
>>>>> +    }
>>>>>
>>>>> -  *ptr_offset = offset;
>>>>> +  if (TREE_CODE (init) != STRING_CST)
>>>>> +    return NULL_TREE;
>>>>> +
>>>>> +  gcc_checking_assert (tree_to_shwi (initsize) >= TREE_STRING_LENGTH
>>>>> (init));
>>>>> +
>>>>>    return init;
>>>>>  }
>>>>>
>>>>> Index: gcc/tree.c
>>>>> ===================================================================
>>>>> --- gcc/tree.c    (revision 265496)
>>>>> +++ gcc/tree.c    (working copy)
>>>>
>>>>> Index: gcc/tree.h
>>>>> ===================================================================
>>>>> --- gcc/tree.h    (revision 265496)
>>>>> +++ gcc/tree.h    (working copy)
>>>>> @@ -4194,7 +4194,7 @@ extern tree
>>>>> build_call_expr_internal_loc_array (lo
>>>>>  extern tree maybe_build_call_expr_loc (location_t, combined_fn, tree,
>>>>>                         int, ...);
>>>>>  extern tree build_alloca_call_expr (tree, unsigned int,
>>>>> HOST_WIDE_INT);
>>>>> -extern tree build_string_literal (int, const char *);
>>>>> +extern tree build_string_literal (int, const char *, tree =
>>>>> char_type_node);
>>>>>
>>>>>  /* Construct various nodes representing data types.  */
>>>> There's only about a dozen calls to build_string_literal.  Instead of
>>>> defaulting the argument, just fix them.    OK with that change.  Make
>>>> sure to catch those in config/{rs6000,i386}/ and cp/
>>>
>>> Why?  Default arguments (and overloading) exist in C++ to deal
>>> with just this case: to avoid having to provide the common
>>> argument value while letting callers provide a different value
>>> when they need to.  What purpose will it serve to make these
>>> unnecessary changes and to force new callers to provide
>>> the default argument value?  It will only make using
>>> the function more error-prone and its callers harder
>>> to read.I find them much harder to read especially once you get more
>>> than one.
>> In cases where we have a small number of call sites we should just fix
>> them.  I think that we're well in that range here.  If there's a large
>> number of call sites, then overloading via default args makes plenty of
>> sense.
>
> Sorry, but I still don't see why or agree with the rule.  There
> is just one call site out of the 14 that provides a value other
> than the default.  There are APIs in GCC with default arguments
> with fewer calls than that.
>
> Why do you think we should change the other 14 callers only to have
> the function do for them what it does now, or that it's a good rule
> to follow in general?  And where is the threshold between a small
> and large number of call sites?  What if I'm adding a new API with
> just a handful of callers?
>
> Martin



More information about the Gcc-patches mailing list