[PATCH] implement -Wformat-diag, v2

Martin Sebor msebor@gmail.com
Tue Jun 18 19:21:00 GMT 2019


On 6/18/19 12:59 PM, Jeff Law wrote:
> On 5/22/19 10:42 AM, Martin Sebor wrote:
>> Attached is a revised implementation of the -Wformat-diag checker
>> incorporating the feedback I got on the first revision.
>>
>> Martin
>>
>> gcc-wformat-diag-checker.diff
>>
>> gcc/c-family/ChangeLog:
>>
>> 	* c-format.c (function_format_info::format_type): Adjust type.
>> 	(function_format_info::is_raw): New member.
>> 	(decode_format_type): Adjust signature.  Handle "raw" diag attributes.
>> 	(decode_format_attr): Adjust call to decode_format_type.
>> 	Avoid a redundant call to convert_format_name_to_system_name.
>> 	Avoid abbreviating the word "arguments" in a diagnostic.
>> 	(format_warning_substr): New function.
>> 	(avoid_dollar_number): Quote dollar sign in a diagnostic.
>> 	(finish_dollar_format_checking): Same.
>> 	(check_format_info): Same.
>> 	(struct baltoks_t): New.
>> 	(c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>> 	(maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>> 	functions.
>> 	(check_format_info_main): Call check_plain.  Use baltoks_t.  Call
>> 	maybe_diag_unbalanced_tokens.
>> 	(handle_format_attribute): Spell out the word "arguments" in
>> 	a diagnostic.
>>
>> gcc/testsuite/ChangeLog:
>> 	* gcc.dg/format/gcc_diag-11.c: New test.
> High level comment.  This is painful to read.  But that's probably an
> artifact of trying to cobble together C code to parse strings and codify
> the conventions.    ie, it's likely inherent for the problem you're
> trying to solve.

It wasn't exactly a lot of fun to write either.  I suspect it
would have been even worse if I had used regular expressions.
It is more complicated than strictly necessary because it's
trying to balance usability of the warning with efficiency.
(Although I'm sure both could be improved.)

>> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
>> index a7f76c1c01d..713fce442c9 100644
>> --- a/gcc/c-family/c-format.c
>> +++ b/gcc/c-family/c-format.c
>> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree args,
>>       {
>>         const char *p = IDENTIFIER_POINTER (format_type_id);
>>   
>> -      p = convert_format_name_to_system_name (p);
>> +      info->format_type = decode_format_type (p, &info->is_raw);
>>   
>> -      info->format_type = decode_format_type (p);
>> -
>>         if (!c_dialect_objc ()
>>   	   && info->format_type == gcc_objc_string_format_type)
>>   	{
> Did you mean to drop the call to convert_format_name_to_system_name?

Yes, it's redundant (it's the first thing decode_format_type does).

>> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
>>     return true;
>>   }
>>   
>> +
>> +/* Helper for initializing global token_t arrays below.  */
>> +#define NAME(name) { name, sizeof name - 1, NULL }
>> +
>> +/* C/C++ operators that are expected to be quoted within the format
>> +   string.  */
>> +
>> +static const token_t c_opers[] =
>> +  {
>> +   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
>> +   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
>> +   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
>> +   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
>> +   NAME ("|="), NAME ("||")
>> +  };
> This clearly isn't a full list of operators.  Is there a particular
> reason why we aren't diagnosing others.  I guess you're catching the
> single character operators via the ISPUNCT checks?  That seems a little
> loose (@ isn't an operator for example).  It  may be OK in practice though.

Yes, it only handles two-character operators and its only purpose
is to make diagnostics more readable and less repetitive (otherwise
we'd get one for each occurrence of the punctuator). I think @ is
an operator in Objective-C (I only know this because I fixed a few
instances of it there).

>> +
>> +  if (nchars)
>> +    {
>> +      /* This is the most common problem: go the extra mile to decribe
> s/decribe/describe/
> 
> 
> 
>> +
>> +static void
>> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
>> +			      const char *orig_format_chars,
>> +			      tree format_string_cst,
>> +			      baltoks_t &baltoks)
> Needs a function comment.
> 
> 
>> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
>>   
>>     init_dollar_format_checking (info->first_arg_num, first_fillin_param);
>>   
>> +  /* In GCC diagnostic functions check plain directives (substrings within
>> +     the format string that don't start with %) for quoting and punctuations
>> +     problems.  */
>> +  bool ck_plain = (!info->is_raw
>> +		   && (info->format_type == gcc_diag_format_type
>> +		       || info->format_type == gcc_tdiag_format_type
>> +		       || info->format_type == gcc_cdiag_format_type
>> +		       || info->format_type == gcc_cxxdiag_format_type));
>> +
>>     while (*format_chars != 0)
>>       {
>> -      if (*format_chars++ != '%')
>> +      if (ck_plain)
>> +	format_chars = check_plain (format_string_loc,
>> +				    format_string_cst,
>> +				    orig_format_chars, format_chars,
>> +				    baltoks);
>> +
>> +      if (*format_chars == 0 || *format_chars++ != '%')
>>   	continue;
>> +
>>         if (*format_chars == 0)
> Isn't the second test of *format_chars == 0 dead now?

It's there in case ck_plain is false.

> I'm going to throw this into my tester and see what, if anything, pops
> out while you fixup the nits above.  Assuming there isn't anything
> major, my inclination is to go forward.  We may over time improve the
> rules around diagnostics which will require iteration here, but this is
> clearly a step forward.

The warning is prevented from causing errors for now so with
the exception of bugs (ICEs or the suppression not working) it
shouldn't cause any noticeable fallout.  What I would expect
is -Wformat-diag instances in build logs pointing out all
the remaining violations that are yet to be fixed.  Especially
in back-end code for targets other than x86_64.

Martin



More information about the Gcc-patches mailing list