[PATCH] implement -Wformat-diag, v2
Martin Sebor
msebor@gmail.com
Tue Jun 18 19:21:00 GMT 2019
On 6/18/19 12:59 PM, Jeff Law wrote:
> On 5/22/19 10:42 AM, Martin Sebor wrote:
>> Attached is a revised implementation of the -Wformat-diag checker
>> incorporating the feedback I got on the first revision.
>>
>> Martin
>>
>> gcc-wformat-diag-checker.diff
>>
>> gcc/c-family/ChangeLog:
>>
>> * c-format.c (function_format_info::format_type): Adjust type.
>> (function_format_info::is_raw): New member.
>> (decode_format_type): Adjust signature. Handle "raw" diag attributes.
>> (decode_format_attr): Adjust call to decode_format_type.
>> Avoid a redundant call to convert_format_name_to_system_name.
>> Avoid abbreviating the word "arguments" in a diagnostic.
>> (format_warning_substr): New function.
>> (avoid_dollar_number): Quote dollar sign in a diagnostic.
>> (finish_dollar_format_checking): Same.
>> (check_format_info): Same.
>> (struct baltoks_t): New.
>> (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>> (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>> functions.
>> (check_format_info_main): Call check_plain. Use baltoks_t. Call
>> maybe_diag_unbalanced_tokens.
>> (handle_format_attribute): Spell out the word "arguments" in
>> a diagnostic.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/format/gcc_diag-11.c: New test.
> High level comment. This is painful to read. But that's probably an
> artifact of trying to cobble together C code to parse strings and codify
> the conventions. ie, it's likely inherent for the problem you're
> trying to solve.
It wasn't exactly a lot of fun to write either. I suspect it
would have been even worse if I had used regular expressions.
It is more complicated than strictly necessary because it's
trying to balance usability of the warning with efficiency.
(Although I'm sure both could be improved.)
>> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
>> index a7f76c1c01d..713fce442c9 100644
>> --- a/gcc/c-family/c-format.c
>> +++ b/gcc/c-family/c-format.c
>> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree args,
>> {
>> const char *p = IDENTIFIER_POINTER (format_type_id);
>>
>> - p = convert_format_name_to_system_name (p);
>> + info->format_type = decode_format_type (p, &info->is_raw);
>>
>> - info->format_type = decode_format_type (p);
>> -
>> if (!c_dialect_objc ()
>> && info->format_type == gcc_objc_string_format_type)
>> {
> Did you mean to drop the call to convert_format_name_to_system_name?
Yes, it's redundant (it's the first thing decode_format_type does).
>> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
>> return true;
>> }
>>
>> +
>> +/* Helper for initializing global token_t arrays below. */
>> +#define NAME(name) { name, sizeof name - 1, NULL }
>> +
>> +/* C/C++ operators that are expected to be quoted within the format
>> + string. */
>> +
>> +static const token_t c_opers[] =
>> + {
>> + NAME ("!="), NAME ("%="), NAME ("&&"), NAME ("&="), NAME ("*="),
>> + NAME ("++"), NAME ("+="), NAME ("--"), NAME ("-="), NAME ("->"),
>> + NAME ("/="), NAME ("<<"), NAME ("<<="), NAME ("<="), NAME ("=="),
>> + NAME (">="), NAME (">>="), NAME (">>"), NAME ("?:"), NAME ("^="),
>> + NAME ("|="), NAME ("||")
>> + };
> This clearly isn't a full list of operators. Is there a particular
> reason why we aren't diagnosing others. I guess you're catching the
> single character operators via the ISPUNCT checks? That seems a little
> loose (@ isn't an operator for example). It may be OK in practice though.
Yes, it only handles two-character operators and its only purpose
is to make diagnostics more readable and less repetitive (otherwise
we'd get one for each occurrence of the punctuator). I think @ is
an operator in Objective-C (I only know this because I fixed a few
instances of it there).
>> +
>> + if (nchars)
>> + {
>> + /* This is the most common problem: go the extra mile to decribe
> s/decribe/describe/
>
>
>
>> +
>> +static void
>> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
>> + const char *orig_format_chars,
>> + tree format_string_cst,
>> + baltoks_t &baltoks)
> Needs a function comment.
>
>
>> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
>>
>> init_dollar_format_checking (info->first_arg_num, first_fillin_param);
>>
>> + /* In GCC diagnostic functions check plain directives (substrings within
>> + the format string that don't start with %) for quoting and punctuations
>> + problems. */
>> + bool ck_plain = (!info->is_raw
>> + && (info->format_type == gcc_diag_format_type
>> + || info->format_type == gcc_tdiag_format_type
>> + || info->format_type == gcc_cdiag_format_type
>> + || info->format_type == gcc_cxxdiag_format_type));
>> +
>> while (*format_chars != 0)
>> {
>> - if (*format_chars++ != '%')
>> + if (ck_plain)
>> + format_chars = check_plain (format_string_loc,
>> + format_string_cst,
>> + orig_format_chars, format_chars,
>> + baltoks);
>> +
>> + if (*format_chars == 0 || *format_chars++ != '%')
>> continue;
>> +
>> if (*format_chars == 0)
> Isn't the second test of *format_chars == 0 dead now?
It's there in case ck_plain is false.
> I'm going to throw this into my tester and see what, if anything, pops
> out while you fixup the nits above. Assuming there isn't anything
> major, my inclination is to go forward. We may over time improve the
> rules around diagnostics which will require iteration here, but this is
> clearly a step forward.
The warning is prevented from causing errors for now so with
the exception of bugs (ICEs or the suppression not working) it
shouldn't cause any noticeable fallout. What I would expect
is -Wformat-diag instances in build logs pointing out all
the remaining violations that are yet to be fixed. Especially
in back-end code for targets other than x86_64.
Martin
More information about the Gcc-patches
mailing list