Bug 52952 - Wformat location info is bad (wrong column number)
Summary: Wformat location info is bad (wrong column number)
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: unknown
: P3 normal
Target Milestone: ---
Assignee: David Malcolm
URL:
Keywords: diagnostic, patch
: 60287 71360 (view as bug list)
Depends on: 56856
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-12 21:22 UTC by Manuel López-Ibáñez
Modified: 2020-11-12 22:56 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-09-08 00:00:00


Attachments
Pass around the location of the format string (4.61 KB, patch)
2012-05-21 23:21 UTC, Steven Bosscher
Details | Diff
loc_token_hash (1.67 KB, patch)
2013-03-30 12:32 UTC, Manuel López-Ibáñez
Details | Diff
add locations and offsets to format warnings (7.54 KB, patch)
2013-03-30 15:03 UTC, Manuel López-Ibáñez
Details | Diff
add locations and offsets to format warnings + loc_token hash (8.91 KB, patch)
2013-04-01 14:20 UTC, Manuel López-Ibáñez
Details | Diff
create locations from loc + offset (8.21 KB, patch)
2014-10-04 23:36 UTC, Manuel López-Ibáñez
Details | Diff
dynamically create locations from loc + offset (9.90 KB, patch)
2014-10-05 13:11 UTC, Manuel López-Ibáñez
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Manuel López-Ibáñez 2012-04-12 21:22:14 UTC
#include <stdio.h>
void f() {
   printf("%.*d");
}

$ gcc-4.8 -fsyntax-only  -Wformat format-strings.c
format-strings.c: In function 'f':
format-strings.c:3:4: warning: field precision specifier '.*' expects a matching 'int' argument [-Wformat]
    printf("%.*d");
    ^
format-strings.c:3:4: warning: format '%d' expects a matching 'int' argument [-Wformat]
    printf("%.*d");
    ^

$ clang-3.1 -fsyntax-only format-strings.c
format-strings.c:3:14: warning: '.*' specified field precision is missing a matching 'int' argument
   printf("%.*d");
           ~~^~
Comment 1 Steven Bosscher 2012-05-17 21:04:56 UTC
extern int printf (__const char *__restrict __format, ...);
void f(void) {
    printf("%.*d");
}
Comment 2 Steven Bosscher 2012-05-17 21:19:28 UTC
To fix this properly, the input location should be tracked for the format string. The location of the format string as argument to printf is available in 
c-family/c-format.c:check_format_info() but there is no location information for the characters within the string. The location for every character in the format string should be recorded to pin-point the right line/column (think e.g. for a broken long line).
Comment 3 Steven Bosscher 2012-05-21 23:15:31 UTC
What does clang report for this:

#include <stdio.h>
void f() {
   printf(
"%."
"*d");
}

?
Comment 4 Steven Bosscher 2012-05-21 23:21:04 UTC
Created attachment 27466 [details]
Pass around the location of the format string

First, admittedly rather lame, attempt at some improvement. With this patch, the warning now points to the format string, at least:

$ cat pr525952.c 
extern int printf (__const char *__restrict __format, ...);
void f(void) {
    printf(
   "%."
"*d");
}
$ ./cc1 -quiet -Wformat pr525952.c 
pr525952.c: In function ‘f’:
pr525952.c:4:4: warning: field precision specifier ‘.*’ expects a matching ‘int’ argument [-Wformat]
    "%."
    ^
pr525952.c:4:4: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat]

Handling broken-up format lines is rather more difficult.  Not sure how to tackle that.
Comment 5 Jakub Jelinek 2012-05-22 08:29:08 UTC
The format string could be even something like
   void f() {
         __builtin_printf(
            u8R"abcd(%.)abcd"
               "*d");
                  }
So, the question is, if we have a way to find from the source_location on the ADDR_EXPR around STRING_CST the original non-concatenated tokens, and given index into the (possibly) translated STRING_CST redo the lexing of those tokens, looking at which byte in the source tokens maps to that offset in the STRING_CST.
Comment 6 Manuel López-Ibáñez 2012-05-22 09:54:29 UTC
(In reply to comment #3)
> What does clang report for this:
> 
> #include <stdio.h>
> void f() {
>    printf(
> "%."
> "*d");
> }
> 
> ?

/tmp/webcompile/_2090_0.c:5:2: warning: '.*' specified field precision is missing a matching 'int' argument
"*d");
~^~
1 warning generated.

They have an awesome online demo: http://llvm.org/demo/
Comment 7 Manuel López-Ibáñez 2012-05-22 10:55:49 UTC
(In reply to comment #5)
> The format string could be even something like
>    void f() {
>          __builtin_printf(
>             u8R"abcd(%.)abcd"
>                "*d");
>                   }
> So, the question is, if we have a way to find from the source_location on the
> ADDR_EXPR around STRING_CST the original non-concatenated tokens, and given
> index into the (possibly) translated STRING_CST redo the lexing of those
> tokens, looking at which byte in the source tokens maps to that offset in the
> STRING_CST.

Is it possible to re-preprocess some tokens only? Do the line-maps keep track of the locations of the tokens which make up a single string? If not, we would need to create new source locations on-the-fly. Is this possible at all?
Comment 8 Manuel López-Ibáñez 2012-05-22 11:04:41 UTC
(In reply to comment #3)
> What does clang report for this:
> 
> #include <stdio.h>
> void f() {
>    printf(
> "%."
> "*d");
> }
> 
> ?

An even more interesting example is this:

#define FORMAT ".*d"
#include <stdio.h>
void f() {
   printf(
"%"
FORMAT);
}

for which Clang prints:

/tmp/webcompile/_17428_0.c:6:1: warning: '.*' specified field precision is missing a matching 'int' argument
FORMAT);
^
/tmp/webcompile/_17428_0.c:1:18: note: expanded from:
#define FORMAT ".*d"
                 ^
1 warning generated.

So either one keeps track of all source locations of all "interesting" characters within strings, which sounds infeasible. Or one needs to re-preprocess the format string, creating new locations on-the-fly. Dodji, is this possible?
Comment 9 dodji@seketeli.org 2012-05-24 18:38:40 UTC
"manu at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> a écrit:

> So either one keeps track of all source locations of all "interesting"
> characters within strings, which sounds infeasible. Or one needs to
> re-preprocess the format string, creating new locations on-the-fly. Dodji, is
> this possible?

With the current infrastructure, I fear we cannot re-process the format
string *after* the initial pre-processing phase is done, to create new
locations that we'd a in the line maps.

However, briefly looking at the source code, we might be able to pull
this whole shebang off, in a way.

I am thinking that in gcc/c-family/c-format.c:check_format_info_main, we
can arrange for the instance of format_kind_info (that is the result of
parsing the format string) to carry the virtual location  of the
beginning of the format string *and* the offset of the relevant
character we might want to warn about.

Then, later the routines that trigger the warning by analyzing the instance of
format_kind_info could be passed the relevant location for the beginning
of the of the format string as well as the byte offset of the relevant
character I talked about above.  At warning type, it could re-construct
the exact column where of that relevant character, with its byte offset
and the virtual location of the beginning of the format string.

Does that make sense?
Comment 10 Manuel López-Ibáñez 2012-05-24 19:05:26 UTC
(In reply to comment #9)
> "manu at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> a écrit:
> 
> > So either one keeps track of all source locations of all "interesting"
> > characters within strings, which sounds infeasible. Or one needs to
> > re-preprocess the format string, creating new locations on-the-fly. Dodji, is
> > this possible?
> 
> With the current infrastructure, I fear we cannot re-process the format
> string *after* the initial pre-processing phase is done, to create new
> locations that we'd a in the line maps.

Could you elaborate on the reasons for this? Is it impossible to create new locations on the fly? 

As a brute-force approach, we at least should be able to re-preproces the whole file, no? Could we do this by invoking libcpp directly rather than calling a command? 

> Does that make sense?

This implies that the diagnostics code would need to handle a byte offset, no? 

And I am not sure this will handle well the case of split strings and macro expansion, like Clang does.
Comment 11 dodji@seketeli.org 2012-05-24 20:30:10 UTC
"manu at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> a écrit:

>> With the current infrastructure, I fear we cannot re-process the format
>> string *after* the initial pre-processing phase is done, to create new
>> locations that we'd a in the line maps.
>
> Could you elaborate on the reasons for this? Is it impossible to create new
> locations on the fly? 

Not really, no.  It is not practical to insert new locations inside an
existing location map because that would imply to update (modify) all
the maps that come after the one you have modified.  Also, that would
invalidate all the locations that have been encoded by the maps that you
would have updated.

Basically, the current encoding of the map requires that a new location
encoding in a map must always be the last location of that map.  You
cannot insert a location in the "middle" of an existing map.

> As a brute-force approach, we at least should be able to re-preproces the whole
> file, no?

I guess that would take re-processing the whole compilation unit,
starting from the location map that you have changed.  And, just handling
locations wouldn't be enough, we'd need to basically re-tokenize the
files that are re-processed, because the locations are primarily carried
by instances of cpp_token, and we need the locations of these tokens to
be updated.

That doesn't seem practical.

> Could we do this by invoking libcpp directly rather than calling a
> command?

Yes, even though it wouldn't be practical, in my opinion.

>
>> Does that make sense?
>
> This implies that the diagnostics code would need to handle a byte
> offset, no?

Yes, probably.

> And I am not sure this will handle well the case of split strings and macro
> expansion, like Clang does.

Yeah.  Which makes me think that maybe we might want to introduce a new
way to represent string literal tokens in libcpp, that keeps the
underlying raw format.  There would be a character oriented iterator API
for that string literal representation.  And that iterator API could
provide its user with the file/line/column information for the current
character.  And one could pass any such iterator to some of the
diagnostic routines.
Comment 12 Manuel López-Ibáñez 2012-05-25 14:09:00 UTC
> 
> Basically, the current encoding of the map requires that a new location
> encoding in a map must always be the last location of that map.  You
> cannot insert a location in the "middle" of an existing map.

Just continuing with the thought-experiment... would it be possible to duplicate the current line-table, then start re-encoding locations from the last location before the interesting character, and simply stop when the interesting character is reached? Then switch line-tables on the fly before giving the diagnostic, and switch back to the original line-table after it.

I guess Clang has a different encoding for source locations, because it seems they are generating new location indexes on the fly.

> I guess that would take re-processing the whole compilation unit,
> starting from the location map that you have changed.  And, just handling
> locations wouldn't be enough, we'd need to basically re-tokenize the
> files that are re-processed, because the locations are primarily carried
> by instances of cpp_token, and we need the locations of these tokens to
> be updated.
> 
> That doesn't seem practical.

Practical because of complexity, ugliness or computation time?


> > And I am not sure this will handle well the case of split strings and macro
> > expansion, like Clang does.
> 
> Yeah.  Which makes me think that maybe we might want to introduce a new
> way to represent string literal tokens in libcpp, that keeps the
> underlying raw format.  There would be a character oriented iterator API
> for that string literal representation.  And that iterator API could
> provide its user with the file/line/column information for the current
> character.  And one could pass any such iterator to some of the
> diagnostic routines.

This sounds interesting, but I still do not understand how it can handle macro expansions. Could you elaborate?
Comment 13 Jakub Jelinek 2012-05-25 14:17:41 UTC
I guess the C/C++ FEs could for non-trivial string literals put into a hash table mapping from locus_t (of ADDR_EXPR around STRING_CST) to the first cpp token for that string, then the diagnostic code could go from there.
Trivial string literal above would be a string literal that doesn't have any prefixes (L/u/u8/U and variants with R), isn't contatenated from several parts
and didn't need to be translated.  So, printf ("%.*d"); (the common case) wouldn't have to be recorded, while printf (R"<<<(%)>>>" "." R"(*)" "d"); would need that.  For "trivial" string literals you'd just shift the locus by the offset, for non-trivial look up the tokens and process them again, looking at where the corresponding byte would appear to come from.
Comment 14 Manuel López-Ibáñez 2012-06-04 11:18:46 UTC
(In reply to comment #13)
> I guess the C/C++ FEs could for non-trivial string literals put into a hash
> table mapping from locus_t (of ADDR_EXPR around STRING_CST) to the first cpp
> token for that string, then the diagnostic code could go from there.

Where is the best place to store this hash table? And you say the C/C++ FEs, but it seems to me this would need to be done in libcpp, no?

> Trivial string literal above would be a string literal that doesn't have any
> prefixes (L/u/u8/U and variants with R), isn't contatenated from several parts
> and didn't need to be translated.  So, printf ("%.*d"); (the common case)
> wouldn't have to be recorded, while printf (R"<<<(%)>>>" "." R"(*)" "d"); would
> need that.  For "trivial" string literals you'd just shift the locus by the
> offset, for non-trivial look up the tokens and process them again, looking at
> where the corresponding byte would appear to come from.

What do you mean by "process them again"? Once we have a mapping of location_t to individual token, the only thing to be done is look inside the string for the relevant character, no?

Well, this seems to require several big parts:

First, handle locations + offsets in diagnostics. Perhaps we need new warning_at_offset(location, offset, ...) functions? Or should we encode the offset in location_t somehow? If an offset parameter is passed explicitly, the diagnostics code would require a lot of changes to pass the offset around. Not sure what is the best way to implement this and avoid massive changes.

Second, building the hash table discussed above. Where should it be built? Where should it be stored? 

Third, location+offset to expanded_location code. This is the part that will use the above hash table and reprocess the tokens. Exactly how it will work is not entirely clear to me.

Fourth, use the new infrastructure in Wformat warnings and other places that inspect strings. Perhaps the offset could also be re-used to implement ranges, but the diagnostics code would need to handle them differently when used for ranges and when used to specify a location within a string.

I added Tom Tromey to the CC, perhaps he has some ideas about how to implement this.
Comment 15 Manuel López-Ibáñez 2013-03-30 12:32:08 UTC
(In reply to comment #13)
> I guess the C/C++ FEs could for non-trivial string literals put into a hash
> table mapping from locus_t (of ADDR_EXPR around STRING_CST) to the first cpp
> token for that string, then the diagnostic code could go from there.
> Trivial string literal above would be a string literal that doesn't have any
> prefixes (L/u/u8/U and variants with R), isn't contatenated from several parts
> and didn't need to be translated.  So, printf ("%.*d"); (the common case)
> wouldn't have to be recorded, while printf (R"<<<(%)>>>" "." R"(*)" "d"); would
> need that.  For "trivial" string literals you'd just shift the locus by the
> offset, for non-trivial look up the tokens and process them again, looking at
> where the corresponding byte would appear to come from.

Jakub, I am trying to implement this idea, but I am not sure I got the hash_table right. It looks overly complicated to me. Could you take a look?
Comment 16 Manuel López-Ibáñez 2013-03-30 12:32:57 UTC
Created attachment 29753 [details]
loc_token_hash
Comment 17 Manuel López-Ibáñez 2013-03-30 12:40:24 UTC
BTW, I have a patch based on Steven's one that is almost working and for simple strings handles a location+offset. However, there are several issues:

* Cases like:

1: const str[] = "something %d";
2: printf(str);

will report a location at 2 plus offset of %d, not nice.

* I didn't transform the most obscure format warnings. Some of them have no easy location at hand and others I simply don't know what we want to point at.

* It would be extremely nice to update the testsuite to check the locations are correct. This is unfortunately a lot of boring work, so if I cannot get help to do this, I hope it is not a pre-condition for approval.
Comment 18 stevenb.gcc@gmail.com 2013-03-30 12:54:12 UTC
> * It would be extremely nice to update the testsuite to check the locations are
> correct. This is unfortunately a lot of boring work, so if I cannot get help to
> do this, I hope it is not a pre-condition for approval.

You mean for existing test cases? I will help with that, just say
what/where/when ;-)
Comment 19 Manuel López-Ibáñez 2013-03-30 13:26:19 UTC
Unfortunately, there are some stuff like this:

 <addr_expr 0x7ffff708f680
    type <pointer_type 0x7ffff7094690
        type <array_type 0x7ffff7094540 type <integer_type 0x7ffff6ef3dc8 char>
            BLK
            size <integer_cst 0x7ffff708f540 constant 280>
            unit size <integer_cst 0x7ffff708f620 constant 35>
            align 8 symtab 0 alias set -1 canonical type 0x7ffff7094540 domain <integer_type 0x7ffff70943f0>
            pointer_to_this <pointer_type 0x7ffff7094690>>
        unsigned DI
        size <integer_cst 0x7ffff6ecddc0 constant 64>
        unit size <integer_cst 0x7ffff6ecdde0 constant 8>
        align 64 symtab 0 alias set -1 canonical type 0x7ffff7094690>
    readonly constant
    arg 0 <string_cst 0x7ffff7059038 type <array_type 0x7ffff7094540>
        readonly constant static "%s:%d: %s: Assertion '%s' failed.\012\000">>

so we cannot assert that the format_expr has no unknown location.
Comment 20 Manuel López-Ibáñez 2013-03-30 15:02:11 UTC
(In reply to comment #18)
> > * It would be extremely nice to update the testsuite to check the locations are
> > correct. This is unfortunately a lot of boring work, so if I cannot get help to
> > do this, I hope it is not a pre-condition for approval.
> 
> You mean for existing test cases? I will help with that, just say
> what/where/when ;-)

I personally think that it would be better to spend our extremely limited human resources on adding more explicit locations and offsets to the various warnings and checking that they are correct. This can be done by running with my current patch the following:

make check-gcc RUNTESTFLAGS='format.exp=*.c --target_board="unix/-fdiagnostics-show-caret"'

and checking if the output is what one would expect.

If later we regress, then we could update/add testcases.
Comment 21 Manuel López-Ibáñez 2013-03-30 15:03:14 UTC
Created attachment 29755 [details]
add locations and offsets to format warnings

My current patch, untested except for a few testcases.
Comment 22 Manuel López-Ibáñez 2013-04-01 14:17:45 UTC
(In reply to comment #13)
> and didn't need to be translated.  So, printf ("%.*d"); (the common case)
> wouldn't have to be recorded, while printf (R"<<<(%)>>>" "." R"(*)" "d"); would

BTW, your testcase does not work for me and confuses GCC:

/home/manuel/loc_offset.c:7:21: error: unterminated raw string
   __builtin_printf (R"<<<(%)>>>" R"(*d)" );
                     ^
/home/manuel/loc_offset.c:7:3: error: expected expression at end of input
   __builtin_printf (R"<<<(%)>>>" R"(*d)" );
   ^
/home/manuel/loc_offset.c:7:3: error: expected declaration or statement at end of input

It is also very unclear to me how to re-process such a token stream in order to find out the real location+offset.

A further issue is that for:

#define FORMAT ".*d"
#include <stdio.h>
void f() {
   printf(
"%"
FORMAT);
}

the token representing "%" is overriden when we call cpp_get_token next time:

@@ -1003,10 +1004,13 @@ lex_string (const cpp_token *tok, tree *
      concatenation.  It is 'true' if we have seen an '@' before the
      current string, and 'false' if not.  We must see exactly one or
      zero '@' before each string.  */
   bool objc_at_sign_was_seen = false;

+  // This token is overwritten! How can that happen?
+  const cpp_token *prev_tok = tok;
+
  retry:
   tok = cpp_get_token (parse_in);
   switch (tok->type)
     {
     case CPP_PADDING:

so prev_tok above may or may not change by the call of cpp_get_token.
Comment 23 Manuel López-Ibáñez 2013-04-01 14:20:16 UTC
Created attachment 29765 [details]
add locations and offsets to format warnings + loc_token hash

This patch works perfect for simple sequences of plain strings and when the whole format string comes from a macro expansion.
Comment 24 Paolo Carlini 2013-04-01 14:32:27 UTC
(Nit: careful with the GCC coding style, eg, open curly bracket on a new line)
Comment 25 Manuel López-Ibáñez 2013-04-05 22:02:26 UTC
I am currently stuck on the three problems I described above and I cannot figure out a way to fix any of them:

* How to reprocess tokens that need to be translated/have prefixes.

* The issue that the current token may get modified when we call cpp_get_token next time (I could store the full token, but that seems a bit waste of memory).

* Cases like:

1: const str[] = "something %d";
2: printf(str);

where we don't have (or I cannot find) the location of "something %d".

Any ideas? 

I am also not sure if I implemented correctly the hash_table. 

Another possibility is to make the offset work only with simple strings, if there is something strange, simply set the offset to zero. It is a poor solution, but it is better than nothing.
Comment 26 Manuel López-Ibáñez 2014-02-20 12:05:11 UTC
*** Bug 60287 has been marked as a duplicate of this bug. ***
Comment 27 Manuel López-Ibáñez 2014-08-05 18:44:04 UTC
(In reply to Manuel López-Ibáñez from comment #25)
> * Cases like:
> 
> 1: const str[] = "something %d";
> 2: printf(str);

Note that clang is able to handle this:

manuel@gcc10:~$ clang -Wformat  format.c
format.c:4:19: warning: more '%' conversions than data arguments [-Wformat]
 __builtin_printf(str);
                  ^~~
format.c:3:33: note: format string is defined here
 const char str[] = "something %d";
                               ~^
Comment 28 Manuel López-Ibáñez 2014-08-05 18:52:43 UTC
Testcase:

void foo(void)
{
  char str[] = "something %d";
 __builtin_printf(str);

}
Comment 29 Manuel López-Ibáñez 2014-08-19 02:02:44 UTC
Author: manu
Date: Tue Aug 19 02:02:09 2014
New Revision: 214129

URL: https://gcc.gnu.org/viewcvs?rev=214129&root=gcc&view=rev
Log:
gcc/c-family/ChangeLog:

2014-08-19  Manuel López-Ibáñez  <manu@gcc.gnu.org>
	    Steven Bosscher  <steven@gcc.gnu.org>

	PR c/52952
	* c-format.c: Add extra_arg_loc and format_string_loc to struct
	format_check_results.
	(check_function_format): Use true and add comment for boolean
	argument.
	(finish_dollar_format_checking): Use explicit location when warning.
	(check_format_info): Likewise.
	(check_format_arg): Set extra_arg_loc and format_string_loc.
	(check_format_info_main): Use explicit location when warning.
	(check_format_types): Pass explicit location.
	(format_type_warning): Likewise.

gcc/testsuite/ChangeLog:

2014-08-19  Manuel López-Ibáñez  <manu@gcc.gnu.org>
	    Steven Bosscher  <steven@gcc.gnu.org>

	PR c/52952
	* gcc.dg/redecl-4.c: Add column markers.
	* gcc.dg/format/bitfld-1.c: Likewise.
	* gcc.dg/format/attr-2.c: Likewise.
	* gcc.dg/format/attr-6.c: Likewise.
	* gcc.dg/format/array-1.c: Likewise.
	* gcc.dg/format/attr-7.c: Likewise.
	* gcc.dg/format/asm_fprintf-1.c: Likewise.
	* gcc.dg/format/attr-4.c: Likewise.
	* gcc.dg/format/branch-1.c: Likewise.
	* gcc.dg/format/c90-printf-1.c: Likewise.


Modified:
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-format.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/format/array-1.c
    trunk/gcc/testsuite/gcc.dg/format/asm_fprintf-1.c
    trunk/gcc/testsuite/gcc.dg/format/attr-2.c
    trunk/gcc/testsuite/gcc.dg/format/attr-4.c
    trunk/gcc/testsuite/gcc.dg/format/attr-6.c
    trunk/gcc/testsuite/gcc.dg/format/attr-7.c
    trunk/gcc/testsuite/gcc.dg/format/bitfld-1.c
    trunk/gcc/testsuite/gcc.dg/format/branch-1.c
    trunk/gcc/testsuite/gcc.dg/format/c90-printf-1.c
    trunk/gcc/testsuite/gcc.dg/redecl-4.c
Comment 30 Manuel López-Ibáñez 2014-10-04 23:36:09 UTC
Created attachment 33647 [details]
create locations from loc + offset

This variant works for simple strings. However, it cannot handle even simple macros:

format.c:1:18: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
 #define FORMAT "%d"
                  ^
format.c:8:24: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
    __builtin_printf("a" FORMAT);
                        ^

whereas Clang can:

format.c:7:20: warning: more '%' conversions than data arguments [-Wformat]
  __builtin_printf(FORMAT);
                   ^
format.c:1:18: note: expanded from macro 'FORMAT'
#define FORMAT "%d"
                ~^
format.c:8:25: warning: more '%' conversions than data arguments [-Wformat]
   __builtin_printf("a" FORMAT);
                        ^
format.c:1:18: note: expanded from macro 'FORMAT'
#define FORMAT "%d"
                ~^
Comment 31 Manuel López-Ibáñez 2014-10-04 23:41:23 UTC
(In reply to Manuel López-Ibáñez from comment #30)
> Created attachment 33647 [details]
> create locations from loc + offset
> 
> This variant works for simple strings. However, it cannot handle even simple
> macros:

In addition, there is the issue that GCC does not track the location of initializers. Thus, I had to explicitly disable the offset computation in case of VAR_DECL.

Clang by comparison perfectly handles this case:

format.c:11:21: warning: more '%' conversions than data arguments [-Wformat]
   __builtin_printf(a);
                    ^
format.c:5:18: note: format string is defined here
const char a[] = FORMAT;
                 ^
format.c:1:18: note: expanded from macro 'FORMAT'
#define FORMAT "%d"
                ~^
Comment 32 Manuel López-Ibáñez 2014-10-05 13:11:38 UTC
Created attachment 33648 [details]
dynamically create locations from loc + offset
Comment 33 Manuel López-Ibáñez 2014-10-05 13:22:52 UTC
(In reply to Manuel López-Ibáñez from comment #32)
> Created attachment 33648 [details]
> dynamically create locations from loc + offset

This version is able to handle macros:

format.c: In function ‘foo’:
format.c:5:18: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
 #define FORMAT "%d"
                  ^
format.c:9:20: note: in definition of macro ‘FORMAT3’
 #define FORMAT3(x) x
                    ^
format.c:21:29: note: in expansion of macro ‘FORMAT’
   __builtin_printf(FORMAT3 (FORMAT));
                             ^
format.c:23:31: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
   __builtin_printf(FORMAT3 ("%d"));
                               ^
format.c:9:20: note: in definition of macro ‘FORMAT3’
 #define FORMAT3(x) x
                    ^
format.c:5:18: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
 #define FORMAT "%d"
                  ^
format.c:28:20: note: in expansion of macro ‘FORMAT’
   __builtin_printf(FORMAT);
                    ^

The approach followed is based on reserving space for one token in every macro map. If this is expensive, the alternative could be to reserve the virtual location for the additional token, but not reserve the memory until the virtual location is requested. The code assumes that the new location is immediately used, otherwise a subsequent location may overwrite it. The problem is that the virtual locations of macro maps are consecutive, thus we cannot simply dynamically create virtual locations without reserving some of them in advance.


Perhaps this is good enough for a first commit? 

Dodji, what do you think?
Comment 34 Manuel López-Ibáñez 2014-10-05 13:39:36 UTC
(In reply to Manuel López-Ibáñez from comment #27)
> (In reply to Manuel López-Ibáñez from comment #25)
> > * Cases like:
> > 
> > 1: const str[] = "something %d";
> > 2: printf(str);
> 
> Note that clang is able to handle this:
> 
> manuel@gcc10:~$ clang -Wformat  format.c
> format.c:4:19: warning: more '%' conversions than data arguments [-Wformat]
>  __builtin_printf(str);
>                   ^~~
> format.c:3:33: note: format string is defined here
>  const char str[] = "something %d";
>                                ~^

Note to myself:

Supporting this requires tracking the location of the initializer. Perhaps we could track this explicitly in var_decl. Or by adding an expression around the initializer.

Then, we would need to pass the additional location to the point of warning_at, so we can print the note when appropriate.
Comment 35 Manuel López-Ibáñez 2014-10-05 13:45:00 UTC
(In reply to Jakub Jelinek from comment #13)
> I guess the C/C++ FEs could for non-trivial string literals put into a hash
> table mapping from locus_t (of ADDR_EXPR around STRING_CST) to the first cpp
> token for that string, then the diagnostic code could go from there.
> Trivial string literal above would be a string literal that doesn't have any
> prefixes (L/u/u8/U and variants with R), isn't contatenated from several
> parts
> and didn't need to be translated.  So, printf ("%.*d"); (the common case)
> wouldn't have to be recorded, while printf (R"<<<(%)>>>" "." R"(*)" "d");
> would need that.  For "trivial" string literals you'd just shift the locus
> by the offset, for non-trivial look up the tokens and process them again,
> looking at where the corresponding byte would appear to come from.

Perhaps we could use ADHOC_LOCATIONS to implement this? That is, for multi-string/prefixed string constants, the adhoc data could contain the additional location. Then, on-the-fly offset locations could be generated by looking at each piece, if the offset is longer than the piece, then try the next piece and adjust the offset for the characters already seen in previous pieces. Simple strings will not have any adhoc data, so they will work as usual.
Comment 36 Manuel López-Ibáñez 2014-11-08 16:20:11 UTC
Latest patch: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00663.html

Joseph pointed out that it does not handle escape sequences. The only solution I can think of is to open the file and re-parse the string. However, it doesn't seem trivial to re-use libcpp to parse a string given as a char *. Any ideas how to achieve this or a better alternative?

Escape sequences are too common to ignore in a first implementation.
Comment 37 Manuel López-Ibáñez 2015-03-27 15:09:06 UTC
(In reply to dodji@seketeli.org from comment #9)
> "manu at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> a écrit:
> 
> > So either one keeps track of all source locations of all "interesting"
> > characters within strings, which sounds infeasible. Or one needs to
> > re-preprocess the format string, creating new locations on-the-fly. Dodji, is
> > this possible?
> 
> With the current infrastructure, I fear we cannot re-process the format
> string *after* the initial pre-processing phase is done, to create new
> locations that we'd a in the line maps.

I'm thinking whether we cannot simply add a new RENAME map on the fly just before emitting an error. We could even replace the source_location of the string with this new map starting location.
Comment 38 Manuel López-Ibáñez 2015-05-21 06:50:10 UTC
Author: manu
Date: Thu May 21 06:49:38 2015
New Revision: 223470

URL: https://gcc.gnu.org/viewcvs?rev=223470&root=gcc&view=rev
Log:
gcc/testsuite/ChangeLog:

2015-05-21  Manuel López-Ibáñez  <manu@gcc.gnu.org>

	PR c/52952
	* gcc.dg/redecl-4.c: Update column numbers.
	* gcc.dg/format/bitfld-1.c: Likewise.
	* gcc.dg/format/attr-2.c: Likewise.
	* gcc.dg/format/attr-6.c: Likewise.
	* gcc.dg/format/attr-7.c (baz): Likewise.
	* gcc.dg/format/asm_fprintf-1.c: Likewise.
	* gcc.dg/format/attr-4.c: Likewise.
	* gcc.dg/format/branch-1.c: Likewise.
	* gcc.dg/format/c90-printf-1.c: Likewise. Add tests for column
	locations within strings with embedded escape sequences.

gcc/c-family/ChangeLog:

2015-05-21  Manuel López-Ibáñez  <manu@gcc.gnu.org>
	
	PR c/52952
	* c-format.c (location_column_from_byte_offset): New.
	(location_from_offset): New.
	(struct format_wanted_type): Add offset_loc field.
	(check_format_info): Move handling of location for extra arguments
	closer to the point of warning.
	(check_format_info_main): Pass the result of location_from_offset
	to warning_at.
	(format_type_warning): Pass the result of location_from_offset
	to warning_at.


Modified:
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-format.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/format/asm_fprintf-1.c
    trunk/gcc/testsuite/gcc.dg/format/attr-2.c
    trunk/gcc/testsuite/gcc.dg/format/attr-4.c
    trunk/gcc/testsuite/gcc.dg/format/attr-6.c
    trunk/gcc/testsuite/gcc.dg/format/attr-7.c
    trunk/gcc/testsuite/gcc.dg/format/bitfld-1.c
    trunk/gcc/testsuite/gcc.dg/format/branch-1.c
    trunk/gcc/testsuite/gcc.dg/format/c90-printf-1.c
    trunk/gcc/testsuite/gcc.dg/redecl-4.c
Comment 39 Manuel López-Ibáñez 2015-05-21 07:06:32 UTC
A summary of what is still pending:

1. Handle macros

#define c               " %d "
  __builtin_printf(c, 0.5);

2. Handle non-contiguous strings:

  __builtin_printf(" %" "d ", 0.5);

3. Handle const arrays:

  const char a[] = " %d ";
  __builtin_printf(a, 0.5);


I have an idea on how to fix 1 and 2 but no idea how to fix 3.
Comment 40 Manuel López-Ibáñez 2015-09-08 15:07:33 UTC
Unreviewed patch that increases precision in some cases: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01373.html
Comment 41 Manuel López-Ibáñez 2016-06-09 20:19:30 UTC
*** Bug 71360 has been marked as a duplicate of this bug. ***
Comment 42 Manuel López-Ibáñez 2016-06-09 20:22:36 UTC
(In reply to Manuel López-Ibáñez from comment #39)
> 2. Handle non-contiguous strings:
> 
>   __builtin_printf(" %" "d ", 0.5);

Right now, we detect that the offset is outside the first string and give up. Supporting this might be just as easier as keeping track in location_column_from_byte_offset of the closing ", skipping to the opening " and keep counting until reaching the offset.

No time to do it myself but worth a try.
Comment 43 David Malcolm 2016-06-23 15:11:48 UTC
(In reply to Manuel López-Ibáñez from comment #40)
> Unreviewed patch that increases precision in some cases:
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01373.html

Looks like this was committed as r227975.
Comment 44 David Malcolm 2016-07-26 16:47:55 UTC
Candidate patch: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01733.html
Comment 45 David Malcolm 2016-08-08 20:10:51 UTC
Author: dmalcolm
Date: Mon Aug  8 20:10:19 2016
New Revision: 239253

URL: https://gcc.gnu.org/viewcvs?rev=239253&root=gcc&view=rev
Log:
Use class substring_loc in c-format.c (PR c/52952)

gcc/c-family/ChangeLog:
	PR c/52952
	* c-format.c: Include "diagnostic.h".
	(location_column_from_byte_offset): Delete.
	(location_from_offset): Delete.
	(format_warning_va): New function.
	(format_warning_at_substring): New function.
	(format_warning_at_char): New function.
	(check_format_arg): Capture location of format_tree and pass to
	check_format_info_main.
	(argument_parser): Add fields "start_of_this_format" and
	"format_string_cst".
	(flag_chars_t::validate): Add param "format_string_cst".  Convert
	warning_at call using location_from_offset to call to
	format_warning_at_char.
	(argument_parser::argument_parser): Add param "format_string_cst_"
	and use use it to initialize field "format_string_cst".
	Initialize new field "start_of_this_format".
	(argument_parser::read_format_flags): Convert warning_at call
	using location_from_offset to a call to format_warning_at_char.
	(argument_parser::read_any_format_left_precision): Likewise.
	(argument_parser::read_any_format_precision): Likewise.
	(argument_parser::read_any_other_modifier): Likewise.
	(argument_parser::find_format_char_info): Likewise, in three places.
	(argument_parser::parse_any_scan_set): Likewise, in one place.
	(argument_parser::handle_conversions): Likewise, in two places.
	(argument_parser::check_argument_type): Add param "fmt_param_loc"
	and use it to make a substring_loc.  Pass the latter to
	check_format_types.
	(check_format_info_main): Add params "fmt_param_loc" and
	"format_string_cst".  Convert warning_at calls using
	location_from_offset to calls to format_warning_at_char.  Pass the
	new params to the arg_parser ctor.  Pass "format_string_cst" to
	flag_chars.validate.  Pass "fmt_param_loc" to
	arg_parser.check_argument_type.
	(check_format_types): Convert first param from a location_t
	to a const substring_loc & and rename to "fmt_loc".  Attempt
	to extract the range of the relevant parameter and pass it
	to format_type_warning.
	(format_type_warning): Convert first param from a location_t
	to a const substring_loc & and rename to "fmt_loc".  Add
	params "param_range" and "type".  Replace calls to warning_at
	with calls to format_warning_at_substring.

gcc/testsuite/ChangeLog:
	PR c/52952
	* gcc.dg/cpp/pr66415-1.c: Likewise.
	* gcc.dg/format/asm_fprintf-1.c: Update column numbers.
	* gcc.dg/format/c90-printf-1.c: Likewise.
	* gcc.dg/format/diagnostic-ranges.c: New test case.


Added:
    trunk/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
Modified:
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-format.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
    trunk/gcc/testsuite/gcc.dg/format/asm_fprintf-1.c
    trunk/gcc/testsuite/gcc.dg/format/c90-printf-1.c
Comment 46 Bernd Edlinger 2016-08-19 21:14:39 UTC
(In reply to David Malcolm from comment #45)
>     trunk/gcc/testsuite/gcc.dg/cpp/pr66415-1.c


David, this test case does fail because the caret is not always shown in
column 71.

I've got it once, but and when I tried to repeat this test alone it works. 

It seems to depend if stderr is a tty (isatty(2) != 0)
I think the problem is in diagnostics.c

diagnostic_set_caret_max_width (diagnostic_context *context, int value)
{
  /* One minus to account for the leading empty space.  */
  value = value ? value - 1
    : (isatty (fileno (pp_buffer (context->printer)->stream))
       ? get_terminal_width () - 1: INT_MAX);

and:

get_terminal_width (void)
{
  const char * s = getenv ("COLUMNS");
  if (s != NULL) {
    int n = atoi (s);
    if (n > 0)
      return n;
  }

#ifdef TIOCGWINSZ
  struct winsize w;
  w.ws_col = 0;
  if (ioctl (0, TIOCGWINSZ, &w) == 0 && w.ws_col > 0)
    return w.ws_col;
#endif

  return INT_MAX;
}

I think it depends on the terminal window, where I start the
make check, it is often a COLUMNS=80 then maybe the test will
fail, because the output is squeezed into 80 column, but
if COLUMNS=120 the test will pass.

What do you think?
Comment 47 Bernd Edlinger 2016-08-20 05:06:30 UTC
COLUMNS=82 make check-gcc-c RUNTESTFLAGS="cpp.exp=pr66415-1.c"
...
Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/cpp/cpp.exp ...

		=== gcc Summary ===

# of expected passes		3
/home/ed/gnu/gcc-build/gcc/xgcc  version 7.0.0 20160819 (experimental) (GCC) 


COLUMNS=80 make check-gcc-c RUNTESTFLAGS="cpp.exp=pr66415-1.c"
...
Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/cpp/cpp.exp ...
FAIL: gcc.dg/cpp/pr66415-1.c expected multiline pattern lines 11-12 not found: "\s*__builtin_printf                                \("xxxxxxxxxxxxxxxxx%dxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\);.*\n                                                                      ~\^\n"
FAIL: gcc.dg/cpp/pr66415-1.c (test for excess errors)

		=== gcc Summary ===

# of expected passes		1
# of unexpected failures	2
/home/ed/gnu/gcc-build/gcc/xgcc  version 7.0.0 20160819 (experimental) (GCC)
Comment 48 Bernd Edlinger 2016-08-20 05:18:40 UTC
somethin like that fixes it for me:

Index: pr66415-1.c
===================================================================
--- pr66415-1.c	(revision 239624)
+++ pr66415-1.c	(working copy)
@@ -1,6 +1,7 @@
 /* PR c/66415 */
 /* { dg-do compile } */
 /* { dg-options "-Wformat -fdiagnostics-show-caret" } */
+/* { dg-set-compiler-env-var COLUMNS "82" } */
 
 void
 fn1 (void)
Comment 49 Bernd Edlinger 2016-08-22 07:35:06 UTC
Author: edlinger
Date: Mon Aug 22 07:34:34 2016
New Revision: 239649

URL: https://gcc.gnu.org/viewcvs?rev=239649&root=gcc&view=rev
Log:
2016-08-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR c/52952
        * gcc.dg/cpp/pr66415-1.c: Fix sporadic failure.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
Comment 50 Eric Gallager 2018-07-30 02:15:01 UTC
(In reply to Bernd Edlinger from comment #49)
> Author: edlinger
> Date: Mon Aug 22 07:34:34 2016
> New Revision: 239649
> 
> URL: https://gcc.gnu.org/viewcvs?rev=239649&root=gcc&view=rev
> Log:
> 2016-08-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
>         PR c/52952
>         * gcc.dg/cpp/pr66415-1.c: Fix sporadic failure.
> 
> Modified:
>     trunk/gcc/testsuite/ChangeLog
>     trunk/gcc/testsuite/gcc.dg/cpp/pr66415-1.c

...so is it fixed yet?
Comment 51 Manuel López-Ibáñez 2018-09-16 17:57:38 UTC
There are few things still that don't work:

1. C++ does not work

2. Locations within strings expanded from a macro

3. Location within strings from a const char array.

void foo(void) {
#define c               " %d %d "
  __builtin_printf(c, 0.5, 0);

  const char a[] = " %d ";
  __builtin_printf(a, 0.5);
}

GCC 9:

<source>:2:25: error: format '%d' expects argument of type 'int', but argument 2 has type 'double' [-Werror=format=]
2 | #define c               " %d %d "
  |                         ^~~~~~~~~
<source>:3:20: note: in expansion of macro 'c'
3 |   __builtin_printf(c, 0.5, 0);
  |                    ^
<source>:6:20: error: format '%d' expects argument of type 'int', but argument 2 has type 'double' [-Werror=format=]
6 |   __builtin_printf(a, 0.5);
  |                    ^  ~~~
  |                       |
  |                       double

Clang can do (3) but not (2):
  
<source>:3:23: error: format specifies type 'int' but the argument has type 'double' [-Werror,-Wformat]
  __builtin_printf(c, 0.5, 0);
                   ~  ^~~
<source>:6:23: error: format specifies type 'int' but the argument has type 'double' [-Werror,-Wformat]
  __builtin_printf(a, 0.5);
                   ~  ^~~
<source>:5:22: note: format string is defined here
  const char a[] = " %d ";
                     ^~
                     %f
Comment 52 David Malcolm 2018-10-06 11:02:16 UTC
(In reply to Manuel López-Ibáñez from comment #51)
> 1. C++ does not work

C++ should be fixed for gcc 9 by r264887.
Comment 53 David Malcolm 2018-10-06 11:16:04 UTC
(In reply to Manuel López-Ibáñez from comment #51)
> 2. Locations within strings expanded from a macro

(2) should also be fixed for gcc 9 by r264887:

/tmp/foo.c: In function ‘foo’:
/tmp/foo.c:2:25: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘double’ [-Wformat=]
2 | #define c               " %d %d "
  |                         ^~~~~~~~~
/tmp/foo.c:3:20: note: in expansion of macro ‘c’
3 |   __builtin_printf(c, 0.5, 0);
  |                    ^
/tmp/foo.c:2:28: note: format string is defined here
2 | #define c               " %d %d "
  |                           ~^
  |                            |
  |                            int
  |                           %f
/tmp/foo.c:6:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘double’ [-Wformat=]
6 |   __builtin_printf(a, 0.5);
  |                    ^  ~~~
  |                       |
  |                       double

(same output for both C and C++)

> 3. Location within strings from a const char array.

(3) isn't yet fixed, as can be seen above.