Bug 47781 - warnings from custom printf format specifiers
Summary: warnings from custom printf format specifiers
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 4.4.5
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Duplicates: 58512 78183
Depends on:
Blocks:
 
Reported: 2011-02-17 11:43 UTC by Mark Glines
Modified: 2024-01-09 23:52 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-09-23 00:00:00


Attachments
47781.c (353 bytes, text/x-csrc)
2011-02-17 11:58 UTC, Mark Glines

Description Mark Glines 2011-02-17 11:43:01 UTC
Glibc allows a project to define custom printf conversions, via one of two APIs: register_printf_function, and more recently, register_printf_specifier.  For instance, my project has a custom %v conversion, which takes a pointer to a vector structure that is heavily used within the project, and pretty-prints it.

The problem is, every time the custom format conversion is used, gcc (which is invoked with -Wall) generates warnings.

test.c:198: warning: unknown conversion type character ‘v’ in format
test.c:198: warning: too many arguments for format

I can get rid of the warnings with -Wno-format, but that also disables the rest of gcc's format string checking (which is very helpful!).

I'd like to request a finer grained means of control.  A syntactical element (builtin/pragma/attribute/whatever) to pre-declare a format conversion and the typedef to check it against would be very nice, if complex.  A much simpler solution would be a -Wno-format-unknown-specifier option, which skips the argument in the argument list and otherwise ignores conversions it doesn't recognize.

Any solution along those lines would be very helpful.
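
For readers unfamiliar with the glibc API mentioned above, here is a minimal sketch of how a %v conversion might be registered with register_printf_specifier. It is not the reporter's code: struct vec3, print_vec3, vec3_arginfo and format_vec3 are hypothetical names invented for illustration, and the sketch assumes a glibc system that provides <printf.h>.

```c
#include <printf.h>   /* glibc-specific: register_printf_specifier, PA_POINTER */
#include <stdio.h>

struct vec3 { double x, y, z; };   /* hypothetical project type */

/* Output handler: args[0] points at the stored pointer argument. */
static int print_vec3(FILE *stream, const struct printf_info *info,
                      const void *const *args)
{
    const struct vec3 *v = *(const struct vec3 *const *)args[0];

    (void)info;   /* width, precision and flags are ignored in this sketch */
    return fprintf(stream, "(%g, %g, %g)", v->x, v->y, v->z);
}

/* Arginfo handler: tell glibc that %v consumes one pointer argument. */
static int vec3_arginfo(const struct printf_info *info, size_t n,
                        int *argtypes, int *size)
{
    (void)info;
    (void)size;
    if (n > 0)
        argtypes[0] = PA_POINTER;
    return 1;
}

/* Register %v, then format one vector into buf. */
int format_vec3(char *buf, size_t cap, const struct vec3 *v)
{
    if (register_printf_specifier('v', print_vec3, vec3_arginfo) != 0)
        return -1;
    /* With -Wall, GCC reports the unknown 'v' conversion on this call. */
    return snprintf(buf, cap, "vec=%v", v);
}
```

The program works at run time because glibc consults the registered handlers, but GCC's format checker has no knowledge of the registration, which is why the snprintf call draws the warnings quoted above.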
Comment 1 Manuel López-Ibáñez 2011-02-17 11:57:54 UTC
Which project is this?

I think a patch that adds -Wno-format-unknown-specifier would be accepted if properly submitted:

http://gcc.gnu.org/contribute.html

See how the other Wformat-* options are defined in gcc/c-family/c.opt. Then, grep for "unknown conversion type character", and just change OPT_Wformat in the warning call. You'll have to add new testcases and adjust existing ones.
Comment 2 Mark Glines 2011-02-17 11:58:22 UTC
Created attachment 23380
47781.c

Here's a rather silly test case that demonstrates the problem with a simple "bool" type.

$ gcc -O2 -Wall -o 47781 47781.c
47781.c: In function ‘main’:
47781.c:12: warning: unknown conversion type character ‘b’ in format
47781.c:12: warning: unknown conversion type character ‘b’ in format
47781.c:12: warning: too many arguments for format
$ ./47781
true bool: TRUE  false bool: FALSE
$


(That's on x86-64 linux with gcc 4.4.4-14ubuntu5 and libc6 2.12.1-0ubuntu10.2.)
Comment 3 Mark Glines 2011-02-17 12:00:40 UTC
(In reply to comment #1)
> I think a patch that adds -Wno-format-unknown-specifier would be accepted if
> properly submitted:

Okay, I'll take a look at putting together a patch.  Thanks!
Comment 4 jsm-csl@polyomino.org.uk 2011-02-17 18:24:25 UTC
On Thu, 17 Feb 2011, mark-gcc at glines dot org wrote:

> I'd like to request a finer grained means of control.  A syntactical element
> (builtin/pragma/attribute/whatever) to pre-declare a format conversion and the
> typedef to check it against would be very nice, if complex.  A much simpler
> solution would be a -Wno-format-unknown-specifier option, which skips the
> argument in the argument list and otherwise ignores conversions it doesn't
> recognize.

You can't reliably know how many arguments the unknown specifier takes, 
though assuming them to take one argument would be a reasonable heuristic 
for such an option.

For the general issue, my inclination is that we should add plugin hooks 
into the format checking machinery that allow plugins to define formats 
with the full flexibility of all the format checking datastructures in 
GCC.  Using GCC plugins for this avoids problems with defining complicated 
syntax in the source file to describe the peculiarities of different 
formats, which might constrain future changes to the format checking 
implementation by making too much of the internals visible to user source 
code, because by design GCC plugins can use GCC internals which are free 
to change incompatibly in ways that require plugin changes.
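
The one-argument heuristic mentioned at the top of this comment can be sketched as a toy scanner. This is only an illustration of the idea, not the logic in GCC's c-format.c: count_format_args and known_convs are hypothetical names, and the scanner ignores details such as positional ($) arguments and glibc's %m.

```c
#include <stddef.h>
#include <string.h>

/* Conversions this toy checker recognizes; anything else is "unknown". */
static const char known_convs[] = "diouxXeEfFgGaAcspn";

/* Count the arguments a format consumes, assuming that every unknown
   conversion character takes exactly one argument. */
size_t count_format_args(const char *fmt, size_t *unknown)
{
    size_t nargs = 0;

    *unknown = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                        /* "%%" takes no argument */
        while (*p != '\0' && strchr("-+ #0", *p))
            p++;                             /* flags */
        if (*p == '*') {                     /* width passed as an argument */
            nargs++;
            p++;
        } else {
            while (*p >= '0' && *p <= '9')
                p++;
        }
        if (*p == '.') {
            p++;
            if (*p == '*') {                 /* precision passed as an argument */
                nargs++;
                p++;
            } else {
                while (*p >= '0' && *p <= '9')
                    p++;
            }
        }
        while (*p != '\0' && strchr("hljztL", *p))
            p++;                             /* length modifiers */
        if (*p == '\0')
            break;                           /* format ends mid-conversion */
        if (strchr(known_convs, *p) == NULL)
            (*unknown)++;                    /* heuristic: still one argument */
        nargs++;
    }
    return nargs;
}
```

For the format string in the attached test case, this scanner counts two arguments and two unknown conversions, so a checker using the heuristic would no longer report "too many arguments for format". It would silently miscount for any custom conversion that takes zero or several arguments.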
Comment 5 Andrew Pinski 2013-09-23 21:54:48 UTC
Confirmed.
Comment 6 Andrew Pinski 2013-09-23 21:55:46 UTC
*** Bug 58512 has been marked as a duplicate of this bug. ***
Comment 7 Andrew Pinski 2013-09-23 21:57:38 UTC
Related to bug 15338.
Comment 8 Philip Prindeville 2014-08-21 00:02:44 UTC
(In reply to joseph@codesourcery.com from comment #4)

> For the general issue, my inclination is that we should add plugin hooks 
> into the format checking machinery that allow plugins to define formats 
> with the full flexibility of all the format checking datastructures in 
> GCC.  Using GCC plugins for this avoids problems with defining complicated 
> syntax in the source file to describe the peculiarities of different 
> formats, which might constrain future changes to the format checking 
> implementation by making too much of the internals visible to user source 
> code, because by design GCC plugins can use GCC internals which are free 
> to change incompatibly in ways that require plugin changes.

What about using pragmas to describe the new format specifier?
Comment 9 jsm-csl@polyomino.org.uk 2014-08-21 17:06:54 UTC
On Thu, 21 Aug 2014, philipp_subx@redfish-solutions.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47781
> 
> --- Comment #8 from Philip Prindeville <philipp_subx@redfish-solutions.com> ---
> (In reply to joseph@codesourcery.com from comment #4)
> 
> > For the general issue, my inclination is that we should add plugin hooks 
> > into the format checking machinery that allow plugins to define formats 
> > with the full flexibility of all the format checking datastructures in 
> > GCC.  Using GCC plugins for this avoids problems with defining complicated 
> > syntax in the source file to describe the peculiarities of different 
> > formats, which might constrain future changes to the format checking 
> > implementation by making too much of the internals visible to user source 
> > code, because by design GCC plugins can use GCC internals which are free 
> > to change incompatibly in ways that require plugin changes.
> 
> What about using pragmas to describe the new format specifier?

Those have the issue of either being limited in the sorts of formats that 
can be described, or else exposing more internals than seems desirable to 
expose as a stable interface.  Plugins allow full flexibility (with 
possible instability of interfaces), though a stable subset (e.g. formats 
that take no length modifiers or flags) could probably be defined that has 
a stable interface in source files (such as through attributes or pragmas) 
that doesn't unduly constrain the internals of the implementation.  But I 
think any such stable interface would not be able to describe the full 
generality of the existing built-in formats.

One interesting question would be whether a good stable interface can be 
defined that is general enough to describe GCC's internal formats - 
whether those are regular enough that a description isn't tied to 
hardcoded special cases or extremely complicated descriptions of what 
cases should / should not get warnings.
Comment 10 Philip Prindeville 2014-08-21 17:54:44 UTC
On Aug 21, 2014, at 11:06 AM, joseph at codesourcery dot com <gcc-bugzilla@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47781
> 
> --- Comment #9 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> On Thu, 21 Aug 2014, philipp_subx@redfish-solutions.com wrote:
> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47781
>> 
>> --- Comment #8 from Philip Prindeville <philipp_subx@redfish-solutions.com> ---
>> (In reply to joseph@codesourcery.com from comment #4)
>> 
>>> For the general issue, my inclination is that we should add plugin hooks 
>>> into the format checking machinery that allow plugins to define formats 
>>> with the full flexibility of all the format checking datastructures in 
>>> GCC.  Using GCC plugins for this avoids problems with defining complicated 
>>> syntax in the source file to describe the peculiarities of different 
>>> formats, which might constrain future changes to the format checking 
>>> implementation by making too much of the internals visible to user source 
>>> code, because by design GCC plugins can use GCC internals which are free 
>>> to change incompatibly in ways that require plugin changes.
>> 
>> What about using pragmas to describe the new format specifier?
> 
> Those have the issue of either being limited in the sorts of formats that 
> can be described, or else exposing more internals than seems desirable to 
> expose as a stable interface.  Plugins allow full flexibility (with 
> possible instability of interfaces), though a stable subset (e.g. formats 
> that take no length modifiers or flags) could probably be defined that has 
> a stable interface in source files (such as through attributes or pragmas) 
> that doesn't unduly constrain the internals of the implementation.  But I 
> think any such stable interface would not be able to describe the full 
> generality of the existing built-in formats.
> 
> One interesting question would be whether a good stable interface can be 
> defined that is general enough to describe GCC's internal formats - 
> whether those are regular enough that a description isn't tied to 
> hardcoded special cases or extremely complicated descriptions of what 
> cases should / should not get warnings.
> 

Yeah, I agree: if the notation is adequate, all existing formats should be expressible using it.
Comment 11 Tom Tromey 2015-01-29 16:42:39 UTC
(In reply to joseph@codesourcery.com from comment #4)

> For the general issue, my inclination is that we should add plugin hooks 
> into the format checking machinery that allow plugins to define formats 
> with the full flexibility of all the format checking datastructures in 
> GCC.

I agree this makes sense for the general case, but I wanted to point out
that requiring a plugin for the simple cases is significantly harder for
users than some in-source extension mechanism.

E.g., firefox has a logging printf that accepts "%hs" to print char16_t*
strings.  This extension means that printf checking can't be used here.
Requiring a plugin to deal with this situation would also be difficult.
However letting one write __attribute__((printf, 1, 2, "hs", char16_t*))
would solve this nicely.

I suppose I think that a format-for-a-specific-type is the most common
kind of extension and so may deserve special treatment.
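
For context, this is how the existing format attribute is attached to a printf-style function today; the syntax proposed above would extend it with extra specifier/type pairs. A minimal sketch, in which log_printf and last_message are hypothetical names:

```c
#include <stdarg.h>
#include <stdio.h>

static char last_message[256];   /* hypothetical sink for the sketch */

/* Argument 1 is the format string; the checked arguments start at 2. */
__attribute__((format(printf, 1, 2)))
int log_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(last_message, sizeof last_message, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as log_printf("%d items", "three") is diagnosed by -Wformat. The drawback discussed in this bug is that the same checking also rejects any conversion the function accepts beyond standard printf, so a custom "%hs" taking char16_t* would most likely be flagged as well.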
Comment 12 jsm-csl@polyomino.org.uk 2015-01-29 21:55:15 UTC
On Thu, 29 Jan 2015, tromey at gcc dot gnu.org wrote:

> E.g., firefox has a logging printf that accepts "%hs" to print char16_t*
> strings.  This extension means that printf checking can't be used here.
> Requiring a plugin to deal with this situation would also be difficult.
> However letting one write __attribute__((printf, 1, 2, "hs", char16_t*))
> would solve this nicely.

Do you then take this as being length modifier 'h' followed by format 
specifier 's', or is it a complete specifier on its own with everything 
that would otherwise be length and specifier being reparsed as an 
extension if it can't be parsed as a standard format?  Do the flags "-wp" 
and "cR" for %s formats apply to this format?
Comment 13 Tom Tromey 2015-02-04 17:38:54 UTC
(In reply to joseph@codesourcery.com from comment #12)
> On Thu, 29 Jan 2015, tromey at gcc dot gnu.org wrote:
> 
> > E.g., firefox has a logging printf that accepts "%hs" to print char16_t*
> > strings.  This extension means that printf checking can't be used here.
> > Requiring a plugin to deal with this situation would also be difficult.
> > However letting one write __attribute__((printf, 1, 2, "hs", char16_t*))
> > would solve this nicely.
> 
> Do you then take this as being length modifier 'h' followed by format 
> specifier 's', or is it a complete specifier on its own with everything 
> that would otherwise be length and specifier being reparsed as an 
> extension if it can't be parsed as a standard format?  Do the flags "-wp" 
> and "cR" for %s formats apply to this format?

I see what you mean -- maybe "simple" isn't straightforward.

I have been reconsidering the plugin approach given some new things
I learned about the details of the firefox code (namely that it doesn't
faithfully follow printf semantics, sigh).

One additional note for this bug is that it would be nice if any
such addition by a plugin worked properly with -Wmissing-format-attribute.
Comment 14 Manuel López-Ibáñez 2015-02-04 18:44:04 UTC
(In reply to Tom Tromey from comment #13)
> I have been reconsidering the plugin approach given some new things
> I learned about the details of the firefox code (namely that it doesn't
> faithfully follow printf semantics, sigh).
> 
> One additional note for this bug is that it would be nice if any
> such addition by a plugin worked properly with -Wmissing-format-attribute.

Note that plugins can define attributes. Perhaps one way to go about this would be to create a plugin that parsed some kind of GCC_printf_format_info attribute that matches GCC's internal printf checking. Then move GCC's own format checking to use this attribute and enable the plugin by default when building GCC.

This will give you as much flexibility as GCC format checking supports, and the plugin will be developed, built, tested and distributed alongside GCC. Users outside GCC just need to use the plugin and add the attributes to their own printf-style functions. Moreover, since the plugin is developed alongside GCC, it would be logical to add whatever hooks the plugin needs.

Moreover, nothing stops users from creating some kind of intermediate language that simplifies custom printf attribute syntax. Probably some C preprocessor magic could be enough.

The challenge is to define the syntax of the attribute, but I think this challenge is unavoidable for whoever wants to implement this. You may present a simplified syntax to the user, but you still need to handle correctly all the complexity and corner cases in c-format.c.
Comment 15 Eric Gallager 2015-09-21 19:16:16 UTC
(In reply to Tom Tromey from comment #11)
> ...I wanted to point out that requiring a plugin for the simple cases is
> significantly harder for users than some in-source extension mechanism.
> 
> E.g., firefox has a logging printf that accepts "%hs" to print char16_t*
> strings.  This extension means that printf checking can't be used here.
> Requiring a plugin to deal with this situation would also be difficult.
> However letting one write __attribute__((printf, 1, 2, "hs", char16_t*))
> would solve this nicely.
> 
> I suppose I think that a format-for-a-specific-type is the most common
> kind of extension and so may deserve special treatment.

Wow, this is pretty much the same syntax I imagined when coming across this issue independently! Except in my idea, I changed the name of the format attribute to "printf-extended", to make it more obvious what the extra arguments are. The case where I came across it was in trying to build an old forked version of bfd with -Wsuggest-attribute=format and -Wformat=2, where I was unable to attach a format attribute to the bfd_error_handler_type declaration. This is because _bfd_default_error_handler is extended to accept 2 new format specifiers: %A, which takes args of type asection*, and %B, which takes args of type bfd*. Using an attribute as proposed above, it'd be simple to just write something like:

__attribute__((format(printf-extended, 1, 2, "A", asection*, "B", bfd*)))

Although checking the commentary on newer mainline versions of the _bfd_default_error_handler function, it looks like it does some additional weird stuff with the argument order, but still, support for extending the format attribute like this would still be a good start!
Comment 16 Manuel López-Ibáñez 2015-09-21 19:54:15 UTC
(In reply to Eric Gallager from comment #15)
> Although checking the commentary on newer mainline versions of the
> _bfd_default_error_handler function, it looks like it does some additional
> weird stuff with the argument order, but still, support for extending the
> format attribute like this would still be a good start!

As suggested above, whoever wants to see progress on this should start developing a plugin that hooks into gcc/c-family/c-format.c. Whether your plugin will parse an attribute, a pragma, an internal representation or define the formats programmatically is up to you. The important thing is to figure out what plugin hooks you need in GCC to make it work, which will require making the format checking extensible at runtime. Until you get that part working, there is little benefit in discussing any possible syntax.
Comment 17 Eric Gallager 2017-09-28 22:12:16 UTC
*** Bug 78183 has been marked as a duplicate of this bug. ***
Comment 18 Martin Sebor 2017-09-28 23:36:17 UTC
The Linux kernel also has a bunch of printf format extensions that GCC doesn't know anything about: https://www.kernel.org/doc/Documentation/printk-formats.txt.  The extensions take the form of a suffix to the %p directive and take a pointer argument, so the GCC format checker treats them all as a plain old %p.  The sprintf optimization pass, however, punts when it sees a %p because it doesn't know how much output it might produce.  That is largely because of the Linux kernel extensions, but partly also because each OS has its own slightly different format even for plain %p, and it was thought to be simpler to punt than to maintain a database of formats for all supported systems.  It would be nice if there were an easy way to describe these extensions, not just for the benefit of the format checker but also so that the sprintf pass could do its own thing (i.e., check for buffer overflow).
Comment 19 Daniel Santos 2017-09-30 14:13:51 UTC
(In reply to Martin Sebor from comment #18)
> The Linux kernel also has a bunch of printf format extensions that GCC
> doesn't know anything about:
> https://www.kernel.org/doc/Documentation/printk-formats.txt.

Further, the printf format extensions in the kernel are designed not to trigger warnings, so they are often two-character combinations: a standard format specifier followed by a modifying character.  I think that I ran a script once to count how much extra memory the two bytes vs a single byte take and it ended up in the 10s of kilobytes.  While this may not sound like much, remember that the kernel data is never paged out and on some embedded systems, it actually does make a difference.

Should GCC begin supporting custom printf format specifiers, then I would propose we begin changing them in the kernel to take advantage of that small savings.
Comment 20 Cj Welborn 2019-09-16 00:43:47 UTC
Has anything changed since 2017 that would let me use register_printf_specifier and -Wformat warnings at the same time? These two features are in direct conflict with each other. I expected a GNU extension to be compatible with a GNU warning, and all I know to do right now is disable all of the warnings related to format specifiers.
Comment 21 Eric Gallager 2019-12-23 04:18:16 UTC
(In reply to Cj Welborn from comment #20)
> Has anything changed since 2017 that would let me use
> register_printf_specifier and -Wformat warnings at the same time? 

Not that I know of; people still can't agree on a proper design AFAIK... contributions welcome: https://gcc.gnu.org/wiki/GettingStarted#Basics:_Contributing_to_GCC_in_10_easy_steps
Comment 22 Cj Welborn 2019-12-23 04:33:11 UTC
Thank you for the reply. It's probably out of my league, but I might take a look when I get time.
Comment 23 David Crocker 2020-12-13 15:48:10 UTC
I need this feature too. Instead of waiting several more years for an all-singing all-dancing solution, PLEASE can we have a simple solution that allows me to use a custom format specifier and skips a single argument for that specifier. I believe this would cover the vast majority of uses of custom format specifiers. My particular use case is that my application generates a lot of JSON strings, so in my printf replacement I want to implement a specifier similar to %s that performs JSON escaping on characters in the string.
Comment 24 Tom Tromey 2020-12-14 14:10:01 UTC
(In reply to David Crocker from comment #23)
> I need this feature too. Instead of waiting several more years for an
> all-singing all-dancing solution, PLEASE can we have a simple solution that
> allows me to use a custom format specifier and skips a single argument for
> that specifier. I believe this would cover the vast majority of uses of custom
> format specifiers. My particular use case is that my application generates a
> lot of JSON strings, so in my printf replacement I want to implement a
> specifier similar to %s that performs JSON escaping on characters in the
> string.

As a workaround, see the kernel doc linked earlier in this bug.
gdb uses this hack as well -- e.g., it uses "%ps" in its formatter
to mean a styled string, passed as a pointer to get past gcc's checking.
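
The %p-suffix trick can be sketched with a small hand-written formatter. Everything here is an assumption made for illustration: log_snprintf, struct vec3 and the "%pV" spelling are hypothetical and are not taken from the kernel or from gdb. The formatter understands only %d, %s, %p, %% and %pV.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

struct vec3 { double x, y, z; };   /* hypothetical project type */

/* GCC checks "%pV" as a plain %p followed by a literal 'V', so the
   format attribute can stay enabled for this function. */
__attribute__((format(printf, 3, 4)))
int log_snprintf(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    if (cap > 0)
        buf[0] = '\0';
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        size_t off = len < cap ? len : cap;   /* never write past buf */
        char *dst = buf + off;
        size_t room = cap - off;
        int w;

        if (*p != '%') {
            w = snprintf(dst, room, "%c", *p);
        } else if (p[1] == 'p' && p[2] == 'V') {
            const struct vec3 *v = va_arg(ap, void *);
            w = snprintf(dst, room, "(%g, %g, %g)", v->x, v->y, v->z);
            p += 2;
        } else if (p[1] == 'p') {
            w = snprintf(dst, room, "%p", va_arg(ap, void *));
            p += 1;
        } else if (p[1] == 's') {
            w = snprintf(dst, room, "%s", va_arg(ap, const char *));
            p += 1;
        } else if (p[1] == 'd') {
            w = snprintf(dst, room, "%d", va_arg(ap, int));
            p += 1;
        } else {
            w = snprintf(dst, room, "%%");    /* "%%" or unsupported input */
            if (p[1] == '%')
                p += 1;
        }
        if (w > 0)
            len += (size_t)w;
    }
    va_end(ap);
    return (int)len;
}
```

A call such as log_snprintf(buf, sizeof buf, "v=%pV n=%d", (void *)&v, 7) passes -Wformat because the checker sees only a pointer and an int. The cost is that GCC cannot verify that the pointer really refers to a struct vec3.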
Comment 25 Grant Edwards 2021-12-06 17:47:11 UTC
10 years later, still no solution? I too would really like to be able to
use custom single-argument, single-character format specifiers (e.g. %b to
print an integer in binary).

The Linux-kernel work-around with %p<whatever> is painful for two reasons:

 * My printf function doesn't support format modifiers like that. All
   format specifiers are single characters.

 * You have to cast the integer value to a void*, and that just confuses
   the reader.
Comment 26 jsm-csl@polyomino.org.uk 2021-12-06 22:48:36 UTC
It's hard to define something that is sufficiently general to be useful 
but doesn't expose too much of the details of GCC's internal data 
structures for describing standard formats.  %b for binary is now a 
standard C23 format and supported for GCC 12 and later.
Comment 27 Jan Wielemaker 2023-01-18 11:03:03 UTC
It is really a pity this can't be resolved :(  We have quite a few extensions in the SWI-Prolog source code, mostly for debug messages that deal with internal data structures.   It makes writing debug messages a lot easier.

What about this:  add a pragma that associates a regular expression with a list of types.  For example (don't take this literally, I know little about the #pragma conventions).

#pragma GCC printf "t" (term_t)

Now if the compiler scans a template and finds a %, it runs through these declarations in the order they have been declared.  On the first match it knows the type(s) expected from the argument list and continues after the regex match.
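
The lookup described above can be sketched as a table scanned in declaration order. To keep the sketch short it matches literal prefixes instead of regular expressions, which is a simplification of the proposal. The names fmt_decl, sample_decls and match_decl are hypothetical, and the table entries reuse specifiers mentioned elsewhere in this bug.

```c
#include <stddef.h>
#include <string.h>

struct fmt_decl {
    const char *spec;   /* text expected right after '%' */
    const char *type;   /* type name the argument would be checked against */
};

/* Hypothetical declarations, as if collected from pragmas in the source. */
static const struct fmt_decl sample_decls[] = {
    { "t", "term_t" },
    { "A", "asection *" },
    { "B", "bfd *" },
};

/* Return the first declaration whose spec is a prefix of the text that
   follows '%', and report how many characters it consumed. */
const struct fmt_decl *match_decl(const struct fmt_decl *decls, size_t ndecls,
                                  const char *after_percent, size_t *consumed)
{
    for (size_t i = 0; i < ndecls; i++) {
        size_t len = strlen(decls[i].spec);

        if (strncmp(after_percent, decls[i].spec, len) == 0) {
            *consumed = len;
            return &decls[i];
        }
    }
    *consumed = 0;
    return NULL;   /* fall back to the standard conversions */
}
```

Because the first match wins, declaration order matters once one specifier is a prefix of another. How such declarations would interact with flags and length modifiers is the open design question raised earlier in this bug.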
Comment 28 Manuel López-Ibáñez 2023-01-18 12:11:04 UTC
(In reply to Jan Wielemaker from comment #27)
> It is really a pity this can't be resolved :(  We have quite a few
> extensions in the SWI-Prolog source code, mostly for debug messages that
> deal with internal data structures.   It makes writing debug messages a lot
> easier.

This can be resolved. It only needs someone(s) interested enough to implement it or pay someone else to implement it. 

There are a lot of suggestions on this page on how to proceed. Personally, I think the best would be to start with a simple design for an attribute rather than a pragma and implement it as a plugin for faster development and testing. Then submit it for comments. The simplest design that will get you faster feedback would be something that replaces some of the current GCC-specific printf formats, like %E, %T, %q, etc.

(I don't remember where these are documented and implemented right now)

It just needs people with time and patience to do it.
Comment 29 jsm-csl@polyomino.org.uk 2023-01-18 17:40:00 UTC
As I said before, the issue is still how to define something general 
enough to be useful but that doesn't expose too much of the details of 
GCC's internal data structures for format checking.
Comment 30 Manuel López-Ibáñez 2023-01-18 17:46:19 UTC
(In reply to joseph@codesourcery.com from comment #29)
> As I said before, the issue is still how to define something general 
> enough to be useful but that doesn't expose too much of the details of 
> GCC's internal data structures for format checking.

Indeed, the first step does not even require looking at GCC code or an implementation, but coming up with a design that is flexible enough to be useful.