User account creation filtered due to spam.

Bug 49905 - Better sanity checking on sprintf src & dest to produce warning for dodgy code ?
Summary: Better sanity checking on sprintf src & dest to produce warning for dodgy code ?
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: ---
Assignee: Martin Sebor
URL:
Keywords:
: 80807 (view as bug list)
Depends on:
Blocks: 71501
  Show dependency treegraph
 
Reported: 2011-07-29 19:53 UTC by David Binderman
Modified: 2017-05-18 15:07 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 5.3.0, 6.0
Last reconfirmed: 2016-05-02 00:00:00


Attachments
C++ source code (1.79 KB, patch)
2016-05-03 07:08 UTC, David Binderman
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Binderman 2011-07-29 19:53:42 UTC
I just tried to compile the following C++ code with the latest trunk
snapshot 20110723 on an AMD x86_64 box.

# include <stdio.h>

void f()
{
    char * p = new char [4];
    char q[4];

    sprintf(p, "ian");  // Legal
    sprintf(q, "ian");

    sprintf(p, "bert"); // One over
    sprintf(q, "bert");

    sprintf(p, "harry");    // definately wrong.
    sprintf(q, "harry");

    sprintf(p, "%s", "harry");  // more subtle.
    sprintf(q, "%s", "harry");

    sprintf(p, "%s %s", "ab", "cd");    // more subtle still.
    sprintf(q, "%s %s", "ab", "cd");

    sprintf(p, "%s %d", "ab", 1000);    // overpoints.
    sprintf(q, "%s %d", "ab", 1000);
}

Much to my surprise, the compiler, even with lots of flags, said not much

bash-4.2$ ~/gcc/20110723/results/bin/g++ -c -O2 -Wall -Wextra -pedantic bug29.cc
bash-4.2$

I'd have expected a few warnings, at very least.

Surely something in the compiler could be done to check that sprintf is
called, the destination buffer size is known, the minimum source buffer
size is known, and compare the two to make sure the source fits inside
the destination ?

Such a warning will in my experience find plenty of bugs in application code.
Comment 1 Jakub Jelinek 2011-07-29 21:10:06 UTC
Just use also -D_FORTIFY_SOURCE=1 -O2 or -D_FORTIFY_SOURCE=2 -O2.
For the first three overflows on q you'll get compile time warnings, and for all overflows on q you'll get the program killed at runtime.
If you use char * p = (char *) malloc (4);
instead of char * p = new char [4];
you'll get protection for the p overflows too, I'll see if __builtin_object_size could be taught about C++ new, at least some forms thereof.
Comment 2 Jakub Jelinek 2011-08-04 07:40:29 UTC
Author: jakub
Date: Thu Aug  4 07:40:24 2011
New Revision: 177316

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177316
Log:
	PR middle-end/49905
	* tree.h (init_attributes): New prototype.
	* attribs.c (init_attributes): No longer static.

	* decl.c (cxx_init_decl_processing): Add alloc_size (1) attribute
	for operator new and operator new [].  Call init_attributes.

	* g++.dg/ext/builtin-object-size3.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/ext/builtin-object-size3.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/attribs.c
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/decl.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree.h
Comment 3 Martin Sebor 2016-05-02 23:31:47 UTC
5.1 and 6.1 warn on the first six out of the ten buffer overflows, and on Linux the program aborts at runtime in __sprintf_chk.

GCC still doesn't diagnose any of the last four problems at compile time (e.g., in 'char buf [4]; sprintf (buf, "%s %s", "abc", "def");')  It seems that this class of problems could be handled by enhancing maybe_emit_sprintf_chk_warning to loop over the format string, recognize more involved format strings with embedded %s (and other simple directives), and count the number of characters they emit for constant arguments.  For slightly better compile-time coverage the approach could even assume that simple non-string directives like %i result in at least one character and compute an optimistic lower bound on the length of the formatted string.
Comment 4 David Binderman 2016-05-03 07:08:05 UTC
Created attachment 38402 [details]
C++ source code
Comment 5 David Binderman 2016-05-03 07:10:54 UTC
(In reply to Martin Sebor from comment #3)
> 5.1 and 6.1 warn on the first six out of the ten buffer overflows, and on
> Linux the program aborts at runtime in __sprintf_chk.
> 
> GCC still doesn't diagnose any of the last four problems at compile time

My local version does. Some tweeks to gcc/builtins.c. 69 formats understood.

> (e.g., in 'char buf [4]; sprintf (buf, "%s %s", "abc", "def");')  It seems
> that this class of problems could be handled by enhancing
> maybe_emit_sprintf_chk_warning to loop over the format string, recognize
> more involved format strings with embedded %s (and other simple directives),
> and count the number of characters they emit for constant arguments.  

Indeed. Anything it doesn't understand it can afford to ignore.
It is only computing a lower bound.

> For slightly better compile-time coverage the approach could even assume that
> simple non-string directives like %i result in at least one character and
> compute an optimistic lower bound on the length of the formatted string.

That's right - it could even take account of other things like field widths.
My local version can find all the problems mentioned in the original bug report.

Speculative patch attached. It'll need a lot of work to get it up to the
required standard, but it gives the general idea and it's been working 
happily locally for years over the code of Redhat Fedora Linux and some
other projects.
Comment 6 Martin Sebor 2016-06-23 03:46:49 UTC
I'm working on a patch.
Comment 7 Martin Sebor 2016-07-01 18:17:09 UTC
Patch posted for review:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00056.html
Comment 8 David Binderman 2016-07-01 18:35:48 UTC
(In reply to Martin Sebor from comment #7)
> Patch posted for review:
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00056.html

Fascinating stuff. Thanks.

I'll give it a good going over by throwing Fedora at it sometime early next
week, but first, may I ask if you tried a gcc bootstrap 
or a Linux kernel build with it, and if so, did it find much ?
Comment 9 David Binderman 2016-07-04 07:55:54 UTC
I tried a build of the gcc fortran compiler and I found this warning:

../../../src/trunk/libgfortran/intrinsics/date_and_time.c:173:33: warning: ‘%+03d’ directive output truncated while writing ‘9’ bytes into a region of size ‘6’ [-Wformat-length=]

Source code is

    snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
          values[3] / 60, abs (values[3] % 60));

and

#define ZONE_LEN 5

  char zone[ZONE_LEN + 1];

While it is clear that the region size of 6 is fine, I can't see where the value 
of 9 bytes is coming from.
Comment 10 Jakub Jelinek 2016-07-04 08:22:44 UTC
(In reply to David Binderman from comment #9)
> I tried a build of the gcc fortran compiler and I found this warning:
> 
> ../../../src/trunk/libgfortran/intrinsics/date_and_time.c:173:33: warning:
> ‘%+03d’ directive output truncated while writing ‘9’ bytes into a region of
> size ‘6’ [-Wformat-length=]
> 
> Source code is
> 
>     snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
>           values[3] / 60, abs (values[3] % 60));
> 
> and
> 
> #define ZONE_LEN 5
> 
>   char zone[ZONE_LEN + 1];
> 
> While it is clear that the region size of 6 is fine, I can't see where the
> value 
> of 9 bytes is coming from.

First of all, I'd say the warning is badly worded, because it should make it clear that what it guessed is the maximum length, unless it is equal to minimum it might not really happen.  While it can use VRP, VRP is about what the compiler can prove, while there can be other range restrictions just not visible to the compiler.  I think the warning code should compute both minimum and maximum, and have different wording for the cases where minimum == maximum and there is overflow, or for when even minimum overflows, or when maximum overflows (and the last one should be probably different warning level, because the program could be just fine, have some mechanism hidden not visible to the compiler that guarantees the values will never be out of certain ranges.

For abs (values[3] % 60) the compiler is able to prove the VR is [0, 59] and thus %02d is guaranteed to be exact 2 characters.  For values[3] / 60, I think the compiler can't see such a guarantee, there is just subtraction of fields computed by two functions that are really black boxes for GCC analysis.
Thus, I'd expect that GCC for %+03d should compute minimum length of 3 and maximum length of strlen ("+35791394") - 0x7fffffff / 60, or strlen ("-35791394") - (int) 0x80000000 / 60, i.e. 9.  Thus it is IMHO 9 + 2 + 1.

The code obviously relies on it to be timezone offsets, so expects values[3] / 60 must be in between -14 and +14 or so and then ZONE_LEN 5 is just fine.
Comment 11 David Binderman 2016-07-04 10:16:32 UTC
(In reply to Jakub Jelinek from comment #10)
> I think the warning code should compute both
> minimum and maximum, 

I'd be happy for the code to compute minimum only and have maximum
postponed for the future. One step at a time.

BTW, I tried a Linux kernel build and got this

drivers/char/ipmi/ipmi_msghandler.c: In function ‘guid_show’:
drivers/char/ipmi/ipmi_msghandler.c:2365:9: internal compiler error: in format_integer, at c-family/c-format.c:506
  return snprintf(buf, 100, "%Lx%Lx\n",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (long long) bmc->guid[0],
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    (long long) bmc->guid[8]);
    ~~~~~~~~~~~~~~~~~~~~~~~~~

So it looks to me like format %Lx isn't handled.
Comment 12 David Binderman 2016-07-04 13:31:29 UTC
(In reply to David Binderman from comment #11)
> So it looks to me like format %Lx isn't handled.

Also, %lf seems to cause a crash.
Comment 13 Martin Sebor 2016-07-04 17:10:31 UTC
(In reply to David Binderman from comment #9)
> I tried a build of the gcc fortran compiler and I found this warning:
> 
> ../../../src/trunk/libgfortran/intrinsics/date_and_time.c:173:33: warning:
> ‘%+03d’ directive output truncated while writing ‘9’ bytes into a region of
> size ‘6’ [-Wformat-length=]
> 
> Source code is
> 
>     snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
>           values[3] / 60, abs (values[3] % 60));
> 
> and
> 
> #define ZONE_LEN 5
> 
>   char zone[ZONE_LEN + 1];
> 
> While it is clear that the region size of 6 is fine, I can't see where the
> value 
> of 9 bytes is coming from.

Thanks for the testing!  I reproduced the warning the above with -O2 in a small test case:

  extern int values [4];
  inline int abs(int x) { return x < 0 ? -x : x; }

  #define ZONE_LEN 5

  char zone[ZONE_LEN + 1];

  void f (void)
  {
    __builtin_snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
                        values[3] / 60, abs (values[3] % 60));
  }

GCC doesn't know what values[3] evaluates to but since its type is int, the Value Range Propagation (VRP) determines that the range of (values[3] / 60) is [-35791394, 35791394], and the checker sees that as many as 9 bytes of space for the output of any value in that range are required.

The checker prints two general kinds of diagnostics for individual directives:

1) When it knows the space in the destination is insufficient to format either an exact value or a range of values bounded by some limit imposed by the program is uses the words "writing" or "truncating."

2) When it doesn't know the exact value and there is no known range of values beyond the limits of the type of the argument it uses the term "may write" or "may truncate."

The case above falls into (1) but it could be changed to fall into (2) instead, or to use a different phrase.  In either case, the diagnostic could also mention the range of values it thinks the argument can take on.

(In reply to David Binderman from comment #9)
> I'd be happy for the code to compute minimum only and have maximum
> postponed for the future. One step at a time.

The checker has two levels:

1) At level one, it considers the least amount of space needed for the argument of each directive (is assumes unknown integers have the value 1 and strings are empty).  This corresponds to the minimum.  I would expect this to be immediately useful in all existing code since it should have no (or few) false positives.

2) At level two, it uses the maximum.  Unknown singed integers or type T have the value T_MIN, unknown integers of unsigned type T_MAX.  That corresponds to the maximum.  I envision this to be useful mainly in new code and for sanity checking in existing code.

At both levels, the checker treats known values as exact (because it knows exactly how many bytes of output each produces).  For ranges of values obtained from VRP it uses the upper bound of the range.  Due to weaknesses in the range information exposed by VRP we might want to tweak it this strategy or clarify the diagnostic.
Comment 14 Jakub Jelinek 2016-07-04 17:16:29 UTC
(In reply to Martin Sebor from comment #13)
> (In reply to David Binderman from comment #9)
> > I tried a build of the gcc fortran compiler and I found this warning:
> > 
> > ../../../src/trunk/libgfortran/intrinsics/date_and_time.c:173:33: warning:
> > ‘%+03d’ directive output truncated while writing ‘9’ bytes into a region of
> > size ‘6’ [-Wformat-length=]
> > 
> > Source code is
> > 
> >     snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
> >           values[3] / 60, abs (values[3] % 60));
> > 
> > and
> > 
> > #define ZONE_LEN 5
> > 
> >   char zone[ZONE_LEN + 1];
> > 
> > While it is clear that the region size of 6 is fine, I can't see where the
> > value 
> > of 9 bytes is coming from.
> 
> Thanks for the testing!  I reproduced the warning the above with -O2 in a
> small test case:
> 
>   extern int values [4];
>   inline int abs(int x) { return x < 0 ? -x : x; }
> 
>   #define ZONE_LEN 5
> 
>   char zone[ZONE_LEN + 1];
> 
>   void f (void)
>   {
>     __builtin_snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
>                         values[3] / 60, abs (values[3] % 60));
>   }
> 
> GCC doesn't know what values[3] evaluates to but since its type is int, the
> Value Range Propagation (VRP) determines that the range of (values[3] / 60)
> is [-35791394, 35791394], and the checker sees that as many as 9 bytes of
> space for the output of any value in that range are required.

But 9 is maximum length just for the %+03d part, %02d with the limited VRP range is exactly 2 and then the '\0', so that is 12 maximum, 6 minimum.
So printing 9 is just misleading.
Comment 15 Martin Sebor 2016-07-04 17:24:23 UTC
(In reply to David Binderman from comment #11)
> BTW, I tried a Linux kernel build and got this
> 
> drivers/char/ipmi/ipmi_msghandler.c: In function ‘guid_show’:
> drivers/char/ipmi/ipmi_msghandler.c:2365:9: internal compiler error: in
> format_integer, at c-family/c-format.c:506
>   return snprintf(buf, 100, "%Lx%Lx\n",
>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     (long long) bmc->guid[0],
>     ~~~~~~~~~~~~~~~~~~~~~~~~~
>     (long long) bmc->guid[8]);
>     ~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> So it looks to me like format %Lx isn't handled.
...
> Also, %lf seems to cause a crash.

Not all directives or length modifiers are recognized yet but they should be handled gracefully, not by aborting.  (In most cases the checker gives up on the rest of the format strings).  Thanks for pointing out these cases, I'll update the code in the next revision of the patch.
Comment 16 Martin Sebor 2016-07-04 18:03:32 UTC
(In reply to Jakub Jelinek from comment #14)
> But 9 is maximum length just for the %+03d part, %02d with the limited VRP
> range is exactly 2 and then the '\0', so that is 12 maximum, 6 minimum.

Yes.

> So printing 9 is just misleading.

The 9 actually corresponds to a range between 3 and 9 bytes so the warning could print that range, the same way it does for unknown and unbounded values at level 2, like so:

  warning: ‘%+03d’ directive output may be truncated between ‘3’ and ‘9’ bytes into a region of size ‘6’ [-Wformat-length=]

The note could also be extended to print a range:

  note: destination region size is ‘6’ bytes, minimum required ‘6’, maximum ‘12’

Does this make it clearer?
Comment 17 Martin Sebor 2016-07-05 02:58:43 UTC
I have tweaked the patch to print the following for the test case in comment #13:

xyz.c: In function ‘f’:
xyz.c:10:46: warning: ‘%+03d’ directive output may be truncated between ‘3’ and ‘9’ bytes into a region of size ‘6’ [-Wformat-length=]
     __builtin_snprintf (zone, ZONE_LEN + 1, "%+03d%02d",
                                              ^
xyz.c:10:46: note: directive argument determined to be in the range [‘-35791394’, ‘35791394’]
xyz.c:10:5: note: destination region size is ‘6’ bytes, minimum required ‘6’, maximum ‘12’

The computation to determine the range of bytes on output is less straightforward than might be expected when the argument is determined to be in a range whose bounds have different signs and different magnitudes.  The checker then has to decide which of the bounds to use as the maximum (the minimum is zero).  For instance, if the argument is in range [-3, 123], the relevant upper bound is 123 for %i but -3 for %u.  This is because formatting a value in this range results in between 1 and 3 bytes on output for %i, but in as many as 10 bytes for %u given 32-bit ints (-3 formats as "4294967293").  This may not be completely intuitive but I hope the new note helps make it clearer.  For the %u case with [-3, 123], it results in:

warning: ‘%u’ directive output may be truncated between ‘1’ and ‘10’ bytes into a region of size ‘3’ [-Wformat-length=]
     __builtin_snprintf (d, sizeof d, "%u", x);
                                       ^
note: directive argument determined to be in the range [‘-3’, ‘123’]

It still assumes the user will figure out that it's the lower end of the range that results in the maximum of 10 bytes.  I'm not sure how to help with that without cluttering the output with too much verbiage and possibly making some cases seem more complicated than they are.
Comment 18 Martin Sebor 2016-09-21 01:39:59 UTC
Author: msebor
Date: Wed Sep 21 01:39:27 2016
New Revision: 240298

URL: https://gcc.gnu.org/viewcvs?rev=240298&root=gcc&view=rev
Log:
PR middle-end/49905 - Better sanity checking on sprintf src & dest to

gcc/ChangeLog:

	PR middle-end/49905
	* Makefile.in (OBJS): Add gimple-ssa-sprintf.o.
	* config/linux.h (TARGET_PRINTF_POINTER_FORMAT): Redefine.
	* config/linux.c (gnu_libc_printf_pointer_format): New function.
	* config/sol2.h (TARGET_PRINTF_POINTER_FORMAT): Same.
	* config/sol2.c (solaris_printf_pointer_format): New function.
	* doc/invoke.texi (-Wformat-length, -fprintf-return-value): New
	options.
	* doc/tm.texi.in (TARGET_PRINTF_POINTER_FORMAT): Document.
	* doc/tm.texi: Regenerate.
	* gimple-fold.h (get_range_strlen): New function.
	(get_maxval_strlen): Declare existing function.
	* gimple-fold.c (get_range_strlen): Add arguments and compute both
	maximum and minimum.
	 (get_range_strlen): Define overload.
	(get_maxval_strlen): Adjust.
	* gimple-ssa-sprintf.c: New file and pass.
	* passes.def (pass_sprintf_length): Add new pass.
	* targhooks.h (default_printf_pointer_format): Declare new function.
	(gnu_libc_printf_pointer_format): Same.
	(solaris_libc_printf_pointer_format): Same.
	* targhooks.c (default_printf_pointer_format): Define new function.
	* tree-pass.h (make_pass_sprintf_length): Declare new function.
	* print-tree.c: Increase buffer size.

gcc/c-family/ChangeLog:

	PR middle-end/49905
	* c.opt: Add -Wformat-length and -fprintf-return-value.

gcc/testsuite/ChangeLog:

	PR middle-end/49905
	* gcc.dg/builtin-stringop-chk-1.c: Adjust.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: New test.


Added:
    trunk/gcc/gimple-ssa-sprintf.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c.opt
    trunk/gcc/config/linux.c
    trunk/gcc/config/linux.h
    trunk/gcc/config/sol2.c
    trunk/gcc/config/sol2.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/gimple-fold.c
    trunk/gcc/gimple-fold.h
    trunk/gcc/passes.def
    trunk/gcc/print-tree.c
    trunk/gcc/target.def
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
    trunk/gcc/tree-pass.h
Comment 19 Martin Sebor 2016-09-21 01:41:13 UTC
Enhancement committed in r240298.
Comment 20 David Binderman 2016-09-21 19:52:15 UTC
(In reply to Martin Sebor from comment #19)
> Enhancement committed in r240298.

The current code seems to crash when given %lf specifier.

It might be better if it just ignored specifiers it doesn't know,
rather than crash. There are many (> 70) specifiers.
Comment 21 Martin Sebor 2016-09-21 20:22:22 UTC
I agree that the compiler shouldn't crash on directives it doesn't expect.

The feature is implemented at this point so let's open new bugs for problems in the implementation (with the expected information such as a test case) rather than reopening the enhancement request.
Comment 22 Martin Sebor 2016-09-21 20:26:14 UTC
I've raised bug 77683 for the ICE on %lf.
Comment 23 Martin Sebor 2017-05-18 15:07:25 UTC
*** Bug 80807 has been marked as a duplicate of this bug. ***