[PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

Manuel López-Ibáñez lopezibanez@gmail.com
Tue Aug 23 23:42:00 GMT 2016


On 23 August 2016 at 22:56, Martin Sebor <msebor@gmail.com> wrote:
> Attached is the latest patch.

Some suggestions:

--Wno-format-contains-nul -Wno-format-extra-args -Wformat-nonliteral @gol
+-Wno-format-contains-nul -Wno-format-extra-args -Wformat-length=1 @gol

Most options that take levels are documented as:

-Wshift-overflow -Wshift-overflow=@var{n}
-Wstrict-overflow -Wstrict-overflow=@var{n}


+@item -Wformat-length
+@itemx -Wformat-length=@var{level}
+@opindex Wformat-length
+@opindex Wno-format-length
+@opindex ffreestanding
+@opindex fno-builtin
+@opindex Wformat-length=

This whole hunk is duplicated

Usually, at the end of the option description, it is mentioned which
options enabled this one (-Wformat=1 in this case, which is in turn
enabled by -Wall).

+The @option{-Wformat-length} option causes GCC to attempt to detect calls
+to formatted functions such as @code{sprintf} that might overflow the
+destination buffer, or bounded functions like @code{snprintf} that result
+in output truncation.

This text is a bit too verbose and quite different from other warning
options. A humble proposal:

Warn about calls to formatted functions, such as @code{sprintf}, that
may overflow the destination buffer, or bounded functions, such as
@code{snprintf} , that may? result in output truncation.

+GCC counts the number of bytes that each format
+string and directive within it writes into the provided buffer and, when
+it detects that more bytes that fit in the destination buffer may be output,
+it emits a warning.

This is saying basically the same as the first sentence. Remove it?

+Directives whose arguments have values that can be
+determined at compile-time account for the exact number of bytes they write.
+Directives with arguments whose values cannot be determined are processed
+based on heuristics that depend on the @var{level} argument to the option,
+and on optimization.

When the exact number of bytes written by a format string directive
cannot be determined at compile-time, it is estimated based on
heuristics that depend on optimization and the @var{level} argument.
(I swapped the two because level is the next thing explained).

+The default setting of @var{level} is 1.  Level
+@var{1} employs a conservative approach that warns only about calls that
+most likely overflow the buffer or result in output truncation.  At this
+level, numeric arguments to format directives whose values are unknown
+are assumed to have the value of one, and strings of unknown length are
+assumed to have a length of zero.  Numeric arguments that are known to
+be bounded to a subrange of their type, or string arguments whose output
+is bounded by their directive's precision, are assumed to take on the value
+within the range that results in the most bytes on output.  Level @var{2}
+warns also bout calls that may overflow the destination buffer or result
+in truncation given an argument of sufficient length or magnitude.  At
+this level, unknown numeric arguments are assumed to have the minimum
+representable value for signed types with a precision greater than 1,
+and the maximum representable value otherwise.  Unknown string arguments
+are assumed to be 1 character long.

A better format (and it enables searching for specific levels) would be:
@table @gcctabopt
@item -Wformat-length=1
This is the warning level enabled by @option{-Wformat-length} and is enabled
by @option{-Wformat=1}, which is in turn enabled by @option{-Wall}. It
employs a conservative heuristic that warns only about calls that
most likely overflow the buffer or result in output truncation.  At this
level, numeric arguments to format directives whose values are unknown
are assumed to have the value of one, and strings of unknown length are
assumed to have a length of zero.  Numeric arguments that are known to
be bounded to a subrange of their type, or string arguments whose output
is bounded by their directive's precision, are assumed to take on the value
within the range that results in the most bytes on output.

@item -Wformat-length=2
This level also warns about calls that may overflow the destination
buffer or result
in truncation given an argument of sufficient length or magnitude.
Unknown numeric arguments are assumed to have the minimum
representable value for signed types with a precision greater than one,
and the maximum representable value otherwise.  Unknown string arguments
are assumed to be one character long.
@end table

+Enabling optimization will in most
+cases improve the accuracy of the warning, although in some cases it may
+also result in false positives.

Since the example is not about optimization. Move this paragraph after
the examples.

+For example, at level @var{1}, the call to @code{sprintf} below is diagnosed

For example, @option{-Wformat-length=1} warns about the call to
@code{sprintf} below

+At level @var{2}, the call
+is again diagnosed, but this time

@option{-Wformat-length=2} also warns about the call to
@code{sprintf}, but this time

+that the most closely corresponds to the @code{%p} format directive.

that most closely


+      if (!val)
+    {
+      if (fuzzy)
+        {
+          if (TREE_CODE (arg) == ADDR_EXPR)
+        return get_range_strlen (TREE_OPERAND (arg, 0), length,
+                     visited, type, fuzzy);
+
+          if (TREE_CODE (arg) == COMPONENT_REF
+          && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) == ARRAY_TYPE)
+        {
+          /* Use the type of the member array to determine the upper
+             bound on the length of the array.  This may be overly
+             optimistic if the array itself isn't NUL-terminated and
+             the caller relies on the subsequent member to contain
+             the NUL.  */
+          arg = TREE_OPERAND (arg, 1);
+          val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+          if (!val || integer_zerop (val))
+            return false;
+          val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+                     integer_one_node);
+          /* Avoid using the array size as the minimum.  */
+          minlen = NULL;
+        }
+        }
+      else
+        return false;
+    }
+
       if (!val)
     return false;


You will save some horizontal space, save the else branch and improve
readability by doing:

if (!val && fuzzy)
{
}
if (!val)
 return false;

+/* Describes a location range outlining a substring within a string
+   literal.  */
+
+class substring_loc
+{
+ public:
+  substring_loc (location_t fmt_string_loc,
+         int caret_idx, int start_idx, int end_idx)
+    : m_fmt_string_loc (fmt_string_loc),
+    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx)
+  { }
+
+  const char *get_location (location_t *out_loc) const;
+
+  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
+
+ private:
+  location_t m_fmt_string_loc;
+  int m_caret_idx;
+  int m_start_idx;
+  int m_end_idx;
+};
+
+/* The global record of string concatentations, for use in extracting
+   locations within string literals.  */
+
+GTY(()) string_concat_db *g_string_concat_db;
+
+/* Attempt to determine the source location of the substring.
+   If successful, return NULL and write the source location to *OUT_LOC.
+   Otherwise return an error message.  Error messages are intended
+   for GCC developers (to help debugging) rather than for end-users.  */
+
+const char *
+substring_loc::get_location (location_t *out_loc) const
+{
+  gcc_assert (out_loc);
+
+  if (!g_string_concat_db)
+    g_string_concat_db
+      = new (ggc_alloc <string_concat_db> ()) string_concat_db ();
+
+  static struct cpp_reader* parse_in;
+  if (!parse_in)
+    {
+      /* Create and initialize a preprocessing reader.  */
+      parse_in = cpp_create_reader (CLK_GNUC99, ident_hash, line_table);
+      cpp_init_iconv (parse_in);
+    }
+
+  return get_source_location_for_substring (parse_in, g_string_concat_db,
+                        m_fmt_string_loc, CPP_STRING,
+                        m_caret_idx, m_start_idx, m_end_idx,
+                        out_loc);
+}
+

Ideally, the diagnostics machinery would hide all these details from
the middle-end. Surely, most of this can be shared with c-format.c.

+/* Return the logarithm of X in BASE.  */
+
+static int
+ilog (unsigned HOST_WIDE_INT x, int base)
+{
+  int res = 0;
+  do {
+    ++res;
+    x /= base;
+  } while (x);
+  return res;
+}

Should this go to hwint.h ?



More information about the Gcc-patches mailing list