[PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

Fri Jul 22 22:13:00 GMT 2016

Working through the new pass...  Overall it looks pretty good.  There's 
a certain level of trust I'll extend WRT getting the low level details 
right -- a thorough testsuite obviously helps there.

> +const pass_data pass_data_sprintf_length = {
> +  GIMPLE_PASS,        // pass type
> +  "sprintf_length",   // pass name
> +  OPTGROUP_NONE,      // optinfo_flags
> +  TV_NONE,            // tv_id
> +  PROP_cfg,           // properties_required
> +  0,                 // properties_provided
> +  0,                 // properties_destroyed
> +  0,                 // properties_start
> +  0,                 // properties_finish
So the worry here is that you need PROP_ssa in the late pass.  I'm not 
sure if we have the infrastructure to allow you to distinguish the two. 
I guess we could literally make it two passes with two distinct 
pass_data structures.

In handle_gimple_call, you don't handle anti ranges -- I haven't looked 
closely at canonicalization rules in VRP, so I don't know if you're 
likely to see a range like ![MININT, -1] which would be a synonym for 
[0..MAXINT].  It might be worth doing some instrumentation to see if 
you're getting useful anti-ranges with any kind of consistency.  ISTM 
the most interesting ones are going to allow you to drop or insist on a 
"-" sign.

> +      /* As a special case, bounded function allow the explicitly
> +        specified destination size argument to be zero as a request
> +        to determine the number of bytes on output without actually
> +        writing any output.  Disable length checking for those.  */
This doesn't parse well.

> +  /* Try to use __builtin_object_size although it rarely returns
> +     a useful result even for straighforward cases.  */
Hopefully you've stashed away some tests for this so that we can work on 
them independently of the sprintf checking itself?  ISTM that the 
recursive handling of POINTER_PLUS_EXPR really ought to move into the 
generic builtin_object_size handling itself as an independent change.

> +  /* Bail early if format length checking is disabled, either
> +     via a command line option, or as a result of the size of
> +     the destination object not being available, or due to
> +     format directives or arguments encountered during processing
> +     that prevent length tracking with any reliability.  */
> +
> +  if (HOST_WIDE_INT_MAX <= info.objsize)
> +      return;
I think the return is indented too deep.

> +      if (*pf == 0)
> +       {
> +         /* Incomplete directive.  */
> +         return;
> +       }
For this and the other early returns from compute_format_length, do we 
know if -Wformat will catch these errors?  Might that be a low priority 
follow-up project if it doesn't?

> +static void
> +add_bytes (const pass_sprintf_length::call_info &info,
> +          const char                           *beg,
> +          const char                           *end,
> +          format_result                        *res)
In general, don't try to line up parameters, args or variable 
declarations like you've done here.  Similarly in other places.

> +  /* if (warn_format_length */
> +  /*     && (overflowed || navail < nbytes */
> +  /*     || (1 < warn_format_length && )) */
Presumably old implementation and/or debugging code...  Please remove it.

Please check indention of the curleys & code block in the loop over the 
phi arguments in get_string_length.

In format_floating, we end up using actual host floating point 
arithmetic.  That's generally been frowned upon.  We're not doing 
anything in terms of code generation here, so ultimately that might be 
what allows us to still be safe from a cross compilation standpoint.

It's also not clear if that routine handles the other target floating 
point formats.  For things like VAX FP, I'm comfortable issuing a 
warning that this stuff isn't supported on that target.   We have other 
targets where a double might be the same size as a float.   Though I 
guess in that case the worst we get is false positives, so that may not 
be a huge issue either.  I'm not sure how to check for this stuff 
without bringing in aspects of the target though, which we'd like to avoid.

In format_integer you have target specific ifdefs (solaris and aix). 
First we want to avoid conditionally compiled code.  Second, from a 
modularity something like TARGET_AIX and TARGET_SOLARIS really aren't 
appropriate in here. I suspect what you're going to want to do is add 
something to the targetm structure that you can query and/or call here 
is the best we can do.

Again, overall it looks good.  My biggest concern is format_integer and 
format_float and the bleeding of target dependencies into this code.  To 
some degree it may be unavoidable and we can mitigate the damage with 
stuff in targetm -- I'd like to hear your thoughts.

Jeff