[PATCH] Make strlen range computations more conservative

Tue Jul 31 03:45:00 GMT 2018

On 07/27/2018 12:48 AM, Bernd Edlinger wrote:
> I have one more example similar to PR86259, that resembles IMHO real world code:
>
> Consider the following:
>
>
> int fun (char *p)
> {
>   char buf[16];
>
>   assert(strlen(p) < 4); //here: security relevant check
>
>   sprintf(buf, "echo %s - %s", p, p); //here: security relevant code
>   return system(buf);
> }
>
>
> What is wrong with the assertion?
>
> Nothing, except it is removed, when this function is called from untrusted code:
>
> untrused_fun ()
> {
>    char b[2] = "ab";
>    fun(b);
> }
>
> !!!! don't try to execute that: after "ab" there can be "; rm -rF / ;" on your stack!!!!
>
> Even the slightly more safe check "assert(strnlen(p, 4) < 4);" would have
> been removed.
>
> Now that is a simple error and it would be easy to fix -- normally.
> But when the assertion is removed, the security relevant code
> is allowed to continue where it creates more damage and is
> suddenly much harder to debug.

sprintf() is a known source of buffer overflows.  The recommended
practice is to use snprintf.  An alternate mechanism to constrain
the number of bytes formatted by an individual %s directive is to
use the precision, such as %.4s.

> So, I start to believe that strlen range assumptions are unsafe, unless
> we can prove that the string is in fact zero terminated.
>
> I would like to guard the strlen range checks with a new option, maybe
> -fassume-zero-terminated-char-arrays, and enable that under -Ofast only.
>
> What do you think?

I'm not opposed to providing options to control various
features but I'm not in favor of disabling them by default as
a solution to accommodate buggy code.  For every instance of
a bug in a program with undefined behavior, whether it's reading
or writing past the end of an object or subobject, or integer
overflow, it's possible to show security-related consequences.
One could just as easily create a test case where allowing strlen
to read past the end of a member array could be exploited to cause
a subsequent buffer overflow.   Some of these consequences might
be in some cases mitigated by one strategy and others in other
cases by another.  There's no silver bullet -- the best approach
is to drive improvements to code to help weed out these bugs.

Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
it calls abort.  This is the default on popular distributions,
including Fedora, RHEL, and Ubuntu.   -Wstringop-truncation tries
to help detect the creation of unterminated strings by strncpy
and strncat.  There is little reason in my mind to treat strlen
or any other function as special, except perhaps for the few
existing exceptions of the raw memory functions (memcpy, et al.)

As you know, I have already posted a patch to detect a subset
of the problem of calling strlen on non-terminated arrays.
More such issues, including uses of dynamically created and
uninitialized arrays, can be detected by relatively modest
enhancements to the tree-ssa-strlen pass (also on my list
of things to do).  It may also be worth considering moving
the "initializer-string for array chars is too long" warning
from -Wc++-compat to -Wall or -Wextra.  But I would much rather
focus on these solutions and work toward overall improvements
than on weakening optimization to accommodate undefined code.
With sufficient awareness as a result of warnings such code
should all but disappear.  Following stricter rules opens up
opportunities for deeper analyses to enable more optimization
and detect even more bugs.

Martin