This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make strlen range computations more conservative


As an alternate approach I have been thinking about, if
there is a strong feeling that allowing strlen to iterate
past the subobject boundary is necessary (I don't believe
it is.)

Rather than indiscriminately expanding the provenance of
the subobject regardless of what members follow it in
the enclosing structure, only consider doing that if
the next member is an array of the same type.  E.g.,

  struct S { char a[4], b[3], c[2], d; };
  extern struct S *p;

  strlen (p->a);   // consider p->a's bounds to be char[9]

I.e., treat p->a, p->b, and p->c as one array but exclude
from it d because it's not an array.

(This wouldn't solve the warning problem below -- a separate
computation would still be necessary to determine the tighter
bound of the member itself.)

On 08/02/2018 09:42 AM, Martin Sebor wrote:
On 08/02/2018 04:22 AM, Bernd Edlinger wrote:
Hi,

this is an update of the patch to prevent unsafe optimizations due to
strlen range assuming
always zero-terminated char arrays.

Since the previous version I do no longer try to locate the outermost
char array,
but just bail out if there is a typecast, that means, supposed we have
a 2-dimensional
array, char a[x][y], strlen (s.a[x]) may assume a[x] is
zero-terminated, if the optimization
is enabled.  strlen ((char*)s.a) involves a type cast and should
assume nothing.

Additionally due to the discussion, I came to the conclusion that this
strlen range optimization
should only be used when enabled with
-fassume-zero-terminated-char-arrays or -Ofast.

Note that I included the test case with the removed assertion as
gcc/testsuite/gcc.dg/strlenopt-57.c

Initially it would only miscompile with -Ofast, but with r263018 aka
PR tree-optimization/86043, PR tree-optimization/86042 it was
miscompiled with -O3 as well.

I located the reason in gcc/tree-ssa-strlen.c (get_min_string_length)
which did not
check if the string constant is in fact zero terminated:

@@ -3192,7 +3194,9 @@ get_min_string_length (tree rhs, bool *full_string
       && TREE_READONLY (rhs))
     rhs = DECL_INITIAL (rhs);

-  if (rhs && TREE_CODE (rhs) == STRING_CST)
+  if (rhs && TREE_CODE (rhs) == STRING_CST
+      && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (rhs)),
+                          TREE_STRING_LENGTH (rhs)) >= 0)
     {
       *full_string_p = true;
       return strlen (TREE_STRING_POINTER (rhs));

Fortunately this also shows a way how to narrow strlen return value
expectations when
we are able to positively prove that a string must be zero terminated.



Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?

The warning bits are definitely not okay by me.  The purpose
of the warnings (-W{format,sprintf}-{overflow,truncation} is
to detect buffer overflows.  When a warning doesn't have access
to string length information for dynamically created strings
(like the strlen pass does) it uses array sizes as a proxy.
This is useful both to detect possible buffer overflows and
to prevent false positives for overflows that cannot happen
in correctly written programs.

By changing the logic to not consider the bounds of the array
the warnkngs will become prone to many false positives as is
evident from your changes to the tests (you had to explicitly
enable the new option).

In effect, the patch undoes a couple of years worth of fine
tuning of the warnings to strike a balance between true and
false positives.  That's completely unacceptable to me, and,
frankly, proposals along these lines are exceedingly
disruptive.

Beyond that, I don't have a problem with adding a new option
but I continue to strongly disagree with disabling the strlen
optimization by default.  It implies that by default GCC
targets invalid code.  If there is consensus that some new
option is necessary, the right setting is on by default,
along with warnings for programs that pass non-terminated
arrays to string functions.  I have already submitted a patch
that effect for strlen and const arrays and am working on
extending it to other built-in functions and dynamically
created strings (when I have a chance between defending
my past work).

Either way, if a new option is introduced, the array bound
computation for sprintf (and all other buffer overflow
warnings, likely including -Wrestrict) will need to be
decoupled from the option so the current behavior is
preserved.

As I have also said, the area of the provenance of subobject
vs enclosing object pointers is under discussion in WG14 and
the C Object Model study group.  The informal consensus so
far is to maintain the provenance of subobjects and introduce
some mechanism to make it possible to extend the provenance
to the enclosing object.  This would be a pervasive change,
not one just limited to strings.  So if introducing some new
option at this stage is thought to be necessary this should
be taken into consideration.

Martin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]