This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fold more string comparison with known result (PR 90879)

On 8/13/19 4:46 PM, Jeff Law wrote:
On 8/13/19 3:43 PM, Martin Sebor wrote:
On 8/13/19 2:07 PM, Jeff Law wrote:
On 8/9/19 10:51 AM, Martin Sebor wrote:

PR tree-optimization/90879 - fold zero-equality of strcmp between a
longer string and a smaller array


     PR tree-optimization/90879
     * c.opt (-Wstring-compare): New option.


     PR tree-optimization/90879
     * gcc.dg/Wstring-compare-2.c: New test.
     * gcc.dg/Wstring-compare.c: New test.
     * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
     * gcc.dg/strcmpopt_6.c: New test.
     * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
     test cases.
     * gcc.dg/strlenopt-66.c: Run it.
     * gcc.dg/strlenopt-68.c: New test.


     PR tree-optimization/90879
     * builtins.c (check_access): Avoid using maxbound when null.
     * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
     * doc/invoke.texi (-Wstring-compare): Document new warning option.
     * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
     (get_range_strlen): Overwrite initial maxbound when non-null.
     * gimple-ssa-sprintf.c (get_string_length): Adjust to
     * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
     (used_only_for_zero_equality): New function.
     (handle_builtin_memcmp): Call it.
     (determine_min_objsize): Return an integer instead of tree.
     (get_len_or_size, strxcmp_eqz_result): New functions.
     (maybe_warn_pointless_strcmp): New function.
     (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
     between a longer string and a smaller array.

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 4af47855e7c..31e012b741b 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c

@@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
       type = TYPE_MAIN_VARIANT (type);
   -  /* We cannot determine the size of the array if it's a flexible
-     which is declared at the end of a structure.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-      && !array_at_struct_end_p (dest))
+  /* The size of a flexible array cannot be determined.  Otherwise,
+     for arrays with more than one element, return the size of its
+     type.  GCC itself misuses arrays of both zero and one elements
+     as flexible array members so they are excluded as well.  */
+  if (TREE_CODE (type) != ARRAY_TYPE
+      || !array_at_struct_end_p (dest))
-      tree size_t = TYPE_SIZE_UNIT (type);
-      if (size_t && TREE_CODE (size_t) == INTEGER_CST
-      && !integer_zerop (size_t))
-        return size_t;
+      tree type_size = TYPE_SIZE_UNIT (type);
+      if (type_size && TREE_CODE (type_size) == INTEGER_CST
+      && !integer_onep (type_size)
+      && !integer_zerop (type_size))
+        return tree_to_uhwi (type_size);
So I nearly commented on this when looking at the original patch.  Can
we really depend on the size when we've got an array at the end of a
struct with a declared size other than 0/1?   While 0/1 are by far the
most common way to declare them, couldn't someone have used other sizes?
   I think we pondered doing that at one time to cut down on the noise
from Coverity for RTL and TREE operand accessors.

Your code makes us safer, so I'm not saying you've done anything wrong,
just trying to decide if we need to tighten this up even further.

This patch issues a warning in these cases, i.e., when it sees
a call like, say, strcmp("foobar", A) with an A that's smaller
than the string, because it seems they are likely (rare) bugs.
I haven't seen the warning in any of the projects I tested it
with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).

The warning uses strcmp to detect these mistakes (or misuses)
but I'd like to add similar warnings for other string functions
as well and have code out there that does this on purpose use
true flexible array members (or the zero-length extension)
instead.  That makes the intent clear.

It's a judgment call whether to also fold (or do something else
like insert a trap) in addition to issuing a warning.  In this
case (reading) I don't think it matters as much as it does for
writes.  Either way, it would be nice to set a policy and
document it in the manual so users know what to expect and
so we don't have to revisit this question for each patch that
touches on this subject.
The GCC manual documents zero length arrays at the end of an aggregate
as a GNU extension for variable length objects.  The manual also
documents that it could be done with single element arrays, but that
doing so does contribute to the base size of the aggregate, but
otherwise it's handled like a zero length array.

So both zero and one element arrays are documented as supported for this
use case.  However, I could easily see someone making the case that any
size should work here and I could easily think of cases where that would
be a reasonable thing to do.  We do not handle these cases in a
consistent way -- we'll treat sizes other than 0/1 as being a variable
length object in some cases, but not in others.

I'm tempted to bring consistency here.  We're likely not losing
significant diagnostic opportunities or optimizations if we treat all
trailing arrays as creating potentially variable sized objects.

It's not terribly important in this case but it is very much so
in general.

I would tend to agree that treating all trailing arrays as flexible
array members might not have a dramatic an impact om efficiency.
But when it comes to detecting and diagnosing buffer overflow, it
most certainly does.  For example:

  struct S
    void (*pf)(void);
    char a[8];

  void f (struct S *p)
    strcpy (p->a, "0123456789");

GCC doesn't diagnose the overflow even with _FORTIFY_SOURCE=2,
despite the high likelihood of memory corruption.  How common
are structs with trailing arrays?

In a GCC build on x86_64, there are 726 distinct structs with
an array as the last member.  Of those, 268 have more than one
element.  In Binutils/GDB, it's 638 and 537, respectively.  In
the kernel it's 8283 and 7584 (and 3795 have 10 or more).

Some of these serve the purpose of flexible array members but
it seems highly unlikely it's more than a small subset.  By
assuming the opposite we are leaving all the code that accesses
those arrays with no protection against buffer overflow.

I believe object and subobject boundaries must be respected.
There is room for exemptions but those need to be made explicit
in the code.  In this case, the mechanism is the flexible array
member syntax (or the zero-length array extension(*)).  The GCC
manual explicitly discourages the one-element array case and
so I'd say it would be appropriate to warn about it if we can
determine it's being misused (otherwise, why discourage it if
we don't really mean it?)  The manual doesn't mention support
for larger arrays and writing past the end of those is almost
certainly a bug.  Those should all be diagnosed.  Clang and ICC
already do diagnose these(**), and both, especially Clang, are
being used to compile increasing proportion of the Linux
ecosystem, so there is less and less reason for GCC to be
as permissive as it has been and not issue comparable


[*] To minimize the transition effort we could introduce a new
"flexarray" attribute to annotate larger trailing arrays with
to indicate they are intended to be used as flexible array members.
But I don't have the impression the patter is wide- spread enough
to justify adding yet another extension.

[**] Both Clang and ICC have issued a warning for the out-of-
bounds access below for many releases:

struct S
  void (*pf)(void);
  char a[8];

void f (struct S *p)
  p->a[8] = 0;

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]