This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 3/4] enhance overflow and truncation detection in strncpy and strncat (PR 81117)


On 10/02/2017 04:15 PM, Jeff Law wrote:
On 08/10/2017 01:29 PM, Martin Sebor wrote:
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 016f68d..1aa9e22 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
[ ... ]
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* Return the constant size unless it's zero (that's a
zero-length
+     array likely at the end of a struct).  */
+      tree size = TYPE_SIZE_UNIT (type);
+      if (size && TREE_CODE (size) == INTEGER_CST
+      && !integer_zerop (size))
+    return size;
+    }
Q. Do we have a canonical test for the trailing array idiom?   In some
contexts isn't it size 1?  ISTM This test needs slight improvement.
Ideally we'd use some canonical test for detect the trailing array idiom
rather than open-coding it here.  You might look at the array index
warnings in tree-vrp.c to see if it's got a canonical test you can call
or factor and use.

You're right, there is an API for this (array_at_struct_end_p,
as Richard pointed out).  I didn't want to use it because it
treats any array at the end of a struct as a flexible array
member, but simple tests show that that's what -Wstringop-
overflow does now, and it wasn't my intention to tighten up
the checking under this change.  It surprises me that no tests
exposed this. Let me relax the check and think about proposing
to tighten it up separately.



What might be even better would be to use the immediate uses of the
memory tag.  For your case there should be only one immediate use and it
should point to the statement which NUL terminates the destination.  Or
maybe that would be worse in that you only want to allow this exception
when the statements are consecutive.

You said "maybe that would be worse" so I hadn't implemented it.
I went ahead and coded it up but with more testing I don't think
it has the desired result.  See below.


I'll have to try this to better understand how it might work.
It's actually quite simple.

Rather than looking at the next statement in the chain via
gsi_next_nondebug you follow the def->use chain for the memory tag
associated with the string copy statement.

/* Get the memory tag that is defined by this statement.  */
defvar = gimple_vdef (gsi_stmt (gsi));

imm_use_iterator iter;
gimple *use_stmt;

if (num_imm_uses (defvar) == 1)
  {
    imm_use_terator iter;
    gimple *use_stmt;

    /* Iterate over the immediate uses of the memory tag.  */
    FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
      {
        Check if STMT is dst[i] = '\0'
      }
  }



The check that there is a single immediate use is designed to make sure
you get a warning for this scenario:

Thanks for the outline of the solution.  I managed to get it to
work with only a few minor changes(*) but...

strxncpy
read the destination
terminate the destination

Which I think you'd want to consider non-terminated because of the read
of the destination prior to termination.

But avoids warnings for

strxncpy
stuff that doesn't read the destination
terminate the destintion

...while it works fine for the basic cases it has the downside
of missing more subtle problems like this one:

  char a[8];

  void f (void) { puts (a); }

  void g (const char *s)
  {
    strncpy (a, s);

    f ();   // assumes a is a string

    a[sizeof a - 1] = '\0';
  }

or this one:

  struct S { char a[8]; };

  void f (const struct S *p) { puts (p->a); }

  void g (struct S *p, const char *s)
  {
    strncpy (p->a, s);

    f (p);   // assumes p->a is a string

    a[sizeof p->a - 1] = '\0';
  }

I would rather have the test be a little more strict than possibly
miss these kinds of insidious bugs for what seems like a unlike use
case (other code between the strncpy and the *dst = '\0').  I think
sticking with the original also encourages cleaner code: keeping
the nul-termination as close to the strncpy call.

You still need to rename strlen_optimize_stmt since after your changes
it does both optimizations and warnings.

I'm not sure I understand why.  It's a pre-existing function that
just dispatches to the built-in handlers.  We don't rename function
callers each time we improve error/warning detection in some
function they call (case in point: all the expanders in builtins.c)
Why do it here?  And what would be a suitable name?  All that comes
to my mind is awkward variations on strlen_optimize_stmt_and_warn.
Actually we often end up renaming functions as their capabilities
change.  If I was to read that name, I'd think its only purpose was to
optimize.  Given its static with a single use we should just fix it.


Something in compute_objsize I just noticed.

When DEST is an SSA_NAME, you follow the use->def chain back to its
defining statement, then get a new dest from the RHS of that statement:

+  if (TREE_CODE (dest) == SSA_NAME)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (dest);
+      if (!is_gimple_assign (stmt))
+	return NULL_TREE;
+
+      dest = gimple_assign_rhs1 (stmt);
+    }

This seems wrong as-written -- you have no idea what the RHS code is.
You're just blindly taking the first operand, then digging into its
type.  It probably works in practice, but it would seem better to verify
that gimple_assign_rhs_code is a conversion first (CONVERT_EXPR_CODE_P).
 If it's not a CONVERT_EXPR_CODE_P, return NULL_TREE.

The code doesn't match CONVERT_EXPR_P().  It's expected to match
ADDR_EXPR and that's also what it tests for on the line just below
the hunk you pasted above.  Here it is:

  if (TREE_CODE (dest) == SSA_NAME)
    {
      gimple *stmt = SSA_NAME_DEF_STMT (dest);
      if (!is_gimple_assign (stmt))
	return NULL_TREE;

      dest = gimple_assign_rhs1 (stmt);
    }

  if (TREE_CODE (dest) != ADDR_EXPR)
    return NULL_TREE;

So unless I'm missing something this does what you're looking
for.


[ Assuming the point here here is to look back through a conversion from
an array type to a pointer. ]

Yes.  It determines the size of the member array in cases like
this one:

  struct S { char a[5]; void (*pf)(void); };

  void sink (void*);

  void g (struct S *p, int n)
  {
    if (n < 7) n = 7;

    __builtin_strncpy (p->a, "123456", n);
    sink (p->a);
  }

Assuming this response resolves all your concerns, is the last
version of the patch okay to commit?

Martin


[*] For reference this is the code I used:

  if (tree defvar = gimple_vdef (gsi_stmt (gsi)))
    {
      if (num_imm_uses (defvar) >= 1)
	{
	  imm_use_iterator iter;
	  gimple *use_stmt;

	  /* Iterate over the immediate uses of the memory tag.  */
	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, defvar)
	    {
	      if (is_gimple_assign (use_stmt))
		{
		  HOST_WIDE_INT off;
		  dstdecl = get_addr_base_and_unit_offset (dstdecl, &off);

		  tree lhs = gimple_assign_lhs (use_stmt);
		  if (tree lhsbase = get_addr_base_and_unit_offset (lhs, &off))
		    if (operand_equal_p (dstdecl, lhsbase, 0))
		      return false;
		}
	    }
	}
    }


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]