This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: calloc = malloc + memset


Ping Jakub?
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01104.html

On Wed, 23 Apr 2014, Richard Biener wrote:

On Fri, Apr 18, 2014 at 8:27 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
Thanks for the comments!


On Fri, 18 Apr 2014, Jakub Jelinek wrote:

The passes.def change makes me a little bit nervous, but if it works,
perhaps.


Would you prefer running the pass twice? I thought there would be less
resistance to moving the pass than duplicating it.

Indeed.  I think placing it after loops and CSE (thus what you have done)
makes sense.  strlenopt itself shouldn't enable much additional
optimizations.  But well, pass ordering is always tricky.

Didn't look at the rest of the changes, but Jakub is certainly able to
approve the patch so I leave it to him.

Thanks,
Richard.

By the way, I think even
passes we run only once should have the required functions implemented so
they can be run several times (at least most of them), in case users want to
do that in plugins. I was surprised when I tried adding a second strlen pass
and the compiler refused.


--- gcc/testsuite/g++.dg/tree-ssa/calloc.C      (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C      (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include <new>
+#include <vector>
+#include <cstdlib>
+
+void g(void*);
+inline void* operator new(std::size_t sz)
+{
+  void *p;
+
+  if (sz == 0)
+    sz = 1;
+
+  // Slightly modified from the libsupc++ version, that one has 2 calls
+  // to malloc which makes it too hard to optimize.
+  while ((p = std::malloc (sz)) == 0)
+    {
+      std::new_handler handler = std::get_new_handler ();
+      if (! handler)
+        throw std::bad_alloc();
+      handler ();
+    }
+  return p;
+}
+
+void f(void*p,int n){
+  new(p)std::vector<int>(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */


This looks to me way too much fragile, any time the libstdc++
or glibc headers change a little bit, you might need to adjust the
dg-final directives.  Much better would be if you just provided
the prototypes yourself and subset of the std::vector you really need for
the testcase.  You can throw some class or int, it doesn't have to be
std::bad_alloc, etc.


I don't understand what seems so fragile to you. There is a single function
in the .optimized dump, which just calls calloc in a loop. It doesn't seem
that likely that a change in glibc/libstdc++ would make an extra memset pop
up. A change in libstdc++ could easily prevent the optimization completely
(I'd like to hope we can avoid that, half of the purpose of the testcase was
making sure libstdc++ didn't change in a bad way), but I don't really see
how it could keep it in a way that requires tweaking dg-final.

While trying to write a standalone version, I hit again many missed
optimizations, getting such nice things in the .optimized dump as:

  _12 = p_13 + sz_7;
  if (_12 != p_13)

or:

  _12 = p_13 + sz_7;
  _30 = (unsigned long) _12;
  _9 = p_13 + 4;
  _10 = (unsigned long) _9;
  _11 = _30 - _10;
  _22 = _11 /[ex] 4;
  _21 = _22;
  _40 = _21 + 1;
  _34 = _40 * 4;

It is embarrassing... I hope the combiner GSoC will work well and we can
just add a dozen patterns to handle this before 4.10.


--- gcc/testsuite/gcc.dg/strlenopt-9.c  (revision 208772)
+++ gcc/testsuite/gcc.dg/strlenopt-9.c  (working copy)
@@ -11,21 +11,21 @@ fn1 (int r)
      optimized away.  */
   return strchr (p, '\0');
 }

 __attribute__((noinline, noclone)) size_t
 fn2 (int r)
 {
   char *p, q[10];
   strcpy (q, "abc");
   p = r ? "a" : q;
-  /* String length for p varies, therefore strlen below isn't
+  /* String length is constant for both alternatives, and strlen is
      optimized away.  */
   return strlen (p);


Is this because of jump threading?


It is PRE that turns:

  if (r_4(D) == 0)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # p_1 = PHI <&q(5), "a"(3)>
  _5 = __builtin_strlen (p_1);

into:

  if (r_4(D) == 0)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  _7 = __builtin_strlen (&q);
  pretmp_8 = _7;
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # p_1 = PHI <&q(5), "a"(3)>
  # prephitmp_9 = PHI <pretmp_8(5), 1(3)>
  _5 = prephitmp_9;

It says:

Found partial redundancy for expression
{call_expr<__builtin_strlen>,p_1}@.MEM_3 (0005)


--- gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c    (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c    (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdlib.h>
+#include <string.h>


Even this I find unsafe.  The strlenopt*.c tests use it's custom
strlenopt.h header for a reason, you might just add a calloc
prototype in there and use that header.


Might as well use __builtin_* then.


+/* Handle a call to malloc or calloc.  */
+
+static void
+handle_builtin_malloc (enum built_in_function bcode,
gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  tree lhs = gimple_call_lhs (stmt);
+  gcc_assert (get_stridx (lhs) == 0);
+  int idx = new_stridx (lhs);
+  tree length = NULL_TREE;
+  if (bcode == BUILT_IN_CALLOC)
+    length = build_int_cst (size_type_node, 0);


Is this safe?  I mean, if you call int a = 0; ptr = calloc (a, n);
or ptr = calloc (n, a); or ptr = calloc (0, 0); etc., then there is no
zero termination.  Sure, calling strlen or strchr etc. on the result
is undefined behavior, just wondering if it isn't going to break anything.
Though, the result is zero length and thus any dereferencing would be a
bug,
so perhaps we are fine.


You know the pass better than I do... I don't think the pass does anything
like: "all 0-length strings are equivalent, let's replace one by another
one", so I think we should be ok, but I don't mind treating calloc like
malloc here if you think it is too risky, I don't think people call
strlen(calloc(a,b)) very often, and it would let me remove the "don't use
si->length" comment.


Updated and tested on x86_64-linux-gnu.

2014-04-18  Marc Glisse  <marc.glisse@inria.fr>


        PR tree-optimization/57742
gcc/
        * tree-ssa-strlen.c (get_string_length): Ignore malloc.
        (handle_builtin_malloc, handle_builtin_memset): New functions.
        (strlen_optimize_stmt): Call them.
        * passes.def: Move strlen after loop+dom.
gcc/testsuite/
        * g++.dg/tree-ssa/calloc.C: New testcase.
        * gcc.dg/tree-ssa/calloc-1.c: Likewise.
        * gcc.dg/tree-ssa/calloc-2.c: Likewise.
        * gcc.dg/strlenopt-9.c: Adapt.

--
Marc Glisse
Index: gcc/passes.def
===================================================================
--- gcc/passes.def      (revision 209516)
+++ gcc/passes.def      (working copy)
@@ -176,21 +176,20 @@ along with GCC; see the file COPYING3.
         DOM and erroneous path isolation should be due to degenerate PHI
nodes.
         So rather than run the full propagators, run a specialized pass
which
         only examines PHIs to discover const/copy propagation
         opportunities.  */
       NEXT_PASS (pass_phi_only_cprop);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
-      NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_ccp);
       /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_asan);
@@ -235,20 +234,21 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_dominator);
       /* The only const/copy propagation opportunities left after
         DOM should be due to degenerate PHI nodes.  So rather than
         run the full propagators, run a specialized pass which
         only examines PHIs to discover const/copy propagation
         opportunities.  */
       NEXT_PASS (pass_phi_only_cprop);
+      NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_cd_dce);
       NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/calloc.C      (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C      (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef __SIZE_TYPE__ size_t;
+inline void* operator new(size_t, void* p) throw() { return p; }
+
+typedef void (*handler_t)(void);
+extern handler_t get_handle();
+
+inline void* operator new(size_t sz)
+{
+  void *p;
+
+  if (sz == 0)
+    sz = 1;
+
+  while ((p = __builtin_malloc (sz)) == 0)
+    {
+      handler_t handler = get_handle ();
+      if (! handler)
+        throw 42;
+      handler ();
+    }
+  return p;
+}
+
+struct vect {
+  int *start, *end;
+  vect(size_t n) {
+    start = end = 0;
+    if (n > (size_t)-1 / sizeof(int))
+      throw 33;
+    if (n != 0)
+      start = static_cast<int*> (operator new (n * sizeof(int)));
+    end = start + n;
+    int *p = start;
+    for (size_t l = n; l > 0; --l, ++p)
+      *p = 0;
+  }
+};
+
+void f (void *p, int n)
+{
+  new (p) vect(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/strlenopt-9.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-9.c  (revision 209516)
+++ gcc/testsuite/gcc.dg/strlenopt-9.c  (working copy)
@@ -11,21 +11,21 @@ fn1 (int r)
      optimized away.  */
   return strchr (p, '\0');
 }

 __attribute__((noinline, noclone)) size_t
 fn2 (int r)
 {
   char *p, q[10];
   strcpy (q, "abc");
   p = r ? "a" : q;
-  /* String length for p varies, therefore strlen below isn't
+  /* String length is constant for both alternatives, and strlen is
      optimized away.  */
   return strlen (p);
 }

 __attribute__((noinline, noclone)) size_t
 fn3 (char *p, int n)
 {
   int i;
   p = strchr (p, '\0');
   /* strcat here can be optimized into memcpy.  */
@@ -91,19 +91,19 @@ main ()
       || memcmp (b, "aaaabcdabcdefgabcdefgefgabcdabcd", 33) != 0
       || memcmp (buf, "aaaabcdabcdefgabcdefgefgabcdefg", 32) != 0)
     abort ();
   if (fn4 (b, 256) != 4
       || memcmp (b, "aaaabcdabcdefgabcdefgefgabcdabcdabcd", 37) != 0
       || memcmp (buf, "aaaabcdabcdefgabcdefgefgabcdabcdefgefg", 39) != 0)
     abort ();
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "strlen \\(" 4 "strlen" } } */
+/* { dg-final { scan-tree-dump-times "strlen \\(" 3 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "memcpy \\(" 6 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcpy \\(" 1 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strchr \\(" 3 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "stpcpy \\(" 0 "strlen" } } */
 /* { dg-final { cleanup-tree-dump "strlen" } } */
 /* { dg-final { scan-tree-dump-times "return 4;" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c    (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c    (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+extern int a;
+extern int *b;
+int n;
+void* f(long *q)
+{
+  int *p = __builtin_malloc (n);
+  ++*q;
+  if (p)
+  {
+    ++*q;
+    a = 2;
+    __builtin_memset (p, 0, n);
+    *b = 3;
+  }
+  return p;
+}
+void* g(void)
+{
+  float *p = __builtin_calloc (8, 4);
+  return __builtin_memset (p, 0, 24); // not 32
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/calloc-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/calloc-2.c    (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc-2.c    (working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int n, nn;
+void* f()
+{
+  char *p = __builtin_calloc (n, 1);
+  p[42] = '\n';
+  __builtin_memset (p, 0, nn);
+  return p;
+}
+
+void* g(int m1, int m2)
+{
+  char *p = __builtin_malloc (m2);
+  while (--m1)
+  {
+    __builtin_memset (p, 0, m2);
+    p[n] = 'b';
+  }
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-times "malloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "memset" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c       (revision 209516)
+++ gcc/tree-ssa-strlen.c       (working copy)
@@ -496,20 +496,23 @@ get_string_length (strinfo si)
        case BUILT_IN_STPCPY_CHK:
          gcc_assert (lhs != NULL_TREE);
          loc = gimple_location (stmt);
          si->endptr = lhs;
          si->stmt = NULL;
          lhs = fold_convert_loc (loc, size_type_node, lhs);
          si->length = fold_convert_loc (loc, size_type_node, si->ptr);
          si->length = fold_build2_loc (loc, MINUS_EXPR, size_type_node,
                                        lhs, si->length);
          break;
+       case BUILT_IN_MALLOC:
+         break;
+       /* BUILT_IN_CALLOC always has si->length set.  */
        default:
          gcc_unreachable ();
          break;
        }
     }

   return si->length;
 }

 /* Invalidate string length information for strings whose length
@@ -521,20 +524,21 @@ maybe_invalidate (gimple stmt)
   strinfo si;
   unsigned int i;
   bool nonempty = false;

   for (i = 1; vec_safe_iterate (stridx_to_strinfo, i, &si); ++i)
     if (si != NULL)
       {
        if (!si->dont_invalidate)
          {
            ao_ref r;
+           /* Do not use si->length.  */
            ao_ref_init_from_ptr_and_size (&r, si->ptr, NULL_TREE);
            if (stmt_may_clobber_ref_p_1 (stmt, &r))
              {
                set_strinfo (i, NULL);
                free_strinfo (si);
                continue;
              }
          }
        si->dont_invalidate = false;
        nonempty = true;
@@ -1608,20 +1612,93 @@ handle_builtin_strcat (enum built_in_fun
        {
          laststmt.stmt = stmt;
          laststmt.len = srclen;
          laststmt.stridx = dsi->idx;
        }
     }
   else if (dump_file && (dump_flags & TDF_DETAILS) != 0)
     fprintf (dump_file, "not possible.\n");
 }

+/* Handle a call to malloc or calloc.  */
+
+static void
+handle_builtin_malloc (enum built_in_function bcode, gimple_stmt_iterator
*gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  tree lhs = gimple_call_lhs (stmt);
+  gcc_assert (get_stridx (lhs) == 0);
+  int idx = new_stridx (lhs);
+  tree length = NULL_TREE;
+  if (bcode == BUILT_IN_CALLOC)
+    length = build_int_cst (size_type_node, 0);
+  strinfo si = new_strinfo (lhs, idx, length);
+  if (bcode == BUILT_IN_CALLOC)
+    si->endptr = lhs;
+  set_strinfo (idx, si);
+  si->writable = true;
+  si->stmt = stmt;
+  si->dont_invalidate = true;
+}
+
+/* Handle a call to memset.
+   After a call to calloc, memset(,0,) is unnecessary.
+   memset(malloc(n),0,n) is calloc(n,1).  */
+
+static bool
+handle_builtin_memset (gimple_stmt_iterator *gsi)
+{
+  gimple stmt2 = gsi_stmt (*gsi);
+  if (!integer_zerop (gimple_call_arg (stmt2, 1)))
+    return true;
+  tree ptr = gimple_call_arg (stmt2, 0);
+  int idx1 = get_stridx (ptr);
+  if (idx1 <= 0)
+    return true;
+  strinfo si1 = get_strinfo (idx1);
+  if (!si1)
+    return true;
+  gimple stmt1 = si1->stmt;
+  if (!stmt1 || !is_gimple_call (stmt1))
+    return true;
+  tree callee1 = gimple_call_fndecl (stmt1);
+  if (!gimple_call_builtin_p (stmt1, BUILT_IN_NORMAL))
+    return true;
+  enum built_in_function code1 = DECL_FUNCTION_CODE (callee1);
+  tree size = gimple_call_arg (stmt2, 2);
+  if (code1 == BUILT_IN_CALLOC)
+    /* Not touching stmt1 */ ;
+  else if (code1 == BUILT_IN_MALLOC
+          && operand_equal_p (gimple_call_arg (stmt1, 0), size, 0))
+    {
+      gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt1);
+      update_gimple_call (&gsi1, builtin_decl_implicit (BUILT_IN_CALLOC),
2,
+                         size, build_one_cst (size_type_node));
+    }
+  else
+    return true;
+  tree lhs = gimple_call_lhs (stmt2);
+  unlink_stmt_vdef (stmt2);
+  if (lhs)
+    {
+      gimple assign = gimple_build_assign (lhs, ptr);
+      gsi_replace (gsi, assign, false);
+    }
+  else
+    {
+      gsi_remove (gsi, true);
+      release_defs (stmt2);
+    }
+
+  return false;
+}
+
 /* Handle a POINTER_PLUS_EXPR statement.
    For p = "abcd" + 2; compute associated length, or if
    p = q + off is pointing to a '\0' character of a string, call
    zero_length_string on it.  */

 static void
 handle_pointer_plus (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
   tree lhs = gimple_assign_lhs (stmt), off;
@@ -1845,20 +1922,28 @@ strlen_optimize_stmt (gimple_stmt_iterat
          case BUILT_IN_MEMCPY:
          case BUILT_IN_MEMCPY_CHK:
          case BUILT_IN_MEMPCPY:
          case BUILT_IN_MEMPCPY_CHK:
            handle_builtin_memcpy (DECL_FUNCTION_CODE (callee), gsi);
            break;
          case BUILT_IN_STRCAT:
          case BUILT_IN_STRCAT_CHK:
            handle_builtin_strcat (DECL_FUNCTION_CODE (callee), gsi);
            break;
+         case BUILT_IN_MALLOC:
+         case BUILT_IN_CALLOC:
+           handle_builtin_malloc (DECL_FUNCTION_CODE (callee), gsi);
+           break;
+         case BUILT_IN_MEMSET:
+           if (!handle_builtin_memset (gsi))
+             return false;
+           break;
          default:
            break;
          }
     }
   else if (is_gimple_assign (stmt))
     {
       tree lhs = gimple_assign_lhs (stmt);

       if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs)))
        {

--
Marc Glisse


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]