This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA][PR tree-optimization/61912] [PATCH 2/4] Trimming CONSTRUCTOR stores in DSE - V3


On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
> This is the second patch in the kit to improve our DSE implementation.
>
> This patch recognizes when a CONSTRUCTOR assignment could be trimmed at the
> head or tail because those bytes are dead.
>
> The first implementation of this turned the CONSTRUCTOR into a memset. This
> version actually rewrites the RHS and LHS of the CONSTRUCTOR assignment.
>
> You'll note that the implementation computes head and tail trim counts, then
> masks them to an even byte count.  We might even consider masking off the
> two low bits in the counts.  This masking keeps higher alignments on the
> CONSTRUCTOR remnant which helps keep things efficient when the CONSTRUCTOR
> results in a memset call.
>
> This patch hits a lot statically in GCC and the testsuite.  There were
> hundreds of hits in each.
>
> There may be some room for tuning.  Trimming shouldn't ever result in poorer
> performance, but it may also not result in any measurable gain (it depends
> on how much gets trimmed relative to the size of the CONSTRUCTOR node and
> how the CONSTRUCTOR node gets expanded, the processor's capabilities for
> merging stores internally, etc etc).  I suspect the main benefit comes when
> the CONSTRUCTOR collapses down to some thing small that gets expanded
> inline, thus exposing the internals to the rest of the optimization
> pipeline.
>
> We could, in theory, split the CONSTRUCTOR to pick up dead bytes in the
> middle of the CONSTRUCTOR.  I haven't looked to see how applicable that is
> in real code and what the cost/benefit analysis might look like.
>
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

+  categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete);
+
+  /* This is the only case we currently handle.  It actually seems to
+     catch most cases of actual interest.  */
+  if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor))

actually the _only_ case allowed as RHS of stores is an empty CONSTRUCTOR
and thus clearing of the object.  Thus

  gcc_assert (CONSTRUCTOR_NELTS (ctor) == 0);

should work here.

+         /* And the new type for the CONSTRUCTOR.  Essentially it's just
+            a char array large enough to cover the non-trimmed parts of
+            the original CONSTRUCTOR.  Note we want explicit bounds here
+            so that we know how many bytes to clear when expanding the
+            CONSTRUCTOR.  */
+         tree type = build_range_type (sizetype, size_zero_node,
+                                       build_int_cst (sizetype, count));
+         type = build_array_type (char_type_node, type);

there is build_array_type_nelts combining both (note nelts rather than
upper bound).

+         /* Build a MEM_REF representing the whole accessed area, starting
+            at the first byte not trimmed.  */
+         tree exp = fold_build2 (MEM_REF, type, lhs_addr,
+                                 build_int_cst (build_pointer_type (type),
+                                                head_trim));

this doesn't preserve alias info but pessimistically uses alias set
zero.  I believe
you can use

    tree alias_type = reference_alias_ptr_type (gimple_assign_lhs (stmt));
    ..., build_int_cst (alias_type, head_trim)

to do better.

Note that you end up generating invalid gimple (not sure what you are doing wiht
the tree-sra stuff in this patch - it seems to be unused).  Consider

    a[i] = {};

where trimming generates

   MEM[&a[i], head_trim] = {};

that is invalid GIMPLE.  I guess we never run into this case because for the LHS
max_size != size during analysis?  To be on the safe side please add a

  if (! is_gimple_min_invariant (lhs_addr))
    return;

after computing lhs_addr.

Sth to keep in mind is that nothing on GIMPLE changes the store from a
CONSTRUCTOR
to a store from literal zero.  So even if you end up with a int-size
and int-aligned

  MEM <char[4]> [&x, 12] = {};

nothing (but RTL expansion) rewrites it to

  MEM <int> [&x, 12] = 0;

that's something missing from gimple-fold.c (some code can be shared with
MEMSET folding I guess).

Otherwise the patch looks ok to me.

Thanks,
Richard.

>         PR tree-optimization/61912
>         PR tree-optimization/77485
>         * tree-sra.h: New file.
>         * ipa-cp.c: Include tree-sra.h
>         * ipa-prop.h (build_ref_for_offset): Remove prototype.
>         * tree-ssa-dse.c: Include expr.h and tree-sra.h.
>         (compute_trims, trim_constructor_store): New functions.
>         (maybe_trim_partially_dead_store): Call trim_constructor_store.
>
>
>         * g++.dg/tree-ssa/ssa-dse-1.C: New test.
>         * gcc.dg/tree-ssa/pr30375: Adjust expected output.
>         * gcc.dg/tree-ssa/ssa-dse-24.c: New test.
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index d3b5052..bc5ea87 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
>  #include "tree-ssa-ccp.h"
> +#include "tree-sra.h"
>
>  template <typename valtype> class ipcp_value;
>
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 0e75cf4..6d7b480 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -820,10 +820,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree
> **, bool *,
>  void ipa_release_body_info (struct ipa_func_body_info *);
>  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>
> -/* From tree-sra.c:  */
> -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> -                          gimple_stmt_iterator *, bool);
> -
>  /* In ipa-cp.c  */
>  void ipa_cp_c_finalize (void);
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> new file mode 100644
> index 0000000..3f85f3a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> @@ -0,0 +1,101 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c++14 -O -fdump-tree-dse1-details" } */
> +
> +using uint = unsigned int;
> +
> +template<typename C, uint S>
> +struct FixBuf
> +{
> +       C buf[S] = {};
> +};
> +
> +template<typename C>
> +struct OutBuf
> +{
> +       C*      cur;
> +       C*      end;
> +       C*      beg;
> +
> +       template<uint S>
> +       constexpr
> +       OutBuf(FixBuf<C, S>& b) : cur{b.buf}, end{b.buf + S}, beg{b.buf} { }
> +
> +       OutBuf(C* b, C* e) : cur{b}, end{e} { }
> +       OutBuf(C* b, uint s) : cur{b}, end{b + s} { }
> +
> +       constexpr
> +       OutBuf& operator<<(C v)
> +       {
> +               if (cur < end) {
> +                       *cur = v;
> +               }
> +               ++cur;
> +               return *this;
> +       }
> +
> +       constexpr
> +       OutBuf& operator<<(uint v)
> +       {
> +               uint q = v / 10U;
> +               uint r = v % 10U;
> +               if (q) {
> +                       *this << q;
> +               }
> +               *this << static_cast<C>(r + '0');
> +               return *this;
> +       }
> +};
> +
> +template<bool BOS>
> +struct BufOrSize
> +{
> +       template<typename C, uint S>
> +       static constexpr auto Select(FixBuf<C, S>& fb, OutBuf<C>&)
> +       {
> +               return fb;
> +       }
> +};
> +
> +template<>
> +struct BufOrSize<true>
> +{
> +       template<typename C, uint S>
> +       static constexpr auto Select(FixBuf<C, S>&, OutBuf<C>& ob)
> +       {
> +               return ob.cur - ob.beg;
> +       }
> +};
> +
> +// if BOS=1, it will return the size of the generated data, else the data
> itself
> +template<uint N, uint S, bool BOS = 0>
> +constexpr
> +auto fixbuf()
> +{
> +       FixBuf<char, S> fb;
> +       OutBuf<char> ob{fb};
> +       for (uint i = 0; i <= N; ++i) {
> +               ob << i << static_cast<char>(i == N ? 0 : ' ');
> +       }
> +       return BufOrSize<BOS>::Select(fb, ob);
> +}
> +
> +auto foo()
> +{
> +       constexpr auto x = fixbuf<13, 200>();
> +       return x;
> +}
> +
> +auto foo_sized()
> +{
> +       constexpr auto s = fixbuf<13, 0, 1>();
> +       constexpr auto x = fixbuf<13, s>();
> +       return x;
> +}
> +
> +int main()
> +{
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> index 0439b1c..cf0a93b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> @@ -22,4 +22,5 @@ void test_signed_msg_encoding(void)
>      f();
>  }
>
> -/* { dg-final { scan-tree-dump-times "signInfo = {}" 1 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&signInfo \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> new file mode 100644
> index 0000000..80a35ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> @@ -0,0 +1,62 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1" } */
> +
> +
> +typedef unsigned int wchar_t;
> +struct printf_info
> +{
> +  int prec;
> +  int width;
> +  wchar_t spec;
> +  unsigned int is_long_double:1;
> +  unsigned int is_short:1;
> +  unsigned int is_long:1;
> +  unsigned int alt:1;
> +  unsigned int space:1;
> +  unsigned int left:1;
> +  unsigned int showsign:1;
> +  unsigned int group:1;
> +  unsigned int extra:1;
> +  unsigned int is_char:1;
> +  unsigned int wide:1;
> +  unsigned int i18n:1;
> +  unsigned int __pad:4;
> +  unsigned short int user;
> +  wchar_t pad;
> +} info;
> +
> +void bar (struct printf_info *);
> +
> +void foo(int prec,
> +  int width,
> +  wchar_t spec,
> +  unsigned int is_long_double,
> +  unsigned int is_short,
> +  unsigned int is_long,
> +  unsigned int alt,
> +  unsigned int space,
> +  unsigned int left,
> +  unsigned int showsign,
> +  unsigned int group,
> +  wchar_t pad)
> +{
> +    struct printf_info info = {
> +        .prec = prec,
> +        .width = width,
> +        .spec = spec,
> +        .is_long_double = is_long_double,
> +        .is_short = is_short,
> +        .is_long = is_long,
> +        .alt = alt,
> +        .space = space,
> +        .left = left,
> +        .showsign = showsign,
> +        .group = group,
> +        .pad = pad,
> +        .extra = 0,
> +        .wide = sizeof (char) != 1 };
> +
> +    bar (&info);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&info \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> new file mode 100644
> index 0000000..6d88ece
> --- /dev/null
> +++ b/gcc/tree-sra.h
> @@ -0,0 +1,27 @@
> +/* Header file for SRA helper functions
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> + for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_TREE_SRA_H
> +#define GCC_TREE_SRA_H
> +
> +tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> +                          gimple_stmt_iterator *, bool);
> +
> +
> +#endif /* GCC_TREE_SRA_H */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 40acd83..4757702 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "domwalk.h"
>  #include "tree-cfgcleanup.h"
>  #include "params.h"
> +#include "expr.h"
> +#include "tree-sra.h"
>
>  /* This file implements dead store elimination.
>
> @@ -272,6 +274,68 @@ maybe_trim_complex_store (ao_ref *ref, sbitmap live,
> gimple *stmt)
>       are live.  We do not try to optimize those cases.  */
>  }
>
> +/* STMT initializes an object using a CONSTRUCTOR where one or more of the
> +   bytes written are dead stores.  ORIG is the bitmap of bytes stored by
> +   STMT.  LIVE is the bitmap of stores that are actually live.
> +
> +   Attempt to rewrite STMT so that only the real or imaginary part of
> +   the object is actually stored.
> +
> +   The most common case for getting here is a CONSTRUCTOR with no elements
> +   being used to zero initialize an object.  We do not try to handle other
> +   cases as those would force us to fully cover the object with the
> +   CONSTRUCTOR node except for the components that are dead.  */
> +
> +static void
> +trim_constructor_store (ao_ref *ref, sbitmap live, gimple *stmt)
> +{
> +  tree ctor = gimple_assign_rhs1 (stmt);
> +
> +  HOST_WIDE_INT nz_elts, init_elts;
> +  bool complete;
> +  categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete);
> +
> +  /* This is the only case we currently handle.  It actually seems to
> +     catch most cases of actual interest.  */
> +  if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor))
> +    {
> +      int head_trim = 0;
> +      int tail_trim = 0;
> +      compute_trims (ref, live, &head_trim, &tail_trim);
> +
> +      /* Now we want to replace the constructor initializer
> +        with memset (object + head_trim, 0, size - head_trim - tail_trim).
> */
> +      if (head_trim || tail_trim)
> +       {
> +         /* We want &lhs for the MEM_REF expression.  */
> +         tree lhs_addr = build_fold_addr_expr (gimple_assign_lhs (stmt));
> +
> +         /* The number of bytes for the new constructor.  */
> +         int count = (ref->size / BITS_PER_UNIT) - head_trim - tail_trim -
> 1;
> +
> +         /* And the new type for the CONSTRUCTOR.  Essentially it's just
> +            a char array large enough to cover the non-trimmed parts of
> +            the original CONSTRUCTOR.  Note we want explicit bounds here
> +            so that we know how many bytes to clear when expanding the
> +            CONSTRUCTOR.  */
> +         tree type = build_range_type (sizetype, size_zero_node,
> +                                       build_int_cst (sizetype, count));
> +         type = build_array_type (char_type_node, type);
> +
> +
> +         /* Build a MEM_REF representing the whole accessed area, starting
> +            at the first byte not trimmed.  */
> +         tree exp = fold_build2 (MEM_REF, type, lhs_addr,
> +                                 build_int_cst (build_pointer_type (type),
> +                                                head_trim));
> +
> +         /* Now update STMT with a new RHS and LHS.  */
> +         gimple_assign_set_lhs (stmt, exp);
> +         gimple_assign_set_rhs1 (stmt, build_constructor (type, NULL));
> +       }
> +    }
> +}
> +
>  /* STMT is a memory write where one or more bytes written are dead
>     stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
>     bitmap of stores that are actually live.
> @@ -288,6 +352,9 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap
> live, gimple *stmt)
>      {
>        switch (gimple_assign_rhs_code (stmt))
>         {
> +       case CONSTRUCTOR:
> +         trim_constructor_store (ref, live, stmt);
> +         break;
>         case COMPLEX_CST:
>           maybe_trim_complex_store (ref, live, stmt);
>           break;
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]