This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFA][PR tree-optimization/61912] [PATCH 2/4] Trimming CONSTRUCTOR stores in DSE - V3
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 4 Jan 2017 14:41:29 +0100
- Subject: Re: [RFA][PR tree-optimization/61912] [PATCH 2/4] Trimming CONSTRUCTOR stores in DSE - V3
- Authentication-results: sourceware.org; auth=none
- References: <60f68c8d-a7e0-73cd-aa0c-1d9bc4963ad3@redhat.com>
On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
> This is the second patch in the kit to improve our DSE implementation.
>
> This patch recognizes when a CONSTRUCTOR assignment could be trimmed at the
> head or tail because those bytes are dead.
>
> The first implementation of this turned the CONSTRUCTOR into a memset. This
> version actually rewrites the RHS and LHS of the CONSTRUCTOR assignment.
>
> You'll note that the implementation computes head and tail trim counts, then
> masks them to an even byte count. We might even consider masking off the
> two low bits in the counts. This masking keeps higher alignments on the
> CONSTRUCTOR remnant which helps keep things efficient when the CONSTRUCTOR
> results in a memset call.
>
> This patch hits a lot statically in GCC and the testsuite. There were
> hundreds of hits in each.
>
> There may be some room for tuning. Trimming shouldn't ever result in poorer
> performance, but it may also not result in any measurable gain (it depends
> on how much gets trimmed relative to the size of the CONSTRUCTOR node and
> how the CONSTRUCTOR node gets expanded, the processor's capabilities for
> merging stores internally, etc etc). I suspect the main benefit comes when
> the CONSTRUCTOR collapses down to some thing small that gets expanded
> inline, thus exposing the internals to the rest of the optimization
> pipeline.
>
> We could, in theory, split the CONSTRUCTOR to pick up dead bytes in the
> middle of the CONSTRUCTOR. I haven't looked to see how applicable that is
> in real code and what the cost/benefit analysis might look like.
>
>
> Bootstrapped and regression tested on x86_64-linux-gnu. OK for the trunk?
+ categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete);
+
+ /* This is the only case we currently handle. It actually seems to
+ catch most cases of actual interest. */
+ if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor))
actually the _only_ case allowed as RHS of stores is an empty CONSTRUCTOR
and thus clearing of the object. Thus
gcc_assert (CONSTRUCTOR_NELTS (ctor) == 0);
should work here.
+ /* And the new type for the CONSTRUCTOR. Essentially it's just
+ a char array large enough to cover the non-trimmed parts of
+ the original CONSTRUCTOR. Note we want explicit bounds here
+ so that we know how many bytes to clear when expanding the
+ CONSTRUCTOR. */
+ tree type = build_range_type (sizetype, size_zero_node,
+ build_int_cst (sizetype, count));
+ type = build_array_type (char_type_node, type);
there is build_array_type_nelts combining both (note nelts rather than
upper bound).
+ /* Build a MEM_REF representing the whole accessed area, starting
+ at the first byte not trimmed. */
+ tree exp = fold_build2 (MEM_REF, type, lhs_addr,
+ build_int_cst (build_pointer_type (type),
+ head_trim));
this doesn't preserve alias info but pessimistically uses alias set
zero. I believe
you can use
tree alias_type = reference_alias_ptr_type (gimple_assign_lhs (stmt));
..., build_int_cst (alias_type, head_trim)
to do better.
Note that you end up generating invalid gimple (not sure what you are doing wiht
the tree-sra stuff in this patch - it seems to be unused). Consider
a[i] = {};
where trimming generates
MEM[&a[i], head_trim] = {};
that is invalid GIMPLE. I guess we never run into this case because for the LHS
max_size != size during analysis? To be on the safe side please add a
if (! is_gimple_min_invariant (lhs_addr))
return;
after computing lhs_addr.
Sth to keep in mind is that nothing on GIMPLE changes the store from a
CONSTRUCTOR
to a store from literal zero. So even if you end up with a int-size
and int-aligned
MEM <char[4]> [&x, 12] = {};
nothing (but RTL expansion) rewrites it to
MEM <int> [&x, 12] = 0;
that's something missing from gimple-fold.c (some code can be shared with
MEMSET folding I guess).
Otherwise the patch looks ok to me.
Thanks,
Richard.
> PR tree-optimization/61912
> PR tree-optimization/77485
> * tree-sra.h: New file.
> * ipa-cp.c: Include tree-sra.h
> * ipa-prop.h (build_ref_for_offset): Remove prototype.
> * tree-ssa-dse.c: Include expr.h and tree-sra.h.
> (compute_trims, trim_constructor_store): New functions.
> (maybe_trim_partially_dead_store): Call trim_constructor_store.
>
>
> * g++.dg/tree-ssa/ssa-dse-1.C: New test.
> * gcc.dg/tree-ssa/pr30375: Adjust expected output.
> * gcc.dg/tree-ssa/ssa-dse-24.c: New test.
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index d3b5052..bc5ea87 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3. If not see
> #include "ipa-inline.h"
> #include "ipa-utils.h"
> #include "tree-ssa-ccp.h"
> +#include "tree-sra.h"
>
> template <typename valtype> class ipcp_value;
>
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 0e75cf4..6d7b480 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -820,10 +820,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree
> **, bool *,
> void ipa_release_body_info (struct ipa_func_body_info *);
> tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>
> -/* From tree-sra.c: */
> -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> - gimple_stmt_iterator *, bool);
> -
> /* In ipa-cp.c */
> void ipa_cp_c_finalize (void);
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> new file mode 100644
> index 0000000..3f85f3a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
> @@ -0,0 +1,101 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c++14 -O -fdump-tree-dse1-details" } */
> +
> +using uint = unsigned int;
> +
> +template<typename C, uint S>
> +struct FixBuf
> +{
> + C buf[S] = {};
> +};
> +
> +template<typename C>
> +struct OutBuf
> +{
> + C* cur;
> + C* end;
> + C* beg;
> +
> + template<uint S>
> + constexpr
> + OutBuf(FixBuf<C, S>& b) : cur{b.buf}, end{b.buf + S}, beg{b.buf} { }
> +
> + OutBuf(C* b, C* e) : cur{b}, end{e} { }
> + OutBuf(C* b, uint s) : cur{b}, end{b + s} { }
> +
> + constexpr
> + OutBuf& operator<<(C v)
> + {
> + if (cur < end) {
> + *cur = v;
> + }
> + ++cur;
> + return *this;
> + }
> +
> + constexpr
> + OutBuf& operator<<(uint v)
> + {
> + uint q = v / 10U;
> + uint r = v % 10U;
> + if (q) {
> + *this << q;
> + }
> + *this << static_cast<C>(r + '0');
> + return *this;
> + }
> +};
> +
> +template<bool BOS>
> +struct BufOrSize
> +{
> + template<typename C, uint S>
> + static constexpr auto Select(FixBuf<C, S>& fb, OutBuf<C>&)
> + {
> + return fb;
> + }
> +};
> +
> +template<>
> +struct BufOrSize<true>
> +{
> + template<typename C, uint S>
> + static constexpr auto Select(FixBuf<C, S>&, OutBuf<C>& ob)
> + {
> + return ob.cur - ob.beg;
> + }
> +};
> +
> +// if BOS=1, it will return the size of the generated data, else the data
> itself
> +template<uint N, uint S, bool BOS = 0>
> +constexpr
> +auto fixbuf()
> +{
> + FixBuf<char, S> fb;
> + OutBuf<char> ob{fb};
> + for (uint i = 0; i <= N; ++i) {
> + ob << i << static_cast<char>(i == N ? 0 : ' ');
> + }
> + return BufOrSize<BOS>::Select(fb, ob);
> +}
> +
> +auto foo()
> +{
> + constexpr auto x = fixbuf<13, 200>();
> + return x;
> +}
> +
> +auto foo_sized()
> +{
> + constexpr auto s = fixbuf<13, 0, 1>();
> + constexpr auto x = fixbuf<13, s>();
> + return x;
> +}
> +
> +int main()
> +{
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> index 0439b1c..cf0a93b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
> @@ -22,4 +22,5 @@ void test_signed_msg_encoding(void)
> f();
> }
>
> -/* { dg-final { scan-tree-dump-times "signInfo = {}" 1 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&signInfo \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> new file mode 100644
> index 0000000..80a35ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
> @@ -0,0 +1,62 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1" } */
> +
> +
> +typedef unsigned int wchar_t;
> +struct printf_info
> +{
> + int prec;
> + int width;
> + wchar_t spec;
> + unsigned int is_long_double:1;
> + unsigned int is_short:1;
> + unsigned int is_long:1;
> + unsigned int alt:1;
> + unsigned int space:1;
> + unsigned int left:1;
> + unsigned int showsign:1;
> + unsigned int group:1;
> + unsigned int extra:1;
> + unsigned int is_char:1;
> + unsigned int wide:1;
> + unsigned int i18n:1;
> + unsigned int __pad:4;
> + unsigned short int user;
> + wchar_t pad;
> +} info;
> +
> +void bar (struct printf_info *);
> +
> +void foo(int prec,
> + int width,
> + wchar_t spec,
> + unsigned int is_long_double,
> + unsigned int is_short,
> + unsigned int is_long,
> + unsigned int alt,
> + unsigned int space,
> + unsigned int left,
> + unsigned int showsign,
> + unsigned int group,
> + wchar_t pad)
> +{
> + struct printf_info info = {
> + .prec = prec,
> + .width = width,
> + .spec = spec,
> + .is_long_double = is_long_double,
> + .is_short = is_short,
> + .is_long = is_long,
> + .alt = alt,
> + .space = space,
> + .left = left,
> + .showsign = showsign,
> + .group = group,
> + .pad = pad,
> + .extra = 0,
> + .wide = sizeof (char) != 1 };
> +
> + bar (&info);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\]
> \\*\\)&info \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> new file mode 100644
> index 0000000..6d88ece
> --- /dev/null
> +++ b/gcc/tree-sra.h
> @@ -0,0 +1,27 @@
> +/* Header file for SRA helper functions
> + Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> + for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#ifndef GCC_TREE_SRA_H
> +#define GCC_TREE_SRA_H
> +
> +tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> + gimple_stmt_iterator *, bool);
> +
> +
> +#endif /* GCC_TREE_SRA_H */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 40acd83..4757702 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -34,6 +34,8 @@ along with GCC; see the file COPYING3. If not see
> #include "domwalk.h"
> #include "tree-cfgcleanup.h"
> #include "params.h"
> +#include "expr.h"
> +#include "tree-sra.h"
>
> /* This file implements dead store elimination.
>
> @@ -272,6 +274,68 @@ maybe_trim_complex_store (ao_ref *ref, sbitmap live,
> gimple *stmt)
> are live. We do not try to optimize those cases. */
> }
>
> +/* STMT initializes an object using a CONSTRUCTOR where one or more of the
> + bytes written are dead stores. ORIG is the bitmap of bytes stored by
> + STMT. LIVE is the bitmap of stores that are actually live.
> +
> + Attempt to rewrite STMT so that only the real or imaginary part of
> + the object is actually stored.
> +
> + The most common case for getting here is a CONSTRUCTOR with no elements
> + being used to zero initialize an object. We do not try to handle other
> + cases as those would force us to fully cover the object with the
> + CONSTRUCTOR node except for the components that are dead. */
> +
> +static void
> +trim_constructor_store (ao_ref *ref, sbitmap live, gimple *stmt)
> +{
> + tree ctor = gimple_assign_rhs1 (stmt);
> +
> + HOST_WIDE_INT nz_elts, init_elts;
> + bool complete;
> + categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete);
> +
> + /* This is the only case we currently handle. It actually seems to
> + catch most cases of actual interest. */
> + if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor))
> + {
> + int head_trim = 0;
> + int tail_trim = 0;
> + compute_trims (ref, live, &head_trim, &tail_trim);
> +
> + /* Now we want to replace the constructor initializer
> + with memset (object + head_trim, 0, size - head_trim - tail_trim).
> */
> + if (head_trim || tail_trim)
> + {
> + /* We want &lhs for the MEM_REF expression. */
> + tree lhs_addr = build_fold_addr_expr (gimple_assign_lhs (stmt));
> +
> + /* The number of bytes for the new constructor. */
> + int count = (ref->size / BITS_PER_UNIT) - head_trim - tail_trim -
> 1;
> +
> + /* And the new type for the CONSTRUCTOR. Essentially it's just
> + a char array large enough to cover the non-trimmed parts of
> + the original CONSTRUCTOR. Note we want explicit bounds here
> + so that we know how many bytes to clear when expanding the
> + CONSTRUCTOR. */
> + tree type = build_range_type (sizetype, size_zero_node,
> + build_int_cst (sizetype, count));
> + type = build_array_type (char_type_node, type);
> +
> +
> + /* Build a MEM_REF representing the whole accessed area, starting
> + at the first byte not trimmed. */
> + tree exp = fold_build2 (MEM_REF, type, lhs_addr,
> + build_int_cst (build_pointer_type (type),
> + head_trim));
> +
> + /* Now update STMT with a new RHS and LHS. */
> + gimple_assign_set_lhs (stmt, exp);
> + gimple_assign_set_rhs1 (stmt, build_constructor (type, NULL));
> + }
> + }
> +}
> +
> /* STMT is a memory write where one or more bytes written are dead
> stores. ORIG is the bitmap of bytes stored by STMT. LIVE is the
> bitmap of stores that are actually live.
> @@ -288,6 +352,9 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap
> live, gimple *stmt)
> {
> switch (gimple_assign_rhs_code (stmt))
> {
> + case CONSTRUCTOR:
> + trim_constructor_store (ref, live, stmt);
> + break;
> case COMPLEX_CST:
> maybe_trim_complex_store (ref, live, stmt);
> break;
>