This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR 80293] Don't totally-sRA char arrays


On Mon, 17 Apr 2017, Martin Jambor wrote:

> Hi,
> 
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:
> > >Hi,
> > >
> > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> > >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> > >> 
> > >> > Hi,
> > >> > 
> > >> > the patch below is an attempt to deal with PR 80293 as
> > >non-invasively
> > >> > as possible.  Basically, it switches off total SRA scalarization of
> > >> > any local aggregates which contains an array of elements that have
> > >one
> > >> > byte (or less).
> > >> > 
> > >> > The logic behind this is that accessing such arrays element-wise
> > >> > usually results in poor code and that such char arrays are often
> > >used
> > >> > for non-statically-typed content anyway, and we do not want to copy
> > >> > that byte per byte.
> > >> > 
> > >> > Alan, do you think this could impact your constant pool
> > >scalarization
> > >> > too severely?
> > >> 
> > >> Hmm, isn't one of the issues that we have
> > >> 
> > >>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >>           {
> > >>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > >>                 <= max_scalarization_size)
> > >>               {
> > >>                 create_total_scalarization_access (var);
> > >> 
> > >> which limits the size of scalarizable vars but not the number
> > >> of accesses we create for total scalarization?
> > >
> > >Well, yes, but at least I understood from your comment #4 in the bug
> > >that you did not want to limit the number of replacements for gcc 7?
> > >
> > >> 
> > >> Is scalarizable_type_p only used in contexts where we have no hint
> > >> of the actual accesses?
> > >
> > >There are should_scalarize_away_bitmap and
> > >cannot_scalarize_away_bitmap hints.
> > >
> > >Total scalarization is a hack process where we chop up small
> > >aggregates according to their types - as opposed to actual accesses,
> > >meaning it is an alien process to the rest of SRA - knowing that they
> > >will go completely away afterwards (that is ensured by
> > >cannot_scalarize_away_bitmap) and that this will facilitate copy
> > >propagation (this is indicated by should_scalarize_away_bitmap, which
> > >has a bit set if there is a non-scalar assignment _from_ (a part of)
> > >the aggregate).
> > 
> > OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want?  Or is total scalarization only done for x  = y; z = x;?
> > Thus no further accesses to x?
> 
> Total scalarization adds artificial accesses only to y, but
> (in both cases of total and "natural" scalarization) for all aggregate
> assignments between SRA candidates, SRA attempts to recreate all
> accesses covering RHS to LHS.  Transitively.  So the artificial
> accesses created for y will then get created for x and z even if they
> would not be candidates for total scalarization.  So e.g. if z cannot
> go away because it is passed to a function, it will be loaded
> piece-wise from y.

So what I was trying to figure out is whether there is anything that
fores us to totally scalarize using "natural" elements rather than
using larger accesses?  For

  x = y;
  z = x;

the best pieces to chop x to depends on the write accesses to y
and the read accesses from z (we eventually want to easily match them
up for CSE).

I see we first create total scalarization accesses and only then
doing propagate_all_subaccesses.

> > 
> > >> That is, for the constant pool case we
> > >> usually have
> > >> 
> > >>   x = .LC0;
> > >>   .. = x[2];
> > >> 
> > >> so we have a "hint" that accesses on x are those we'd want to
> > >> optimize to accesses to .LC0.
> > >
> > >You don't need total scalarization for this, just the existence of
> > >read from x[2] would be sufficient but thanks for pointing me to the
> > >right direction.  For constant pool decl scalarization, it is not
> > >important to introduce artificial accesses for x but for .LC0.
> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.

Sure but this total scalarization was added to be able to defer
ctor "lowering" at gimplification time to SRA.  So it would be ok
(IMHO) to limit it to constant initializers.

But agreed for to be conservative at this point.  I'd say we do this
once stage1 opens on trunk and then backport for 7.2.

Maybe there's input from Alan as well then.

Thanks,
Richard.

> Martin
> 
> > 
> > >Martin
> > >
> > >
> > >2017-04-13  Martin Jambor  <mjambor@suse.cz>
> > >
> > >	* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
> > >	char arrays not totally scalarizable if it is false.
> > >	(analyze_all_variable_accesses): Pass correct value in the new
> > >	parameter.
> > >
> > >testsuite/
> > >	* g++.dg/tree-ssa/pr80293.C: New test.
> > >---
> > >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
> > >+++++++++++++++++++++++++++++++++
> > > gcc/tree-sra.c                          | 21 ++++++++++-----
> > > 2 files changed, 60 insertions(+), 6 deletions(-)
> > > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >
> > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >new file mode 100644
> > >index 00000000000..7faf35ae983
> > >--- /dev/null
> > >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >@@ -0,0 +1,45 @@
> > >+// { dg-do compile }
> > >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
> > >+
> > >+#include <array>
> > >+
> > >+// Return a copy of the underlying memory of an arbitrary value.
> > >+template <
> > >+    typename T,
> > >+    typename = typename
> > >std::enable_if<std::is_trivially_copyable<T>::value>::type
> > >+>
> > >+auto getMem(
> > >+    T const & value
> > >+) -> std::array<char, sizeof(T)> {
> > >+    auto ret = std::array<char, sizeof(T)>{};
> > >+    __builtin_memcpy(ret.data(), &value, sizeof(T));
> > >+    return ret;
> > >+}
> > >+
> > >+template <
> > >+    typename T,
> > >+    typename = typename
> > >std::enable_if<std::is_trivially_copyable<T>::value>::type
> > >+>
> > >+auto fromMem(
> > >+    std::array<char, sizeof(T)> const & buf
> > >+) -> T {
> > >+    auto ret = T{};
> > >+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
> > >+    return ret;
> > >+}
> > >+
> > >+double foo1(std::uint64_t arg) {
> > >+    return fromMem<double>(getMem(arg));
> > >+}
> > >+
> > >+double foo2(std::uint64_t arg) {
> > >+    return *reinterpret_cast<double*>(&arg);
> > >+}
> > >+
> > >+double foo3(std::uint64_t arg) {
> > >+    double ret;
> > >+    __builtin_memcpy(&ret, &arg, sizeof(arg));
> > >+    return ret;
> > >+}
> > >+
> > >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
> > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> > >index 02453d3ed9a..ab06be66131 100644
> > >--- a/gcc/tree-sra.c
> > >+++ b/gcc/tree-sra.c
> > >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool
> > >write)
> > > 
> > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or
> > >fixed-length
> > >ARRAY_TYPE with fields that are either of gimple register types
> > >(excluding
> > >-   bit-fields) or (recursively) scalarizable types.  */
> > >+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must
> > >be true if
> > >+   we are considering a decl from constant pool.  If it is false, char
> > >arrays
> > >+   will be refused.  */
> > > 
> > > static bool
> > >-scalarizable_type_p (tree type)
> > >+scalarizable_type_p (tree type, bool const_decl)
> > > {
> > >   gcc_assert (!is_gimple_reg_type (type));
> > >   if (type_contains_placeholder_p (type))
> > >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type)
> > > 	    return false;
> > > 
> > > 	  if (!is_gimple_reg_type (ft)
> > >-	      && !scalarizable_type_p (ft))
> > >+	      && !scalarizable_type_p (ft, const_decl))
> > > 	    return false;
> > > 	}
> > > 
> > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type)
> > > 
> > >   case ARRAY_TYPE:
> > >     {
> > >+      HOST_WIDE_INT min_elem_size;
> > >+      if (const_decl)
> > >+	min_elem_size = 0;
> > >+      else
> > >+	min_elem_size = BITS_PER_UNIT;
> > >+
> > >       if (TYPE_DOMAIN (type) == NULL_TREE
> > > 	  || !tree_fits_shwi_p (TYPE_SIZE (type))
> > > 	  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
> > >-	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
> > >+	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
> > > 	  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> > > 	return false;
> > >       if (tree_to_shwi (TYPE_SIZE (type)) == 0
> > >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type)
> > > 
> > >       tree elem = TREE_TYPE (type);
> > >       if (!is_gimple_reg_type (elem)
> > >-	 && !scalarizable_type_p (elem))
> > >+	  && !scalarizable_type_p (elem, const_decl))
> > > 	return false;
> > >       return true;
> > >     }
> > >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void)
> > >       {
> > > 	tree var = candidate (i);
> > > 
> > >-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >+	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> > >+						constant_decl_p (var)))
> > > 	  {
> > > 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > > 		<= max_scalarization_size)
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]