Re: [RFC, PR 80689] Copy small aggregates element-wise


On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
>> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> > Hi,
>> >
>> > I'd like to request comments to the patch below which aims to fix PR
>> > 80689, which is an instance of a store-to-load forwarding stall on
>> > x86_64 CPUs in the ImageMagick benchmark, which is responsible for a
>> > slowdown of up to 9% compared to GCC 6, depending on options and HW
>> > used.  (Actually, I have just seen 24% in one specific combination but
>> > for various reasons can no longer verify it today.)
>> >
>> > The revision causing the regression is 237074, which increased the
>> > size of the mode for copying aggregates "by pieces" to 128 bits,
>> > incurring big stalls when the values being copied are also still being
>> > stored in a smaller data type or if the copied values are loaded in
>> > smaller types shortly afterwards.  Such situations happen in
>> > ImageMagick even across calls, which means that any non-IPA
>> > flow-sensitive approach would not detect them.  Therefore, the patch
>> > changes the way we copy small BLKmode data that are simple
>> > combinations of records and arrays (no unions or bit-fields;
>> > character arrays are also disallowed): it copies them one field
>> > and/or element at a time.
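>> >
>> > For illustration, the pattern that hits the stall boils down to
>> > something like this (a simplified sketch with invented names, not
>> > the actual benchmark code):
>> >
>> >   typedef struct { unsigned long w, h; long x, y; } rect; /* 32 bytes */
>> >
>> >   void set_region (const rect *r, rect *dst)
>> >   {
>> >     *dst = *r;            /* expanded by pieces as two 16-byte moves */
>> >   }
>> >
>> >   void caller (rect *dst, long y)
>> >   {
>> >     rect r;
>> >     r.w = 100;            /* four 8-byte stores...  */
>> >     r.h = 1;
>> >     r.x = 0;
>> >     r.y = y;
>> >     set_region (&r, dst); /* ...still in flight when the two 16-byte
>> >                              loads in set_region read them */
>> >   }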
>> >
>> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
>> > (the structure in the benchmark has 32 bytes) but is subject to change
>> > after more benchmarking and is actually zero - meaning element copying
>> > never happens - on other architectures.  I believe that any
>> > architecture with a store buffer can benefit but it's probably better
>> > to leave it to their maintainers to find a different default value.  I
>> > am not sure this is how such HW-dependent decisions should be made,
>> > which is the primary reason why I am sending this RFC first.
>> >
>> > I have decided to implement this change at the expansion level because
>> > at that point the type information is still readily available and at
>> > the same time we can also handle various implicit copies, for example
>> > those involved in passing parameters.  I found I could re-use some
>> > bits and pieces of tree-SRA and so I did, creating the tree-sra.h
>> > header file in the process.
>> >
>> > I am fully aware that in the final patch the new parameter, or indeed
>> > any new parameters, need to be documented.  I have intentionally
>> > skipped that for now and will write the documentation if the feedback
>> > here is generally good.
>> >
>> > I have bootstrapped and tested this patch on x86_64-linux, with
>> > different values of the parameter and only found problems with
>> > unreasonably high values leading to OOM.  I have done the same with a
>> > previous version of the patch which was equivalent to the limit being
>> > 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
>> > into failures of tests which assumed that structure padding was copied
>> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
>> > for example gcc.dg/vmx/varargs-4.c).
>> >
>> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
>> > 8% at -O2 and -Ofast compilation levels respectively on one particular
>> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
>> >
>> > Thanks in advance for any comments,
>>
>> I wonder if, at the place you choose to hook this in, you can elide
>> any copying of padding between fields.
>>
>> I'd rather have hooked such "high level" optimization in
>> expand_assignment where you can be reasonably sure you're seeing an
>> actual source-level construct.
>
> I have discussed this with Honza and we eventually decided to make the
> element-wise copy an alternative to emit_block_move (which uses the
> larger mode for moving since GCC 7) exactly so that we handle not only
> source-level assignments but also passing parameters by value and
> other situations.
>
>>
>> 35 bytes seems like a lot - what is the code-size impact?
>
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix ImageMagick, where the problematic structure is:
>
>   typedef struct _RectangleInfo
>   {
>     size_t
>       width,
>       height;
>
>     ssize_t
>       x,
>       y;
>   } RectangleInfo;
>
> ...so four 8-byte longs, no padding.  Since any aggregate of 33-35
> bytes must consist of smaller fields/elements, it seemed reasonable
> to also copy those element-wise.
>
> Nevertheless, I still intend to experiment with the limit, I sent out
> this RFC exactly so that I don't spend a lot of time benchmarking
> something that is eventually not deemed acceptable on principle.

I think the limit should be on the number of generated copies and not
the overall size of the structure...  If the struct were composed of
32 individual chars we wouldn't want to emit 32 loads and 32 stores...

I wonder how rep movsb interacts with store-to-load forwarding?  Is
that maybe optimized well on some archs?  A byte store should always
forward, and isn't the setup cost for small N reasonable on modern
CPUs?
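
A hypothetical micro-test (not part of the patch) could force a
rep movsb copy with inline asm and time it against the by-pieces
sequence, along these lines:

  #include <stddef.h>

  /* Copy N bytes with rep movsb; the count lives in RCX and the
     source/destination pointers in RSI/RDI.  */
  static inline void
  copy_rep_movsb (void *dst, const void *src, size_t n)
  {
    asm volatile ("rep movsb"
                  : "+D" (dst), "+S" (src), "+c" (n)
                  :
                  : "memory");
  }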

>>
>> IIRC the reason this may be slow isn't loading in smaller types than stored
>> before by the copy - the store buffers can handle this reasonably well.  It's
>> solely when previous smaller stores are
>>
>>   a1) not mergeable in the store buffer
>>   a2) not merged because earlier stores are already committed
>>
>> and
>>
>>   b) loaded afterwards as a type that would access multiple store buffers
>>
>> a) would be sure to happen in case b) involves accessing padding.  Is the
>> ImageMagick case one that involves padding in the structure?
>
> As I said above, there is no padding.
>
> Basically, what happens is that in a number of places, there is a
> variable region of the aforementioned type and it is initialized and
> passed to function SetPixelCacheNexusPixels in the following manner:
>
>     ...
>     region.width=cache_info->columns;
>     region.height=1;
>     region.x=0;
>     region.y=y;
>     pixels=SetPixelCacheNexusPixels(cache_info,ReadMode,&region,MagickTrue,
>       cache_nexus[id],exception);
>     ...
>
> and the first four statements in SetPixelCacheNexusPixels are:
>
>   assert(cache_info != (const CacheInfo *) NULL);
>   assert(cache_info->signature == MagickSignature);
>   if (cache_info->type == UndefinedCache)
>     return((PixelPacket *) NULL);
>   nexus_info->region=(*region);
>
> with the last one generating the stalls, on both Zen-based machines
> and also on 2-3 year old Intel CPUs.
>
> I have had a look at what Agner Fog's micro-architecture document
> says about store-forwarding stalls:
>
>   - on Broadwells and Haswells, any case where a "write of any size is
>     followed by a read of a larger size" incurs a stall, which fits our
>     example,
>   - on Skylakes: "A read that is bigger than the write, or a read that
>     covers both written and unwritten bytes, takes approximately 11
>     clock cycles extra" seems to apply
>   - on Intel Silvermont, there will also be a stall because "A memory
>     write can be forwarded to a subsequent read of the same size or a
>     smaller size..."
>   - on Zens, Agner Fog says they work perfectly except when crossing a
>     page or when "A read that has a partial overlap with a preceding
>     write has a penalty of 6-7 clock cycles," which must be why I see
>     stalls.
>
> So I guess the pending stores are not really merged even without
> padding,

It probably depends on the width of the entries in the store buffer,
on whether the stores appear in order, and on the alignment of the
stores (if they are larger than 8 bytes they are surely aligned).
IIRC CPUs have had store-buffer entries smaller than the cache-line
size.

Given that load bandwidth is usually higher than store bandwidth it
might make sense to do the store combining in our copying sequence,
like for the 8 byte entry case use something like

  movq 0(%eax), %xmm0
  movhps 8(%eax), %xmm0 // or vpinsrq
  mov[au]ps %xmm0, 0(%ebx)
...

thus do two loads per store and perform the stores in wider
mode?
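
In intrinsics form, the 32-byte case would look roughly like this
(only a sketch, assuming SSE2; the real sequence would be emitted
at expansion time):

  #include <emmintrin.h>

  /* Four 8-byte loads (matching the width of the pending stores, so
     each load can forward cleanly) combined into two 16-byte stores.  */
  static void
  copy32 (void *dst, const void *src)
  {
    const char *s = (const char *) src;
    char *d = (char *) dst;
    __m128i a = _mm_loadl_epi64 ((const __m128i *) (s + 0));
    __m128i b = _mm_loadl_epi64 ((const __m128i *) (s + 8));
    _mm_storeu_si128 ((__m128i *) (d + 0), _mm_unpacklo_epi64 (a, b));
    __m128i c = _mm_loadl_epi64 ((const __m128i *) (s + 16));
    __m128i e = _mm_loadl_epi64 ((const __m128i *) (s + 24));
    _mm_storeu_si128 ((__m128i *) (d + 16), _mm_unpacklo_epi64 (c, e));
  }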

As said, a general concern was that you are not copying padding.  If
you put this into an even more common place you will surely break
stuff, no?

Richard.

>
> Martin
>
>
>>
>> Richard.
>>
>> > Martin
>> >
>> >
>> > 2017-10-12  Martin Jambor  <mjambor@suse.cz>
>> >
>> >         PR target/80689
>> >         * tree-sra.h: New file.
>> >         * ipa-prop.h: Moved declaration of build_ref_for_offset to
>> >         tree-sra.h.
>> >         * expr.c: Include params.h and tree-sra.h.
>> >         (emit_move_elementwise): New function.
>> >         (store_expr_with_bounds): Optionally use it.
>> >         * ipa-cp.c: Include tree-sra.h.
>> >         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>> >         * config/i386/i386.c (ix86_option_override_internal): Set
>> >         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>> >         * tree-sra.c: Include tree-sra.h.
>> >         (scalarizable_type_p): Renamed to
>> >         simple_mix_of_records_and_arrays_p, made public, renamed the
>> >         second parameter to allow_char_arrays.
>> >         (extract_min_max_idx_from_array): New function.
>> >         (completely_scalarize): Moved bits of the function to
>> >         extract_min_max_idx_from_array.
>> >
>> >         testsuite/
>> >         * gcc.target/i386/pr80689-1.c: New test.
>> > ---
>> >  gcc/config/i386/i386.c                    |   4 ++
>> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
>> >  gcc/ipa-cp.c                              |   1 +
>> >  gcc/ipa-prop.h                            |   4 --
>> >  gcc/params.def                            |   6 ++
>> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
>> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
>> >  gcc/tree-sra.h                            |  33 ++++++++++
>> >  8 files changed, 233 insertions(+), 42 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>> >  create mode 100644 gcc/tree-sra.h
>> >
>> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > index 1ee8351c21f..87f602e7ead 100644
>> > --- a/gcc/config/i386/i386.c
>> > +++ b/gcc/config/i386/i386.c
>> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>> >                          ix86_tune_cost->l2_cache_size,
>> >                          opts->x_param_values,
>> >                          opts_set->x_param_values);
>> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>> > +                        35,
>> > +                        opts->x_param_values,
>> > +                        opts_set->x_param_values);
>> >
>> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>> >    if (opts->x_flag_prefetch_loop_arrays < 0
>> > diff --git a/gcc/expr.c b/gcc/expr.c
>> > index 134ee731c29..dff24e7f166 100644
>> > --- a/gcc/expr.c
>> > +++ b/gcc/expr.c
>> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "tree-chkp.h"
>> >  #include "rtl-chkp.h"
>> >  #include "ccmp.h"
>> > -
>> > +#include "params.h"
>> > +#include "tree-sra.h"
>> >
>> >  /* If this is nonzero, we do not bother generating VOLATILE
>> >     around volatile memory references, and we are willing to
>> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
>> >    return maybe_expand_insn (code, 2, ops);
>> >  }
>> >
>> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
>> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
>> > +   array within TYPE.  TYPE must either be a register type or an aggregate
>> > +   complying with simple_mix_of_records_and_arrays_p.
>> > +
>> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
>> > +   stack, and block moves may need to be treated specially.  */
>> > +
>> > +static void
>> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
>> > +                      int call_param_p)
>> > +{
>> > +  switch (TREE_CODE (type))
>> > +    {
>> > +    case RECORD_TYPE:
>> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>> > +       if (TREE_CODE (fld) == FIELD_DECL)
>> > +         {
>> > +           HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
>> > +           tree ft = TREE_TYPE (fld);
>> > +           emit_move_elementwise (ft, target, source, fld_offset,
>> > +                                  call_param_p);
>> > +         }
>> > +      break;
>> > +
>> > +    case ARRAY_TYPE:
>> > +      {
>> > +       tree elem_type = TREE_TYPE (type);
>> > +       HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
>> > +       gcc_assert (el_size > 0);
>> > +
>> > +       offset_int idx, max;
>> > +       /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>> > +       if (extract_min_max_idx_from_array (type, &idx, &max))
>> > +         {
>> > +           HOST_WIDE_INT el_offset = offset;
>> > +           for (; idx <= max; ++idx)
>> > +             {
>> > +               emit_move_elementwise (elem_type, target, source, el_offset,
>> > +                                      call_param_p);
>> > +               el_offset += el_size;
>> > +             }
>> > +         }
>> > +      }
>> > +      break;
>> > +    default:
>> > +      machine_mode mode = TYPE_MODE (type);
>> > +
>> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
>> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
>> > +
>> > +      /* TODO: Figure out whether the following is actually necessary.  */
>> > +      if (target == ntgt)
>> > +       ntgt = copy_rtx (target);
>> > +      if (source == nsrc)
>> > +       nsrc = copy_rtx (source);
>> > +
>> > +      gcc_assert (mode != VOIDmode);
>> > +      if (mode != BLKmode)
>> > +       emit_move_insn (ntgt, nsrc);
>> > +      else
>> > +       {
>> > +         /* For example vector gimple registers can end up here.  */
>> > +         rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
>> > +                                 TYPE_MODE (sizetype), EXPAND_NORMAL);
>> > +         emit_block_move (ntgt, nsrc, size,
>> > +                          (call_param_p
>> > +                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> > +       }
>> > +      break;
>> > +    }
>> > +  return;
>> > +}
>> > +
>> >  /* Generate code for computing expression EXP,
>> >     and storing the value into TARGET.
>> >
>> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
>> >         emit_group_store (target, temp, TREE_TYPE (exp),
>> >                           int_size_in_bytes (TREE_TYPE (exp)));
>> >        else if (GET_MODE (temp) == BLKmode)
>> > -       emit_block_move (target, temp, expr_size (exp),
>> > -                        (call_param_p
>> > -                         ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> > +       {
>> > +         /* Copying smallish BLKmode structures with emit_block_move and thus
>> > +            by-pieces can result in store-to-load stalls.  So copy some simple
>> > +            small aggregates element or field-wise.  */
>> > +         if (GET_MODE (target) == BLKmode
>> > +             && AGGREGATE_TYPE_P (TREE_TYPE (exp))
>> > +             && !TREE_ADDRESSABLE (TREE_TYPE (exp))
>> > +             && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
>> > +             && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
>> > +                 <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
>> > +                     * BITS_PER_UNIT))
>> > +             && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
>> > +           {
>> > +             /* FIXME:  Can this happen?  What would it mean?  */
>> > +             gcc_assert (!reverse);
>> > +             emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
>> > +                                    call_param_p);
>> > +           }
>> > +         else
>> > +           emit_block_move (target, temp, expr_size (exp),
>> > +                            (call_param_p
>> > +                             ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> > +       }
>> >        /* If we emit a nontemporal store, there is nothing else to do.  */
>> >        else if (nontemporal && emit_storent_insn (target, temp))
>> >         ;
>> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> > index 6b3d8d7364c..7d6019bbd30 100644
>> > --- a/gcc/ipa-cp.c
>> > +++ b/gcc/ipa-cp.c
>> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "tree-ssa-ccp.h"
>> >  #include "stringpool.h"
>> >  #include "attribs.h"
>> > +#include "tree-sra.h"
>> >
>> >  template <typename valtype> class ipcp_value;
>> >
>> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>> > index fa5bed49ee0..2313cc884ed 100644
>> > --- a/gcc/ipa-prop.h
>> > +++ b/gcc/ipa-prop.h
>> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
>> >  void ipa_release_body_info (struct ipa_func_body_info *);
>> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>> >
>> > -/* From tree-sra.c:  */
>> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
>> > -                          gimple_stmt_iterator *, bool);
>> > -
>> >  /* In ipa-cp.c  */
>> >  void ipa_cp_c_finalize (void);
>> >
>> > diff --git a/gcc/params.def b/gcc/params.def
>> > index e55afc28053..5e19f1414a0 100644
>> > --- a/gcc/params.def
>> > +++ b/gcc/params.def
>> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>> >           "Enable loop epilogue vectorization using smaller vector size.",
>> >           0, 0, 1)
>> >
>> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>> > +         "max-size-for-elementwise-copy",
>> > +         "Maximum size in bytes of a structure or array to be considered for "
>> > +         "copying by its individual fields or elements",
>> > +         0, 0, 512)
>> > +
>> >  /*
>> >
>> >  Local variables:
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>> > new file mode 100644
>> > index 00000000000..4156d4fba45
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>> > @@ -0,0 +1,38 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2" } */
>> > +
>> > +typedef struct st1
>> > +{
>> > +        long unsigned int a,b;
>> > +        long int c,d;
>> > +}R;
>> > +
>> > +typedef struct st2
>> > +{
>> > +        int  t;
>> > +        R  reg;
>> > +}N;
>> > +
>> > +void Set (const R *region,  N *n_info );
>> > +
>> > +void test(N  *n_obj ,const long unsigned int a, const long unsigned int b,  const long int c,const long int d)
>> > +{
>> > +        R reg;
>> > +
>> > +        reg.a=a;
>> > +        reg.b=b;
>> > +        reg.c=c;
>> > +        reg.d=d;
>> > +        Set (&reg, n_obj);
>> > +
>> > +}
>> > +
>> > +void Set (const R *reg,  N *n_obj )
>> > +{
>> > +        n_obj->reg=(*reg);
>> > +}
>> > +
>> > +
>> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
>> > +/* { dg-final { scan-assembler-not "movdqu" } } */
>> > +/* { dg-final { scan-assembler-not "movups" } } */
>> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> > index bac593951e7..ade97964205 100644
>> > --- a/gcc/tree-sra.c
>> > +++ b/gcc/tree-sra.c
>> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "ipa-fnsummary.h"
>> >  #include "ipa-utils.h"
>> >  #include "builtins.h"
>> > +#include "tree-sra.h"
>> >
>> >  /* Enumeration of all aggregate reductions we can do.  */
>> >  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
>> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
>> >  }
>> >
>> >
>> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
>> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
>> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
>> > -   we are considering a decl from constant pool.  If it is false, char arrays
>> > -   will be refused.  */
>> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
>> > +   fields/elements that are not bit-fields and are either register types or
>> > +   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore, if
>> > +   ALLOW_CHAR_ARRAYS is false, the function also returns false if TYPE
>> > +   contains an array of single-byte elements.  */
>> >
>> > -static bool
>> > -scalarizable_type_p (tree type, bool const_decl)
>> > +bool
>> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
>> >  {
>> >    gcc_assert (!is_gimple_reg_type (type));
>> >    if (type_contains_placeholder_p (type))
>> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
>> >             return false;
>> >
>> >           if (!is_gimple_reg_type (ft)
>> > -             && !scalarizable_type_p (ft, const_decl))
>> > +             && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
>> >             return false;
>> >         }
>> >
>> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
>> >    case ARRAY_TYPE:
>> >      {
>> >        HOST_WIDE_INT min_elem_size;
>> > -      if (const_decl)
>> > +      if (allow_char_arrays)
>> >         min_elem_size = 0;
>> >        else
>> >         min_elem_size = BITS_PER_UNIT;
>> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
>> >
>> >        tree elem = TREE_TYPE (type);
>> >        if (!is_gimple_reg_type (elem)
>> > -         && !scalarizable_type_p (elem, const_decl))
>> > +         && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
>> >         return false;
>> >        return true;
>> >      }
>> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
>> >    }
>> >  }
>> >
>> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
>> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
>> > +                           tree);
>> > +
>> > +/* For a given array TYPE, return false if its domain does not have any maximum
>> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
>> > +   element.  */
>> > +
>> > +bool
>> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
>> > +{
>> > +  tree domain = TYPE_DOMAIN (type);
>> > +  tree minidx = TYPE_MIN_VALUE (domain);
>> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>> > +  tree maxidx = TYPE_MAX_VALUE (domain);
>> > +  if (!maxidx)
>> > +    return false;
>> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>> > +
>> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>> > +  *min = wi::to_offset (minidx);
>> > +  *max = wi::to_offset (maxidx);
>> > +  if (!TYPE_UNSIGNED (domain))
>> > +    {
>> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
>> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
>> > +    }
>> > +  return true;
>> > +}
>> >
>> >  /* Create total_scalarization accesses for all scalar fields of a member
>> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
>> >     must be the top-most VAR_DECL representing the variable; within that,
>> >     OFFSET locates the member and REF must be the memory reference expression for
>> >     the member.  */
>> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>> >        {
>> >         tree elemtype = TREE_TYPE (decl_type);
>> >         tree elem_size = TYPE_SIZE (elemtype);
>> > -       gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>> >         HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>> >         gcc_assert (el_size > 0);
>> >
>> > -       tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
>> > -       gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>> > -       tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
>> > +       offset_int idx, max;
>> >         /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>> > -       if (maxidx)
>> > +       if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>> >           {
>> > -           gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>> >             tree domain = TYPE_DOMAIN (decl_type);
>> > -           /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>> > -              DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>> > -           offset_int idx = wi::to_offset (minidx);
>> > -           offset_int max = wi::to_offset (maxidx);
>> > -           if (!TYPE_UNSIGNED (domain))
>> > -             {
>> > -               idx = wi::sext (idx, TYPE_PRECISION (domain));
>> > -               max = wi::sext (max, TYPE_PRECISION (domain));
>> > -             }
>> >             for (int el_off = offset; idx <= max; ++idx)
>> >               {
>> >                 tree nref = build4 (ARRAY_REF, elemtype,
>> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>> >  }
>> >
>> >  /* Create total_scalarization accesses for a member of type TYPE, which must
>> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
>> > -   the member, REVERSE gives its torage order. and REF must be the reference
>> > -   expression for it.  */
>> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
>> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
>> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF must
>> > +   be the reference expression for it.  */
>> >
>> >  static void
>> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>> >  }
>> >
>> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
>> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
>> > +   RECORD_TYPE or ARRAY_TYPE conforming to
>> > +   simple_mix_of_records_and_arrays_p.  */
>> >
>> >  static void
>> >  create_total_scalarization_access (tree var)
>> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
>> >        {
>> >         tree var = candidate (i);
>> >
>> > -       if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>> > -                                               constant_decl_p (var)))
>> > +       if (VAR_P (var)
>> > +           && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
>> > +                                                  constant_decl_p (var)))
>> >           {
>> >             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>> >                 <= max_scalarization_size)
>> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
>> > new file mode 100644
>> > index 00000000000..dc901385994
>> > --- /dev/null
>> > +++ b/gcc/tree-sra.h
>> > @@ -0,0 +1,33 @@
>> > +/* tree-sra.h - Scalar Replacement of Aggregates declarations.
>> > +   Copyright (C) 2017 Free Software Foundation, Inc.
>> > +
>> > +This file is part of GCC.
>> > +
>> > +GCC is free software; you can redistribute it and/or modify it under
>> > +the terms of the GNU General Public License as published by the Free
>> > +Software Foundation; either version 3, or (at your option) any later
>> > +version.
>> > +
>> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> > +for more details.
>> > +
>> > +You should have received a copy of the GNU General Public License
>> > +along with GCC; see the file COPYING3.  If not see
>> > +<http://www.gnu.org/licenses/>.  */
>> > +
>> > +#ifndef TREE_SRA_H
>> > +#define TREE_SRA_H
>> > +
>> > +
>> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
>> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
>> > +                                    offset_int *max);
>> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>> > +                          bool reverse, tree exp_type,
>> > +                          gimple_stmt_iterator *gsi, bool insert_after);
>> > +
>> > +
>> > +
>> > +#endif /* TREE_SRA_H */
>> > --
>> > 2.14.1
>> >

