This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates.


Hi,

thank you very much for the detailed review.

You are probably correct that the functions are in a rather strange
order for a reader.  This is because IPA-SRA was removed and the
things it shares with intra-SRA remained in the same order.

I will send a new tree-sra.c as a separate reply to your mail in a few
moments.  A few reactions to your comments are below.

On Wed, Apr 29, 2009 at 02:48:12PM +0200, Richard Guenther wrote:
> On Tue, 28 Apr 2009, Martin Jambor wrote:
> 
> > On Tue, Apr 28, 2009 at 12:04:32PM +0200, Martin Jambor wrote:
> > > This  is  the  new  intraprocedural  SRA.  I  have  stripped  off  the
> > > interprocedural part  and will propose to commit  it separately later.
> > > I have  tried to  remove almost every  trace of IPA-SRA,  however, two
> > > provisions for it  have remained in the patch.   First, an enumeration
> > > (rather than  a boolean) is  used to distinguish between  "early" and
> > > "late" SRA  so that other  SRA modes can  be added later  on.  Second,
> > > scan_function()  has a  hook parameter  and a  void  pointer parameter
> > > which are not used in this patch but will be by IPA-SRA.
> > > 
> > > Otherwise, the patch is hopefully self-contained and the basis of its
> > > operation is described by the initial comment.
> > > 
> > > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on
> > > hppa-linux-gnu too) but produces a small number of testsuite failures
> > > which are handled by the two following patches.
> > > 
> > > Thanks,
> > > 
> > > Martin
> > > 
> > > 
> > > 2009-04-27  Martin Jambor  <mjambor@suse.cz>
> > > 
> > > 	* tree-sra.c (enum sra_mode): The whole contents of the file were
> > > 	replaced.
> > 
> > Hm, the patch is quite unreadable, below is the new tree-sra.c file
> > which entirely replaces the old one (note that the patch also modifies
> > the Makefile though):
> 
> Ah.  Here it is ... the comment to the changelog still applies.

Sure.

> > /* Scalar Replacement of Aggregates (SRA) converts some structure
> >    references into scalar references, exposing them to the scalar
> >    optimizers.
> >    Copyright (C) 2008, 2009 Free Software Foundation, Inc.
> >    Contributed by Martin Jambor <mjambor@suse.cz>
> > 
> > This file is part of GCC.
> > 
> > GCC is free software; you can redistribute it and/or modify it under
> > the terms of the GNU General Public License as published by the Free
> > Software Foundation; either version 3, or (at your option) any later
> > version.
> > 
> > GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > for more details.
> > 
> > You should have received a copy of the GNU General Public License
> > along with GCC; see the file COPYING3.  If not see
> > <http://www.gnu.org/licenses/>.  */
> > 
> > /* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
> >    twice, once in the early stages of compilation (early SRA) and once in the
> >    late stages (late SRA).  The aim of both is to turn references to scalar
> >    parts of aggregates into uses of independent scalar variables.
> > 
> >    The two passes are nearly identical, the only difference is that early SRA
> >    does not scalarize unions which are used as the result in a GIMPLE_RETURN
> >    statement because together with inlining this can lead to weird type
> >    conversions.
> 
> Do you happen to have a testcase for this or can you describe that problem
> some more?

The testcase is gfortran.fortran-torture/execute/entry_4.f90 from our
testsuite.  IIRC there is a union with a float and a boolean in it
which is returned in a number of functions.  In different functions I
selected one of these fields as the scalar replacement (because the
other one was not used in the function; I refuse to create
replacements for accesses that are scalar but have children, and for
children of scalar accesses, because of similar problems) and used a
V_C_E to construct a return value of the correct type.  However,
inlining combined these together and wanted to cast a boolean to a
float.  That made the expander ICE on an assert.

> >    Both passes operate in four stages:
> > 
> >    1. The declarations that have properties which make them candidates for
> >       scalarization are identified in function find_var_candidates().  The
> >       candidates are stored in candidate_bitmap.
> > 
> >    2. The function body is scanned.  In the process, declarations which are
> >       used in a manner that prevents their scalarization are removed from the
> >       candidate bitmap.  More importantly, for every access into an aggregate,
> >       an access structure (struct access) is created by create_access() and
> >       stored in a vector associated with the aggregate.  Among other
> >       information, the aggregate declaration, the offset and size of the access
> >       and its type are stored in the structure.
> > 
> >       On a related note, assign_link structures are created for every assign
> >       statement between candidate aggregates and attached to the related
> >       accesses.
> > 
> >    3. The vectors of accesses are analyzed.  They are first sorted according to
> >       their offset and size and then scanned for partially overlapping accesses
> >       (i.e. those which overlap but one is not entirely within another).  Such
> >       an access disqualifies the whole aggregate from being scalarized.
> 
> This happens only when get_ref_base_and_extent punts and returns -1 for
> the access size?  And of course with struct field accesses of different
> structs that are inside the same union.

This was specifically aimed at through-union accesses which cannot
both be represented by independent scalars.  As you noted below, I
bail out if get_ref_base_and_extent returns -1 for max_size.

> 
> >       If there is no such inhibiting overlap, a representative access structure
> >       is chosen for every unique combination of offset and size.  Afterwards,
> >       the pass builds a set of trees from these structures, in which children
> >       of an access are within their parent (in terms of offset and size).
> > 
> >       Then accesses  are propagated  whenever possible (i.e.  in cases  when it
> >       does not create a partially overlapping access) across assign_links from
> >       the right hand side to the left hand side.
> > 
> >       Then the set of trees for each declaration is traversed again and those
> >       accesses which should be replaced by a scalar are identified.
> > 
> >    4. The function is traversed again, and for every reference into an
> >       aggregate that has some component which is about to be scalarized,
> >       statements are amended and new statements are created as necessary.
> >       Finally, if a parameter got scalarized, the scalar replacements are
> >       initialized with values from respective parameter aggregates.
> > */
> 
> Closing */ goes to the previous line.

OK

> > #include "config.h"
> > #include "system.h"
> > #include "coretypes.h"
> > #include "alloc-pool.h"
> > #include "tm.h"
> > #include "tree.h"
> > #include "gimple.h"
> > #include "tree-flow.h"
> > #include "diagnostic.h"
> > #include "tree-dump.h"
> > #include "timevar.h"
> > #include "params.h"
> > #include "target.h"
> > #include "flags.h"
> > 
> > /* Enumeration of all aggregate reductions we can do.  */
> > enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */
> > 	       SRA_MODE_INTRA};	     /* late intraprocedural SRA */
> 
> Spaces after { and before }.
> 

OK

> > /* Global variable describing which aggregate reduction we are performing at
> >    the moment.  */
> > static enum sra_mode sra_mode;
> > 
> > struct assign_link;
> > 
> > /* ACCESS represents each access to an aggregate variable (as a whole or a
> >    part).  It can also represent a group of accesses that refer to exactly the
> >    same fragment of an aggregate (i.e. those that have exactly the same offset
> >    and size).  Such representatives for a single aggregate, once determined,
> >    are linked in a linked list and have the group fields set.
> > 
> >    Moreover, when doing intraprocedural SRA, a tree is built from those
> >    representatives (by the means of first_child and next_sibling pointers), in
> >    which all items in a subtree are "within" the root, i.e. their offset is
> >    greater or equal to offset of the root and offset+size is smaller or equal
> >    to offset+size of the root.  Children of an access are sorted by offset.
> > */
> 
> */ to previous line.

OK

> > struct access
> > {
> >   /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF
> >      If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0',
> >      `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
> 
> s/COMPONENT_REF/component reference/g - it's not only COMPONENT_REF
> trees we handle.

OK

> >   HOST_WIDE_INT offset;
> >   HOST_WIDE_INT size;
> >   tree base;
> > 
> >   /* Expression.  */
> >   tree expr;
> >   /* Type.  */
> >   tree type;
> > 
> >   /* Next group representative for this aggregate. */
> >   struct access *next_grp;
> > 
> >   /* Pointer to the group representative.  Pointer to itself if the struct is
> >      the representative.  */
> >   struct access *group_representative;
> > 
> >   /* If this access has any children (in terms of the definition above), this
> >      points to the first one.  */
> >   struct access *first_child;
> > 
> >   /* Pointer to the next sibling in the access tree as described above.  */
> >   struct access *next_sibling;
> > 
> >   /* Pointers to the first and last element in the linked list of assign
> >      links.  */
> >   struct assign_link *first_link, *last_link;
> 
> vertical space missing.

OK

> >   /* Pointer to the next access in the work queue.  */
> >   struct access *next_queued;
> > 
> >   /* Replacement variable for this access "region."  Never to be accessed
> >      directly, always only by the means of get_access_replacement() and only
> >      when grp_to_be_replaced flag is set.  */
> >   tree replacement_decl;
> > 
> >   /* Is this particular access write access? */
> >   unsigned write : 1;
> > 
> >   /* Is this access currently in the work queue?  */
> >   unsigned grp_queued : 1;
> >   /* Does this group contain a write access?  This flag is propagated down the
> >      access tree.  */
> >   unsigned grp_write : 1;
> >   /* Does this group contain a read access?  This flag is propagated down the
> >      access tree.  */
> >   unsigned grp_read : 1;
> >   /* Is the subtree rooted in this access fully covered by scalar
> >      replacements?  */
> >   unsigned grp_covered : 1;
> >   /* If set to true, this access and all below it in an access tree must not be
> >      scalarized.  */
> >   unsigned grp_unscalarizable_region : 1;
> >   /* Whether data have been written to parts of the aggregate covered by this
> >      access which is not to be scalarized.  This flag is propagated up in the
> >      access tree.  */
> >   unsigned grp_unscalarized_data : 1;
> >   /* Does this access and/or group contain a write access through a
> >      BIT_FIELD_REF?  */
> >   unsigned grp_bfr_lhs : 1;
> > 
> >   /* Set when a scalar replacement should be created for this variable.  We do
> >      the decision and creation at different places because create_tmp_var
> >      cannot be called from within FOR_EACH_REFERENCED_VAR. */
> >   unsigned grp_to_be_replaced : 1;
> > };
> > 
> > typedef struct access *access_p;
> > 
> > DEF_VEC_P (access_p);
> > DEF_VEC_ALLOC_P (access_p, heap);
> > 
> > /* Alloc pool for allocating access structures.  */
> > static alloc_pool access_pool;
> > 
> > /* A structure linking lhs and rhs accesses from an aggregate assignment.  They
> >    are used to propagate subaccesses from rhs to lhs as long as they don't
> >    conflict with what is already there.  */
> > struct assign_link
> > {
> >   struct access *lacc, *racc;
> >   struct assign_link *next;
> > };
> > 
> > /* Alloc pool for allocating assign link structures.  */
> > static alloc_pool link_pool;
> > 
> > /* Base (tree) -> Vector (VEC(access_p,heap) *) map.  */
> > static struct pointer_map_t *base_access_vec;
> > 
> > /* Bitmap of bases (candidates).  */
> > static bitmap candidate_bitmap;
> > /* Bitmap of declarations used in a return statement.  */
> > static bitmap retvals_bitmap;
> > /* Obstack for creation of fancy names.  */
> > static struct obstack name_obstack;
> > 
> > /* Head of a linked list of accesses that need to have their subaccesses
> >    propagated to their assignment counterparts.  */
> > static struct access *work_queue_head;
> > 
> > /* Dump contents of ACCESS to file F in a human friendly way.  If GRP is true,
> >    representative fields are dumped, otherwise those which only describe the
> >    individual access are.  */
> > 
> > static void
> > dump_access (FILE *f, struct access *access, bool grp)
> > {
> >   fprintf (f, "access { ");
> >   fprintf (f, "base = (%d)'", DECL_UID (access->base));
> >   print_generic_expr (f, access->base, 0);
> >   fprintf (f, "', offset = %d", (int) access->offset);
> >   fprintf (f, ", size = %d", (int) access->size);
> 
> you can use ", offset = "HOST_WIDE_INT_PRINT_DEC, access->offset here.
> 

OK

> >   fprintf (f, ", expr = ");
> >   print_generic_expr (f, access->expr, 0);
> >   fprintf (f, ", type = ");
> >   print_generic_expr (f, access->type, 0);
> >   if (grp)
> >     fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, "
> > 	     "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> > 	     "grp_to_be_replaced = %d\n",
> > 	     access->grp_write, access->grp_read, access->grp_covered,
> > 	     access->grp_unscalarizable_region, access->grp_unscalarized_data,
> > 	     access->grp_to_be_replaced);
> >   else
> >     fprintf (f, ", write = %d'\n", access->write);
> > }
> > 
> > /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL.  */
> > 
> > static void
> > dump_access_tree_1 (FILE *f, struct access *access, int level)
> > {
> >   do
> >     {
> >       int i;
> > 
> >       for (i = 0; i < level; i++)
> > 	fputs ("* ", dump_file);
> > 
> >       dump_access (f, access, true);
> > 
> >       if (access->first_child)
> > 	dump_access_tree_1 (f, access->first_child, level + 1);
> > 
> >       access = access->next_sibling;
> >     }
> >   while (access);
> > }
> > 
> > /* Dump all access trees for a variable, given the pointer to the first root in
> >    ACCESS.  */
> > 
> > static void
> > dump_access_tree (FILE *f, struct access *access)
> > {
> >   for (; access; access = access->next_grp)
> >     dump_access_tree_1 (f, access, 0);
> > }
> > 
> > /* Return a vector of pointers to accesses for the variable given in BASE or
> >    NULL if there is none.  */
> > 
> > static VEC (access_p, heap) *
> > get_base_access_vector (tree base)
> > {
> >   void **slot;
> > 
> >   slot = pointer_map_contains (base_access_vec, base);
> >   if (!slot)
> >     return NULL;
> >   else
> >     return *(VEC (access_p, heap) **) slot;
> > }
> > 
> > /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted
> >    in ACCESS.  Return NULL if it cannot be found.  */
> > 
> > static struct access *
> > find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
> > 			HOST_WIDE_INT size)
> > {
> >   while (access && (access->offset != offset || access->size != size))
> >     {
> >       struct access *child = access->first_child;
> > 
> >       while (child && (child->offset + child->size <= offset))
> > 	child = child->next_sibling;
> 
> Do you limit the number of siblings?  We should keep an eye on this
> for potential compile-time issues (not that I expect some).
> 

Not artificially, and given how I construct them (from representatives),
I doubt that it is easily possible.  They are limited by the size and
the structure of the aggregate, though.  If we ever come across a
problem like this, we might disable scalarization of grossly huge
aggregates.


> >       access = child;
> >     }
> > 
> >   return access;
> > }
> > 
> > /* Return the first group representative for DECL or NULL if none exists.  */
> > 
> > static struct access *
> > get_first_repr_for_decl (tree base)
> > {
> >   VEC (access_p, heap) *access_vec;
> > 
> >   access_vec = get_base_access_vector (base);
> >   if (!access_vec)
> >     return NULL;
> > 
> >   return VEC_index (access_p, access_vec, 0);
> > }
> > 
> > /* Find an access representative for the variable BASE and given OFFSET and
> >    SIZE.  Requires that access trees have already been built.  Return NULL if
> >    it cannot be found.  */
> > 
> > static struct access *
> > get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset,
> > 				 HOST_WIDE_INT size)
> > {
> >   struct access *access;
> > 
> >   access = get_first_repr_for_decl (base);
> >   while (access && (access->offset + access->size <= offset))
> >     access = access->next_grp;
> >   if (!access)
> >     return NULL;
> > 
> >   return find_access_in_subtree (access, offset, size);
> > }
> > 
> > /* Add LINK to the linked list of assign links of RACC.  */
> > static void
> > add_link_to_rhs (struct access *racc, struct assign_link *link)
> > {
> >   gcc_assert (link->racc == racc);
> > 
> >   if (!racc->first_link)
> >     {
> >       gcc_assert (!racc->last_link);
> >       racc->first_link = link;
> >     }
> >   else
> >     racc->last_link->next = link;
> > 
> >   racc->last_link = link;
> >   link->next = NULL;
> > }
> > 
> > /* Move all link structures in their linked list in OLD_RACC to the linked list
> >    in NEW_RACC.  */
> > static void
> > relink_to_new_repr (struct access *new_racc, struct access *old_racc)
> > {
> >   if (!old_racc->first_link)
> >     {
> >       gcc_assert (!old_racc->last_link);
> >       return;
> >     }
> > 
> >   if (new_racc->first_link)
> >     {
> >       gcc_assert (!new_racc->last_link->next);
> >       gcc_assert (!old_racc->last_link || !old_racc->last_link->next);
> > 
> >       new_racc->last_link->next = old_racc->first_link;
> >       new_racc->last_link = old_racc->last_link;
> >     }
> >   else
> >     {
> >       gcc_assert (!new_racc->last_link);
> > 
> >       new_racc->first_link = old_racc->first_link;
> >       new_racc->last_link = old_racc->last_link;
> >     }
> >   old_racc->first_link = old_racc->last_link = NULL;
> > }
> > 
> > /* Add ACCESS to the work queue (which is actually a stack).  */
> > 
> > static void
> > add_access_to_work_queue (struct access *access)
> > {
> >   if (!access->grp_queued)
> >     {
> >       gcc_assert (!access->next_queued);
> >       access->next_queued = work_queue_head;
> >       access->grp_queued = 1;
> >       work_queue_head = access;
> >     }
> > }
> > 
> > /* Pop an access from the work queue, and return it, assuming there is one.  */
> > 
> > static struct access *
> > pop_access_from_work_queue (void)
> > {
> >   struct access *access = work_queue_head;
> > 
> >   work_queue_head = access->next_queued;
> >   access->next_queued = NULL;
> >   access->grp_queued = 0;
> >   return access;
> > }
> > 
> > 
> > /* Allocate necessary structures.  */
> > 
> > static void
> > sra_initialize (void)
> > {
> >   candidate_bitmap = BITMAP_ALLOC (NULL);
> >   retvals_bitmap = BITMAP_ALLOC (NULL);
> >   gcc_obstack_init (&name_obstack);
> >   access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16);
> >   link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16);
> >   base_access_vec = pointer_map_create ();
> > }
> > 
> > /* Hook fed to pointer_map_traverse, deallocate stored vectors.  */
> > 
> > static bool
> > delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value,
> > 		     void *data ATTRIBUTE_UNUSED)
> > {
> >   VEC (access_p, heap) *access_vec;
> >   access_vec = (VEC (access_p, heap) *) *value;
> >   VEC_free (access_p, heap, access_vec);
> > 
> >   return true;
> > }
> > 
> > /* Deallocate all general structures.  */
> > 
> > static void
> > sra_deinitialize (void)
> > {
> >   BITMAP_FREE (candidate_bitmap);
> >   BITMAP_FREE (retvals_bitmap);
> >   free_alloc_pool (access_pool);
> >   free_alloc_pool (link_pool);
> >   obstack_free (&name_obstack, NULL);
> > 
> >   pointer_map_traverse (base_access_vec, delete_base_accesses, NULL);
> >   pointer_map_destroy (base_access_vec);
> > }
> > 
> > /* Remove DECL from candidates for SRA and write REASON to the dump file if
> >    there is one.  */
> > static void
> > disqualify_candidate (tree decl, const char *reason)
> > {
> >   bitmap_clear_bit (candidate_bitmap, DECL_UID (decl));
> > 
> >   if (dump_file)
> 
> && (dump_flags & TDF_DETAILS)

OK

> >     {
> >       fprintf (dump_file, "! Disqualifying ");
> >       print_generic_expr (dump_file, decl, 0);
> >       fprintf (dump_file, " - %s\n", reason);
> >     }
> > }
> > 
> > /* Return true iff the type contains a field or an element which does not allow
> >    scalarization.  */
> > 
> > static bool
> > type_internals_preclude_sra_p (tree type)
> > {
> >   tree fld;
> >   tree et;
> > 
> >   switch (TREE_CODE (type))
> >     {
> >     case RECORD_TYPE:
> >     case UNION_TYPE:
> >     case QUAL_UNION_TYPE:
> >       for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
> > 	if (TREE_CODE (fld) == FIELD_DECL)
> > 	  {
> > 	    tree ft = TREE_TYPE (fld);
> > 
> > 	    if (TREE_THIS_VOLATILE (fld)
> > 		|| !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld)
> > 		|| !host_integerp (DECL_FIELD_OFFSET (fld), 1)
> > 		|| !host_integerp (DECL_SIZE (fld), 1))
> > 	      return true;
> > 
> > 	    if (AGGREGATE_TYPE_P (ft)
> > 		&& type_internals_preclude_sra_p (ft))
> > 	      return true;
> > 	  }
> > 
> >       return false;
> > 
> >     case ARRAY_TYPE:
> >       et = TREE_TYPE (type);
> > 
> >       if (AGGREGATE_TYPE_P (et))
> > 	return type_internals_preclude_sra_p (et);
> >       else
> > 	return false;
> > 
> >     default:
> >       return false;
> >     }
> > }
> > 
> > /* Create and insert access for EXPR. Return created access, or NULL if it is
> >    not possible.  */
> > 
> > static struct access *
> > create_access (tree expr, bool write)
> > {
> >   struct access *access;
> >   void **slot;
> >   VEC (access_p,heap) *vec;
> >   HOST_WIDE_INT offset, size, max_size;
> >   tree base = expr;
> >   bool unscalarizable_region = false;
> > 
> >   if (handled_component_p (expr))
> >     base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
> >   else
> >     {
> >       tree tree_size;
> > 
> >       tree_size = TYPE_SIZE (TREE_TYPE (base));
> >       if (tree_size && host_integerp (tree_size, 1))
> > 	size = max_size = tree_low_cst (tree_size, 1);
> >       else
> > 	size = max_size = -1;
> > 
> >       offset = 0;
> >     }
> 
> get_ref_base_and_extent should now also work on plain DECLs
> (non-handled_component_p) and base should never be NULL.

Good point.  (However, I seem to remember I put the !base check there
for a reason... it might have been IPA-SRA though.  I'll put an assert
there for the next few days at least.)

> >   if (!base || !DECL_P (base)
> >       || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
> >     return NULL;
> > 
> >   if (size != max_size)
> >     {
> >       size = max_size;
> >       unscalarizable_region = true;
> >     }
> > 
> >   if (size < 0)
> >     {
> >       disqualify_candidate (base, "Encountered an ultra variable sized "
> > 			    "access.");
> 
> ultra variable sized?  I would name it 'unconstrained access'.

Well, IPA-SRA does not even like normal variable-sized accesses :-)
But yes, I'll change the reason string.

>  Note that there is still useful information in this case if offset
> is non-zero (namely accesses before [offset, -1] may still be
> scalarized).  Maybe something for further improvements.  This for
> example would happen for structs with trailing arrays.

Yes, we could make an unscalarizable region from it.  It's now on my
TODO list.

> >       return NULL;
> >     }
> > 
> >   access = (struct access *) pool_alloc (access_pool);
> >   memset (access, 0, sizeof (struct access));
> > 
> >   access->base = base;
> >   access->offset = offset;
> >   access->size = size;
> >   access->expr = expr;
> >   access->type = TREE_TYPE (expr);
> >   access->write = write;
> >   access->grp_unscalarizable_region = unscalarizable_region;
> > 
> >   slot = pointer_map_contains (base_access_vec, base);
> >   if (slot)
> >     vec = (VEC (access_p, heap) *) *slot;
> >   else
> >     vec = VEC_alloc (access_p, heap, 32);
> > 
> >   VEC_safe_push (access_p, heap, vec, access);
> > 
> >   *((struct VEC (access_p,heap) **)
> > 	pointer_map_insert (base_access_vec, base)) = vec;
> > 
> >   return access;
> > }
> > 
> > 
> > /* Callback of walk_tree.  Search the given tree for a declaration and exclude
> >    it from the candidates.  */
> > 
> > static tree
> > disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
> > {
> >   tree base = *tp;
> > 
> > 
> >   if (TREE_CODE (base) == SSA_NAME)
> >     base = SSA_NAME_VAR (base);
> 
> Err ... for SSA_NAME bases there is nothing to scalarize?  So just
> bail out in that case? 

You are right, this is also a leftover from IPA-SRA. 

> In fact, using walk_tree for disqualify_all looks
> a bit expensive (it also walks types).

OK, it seems quite possible to get rid of it and use a simple dive
through handled_components where still necessary.

> >   if (DECL_P (base))
> >     {
> >       disqualify_candidate (base, "From within disqualify_all().");
> >       *walk_subtrees = 0;
> >     }
> >   else
> >     *walk_subtrees = 1;
> > 
> > 
> >   return NULL_TREE;
> > }
> > 
> > /* Scan expression EXPR and create access structures for all accesses to
> >    candidates for scalarization.  Return the created access or NULL if none is
> >    created.  */
> > 
> > static struct access *
> > build_access_from_expr_1 (tree *expr_ptr,
> > 			gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write)
> > {
> >   struct access *ret = NULL;
> >   tree expr = *expr_ptr;
> >   tree safe_expr = expr;
> >   bool bit_ref;
> > 
> >   if (TREE_CODE (expr) == BIT_FIELD_REF)
> >     {
> >       expr = TREE_OPERAND (expr, 0);
> >       bit_ref = true;
> >     }
> >   else
> >     bit_ref = false;
> > 
> >   while (TREE_CODE (expr) == NOP_EXPR
> 
> CONVERT_EXPR_P (expr)

OK... but at another place in the email you said it might not even
appear in a valid gimple statement?  Should I remove it altogether?

> > 	 || TREE_CODE (expr) == VIEW_CONVERT_EXPR
> > 	 || TREE_CODE (expr) == REALPART_EXPR
> > 	 || TREE_CODE (expr) == IMAGPART_EXPR)
> >     expr = TREE_OPERAND (expr, 0);
> 
> Why do this here btw, and not just lump ...
> 
> >   switch (TREE_CODE (expr))
> >     {
> >     case ADDR_EXPR:
> >     case SSA_NAME:
> >     case INDIRECT_REF:
> >       break;
> > 
> >     case VAR_DECL:
> >     case PARM_DECL:
> >     case RESULT_DECL:
> >     case COMPONENT_REF:
> >     case ARRAY_REF:
> >       ret = create_access (expr, write);
> >       break;
> 
> ... this ...
> 
> >     case REALPART_EXPR:
> >     case IMAGPART_EXPR:
> >       expr = TREE_OPERAND (expr, 0);
> >       ret = create_access (expr, write);
> 
> ... and this together?  Won't you create bogus accesses if you
> strip for example IMAGPART_EXPR (which has non-zero offset)?

That would break the complex number into its components.  I thought
that they are meant to stay together for some reason, otherwise they
would not be represented explicitly in gimple... do you think it does
not matter?  What about vectors then?

The access is not bogus because modification functions take care of
these statements in a special way.  However, if it is indeed OK to
split complex numbers into their components, I will gladly simplify
this as you suggested.

> >       break;
> > 
> >     case ARRAY_RANGE_REF:
> 
> it should just be handled fine I think.

OK, I will try that at some later stage.

> >     default:
> >       walk_tree (&safe_expr, disqualify_all, NULL, NULL);
> 
> and if not, this should just disqualify the base of the access, like
> get_base_address (safe_expr) (save_expr you mean?) and then if that
> is a DECL, disqualify that decl.

I'll test just doing nothing, things handled by get_base_address are
either fine or already accounted for.

> >       break;
> >     }
> > 
> >   if (write && bit_ref && ret)
> >     ret->grp_bfr_lhs = 1;
> > 
> >   return ret;
> > }
> > 
> > /* Scan expression EXPR and create access structures for all accesses to
> >    candidates for scalarization.  Return true if any access has been
> >    inserted.  */
> > 
> > static bool
> > build_access_from_expr (tree *expr_ptr,
> > 			gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write,
> > 			void *data ATTRIBUTE_UNUSED)
> > {
> >   return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL;
> > }
> > 
> > /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in
> >    modes in which it matters, return true iff they have been disqualified.  RHS
> >    may be NULL, in that case ignore it.  If we scalarize an aggregate in
> >    intra-SRA we may need to add statements after each statement.  This is not
> >    possible if a statement unconditionally has to end the basic block.  */
> > static bool
> > disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs)
> > {
> >   if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt))
> >     {
> >       walk_tree (lhs, disqualify_all, NULL, NULL);
> >       if (rhs)
> > 	walk_tree (rhs, disqualify_all, NULL, NULL);
> >       return true;
> >     }
> >   return false;
> > }
> > 
> > 
> > /* Result code for scan_assign callback for scan_function.  */
> > enum scan_assign_result {SRA_SA_NONE,       /* nothing done for the stmt */
> > 			 SRA_SA_PROCESSED,  /* stmt analyzed/changed */
> > 			 SRA_SA_REMOVED};   /* stmt redundant and eliminated */
> 
> space after { and before }.

OK
 
> > 
> > 
> > /* Scan expressions occurring in the statement pointed to by STMT_EXPR, create
> >    access structures for all accesses to candidates for scalarization and
> >    remove those candidates which occur in statements or expressions that
> >    prevent them from being split apart.  Return true if any access has been
> >    inserted.  */
> > 
> > static enum scan_assign_result
> > build_accesses_from_assign (gimple *stmt_ptr,
> > 			    gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED,
> > 			    void *data ATTRIBUTE_UNUSED)
> > {
> >   gimple stmt = *stmt_ptr;
> >   tree *lhs_ptr, *rhs_ptr;
> >   struct access *lacc, *racc;
> > 
> >   if (gimple_assign_rhs2 (stmt))
> 
> !gimple_assign_single_p (stmt)
> 
> >     return SRA_SA_NONE;
> >
> >   lhs_ptr = gimple_assign_lhs_ptr (stmt);
> >   rhs_ptr = gimple_assign_rhs1_ptr (stmt);
> 
> you probably don't need to pass pointers to trees everywhere as you
> are not changing them.

Well, this function is a callback called by scan_function which can
also call sra_modify_expr in the last stage of the pass when
statements are modified.  I have considered splitting the function
into two but in the end I thought they would be too similar and the
overhead is hopefully manageable.
 
> >   if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr))
> >     return SRA_SA_NONE;
> > 
> >   racc = build_access_from_expr_1 (rhs_ptr, gsi, false);
> >   lacc = build_access_from_expr_1 (lhs_ptr, gsi, true);
> 
> just avoid calling into build_access_from_expr_1 for SSA_NAMEs
> or is_gimple_min_invariant lhs/rhs, that should make that
> function more regular.

In what sense?  build_access_from_expr_1 looks at TREE_CODE anyway and
can discard the two cases without, for example, looking into ADDR_EXPRs
like is_gimple_min_invariant() does.

But if you really think it is beneficial, I can do that, sure - to me
it just looks ugly.

 
> >   if (lacc && racc
> >       && !lacc->grp_unscalarizable_region
> >       && !racc->grp_unscalarizable_region
> >       && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr))
> >       && lacc->size <= racc->size
> >       && useless_type_conversion_p (lacc->type, racc->type))
> 
> useless_type_conversion_p should be always true here.

I don't think so, build_access_from_expr_1 can look through V_C_Es and
the types of accesses are the type of the operand in such cases..
 
> >     {
> >       struct assign_link *link;
> > 
> >       link = (struct assign_link *) pool_alloc (link_pool);
> >       memset (link, 0, sizeof (struct assign_link));
> > 
> >       link->lacc = lacc;
> >       link->racc = racc;
> > 
> >       add_link_to_rhs (racc, link);
> >     }
> > 
> >   return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE;
> > }
> > 
> > /* Scan the function and look for interesting statements.  Return true if any
> >    have been found or processed, as indicated by callbacks.  SCAN_EXPR is a
> >    callback called on all expressions within statements except assign statements
> >    and those deemed entirely unsuitable for some reason (all operands of such
> >    statements and expressions are removed from candidate_bitmap).  SCAN_ASSIGN
> >    is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback
> >    called on assign statements and those call statements which have a lhs; it
> >    is the only callback which may be NULL.  ANALYSIS_STAGE is true when
> >    running in the analysis stage of a pass and thus no statement is being
> >    modified.  DATA is a pointer passed to all callbacks.  If any single
> >    callback returns true, this function also returns true, otherwise it returns
> >    false.  */
> > 
> > static bool
> > scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *),
> > 	       enum scan_assign_result (*scan_assign) (gimple *,
> > 						       gimple_stmt_iterator *,
> > 						       void *),
> > 	       bool (*handle_ssa_defs)(gimple, void *),
> > 	       bool analysis_stage, void *data)
> > {
> >   gimple_stmt_iterator gsi;
> >   basic_block bb;
> >   unsigned i;
> >   tree *t;
> >   bool ret = false;
> > 
> >   FOR_EACH_BB (bb)
> >     {
> >       bool bb_changed = false;
> > 
> >       gsi = gsi_start_bb (bb);
> >       while (!gsi_end_p (gsi))
> > 	{
> > 	  gimple stmt = gsi_stmt (gsi);
> > 	  enum scan_assign_result assign_result;
> > 	  bool any = false, deleted = false;
> > 
> > 	  switch (gimple_code (stmt))
> > 	    {
> > 	    case GIMPLE_RETURN:
> > 	      t = gimple_return_retval_ptr (stmt);
> > 	      if (*t != NULL_TREE)
> > 		{
> > 		  if (DECL_P (*t))
> > 		    {
> > 		      tree ret_type = TREE_TYPE (*t);
> > 		      if (sra_mode == SRA_MODE_EARLY_INTRA
> > 			  && (TREE_CODE (ret_type) == UNION_TYPE
> > 			      || TREE_CODE (ret_type) == QUAL_UNION_TYPE))
> > 			disqualify_candidate (*t,
> > 					      "Union in a return statement.");
> > 		      else
> > 			bitmap_set_bit (retvals_bitmap, DECL_UID (*t));
> > 		    }
> > 		  any |= scan_expr (t, &gsi, false, data);
> > 		}
> 
> Likewise for passing pointers (why is gsi necessary and passing a stmt
> does not work?)

So that scan_expr can insert V_C_E statements before or after the
statement.  I'll explain more when commenting on the
fix_incompatible_types_for_expr() function.

 
> > 	      break;
> > 
> > 	    case GIMPLE_ASSIGN:
> > 	      assign_result = scan_assign (&stmt, &gsi, data);
> > 	      any |= assign_result == SRA_SA_PROCESSED;
> > 	      deleted = assign_result == SRA_SA_REMOVED;
> > 	      if (handle_ssa_defs && assign_result != SRA_SA_REMOVED)
> > 		any |= handle_ssa_defs (stmt, data);
> > 	      break;
> > 
> > 	    case GIMPLE_CALL:
> > 	      /* Operands must be processed before the lhs.  */
> > 	      for (i = 0; i < gimple_call_num_args (stmt); i++)
> > 		{
> > 		  tree *argp = gimple_call_arg_ptr (stmt, i);
> > 		  any |= scan_expr (argp, &gsi, false, data);
> > 		}
> > 
> > 	      if (gimple_call_lhs (stmt))
> > 		{
> > 		  tree *lhs_ptr = gimple_call_lhs_ptr (stmt);
> > 		  if (!analysis_stage ||
> > 		      !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL))
> > 		    {
> > 		      any |= scan_expr (lhs_ptr, &gsi, true, data);
> > 		      if (handle_ssa_defs)
> > 			any |= handle_ssa_defs (stmt, data);
> > 		    }
> > 		}
> > 	      break;
> > 
> > 	    case GIMPLE_ASM:
> > 	      for (i = 0; i < gimple_asm_ninputs (stmt); i++)
> > 		{
> > 		  tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i));
> > 		  any |= scan_expr (op, &gsi, false, data);
> > 		}
> > 	      for (i = 0; i < gimple_asm_noutputs (stmt); i++)
> > 		{
> > 		  tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i));
> > 		  any |= scan_expr (op, &gsi, true, data);
> > 		}
> 
> asm operands with memory constraints should be disqualified from SRA
> (see walk_stmt_load_store_addr_ops and/or the operand scanner).

Hm, identifying those with that constraint seems quite complicated and
scan_function is complex enough as it is.  I guess I'll use
walk_stmt_load_store_addr_ops to determine this for me then.

> > 	    default:
> > 	      if (analysis_stage)
> > 		walk_gimple_op (stmt, disqualify_all, NULL);
> 
> You seem to be very eager to disqualify anything unknown ;)  (But I
> yet have to come across a TREE_ADDRESSABLE check ...)

It is in find_var_candidates.
 
> > 	      break;
> > 	    }
> > 
> > 	  if (any)
> > 	    {
> > 	      ret = true;
> > 	      bb_changed = true;
> > 
> > 	      if (!analysis_stage)
> 
> Oh.  So we reuse this function.  Hmm.

Yes.   And with  IPA-SRA it's  reused again  with another  few special
cases.  As I  have written above, I  am no longer sure this  is a good
idea but so far it's been manageable and it has a few advantages too.
 
> > 		{
> > 		  update_stmt (stmt);
> > 		  if (!stmt_could_throw_p (stmt))
> > 		    remove_stmt_from_eh_region (stmt);
> 
> Usually
> 
>   if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
>       && gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> 
> is the pattern for this.  But then you disqualified all throwing
> expressions, no?

I'll experiment with removing this.  So far it's on my TODO list so
that it does not hold me up sending an updated version to you.

 
> > 		}
> > 	    }
> > 	  if (deleted)
> > 	    bb_changed = true;
> > 	  else
> > 	    {
> > 	      gsi_next (&gsi);
> > 	      ret = true;
> > 	    }
> > 	}
> >       if (!analysis_stage && bb_changed)
> > 	gimple_purge_dead_eh_edges (bb);
> >     }
> > 
> >   return ret;
> > }
> > 
> > /* Comparison function for qsort.  The array contains pointers to accesses.
> >    An access is considered smaller than another if it has a smaller offset or
> >    if the offsets are the same but its size is bigger.  */
> > 
> > static int
> > compare_access_positions (const void *a, const void *b)
> > {
> >   const access_p *fp1 = (const access_p *) a;
> >   const access_p *fp2 = (const access_p *) b;
> >   const access_p f1 = *fp1;
> >   const access_p f2 = *fp2;
> > 
> >   if (f1->offset != f2->offset)
> >     return f1->offset < f2->offset ? -1 : 1;
> > 
> >   if (f1->size == f2->size)
> >     return 0;
> >   /* We want the bigger accesses first, thus the opposite operator in the next
> >      line: */
> >   return f1->size > f2->size ? -1 : 1;
> > }
> > 
> > 
> > /* Append a name of the declaration to the name obstack.  A helper function for
> >    make_fancy_name.  */
> > 
> > static void
> > make_fancy_decl_name (tree decl)
> > {
> >   char buffer[32];
> > 
> >   tree name = DECL_NAME (decl);
> >   if (name)
> >     obstack_grow (&name_obstack, IDENTIFIER_POINTER (name),
> > 		  IDENTIFIER_LENGTH (name));
> >   else
> >     {
> >       sprintf (buffer, "D%u", DECL_UID (decl));
> 
> That would just be useless information.  I guess you copied this
> from old SRA?

Yes.  All this fancy naming stuff  is quite useless but I find it very
handy when debugging SRA issues.
 
> >       obstack_grow (&name_obstack, buffer, strlen (buffer));
> >     }
> > }
> > 
> > /* Helper for make_fancy_name.  */
> > 
> > static void
> > make_fancy_name_1 (tree expr)
> > {
> >   char buffer[32];
> >   tree index;
> > 
> >   if (DECL_P (expr))
> >     {
> >       make_fancy_decl_name (expr);
> >       return;
> >     }
> > 
> >   switch (TREE_CODE (expr))
> >     {
> >     case COMPONENT_REF:
> >       make_fancy_name_1 (TREE_OPERAND (expr, 0));
> >       obstack_1grow (&name_obstack, '$');
> >       make_fancy_decl_name (TREE_OPERAND (expr, 1));
> >       break;
> > 
> >     case ARRAY_REF:
> >       make_fancy_name_1 (TREE_OPERAND (expr, 0));
> >       obstack_1grow (&name_obstack, '$');
> >       /* Arrays with only one element may not have a constant as their
> > 	 index. */
> >       index = TREE_OPERAND (expr, 1);
> >       if (TREE_CODE (index) != INTEGER_CST)
> > 	break;
> >       sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index));
> >       obstack_grow (&name_obstack, buffer, strlen (buffer));
> > 
> >       break;
> > 
> >     case BIT_FIELD_REF:
> >     case REALPART_EXPR:
> >     case IMAGPART_EXPR:
> >       gcc_unreachable (); 	/* we treat these as scalars.  */
> >       break;
> >     default:
> >       break;
> >     }
> > }
> > 
> > /* Create a human readable name for replacement variable of ACCESS.  */
> > 
> > static char *
> > make_fancy_name (tree expr)
> > {
> >   make_fancy_name_1 (expr);
> >   obstack_1grow (&name_obstack, '\0');
> >   return XOBFINISH (&name_obstack, char *);
> > }
> 
> As all new scalars are DECL_ARTIFICIAL anyway why bother to create
> a fancy name? ....

Well, it helps when reading dumps in order to look into what SRA did.
 
> > /* Helper function for build_ref_for_offset.  */
> > 
> > static bool
> > build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset,
> > 			tree exp_type)
> > {
> >   while (1)
> >     {
> >       tree fld;
> >       tree tr_size, index;
> >       HOST_WIDE_INT el_size;
> > 
> >       if (offset == 0 && exp_type
> > 	  && useless_type_conversion_p (exp_type, type))
> > 	return true;
> > 
> >       switch (TREE_CODE (type))
> > 	{
> > 	case UNION_TYPE:
> > 	case QUAL_UNION_TYPE:
> > 	case RECORD_TYPE:
> > 	  /* Some ADA records are half-unions, treat all of them the same.  */
> > 	  for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
> > 	    {
> > 	      HOST_WIDE_INT pos, size;
> > 	      tree expr, *expr_ptr;
> > 
> > 	      if (TREE_CODE (fld) != FIELD_DECL)
> > 		continue;
> > 
> > 	      pos = int_bit_position (fld);
> > 	      gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0);
> > 	      size = tree_low_cst (DECL_SIZE (fld), 1);
> > 	      if (pos > offset || (pos + size) <= offset)
> > 		continue;
> > 
> > 	      if (res)
> > 		{
> > 		  expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld,
> > 				 NULL_TREE);
> > 		  expr_ptr = &expr;
> > 		}
> > 	      else
> > 		expr_ptr = NULL;
> > 	      if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld),
> > 					  offset - pos, exp_type))
> > 		{
> > 		  if (res)
> > 		    *res = expr;
> > 		  return true;
> > 		}
> > 	    }
> > 	  return false;
> > 
> > 	case ARRAY_TYPE:
> > 	  tr_size = TYPE_SIZE (TREE_TYPE (type));
> > 	  if (!tr_size || !host_integerp (tr_size, 1))
> > 	    return false;
> > 	  el_size = tree_low_cst (tr_size, 1);
> > 
> > 	  index = build_int_cst (TYPE_DOMAIN (type), offset / el_size);
> > 	  if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> > 	    index = int_const_binop (PLUS_EXPR, index,
> > 				     TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0);
> > 	  if (res)
> > 	    *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE,
> > 			   NULL_TREE);
> > 	  offset = offset % el_size;
> > 	  type = TREE_TYPE (type);
> > 	  break;
> > 
> > 	default:
> > 	  if (offset != 0)
> > 	    return false;
> > 
> > 	  if (exp_type)
> > 	    return false;
> > 	  else
> > 	    return true;
> > 	}
> >     }
> > }
> > 
> > /* Construct an expression that would reference a part of aggregate *EXPR of
> >    type TYPE at the given OFFSET of the type EXP_TYPE.  If EXPR is NULL, the
> >    function only determines whether it can build such a reference without
> >    actually doing it.
> > 
> >    FIXME: Eventually this should be replaced with
> >    maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a
> >    minor rewrite of fold_stmt.
> >  */
> > 
> > static bool
> > build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset,
> > 		      tree exp_type, bool allow_ptr)
> > {
> >   if (allow_ptr && POINTER_TYPE_P (type))
> >     {
> >       type = TREE_TYPE (type);
> >       if (expr)
> > 	*expr = fold_build1 (INDIRECT_REF, type, *expr);
> >     }
> > 
> >   return build_ref_for_offset_1 (expr, type, offset, exp_type);
> > }
> > 
> > /* The very first phase of intraprocedural SRA.  It marks in candidate_bitmap
> >    those with type which is suitable for scalarization.  */
> > 
> > static bool
> > find_var_candidates (void)
> > {
> >   tree var, type;
> >   referenced_var_iterator rvi;
> >   bool ret = false;
> > 
> >   FOR_EACH_REFERENCED_VAR (var, rvi)
> >     {
> >       if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL)
> >         continue;
> >       type = TREE_TYPE (var);
> > 
> >       if (!AGGREGATE_TYPE_P (type)
> > 	  || needs_to_live_in_memory (var)
> 
> Ok, here's the TREE_ADDRESSABLE check.  I'm finally convinced that
> disqualify_all should go ;)

OK
 
> > 	  || TREE_THIS_VOLATILE (var)
> > 	  || !COMPLETE_TYPE_P (type)
> > 	  || !host_integerp (TYPE_SIZE (type), 1)
> >           || tree_low_cst (TYPE_SIZE (type), 1) == 0
> > 	  || type_internals_preclude_sra_p (type))
> > 	continue;
> > 
> >       bitmap_set_bit (candidate_bitmap, DECL_UID (var));
> > 
> >       if (dump_file)
> 
> && (dump_flags & TDF_DETAILS)

OK
 
> > 	{
> > 	  fprintf (dump_file, "Candidate (%d): ", DECL_UID (var));
> > 	  print_generic_expr (dump_file, var, 0);
> > 	  fprintf (dump_file, "\n");
> > 	}
> >       ret = true;
> >     }
> > 
> >   return ret;
> > }
> > 
> > /* Return true if TYPE should be considered a scalar type by SRA.  */
> > 
> > static bool
> > is_sra_scalar_type (tree type)
> > {
> >   enum tree_code code = TREE_CODE (type);
> >   return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)
> > 	  || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type)
> > 	  || code == VECTOR_TYPE || code == COMPLEX_TYPE
> > 	  || code == OFFSET_TYPE);
> > }
> 
> Why is this anything different from is_gimple_reg_type ()?

The old SRA had  a function like this and I kept  it.  But it makes no
sense,   I've   removed   it   and   replaced   calls   to   it   with
is_gimple_reg_type().
 
> > /* Sort all accesses for the given variable, check for partial overlaps and
> >    return NULL if there are any.  If there are none, pick a representative for
> >    each combination of offset and size and create a linked list out of them.
> >    Return the pointer to the first representative and make sure it is the first
> >    one in the vector of accesses.  */
> > 
> > static struct access *
> > sort_and_splice_var_accesses (tree var)
> > {
> >   int i, j, access_count;
> >   struct access *res, **prev_acc_ptr = &res;
> >   VEC (access_p, heap) *access_vec;
> >   bool first = true;
> >   HOST_WIDE_INT low = -1, high = 0;
> > 
> >   access_vec = get_base_access_vector (var);
> >   if (!access_vec)
> >     return NULL;
> >   access_count = VEC_length (access_p, access_vec);
> > 
> >   /* Sort by <OFFSET, SIZE>.  */
> >   qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p),
> > 	 compare_access_positions);
> > 
> >   i = 0;
> >   while (i < access_count)
> >     {
> >       struct access *access = VEC_index (access_p, access_vec, i);
> >       bool modification = access->write;
> >       bool grp_read = !access->write;
> >       bool grp_bfr_lhs = access->grp_bfr_lhs;
> >       bool first_scalar = is_sra_scalar_type (access->type);
> >       bool unscalarizable_region = access->grp_unscalarizable_region;
> > 
> >       if (first || access->offset >= high)
> > 	{
> > 	  first = false;
> > 	  low = access->offset;
> > 	  high = access->offset + access->size;
> > 	}
> >       else if (access->offset > low && access->offset + access->size > high)
> > 	return NULL;
> >       else
> > 	gcc_assert (access->offset >= low
> > 		    && access->offset + access->size <= high);
> > 
> >       j = i + 1;
> >       while (j < access_count)
> > 	{
> > 	  struct access *ac2 = VEC_index (access_p, access_vec, j);
> > 	  if (ac2->offset != access->offset || ac2->size != access->size)
> > 	    break;
> > 	  modification |= ac2->write;
> > 	  grp_read |= !ac2->write;
> > 	  grp_bfr_lhs |= ac2->grp_bfr_lhs;
> > 	  unscalarizable_region |= ac2->grp_unscalarizable_region;
> > 	  relink_to_new_repr (access, ac2);
> > 
> > 	  /* If one of the equivalent accesses is scalar, use it as a
> > 	     representative (this happens when there is for example only a
> > 	     single scalar field in a structure).  */
> > 	  if (!first_scalar && is_sra_scalar_type (ac2->type))
> > 	    {
> > 	      struct access tmp_acc;
> > 	      first_scalar = true;
> > 
> > 	      memcpy (&tmp_acc, ac2, sizeof (struct access));
> > 	      memcpy (ac2, access,  sizeof (struct access));
> > 	      memcpy (access, &tmp_acc, sizeof (struct access));
> > 	    }
> > 	  ac2->group_representative = access;
> > 	  j++;
> > 	}
> > 
> >       i = j;
> > 
> >       access->group_representative = access;
> >       access->grp_write = modification;
> >       access->grp_read = grp_read;
> >       access->grp_bfr_lhs = grp_bfr_lhs;
> >       access->grp_unscalarizable_region = unscalarizable_region;
> >       if (access->first_link)
> > 	add_access_to_work_queue (access);
> > 
> >       *prev_acc_ptr = access;
> >       prev_acc_ptr = &access->next_grp;
> >     }
> > 
> >   gcc_assert (res == VEC_index (access_p, access_vec, 0));
> >   return res;
> > }
> > 
> > /* Create a variable for the given ACCESS which determines the type, name and a
> >    few other properties.  Return the variable declaration and store it also to
> >    ACCESS->replacement.  */
> > 
> > static tree
> > create_access_replacement (struct access *access)
> > {
> >   tree repl;
> > 
> >   repl = make_rename_temp (access->type, "SR");
> >   get_var_ann (repl);
> >   add_referenced_var (repl);
> > 
> >   DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base);
> >   DECL_ARTIFICIAL (repl) = 1;
> > 
> >   if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base))
> 
> at least && !DECL_ARTIFICIAL (access->base) I think.

This part is also largely copied from the old SRA.  So far it seems to
work nicely, replacements of artificial declarations get SRsome_number
fancy names and that makes  them easy to distinguish.  Nevertheless, I
can change the condition if it is somehow wrong.  Or do you expect any
other problems besides not-so-fancy fancy names?
 
> >     {
> >       char *pretty_name = make_fancy_name (access->expr);
> > 
> >       DECL_NAME (repl) = get_identifier (pretty_name);
> >       obstack_free (&name_obstack, pretty_name);
> > 
> >       SET_DECL_DEBUG_EXPR (repl, access->expr);
> >       DECL_DEBUG_EXPR_IS_FROM (repl) = 1;
> >       DECL_IGNORED_P (repl) = 0;
> >       TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base);
> >     }
> >   else
> >     {
> >       DECL_IGNORED_P (repl) = 1;
> >       TREE_NO_WARNING (repl) = 1;
> >     }
> 
> So just copy DECL_IGNORED_P and TREE_NO_WARNING from access->base
> unconditionally?

Yeah, that makes sense.
 
> >   if (access->grp_bfr_lhs)
> >     DECL_GIMPLE_REG_P (repl) = 0;
> 
> But you never set it (see update_address_taken for more cases,
> most notably VIEW_CONVERT_EXPR on the lhs which need to be taken
> care of).  You should set it for COMPLEX_TYPE and VECTOR_TYPE 
> replacements.

This function is the only place where I still use make_rename_temp,
which sets it in exactly these two cases.  I did not really know why it
is required in these two cases and only in them, so I left it there, at
least for now.  I guess I understand that now after seeing
update_address_taken.

I can replace this with calling create_tmp_var() and doing all the rest
that make_rename_temp does - I believe that you intend to remove it - I
have just not found out why it is so bad.
 
> >   if (dump_file)
> >     {
> >       fprintf (dump_file, "Created a replacement for ");
> >       print_generic_expr (dump_file, access->base, 0);
> >       fprintf (dump_file, " offset: %u, size: %u: ",
> > 	       (unsigned) access->offset, (unsigned) access->size);
> >       print_generic_expr (dump_file, repl, 0);
> >       fprintf (dump_file, "\n");
> >     }
> > 
> >   return repl;
> > }
> > 
> > /* Return ACCESS scalar replacement, create it if it does not exist yet.  */
> > 
> > static inline tree
> > get_access_replacement (struct access *access)
> > {
> >   gcc_assert (access->grp_to_be_replaced);
> > 
> >   if (access->replacement_decl)
> >     return access->replacement_decl;
> > 
> >   access->replacement_decl = create_access_replacement (access);
> >   return access->replacement_decl;
> > }
> > 
> > /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
> >    linked list along the way.  Stop when *ACCESS is NULL or the access pointed
> >    to by it is not "within" the root.  */
> > 
> > static void
> > build_access_subtree (struct access **access)
> > {
> >   struct access *root = *access, *last_child = NULL;
> >   HOST_WIDE_INT limit = root->offset + root->size;
> > 
> >   *access = (*access)->next_grp;
> >   while  (*access && (*access)->offset + (*access)->size <= limit)
> >     {
> >       if (!last_child)
> > 	root->first_child = *access;
> >       else
> > 	last_child->next_sibling = *access;
> >       last_child = *access;
> > 
> >       build_access_subtree (access);
> >     }
> > }
> > 
> > /* Build a tree of access representatives, ACCESS is the pointer to the first
> >    one, others are linked in a list by the next_grp field.  */
> > 
> > static void
> > build_access_trees (struct access *access)
> > {
> >   while (access)
> >     {
> >       struct access *root = access;
> > 
> >       build_access_subtree (&access);
> >       root->next_grp = access;
> >     }
> > }
> > 
> > /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
> >    they seem beneficial and ALLOW_REPLACEMENTS allows it.  Also set all sorts
> >    of access flags appropriately along the way, notably always setting
> >    grp_read when MARK_READ is true and grp_write when MARK_WRITE is true.  */
> > 
> > static bool
> > analyze_access_subtree (struct access *root, bool allow_replacements,
> > 			bool mark_read, bool mark_write)
> > {
> >   struct access *child;
> >   HOST_WIDE_INT limit = root->offset + root->size;
> >   HOST_WIDE_INT covered_to = root->offset;
> >   bool scalar = is_sra_scalar_type (root->type);
> >   bool hole = false, sth_created = false;
> > 
> >   if (mark_read)
> >     root->grp_read = true;
> >   else if (root->grp_read)
> >     mark_read = true;
> > 
> >   if (mark_write)
> >     root->grp_write = true;
> >   else if (root->grp_write)
> >     mark_write = true;
> > 
> >   if (root->grp_unscalarizable_region)
> >     allow_replacements = false;
> > 
> >   for (child = root->first_child; child; child = child->next_sibling)
> >     {
> >       if (!hole && child->offset < covered_to)
> > 	hole = true;
> >       else
> > 	covered_to += child->size;
> > 
> >       sth_created |= analyze_access_subtree (child,
> > 					     allow_replacements && !scalar,
> > 					     mark_read, mark_write);
> > 
> >       root->grp_unscalarized_data |= child->grp_unscalarized_data;
> >       hole |= !child->grp_covered;
> >     }
> > 
> >   if (allow_replacements && scalar && !root->first_child)
> >     {
> >       if (dump_file)
> 
> && (dump_flags & TDF_DETAILS)

OK
 
> > 	{
> > 	  fprintf (dump_file, "Marking ");
> > 	  print_generic_expr (dump_file, root->base, 0);
> > 	  fprintf (dump_file, " offset: %u, size: %u: ",
> > 		   (unsigned) root->offset, (unsigned) root->size);
> > 	  fprintf (dump_file, " to be replaced.\n");
> > 	}
> > 
> >       root->grp_to_be_replaced = 1;
> >       sth_created = true;
> >       hole = false;
> >     }
> >   else if (covered_to < limit)
> >     hole = true;
> > 
> >   if (sth_created && !hole)
> >     {
> >       root->grp_covered = 1;
> >       return true;
> >     }
> >   if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
> >     root->grp_unscalarized_data = 1; /* not covered and written to */
> >   if (sth_created)
> >     return true;
> >   return false;
> > }
> > 
> > /* Analyze all access trees linked by next_grp by the means of
> >    analyze_access_subtree.  */
> > static bool
> > analyze_access_trees (struct access *access)
> > {
> >   bool ret = false;
> > 
> >   while (access)
> >     {
> >       if (analyze_access_subtree (access, true, false, false))
> > 	ret = true;
> >       access = access->next_grp;
> >     }
> > 
> >   return ret;
> > }
> > 
> > /* Return true iff a potential new child of LACC at offset OFFSET and with size
> >    SIZE would conflict with an already existing one.  If exactly such a child
> >    already exists in LACC, store a pointer to it in EXACT_MATCH.  */
> > 
> > static bool
> > child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset,
> > 			      HOST_WIDE_INT size, struct access **exact_match)
> > {
> >   struct access *child;
> > 
> >   for (child = lacc->first_child; child; child = child->next_sibling)
> >     {
> >       if (child->offset == norm_offset && child->size == size)
> > 	{
> > 	  *exact_match = child;
> > 	  return true;
> > 	}
> > 
> >       if (child->offset < norm_offset + size
> > 	  && child->offset + child->size > norm_offset)
> > 	return true;
> >     }
> > 
> >   return false;
> > }
> > 
> > /* Set the expr of TARGET to one just like MODEL but with its own base at the
> >    bottom of the handled components.  */
> > 
> > static void
> > duplicate_expr_for_different_base (struct access *target,
> > 				   struct access *model)
> > {
> >   tree t, expr = unshare_expr (model->expr);
> > 
> >   gcc_assert (handled_component_p (expr));
> >   t = expr;
> >   while (handled_component_p (TREE_OPERAND (t, 0)))
> >     t = TREE_OPERAND (t, 0);
> >   gcc_assert (TREE_OPERAND (t, 0) == model->base);
> >   TREE_OPERAND (t, 0) = target->base;
> > 
> >   target->expr = expr;
> > }
> > 
> > 
> > /* Create a new child access of PARENT, with all properties just like MODEL
> >    except for its offset and with its grp_write false and grp_read true.
> >    Return the new access. Note that this access is created long after all
> >    splicing and sorting, it's not located in any access vector and is
> >    automatically a representative of its group.  */
> > 
> > static struct access *
> > create_artificial_child_access (struct access *parent, struct access *model,
> > 				HOST_WIDE_INT new_offset)
> > {
> >   struct access *access;
> >   struct access **child;
> > 
> >   gcc_assert (!model->grp_unscalarizable_region);
> > 
> >   access = (struct access *) pool_alloc (access_pool);
> >   memset (access, 0, sizeof (struct access));
> >   access->base = parent->base;
> >   access->offset = new_offset;
> >   access->size = model->size;
> >   duplicate_expr_for_different_base (access, model);
> >   access->type = model->type;
> >   access->grp_write = true;
> >   access->grp_read = false;
> > 
> >   child = &parent->first_child;
> >   while (*child && (*child)->offset < new_offset)
> >     child = &(*child)->next_sibling;
> > 
> >   access->next_sibling = *child;
> >   *child = access;
> > 
> >   return access;
> > }
> > 
> > 
> > /* Propagate all subaccesses of RACC across an assignment link to LACC. Return
> >    true if any new subaccess was created.  Additionally, if RACC is a scalar
> >    access but LACC is not, change the type of the latter.  */
> > 
> > static bool
> > propagate_subacesses_accross_link (struct access *lacc, struct access *racc)
> > {
> >   struct access *rchild;
> >   HOST_WIDE_INT norm_delta = lacc->offset - racc->offset;
> >   bool ret = false;
> > 
> >   if (is_sra_scalar_type (lacc->type)
> >       || lacc->grp_unscalarizable_region
> >       || racc->grp_unscalarizable_region)
> >     return false;
> > 
> >   if (!lacc->first_child && !racc->first_child
> >       && is_sra_scalar_type (racc->type)
> >       && (sra_mode == SRA_MODE_INTRA
> >           || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base))))
> >     {
> >       duplicate_expr_for_different_base (lacc, racc);
> >       lacc->type = racc->type;
> >       return false;
> >     }
> > 
> >   gcc_assert (lacc->size <= racc->size);
> > 
> >   for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling)
> >     {
> >       struct access *new_acc = NULL;
> >       HOST_WIDE_INT norm_offset = rchild->offset + norm_delta;
> > 
> >       if (rchild->grp_unscalarizable_region)
> > 	continue;
> > 
> >       if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size,
> > 					&new_acc))
> > 	{
> > 	  if (new_acc && rchild->first_child)
> > 	    ret |= propagate_subacesses_accross_link (new_acc, rchild);
> > 	  continue;
> > 	}
> > 
> >       new_acc = create_artificial_child_access (lacc, rchild, norm_offset);
> >       if (racc->first_child)
> > 	propagate_subacesses_accross_link (new_acc, rchild);
> > 
> >       ret = true;
> >     }
> > 
> >   return ret;
> > }
> > 
> > /* Propagate all subaccesses across assignment links.  */
> > 
> > static void
> > propagate_all_subaccesses (void)
> > {
> >   while (work_queue_head)
> >     {
> >       struct access *racc = pop_access_from_work_queue ();
> >       struct assign_link *link;
> > 
> >       gcc_assert (racc->first_link);
> > 
> >       for (link = racc->first_link; link; link = link->next)
> > 	{
> > 	  struct access *lacc = link->lacc;
> > 
> > 	  if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base)))
> > 	    continue;
> > 	  lacc = lacc->group_representative;
> > 	  if (propagate_subacesses_accross_link (lacc, racc)
> > 	      && lacc->first_link)
> > 	    add_access_to_work_queue (lacc);
> > 	}
> >     }
> > }
> > 
> > /* Go through all accesses collected throughout the (intraprocedural) analysis
> >    stage, exclude overlapping ones, identify representatives and build trees
> >    out of them, making decisions about scalarization on the way.  Return true
> >    iff there are any to-be-scalarized variables after this stage. */
> > 
> > static bool
> > analyze_all_variable_accesses (void)
> > {
> >   tree var;
> >   referenced_var_iterator rvi;
> >   bool res = false;
> > 
> >   FOR_EACH_REFERENCED_VAR (var, rvi)
> >     if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
> >       {
> > 	struct access *access;
> > 
> > 	access = sort_and_splice_var_accesses (var);
> > 	if (access)
> > 	  build_access_trees (access);
> > 	else
> > 	  disqualify_candidate (var,
> > 				"No or inhibitingly overlapping accesses.");
> >       }
> > 
> >   propagate_all_subaccesses ();
> > 
> >   FOR_EACH_REFERENCED_VAR (var, rvi)
> >     if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
> >       {
> > 	struct access *access = get_first_repr_for_decl (var);
> > 
> > 	if (analyze_access_trees (access))
> > 	  {
> > 	    res = true;
> > 	    if (dump_file)
> 
> && (dump_flags & TDF_DETAILS)

OK
 
> > 	      {
> > 		fprintf (dump_file, "\nAccess trees for ");
> > 		print_generic_expr (dump_file, var, 0);
> > 		fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
> > 		dump_access_tree (dump_file, access);
> > 		fprintf (dump_file, "\n");
> > 	      }
> > 	  }
> > 	else
> > 	  disqualify_candidate (var, "No scalar replacements to be created.");
> >       }
> > 
> >   return res;
> > }
> > 
> > /* Return true iff a reference statement into aggregate AGG can be built for
> >    every single to-be-replaced accesses that is a child of ACCESS, its sibling
> >    or a child of its sibling. TOP_OFFSET is the offset from the processed
> >    access subtree that has to be subtracted from offset of each access.  */
> > 
> > static bool
> > ref_expr_for_all_replacements_p (struct access *access, tree agg,
> > 				 HOST_WIDE_INT top_offset)
> > {
> >   do
> >     {
> >       if (access->grp_to_be_replaced
> > 	  && !build_ref_for_offset (NULL, TREE_TYPE (agg),
> > 				    access->offset - top_offset,
> > 				    access->type, false))
> > 	return false;
> > 
> >       if (access->first_child
> > 	  && !ref_expr_for_all_replacements_p (access->first_child, agg,
> > 					       top_offset))
> > 	return false;
> > 
> >       access = access->next_sibling;
> >     }
> >   while (access);
> > 
> >   return true;
> > }
> > 
> > 
> > /* Generate statements copying scalar replacements of accesses within a subtree
> >    into or out of AGG.  ACCESS is the first child of the root of the subtree to
> >    be processed.  AGG is an aggregate type expression (can be a declaration but
> >    does not have to be, it can for example also be an indirect_ref).
> >    TOP_OFFSET is the offset of the processed subtree which has to be subtracted
> >    from offsets of individual accesses to get corresponding offsets for AGG.
> >    If CHUNK_SIZE is non-null, copy only replacements in the interval
> >    <start_offset, start_offset + chunk_size>, otherwise copy all.  GSI is a
> >    statement iterator used to place the new statements.  WRITE should be true
> >    when the statements should write from AGG to the replacement and false if
> >    vice versa.  if INSERT_AFTER is true, new statements will be added after the
> >    current statement in GSI, they will be added before the statement
> >    otherwise.  */
> > 
> > static void
> > generate_subtree_copies (struct access *access, tree agg,
> > 			 HOST_WIDE_INT top_offset,
> > 			 HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size,
> > 			 gimple_stmt_iterator *gsi, bool write,
> > 			 bool insert_after)
> > {
> >   do
> >     {
> >       tree expr = unshare_expr (agg);
> > 
> >       if (chunk_size && access->offset >= start_offset + chunk_size)
> > 	return;
> > 
> >       if (access->grp_to_be_replaced
> > 	  && (chunk_size == 0
> > 	      || access->offset + access->size > start_offset))
> > 	{
> > 	  bool repl_found;
> > 	  gimple stmt;
> > 
> > 	  repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg),
> > 					     access->offset - top_offset,
> > 					     access->type, false);
> > 	  gcc_assert (repl_found);
> > 
> > 	  if (write)
> > 	    stmt = gimple_build_assign (get_access_replacement (access), expr);
> > 	  else
> > 	    {
> > 	      tree repl = get_access_replacement (access);
> > 	      TREE_NO_WARNING (repl) = 1;
> > 	      stmt = gimple_build_assign (expr, repl);
> > 	    }
> > 
> > 	  if (insert_after)
> > 	    gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> > 	  else
> > 	    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > 	  update_stmt (stmt);
> > 	}
> > 
> >       if (access->first_child)
> > 	generate_subtree_copies (access->first_child, agg, top_offset,
> > 				 start_offset, chunk_size, gsi,
> > 				 write, insert_after);
> > 
> >       access = access->next_sibling;
> >     }
> >   while (access);
> > }
> > 
> > /* Assign zero to all scalar replacements in an access subtree.  ACCESS is the
> >    the root of the subtree to be processed.  GSI is the statement iterator used
> >    for inserting statements which are added after the current statement if
> >    INSERT_AFTER is true or before it otherwise.  */
> > 
> > static void
> > init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi,
> > 			bool insert_after)
> > 
> > {
> >   struct access *child;
> > 
> >   if (access->grp_to_be_replaced)
> >     {
> >       gimple stmt;
> > 
> >       stmt = gimple_build_assign (get_access_replacement (access),
> > 				  fold_convert (access->type,
> > 						integer_zero_node));
> >       if (insert_after)
> > 	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> >       else
> > 	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> >       update_stmt (stmt);
> >     }
> > 
> >   for (child = access->first_child; child; child = child->next_sibling)
> >     init_subtree_with_zero (child, gsi, insert_after);
> > }
> > 
> > /* Search for an access representative for the given expression EXPR and
> >    return it or NULL if it cannot be found.  */
> > 
> > static struct access *
> > get_access_for_expr (tree expr)
> > {
> >   HOST_WIDE_INT offset, size, max_size;
> >   tree base;
> > 
> >   if (TREE_CODE (expr) == NOP_EXPR
> 
> CONVERT_EXPR_P (expr)
> 
> >       || TREE_CODE (expr) == VIEW_CONVERT_EXPR)
> 
> VIEW_CONVERT_EXPR is also a handled_component_p.
> 
> Note that NOP_EXPR should never occur here - that would be invalid
> gimple.  So I think you can (and should) just delete the above.

I haven't seen a NOP_EXPR here for a while; do they still exist in
lowered gimple?  I have therefore removed their handling.

Removing the diving through V_C_Es breaks Ada, though.  The reason is
that we get a different size (and max_size) when calling
get_ref_base_and_extent on the V_C_E and on its argument.  However, I
believe both should be represented by a single access representative.
 
> >     expr = TREE_OPERAND (expr, 0);
> > 
> >   if (handled_component_p (expr))
> >     {
> >       base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
> >       size = max_size;
> >       if (size == -1 || !base || !DECL_P (base))
> > 	return NULL;
> >     }
> >   else if (DECL_P (expr))
> >     {
> >       tree tree_size;
> > 
> >       base = expr;
> >       tree_size = TYPE_SIZE (TREE_TYPE (base));
> >       if (tree_size && host_integerp (tree_size, 1))
> > 	size = max_size = tree_low_cst (tree_size, 1);
> >       else
> > 	return NULL;
> > 
> >       offset = 0;
> 
> See above.  get_ref_base_and_extent handles plain DECLs just fine.

OK
 
> >     }
> >   else
> >     return NULL;
> > 
> >   if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
> >     return NULL;
> > 
> >   return get_var_base_offset_size_access (base, offset, size);
> > }
> > 
> > /* Substitute into *EXPR an expression of type TYPE with the value of the
> >    replacement of ACCESS.  This is done either by producing a special V_C_E
> >    assignment statement converting the replacement to a new temporary of the
> >    requested type if TYPE is not TREE_ADDRESSABLE or by going through the base
> >    aggregate if it is.  */
> > 
> > static void
> > fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access,
> > 				 gimple_stmt_iterator *gsi, bool write)
> > {
> >   tree repl = get_access_replacement (access);
> >   if (!TREE_ADDRESSABLE (type))
> >     {
> >       tree tmp = create_tmp_var (type, "SRvce");
> > 
> >       add_referenced_var (tmp);
> >       if (is_gimple_reg_type (type))
> > 	tmp = make_ssa_name (tmp, NULL);
> 
> Should be always is_gimple_reg_type () if it is a type suitable for
> a SRA scalar replacement. 

No, it is the type suitable for the statement; it can be a union type
or a record with only one field.  But see the more thorough
explanation below...

> But you should set DECL_GIMPLE_REG_P for
> VECTOR and COMPLEX types here.
> 
> >       if (write)
> > 	{
> > 	  gimple stmt;
> > 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp);
> 
> This needs to either always fold to plain 'tmp' or tmp has to be a
> non-register.  Otherwise you will create invalid gimple.
> 
> > 	  *expr = tmp;
> > 	  if (is_gimple_reg_type (type))
> > 	    SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi);
> 
> See above.
> 
> > 	  stmt = gimple_build_assign (repl, conv);
> > 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> > 	  update_stmt (stmt);
> > 	}
> >       else
> > 	{
> > 	  gimple stmt;
> > 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl);
> > 
> > 	  stmt = gimple_build_assign (tmp, conv);
> > 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > 	  if (is_gimple_reg_type (type))
> > 	    SSA_NAME_DEF_STMT (tmp) = stmt;
> 
> See above.  (I wonder if the patch still passes bootstrap & regtest
> after the typecking patch)
> 
> > 	  *expr = tmp;
> > 	  update_stmt (stmt);
> > 	}
> >     }
> >   else
> >     {
> >       if (write)
> > 	{
> > 	  gimple stmt;
> > 
> > 	  stmt = gimple_build_assign (repl, unshare_expr (access->expr));
> > 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> > 	  update_stmt (stmt);
> > 	}
> >       else
> > 	{
> > 	  gimple stmt;
> > 
> > 	  stmt = gimple_build_assign (unshare_expr (access->expr), repl);
> > 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > 	  update_stmt (stmt);
> > 	}
> 
> I don't understand this path.  Are the types here always compatible?

And I don't really understand the comments.  The function is called by
sra_modify_expr (the function doing the replacements in all non-assign
statements) when it needs to replace a reference by a scalar but the
types don't match.  This can happen when replacing a V_C_E, a union
access when we picked a different type than the one used in the
statement, or (and this case can be remarkably irritating) an access
to a record with only one (scalar) field.

My original idea was to simply put a V_C_E in the place.  However, I
believe there are places where this is not possible - or at least one
case, the LHS of a call statement, because V_C_Es of gimple registers
(ssa_names) are not allowed on LHSs.  My initial idea to handle these
cases was to create a new temporary with a matching type and a V_C_E
assign statement (with the V_C_E always on the RHS - I believe that
works even with gimple registers) that would do the conversion and
load/store it to the replacement variable (this is what the
!TREE_ADDRESSABLE branch does).

The problem with this idea is TREE_ADDRESSABLE types.  These types
need to be constructed and thus we cannot create temporary variables
of these types.  On the other hand they absolutely need to be SRAed;
not doing so slows down tramp3d by a factor of two (and the current
SRA also breaks them up).  And quite a few C++ classes are such types
that are "non-addressable" and have only one scalar field.
Identifying such records is possible, and I soon realized that I can
simply leave the statement as it is and produce a new statement to do
the load/store from the original field (that is what the outermost
else branch does).

Does this make sense or is there some fundamental flaw in my reasoning
about gimple again?  Does this explain what the function does?

It certainly passes bootstrap and testing; I use --enable-checking=yes.
 
> >     }
> > }
> > 
> > 
> > /* Callback for scan_function.  Replace the expression EXPR with a scalar
> >    replacement if there is one and generate other statements to do type
> >    conversion or subtree copying if necessary.  GSI is used to place newly
> >    created statements, WRITE is true if the expression is being written to (it
> >    is on a LHS of a statement or output in an assembly statement).  */
> > 
> > static bool
> > sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write,
> > 		 void *data ATTRIBUTE_UNUSED)
> > {
> >   struct access *access;
> >   tree type, bfr;
> > 
> >   if (TREE_CODE (*expr) == BIT_FIELD_REF)
> >     {
> >       bfr = *expr;
> >       expr = &TREE_OPERAND (*expr, 0);
> >     }
> >   else
> >     bfr = NULL_TREE;
> > 
> >   if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR)
> >     expr = &TREE_OPERAND (*expr, 0);
> 
> Why strip these early?  I think this is wrong (or do you always want to
> produce complex type replacements, even if only the real or imaginary part
> are used?  If so a strathegic comment somewhere is missing.)

As I have already written above, my intention was to keep the complex
replacements complex.  Again, not doing so would definitely simplify
the code, so if you think there is no benefit in it, I may get rid of
all their special handling.  I also treat vector accesses through
BIT_FIELD_REFs specially...

ATM, I'll add a comment to the access structure description.

> >   type = TREE_TYPE (*expr);
> > 
> >   access = get_access_for_expr (*expr);
> >   if (!access)
> >     return false;
> > 
> >   if (access->grp_to_be_replaced)
> >     {
> >       if (!useless_type_conversion_p (type, access->type))
> > 	fix_incompatible_types_for_expr (expr, type, access, gsi, write);
> >       else
> > 	*expr = get_access_replacement (access);
> >     }
> > 
> >   if (access->first_child)
> >     {
> >       HOST_WIDE_INT start_offset, chunk_size;
> >       if (bfr
> > 	  && host_integerp (TREE_OPERAND (bfr, 1), 1)
> > 	  && host_integerp (TREE_OPERAND (bfr, 2), 1))
> > 	{
> > 	  start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1);
> > 	  chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1);
> > 	}
> >       else
> > 	start_offset = chunk_size = 0;
> > 
> >       generate_subtree_copies (access->first_child, access->base, 0,
> > 			       start_offset, chunk_size, gsi, write, write);
> >     }
> >   return true;
> > }
> > 
> > /* Store all replacements in the access tree rooted in TOP_RACC either to their
> >    base aggregate if there are unscalarized data or directly to LHS
> >    otherwise.  */
> > 
> > static void
> > handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs,
> > 				     gimple_stmt_iterator *gsi)
> > {
> >   if (top_racc->grp_unscalarized_data)
> >     generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0,
> > 			     gsi, false, false);
> >   else
> >     generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset,
> > 			     0, 0, gsi, false, false);
> > }
> > 
> > 
> > /* Try to generate statements to load all sub-replacements in an access
> >    (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC
> >    (sub)tree.  If that is not possible, refresh the TOP_RACC base aggregate and
> >    load the accesses from it.  LEFT_OFFSET is the offset of the left whole
> >    subtree being copied, RIGHT_OFFSET is the same thing for the right subtree.
> >    GSI is stmt iterator used for statement insertions.  *REFRESHED is true iff
> >    the rhs top aggregate has already been refreshed by contents of its scalar
> >    reductions and is set to true if this function has to do it.  */
> > 
> > static void
> > load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc,
> > 				 HOST_WIDE_INT left_offset,
> > 				 HOST_WIDE_INT right_offset,
> > 				 gimple_stmt_iterator *old_gsi,
> > 				 gimple_stmt_iterator *new_gsi,
> > 				 bool *refreshed, tree lhs)
> > {
> >   do
> >     {
> >       if (lacc->grp_to_be_replaced)
> > 	{
> > 	  struct access *racc;
> > 	  HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset;
> > 
> > 	  racc = find_access_in_subtree (top_racc, offset, lacc->size);
> > 	  if (racc && racc->grp_to_be_replaced)
> > 	    {
> > 	      gimple stmt;
> > 
> > 	      if (useless_type_conversion_p (lacc->type, racc->type))
> > 		stmt = gimple_build_assign (get_access_replacement (lacc),
> > 					    get_access_replacement (racc));
> > 	      else
> > 		{
> > 		  tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type,
> > 					  get_access_replacement (racc));
> > 		  stmt = gimple_build_assign (get_access_replacement (lacc),
> > 					      rhs);
> > 		}
> > 
> > 	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
> > 	      update_stmt (stmt);
> > 	    }
> > 	  else
> > 	    {
> > 	      tree expr = unshare_expr (top_racc->base);
> > 	      bool repl_found;
> > 	      gimple stmt;
> > 
> > 	      /* No suitable access on the right hand side, need to load from
> > 		 the aggregate.  See if we have to update it first... */
> > 	      if (!*refreshed)
> > 		{
> > 		  gcc_assert (top_racc->first_child);
> > 		  handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
> > 		  *refreshed = true;
> > 		}
> > 
> > 	      repl_found = build_ref_for_offset (&expr,
> > 						 TREE_TYPE (top_racc->base),
> > 						 lacc->offset - left_offset,
> > 						 lacc->type, false);
> > 	      gcc_assert (repl_found);
> > 	      stmt = gimple_build_assign (get_access_replacement (lacc),
> > 					  expr);
> > 	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
> > 	      update_stmt (stmt);
> > 	    }
> > 	}
> >       else if (lacc->grp_read && !lacc->grp_covered && !*refreshed)
> > 	{
> > 	  handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
> > 	  *refreshed = true;
> > 	}
> > 
> >       if (lacc->first_child)
> > 	load_assign_lhs_subreplacements (lacc->first_child, top_racc,
> > 					 left_offset, right_offset,
> > 					 old_gsi, new_gsi, refreshed, lhs);
> >       lacc = lacc->next_sibling;
> >     }
> >   while (lacc);
> > }
> > 
> > /* Return true iff ACC is non-NULL and has subaccesses.  */
> > 
> > static inline bool
> > access_has_children_p (struct access *acc)
> > {
> >   return acc && acc->first_child;
> > }
> > 
> > /* Modify assignments with a CONSTRUCTOR on their RHS.  STMT contains a pointer
> >    to the assignment and GSI is the statement iterator pointing at it.  Returns
> >    the same values as sra_modify_assign.  */
> > 
> > static enum scan_assign_result
> > sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi)
> > {
> >   tree lhs = gimple_assign_lhs (*stmt);
> >   struct access *acc;
> > 
> >   gcc_assert (TREE_CODE (lhs) != REALPART_EXPR
> > 	      && TREE_CODE (lhs) != IMAGPART_EXPR);
> >   acc = get_access_for_expr (lhs);
> >   if (!acc)
> >     return SRA_SA_NONE;
> > 
> >   if (VEC_length (constructor_elt,
> > 		  CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0)
> >     {
> >       /* I have never seen this code path trigger but if it can happen the
> > 	 following should handle it gracefully.  */
> 
> It can trigger for vector constants.

OK, I'll remove the comment.  Apparently there are none in the
testsuite; I believe I tested with a gcc_unreachable here.

 
> >       if (access_has_children_p (acc))
> > 	generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi,
> > 				 true, true);
> >       return SRA_SA_PROCESSED;
> >     }
> > 
> >   if (acc->grp_covered)
> >     {
> >       init_subtree_with_zero (acc, gsi, false);
> >       unlink_stmt_vdef (*stmt);
> >       gsi_remove (gsi, true);
> >       return SRA_SA_REMOVED;
> >     }
> >   else
> >     {
> >       init_subtree_with_zero (acc, gsi, true);
> >       return SRA_SA_PROCESSED;
> >     }
> > }
> > 
> > 
> > /* Modify statements with IMAGPART_EXPR or REALPART_EXPR on their lhs with
> >    to-be-scalarized expressions with them.  STMT is the statement and GSI is
> >    the iterator used to place new helper statements.  Returns the same values
> >    as sra_modify_assign.  */
> > 
> > static enum scan_assign_result
> > sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi)
> > {
> >   tree lhs, complex, ptype, rp, ip;
> >   struct access *access;
> >   gimple new_stmt, aux_stmt;
> > 
> >   lhs = gimple_assign_lhs (stmt);
> >   complex = TREE_OPERAND (lhs, 0);
> > 
> >   access = get_access_for_expr (complex);
> > 
> >   if (!access || !access->grp_to_be_replaced)
> >     return SRA_SA_NONE;
> > 
> >   ptype = TREE_TYPE (TREE_TYPE (complex));
> >   rp = create_tmp_var (ptype, "SRr");
> >   add_referenced_var (rp);
> >   rp = make_ssa_name (rp, NULL);
> > 
> >   ip = create_tmp_var (ptype, "SRp");
> >   add_referenced_var (ip);
> >   ip = make_ssa_name (ip, NULL);
> > 
> >   if (TREE_CODE (lhs) == IMAGPART_EXPR)
> >     {
> >       aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype,
> > 					     get_access_replacement (access)));
> >       SSA_NAME_DEF_STMT (rp) = aux_stmt;
> >       gimple_assign_set_lhs (stmt, ip);
> >       SSA_NAME_DEF_STMT (ip) = stmt;
> >     }
> >   else
> >     {
> >       aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype,
> > 					     get_access_replacement (access)));
> >       SSA_NAME_DEF_STMT (ip) = aux_stmt;
> >       gimple_assign_set_lhs (stmt, rp);
> >       SSA_NAME_DEF_STMT (rp) = stmt;
> >     }
> > 
> >   gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT);
> >   update_stmt (aux_stmt);
> >   new_stmt = gimple_build_assign (get_access_replacement (access),
> > 				  fold_build2 (COMPLEX_EXPR, access->type,
> > 					       rp, ip));
> >   gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
> >   update_stmt (new_stmt);
> 
> Hm.  So you do what complex lowering does here.  Note that this may
> create loads from uninitialized memory with all its problems.

Yes, but I have not had any such problems with complex types (as
opposed to simple loads from half-initialized records, for example).
OTOH, I have also contemplated setting DECL_GIMPLE_REG_P to zero for
complex replacements which appear in an IMAGPART_EXPR or REALPART_EXPR
on the LHS of a statement.

> 
> WRT the complex stuff.  If you would do scalarization and analysis
> just on the components (not special case REAL/IMAGPART_EXPR everywhere)
> it should work better, correct?  You still could handle group
> scalarization for the case of for example passing a complex argument
> to a function.

Well, my reasoning was that since complex types are first-class
citizens in gimple (as opposed to records), there was presumably a
reason to keep them together, and so I attempted that.  But again, if
that is a misconception of mine and there is no point in keeping them
together, I will gladly remove this.
 
> void bar(_Complex float);
> void foo(float x, float y)
> {
>   _Complex float z = x;
>   __imag z = y;
>   bar(z);
> }
> 
> The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent
> there is BIT_FIELD_REF.

These are handled  by setting DECL_GIMPLE_REG_P to zero  if a B_F_R is
on a LHS.  I believe the current SRA does the same.  It works fine and
there's a lot less fuss about them.
 
> >   return SRA_SA_PROCESSED;
> > }
> > 
> > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components.  */
> > 
> > static bool
> > contains_view_convert_expr_p (tree t)
> > {
> >   while (1)
> >     {
> >       if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
> > 	return true;
> >       if (!handled_component_p (t))
> > 	return false;
> >       t = TREE_OPERAND (t, 0);
> >     }
> > }
> 
> Place this in tree-flow-inline.h next to ref_contains_array_ref, also
> structure the loop in the same way.

OK, but I'd like the function to work when passed declarations too.
Thus I cannot really use a do-while loop.  I'll send it in a separate
patch.

> > /* Change STMT to assign compatible types by means of adding component or array
> >    references or VIEW_CONVERT_EXPRs.  All parameters have the same meaning as
> >    variable with the same names in sra_modify_assign.  This is done in a
> >    such a complicated way in order to make
> >    testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some
> >    cases.  */
> > 
> > static void
> > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt,
> > 				   struct access *lacc, struct access *racc,
> > 				   tree lhs, tree *rhs, tree ltype, tree rtype)
> > {
> >   if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype)
> >       && !access_has_children_p (lacc))
> >     {
> >       tree expr = unshare_expr (lhs);
> >       bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype,
> > 					 false);
> >       if (found)
> > 	{
> > 	  gimple_assign_set_lhs (*stmt, expr);
> > 	  return;
> > 	}
> >     }
> > 
> >   if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype)
> >       && !access_has_children_p (racc))
> >     {
> >       tree expr = unshare_expr (*rhs);
> >       bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype,
> > 					 false);
> >       if (found)
> > 	{
> > 	  gimple_assign_set_rhs1 (*stmt, expr);
> > 	  return;
> > 	}
> >     }
> > 
> >   *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs);
> >   gimple_assign_set_rhs_from_tree (gsi, *rhs);
> >   *stmt = gsi_stmt (*gsi);
> 
> Reading this I have a deja-vu - isn't there another function in this
> file doing the same thing?  You are doing much unsharing even though
> you re-build the access tree from scratch?

This function has a similar purpose as fix_incompatible_types_for_expr
but this time only for assign statements.  That is easier because we
can always put the V_C_E on the RHS and be safe, so no additional
statements need to be generated.

However, V_C_Es, rather than COMPONENT_REFs and ARRAY_REFs, feel
unnatural for accessing fields of single-field records, unions and
single-element arrays.  According to the comment I used to have
problems of some sort with that in the ssa-sra-2.C testcase, but I can
no longer reproduce them (and don't remember them).

I call unshare_expr in this context only when one side of an
assignment statement is a scalar replacement and the other one is an
aggregate (but not necessarily a declaration), which can happen only
in the cases listed above.  That is not very many calls and chances
are good that build_ref_for_offset succeeds.

Does that explain what is going on here?
 
> > }
> > 
> > /* Callback of scan_function to process assign statements.  It examines both
> >    sides of the statement, replaces them with a scalare replacement if there is
> >    one and generating copying of replacements if scalarized aggregates have been
> >    used in the assignment.  STMT is a pointer to the assign statement, GSI is
> >    used to hold generated statements for type conversions and subtree
> >    copying.  */
> > 
> > static enum scan_assign_result
> > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi,
> > 		   void *data ATTRIBUTE_UNUSED)
> > {
> >   struct access *lacc, *racc;
> >   tree ltype, rtype;
> >   tree lhs, rhs;
> >   bool modify_this_stmt;
> > 
> >   if (gimple_assign_rhs2 (*stmt))
> 
> !gimple_assign_single_p (*stmt)
> 
> (the only gimple assign that may access memory)

OK

> 
> >     return SRA_SA_NONE;
> >   lhs = gimple_assign_lhs (*stmt);
> >   rhs = gimple_assign_rhs1 (*stmt);
> > 
> >   if (TREE_CODE (rhs) == CONSTRUCTOR)
> >     return sra_modify_constructor_assign (stmt, gsi);
> > 
> >   if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR)
> >     return sra_modify_partially_complex_lhs (*stmt, gsi);
> > 
> >   if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR
> >       || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF)
> >     {
> >       modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt),
> > 					  gsi, false, data);
> >       modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt),
> > 					   gsi, true, data);
> >       return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
> >     }
> > 
> >   lacc = get_access_for_expr (lhs);
> >   racc = get_access_for_expr (rhs);
> >   if (!lacc && !racc)
> >     return SRA_SA_NONE;
> > 
> >   modify_this_stmt = ((lacc && lacc->grp_to_be_replaced)
> > 		      || (racc && racc->grp_to_be_replaced));
> > 
> >   if (lacc && lacc->grp_to_be_replaced)
> >     {
> >       lhs = get_access_replacement (lacc);
> >       gimple_assign_set_lhs (*stmt, lhs);
> >       ltype = lacc->type;
> >     }
> >   else
> >     ltype = TREE_TYPE (lhs);
> > 
> >   if (racc && racc->grp_to_be_replaced)
> >     {
> >       rhs = get_access_replacement (racc);
> >       gimple_assign_set_rhs1 (*stmt, rhs);
> >       rtype = racc->type;
> >     }
> >   else
> >     rtype = TREE_TYPE (rhs);
> > 
> >   /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate
> >      the statement makes the position of this pop_stmt_changes() a bit awkward
> >      but hopefully make some sense.  */
> 
> I don't see pop_stmt_changes().

Yeah, the comment is outdated. I've removed it.
 
> >   if (modify_this_stmt)
> >     {
> >       if (!useless_type_conversion_p (ltype, rtype))
> > 	fix_modified_assign_compatibility (gsi, stmt, lacc, racc,
> > 					   lhs, &rhs, ltype, rtype);
> >     }
> > 
> >   if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs)
> >       || (access_has_children_p (racc)
> > 	  && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset))
> >       || (access_has_children_p (lacc)
> > 	  && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset)))
> 
> ?  A comment is missing what this case is about ...
> 
> (this smells like fixup that could be avoided by doing things correct
> in the first place)

From this point on, the function deals with assignments between
aggregates when at least one has scalar reductions of some of its
components.  There are three possible scenarios: 1) both the LHS and
RHS have to-be-scalarized components, 2) only the RHS has, or 3) only
the LHS has.

In the first case, we would like to load the LHS components from RHS
components whenever possible.  If that is not possible, we would like
to read them directly from the RHS (after updating it by storing in it
its own components).  If there are some necessary unscalarized data in
the LHS, those will be loaded by the original assignment too.  If
neither of these cases happens, the original statement can be removed.
Most of this is done by load_assign_lhs_subreplacements.

In the second case, we would like to store all RHS scalarized
components directly into the LHS and, if they cover the aggregate
completely, remove the statement too.  In the third case, we want the
LHS components to be loaded directly from the RHS (DSE will remove the
original statement if it becomes redundant).

This is a bit complex but manageable when types match and when unions
do not cause confusion in such a way that we cannot really load a
component of the LHS from the RHS or vice versa (the access
representing this level can have subaccesses that are accessible only
through a different union field at a higher level - different from the
one used in the examined expression).  Unions are fun.

Therefore, I specially handle a fourth case, which happens when there
is a specific type cast or it is impossible to locate a scalarized
subaccess on the other side of the expression.  If that happens, I
simply "refresh" the RHS by storing its scalarized components into it,
leave the original statement there to do the copying, and then load
the scalar replacements of the LHS.  This is what the first branch
does.

Is it clearer now?  Perhaps I should put these five paragraphs as a
comment into the function?
 
> >     {
> >       if (access_has_children_p (racc))
> > 	generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0,
> > 				 gsi, false, false);
> >       if (access_has_children_p (lacc))
> > 	generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0,
> > 				 gsi, true, true);
> >     }
> >   else
> >     {
> >       if (access_has_children_p (lacc) && access_has_children_p (racc))
> > 	{
> > 	  gimple_stmt_iterator orig_gsi = *gsi;
> > 	  bool refreshed;
> > 
> > 	  if (lacc->grp_read && !lacc->grp_covered)
> > 	    {
> > 	      handle_unscalarized_data_in_subtree (racc, lhs, gsi);
> > 	      refreshed = true;
> > 	    }
> > 	  else
> > 	    refreshed = false;
> > 
> > 	  load_assign_lhs_subreplacements (lacc->first_child, racc,
> > 					   lacc->offset, racc->offset,
> > 					   &orig_gsi, gsi, &refreshed, lhs);
> > 	  if (!refreshed || !racc->grp_unscalarized_data)
> > 	    {
> > 	      if (*stmt == gsi_stmt (*gsi))
> > 		gsi_next (gsi);
> > 
> > 	      unlink_stmt_vdef (*stmt);
> > 	      gsi_remove (&orig_gsi, true);
> > 	      return SRA_SA_REMOVED;
> > 	    }
> > 	}
> >       else
> > 	{
> > 	  if (access_has_children_p (racc))
> > 	    {
> > 	      if (!racc->grp_unscalarized_data)
> > 		{
> > 		  generate_subtree_copies (racc->first_child, lhs,
> > 					   racc->offset, 0, 0, gsi,
> > 					   false, false);
> > 		  gcc_assert (*stmt == gsi_stmt (*gsi));
> > 		  unlink_stmt_vdef (*stmt);
> > 		  gsi_remove (gsi, true);
> > 		  return SRA_SA_REMOVED;
> > 		}
> > 	      else
> > 		generate_subtree_copies (racc->first_child, lhs,
> > 					 racc->offset, 0, 0, gsi, false, true);
> > 	    }
> > 	  else if (access_has_children_p (lacc))
> > 	    generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
> > 				     0, 0, gsi, true, true);
> > 	}
> >     }
> > 
> >   return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
> > }
> > 
> > /* Generate statements initializing scalar replacements of parts of function
> >    parameters.  */
> > 
> > static void
> > initialize_parameter_reductions (void)
> > {
> >   gimple_stmt_iterator gsi;
> >   gimple_seq seq = NULL;
> >   tree parm;
> > 
> >   for (parm = DECL_ARGUMENTS (current_function_decl);
> >        parm;
> >        parm = TREE_CHAIN (parm))
> >     {
> >       VEC (access_p, heap) *access_vec;
> >       struct access *access;
> > 
> >       if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm)))
> > 	continue;
> >       access_vec = get_base_access_vector (parm);
> >       if (!access_vec)
> > 	continue;
> > 
> >       if (!seq)
> > 	{
> > 	  seq = gimple_seq_alloc ();
> > 	  gsi = gsi_start (seq);
> > 	}
> > 
> >       for (access = VEC_index (access_p, access_vec, 0);
> > 	   access;
> > 	   access = access->next_grp)
> > 	generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true);
> >     }
> > 
> >   if (seq)
> >     gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq);
> > }
> > 
> > /* The "main" function of intraprocedural SRA passes.  Runs the analysis and if
> >    it reveals there are components of some aggregates to be scalarized, it runs
> >    the required transformations.  */
> > static unsigned int
> > perform_intra_sra (void)
> > {
> >   int ret = 0;
> >   sra_initialize ();
> > 
> >   if (!find_var_candidates ())
> >     goto out;
> > 
> >   if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL,
> > 		      true, NULL))
> >     goto out;
> > 
> >   if (!analyze_all_variable_accesses ())
> >     goto out;
> > 
> >   scan_function (sra_modify_expr, sra_modify_assign, NULL,
> > 		 false, NULL);
> >   initialize_parameter_reductions ();
> > 
> >   ret = TODO_update_ssa;
> 
> redundant set.
> 
> >   if (sra_mode == SRA_MODE_EARLY_INTRA)
> >     ret = TODO_update_ssa;
> >   else
> >     ret = TODO_update_ssa | TODO_rebuild_alias;
> 
> in fact you shouldn't (need to) rebuild alias.

OK
 
> >  out:
> >   sra_deinitialize ();
> >   return ret;
> > }
> > 
> > /* Perform early intraprocedural SRA.  */
> > static unsigned int
> > early_intra_sra (void)
> > {
> >   sra_mode = SRA_MODE_EARLY_INTRA;
> >   return perform_intra_sra ();
> > }
> > 
> > /* Perform "late" intraprocedural SRA.  */
> > static unsigned int
> > late_intra_sra (void)
> > {
> >   sra_mode = SRA_MODE_INTRA;
> >   return perform_intra_sra ();
> > }
> > 
> > 
> > static bool
> > gate_intra_sra (void)
> > {
> >   return flag_tree_sra != 0;
> > }
> > 
> > 
> > struct gimple_opt_pass pass_sra_early =
> > {
> >  {
> >   GIMPLE_PASS,
> >   "esra",	 			/* name */
> >   gate_intra_sra,			/* gate */
> >   early_intra_sra,			/* execute */
> >   NULL,					/* sub */
> >   NULL,					/* next */
> >   0,					/* static_pass_number */
> >   TV_TREE_SRA,				/* tv_id */
> >   PROP_cfg | PROP_ssa,                  /* properties_required */
> >   0,					/* properties_provided */
> >   0,					/* properties_destroyed */
> >   0,					/* todo_flags_start */
> >   TODO_dump_func
> >   | TODO_update_ssa
> >   | TODO_ggc_collect
> >   | TODO_verify_ssa			/* todo_flags_finish */
> >  }
> > };
> > 
> > 
> > struct gimple_opt_pass pass_sra =
> > {
> >  {
> >   GIMPLE_PASS,
> >   "sra",	 			/* name */
> >   gate_intra_sra,			/* gate */
> >   late_intra_sra,			/* execute */
> >   NULL,					/* sub */
> >   NULL,					/* next */
> >   0,					/* static_pass_number */
> >   TV_TREE_SRA,				/* tv_id */
> >   PROP_cfg | PROP_ssa,                  /* properties_required */
> >   0,					/* properties_provided */
> >   0,					/* properties_destroyed */
> >   TODO_update_address_taken,		/* todo_flags_start */
> >   TODO_dump_func
> >   | TODO_update_ssa
> >   | TODO_ggc_collect
> >   | TODO_verify_ssa			/* todo_flags_finish */
> >  }
> > };
> 
> 
> Overall it looks good - I'm still a little bit confused, but that's
> likely because reading code from top to bottom doesn't make the
> most sense in all cases ;)
> 
> Looking forward to a second look on a revised version.
> 

Thanks again,

Martin
