This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates.


On Tue, 28 Apr 2009, Martin Jambor wrote:

> On Tue, Apr 28, 2009 at 12:04:32PM +0200, Martin Jambor wrote:
> > This is the new intraprocedural SRA.  I have stripped off the
> > interprocedural part and will propose to commit it separately later.
> > I have tried to remove almost every trace of IPA-SRA, however, two
> > provisions for it have remained in the patch.  First, an enumeration
> > (rather than a boolean) is used to distinguish between "early" and
> > "late" SRA so that other SRA modes can be added later on.  Second,
> > scan_function() has a hook parameter and a void pointer parameter
> > which are not used in this patch but will be by IPA-SRA.
> > 
> > Otherwise, the patch is hopefully self-contained and the basis of its
> > operation is described by the initial comment.
> > 
> > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on
> > hppa-linux-gnu too) but produces a small number of testsuite failures
> > which are handled by the two following patches.
> > 
> > Thanks,
> > 
> > Martin
> > 
> > 
> > 2009-04-27  Martin Jambor  <mjambor@suse.cz>
> > 
> > 	* tree-sra.c (enum sra_mode): The whole contents of the file were
> > 	replaced.
> 
> Hm, the patch is quite unreadable; below is the new tree-sra.c file
> which entirely replaces the old one (note that the patch also modifies
> the Makefile though):

Ah.  Here it is ... the comment about the changelog still applies.

> /* Scalar Replacement of Aggregates (SRA) converts some structure
>    references into scalar references, exposing them to the scalar
>    optimizers.
>    Copyright (C) 2008, 2009 Free Software Foundation, Inc.
>    Contributed by Martin Jambor <mjambor@suse.cz>
> 
> This file is part of GCC.
> 
> GCC is free software; you can redistribute it and/or modify it under
> the terms of the GNU General Public License as published by the Free
> Software Foundation; either version 3, or (at your option) any later
> version.
> 
> GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> WARRANTY; without even the implied warranty of MERCHANTABILITY or
> FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for more details.
> 
> You should have received a copy of the GNU General Public License
> along with GCC; see the file COPYING3.  If not see
> <http://www.gnu.org/licenses/>.  */
> 
> /* This file implements Scalar Replacement of Aggregates (SRA).  SRA is run
>    twice, once in the early stages of compilation (early SRA) and once in the
>    late stages (late SRA).  The aim of both is to turn references to scalar
>    parts of aggregates into uses of independent scalar variables.
> 
>    The two passes are nearly identical; the only difference is that early SRA
>    does not scalarize unions which are used as the result in a GIMPLE_RETURN
>    statement because together with inlining this can lead to weird type
>    conversions.

Do you happen to have a testcase for this or can you describe that problem
some more?
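
Purely guessing at the scenario, is it something like the following,
where the union would be replaced by its single initialized scalar
part before inlining?

  union u { int i; float f; };

  static union u
  retu (float f)
  {
    union u x;
    x.f = f;
    return x;
  }

  int
  foo (float f)
  {
    return retu (f).i;
  }

(Completely made up, of course.)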

>    Both passes operate in four stages:
> 
>    1. The declarations that have properties which make them candidates for
>       scalarization are identified in function find_var_candidates().  The
>       candidates are stored in candidate_bitmap.
> 
>    2. The function body is scanned.  In the process, declarations which are
>       used in a manner that prevents their scalarization are removed from the
>       candidate bitmap.  More importantly, for every access into an aggregate,
>       an access structure (struct access) is created by create_access() and
>       stored in a vector associated with the aggregate.  Among other
>       information, the aggregate declaration, the offset and size of the access
>       and its type are stored in the structure.
> 
>       On a related note, assign_link structures are created for every assign
>       statement between candidate aggregates and attached to the related
>       accesses.
> 
>    3. The vectors of accesses are analyzed.  They are first sorted according to
>       their offset and size and then scanned for partially overlapping accesses
>       (i.e. those which overlap but one is not entirely within another).  Such
>       an access disqualifies the whole aggregate from being scalarized.

This happens only when get_ref_base_and_extent punts and returns -1 for
the access size?  And of course with struct field accesses of different
structs that are inside the same union.
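
For instance (a made-up example), two structs in a union whose fields
overlap without one being contained in the other:

  union u
  {
    struct { int a; int b; } s;                            /* [0,32) [32,64) */
    struct { char c; int x; } __attribute__ ((packed)) t;  /* x at [8,40) */
  };

Accesses to u.s.a and u.t.x would then disqualify the whole union.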

>       If there is no such inhibiting overlap, a representative access structure
>       is chosen for every unique combination of offset and size.  Afterwards,
>       the pass builds a set of trees from these structures, in which children
>       of an access are within their parent (in terms of offset and size).
> 
>       Then accesses are propagated whenever possible (i.e. in cases when it
>       does not create a partially overlapping access) across assign_links from
>       the right hand side to the left hand side.
> 
>       Then the set of trees for each declaration is traversed again and those
>       accesses which should be replaced by a scalar are identified.
> 
>    4. The function is traversed again, and for every reference into an
>       aggregate that has some component which is about to be scalarized,
>       statements are amended and new statements are created as necessary.
>       Finally, if a parameter got scalarized, the scalar replacements are
>       initialized with values from respective parameter aggregates.
> */

Closing */ goes to the previous line.
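
To make sure I read this right: for a (made-up) example like

  struct pair { int x; int y; };

  int
  foo (struct pair p)
  {
    struct pair q;
    q = p;
    return q.x + q.y;
  }

the pass is supposed to create scalar replacements for the components
of p and q, rewrite their uses and remove the aggregate copy?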

> 
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "alloc-pool.h"
> #include "tm.h"
> #include "tree.h"
> #include "gimple.h"
> #include "tree-flow.h"
> #include "diagnostic.h"
> #include "tree-dump.h"
> #include "timevar.h"
> #include "params.h"
> #include "target.h"
> #include "flags.h"
> 
> /* Enumeration of all aggregate reductions we can do.  */
> enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */
> 	       SRA_MODE_INTRA};	     /* late intraprocedural SRA */

Spaces after { and before }.

> 
> /* Global variable describing which aggregate reduction we are performing at
>    the moment.  */
> static enum sra_mode sra_mode;
> 
> struct assign_link;
> 
> /* ACCESS represents each access to an aggregate variable (as a whole or a
>    part).  It can also represent a group of accesses that refer to exactly the
>    same fragment of an aggregate (i.e. those that have exactly the same offset
>    and size).  Such representatives for a single aggregate, once determined,
>    are linked in a linked list and have the group fields set.
> 
>    Moreover, when doing intraprocedural SRA, a tree is built from those
>    representatives (by means of first_child and next_sibling pointers), in
>    which all items in a subtree are "within" the root, i.e. their offset is
>    greater than or equal to the offset of the root and their offset+size is
>    smaller than or equal to the offset+size of the root.  Children of an
>    access are sorted by offset.
> */

*/ to previous line.

> struct access
> {
>   /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF.
>      If EXPR isn't a COMPONENT_REF, just set `BASE = EXPR', `OFFSET = 0',
>      `SIZE = TYPE_SIZE (TREE_TYPE (expr))'.  */

s/COMPONENT_REF/component reference/g - it's not only COMPONENT_REF
trees we handle.

>   HOST_WIDE_INT offset;
>   HOST_WIDE_INT size;
>   tree base;
> 
>   /* Expression.  */
>   tree expr;
>   /* Type.  */
>   tree type;
> 
>   /* Next group representative for this aggregate. */
>   struct access *next_grp;
> 
>   /* Pointer to the group representative.  Pointer to itself if the struct is
>      the representative.  */
>   struct access *group_representative;
> 
>   /* If this access has any children (in terms of the definition above), this
>      points to the first one.  */
>   struct access *first_child;
> 
>   /* Pointer to the next sibling in the access tree as described above.  */
>   struct access *next_sibling;
> 
>   /* Pointers to the first and last element in the linked list of assign
>      links.  */
>   struct assign_link *first_link, *last_link;

vertical space missing.

>   /* Pointer to the next access in the work queue.  */
>   struct access *next_queued;
> 
>   /* Replacement variable for this access "region."  Never to be accessed
>      directly, always only by the means of get_access_replacement() and only
>      when grp_to_be_replaced flag is set.  */
>   tree replacement_decl;
> 
>   /* Is this particular access a write access?  */
>   unsigned write : 1;
> 
>   /* Is this access currently in the work queue?  */
>   unsigned grp_queued : 1;
>   /* Does this group contain a write access?  This flag is propagated down the
>      access tree.  */
>   unsigned grp_write : 1;
>   /* Does this group contain a read access?  This flag is propagated down the
>      access tree.  */
>   unsigned grp_read : 1;
>   /* Is the subtree rooted in this access fully covered by scalar
>      replacements?  */
>   unsigned grp_covered : 1;
>   /* If set to true, this access and all below it in an access tree must not be
>      scalarized.  */
>   unsigned grp_unscalarizable_region : 1;
>   /* Whether data have been written to parts of the aggregate covered by this
>      access which are not to be scalarized.  This flag is propagated up in the
>      access tree.  */
>   unsigned grp_unscalarized_data : 1;
>   /* Does this access and/or group contain a write access through a
>      BIT_FIELD_REF?  */
>   unsigned grp_bfr_lhs : 1;
> 
>   /* Set when a scalar replacement should be created for this variable.  We
>      make the decision and create the replacement at different places because
>      create_tmp_var cannot be called from within FOR_EACH_REFERENCED_VAR.  */
>   unsigned grp_to_be_replaced : 1;
> };
> 
> typedef struct access *access_p;
> 
> DEF_VEC_P (access_p);
> DEF_VEC_ALLOC_P (access_p, heap);
> 
> /* Alloc pool for allocating access structures.  */
> static alloc_pool access_pool;
> 
> /* A structure linking lhs and rhs accesses from an aggregate assignment.  They
>    are used to propagate subaccesses from rhs to lhs as long as they don't
>    conflict with what is already there.  */
> struct assign_link
> {
>   struct access *lacc, *racc;
>   struct assign_link *next;
> };
> 
> /* Alloc pool for allocating assign link structures.  */
> static alloc_pool link_pool;
> 
> /* Base (tree) -> Vector (VEC(access_p,heap) *) map.  */
> static struct pointer_map_t *base_access_vec;
> 
> /* Bitmap of bases (candidates).  */
> static bitmap candidate_bitmap;
> /* Bitmap of declarations used in a return statement.  */
> static bitmap retvals_bitmap;
> /* Obstack for creation of fancy names.  */
> static struct obstack name_obstack;
> 
> /* Head of a linked list of accesses that need to have their subaccesses
>    propagated to their assignment counterparts.  */
> static struct access *work_queue_head;
> 
> /* Dump contents of ACCESS to file F in a human friendly way.  If GRP is true,
>    representative fields are dumped, otherwise those which only describe the
>    individual access are.  */
> 
> static void
> dump_access (FILE *f, struct access *access, bool grp)
> {
>   fprintf (f, "access { ");
>   fprintf (f, "base = (%d)'", DECL_UID (access->base));
>   print_generic_expr (f, access->base, 0);
>   fprintf (f, "', offset = %d", (int) access->offset);
>   fprintf (f, ", size = %d", (int) access->size);

you can use ", offset = "HOST_WIDE_INT_PRINT_DEC, access->offset here.
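
i.e.

  fprintf (f, ", offset = " HOST_WIDE_INT_PRINT_DEC, access->offset);
  fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size);

which avoids the truncating casts to int.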

>   fprintf (f, ", expr = ");
>   print_generic_expr (f, access->expr, 0);
>   fprintf (f, ", type = ");
>   print_generic_expr (f, access->type, 0);
>   if (grp)
>     fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, "
> 	     "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> 	     "grp_to_be_replaced = %d\n",
> 	     access->grp_write, access->grp_read, access->grp_covered,
> 	     access->grp_unscalarizable_region, access->grp_unscalarized_data,
> 	     access->grp_to_be_replaced);
>   else
>     fprintf (f, ", write = %d'\n", access->write);
> }
> 
> /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL.  */
> 
> static void
> dump_access_tree_1 (FILE *f, struct access *access, int level)
> {
>   do
>     {
>       int i;
> 
>       for (i = 0; i < level; i++)
> 	fputs ("* ", dump_file);
> 
>       dump_access (f, access, true);
> 
>       if (access->first_child)
> 	dump_access_tree_1 (f, access->first_child, level + 1);
> 
>       access = access->next_sibling;
>     }
>   while (access);
> }
> 
> /* Dump all access trees for a variable, given the pointer to the first root in
>    ACCESS.  */
> 
> static void
> dump_access_tree (FILE *f, struct access *access)
> {
>   for (; access; access = access->next_grp)
>     dump_access_tree_1 (f, access, 0);
> }
> 
> /* Return a vector of pointers to accesses for the variable given in BASE or
>    NULL if there is none.  */
> 
> static VEC (access_p, heap) *
> get_base_access_vector (tree base)
> {
>   void **slot;
> 
>   slot = pointer_map_contains (base_access_vec, base);
>   if (!slot)
>     return NULL;
>   else
>     return *(VEC (access_p, heap) **) slot;
> }
> 
> /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted
>    in ACCESS.  Return NULL if it cannot be found.  */
> 
> static struct access *
> find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
> 			HOST_WIDE_INT size)
> {
>   while (access && (access->offset != offset || access->size != size))
>     {
>       struct access *child = access->first_child;
> 
>       while (child && (child->offset + child->size <= offset))
> 	child = child->next_sibling;

Do you limit the number of siblings?  We should keep an eye on this
for potential compile-time issues (not that I expect any).

>       access = child;
>     }
> 
>   return access;
> }
> 
> /* Return the first group representative for BASE or NULL if none exists.  */
> 
> static struct access *
> get_first_repr_for_decl (tree base)
> {
>   VEC (access_p, heap) *access_vec;
> 
>   access_vec = get_base_access_vector (base);
>   if (!access_vec)
>     return NULL;
> 
>   return VEC_index (access_p, access_vec, 0);
> }
> 
> /* Find an access representative for the variable BASE and given OFFSET and
>    SIZE.  Requires that access trees have already been built.  Return NULL if
>    it cannot be found.  */
> 
> static struct access *
> get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset,
> 				 HOST_WIDE_INT size)
> {
>   struct access *access;
> 
>   access = get_first_repr_for_decl (base);
>   while (access && (access->offset + access->size <= offset))
>     access = access->next_grp;
>   if (!access)
>     return NULL;
> 
>   return find_access_in_subtree (access, offset, size);
> }
> 
> /* Add LINK to the linked list of assign links of RACC.  */
> static void
> add_link_to_rhs (struct access *racc, struct assign_link *link)
> {
>   gcc_assert (link->racc == racc);
> 
>   if (!racc->first_link)
>     {
>       gcc_assert (!racc->last_link);
>       racc->first_link = link;
>     }
>   else
>     racc->last_link->next = link;
> 
>   racc->last_link = link;
>   link->next = NULL;
> }
> 
> /* Move all link structures in their linked list in OLD_RACC to the linked list
>    in NEW_RACC.  */
> static void
> relink_to_new_repr (struct access *new_racc, struct access *old_racc)
> {
>   if (!old_racc->first_link)
>     {
>       gcc_assert (!old_racc->last_link);
>       return;
>     }
> 
>   if (new_racc->first_link)
>     {
>       gcc_assert (!new_racc->last_link->next);
>       gcc_assert (!old_racc->last_link || !old_racc->last_link->next);
> 
>       new_racc->last_link->next = old_racc->first_link;
>       new_racc->last_link = old_racc->last_link;
>     }
>   else
>     {
>       gcc_assert (!new_racc->last_link);
> 
>       new_racc->first_link = old_racc->first_link;
>       new_racc->last_link = old_racc->last_link;
>     }
>   old_racc->first_link = old_racc->last_link = NULL;
> }
> 
> /* Add ACCESS to the work queue (which is actually a stack).  */
> 
> static void
> add_access_to_work_queue (struct access *access)
> {
>   if (!access->grp_queued)
>     {
>       gcc_assert (!access->next_queued);
>       access->next_queued = work_queue_head;
>       access->grp_queued = 1;
>       work_queue_head = access;
>     }
> }
> 
> /* Pop an access from the work queue, and return it, assuming there is one.  */
> 
> static struct access *
> pop_access_from_work_queue (void)
> {
>   struct access *access = work_queue_head;
> 
>   work_queue_head = access->next_queued;
>   access->next_queued = NULL;
>   access->grp_queued = 0;
>   return access;
> }
> 
> 
> /* Allocate necessary structures.  */
> 
> static void
> sra_initialize (void)
> {
>   candidate_bitmap = BITMAP_ALLOC (NULL);
>   retvals_bitmap = BITMAP_ALLOC (NULL);
>   gcc_obstack_init (&name_obstack);
>   access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16);
>   link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16);
>   base_access_vec = pointer_map_create ();
> }
> 
> /* Hook fed to pointer_map_traverse, deallocate stored vectors.  */
> 
> static bool
> delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value,
> 		     void *data ATTRIBUTE_UNUSED)
> {
>   VEC (access_p, heap) *access_vec;
>   access_vec = (VEC (access_p, heap) *) *value;
>   VEC_free (access_p, heap, access_vec);
> 
>   return true;
> }
> 
> /* Deallocate all general structures.  */
> 
> static void
> sra_deinitialize (void)
> {
>   BITMAP_FREE (candidate_bitmap);
>   BITMAP_FREE (retvals_bitmap);
>   free_alloc_pool (access_pool);
>   free_alloc_pool (link_pool);
>   obstack_free (&name_obstack, NULL);
> 
>   pointer_map_traverse (base_access_vec, delete_base_accesses, NULL);
>   pointer_map_destroy (base_access_vec);
> }
> 
> /* Remove DECL from candidates for SRA and write REASON to the dump file if
>    there is one.  */
> static void
> disqualify_candidate (tree decl, const char *reason)
> {
>   bitmap_clear_bit (candidate_bitmap, DECL_UID (decl));
> 
>   if (dump_file)

&& (dump_flags & TDF_DETAILS)
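
i.e.

  if (dump_file && (dump_flags & TDF_DETAILS))

here and in the other places below that check dump_file alone.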

>     {
>       fprintf (dump_file, "! Disqualifying ");
>       print_generic_expr (dump_file, decl, 0);
>       fprintf (dump_file, " - %s\n", reason);
>     }
> }
> 
> /* Return true iff the type contains a field or an element which does not allow
>    scalarization.  */
> 
> static bool
> type_internals_preclude_sra_p (tree type)
> {
>   tree fld;
>   tree et;
> 
>   switch (TREE_CODE (type))
>     {
>     case RECORD_TYPE:
>     case UNION_TYPE:
>     case QUAL_UNION_TYPE:
>       for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
> 	if (TREE_CODE (fld) == FIELD_DECL)
> 	  {
> 	    tree ft = TREE_TYPE (fld);
> 
> 	    if (TREE_THIS_VOLATILE (fld)
> 		|| !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld)
> 		|| !host_integerp (DECL_FIELD_OFFSET (fld), 1)
> 		|| !host_integerp (DECL_SIZE (fld), 1))
> 	      return true;
> 
> 	    if (AGGREGATE_TYPE_P (ft)
> 		&& type_internals_preclude_sra_p (ft))
> 	      return true;
> 	  }
> 
>       return false;
> 
>     case ARRAY_TYPE:
>       et = TREE_TYPE (type);
> 
>       if (AGGREGATE_TYPE_P (et))
> 	return type_internals_preclude_sra_p (et);
>       else
> 	return false;
> 
>     default:
>       return false;
>     }
> }
> 
> /* Create and insert an access structure for EXPR.  Return the created access,
>    or NULL if it is not possible.  */
> 
> static struct access *
> create_access (tree expr, bool write)
> {
>   struct access *access;
>   void **slot;
>   VEC (access_p,heap) *vec;
>   HOST_WIDE_INT offset, size, max_size;
>   tree base = expr;
>   bool unscalarizable_region = false;
> 
>   if (handled_component_p (expr))
>     base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
>   else
>     {
>       tree tree_size;
> 
>       tree_size = TYPE_SIZE (TREE_TYPE (base));
>       if (tree_size && host_integerp (tree_size, 1))
> 	size = max_size = tree_low_cst (tree_size, 1);
>       else
> 	size = max_size = -1;
> 
>       offset = 0;
>     }

get_ref_base_and_extent should now also work on plain DECLs
(non-handled_component_p) and base should never be NULL.

>   if (!base || !DECL_P (base)
>       || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>     return NULL;
> 
>   if (size != max_size)
>     {
>       size = max_size;
>       unscalarizable_region = true;
>     }
> 
>   if (size < 0)
>     {
>       disqualify_candidate (base, "Encountered an ultra variable sized "
> 			    "access.");

ultra variable sized?  I would name it 'unconstrained access'.  Note
that there is still useful information in this case if offset is non-zero
(namely accesses before [offset, -1] may still be scalarized).

Maybe something for further improvements.  This for example would
happen for structs with trailing arrays.
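
For instance (a made-up example):

  struct str { int len; char data[1]; };

where a variable index into the trailing array, s.data[i], would give
an access with unknown extent (max_size == -1) at a known offset.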

>       return NULL;
>     }
> 
>   access = (struct access *) pool_alloc (access_pool);
>   memset (access, 0, sizeof (struct access));
> 
>   access->base = base;
>   access->offset = offset;
>   access->size = size;
>   access->expr = expr;
>   access->type = TREE_TYPE (expr);
>   access->write = write;
>   access->grp_unscalarizable_region = unscalarizable_region;
> 
>   slot = pointer_map_contains (base_access_vec, base);
>   if (slot)
>     vec = (VEC (access_p, heap) *) *slot;
>   else
>     vec = VEC_alloc (access_p, heap, 32);
> 
>   VEC_safe_push (access_p, heap, vec, access);
> 
>   *((struct VEC (access_p,heap) **)
> 	pointer_map_insert (base_access_vec, base)) = vec;
> 
>   return access;
> }
> 
> 
> /* Callback of walk_tree.  Search the given tree for a declaration and exclude
>    it from the candidates.  */
> 
> static tree
> disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
> {
>   tree base = *tp;
> 
>   if (TREE_CODE (base) == SSA_NAME)
>     base = SSA_NAME_VAR (base);

Err ... for SSA_NAME bases there is nothing to scalarize?  So just
bail out in that case?  In fact, using walk_tree for disqualify_all looks
a bit expensive (it also walks types).

>   if (DECL_P (base))
>     {
>       disqualify_candidate (base, "From within disqualify_all().");
>       *walk_subtrees = 0;
>     }
>   else
>     *walk_subtrees = 1;
> 
>   return NULL_TREE;
> }
> 
> /* Scan expression EXPR and create access structures for all accesses to
>    candidates for scalarization.  Return the created access or NULL if none is
>    created.  */
> 
> static struct access *
> build_access_from_expr_1 (tree *expr_ptr,
> 			gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write)
> {
>   struct access *ret = NULL;
>   tree expr = *expr_ptr;
>   tree safe_expr = expr;
>   bool bit_ref;
> 
>   if (TREE_CODE (expr) == BIT_FIELD_REF)
>     {
>       expr = TREE_OPERAND (expr, 0);
>       bit_ref = true;
>     }
>   else
>     bit_ref = false;
> 
>   while (TREE_CODE (expr) == NOP_EXPR

CONVERT_EXPR_P (expr)

> 	 || TREE_CODE (expr) == VIEW_CONVERT_EXPR
> 	 || TREE_CODE (expr) == REALPART_EXPR
> 	 || TREE_CODE (expr) == IMAGPART_EXPR)
>     expr = TREE_OPERAND (expr, 0);

Why do this here btw, and not just lump ...

>   switch (TREE_CODE (expr))
>     {
>     case ADDR_EXPR:
>     case SSA_NAME:
>     case INDIRECT_REF:
>       break;
> 
>     case VAR_DECL:
>     case PARM_DECL:
>     case RESULT_DECL:
>     case COMPONENT_REF:
>     case ARRAY_REF:
>       ret = create_access (expr, write);
>       break;

... this ...

>     case REALPART_EXPR:
>     case IMAGPART_EXPR:
>       expr = TREE_OPERAND (expr, 0);
>       ret = create_access (expr, write);

... and this together?  Won't you create bogus accesses if you
strip for example IMAGPART_EXPR (which has non-zero offset)?

>       break;
> 
>     case ARRAY_RANGE_REF:

it should just be handled fine I think.

>     default:
>       walk_tree (&safe_expr, disqualify_all, NULL, NULL);

and if not, this should just disqualify the base of the access, like
get_base_address (safe_expr) (save_expr you mean?) and then if that
is a DECL, disqualify that decl.

>       break;
>     }
> 
>   if (write && bit_ref && ret)
>     ret->grp_bfr_lhs = 1;
> 
>   return ret;
> }
> 
> /* Scan expression EXPR and create access structures for all accesses to
>    candidates for scalarization.  Return true if any access has been
>    inserted.  */
> 
> static bool
> build_access_from_expr (tree *expr_ptr,
> 			gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write,
> 			void *data ATTRIBUTE_UNUSED)
> {
>   return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL;
> }
> 
> /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in
>    modes in which it matters; return true iff they have been disqualified.  RHS
>    may be NULL, in which case it is ignored.  If we scalarize an aggregate in
>    intra-SRA we may need to add statements after each statement.  This is not
>    possible if a statement unconditionally has to end the basic block.  */
> static bool
> disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs)
> {
>   if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt))
>     {
>       walk_tree (lhs, disqualify_all, NULL, NULL);
>       if (rhs)
> 	walk_tree (rhs, disqualify_all, NULL, NULL);
>       return true;
>     }
>   return false;
> }
> 
> 
> /* Result code for scan_assign callback for scan_function.  */
> enum scan_assign_result {SRA_SA_NONE,       /* nothing done for the stmt */
> 			 SRA_SA_PROCESSED,  /* stmt analyzed/changed */
> 			 SRA_SA_REMOVED};   /* stmt redundant and eliminated */

space after { and before }.

> 
> 
> /* Scan expressions occurring in the statement pointed to by STMT_PTR, create
>    access structures for all accesses to candidates for scalarization and
>    remove those candidates which occur in statements or expressions that
>    prevent them from being split apart.  Return SRA_SA_PROCESSED if any access
>    has been created, SRA_SA_NONE otherwise.  */
> 
> static enum scan_assign_result
> build_accesses_from_assign (gimple *stmt_ptr,
> 			    gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED,
> 			    void *data ATTRIBUTE_UNUSED)
> {
>   gimple stmt = *stmt_ptr;
>   tree *lhs_ptr, *rhs_ptr;
>   struct access *lacc, *racc;
> 
>   if (gimple_assign_rhs2 (stmt))

!gimple_assign_single_p (stmt)
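
i.e.

  if (!gimple_assign_single_p (stmt))
    return SRA_SA_NONE;

which also skips unary operations such as conversions that have no
second rhs operand but are not plain copies either.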

>     return SRA_SA_NONE;
>
>   lhs_ptr = gimple_assign_lhs_ptr (stmt);
>   rhs_ptr = gimple_assign_rhs1_ptr (stmt);

you probably don't need to pass pointers to trees everywhere as you
are not changing them.

>   if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr))
>     return SRA_SA_NONE;
> 
>   racc = build_access_from_expr_1 (rhs_ptr, gsi, false);
>   lacc = build_access_from_expr_1 (lhs_ptr, gsi, true);

just avoid calling into build_access_from_expr_1 for SSA_NAMEs
or is_gimple_min_invariant lhs/rhs, that should make that
function more regular.

>   if (lacc && racc
>       && !lacc->grp_unscalarizable_region
>       && !racc->grp_unscalarizable_region
>       && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr))
>       && lacc->size <= racc->size
>       && useless_type_conversion_p (lacc->type, racc->type))

useless_type_conversion_p should be always true here.

>     {
>       struct assign_link *link;
> 
>       link = (struct assign_link *) pool_alloc (link_pool);
>       memset (link, 0, sizeof (struct assign_link));
> 
>       link->lacc = lacc;
>       link->racc = racc;
> 
>       add_link_to_rhs (racc, link);
>     }
> 
>   return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE;
> }
> 
> /* Scan the function and look for interesting statements.  Return true if any
>    has been found or processed, as indicated by callbacks.  SCAN_EXPR is a
>    callback called on all expressions within statements except assign
>    statements and those deemed entirely unsuitable for some reason (all
>    operands in such statements and expressions are removed from
>    candidate_bitmap).  SCAN_ASSIGN is a callback called on all assign
>    statements.  HANDLE_SSA_DEFS is a callback called on assign statements and
>    those call statements which have a lhs; it is the only callback which can
>    be NULL.  ANALYSIS_STAGE is true when running in the analysis stage of a
>    pass and thus no statement is being modified.  DATA is a pointer passed to
>    all callbacks.  If any single callback returns true, this function also
>    returns true, otherwise it returns false.  */
> 
> static bool
> scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *),
> 	       enum scan_assign_result (*scan_assign) (gimple *,
> 						       gimple_stmt_iterator *,
> 						       void *),
> 	       bool (*handle_ssa_defs)(gimple, void *),
> 	       bool analysis_stage, void *data)
> {
>   gimple_stmt_iterator gsi;
>   basic_block bb;
>   unsigned i;
>   tree *t;
>   bool ret = false;
> 
>   FOR_EACH_BB (bb)
>     {
>       bool bb_changed = false;
> 
>       gsi = gsi_start_bb (bb);
>       while (!gsi_end_p (gsi))
> 	{
> 	  gimple stmt = gsi_stmt (gsi);
> 	  enum scan_assign_result assign_result;
> 	  bool any = false, deleted = false;
> 
> 	  switch (gimple_code (stmt))
> 	    {
> 	    case GIMPLE_RETURN:
> 	      t = gimple_return_retval_ptr (stmt);
> 	      if (*t != NULL_TREE)
> 		{
> 		  if (DECL_P (*t))
> 		    {
> 		      tree ret_type = TREE_TYPE (*t);
> 		      if (sra_mode == SRA_MODE_EARLY_INTRA
> 			  && (TREE_CODE (ret_type) == UNION_TYPE
> 			      || TREE_CODE (ret_type) == QUAL_UNION_TYPE))
> 			disqualify_candidate (*t,
> 					      "Union in a return statement.");
> 		      else
> 			bitmap_set_bit (retvals_bitmap, DECL_UID (*t));
> 		    }
> 		  any |= scan_expr (t, &gsi, false, data);
> 		}

Likewise for passing pointers (why is gsi necessary and passing a stmt
does not work?)

> 	      break;
> 
> 	    case GIMPLE_ASSIGN:
> 	      assign_result = scan_assign (&stmt, &gsi, data);
> 	      any |= assign_result == SRA_SA_PROCESSED;
> 	      deleted = assign_result == SRA_SA_REMOVED;
> 	      if (handle_ssa_defs && assign_result != SRA_SA_REMOVED)
> 		any |= handle_ssa_defs (stmt, data);
> 	      break;
> 
> 	    case GIMPLE_CALL:
> 	      /* Operands must be processed before the lhs.  */
> 	      for (i = 0; i < gimple_call_num_args (stmt); i++)
> 		{
> 		  tree *argp = gimple_call_arg_ptr (stmt, i);
> 		  any |= scan_expr (argp, &gsi, false, data);
> 		}
> 
> 	      if (gimple_call_lhs (stmt))
> 		{
> 		  tree *lhs_ptr = gimple_call_lhs_ptr (stmt);
> 		  if (!analysis_stage ||
> 		      !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL))
> 		    {
> 		      any |= scan_expr (lhs_ptr, &gsi, true, data);
> 		      if (handle_ssa_defs)
> 			any |= handle_ssa_defs (stmt, data);
> 		    }
> 		}
> 	      break;
> 
> 	    case GIMPLE_ASM:
> 	      for (i = 0; i < gimple_asm_ninputs (stmt); i++)
> 		{
> 		  tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i));
> 		  any |= scan_expr (op, &gsi, false, data);
> 		}
> 	      for (i = 0; i < gimple_asm_noutputs (stmt); i++)
> 		{
> 		  tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i));
> 		  any |= scan_expr (op, &gsi, true, data);
> 		}
> 	      break;

asm operands with memory constraints should be disqualified from SRA
(see walk_stmt_load_store_addr_ops and/or the operand scanner).

> 	    default:
> 	      if (analysis_stage)
> 		walk_gimple_op (stmt, disqualify_all, NULL);

You seem to be very eager to disqualify anything unknown ;)  (But I
yet have to come across a TREE_ADDRESSABLE check ...)

> 	      break;
> 	    }
> 
> 	  if (any)
> 	    {
> 	      ret = true;
> 	      bb_changed = true;
> 
> 	      if (!analysis_stage)

Oh.  So we reuse this function.  Hmm.

> 		{
> 		  update_stmt (stmt);
> 		  if (!stmt_could_throw_p (stmt))
> 		    remove_stmt_from_eh_region (stmt);

Usually

  if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
      && gimple_purge_dead_eh_edges (gimple_bb (stmt)))

is the pattern for this.  But then you disqualified all throwing
expressions, no?

> 		}
> 	    }
> 	  if (deleted)
> 	    bb_changed = true;
> 	  else
> 	    gsi_next (&gsi);
> 	}
>       if (!analysis_stage && bb_changed)
> 	gimple_purge_dead_eh_edges (bb);
>     }
> 
>   return ret;
> }
> 
> /* qsort comparison function.  There are pointers to accesses in the array.  An
>    access is considered smaller than another if it has a smaller offset or if
>    the offsets are the same but its size is bigger.  */
> 
> static int
> compare_access_positions (const void *a, const void *b)
> {
>   const access_p *fp1 = (const access_p *) a;
>   const access_p *fp2 = (const access_p *) b;
>   const access_p f1 = *fp1;
>   const access_p f2 = *fp2;
> 
>   if (f1->offset != f2->offset)
>     return f1->offset < f2->offset ? -1 : 1;
> 
>   if (f1->size == f2->size)
>     return 0;
>   /* We want the bigger accesses first, thus the opposite operator in the next
>      line: */
>   return f1->size > f2->size ? -1 : 1;
> }
> 
> 
> /* Append a name of the declaration to the name obstack.  A helper function for
>    make_fancy_name.  */
> 
> static void
> make_fancy_decl_name (tree decl)
> {
>   char buffer[32];
> 
>   tree name = DECL_NAME (decl);
>   if (name)
>     obstack_grow (&name_obstack, IDENTIFIER_POINTER (name),
> 		  IDENTIFIER_LENGTH (name));
>   else
>     {
>       sprintf (buffer, "D%u", DECL_UID (decl));

That would just be useless information.  I guess you copied this
from old SRA?

>       obstack_grow (&name_obstack, buffer, strlen (buffer));
>     }
> }
> 
> /* Helper for make_fancy_name.  */
> 
> static void
> make_fancy_name_1 (tree expr)
> {
>   char buffer[32];
>   tree index;
> 
>   if (DECL_P (expr))
>     {
>       make_fancy_decl_name (expr);
>       return;
>     }
> 
>   switch (TREE_CODE (expr))
>     {
>     case COMPONENT_REF:
>       make_fancy_name_1 (TREE_OPERAND (expr, 0));
>       obstack_1grow (&name_obstack, '$');
>       make_fancy_decl_name (TREE_OPERAND (expr, 1));
>       break;
> 
>     case ARRAY_REF:
>       make_fancy_name_1 (TREE_OPERAND (expr, 0));
>       obstack_1grow (&name_obstack, '$');
>       /* Arrays with only one element may not have a constant as their
> 	 index. */
>       index = TREE_OPERAND (expr, 1);
>       if (TREE_CODE (index) != INTEGER_CST)
> 	break;
>       sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index));
>       obstack_grow (&name_obstack, buffer, strlen (buffer));
> 
>       break;
> 
>     case BIT_FIELD_REF:
>     case REALPART_EXPR:
>     case IMAGPART_EXPR:
>       gcc_unreachable (); 	/* we treat these as scalars.  */
>       break;
>     default:
>       break;
>     }
> }
> 
> /* Create a human readable name for the replacement variable of the access
>    described by EXPR.  */
> 
> static char *
> make_fancy_name (tree expr)
> {
>   make_fancy_name_1 (expr);
>   obstack_1grow (&name_obstack, '\0');
>   return XOBFINISH (&name_obstack, char *);
> }

As all new scalars are DECL_ARTIFICIAL anyway why bother to create
a fancy name? ....
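
(For what it's worth, the names look like p$x for p.x and a$4 for a[4],
which is admittedly nice in dumps and, via DECL_DEBUG_EXPR below, in
debug info.)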

> /* Helper function for build_ref_for_offset.  */
> 
> static bool
> build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset,
> 			tree exp_type)
> {
>   while (1)
>     {
>       tree fld;
>       tree tr_size, index;
>       HOST_WIDE_INT el_size;
> 
>       if (offset == 0 && exp_type
> 	  && useless_type_conversion_p (exp_type, type))
> 	return true;
> 
>       switch (TREE_CODE (type))
> 	{
> 	case UNION_TYPE:
> 	case QUAL_UNION_TYPE:
> 	case RECORD_TYPE:
> 	  /* Some Ada records are half-unions, treat all of them the same.  */
> 	  for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
> 	    {
> 	      HOST_WIDE_INT pos, size;
> 	      tree expr, *expr_ptr;
> 
> 	      if (TREE_CODE (fld) != FIELD_DECL)
> 		continue;
> 
> 	      pos = int_bit_position (fld);
> 	      gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0);
> 	      size = tree_low_cst (DECL_SIZE (fld), 1);
> 	      if (pos > offset || (pos + size) <= offset)
> 		continue;
> 
> 	      if (res)
> 		{
> 		  expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld,
> 				 NULL_TREE);
> 		  expr_ptr = &expr;
> 		}
> 	      else
> 		expr_ptr = NULL;
> 	      if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld),
> 					  offset - pos, exp_type))
> 		{
> 		  if (res)
> 		    *res = expr;
> 		  return true;
> 		}
> 	    }
> 	  return false;
> 
> 	case ARRAY_TYPE:
> 	  tr_size = TYPE_SIZE (TREE_TYPE (type));
> 	  if (!tr_size || !host_integerp (tr_size, 1))
> 	    return false;
> 	  el_size = tree_low_cst (tr_size, 1);
> 
> 	  index = build_int_cst (TYPE_DOMAIN (type), offset / el_size);
> 	  if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> 	    index = int_const_binop (PLUS_EXPR, index,
> 				     TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0);
> 	  if (res)
> 	    *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE,
> 			   NULL_TREE);
> 	  offset = offset % el_size;
> 	  type = TREE_TYPE (type);
> 	  break;
> 
> 	default:
> 	  if (offset != 0)
> 	    return false;
> 
> 	  if (exp_type)
> 	    return false;
> 	  else
> 	    return true;
> 	}
>     }
> }
> 
> /* Construct an expression that would reference a part of aggregate *EXPR of
>    type TYPE at the given OFFSET of the type EXP_TYPE.  If EXPR is NULL, the
>    function only determines whether it can build such a reference without
>    actually doing it.
> 
>    FIXME: Eventually this should be replaced with
>    maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a
>    minor rewrite of fold_stmt.
>  */
> 
> static bool
> build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset,
> 		      tree exp_type, bool allow_ptr)
> {
>   if (allow_ptr && POINTER_TYPE_P (type))
>     {
>       type = TREE_TYPE (type);
>       if (expr)
> 	*expr = fold_build1 (INDIRECT_REF, type, *expr);
>     }
> 
>   return build_ref_for_offset_1 (expr, type, offset, exp_type);
> }
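
To check my understanding on a made-up example: given the type
struct A { struct B { int i; float f; } b; }, OFFSET 32 and EXP_TYPE
float, this wraps *EXPR in the COMPONENT_REFs for .b.f?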
> 
> /* The very first phase of intraprocedural SRA.  It marks in candidate_bitmap
>    those variables whose type is suitable for scalarization.  */
> 
> static bool
> find_var_candidates (void)
> {
>   tree var, type;
>   referenced_var_iterator rvi;
>   bool ret = false;
> 
>   FOR_EACH_REFERENCED_VAR (var, rvi)
>     {
>       if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL)
>         continue;
>       type = TREE_TYPE (var);
> 
>       if (!AGGREGATE_TYPE_P (type)
> 	  || needs_to_live_in_memory (var)

Ok, here's the TREE_ADDRESSABLE check.  I'm finally convinced that
disqualify_all should go ;)

> 	  || TREE_THIS_VOLATILE (var)
> 	  || !COMPLETE_TYPE_P (type)
> 	  || !host_integerp (TYPE_SIZE (type), 1)
>           || tree_low_cst (TYPE_SIZE (type), 1) == 0
> 	  || type_internals_preclude_sra_p (type))
> 	continue;
> 
>       bitmap_set_bit (candidate_bitmap, DECL_UID (var));
> 
>       if (dump_file)

&& (dump_flags & TDF_DETAILS)

> 	{
> 	  fprintf (dump_file, "Candidate (%d): ", DECL_UID (var));
> 	  print_generic_expr (dump_file, var, 0);
> 	  fprintf (dump_file, "\n");
> 	}
>       ret = true;
>     }
> 
>   return ret;
> }
> 
> /* Return true if TYPE should be considered a scalar type by SRA.  */
> 
> static bool
> is_sra_scalar_type (tree type)
> {
>   enum tree_code code = TREE_CODE (type);
>   return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)
> 	  || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type)
> 	  || code == VECTOR_TYPE || code == COMPLEX_TYPE
> 	  || code == OFFSET_TYPE);
> }

Why is this anything different from is_gimple_reg_type ()?

> /* Sort all accesses for the given variable, check for partial overlaps and
>    return NULL if there are any.  If there are none, pick a representative for
>    each combination of offset and size and create a linked list out of them.
>    Return the pointer to the first representative and make sure it is the first
>    one in the vector of accesses.  */
> 
> static struct access *
> sort_and_splice_var_accesses (tree var)
> {
>   int i, j, access_count;
>   struct access *res, **prev_acc_ptr = &res;
>   VEC (access_p, heap) *access_vec;
>   bool first = true;
>   HOST_WIDE_INT low = -1, high = 0;
> 
>   access_vec = get_base_access_vector (var);
>   if (!access_vec)
>     return NULL;
>   access_count = VEC_length (access_p, access_vec);
> 
>   /* Sort by <OFFSET, SIZE>.  */
>   qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p),
> 	 compare_access_positions);
> 
>   i = 0;
>   while (i < access_count)
>     {
>       struct access *access = VEC_index (access_p, access_vec, i);
>       bool modification = access->write;
>       bool grp_read = !access->write;
>       bool grp_bfr_lhs = access->grp_bfr_lhs;
>       bool first_scalar = is_sra_scalar_type (access->type);
>       bool unscalarizable_region = access->grp_unscalarizable_region;
> 
>       if (first || access->offset >= high)
> 	{
> 	  first = false;
> 	  low = access->offset;
> 	  high = access->offset + access->size;
> 	}
>       else if (access->offset > low && access->offset + access->size > high)
> 	return NULL;
>       else
> 	gcc_assert (access->offset >= low
> 		    && access->offset + access->size <= high);
> 
>       j = i + 1;
>       while (j < access_count)
> 	{
> 	  struct access *ac2 = VEC_index (access_p, access_vec, j);
> 	  if (ac2->offset != access->offset || ac2->size != access->size)
> 	    break;
> 	  modification |= ac2->write;
> 	  grp_read |= !ac2->write;
> 	  grp_bfr_lhs |= ac2->grp_bfr_lhs;
> 	  unscalarizable_region |= ac2->grp_unscalarizable_region;
> 	  relink_to_new_repr (access, ac2);
> 
> 	  /* If one of the equivalent accesses is scalar, use it as a
> 	     representative (this happens, for example, when there is only a
> 	     single scalar field in a structure).  */
> 	  if (!first_scalar && is_sra_scalar_type (ac2->type))
> 	    {
> 	      struct access tmp_acc;
> 	      first_scalar = true;
> 
> 	      memcpy (&tmp_acc, ac2, sizeof (struct access));
> 	      memcpy (ac2, access,  sizeof (struct access));
> 	      memcpy (access, &tmp_acc, sizeof (struct access));
> 	    }
> 	  ac2->group_representative = access;
> 	  j++;
> 	}
> 
>       i = j;
> 
>       access->group_representative = access;
>       access->grp_write = modification;
>       access->grp_read = grp_read;
>       access->grp_bfr_lhs = grp_bfr_lhs;
>       access->grp_unscalarizable_region = unscalarizable_region;
>       if (access->first_link)
> 	add_access_to_work_queue (access);
> 
>       *prev_acc_ptr = access;
>       prev_acc_ptr = &access->next_grp;
>     }
> 
>   gcc_assert (res == VEC_index (access_p, access_vec, 0));
>   return res;
> }
> 
> /* Create a variable for the given ACCESS which determines the type, name and a
>    few other properties.  Return the variable declaration (the caller stores it
>    in ACCESS->replacement_decl).  */
> 
> static tree
> create_access_replacement (struct access *access)
> {
>   tree repl;
> 
>   repl = make_rename_temp (access->type, "SR");
>   get_var_ann (repl);
>   add_referenced_var (repl);
> 
>   DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base);
>   DECL_ARTIFICIAL (repl) = 1;
> 
>   if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base))

at least && !DECL_ARTIFICIAL (access->base) I think.
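
i.e. something like

  if (DECL_NAME (access->base)
      && !DECL_IGNORED_P (access->base)
      && !DECL_ARTIFICIAL (access->base))

so that replacements of compiler-generated temporaries do not get
fancy names and debug information.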

>     {
>       char *pretty_name = make_fancy_name (access->expr);
> 
>       DECL_NAME (repl) = get_identifier (pretty_name);
>       obstack_free (&name_obstack, pretty_name);
> 
>       SET_DECL_DEBUG_EXPR (repl, access->expr);
>       DECL_DEBUG_EXPR_IS_FROM (repl) = 1;
>       DECL_IGNORED_P (repl) = 0;
>       TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base);
>     }
>   else
>     {
>       DECL_IGNORED_P (repl) = 1;
>       TREE_NO_WARNING (repl) = 1;
>     }

So just copy DECL_IGNORED_P and TREE_NO_WARNING from access->base
unconditionally?

>   if (access->grp_bfr_lhs)
>     DECL_GIMPLE_REG_P (repl) = 0;

But you never set it (see update_address_taken for more cases,
most notably VIEW_CONVERT_EXPR on the lhs which need to be taken
care of).  You should set it for COMPLEX_TYPE and VECTOR_TYPE 
replacements.
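
Something like (untested):

  if (TREE_CODE (access->type) == COMPLEX_TYPE
      || TREE_CODE (access->type) == VECTOR_TYPE)
    DECL_GIMPLE_REG_P (repl) = 1;

and only clear it again for the grp_bfr_lhs case.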

>   if (dump_file)
>     {
>       fprintf (dump_file, "Created a replacement for ");
>       print_generic_expr (dump_file, access->base, 0);
>       fprintf (dump_file, " offset: %u, size: %u: ",
> 	       (unsigned) access->offset, (unsigned) access->size);
>       print_generic_expr (dump_file, repl, 0);
>       fprintf (dump_file, "\n");
>     }
> 
>   return repl;
> }
> 
> /* Return the scalar replacement of ACCESS, creating it if it does not exist
>    yet.  */
> 
> static inline tree
> get_access_replacement (struct access *access)
> {
>   gcc_assert (access->grp_to_be_replaced);
> 
>   if (access->replacement_decl)
>     return access->replacement_decl;
> 
>   access->replacement_decl = create_access_replacement (access);
>   return access->replacement_decl;
> }
> 
> /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
>    linked list along the way.  Stop when *ACCESS is NULL or the access pointed
>    to by it is not "within" the root.  */
> 
> static void
> build_access_subtree (struct access **access)
> {
>   struct access *root = *access, *last_child = NULL;
>   HOST_WIDE_INT limit = root->offset + root->size;
> 
>   *access = (*access)->next_grp;
>   while  (*access && (*access)->offset + (*access)->size <= limit)
>     {
>       if (!last_child)
> 	root->first_child = *access;
>       else
> 	last_child->next_sibling = *access;
>       last_child = *access;
> 
>       build_access_subtree (access);
>     }
> }
> 
> /* Build a tree of access representatives.  ACCESS is the pointer to the first
>    one, others are linked in a list by the next_grp field.  */
> 
> static void
> build_access_trees (struct access *access)
> {
>   while (access)
>     {
>       struct access *root = access;
> 
>       build_access_subtree (&access);
>       root->next_grp = access;
>     }
> }
> 
> /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
>    that seems beneficial and when ALLOW_REPLACEMENTS allows it.  Also set all
>    sorts of access flags appropriately along the way, notably always set
>    grp_read when MARK_READ is true and grp_write when MARK_WRITE is true.  */
> 
> static bool
> analyze_access_subtree (struct access *root, bool allow_replacements,
> 			bool mark_read, bool mark_write)
> {
>   struct access *child;
>   HOST_WIDE_INT limit = root->offset + root->size;
>   HOST_WIDE_INT covered_to = root->offset;
>   bool scalar = is_sra_scalar_type (root->type);
>   bool hole = false, sth_created = false;
> 
>   if (mark_read)
>     root->grp_read = true;
>   else if (root->grp_read)
>     mark_read = true;
> 
>   if (mark_write)
>     root->grp_write = true;
>   else if (root->grp_write)
>     mark_write = true;
> 
>   if (root->grp_unscalarizable_region)
>     allow_replacements = false;
> 
>   for (child = root->first_child; child; child = child->next_sibling)
>     {
>       if (!hole && child->offset < covered_to)
> 	hole = true;
>       else
> 	covered_to += child->size;
> 
>       sth_created |= analyze_access_subtree (child,
> 					     allow_replacements && !scalar,
> 					     mark_read, mark_write);
> 
>       root->grp_unscalarized_data |= child->grp_unscalarized_data;
>       hole |= !child->grp_covered;
>     }
> 
>   if (allow_replacements && scalar && !root->first_child)
>     {
>       if (dump_file)

&& (dump_flags & TDF_DETAILS)

> 	{
> 	  fprintf (dump_file, "Marking ");
> 	  print_generic_expr (dump_file, root->base, 0);
> 	  fprintf (dump_file, " offset: %u, size: %u: ",
> 		   (unsigned) root->offset, (unsigned) root->size);
> 	  fprintf (dump_file, " to be replaced.\n");
> 	}
> 
>       root->grp_to_be_replaced = 1;
>       sth_created = true;
>       hole = false;
>     }
>   else if (covered_to < limit)
>     hole = true;
> 
>   if (sth_created && !hole)
>     {
>       root->grp_covered = 1;
>       return true;
>     }
>   if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
>     root->grp_unscalarized_data = 1; /* not covered and written to */
>   if (sth_created)
>     return true;
>   return false;
> }
> 
> /* Analyze all access trees linked by next_grp by means of
>    analyze_access_subtree.  */
> static bool
> analyze_access_trees (struct access *access)
> {
>   bool ret = false;
> 
>   while (access)
>     {
>       if (analyze_access_subtree (access, true, false, false))
> 	ret = true;
>       access = access->next_grp;
>     }
> 
>   return ret;
> }
> 
> /* Return true iff a potential new child of LACC at offset NORM_OFFSET and with
>    size SIZE would conflict with an already existing one.  If exactly such a
>    child already exists in LACC, store a pointer to it in EXACT_MATCH.  */
> 
> static bool
> child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset,
> 			      HOST_WIDE_INT size, struct access **exact_match)
> {
>   struct access *child;
> 
>   for (child = lacc->first_child; child; child = child->next_sibling)
>     {
>       if (child->offset == norm_offset && child->size == size)
> 	{
> 	  *exact_match = child;
> 	  return true;
> 	}
> 
>       if (child->offset < norm_offset + size
> 	  && child->offset + child->size > norm_offset)
> 	return true;
>     }
> 
>   return false;
> }
> 
> /* Set the expr of TARGET to one just like MODEL but with its own base at the
>    bottom of the handled components.  */
> 
> static void
> duplicate_expr_for_different_base (struct access *target,
> 				   struct access *model)
> {
>   tree t, expr = unshare_expr (model->expr);
> 
>   gcc_assert (handled_component_p (expr));
>   t = expr;
>   while (handled_component_p (TREE_OPERAND (t, 0)))
>     t = TREE_OPERAND (t, 0);
>   gcc_assert (TREE_OPERAND (t, 0) == model->base);
>   TREE_OPERAND (t, 0) = target->base;
> 
>   target->expr = expr;
> }
> 
> 
> /* Create a new child access of PARENT, with all properties just like MODEL
>    except for its offset and with its grp_write false and grp_read true.
>    Return the new access.  Note that this access is created long after all
>    splicing and sorting; it's not located in any access vector and is
>    automatically a representative of its group.  */
> 
> static struct access *
> create_artificial_child_access (struct access *parent, struct access *model,
> 				HOST_WIDE_INT new_offset)
> {
>   struct access *access;
>   struct access **child;
> 
>   gcc_assert (!model->grp_unscalarizable_region);
> 
>   access = (struct access *) pool_alloc (access_pool);
>   memset (access, 0, sizeof (struct access));
>   access->base = parent->base;
>   access->offset = new_offset;
>   access->size = model->size;
>   duplicate_expr_for_different_base (access, model);
>   access->type = model->type;
>   access->grp_write = true;
>   access->grp_read = false;
> 
>   child = &parent->first_child;
>   while (*child && (*child)->offset < new_offset)
>     child = &(*child)->next_sibling;
> 
>   access->next_sibling = *child;
>   *child = access;
> 
>   return access;
> }
> 
> 
> /* Propagate all subaccesses of RACC across an assignment link to LACC. Return
>    true if any new subaccess was created.  Additionally, if RACC is a scalar
>    access but LACC is not, change the type of the latter.  */
> 
> static bool
> propagate_subacesses_accross_link (struct access *lacc, struct access *racc)
> {
>   struct access *rchild;
>   HOST_WIDE_INT norm_delta = lacc->offset - racc->offset;
>   bool ret = false;
> 
>   if (is_sra_scalar_type (lacc->type)
>       || lacc->grp_unscalarizable_region
>       || racc->grp_unscalarizable_region)
>     return false;
> 
>   if (!lacc->first_child && !racc->first_child
>       && is_sra_scalar_type (racc->type)
>       && (sra_mode == SRA_MODE_INTRA
>           || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base))))
>     {
>       duplicate_expr_for_different_base (lacc, racc);
>       lacc->type = racc->type;
>       return false;
>     }
> 
>   gcc_assert (lacc->size <= racc->size);
> 
>   for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling)
>     {
>       struct access *new_acc = NULL;
>       HOST_WIDE_INT norm_offset = rchild->offset + norm_delta;
> 
>       if (rchild->grp_unscalarizable_region)
> 	continue;
> 
>       if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size,
> 					&new_acc))
> 	{
> 	  if (new_acc && rchild->first_child)
> 	    ret |= propagate_subacesses_accross_link (new_acc, rchild);
> 	  continue;
> 	}
> 
>       new_acc = create_artificial_child_access (lacc, rchild, norm_offset);
>       if (racc->first_child)
> 	propagate_subacesses_accross_link (new_acc, rchild);
> 
>       ret = true;
>     }
> 
>   return ret;
> }
> 
> /* Propagate all subaccesses across assignment links.  */
> 
> static void
> propagate_all_subaccesses (void)
> {
>   while (work_queue_head)
>     {
>       struct access *racc = pop_access_from_work_queue ();
>       struct assign_link *link;
> 
>       gcc_assert (racc->first_link);
> 
>       for (link = racc->first_link; link; link = link->next)
> 	{
> 	  struct access *lacc = link->lacc;
> 
> 	  if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base)))
> 	    continue;
> 	  lacc = lacc->group_representative;
> 	  if (propagate_subacesses_accross_link (lacc, racc)
> 	      && lacc->first_link)
> 	    add_access_to_work_queue (lacc);
> 	}
>     }
> }
> 
> /* Go through all accesses collected throughout the (intraprocedural) analysis
>    stage, exclude overlapping ones, identify representatives and build trees
>    out of them, making decisions about scalarization on the way.  Return true
>    iff there are any to-be-scalarized variables after this stage. */
> 
> static bool
> analyze_all_variable_accesses (void)
> {
>   tree var;
>   referenced_var_iterator rvi;
>   bool res = false;
> 
>   FOR_EACH_REFERENCED_VAR (var, rvi)
>     if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
>       {
> 	struct access *access;
> 
> 	access = sort_and_splice_var_accesses (var);
> 	if (access)
> 	  build_access_trees (access);
> 	else
> 	  disqualify_candidate (var,
> 				"No or inhibitingly overlapping accesses.");
>       }
> 
>   propagate_all_subaccesses ();
> 
>   FOR_EACH_REFERENCED_VAR (var, rvi)
>     if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
>       {
> 	struct access *access = get_first_repr_for_decl (var);
> 
> 	if (analyze_access_trees (access))
> 	  {
> 	    res = true;
> 	    if (dump_file)

&& (dump_flags & TDF_DETAILS)

> 	      {
> 		fprintf (dump_file, "\nAccess trees for ");
> 		print_generic_expr (dump_file, var, 0);
> 		fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
> 		dump_access_tree (dump_file, access);
> 		fprintf (dump_file, "\n");
> 	      }
> 	  }
> 	else
> 	  disqualify_candidate (var, "No scalar replacements to be created.");
>       }
> 
>   return res;
> }
> 
> /* Return true iff a reference expression into aggregate AGG can be built for
>    every single to-be-replaced access that is a child of ACCESS, its sibling
>    or a child of its sibling.  TOP_OFFSET is the offset from the processed
>    access subtree that has to be subtracted from the offset of each access.  */
> 
> static bool
> ref_expr_for_all_replacements_p (struct access *access, tree agg,
> 				 HOST_WIDE_INT top_offset)
> {
>   do
>     {
>       if (access->grp_to_be_replaced
> 	  && !build_ref_for_offset (NULL, TREE_TYPE (agg),
> 				    access->offset - top_offset,
> 				    access->type, false))
> 	return false;
> 
>       if (access->first_child
> 	  && !ref_expr_for_all_replacements_p (access->first_child, agg,
> 					       top_offset))
> 	return false;
> 
>       access = access->next_sibling;
>     }
>   while (access);
> 
>   return true;
> }
> 
> 
> /* Generate statements copying scalar replacements of accesses within a subtree
>    into or out of AGG.  ACCESS is the first child of the root of the subtree to
>    be processed.  AGG is an aggregate type expression (it can be a declaration
>    but does not have to be; it can for example also be an INDIRECT_REF).
>    TOP_OFFSET is the offset of the processed subtree which has to be subtracted
>    from offsets of individual accesses to get corresponding offsets for AGG.
>    If CHUNK_SIZE is non-zero, copy only replacements in the interval
>    <start_offset, start_offset + chunk_size>, otherwise copy all.  GSI is a
>    statement iterator used to place the new statements.  WRITE should be true
>    when the statements should write from AGG to the replacement and false if
>    vice versa.  If INSERT_AFTER is true, new statements will be added after the
>    current statement in GSI; they will be added before the statement
>    otherwise.  */
> 
> static void
> generate_subtree_copies (struct access *access, tree agg,
> 			 HOST_WIDE_INT top_offset,
> 			 HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size,
> 			 gimple_stmt_iterator *gsi, bool write,
> 			 bool insert_after)
> {
>   do
>     {
>       tree expr = unshare_expr (agg);
> 
>       if (chunk_size && access->offset >= start_offset + chunk_size)
> 	return;
> 
>       if (access->grp_to_be_replaced
> 	  && (chunk_size == 0
> 	      || access->offset + access->size > start_offset))
> 	{
> 	  bool repl_found;
> 	  gimple stmt;
> 
> 	  repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg),
> 					     access->offset - top_offset,
> 					     access->type, false);
> 	  gcc_assert (repl_found);
> 
> 	  if (write)
> 	    stmt = gimple_build_assign (get_access_replacement (access), expr);
> 	  else
> 	    {
> 	      tree repl = get_access_replacement (access);
> 	      TREE_NO_WARNING (repl) = 1;
> 	      stmt = gimple_build_assign (expr, repl);
> 	    }
> 
> 	  if (insert_after)
> 	    gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> 	  else
> 	    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> 	  update_stmt (stmt);
> 	}
> 
>       if (access->first_child)
> 	generate_subtree_copies (access->first_child, agg, top_offset,
> 				 start_offset, chunk_size, gsi,
> 				 write, insert_after);
> 
>       access = access->next_sibling;
>     }
>   while (access);
> }
> 
> /* Assign zero to all scalar replacements in an access subtree.  ACCESS is
>    the root of the subtree to be processed.  GSI is the statement iterator used
>    for inserting statements which are added after the current statement if
>    INSERT_AFTER is true or before it otherwise.  */
> 
> static void
> init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi,
> 			bool insert_after)
> 
> {
>   struct access *child;
> 
>   if (access->grp_to_be_replaced)
>     {
>       gimple stmt;
> 
>       stmt = gimple_build_assign (get_access_replacement (access),
> 				  fold_convert (access->type,
> 						integer_zero_node));
>       if (insert_after)
> 	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
>       else
> 	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
>       update_stmt (stmt);
>     }
> 
>   for (child = access->first_child; child; child = child->next_sibling)
>     init_subtree_with_zero (child, gsi, insert_after);
> }
> 
> /* Search for an access representative for the given expression EXPR and
>    return it or NULL if it cannot be found.  */
> 
> static struct access *
> get_access_for_expr (tree expr)
> {
>   HOST_WIDE_INT offset, size, max_size;
>   tree base;
> 
>   if (TREE_CODE (expr) == NOP_EXPR

Use CONVERT_EXPR_P (expr) here instead of checking for NOP_EXPR.

>       || TREE_CODE (expr) == VIEW_CONVERT_EXPR)

VIEW_CONVERT_EXPR is also a handled_component_p.

Note that NOP_EXPR should never occur here - that would be invalid
gimple.  So I think you can (and should) just delete the above.

>     expr = TREE_OPERAND (expr, 0);
> 
>   if (handled_component_p (expr))
>     {
>       base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
>       size = max_size;
>       if (size == -1 || !base || !DECL_P (base))
> 	return NULL;
>     }
>   else if (DECL_P (expr))
>     {
>       tree tree_size;
> 
>       base = expr;
>       tree_size = TYPE_SIZE (TREE_TYPE (base));
>       if (tree_size && host_integerp (tree_size, 1))
> 	size = max_size = tree_low_cst (tree_size, 1);
>       else
> 	return NULL;
> 
>       offset = 0;

See above.  get_ref_base_and_extent handles plain DECLs just fine.
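
For illustration, here is a rough sketch (untested) of how the whole
lookup could collapse, taking this and the CONVERT_EXPR_P comment above
together and keeping the names from the patch:

static struct access *
get_access_for_expr (tree expr)
{
  HOST_WIDE_INT offset, size, max_size;
  tree base;

  base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
  if (max_size == -1 || !base || !DECL_P (base))
    return NULL;

  if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
    return NULL;

  return get_var_base_offset_size_access (base, offset, max_size);
}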

>     }
>   else
>     return NULL;
> 
>   if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>     return NULL;
> 
>   return get_var_base_offset_size_access (base, offset, size);
> }
> 
> /* Substitute the value of the replacement of ACCESS into *EXPR, an expression
>    of type TYPE.  This is done either by producing a special V_C_E assignment
>    statement that converts the replacement to a new temporary of the requested
>    type (if TYPE is not TREE_ADDRESSABLE) or by going through the base
>    aggregate (if it is).  */
> 
> static void
> fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access,
> 				 gimple_stmt_iterator *gsi, bool write)
> {
>   tree repl = get_access_replacement (access);
>   if (!TREE_ADDRESSABLE (type))
>     {
>       tree tmp = create_tmp_var (type, "SRvce");
> 
>       add_referenced_var (tmp);
>       if (is_gimple_reg_type (type))
> 	tmp = make_ssa_name (tmp, NULL);

This should always be is_gimple_reg_type () if it is a type suitable for
an SRA scalar replacement.  But you should set DECL_GIMPLE_REG_P for
VECTOR and COMPLEX types here.
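
I.e. something like this (a sketch only, reusing the names from the
patch; the two TREE_CODE checks are my assumption of what is wanted):

      tree tmp = create_tmp_var (type, "SRvce");

      add_referenced_var (tmp);
      if (TREE_CODE (type) == COMPLEX_TYPE
	  || TREE_CODE (type) == VECTOR_TYPE)
	DECL_GIMPLE_REG_P (tmp) = 1;
      tmp = make_ssa_name (tmp, NULL);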

>       if (write)
> 	{
> 	  gimple stmt;
> 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp);

This needs to either always fold to plain 'tmp' or tmp has to be a
non-register.  Otherwise you will create invalid gimple.

> 	  *expr = tmp;
> 	  if (is_gimple_reg_type (type))
> 	    SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi);

See above.

> 	  stmt = gimple_build_assign (repl, conv);
> 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> 	  update_stmt (stmt);
> 	}
>       else
> 	{
> 	  gimple stmt;
> 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl);
> 
> 	  stmt = gimple_build_assign (tmp, conv);
> 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> 	  if (is_gimple_reg_type (type))
> 	    SSA_NAME_DEF_STMT (tmp) = stmt;

See above.  (I wonder if the patch still passes bootstrap & regtest
after the typechecking patch)

> 	  *expr = tmp;
> 	  update_stmt (stmt);
> 	}
>     }
>   else
>     {
>       if (write)
> 	{
> 	  gimple stmt;
> 
> 	  stmt = gimple_build_assign (repl, unshare_expr (access->expr));
> 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> 	  update_stmt (stmt);
> 	}
>       else
> 	{
> 	  gimple stmt;
> 
> 	  stmt = gimple_build_assign (unshare_expr (access->expr), repl);
> 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> 	  update_stmt (stmt);
> 	}

I don't understand this path.  Are the types here always compatible?

>     }
> }
> 
> 
> /* Callback for scan_function.  Replace the expression EXPR with a scalar
>    replacement if there is one and generate other statements to do type
>    conversion or subtree copying if necessary.  GSI is used to place newly
>    created statements, WRITE is true if the expression is being written to (it
>    is on the LHS of a statement or an output in an assembly statement).  */
> 
> static bool
> sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write,
> 		 void *data ATTRIBUTE_UNUSED)
> {
>   struct access *access;
>   tree type, bfr;
> 
>   if (TREE_CODE (*expr) == BIT_FIELD_REF)
>     {
>       bfr = *expr;
>       expr = &TREE_OPERAND (*expr, 0);
>     }
>   else
>     bfr = NULL_TREE;
> 
>   if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR)
>     expr = &TREE_OPERAND (*expr, 0);

Why strip these early?  I think this is wrong (or do you always want to
produce complex type replacements, even if only the real or imaginary part
is used?  If so a strategic comment somewhere is missing.)
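
For example (a made-up testcase), in

float
f (_Complex float c)
{
  return __real c;
}

stripping the REALPART_EXPR up front means the whole complex object
gets looked up and replaced even though only its real part is used.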

>   type = TREE_TYPE (*expr);
> 
>   access = get_access_for_expr (*expr);
>   if (!access)
>     return false;
> 
>   if (access->grp_to_be_replaced)
>     {
>       if (!useless_type_conversion_p (type, access->type))
> 	fix_incompatible_types_for_expr (expr, type, access, gsi, write);
>       else
> 	*expr = get_access_replacement (access);
>     }
> 
>   if (access->first_child)
>     {
>       HOST_WIDE_INT start_offset, chunk_size;
>       if (bfr
> 	  && host_integerp (TREE_OPERAND (bfr, 1), 1)
> 	  && host_integerp (TREE_OPERAND (bfr, 2), 1))
> 	{
> 	  start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1);
> 	  chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1);
> 	}
>       else
> 	start_offset = chunk_size = 0;
> 
>       generate_subtree_copies (access->first_child, access->base, 0,
> 			       start_offset, chunk_size, gsi, write, write);
>     }
>   return true;
> }
> 
> /* Store all replacements in the access tree rooted in TOP_RACC either to
>    their base aggregate (if there is unscalarized data) or directly to LHS
>    otherwise.  */
> 
> static void
> handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs,
> 				     gimple_stmt_iterator *gsi)
> {
>   if (top_racc->grp_unscalarized_data)
>     generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0,
> 			     gsi, false, false);
>   else
>     generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset,
> 			     0, 0, gsi, false, false);
> }
> 
> 
> /* Try to generate statements to load all sub-replacements in an access
>    (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC
>    (sub)tree.  If that is not possible, refresh the TOP_RACC base aggregate and
>    load the accesses from it.  LEFT_OFFSET is the offset of the whole left
>    subtree being copied, RIGHT_OFFSET is the same thing for the right subtree.
>    GSI is a stmt iterator used for statement insertions.  *REFRESHED is true
>    iff the rhs top aggregate has already been refreshed with the contents of
>    its scalar reductions and is set to true if this function has to do it.  */
> 
> static void
> load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc,
> 				 HOST_WIDE_INT left_offset,
> 				 HOST_WIDE_INT right_offset,
> 				 gimple_stmt_iterator *old_gsi,
> 				 gimple_stmt_iterator *new_gsi,
> 				 bool *refreshed, tree lhs)
> {
>   do
>     {
>       if (lacc->grp_to_be_replaced)
> 	{
> 	  struct access *racc;
> 	  HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset;
> 
> 	  racc = find_access_in_subtree (top_racc, offset, lacc->size);
> 	  if (racc && racc->grp_to_be_replaced)
> 	    {
> 	      gimple stmt;
> 
> 	      if (useless_type_conversion_p (lacc->type, racc->type))
> 		stmt = gimple_build_assign (get_access_replacement (lacc),
> 					    get_access_replacement (racc));
> 	      else
> 		{
> 		  tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type,
> 					  get_access_replacement (racc));
> 		  stmt = gimple_build_assign (get_access_replacement (lacc),
> 					      rhs);
> 		}
> 
> 	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
> 	      update_stmt (stmt);
> 	    }
> 	  else
> 	    {
> 	      tree expr = unshare_expr (top_racc->base);
> 	      bool repl_found;
> 	      gimple stmt;
> 
> 	      /* No suitable access on the right hand side, need to load from
> 		 the aggregate.  See if we have to update it first... */
> 	      if (!*refreshed)
> 		{
> 		  gcc_assert (top_racc->first_child);
> 		  handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
> 		  *refreshed = true;
> 		}
> 
> 	      repl_found = build_ref_for_offset (&expr,
> 						 TREE_TYPE (top_racc->base),
> 						 lacc->offset - left_offset,
> 						 lacc->type, false);
> 	      gcc_assert (repl_found);
> 	      stmt = gimple_build_assign (get_access_replacement (lacc),
> 					  expr);
> 	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
> 	      update_stmt (stmt);
> 	    }
> 	}
>       else if (lacc->grp_read && !lacc->grp_covered && !*refreshed)
> 	{
> 	  handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
> 	  *refreshed = true;
> 	}
> 
>       if (lacc->first_child)
> 	load_assign_lhs_subreplacements (lacc->first_child, top_racc,
> 					 left_offset, right_offset,
> 					 old_gsi, new_gsi, refreshed, lhs);
>       lacc = lacc->next_sibling;
>     }
>   while (lacc);
> }
> 
> /* Return true iff ACC is non-NULL and has subaccesses.  */
> 
> static inline bool
> access_has_children_p (struct access *acc)
> {
>   return acc && acc->first_child;
> }
> 
> /* Modify assignments with a CONSTRUCTOR on their RHS.  STMT contains a pointer
>    to the assignment and GSI is the statement iterator pointing at it.  Returns
>    the same values as sra_modify_assign.  */
> 
> static enum scan_assign_result
> sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi)
> {
>   tree lhs = gimple_assign_lhs (*stmt);
>   struct access *acc;
> 
>   gcc_assert (TREE_CODE (lhs) != REALPART_EXPR
> 	      && TREE_CODE (lhs) != IMAGPART_EXPR);
>   acc = get_access_for_expr (lhs);
>   if (!acc)
>     return SRA_SA_NONE;
> 
>   if (VEC_length (constructor_elt,
> 		  CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0)
>     {
>       /* I have never seen this code path trigger but if it can happen the
> 	 following should handle it gracefully.  */

It can trigger for vector constants.
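
For instance (a hypothetical testcase), a vector initialized from
non-constant elements reaches GIMPLE as a non-empty CONSTRUCTOR on the
RHS:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
f (int a, int b)
{
  v4si v = { a, b, a, b };
  return v;
}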

>       if (access_has_children_p (acc))
> 	generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi,
> 				 true, true);
>       return SRA_SA_PROCESSED;
>     }
> 
>   if (acc->grp_covered)
>     {
>       init_subtree_with_zero (acc, gsi, false);
>       unlink_stmt_vdef (*stmt);
>       gsi_remove (gsi, true);
>       return SRA_SA_REMOVED;
>     }
>   else
>     {
>       init_subtree_with_zero (acc, gsi, true);
>       return SRA_SA_PROCESSED;
>     }
> }
> 
> 
> /* Modify statements that have an IMAGPART_EXPR or REALPART_EXPR of a
>    to-be-scalarized aggregate on their lhs.  STMT is the statement and GSI is
>    the iterator used to place new helper statements.  Returns the same values
>    as sra_modify_assign.  */
> 
> static enum scan_assign_result
> sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi)
> {
>   tree lhs, complex, ptype, rp, ip;
>   struct access *access;
>   gimple new_stmt, aux_stmt;
> 
>   lhs = gimple_assign_lhs (stmt);
>   complex = TREE_OPERAND (lhs, 0);
> 
>   access = get_access_for_expr (complex);
> 
>   if (!access || !access->grp_to_be_replaced)
>     return SRA_SA_NONE;
> 
>   ptype = TREE_TYPE (TREE_TYPE (complex));
>   rp = create_tmp_var (ptype, "SRr");
>   add_referenced_var (rp);
>   rp = make_ssa_name (rp, NULL);
> 
>   ip = create_tmp_var (ptype, "SRp");
>   add_referenced_var (ip);
>   ip = make_ssa_name (ip, NULL);
> 
>   if (TREE_CODE (lhs) == IMAGPART_EXPR)
>     {
>       aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype,
> 					     get_access_replacement (access)));
>       SSA_NAME_DEF_STMT (rp) = aux_stmt;
>       gimple_assign_set_lhs (stmt, ip);
>       SSA_NAME_DEF_STMT (ip) = stmt;
>     }
>   else
>     {
>       aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype,
> 					     get_access_replacement (access)));
>       SSA_NAME_DEF_STMT (ip) = aux_stmt;
>       gimple_assign_set_lhs (stmt, rp);
>       SSA_NAME_DEF_STMT (rp) = stmt;
>     }
> 
>   gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT);
>   update_stmt (aux_stmt);
>   new_stmt = gimple_build_assign (get_access_replacement (access),
> 				  fold_build2 (COMPLEX_EXPR, access->type,
> 					       rp, ip));
>   gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
>   update_stmt (new_stmt);

Hm.  So you do what complex lowering does here.  Note that this may
create loads from uninitialized memory with all its problems.
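
For example (a hypothetical snippet), given

  _Complex float z;
  __real z = x;

the auxiliary statement built here loads __imag z even though nothing
was ever stored to it.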

WRT the complex stuff.  If you did scalarization and analysis
just on the components (not special-casing REAL/IMAGPART_EXPR everywhere)
it should work better, correct?  You could still handle group
scalarization for the case of, for example, passing a complex argument
to a function.

void bar(_Complex float);
void foo(float x, float y)
{
  _Complex float z = x;
  __imag z = y;
  bar(z);
}

The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent
there is BIT_FIELD_REF.
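
For illustration (hand-written, not actual pass output), component-wise
scalarization of foo above might leave GIMPLE roughly like

  z$real_1 = x_2(D);
  z$imag_3 = y_4(D);
  z_6 = COMPLEX_EXPR <z$real_1, z$imag_3>;
  bar (z_6);

with z$real/z$imag as the scalar replacements and the aggregate only
re-built at the point where it is needed as a whole.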

>   return SRA_SA_PROCESSED;
> }
> 
> /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components.  */
> 
> static bool
> contains_view_convert_expr_p (tree t)
> {
>   while (1)
>     {
>       if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
> 	return true;
>       if (!handled_component_p (t))
> 	return false;
>       t = TREE_OPERAND (t, 0);
>     }
> }

Place this in tree-flow-inline.h next to ref_contains_array_ref, also
structure the loop in the same way.
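
I.e. something along these lines (a sketch, untested):

static inline bool
contains_view_convert_expr_p (const_tree ref)
{
  while (handled_component_p (ref))
    {
      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR)
	return true;
      ref = TREE_OPERAND (ref, 0);
    }

  return false;
}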

> /* Change STMT to assign compatible types by means of adding component or array
>    references or VIEW_CONVERT_EXPRs.  All parameters have the same meaning as
>    variables with the same names in sra_modify_assign.  This is done in such a
>    complicated way in order to make
>    testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy, and so it helps in at least
>    some cases.  */
> 
> static void
> fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt,
> 				   struct access *lacc, struct access *racc,
> 				   tree lhs, tree *rhs, tree ltype, tree rtype)
> {
>   if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype)
>       && !access_has_children_p (lacc))
>     {
>       tree expr = unshare_expr (lhs);
>       bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype,
> 					 false);
>       if (found)
> 	{
> 	  gimple_assign_set_lhs (*stmt, expr);
> 	  return;
> 	}
>     }
> 
>   if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype)
>       && !access_has_children_p (racc))
>     {
>       tree expr = unshare_expr (*rhs);
>       bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype,
> 					 false);
>       if (found)
> 	{
> 	  gimple_assign_set_rhs1 (*stmt, expr);
> 	  return;
> 	}
>     }
> 
>   *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs);
>   gimple_assign_set_rhs_from_tree (gsi, *rhs);
>   *stmt = gsi_stmt (*gsi);

Reading this I have a deja-vu - isn't there another function in this
file doing the same thing?  You are also doing a lot of unsharing even
though you re-build the access tree from scratch?

> }
> 
> /* Callback of scan_function to process assign statements.  It examines both
>    sides of the statement, replaces them with a scalar replacement if there is
>    one and generates copying of replacements if scalarized aggregates have been
>    used in the assignment.  STMT is a pointer to the assign statement, GSI is
>    used to hold generated statements for type conversions and subtree
>    copying.  */
> 
> static enum scan_assign_result
> sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi,
> 		   void *data ATTRIBUTE_UNUSED)
> {
>   struct access *lacc, *racc;
>   tree ltype, rtype;
>   tree lhs, rhs;
>   bool modify_this_stmt;
> 
>   if (gimple_assign_rhs2 (*stmt))

This should be !gimple_assign_single_p (*stmt) instead - a single-rhs
assignment is the only kind of gimple assign that may access memory.

>     return SRA_SA_NONE;
>   lhs = gimple_assign_lhs (*stmt);
>   rhs = gimple_assign_rhs1 (*stmt);
> 
>   if (TREE_CODE (rhs) == CONSTRUCTOR)
>     return sra_modify_constructor_assign (stmt, gsi);
> 
>   if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR)
>     return sra_modify_partially_complex_lhs (*stmt, gsi);
> 
>   if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR
>       || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF)
>     {
>       modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt),
> 					  gsi, false, data);
>       modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt),
> 					   gsi, true, data);
>       return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
>     }
> 
>   lacc = get_access_for_expr (lhs);
>   racc = get_access_for_expr (rhs);
>   if (!lacc && !racc)
>     return SRA_SA_NONE;
> 
>   modify_this_stmt = ((lacc && lacc->grp_to_be_replaced)
> 		      || (racc && racc->grp_to_be_replaced));
> 
>   if (lacc && lacc->grp_to_be_replaced)
>     {
>       lhs = get_access_replacement (lacc);
>       gimple_assign_set_lhs (*stmt, lhs);
>       ltype = lacc->type;
>     }
>   else
>     ltype = TREE_TYPE (lhs);
> 
>   if (racc && racc->grp_to_be_replaced)
>     {
>       rhs = get_access_replacement (racc);
>       gimple_assign_set_rhs1 (*stmt, rhs);
>       rtype = racc->type;
>     }
>   else
>     rtype = TREE_TYPE (rhs);
> 
>   /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate
>      the statement makes the position of this pop_stmt_changes() a bit awkward
>      but hopefully makes some sense.  */

I don't see pop_stmt_changes().

>   if (modify_this_stmt)
>     {
>       if (!useless_type_conversion_p (ltype, rtype))
> 	fix_modified_assign_compatibility (gsi, stmt, lacc, racc,
> 					   lhs, &rhs, ltype, rtype);
>     }
> 
>   if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs)
>       || (access_has_children_p (racc)
> 	  && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset))
>       || (access_has_children_p (lacc)
> 	  && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset)))

?  A comment explaining what this case is about is missing ...

(this smells like fixup that could be avoided by doing things correct
in the first place)

>     {
>       if (access_has_children_p (racc))
> 	generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0,
> 				 gsi, false, false);
>       if (access_has_children_p (lacc))
> 	generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0,
> 				 gsi, true, true);
>     }
>   else
>     {
>       if (access_has_children_p (lacc) && access_has_children_p (racc))
> 	{
> 	  gimple_stmt_iterator orig_gsi = *gsi;
> 	  bool refreshed;
> 
> 	  if (lacc->grp_read && !lacc->grp_covered)
> 	    {
> 	      handle_unscalarized_data_in_subtree (racc, lhs, gsi);
> 	      refreshed = true;
> 	    }
> 	  else
> 	    refreshed = false;
> 
> 	  load_assign_lhs_subreplacements (lacc->first_child, racc,
> 					   lacc->offset, racc->offset,
> 					   &orig_gsi, gsi, &refreshed, lhs);
> 	  if (!refreshed || !racc->grp_unscalarized_data)
> 	    {
> 	      if (*stmt == gsi_stmt (*gsi))
> 		gsi_next (gsi);
> 
> 	      unlink_stmt_vdef (*stmt);
> 	      gsi_remove (&orig_gsi, true);
> 	      return SRA_SA_REMOVED;
> 	    }
> 	}
>       else
> 	{
> 	  if (access_has_children_p (racc))
> 	    {
> 	      if (!racc->grp_unscalarized_data)
> 		{
> 		  generate_subtree_copies (racc->first_child, lhs,
> 					   racc->offset, 0, 0, gsi,
> 					   false, false);
> 		  gcc_assert (*stmt == gsi_stmt (*gsi));
> 		  unlink_stmt_vdef (*stmt);
> 		  gsi_remove (gsi, true);
> 		  return SRA_SA_REMOVED;
> 		}
> 	      else
> 		generate_subtree_copies (racc->first_child, lhs,
> 					 racc->offset, 0, 0, gsi, false, true);
> 	    }
> 	  else if (access_has_children_p (lacc))
> 	    generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
> 				     0, 0, gsi, true, true);
> 	}
>     }
> 
>   return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
> }
> 
> /* Generate statements initializing scalar replacements of parts of function
>    parameters.  */
> 
> static void
> initialize_parameter_reductions (void)
> {
>   gimple_stmt_iterator gsi;
>   gimple_seq seq = NULL;
>   tree parm;
> 
>   for (parm = DECL_ARGUMENTS (current_function_decl);
>        parm;
>        parm = TREE_CHAIN (parm))
>     {
>       VEC (access_p, heap) *access_vec;
>       struct access *access;
> 
>       if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm)))
> 	continue;
>       access_vec = get_base_access_vector (parm);
>       if (!access_vec)
> 	continue;
> 
>       if (!seq)
> 	{
> 	  seq = gimple_seq_alloc ();
> 	  gsi = gsi_start (seq);
> 	}
> 
>       for (access = VEC_index (access_p, access_vec, 0);
> 	   access;
> 	   access = access->next_grp)
> 	generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true);
>     }
> 
>   if (seq)
>     gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq);
> }
> 
> /* The "main" function of intraprocedural SRA passes.  Runs the analysis and if
>    it reveals there are components of some aggregates to be scalarized, it runs
>    the required transformations.  */
> static unsigned int
> perform_intra_sra (void)
> {
>   int ret = 0;
>   sra_initialize ();
> 
>   if (!find_var_candidates ())
>     goto out;
> 
>   if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL,
> 		      true, NULL))
>     goto out;
> 
>   if (!analyze_all_variable_accesses ())
>     goto out;
> 
>   scan_function (sra_modify_expr, sra_modify_assign, NULL,
> 		 false, NULL);
>   initialize_parameter_reductions ();
> 
>   ret = TODO_update_ssa;

Redundant set - it is immediately overwritten below.

>   if (sra_mode == SRA_MODE_EARLY_INTRA)
>     ret = TODO_update_ssa;
>   else
>     ret = TODO_update_ssa | TODO_rebuild_alias;

in fact you shouldn't (need to) rebuild alias.

>  out:
>   sra_deinitialize ();
>   return ret;
> }
> 
> /* Perform early intraprocedural SRA.  */
> static unsigned int
> early_intra_sra (void)
> {
>   sra_mode = SRA_MODE_EARLY_INTRA;
>   return perform_intra_sra ();
> }
> 
> /* Perform "late" intraprocedural SRA.  */
> static unsigned int
> late_intra_sra (void)
> {
>   sra_mode = SRA_MODE_INTRA;
>   return perform_intra_sra ();
> }
> 
> 
> static bool
> gate_intra_sra (void)
> {
>   return flag_tree_sra != 0;
> }
> 
> 
> struct gimple_opt_pass pass_sra_early =
> {
>  {
>   GIMPLE_PASS,
>   "esra",	 			/* name */
>   gate_intra_sra,			/* gate */
>   early_intra_sra,			/* execute */
>   NULL,					/* sub */
>   NULL,					/* next */
>   0,					/* static_pass_number */
>   TV_TREE_SRA,				/* tv_id */
>   PROP_cfg | PROP_ssa,                  /* properties_required */
>   0,					/* properties_provided */
>   0,					/* properties_destroyed */
>   0,					/* todo_flags_start */
>   TODO_dump_func
>   | TODO_update_ssa
>   | TODO_ggc_collect
>   | TODO_verify_ssa			/* todo_flags_finish */
>  }
> };
> 
> 
> struct gimple_opt_pass pass_sra =
> {
>  {
>   GIMPLE_PASS,
>   "sra",	 			/* name */
>   gate_intra_sra,			/* gate */
>   late_intra_sra,			/* execute */
>   NULL,					/* sub */
>   NULL,					/* next */
>   0,					/* static_pass_number */
>   TV_TREE_SRA,				/* tv_id */
>   PROP_cfg | PROP_ssa,                  /* properties_required */
>   0,					/* properties_provided */
>   0,					/* properties_destroyed */
>   TODO_update_address_taken,		/* todo_flags_start */
>   TODO_dump_func
>   | TODO_update_ssa
>   | TODO_ggc_collect
>   | TODO_verify_ssa			/* todo_flags_finish */
>  }
> };


Overall it looks good - I'm still a little bit confused, but that's
likely because reading code from top to bottom doesn't make the
most sense in all cases ;)

Looking forward to a second look on a revised version.

Thanks,
Richard.

