This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)


Hello,

these patches enable sum and product inlining in the non-scalar case
(before, they were inlined in the scalar case only).

Let's consider the scalar expression:
  sum(a(:,1,1,:,:))
This is currently inlined (scalar expression), and the scalarizer has a dim
array designating array dimensions used for scalarization. In the case above,
dim = {0, 3, 4}, and the scalarizer state corresponding to `a' has non-NULL
values in the corresponding lbound[0], lbound[3], lbound[4] (same, for ubound,
stride,...) as presented in the scheme below.

loop
  \
   --- (...) ---- ss(a) --- (...) ----> gfc_ss_terminator
                  |\
                  | descriptor
                  |  |\
                  |  | lbound  ( )  ( )  ( )  ( )  ( ) 
                  |   \
                  |    ubound  ( )  ( )  ( )  ( )  ( )
                  |             ^              ^    ^
                  \        +----+              |    |
                   dim     | +-----------------+    |
                     \     | | +--------------------+
                      ---> 0 3 4



Now if one considers the non-scalar expression:
  sum(a(:,1,1,:,:), dim=2)
the dim array would now be only {3} as we are suming on the second non-scalar
dimension only, which means that lbound[0] and lbound[4] (same for ubound,
stride...) are left available for another scalarizer.  This set of patches
make the necessary changes so that another scalarizer can use those dimensions
left available as below.

loop
  \
   --- (...) ---- ss(sum(a,2)) --- (...) ----> gfc_ss_terminator
                  |\
                  | dim 
                  |  \ 
                  |   ---> 0 4
                  |        | +----------------------+
                  |        +----+                   |
                   \            |                   |
                    descriptor  |                   |
                   / |\         V                   V
                  |  | lbound  ( )  ( )  ( )  ( )  ( ) 
                  |   \
                  |    ubound  ( )  ( )  ( )  ( )  ( )
                  |                            ^
                  |        +-------------------+
                  |   ---> 3
                  |  / 
                  | dim 
                  |/
   --- (...) ---- nested_ss(a) --- (...) ----> gfc_ss_terminator
  /
nested_loop


* Structure changes.

Between the outer scalarizer loop (using dimensions 0 and 4 above) and the inner
one (using dimension 3) almost all the information is the same:
type, expression, descriptor, string_length, ...

So, in order to not have to update many structs at the same type, if one
item needs to be updated, a new struct is created containing all the shared
content, and all the gfc_ss structs have a pointer to it instead of holding
the content directly.
Left in the gfc_ss structs are dim array and dimension for the obvious that
they depend on the loop. This requires that those fields are moved from
gfc_ss_info, and is the reason for most of the 13..19 patches.
Also left in the gfc_ss structs are the linked list pointers, for the same
reason that they depend on the loop.  They don't need to be moved though.
All the rest is moved to the new shared struct.
See patches 20..30 for details.

A bunch of new pointers are added to ease retrieval of related content: loop
associated with a gfc_ss struct, parent gfc_ss struct (i.e. the one in the outer
loop), associated gfc_ss struct in the inner loop.
The gfc_loopinfo structs get the same kind of changes: three additional fields;
one for the outer loop, and two for a linked list of nested loop pointers.
See patches 31..53 for details


* Code changes

All the changes above require the whole scalarizer to be updated, not only
because its core structures have changed, but also to handle more than one loop:
 - In cases we were previously looping over all the dimensions of a loop, 
   we'll now need to loop over all the dimensions and over all the loops
   available. For example gfc_trans_create_temp_array uses loop bounds to guess
   allocation size; it has to consider more than one loop now.
 - In cases we were focusing on one single array, we have to take into account
   the fact that the array information can be scattered across multiple loops. 
 - The code that was executed before the loop previously has to be taken out
   of the outermost loop now. This implies that the inner loop is already
   available when handling the outer loop. It will in fact be created at walk
   time.
See patches 31..53 for details.

A few corner cases not in the core of the scalarizer need additional fixes to
prevent regressions, in patches 54..61. The rest (62..66) takes care of inlining
sum (and product).

The full patch is attached for the compulsive testers.
To ease (well, somewhat) review, the patch has been split into pieces.
See my follow-up mails for details.  The general outline is below.
There is no need to spend too much reviewing power on the 1..30 patches, unless
there is a concern about the core struct changes.

01..06: Step by step gfc_trans_preloop_setup rewrite.
07..12: Various preliminary cleanups.
13..19: Function interfaces changes.
20..30: Core structs reorganisation.
31..53: Update the scalarizer.
54..61: Prevent regressions.
62..66: Inline sum.


Regression tested on x86_64-unknown-freebsd8.2. OK for trunk?

Mikael

PS: I hereby confess my failure to not split the patch too much. :-(

Attachment: pr43829-tests.CL
Description: Text document

Index: function_optimize_7.f90
===================================================================
--- function_optimize_7.f90	(révision 180154)
+++ function_optimize_7.f90	(copie de travail)
@@ -12,6 +12,7 @@
   real, intent(out) :: z
   character(60) :: line
   real, external :: ext_func
+  integer :: one = 1
   interface
      elemental function element(x)
        real, intent(in) :: x
@@ -33,7 +34,7 @@
   z = element(x) + element(x)
   i = mypure(x) - mypure(x)
   z = elem_impure(x) - elem_impure(x)
-  s_out = sum(s_in,1) + 3.14 / sum(s_in,1) ! { dg-warning "Creating array temporary" }
+  s_out = sum(s_in,one) + 3.14 / sum(s_in,one) ! { dg-warning "Creating array temporary" }
 end subroutine xx
 ! { dg-final { scan-tree-dump-times "matmul_r4" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "__builtin_sinf" 1 "original" } }

Attachment: inline_sum_2.f90
Description: Text document

Attachment: inline_sum_1.f90
Description: Text document

Attachment: inline_sum_bounds_check_1.f90
Description: Text document

Attachment: inline_sum_bounds_check_2.f90
Description: Text document

Attachment: inline_product_1.f90
Description: Text document

diff --git a/array.c b/array.c
index 3e6b9d2..a1449fd 100644
--- a/array.c
+++ b/array.c
@@ -70,6 +70,7 @@ match_subscript (gfc_array_ref *ar, int init, bool match_star)
 
   i = ar->dimen + ar->codimen;
 
+  gfc_gobble_whitespace ();
   ar->c_where[i] = gfc_current_locus;
   ar->start[i] = ar->end[i] = ar->stride[i] = NULL;
 
diff --git a/dependency.c b/dependency.c
index c43af00..fd7fa73 100644
--- a/dependency.c
+++ b/dependency.c
@@ -713,6 +713,17 @@ gfc_check_argument_var_dependency (gfc_expr *var, sym_intent intent,
 	    return gfc_check_fncall_dependency (var, intent, NULL,
 						expr->value.function.actual,
 						ELEM_CHECK_VARIABLE);
+
+	  if (gfc_inline_intrinsic_function_p (expr))
+	    {
+	      /* The TRANSPOSE case should have been caught in the
+		 noncopying intrinsic case above.  */
+	      gcc_assert (expr->value.function.isym->id != GFC_ISYM_TRANSPOSE);
+
+	      return gfc_check_fncall_dependency (var, intent, NULL,
+						  expr->value.function.actual,
+						  ELEM_CHECK_VARIABLE);
+	    }
 	}
       return 0;
 
diff --git a/frontend-passes.c b/frontend-passes.c
index 5b1a644..a19f22d 100644
--- a/frontend-passes.c
+++ b/frontend-passes.c
@@ -203,8 +203,8 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
       /* Conversions are handled on the fly by the middle end,
 	 transpose during trans-* stages and TRANSFER by the middle end.  */
       if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION
-	  || (*e)->value.function.isym->id == GFC_ISYM_TRANSPOSE
-	  || (*e)->value.function.isym->id == GFC_ISYM_TRANSFER)
+	  || (*e)->value.function.isym->id == GFC_ISYM_TRANSFER
+	  || gfc_inline_intrinsic_function_p (*e))
 	return 0;
 
       /* Don't create an array temporary for elemental functions,
@@ -567,7 +567,8 @@ optimize_binop_array_assignment (gfc_code *c, gfc_expr **rhs, bool seen_op)
 	   && ! (e->value.function.isym
 		 && (e->value.function.isym->elemental
 		     || e->ts.type != c->expr1->ts.type
-		     || e->ts.kind != c->expr1->ts.kind)))
+		     || e->ts.kind != c->expr1->ts.kind))
+	   && ! gfc_inline_intrinsic_function_p (e))
     {
 
       gfc_code *n;
diff --git a/gfortran.h b/gfortran.h
index da3477d..b869ca3 100644
--- a/gfortran.h
+++ b/gfortran.h
@@ -2880,6 +2880,9 @@ void gfc_generate_code (gfc_namespace *);
 void gfc_generate_module_code (gfc_namespace *);
 void gfc_init_coarray_decl (bool);
 
+/* trans-intrinsic.c */
+bool gfc_inline_intrinsic_function_p (gfc_expr *);
+
 /* bbt.c */
 typedef int (*compare_fn) (void *, void *);
 void gfc_insert_bbt (void *, void *, compare_fn);
diff --git a/matchexp.c b/matchexp.c
index 8b99ce9..cd70dc0 100644
--- a/matchexp.c
+++ b/matchexp.c
@@ -201,6 +201,7 @@ match_level_1 (gfc_expr **result)
   locus where;
   match m;
 
+  gfc_gobble_whitespace ();
   where = gfc_current_locus;
   uop = NULL;
   m = match_defined_operator (&uop);
diff --git a/trans-array.c b/trans-array.c
index 3472804..80875a7 100644
--- a/trans-array.c
+++ b/trans-array.c
@@ -463,11 +463,9 @@ void
 gfc_mark_ss_chain_used (gfc_ss * ss, unsigned flags)
 {
   for (; ss != gfc_ss_terminator; ss = ss->next)
-    ss->useflags = flags;
+    ss->info->useflags = flags;
 }
 
-static void gfc_free_ss (gfc_ss *);
-
 
 /* Free a gfc_ss chain.  */
 
@@ -486,20 +484,35 @@ gfc_free_ss_chain (gfc_ss * ss)
 }
 
 
+static void
+free_ss_info (gfc_ss_info *ss_info)
+{
+  ss_info->refcount--;
+  if (ss_info->refcount > 0)
+    return;
+
+  gcc_assert (ss_info->refcount == 0);
+  free (ss_info);
+}
+
+
 /* Free a SS.  */
 
-static void
+void
 gfc_free_ss (gfc_ss * ss)
 {
+  gfc_ss_info *ss_info;
   int n;
 
-  switch (ss->type)
+  ss_info = ss->info;
+
+  switch (ss_info->type)
     {
     case GFC_SS_SECTION:
-      for (n = 0; n < ss->data.info.dimen; n++)
+      for (n = 0; n < ss->dimen; n++)
 	{
-	  if (ss->data.info.subscript[ss->data.info.dim[n]])
-	    gfc_free_ss_chain (ss->data.info.subscript[ss->data.info.dim[n]]);
+	  if (ss_info->data.array.subscript[ss->dim[n]])
+	    gfc_free_ss_chain (ss_info->data.array.subscript[ss->dim[n]]);
 	}
       break;
 
@@ -507,6 +520,7 @@ gfc_free_ss (gfc_ss * ss)
       break;
     }
 
+  free_ss_info (ss_info);
   free (ss);
 }
 
@@ -517,17 +531,20 @@ gfc_ss *
 gfc_get_array_ss (gfc_ss *next, gfc_expr *expr, int dimen, gfc_ss_type type)
 {
   gfc_ss *ss;
-  gfc_ss_info *info;
+  gfc_ss_info *ss_info;
   int i;
 
+  ss_info = gfc_get_ss_info ();
+  ss_info->refcount++;
+  ss_info->type = type;
+  ss_info->expr = expr;
+
   ss = gfc_get_ss ();
+  ss->info = ss_info;
   ss->next = next;
-  ss->type = type;
-  ss->expr = expr;
-  info = &ss->data.info;
-  info->dimen = dimen;
-  for (i = 0; i < info->dimen; i++)
-    info->dim[i] = i;
+  ss->dimen = dimen;
+  for (i = 0; i < ss->dimen; i++)
+    ss->dim[i] = i;
 
   return ss;
 }
@@ -539,13 +556,21 @@ gfc_ss *
 gfc_get_temp_ss (tree type, tree string_length, int dimen)
 {
   gfc_ss *ss;
+  gfc_ss_info *ss_info;
+  int i;
+
+  ss_info = gfc_get_ss_info ();
+  ss_info->refcount++;
+  ss_info->type = GFC_SS_TEMP;
+  ss_info->string_length = string_length;
+  ss_info->data.temp.type = type;
 
   ss = gfc_get_ss ();
+  ss->info = ss_info;
   ss->next = gfc_ss_terminator;
-  ss->type = GFC_SS_TEMP;
-  ss->string_length = string_length;
-  ss->data.temp.dimen = dimen;
-  ss->data.temp.type = type;
+  ss->dimen = dimen;
+  for (i = 0; i < ss->dimen; i++)
+    ss->dim[i] = i;
 
   return ss;
 }
@@ -557,11 +582,16 @@ gfc_ss *
 gfc_get_scalar_ss (gfc_ss *next, gfc_expr *expr)
 {
   gfc_ss *ss;
+  gfc_ss_info *ss_info;
+
+  ss_info = gfc_get_ss_info ();
+  ss_info->refcount++;
+  ss_info->type = GFC_SS_SCALAR;
+  ss_info->expr = expr;
 
   ss = gfc_get_ss ();
+  ss->info = ss_info;
   ss->next = next;
-  ss->type = GFC_SS_SCALAR;
-  ss->expr = expr;
 
   return ss;
 }
@@ -572,6 +602,7 @@ gfc_get_scalar_ss (gfc_ss *next, gfc_expr *expr)
 void
 gfc_cleanup_loop (gfc_loopinfo * loop)
 {
+  gfc_loopinfo *loop_next, **ploop;
   gfc_ss *ss;
   gfc_ss *next;
 
@@ -583,6 +614,44 @@ gfc_cleanup_loop (gfc_loopinfo * loop)
       gfc_free_ss (ss);
       ss = next;
     }
+
+  /* Remove reference to self in the parent loop.  */
+  if (loop->parent)
+    for (ploop = &loop->parent->nested; *ploop; ploop = &(*ploop)->next)
+      if (*ploop == loop)
+	{
+	  *ploop = loop->next;
+	  break;
+	}
+
+  /* Free non-freed nested loops.  */
+  for (loop = loop->nested; loop; loop = loop_next)
+    {
+      loop_next = loop->next;
+      gfc_cleanup_loop (loop);
+      free (loop);
+    }
+}
+
+
+static void
+set_ss_loop (gfc_ss *ss, gfc_loopinfo *loop)
+{
+  int n;
+
+  for (; ss != gfc_ss_terminator; ss = ss->next)
+    {
+      ss->loop = loop;
+
+      if (ss->info->type == GFC_SS_SCALAR
+	  || ss->info->type == GFC_SS_REFERENCE
+	  || ss->info->type == GFC_SS_TEMP)
+	continue;
+
+      for (n = 0; n < GFC_MAX_DIMENSIONS; n++)
+	if (ss->info->data.array.subscript[n] != NULL)
+	  set_ss_loop (ss->info->data.array.subscript[n], loop);
+    }
 }
 
 
@@ -592,13 +661,36 @@ void
 gfc_add_ss_to_loop (gfc_loopinfo * loop, gfc_ss * head)
 {
   gfc_ss *ss;
+  gfc_loopinfo *nested_loop;
 
   if (head == gfc_ss_terminator)
     return;
 
+  set_ss_loop (head, loop);
+
   ss = head;
   for (; ss && ss != gfc_ss_terminator; ss = ss->next)
     {
+      if (ss->nested_ss)
+	{
+	  nested_loop = ss->nested_ss->loop;
+
+	  /* More than one ss can belong to the same loop.  Hence, we add the
+	     loop to the chain only if it is different from the previously
+	     added one, to avoid duplicate nested loops.  */
+	  if (nested_loop != loop->nested)
+	    {
+	      gcc_assert (nested_loop->parent == NULL);
+	      nested_loop->parent = loop;
+
+	      gcc_assert (nested_loop->next == NULL);
+	      nested_loop->next = loop->nested;
+	      loop->nested = nested_loop;
+	    }
+	  else
+	    gcc_assert (nested_loop->parent == loop);
+	}
+
       if (ss->next == gfc_ss_terminator)
 	ss->loop_chain = loop->ss;
       else
@@ -633,41 +725,54 @@ void
 gfc_set_loop_bounds_from_array_spec (gfc_interface_mapping * mapping,
 				     gfc_se * se, gfc_array_spec * as)
 {
-  int n, dim;
+  int n, dim, total_dim;
   gfc_se tmpse;
+  gfc_ss *ss;
   tree lower;
   tree upper;
   tree tmp;
 
-  if (as && as->type == AS_EXPLICIT)
-    for (n = 0; n < se->loop->dimen; n++)
-      {
-	dim = se->ss->data.info.dim[n];
-	gcc_assert (dim < as->rank);
-	gcc_assert (se->loop->dimen == as->rank);
-	if (se->loop->to[n] == NULL_TREE)
-	  {
-	    /* Evaluate the lower bound.  */
-	    gfc_init_se (&tmpse, NULL);
-	    gfc_apply_interface_mapping (mapping, &tmpse, as->lower[dim]);
-	    gfc_add_block_to_block (&se->pre, &tmpse.pre);
-	    gfc_add_block_to_block (&se->post, &tmpse.post);
-	    lower = fold_convert (gfc_array_index_type, tmpse.expr);
-
-	    /* ...and the upper bound.  */
-	    gfc_init_se (&tmpse, NULL);
-	    gfc_apply_interface_mapping (mapping, &tmpse, as->upper[dim]);
-	    gfc_add_block_to_block (&se->pre, &tmpse.pre);
-	    gfc_add_block_to_block (&se->post, &tmpse.post);
-	    upper = fold_convert (gfc_array_index_type, tmpse.expr);
-
-	    /* Set the upper bound of the loop to UPPER - LOWER.  */
-	    tmp = fold_build2_loc (input_location, MINUS_EXPR,
-				   gfc_array_index_type, upper, lower);
-	    tmp = gfc_evaluate_now (tmp, &se->pre);
-	    se->loop->to[n] = tmp;
-	  }
-      }
+  total_dim = 0;
+
+  if (!as || as->type != AS_EXPLICIT)
+    return;
+
+  for (ss = se->ss; ss; ss = ss->parent)
+    {
+      total_dim += ss->loop->dimen;
+      for (n = 0; n < ss->loop->dimen; n++)
+	{
+	  /* The bound is known, nothing to do.  */
+	  if (ss->loop->to[n] != NULL_TREE)
+	    continue;
+
+	  dim = ss->dim[n];
+	  gcc_assert (dim < as->rank);
+	  gcc_assert (ss->loop->dimen <= as->rank);
+
+	  /* Evaluate the lower bound.  */
+	  gfc_init_se (&tmpse, NULL);
+	  gfc_apply_interface_mapping (mapping, &tmpse, as->lower[dim]);
+	  gfc_add_block_to_block (&se->pre, &tmpse.pre);
+	  gfc_add_block_to_block (&se->post, &tmpse.post);
+	  lower = fold_convert (gfc_array_index_type, tmpse.expr);
+
+	  /* ...and the upper bound.  */
+	  gfc_init_se (&tmpse, NULL);
+	  gfc_apply_interface_mapping (mapping, &tmpse, as->upper[dim]);
+	  gfc_add_block_to_block (&se->pre, &tmpse.pre);
+	  gfc_add_block_to_block (&se->post, &tmpse.post);
+	  upper = fold_convert (gfc_array_index_type, tmpse.expr);
+
+	  /* Set the upper bound of the loop to UPPER - LOWER.  */
+	  tmp = fold_build2_loc (input_location, MINUS_EXPR,
+				 gfc_array_index_type, upper, lower);
+	  tmp = gfc_evaluate_now (tmp, &se->pre);
+	  ss->loop->to[n] = tmp;
+	}
+    }
+
+  gcc_assert (total_dim == as->rank);
 }
 
 
@@ -685,7 +790,7 @@ gfc_set_loop_bounds_from_array_spec (gfc_interface_mapping * mapping,
 
 static void
 gfc_trans_allocate_array_storage (stmtblock_t * pre, stmtblock_t * post,
-				  gfc_ss_info * info, tree size, tree nelem,
+				  gfc_array_info * info, tree size, tree nelem,
 				  tree initial, bool dynamic, bool dealloc)
 {
   tree tmp;
@@ -800,28 +905,62 @@ gfc_trans_allocate_array_storage (stmtblock_t * pre, stmtblock_t * post,
 }
 
 
-/* Get the array reference dimension corresponding to the given loop dimension.
-   It is different from the true array dimension given by the dim array in
-   the case of a partial array reference
-   It is different from the loop dimension in the case of a transposed array.
-   */
+/* Get the scalarizer array dimension corresponding to actual array dimension
+   given by ARRAY_DIM.
+
+   For example, if SS represents the array ref a(1,:,:,1), it is a
+   bidimensional scalarizer array, and the result would be 0 for ARRAY_DIM=1,
+   and 1 for ARRAY_DIM=2.
+   If SS represents transpose(a(:,1,1,:)), it is again a bidimensional
+   scalarizer array, and the result would be 1 for ARRAY_DIM=0 and 0 for
+   ARRAY_DIM=3.
+   If SS represents sum(a(:,:,:,1), dim=1), it is a 2+1-dimensional scalarizer
+   array.  If called on the inner ss, the result would be respectively 0,1,2 for
+   ARRAY_DIM=0,1,2.  If called on the outer ss, the result would be 0,1
+   for ARRAY_DIM=1,2.  */
 
 static int
-get_array_ref_dim (gfc_ss_info *info, int loop_dim)
+get_scalarizer_dim_for_array_dim (gfc_ss *ss, int array_dim)
 {
-  int n, array_dim, array_ref_dim;
+  int array_ref_dim;
+  int n;
 
   array_ref_dim = 0;
-  array_dim = info->dim[loop_dim];
 
-  for (n = 0; n < info->dimen; n++)
-    if (n != loop_dim && info->dim[n] < array_dim)
-      array_ref_dim++;
+  for (; ss; ss = ss->parent)
+    for (n = 0; n < ss->dimen; n++)
+      if (ss->dim[n] < array_dim)
+	array_ref_dim++;
 
   return array_ref_dim;
 }
 
 
+static gfc_ss *
+innermost_ss (gfc_ss *ss)
+{
+  while (ss->nested_ss != NULL)
+    ss = ss->nested_ss;
+
+  return ss;
+}
+
+
+
+/* Get the array reference dimension corresponding to the given loop dimension.
+   It is different from the true array dimension given by the dim array in
+   the case of a partial array reference (i.e. a(:,:,1,:) for example)
+   It is different from the loop dimension in the case of a transposed array.
+   */
+
+static int
+get_array_ref_dim_for_loop_dim (gfc_ss *ss, int loop_dim)
+{
+  return get_scalarizer_dim_for_array_dim (innermost_ss (ss),
+					   ss->dim[loop_dim]);
+}
+
+
 /* Generate code to create and initialize the descriptor for a temporary
    array.  This is used for both temporaries needed by the scalarizer, and
    functions returning arrays.  Adjusts the loop variables to be
@@ -833,15 +972,16 @@ get_array_ref_dim (gfc_ss_info *info, int loop_dim)
    callee allocated array.
 
    PRE, POST, INITIAL, DYNAMIC and DEALLOC are as for
-   gfc_trans_allocate_array_storage.
- */
+   gfc_trans_allocate_array_storage.  */
 
 tree
-gfc_trans_create_temp_array (stmtblock_t * pre, stmtblock_t * post,
-			     gfc_loopinfo * loop, gfc_ss_info * info,
+gfc_trans_create_temp_array (stmtblock_t * pre, stmtblock_t * post, gfc_ss * ss,
 			     tree eltype, tree initial, bool dynamic,
 			     bool dealloc, bool callee_alloc, locus * where)
 {
+  gfc_loopinfo *loop;
+  gfc_ss *s;
+  gfc_array_info *info;
   tree from[GFC_MAX_DIMENSIONS], to[GFC_MAX_DIMENSIONS];
   tree type;
   tree desc;
@@ -851,49 +991,63 @@ gfc_trans_create_temp_array (stmtblock_t * pre, stmtblock_t * post,
   tree cond;
   tree or_expr;
   int n, dim, tmp_dim;
+  int total_dim = 0;
 
   memset (from, 0, sizeof (from));
   memset (to, 0, sizeof (to));
 
-  gcc_assert (info->dimen > 0);
-  gcc_assert (loop->dimen == info->dimen);
+  info = &ss->info->data.array;
+
+  gcc_assert (ss->dimen > 0);
+  gcc_assert (ss->loop->dimen == ss->dimen);
 
   if (gfc_option.warn_array_temp && where)
     gfc_warning ("Creating array temporary at %L", where);
 
   /* Set the lower bound to zero.  */
-  for (n = 0; n < loop->dimen; n++)
+  for (s = ss; s; s = s->parent)
     {
-      dim = info->dim[n];
+      loop = s->loop;
 
-      /* Callee allocated arrays may not have a known bound yet.  */
-      if (loop->to[n])
-	loop->to[n] = gfc_evaluate_now (
+      total_dim += loop->dimen;
+      for (n = 0; n < loop->dimen; n++)
+	{
+	  dim = s->dim[n];
+
+	  /* Callee allocated arrays may not have a known bound yet.  */
+	  if (loop->to[n])
+	    loop->to[n] = gfc_evaluate_now (
 			fold_build2_loc (input_location, MINUS_EXPR,
 					 gfc_array_index_type,
 					 loop->to[n], loop->from[n]),
 			pre);
-      loop->from[n] = gfc_index_zero_node;
-
-      /* We are constructing the temporary's descriptor based on the loop
-	 dimensions. As the dimensions may be accessed in arbitrary order
-	 (think of transpose) the size taken from the n'th loop may not map
-	 to the n'th dimension of the array. We need to reconstruct loop infos
-	 in the right order before using it to set the descriptor
-	 bounds.  */
-      tmp_dim = get_array_ref_dim (info, n);
-      from[tmp_dim] = loop->from[n];
-      to[tmp_dim] = loop->to[n];
-
-      info->delta[dim] = gfc_index_zero_node;
-      info->start[dim] = gfc_index_zero_node;
-      info->end[dim] = gfc_index_zero_node;
-      info->stride[dim] = gfc_index_one_node;
+	  loop->from[n] = gfc_index_zero_node;
+
+	  /* We have just changed the loop bounds, we must clear the
+	     corresponding specloop, so that delta calculation is not skipped
+	     later in gfc_set_delta.  */
+	  loop->specloop[n] = NULL;
+
+	  /* We are constructing the temporary's descriptor based on the loop
+	     dimensions.  As the dimensions may be accessed in arbitrary order
+	     (think of transpose) the size taken from the n'th loop may not map
+	     to the n'th dimension of the array.  We need to reconstruct loop
+	     infos in the right order before using it to set the descriptor
+	     bounds.  */
+	  tmp_dim = get_scalarizer_dim_for_array_dim (ss, dim);
+	  from[tmp_dim] = loop->from[n];
+	  to[tmp_dim] = loop->to[n];
+
+	  info->delta[dim] = gfc_index_zero_node;
+	  info->start[dim] = gfc_index_zero_node;
+	  info->end[dim] = gfc_index_zero_node;
+	  info->stride[dim] = gfc_index_one_node;
+	}
     }
 
   /* Initialize the descriptor.  */
   type =
-    gfc_get_array_type_bounds (eltype, info->dimen, 0, from, to, 1,
+    gfc_get_array_type_bounds (eltype, total_dim, 0, from, to, 1,
 			       GFC_ARRAY_UNKNOWN, true);
   desc = gfc_create_var (type, "atmp");
   GFC_DECL_PACKED_ARRAY (desc) = 1;
@@ -922,59 +1076,61 @@ gfc_trans_create_temp_array (stmtblock_t * pre, stmtblock_t * post,
 
   /* If there is at least one null loop->to[n], it is a callee allocated
      array.  */
-  for (n = 0; n < loop->dimen; n++)
-    if (loop->to[n] == NULL_TREE)
+  for (n = 0; n < total_dim; n++)
+    if (to[n] == NULL_TREE)
       {
 	size = NULL_TREE;
 	break;
       }
 
-  for (n = 0; n < loop->dimen; n++)
-    {
-      dim = info->dim[n];
-
-      if (size == NULL_TREE)
+  if (size == NULL_TREE)
+    for (s = ss; s; s = s->parent)
+      for (n = 0; n < s->loop->dimen; n++)
 	{
+	  dim = get_scalarizer_dim_for_array_dim (ss, ss->dim[n]);
+
 	  /* For a callee allocated array express the loop bounds in terms
 	     of the descriptor fields.  */
 	  tmp = fold_build2_loc (input_location,
 		MINUS_EXPR, gfc_array_index_type,
 		gfc_conv_descriptor_ubound_get (desc, gfc_rank_cst[dim]),
 		gfc_conv_descriptor_lbound_get (desc, gfc_rank_cst[dim]));
-	  loop->to[n] = tmp;
-	  continue;
+	  s->loop->to[n] = tmp;
 	}
-	
-      /* Store the stride and bound components in the descriptor.  */
-      gfc_conv_descriptor_stride_set (pre, desc, gfc_rank_cst[n], size);
+  else
+    {
+      for (n = 0; n < total_dim; n++)
+	{
+	  /* Store the stride and bound components in the descriptor.  */
+	  gfc_conv_descriptor_stride_set (pre, desc, gfc_rank_cst[n], size);
 
-      gfc_conv_descriptor_lbound_set (pre, desc, gfc_rank_cst[n],
-				      gfc_index_zero_node);
+	  gfc_conv_descriptor_lbound_set (pre, desc, gfc_rank_cst[n],
+					  gfc_index_zero_node);
 
-      gfc_conv_descriptor_ubound_set (pre, desc, gfc_rank_cst[n],
-				      to[n]);
+	  gfc_conv_descriptor_ubound_set (pre, desc, gfc_rank_cst[n], to[n]);
 
-      tmp = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type,
-			     to[n], gfc_index_one_node);
+	  tmp = fold_build2_loc (input_location, PLUS_EXPR,
+				 gfc_array_index_type,
+				 to[n], gfc_index_one_node);
 
-      /* Check whether the size for this dimension is negative.  */
-      cond = fold_build2_loc (input_location, LE_EXPR, boolean_type_node, tmp,
-			      gfc_index_zero_node);
-      cond = gfc_evaluate_now (cond, pre);
+	  /* Check whether the size for this dimension is negative.  */
+	  cond = fold_build2_loc (input_location, LE_EXPR, boolean_type_node,
+				  tmp, gfc_index_zero_node);
+	  cond = gfc_evaluate_now (cond, pre);
 
-      if (n == 0)
-	or_expr = cond;
-      else
-	or_expr = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				   boolean_type_node, or_expr, cond);
+	  if (n == 0)
+	    or_expr = cond;
+	  else
+	    or_expr = fold_build2_loc (input_location, TRUTH_OR_EXPR,
+				       boolean_type_node, or_expr, cond);
 
-      size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-			      size, tmp);
-      size = gfc_evaluate_now (size, pre);
+	  size = fold_build2_loc (input_location, MULT_EXPR,
+				  gfc_array_index_type, size, tmp);
+	  size = gfc_evaluate_now (size, pre);
+	}
     }
 
   /* Get the size of the array.  */
-
   if (size && !callee_alloc)
     {
       /* If or_expr is true, then the extent in at least one
@@ -997,8 +1153,11 @@ gfc_trans_create_temp_array (stmtblock_t * pre, stmtblock_t * post,
   gfc_trans_allocate_array_storage (pre, post, info, size, nelem, initial,
 				    dynamic, dealloc);
 
-  if (info->dimen > loop->temp_dim)
-    loop->temp_dim = info->dimen;
+  while (ss->parent)
+    ss = ss->parent;
+
+  if (ss->dimen > ss->loop->temp_dim)
+    ss->loop->temp_dim = ss->dimen;
 
   return size;
 }
@@ -1849,77 +2008,120 @@ gfc_build_constant_array_constructor (gfc_expr * expr, tree type)
    gfc_build_constant_array_constructor.  */
 
 static void
-gfc_trans_constant_array_constructor (gfc_loopinfo * loop,
-				      gfc_ss * ss, tree type)
+trans_constant_array_constructor (gfc_ss * ss, tree type)
 {
-  gfc_ss_info *info;
+  gfc_array_info *info;
   tree tmp;
   int i;
 
-  tmp = gfc_build_constant_array_constructor (ss->expr, type);
+  tmp = gfc_build_constant_array_constructor (ss->info->expr, type);
 
-  info = &ss->data.info;
+  info = &ss->info->data.array;
 
   info->descriptor = tmp;
   info->data = gfc_build_addr_expr (NULL_TREE, tmp);
   info->offset = gfc_index_zero_node;
 
-  for (i = 0; i < info->dimen; i++)
+  for (i = 0; i < ss->dimen; i++)
     {
       info->delta[i] = gfc_index_zero_node;
       info->start[i] = gfc_index_zero_node;
       info->end[i] = gfc_index_zero_node;
       info->stride[i] = gfc_index_one_node;
     }
+}
+
+
+static int
+get_rank (gfc_loopinfo *loop)
+{
+  int rank;
 
-  if (info->dimen > loop->temp_dim)
-    loop->temp_dim = info->dimen;
+  rank = 0;
+  for (; loop; loop = loop->parent)
+    rank += loop->dimen;
+
+  return rank;
 }
 
+
 /* Helper routine of gfc_trans_array_constructor to determine if the
    bounds of the loop specified by LOOP are constant and simple enough
-   to use with gfc_trans_constant_array_constructor.  Returns the
+   to use with trans_constant_array_constructor.  Returns the
    iteration count of the loop if suitable, and NULL_TREE otherwise.  */
 
 static tree
-constant_array_constructor_loop_size (gfc_loopinfo * loop)
+constant_array_constructor_loop_size (gfc_loopinfo * l)
 {
+  gfc_loopinfo *loop;
   tree size = gfc_index_one_node;
   tree tmp;
-  int i;
+  int i, total_dim;
+
+  total_dim = get_rank (l);
 
-  for (i = 0; i < loop->dimen; i++)
+  for (loop = l; loop; loop = loop->parent)
     {
-      /* If the bounds aren't constant, return NULL_TREE.  */
-      if (!INTEGER_CST_P (loop->from[i]) || !INTEGER_CST_P (loop->to[i]))
-	return NULL_TREE;
-      if (!integer_zerop (loop->from[i]))
+      for (i = 0; i < loop->dimen; i++)
 	{
-	  /* Only allow nonzero "from" in one-dimensional arrays.  */
-	  if (loop->dimen != 1)
+	  /* If the bounds aren't constant, return NULL_TREE.  */
+	  if (!INTEGER_CST_P (loop->from[i]) || !INTEGER_CST_P (loop->to[i]))
 	    return NULL_TREE;
-	  tmp = fold_build2_loc (input_location, MINUS_EXPR,
-				 gfc_array_index_type,
-				 loop->to[i], loop->from[i]);
+	  if (!integer_zerop (loop->from[i]))
+	    {
+	      /* Only allow nonzero "from" in one-dimensional arrays.  */
+	      if (total_dim != 1)
+		return NULL_TREE;
+	      tmp = fold_build2_loc (input_location, MINUS_EXPR,
+				     gfc_array_index_type,
+				     loop->to[i], loop->from[i]);
+	    }
+	  else
+	    tmp = loop->to[i];
+	  tmp = fold_build2_loc (input_location, PLUS_EXPR,
+				 gfc_array_index_type, tmp, gfc_index_one_node);
+	  size = fold_build2_loc (input_location, MULT_EXPR,
+				  gfc_array_index_type, size, tmp);
 	}
-      else
-	tmp = loop->to[i];
-      tmp = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type,
-			     tmp, gfc_index_one_node);
-      size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-			      size, tmp);
     }
 
   return size;
 }
 
 
+static tree *
+get_loop_upper_bound_for_array (gfc_ss *array, int array_dim)
+{
+  gfc_ss *ss;
+  int n;
+
+  gcc_assert (array->nested_ss == NULL);
+
+  for (ss = array; ss; ss = ss->parent)
+    for (n = 0; n < ss->loop->dimen; n++)
+      if (array_dim == get_array_ref_dim_for_loop_dim (ss, n))
+	return &(ss->loop->to[n]);
+
+  gcc_unreachable ();
+}
+
+
+static gfc_loopinfo *
+outermost_loop (gfc_loopinfo * loop)
+{
+  while (loop->parent != NULL)
+    loop = loop->parent;
+
+  return loop;
+}
+
+
 /* Array constructors are handled by constructing a temporary, then using that
    within the scalarization loop.  This is not optimal, but seems by far the
    simplest method.  */
 
 static void
-gfc_trans_array_constructor (gfc_loopinfo * loop, gfc_ss * ss, locus * where)
+trans_array_constructor (gfc_ss * ss)
 {
   gfc_constructor_base c;
   tree offset;
@@ -1927,90 +2129,107 @@ gfc_trans_array_constructor (gfc_loopinfo * loop, gfc_ss * ss, locus * where)
   tree desc;
   tree type;
   tree tmp;
+  tree *loop_ubound0;
   bool dynamic;
   bool old_first_len, old_typespec_chararray_ctor;
   tree old_first_len_val;
+  gfc_loopinfo *loop, *outer_loop;
+  gfc_ss_info *ss_info;
+  gfc_expr *expr;
+  gfc_ss *s;
 
   /* Save the old values for nested checking.  */
   old_first_len = first_len;
   old_first_len_val = first_len_val;
   old_typespec_chararray_ctor = typespec_chararray_ctor;
 
+  loop = ss->loop;
+  outer_loop = outermost_loop (loop);
+  ss_info = ss->info;
+  expr = ss_info->expr;
+
   /* Do bounds-checking here and in gfc_trans_array_ctor_element only if no
      typespec was given for the array constructor.  */
-  typespec_chararray_ctor = (ss->expr->ts.u.cl
-			     && ss->expr->ts.u.cl->length_from_typespec);
+  typespec_chararray_ctor = (expr->ts.u.cl
+			     && expr->ts.u.cl->length_from_typespec);
 
   if ((gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)
-      && ss->expr->ts.type == BT_CHARACTER && !typespec_chararray_ctor)
+      && expr->ts.type == BT_CHARACTER && !typespec_chararray_ctor)
     {  
       first_len_val = gfc_create_var (gfc_charlen_type_node, "len");
       first_len = true;
     }
 
-  gcc_assert (ss->data.info.dimen == loop->dimen);
+  gcc_assert (ss->dimen == ss->loop->dimen);
 
-  c = ss->expr->value.constructor;
-  if (ss->expr->ts.type == BT_CHARACTER)
+  c = expr->value.constructor;
+  if (expr->ts.type == BT_CHARACTER)
     {
       bool const_string;
       
       /* get_array_ctor_strlen walks the elements of the constructor, if a
 	 typespec was given, we already know the string length and want the one
 	 specified there.  */
-      if (typespec_chararray_ctor && ss->expr->ts.u.cl->length
-	  && ss->expr->ts.u.cl->length->expr_type != EXPR_CONSTANT)
+      if (typespec_chararray_ctor && expr->ts.u.cl->length
+	  && expr->ts.u.cl->length->expr_type != EXPR_CONSTANT)
 	{
 	  gfc_se length_se;
 
 	  const_string = false;
 	  gfc_init_se (&length_se, NULL);
-	  gfc_conv_expr_type (&length_se, ss->expr->ts.u.cl->length,
+	  gfc_conv_expr_type (&length_se, expr->ts.u.cl->length,
 			      gfc_charlen_type_node);
-	  ss->string_length = length_se.expr;
-	  gfc_add_block_to_block (&loop->pre, &length_se.pre);
-	  gfc_add_block_to_block (&loop->post, &length_se.post);
+	  ss_info->string_length = length_se.expr;
+	  gfc_add_block_to_block (&outer_loop->pre, &length_se.pre);
+	  gfc_add_block_to_block (&outer_loop->post, &length_se.post);
 	}
       else
-	const_string = get_array_ctor_strlen (&loop->pre, c,
-					      &ss->string_length);
+	const_string = get_array_ctor_strlen (&outer_loop->pre, c,
+					      &ss_info->string_length);
 
       /* Complex character array constructors should have been taken care of
 	 and not end up here.  */
-      gcc_assert (ss->string_length);
+      gcc_assert (ss_info->string_length);
 
-      ss->expr->ts.u.cl->backend_decl = ss->string_length;
+      expr->ts.u.cl->backend_decl = ss_info->string_length;
 
-      type = gfc_get_character_type_len (ss->expr->ts.kind, ss->string_length);
+      type = gfc_get_character_type_len (expr->ts.kind, ss_info->string_length);
       if (const_string)
 	type = build_pointer_type (type);
     }
   else
-    type = gfc_typenode_for_spec (&ss->expr->ts);
+    type = gfc_typenode_for_spec (&expr->ts);
 
   /* See if the constructor determines the loop bounds.  */
   dynamic = false;
 
-  if (ss->expr->shape && loop->dimen > 1 && loop->to[0] == NULL_TREE)
+  loop_ubound0 = get_loop_upper_bound_for_array (ss, 0);
+
+  if (expr->shape && get_rank (loop) > 1 && *loop_ubound0 == NULL_TREE)
     {
       /* We have a multidimensional parameter.  */
-      int n;
-      for (n = 0; n < ss->expr->rank; n++)
-      {
-	loop->from[n] = gfc_index_zero_node;
-	loop->to[n] = gfc_conv_mpz_to_tree (ss->expr->shape [n],
-					    gfc_index_integer_kind);
-	loop->to[n] = fold_build2_loc (input_location, MINUS_EXPR,
-			  	       gfc_array_index_type,
-				       loop->to[n], gfc_index_one_node);
-      }
+      for (s = ss; s; s = s->parent)
+	{
+	  int n;
+	  for (n = 0; n < s->loop->dimen; n++)
+	    {
+	      s->loop->from[n] = gfc_index_zero_node;
+	      s->loop->to[n] = gfc_conv_mpz_to_tree (expr->shape[s->dim[n]],
+						     gfc_index_integer_kind);
+	      s->loop->to[n] = fold_build2_loc (input_location, MINUS_EXPR,
+						gfc_array_index_type,
+						s->loop->to[n],
+						gfc_index_one_node);
+	    }
+	}
     }
 
-  if (loop->to[0] == NULL_TREE)
+  if (*loop_ubound0 == NULL_TREE)
     {
       mpz_t size;
 
       /* We should have a 1-dimensional, zero-based loop.  */
+      gcc_assert (loop->parent == NULL && loop->nested == NULL);
       gcc_assert (loop->dimen == 1);
       gcc_assert (integer_zerop (loop->from[0]));
 
@@ -2033,24 +2252,24 @@ gfc_trans_array_constructor (gfc_loopinfo * loop, gfc_ss * ss, locus * where)
 	  tree size = constant_array_constructor_loop_size (loop);
 	  if (size && compare_tree_int (size, nelem) == 0)
 	    {
-	      gfc_trans_constant_array_constructor (loop, ss, type);
+	      trans_constant_array_constructor (ss, type);
 	      goto finish;
 	    }
 	}
     }
 
-  if (TREE_CODE (loop->to[0]) == VAR_DECL)
+  if (TREE_CODE (*loop_ubound0) == VAR_DECL)
     dynamic = true;
 
-  gfc_trans_create_temp_array (&loop->pre, &loop->post, loop, &ss->data.info,
-			       type, NULL_TREE, dynamic, true, false, where);
+  gfc_trans_create_temp_array (&outer_loop->pre, &outer_loop->post, ss, type,
+			       NULL_TREE, dynamic, true, false, &expr->where);
 
-  desc = ss->data.info.descriptor;
+  desc = ss_info->data.array.descriptor;
   offset = gfc_index_zero_node;
   offsetvar = gfc_create_var_np (gfc_array_index_type, "offset");
   TREE_NO_WARNING (offsetvar) = 1;
   TREE_USED (offsetvar) = 0;
-  gfc_trans_array_constructor_value (&loop->pre, type, desc, c,
+  gfc_trans_array_constructor_value (&outer_loop->pre, type, desc, c,
 				     &offset, &offsetvar, dynamic);
 
   /* If the array grows dynamically, the upper bound of the loop variable
@@ -2060,12 +2279,12 @@ gfc_trans_array_constructor (gfc_loopinfo * loop, gfc_ss * ss, locus * where)
       tmp = fold_build2_loc (input_location, MINUS_EXPR,
 			     gfc_array_index_type,
 			     offsetvar, gfc_index_one_node);
-      tmp = gfc_evaluate_now (tmp, &loop->pre);
+      tmp = gfc_evaluate_now (tmp, &outer_loop->pre);
       gfc_conv_descriptor_ubound_set (&loop->pre, desc, gfc_rank_cst[0], tmp);
-      if (loop->to[0] && TREE_CODE (loop->to[0]) == VAR_DECL)
-	gfc_add_modify (&loop->pre, loop->to[0], tmp);
+      if (*loop_ubound0 && TREE_CODE (*loop_ubound0) == VAR_DECL)
+	gfc_add_modify (&outer_loop->pre, *loop_ubound0, tmp);
       else
-	loop->to[0] = tmp;
+	*loop_ubound0 = tmp;
     }
 
   if (TREE_USED (offsetvar))
@@ -2095,8 +2314,10 @@ finish:
    loop bounds.  */
 
 static void
-gfc_set_vector_loop_bounds (gfc_loopinfo * loop, gfc_ss_info * info)
+set_vector_loop_bounds (gfc_ss * ss)
 {
+  gfc_loopinfo *loop, *outer_loop;
+  gfc_array_info *info;
   gfc_se se;
   tree tmp;
   tree desc;
@@ -2104,27 +2325,36 @@ gfc_set_vector_loop_bounds (gfc_loopinfo * loop, gfc_ss_info * info)
   int n;
   int dim;
 
-  for (n = 0; n < loop->dimen; n++)
+  outer_loop = outermost_loop (ss->loop);
+
+  info = &ss->info->data.array;
+
+  for (; ss; ss = ss->parent)
     {
-      dim = info->dim[n];
-      if (info->ref->u.ar.dimen_type[dim] == DIMEN_VECTOR
-	  && loop->to[n] == NULL)
+      loop = ss->loop;
+
+      for (n = 0; n < loop->dimen; n++)
 	{
+	  dim = ss->dim[n];
+	  if (info->ref->u.ar.dimen_type[dim] != DIMEN_VECTOR
+	      || loop->to[n] != NULL)
+	    continue;
+
 	  /* Loop variable N indexes vector dimension DIM, and we don't
 	     yet know the upper bound of loop variable N.  Set it to the
 	     difference between the vector's upper and lower bounds.  */
 	  gcc_assert (loop->from[n] == gfc_index_zero_node);
 	  gcc_assert (info->subscript[dim]
-		      && info->subscript[dim]->type == GFC_SS_VECTOR);
+		      && info->subscript[dim]->info->type == GFC_SS_VECTOR);
 
 	  gfc_init_se (&se, NULL);
-	  desc = info->subscript[dim]->data.info.descriptor;
+	  desc = info->subscript[dim]->info->data.array.descriptor;
 	  zero = gfc_rank_cst[0];
 	  tmp = fold_build2_loc (input_location, MINUS_EXPR,
 			     gfc_array_index_type,
 			     gfc_conv_descriptor_ubound_get (desc, zero),
 			     gfc_conv_descriptor_lbound_get (desc, zero));
-	  tmp = gfc_evaluate_now (tmp, &loop->pre);
+	  tmp = gfc_evaluate_now (tmp, &outer_loop->pre);
 	  loop->to[n] = tmp;
 	}
     }
@@ -2136,12 +2366,18 @@ gfc_set_vector_loop_bounds (gfc_loopinfo * loop, gfc_ss_info * info)
    but before the actual scalarizing loops.  */
 
 static void
-gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
-		      locus * where)
+add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript)
 {
+  gfc_loopinfo *nested_loop, *outer_loop;
   gfc_se se;
+  gfc_ss_info *ss_info;
+  gfc_array_info *info;
+  gfc_expr *expr;
+  bool skip_nested = false;
   int n;
 
+  outer_loop = outermost_loop (loop);
+
   /* TODO: This can generate bad code if there are ordering dependencies,
      e.g., a callee allocated function and an unknown size constructor.  */
   gcc_assert (ss != NULL);
@@ -2150,61 +2386,74 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
     {
       gcc_assert (ss);
 
-      switch (ss->type)
+      /* Cross loop arrays are handled from within the most nested loop.  */
+      if (ss->nested_ss != NULL)
+	continue;
+
+      ss_info = ss->info;
+      expr = ss_info->expr;
+      info = &ss_info->data.array;
+
+      switch (ss_info->type)
 	{
 	case GFC_SS_SCALAR:
 	  /* Scalar expression.  Evaluate this now.  This includes elemental
 	     dimension indices, but not array section bounds.  */
 	  gfc_init_se (&se, NULL);
-	  gfc_conv_expr (&se, ss->expr);
-	  gfc_add_block_to_block (&loop->pre, &se.pre);
+	  gfc_conv_expr (&se, expr);
+	  gfc_add_block_to_block (&outer_loop->pre, &se.pre);
 
-	  if (ss->expr->ts.type != BT_CHARACTER)
+	  if (expr->ts.type != BT_CHARACTER)
 	    {
 	      /* Move the evaluation of scalar expressions outside the
 		 scalarization loop, except for WHERE assignments.  */
 	      if (subscript)
 		se.expr = convert(gfc_array_index_type, se.expr);
-	      if (!ss->where)
-		se.expr = gfc_evaluate_now (se.expr, &loop->pre);
-	      gfc_add_block_to_block (&loop->pre, &se.post);
+	      if (!ss_info->where)
+		se.expr = gfc_evaluate_now (se.expr, &outer_loop->pre);
+	      gfc_add_block_to_block (&outer_loop->pre, &se.post);
 	    }
 	  else
-	    gfc_add_block_to_block (&loop->post, &se.post);
+	    gfc_add_block_to_block (&outer_loop->post, &se.post);
 
-	  ss->data.scalar.expr = se.expr;
-	  ss->string_length = se.string_length;
+	  ss_info->data.scalar.value = se.expr;
+	  ss_info->string_length = se.string_length;
 	  break;
 
 	case GFC_SS_REFERENCE:
 	  /* Scalar argument to elemental procedure.  Evaluate this
 	     now.  */
 	  gfc_init_se (&se, NULL);
-	  gfc_conv_expr (&se, ss->expr);
-	  gfc_add_block_to_block (&loop->pre, &se.pre);
-	  gfc_add_block_to_block (&loop->post, &se.post);
+	  gfc_conv_expr (&se, expr);
+	  gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+	  gfc_add_block_to_block (&outer_loop->post, &se.post);
 
-	  ss->data.scalar.expr = gfc_evaluate_now (se.expr, &loop->pre);
-	  ss->string_length = se.string_length;
+	  ss_info->data.scalar.value = gfc_evaluate_now (se.expr,
+							 &outer_loop->pre);
+	  ss_info->string_length = se.string_length;
 	  break;
 
 	case GFC_SS_SECTION:
 	  /* Add the expressions for scalar and vector subscripts.  */
 	  for (n = 0; n < GFC_MAX_DIMENSIONS; n++)
-	    if (ss->data.info.subscript[n])
-	      gfc_add_loop_ss_code (loop, ss->data.info.subscript[n], true,
-				    where);
-
-	  gfc_set_vector_loop_bounds (loop, &ss->data.info);
+	    if (info->subscript[n])
+	      {
+		add_loop_ss_code (loop, info->subscript[n], true);
+		/* The recursive call will have taken care of the nested loops.
+		   No need to do it twice.  */
+		skip_nested = true;
+	      }
+
+	  set_vector_loop_bounds (ss);
 	  break;
 
 	case GFC_SS_VECTOR:
 	  /* Get the vector's descriptor and store it in SS.  */
 	  gfc_init_se (&se, NULL);
-	  gfc_conv_expr_descriptor (&se, ss->expr, gfc_walk_expr (ss->expr));
-	  gfc_add_block_to_block (&loop->pre, &se.pre);
-	  gfc_add_block_to_block (&loop->post, &se.post);
-	  ss->data.info.descriptor = se.expr;
+	  gfc_conv_expr_descriptor (&se, expr, gfc_walk_expr (expr));
+	  gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+	  gfc_add_block_to_block (&outer_loop->post, &se.post);
+	  info->descriptor = se.expr;
 	  break;
 
 	case GFC_SS_INTRINSIC:
@@ -2217,26 +2466,26 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
 	  gfc_init_se (&se, NULL);
 	  se.loop = loop;
 	  se.ss = ss;
-	  gfc_conv_expr (&se, ss->expr);
-	  gfc_add_block_to_block (&loop->pre, &se.pre);
-	  gfc_add_block_to_block (&loop->post, &se.post);
-	  ss->string_length = se.string_length;
+	  gfc_conv_expr (&se, expr);
+	  gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+	  gfc_add_block_to_block (&outer_loop->post, &se.post);
+	  ss_info->string_length = se.string_length;
 	  break;
 
 	case GFC_SS_CONSTRUCTOR:
-	  if (ss->expr->ts.type == BT_CHARACTER
-		&& ss->string_length == NULL
-		&& ss->expr->ts.u.cl
-		&& ss->expr->ts.u.cl->length)
+	  if (expr->ts.type == BT_CHARACTER
+	      && ss_info->string_length == NULL
+	      && expr->ts.u.cl
+	      && expr->ts.u.cl->length)
 	    {
 	      gfc_init_se (&se, NULL);
-	      gfc_conv_expr_type (&se, ss->expr->ts.u.cl->length,
+	      gfc_conv_expr_type (&se, expr->ts.u.cl->length,
 				  gfc_charlen_type_node);
-	      ss->string_length = se.expr;
-	      gfc_add_block_to_block (&loop->pre, &se.pre);
-	      gfc_add_block_to_block (&loop->post, &se.post);
+	      ss_info->string_length = se.expr;
+	      gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+	      gfc_add_block_to_block (&outer_loop->post, &se.post);
 	    }
-	  gfc_trans_array_constructor (loop, ss, where);
+	  trans_array_constructor (ss);
 	  break;
 
         case GFC_SS_TEMP:
@@ -2248,6 +2497,11 @@ gfc_add_loop_ss_code (gfc_loopinfo * loop, gfc_ss * ss, bool subscript,
 	  gcc_unreachable ();
 	}
     }
+
+  if (!skip_nested)
+    for (nested_loop = loop->nested; nested_loop;
+	 nested_loop = nested_loop->next)
+      add_loop_ss_code (nested_loop, nested_loop->ss, subscript);
 }
 
 
@@ -2258,16 +2512,21 @@ static void
 gfc_conv_ss_descriptor (stmtblock_t * block, gfc_ss * ss, int base)
 {
   gfc_se se;
+  gfc_ss_info *ss_info;
+  gfc_array_info *info;
   tree tmp;
 
+  ss_info = ss->info;
+  info = &ss_info->data.array;
+
   /* Get the descriptor for the array to be scalarized.  */
-  gcc_assert (ss->expr->expr_type == EXPR_VARIABLE);
+  gcc_assert (ss_info->expr->expr_type == EXPR_VARIABLE);
   gfc_init_se (&se, NULL);
   se.descriptor_only = 1;
-  gfc_conv_expr_lhs (&se, ss->expr);
+  gfc_conv_expr_lhs (&se, ss_info->expr);
   gfc_add_block_to_block (block, &se.pre);
-  ss->data.info.descriptor = se.expr;
-  ss->string_length = se.string_length;
+  info->descriptor = se.expr;
+  ss_info->string_length = se.string_length;
 
   if (base)
     {
@@ -2281,15 +2540,15 @@ gfc_conv_ss_descriptor (stmtblock_t * block, gfc_ss * ss, int base)
 	    || (TREE_CODE (tmp) == ADDR_EXPR
 		&& DECL_P (TREE_OPERAND (tmp, 0)))))
 	tmp = gfc_evaluate_now (tmp, block);
-      ss->data.info.data = tmp;
+      info->data = tmp;
 
       tmp = gfc_conv_array_offset (se.expr);
-      ss->data.info.offset = gfc_evaluate_now (tmp, block);
+      info->offset = gfc_evaluate_now (tmp, block);
 
       /* Make absolutely sure that the saved_offset is indeed saved
 	 so that the variable is still accessible after the loops
 	 are translated.  */
-      ss->data.info.saved_offset = ss->data.info.offset;
+      info->saved_offset = info->offset;
     }
 }
 
@@ -2430,42 +2689,25 @@ gfc_conv_array_ubound (tree descriptor, int dim)
 /* Generate code to perform an array index bound check.  */
 
 static tree
-gfc_trans_array_bound_check (gfc_se * se, tree descriptor, tree index, int n,
-			     locus * where, bool check_upper)
+trans_array_bound_check (gfc_se * se, gfc_ss *ss, tree index, int n,
+			 locus * where, bool check_upper)
 {
   tree fault;
   tree tmp_lo, tmp_up;
+  tree descriptor;
   char *msg;
   const char * name = NULL;
 
   if (!(gfc_option.rtcheck & GFC_RTCHECK_BOUNDS))
     return index;
 
+  descriptor = ss->info->data.array.descriptor;
+
   index = gfc_evaluate_now (index, &se->pre);
 
   /* We find a name for the error message.  */
-  if (se->ss)
-    name = se->ss->expr->symtree->name;
-
-  if (!name && se->loop && se->loop->ss && se->loop->ss->expr
-      && se->loop->ss->expr->symtree)
-    name = se->loop->ss->expr->symtree->name;
-
-  if (!name && se->loop && se->loop->ss && se->loop->ss->loop_chain
-      && se->loop->ss->loop_chain->expr
-      && se->loop->ss->loop_chain->expr->symtree)
-    name = se->loop->ss->loop_chain->expr->symtree->name;
-
-  if (!name && se->loop && se->loop->ss && se->loop->ss->expr)
-    {
-      if (se->loop->ss->expr->expr_type == EXPR_FUNCTION
-	  && se->loop->ss->expr->value.function.name)
-	name = se->loop->ss->expr->value.function.name;
-      else
-	if (se->loop->ss->type == GFC_SS_CONSTRUCTOR
-	    || se->loop->ss->type == GFC_SS_SCALAR)
-	  name = "unnamed constant";
-    }
+  name = ss->info->expr->symtree->n.sym->name;
+  gcc_assert (name != NULL);
 
   if (TREE_CODE (descriptor) == VAR_DECL)
     name = IDENTIFIER_POINTER (DECL_NAME (descriptor));
@@ -2525,13 +2767,16 @@ gfc_trans_array_bound_check (gfc_se * se, tree descriptor, tree index, int n,
    DIM is the array dimension, I is the loop dimension.  */
 
 static tree
-gfc_conv_array_index_offset (gfc_se * se, gfc_ss_info * info, int dim, int i,
-			     gfc_array_ref * ar, tree stride)
+conv_array_index_offset (gfc_se * se, gfc_ss * ss, int dim, int i,
+			 gfc_array_ref * ar, tree stride)
 {
+  gfc_array_info *info;
   tree index;
   tree desc;
   tree data;
 
+  info = &ss->info->data.array;
+
   /* Get the index into the array for this dimension.  */
   if (ar)
     {
@@ -2544,21 +2789,20 @@ gfc_conv_array_index_offset (gfc_se * se, gfc_ss_info * info, int dim, int i,
 	case DIMEN_ELEMENT:
 	  /* Elemental dimension.  */
 	  gcc_assert (info->subscript[dim]
-		      && info->subscript[dim]->type == GFC_SS_SCALAR);
+		      && info->subscript[dim]->info->type == GFC_SS_SCALAR);
 	  /* We've already translated this value outside the loop.  */
-	  index = info->subscript[dim]->data.scalar.expr;
+	  index = info->subscript[dim]->info->data.scalar.value;
 
-	  index = gfc_trans_array_bound_check (se, info->descriptor,
-			index, dim, &ar->where,
-			ar->as->type != AS_ASSUMED_SIZE
-			|| dim < ar->dimen - 1);
+	  index = trans_array_bound_check (se, ss, index, dim, &ar->where,
+					   ar->as->type != AS_ASSUMED_SIZE
+					   || dim < ar->dimen - 1);
 	  break;
 
 	case DIMEN_VECTOR:
 	  gcc_assert (info && se->loop);
 	  gcc_assert (info->subscript[dim]
-		      && info->subscript[dim]->type == GFC_SS_VECTOR);
-	  desc = info->subscript[dim]->data.info.descriptor;
+		      && info->subscript[dim]->info->type == GFC_SS_VECTOR);
+	  desc = info->subscript[dim]->info->data.array.descriptor;
 
 	  /* Get a zero-based index into the vector.  */
 	  index = fold_build2_loc (input_location, MINUS_EXPR,
@@ -2578,10 +2822,9 @@ gfc_conv_array_index_offset (gfc_se * se, gfc_ss_info * info, int dim, int i,
 	  index = fold_convert (gfc_array_index_type, index);
 
 	  /* Do any bounds checking on the final info->descriptor index.  */
-	  index = gfc_trans_array_bound_check (se, info->descriptor,
-			index, dim, &ar->where,
-			ar->as->type != AS_ASSUMED_SIZE
-			|| dim < ar->dimen - 1);
+	  index = trans_array_bound_check (se, ss, index, dim, &ar->where,
+					   ar->as->type != AS_ASSUMED_SIZE
+					   || dim < ar->dimen - 1);
 	  break;
 
 	case DIMEN_RANGE:
@@ -2613,11 +2856,11 @@ gfc_conv_array_index_offset (gfc_se * se, gfc_ss_info * info, int dim, int i,
       /* Pointer functions can have stride[0] different from unity. 
 	 Use the stride returned by the function call and stored in
 	 the descriptor for the temporary.  */ 
-      if (se->ss && se->ss->type == GFC_SS_FUNCTION
-	    && se->ss->expr
-	    && se->ss->expr->symtree
-	    && se->ss->expr->symtree->n.sym->result
-	    && se->ss->expr->symtree->n.sym->result->attr.pointer)
+      if (se->ss && se->ss->info->type == GFC_SS_FUNCTION
+	  && se->ss->info->expr
+	  && se->ss->info->expr->symtree
+	  && se->ss->info->expr->symtree->n.sym->result
+	  && se->ss->info->expr->symtree->n.sym->result->attr.pointer)
 	stride = gfc_conv_descriptor_stride_get (info->descriptor,
 						 gfc_rank_cst[dim]);
 
@@ -2640,31 +2883,33 @@ gfc_conv_array_index_offset (gfc_se * se, gfc_ss_info * info, int dim, int i,
 static void
 gfc_conv_scalarized_array_ref (gfc_se * se, gfc_array_ref * ar)
 {
-  gfc_ss_info *info;
+  gfc_array_info *info;
   tree decl = NULL_TREE;
   tree index;
   tree tmp;
+  gfc_ss *ss;
+  gfc_expr *expr;
   int n;
 
-  info = &se->ss->data.info;
+  ss = se->ss;
+  expr = ss->info->expr;
+  info = &ss->info->data.array;
   if (ar)
     n = se->loop->order[0];
   else
     n = 0;
 
-  index = gfc_conv_array_index_offset (se, info, info->dim[n], n, ar,
-				       info->stride0);
+  index = conv_array_index_offset (se, ss, ss->dim[n], n, ar, info->stride0);
   /* Add the offset for this dimension to the stored offset for all other
      dimensions.  */
   if (!integer_zerop (info->offset))
     index = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type,
 			     index, info->offset);
 
-  if (se->ss->expr && is_subref_array (se->ss->expr))
-    decl = se->ss->expr->symtree->n.sym->backend_decl;
+  if (expr && is_subref_array (expr))
+    decl = expr->symtree->n.sym->backend_decl;
 
-  tmp = build_fold_indirect_ref_loc (input_location,
-				 info->data);
+  tmp = build_fold_indirect_ref_loc (input_location, info->data);
   se->expr = gfc_build_array_ref (tmp, index, decl);
 }
 
@@ -2674,7 +2919,7 @@ gfc_conv_scalarized_array_ref (gfc_se * se, gfc_array_ref * ar)
 void
 gfc_conv_tmp_array_ref (gfc_se * se)
 {
-  se->string_length = se->ss->string_length;
+  se->string_length = se->ss->info->string_length;
   gfc_conv_scalarized_array_ref (se, NULL);
   gfc_advance_se_ss_chain (se);
 }
@@ -2830,6 +3075,33 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_symbol * sym,
 }
 
 
+/* Add the offset corresponding to array's ARRAY_DIM dimension and loop's
+   LOOP_DIM dimension (if any) to array's offset.  */
+
+static void
+add_array_offset (stmtblock_t *pblock, gfc_loopinfo *loop, gfc_ss *ss,
+		  gfc_array_ref *ar, int array_dim, int loop_dim)
+{
+  gfc_se se;
+  gfc_array_info *info;
+  tree stride, index;
+
+  info = &ss->info->data.array;
+
+  gfc_init_se (&se, NULL);
+  se.loop = loop;
+  se.expr = info->descriptor;
+  stride = gfc_conv_array_stride (info->descriptor, array_dim);
+  index = conv_array_index_offset (&se, ss, array_dim, loop_dim, ar, stride);
+  gfc_add_block_to_block (pblock, &se.pre);
+
+  info->offset = fold_build2_loc (input_location, PLUS_EXPR,
+				  gfc_array_index_type,
+				  info->offset, index);
+  info->offset = gfc_evaluate_now (info->offset, pblock);
+}
+
+
 /* Generate the code to be executed immediately before entering a
    scalarization loop.  */
 
@@ -2837,100 +3109,98 @@ static void
 gfc_trans_preloop_setup (gfc_loopinfo * loop, int dim, int flag,
 			 stmtblock_t * pblock)
 {
-  tree index;
   tree stride;
-  gfc_ss_info *info;
-  gfc_ss *ss;
-  gfc_se se;
+  gfc_ss_info *ss_info;
+  gfc_array_info *info;
+  gfc_ss_type ss_type;
+  gfc_ss *ss, *pss;
+  gfc_loopinfo *ploop;
+  gfc_array_ref *ar;
   int i;
 
   /* This code will be executed before entering the scalarization loop
      for this dimension.  */
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
     {
-      if ((ss->useflags & flag) == 0)
+      ss_info = ss->info;
+
+      if ((ss_info->useflags & flag) == 0)
 	continue;
 
-      if (ss->type != GFC_SS_SECTION
-	  && ss->type != GFC_SS_FUNCTION && ss->type != GFC_SS_CONSTRUCTOR
-	  && ss->type != GFC_SS_COMPONENT)
+      ss_type = ss_info->type;
+      if (ss_type != GFC_SS_SECTION
+	  && ss_type != GFC_SS_FUNCTION
+	  && ss_type != GFC_SS_CONSTRUCTOR
+	  && ss_type != GFC_SS_COMPONENT)
 	continue;
 
-      info = &ss->data.info;
+      info = &ss_info->data.array;
 
-      if (dim >= info->dimen)
-	continue;
+      gcc_assert (dim < ss->dimen);
+      gcc_assert (ss->dimen == loop->dimen);
+
+      if (info->ref)
+	ar = &info->ref->u.ar;
+      else
+	ar = NULL;
 
-      if (dim == info->dimen - 1)
+      if (dim == loop->dimen - 1 && loop->parent != NULL)
 	{
-	  /* For the outermost loop calculate the offset due to any
-	     elemental dimensions.  It will have been initialized with the
-	     base offset of the array.  */
-	  if (info->ref)
-	    {
-	      for (i = 0; i < info->ref->u.ar.dimen; i++)
-		{
-		  if (info->ref->u.ar.dimen_type[i] != DIMEN_ELEMENT)
-		    continue;
+	  /* If we are in the outermost dimension of this loop, the previous
+	     dimension shall be in the parent loop.  */
+	  gcc_assert (ss->parent != NULL);
 
-		  gfc_init_se (&se, NULL);
-		  se.loop = loop;
-		  se.expr = info->descriptor;
-		  stride = gfc_conv_array_stride (info->descriptor, i);
-		  index = gfc_conv_array_index_offset (&se, info, i, -1,
-						       &info->ref->u.ar,
-						       stride);
-		  gfc_add_block_to_block (pblock, &se.pre);
-
-		  info->offset = fold_build2_loc (input_location, PLUS_EXPR,
-						  gfc_array_index_type,
-						  info->offset, index);
-		  info->offset = gfc_evaluate_now (info->offset, pblock);
-		}
-	    }
+	  pss = ss->parent;
+	  ploop = loop->parent;
 
-	  i = loop->order[0];
-	  /* For the time being, the innermost loop is unconditionally on
-	     the first dimension of the scalarization loop.  */
-	  gcc_assert (i == 0);
-	  stride = gfc_conv_array_stride (info->descriptor, info->dim[i]);
+	  /* ss and ss->parent are about the same array.  */
+	  gcc_assert (ss_info == pss->info);
+	}
+      else
+	{
+	  ploop = loop;
+	  pss = ss;
+	}
+
+      if (dim == loop->dimen - 1)
+	i = 0;
+      else
+	i = dim + 1;
+
+      /* For the time being, there is no loop reordering.  */
+      gcc_assert (i == ploop->order[i]);
+      i = ploop->order[i];
+
+      if (dim == loop->dimen - 1 && loop->parent == NULL)
+	{
+	  stride = gfc_conv_array_stride (info->descriptor,
+					  innermost_ss (ss)->dim[i]);
 
 	  /* Calculate the stride of the innermost loop.  Hopefully this will
 	     allow the backend optimizers to do their stuff more effectively.
 	   */
 	  info->stride0 = gfc_evaluate_now (stride, pblock);
-	}
-      else
-	{
-	  /* Add the offset for the previous loop dimension.  */
-	  gfc_array_ref *ar;
 
+	  /* For the outermost loop calculate the offset due to any
+	     elemental dimensions.  It will have been initialized with the
+	     base offset of the array.  */
 	  if (info->ref)
 	    {
-	      ar = &info->ref->u.ar;
-	      i = loop->order[dim + 1];
-	    }
-	  else
-	    {
-	      ar = NULL;
-	      i = dim + 1;
-	    }
+	      for (i = 0; i < ar->dimen; i++)
+		{
+		  if (ar->dimen_type[i] != DIMEN_ELEMENT)
+		    continue;
 
-	  gfc_init_se (&se, NULL);
-	  se.loop = loop;
-	  se.expr = info->descriptor;
-	  stride = gfc_conv_array_stride (info->descriptor, info->dim[i]);
-	  index = gfc_conv_array_index_offset (&se, info, info->dim[i], i,
-					       ar, stride);
-	  gfc_add_block_to_block (pblock, &se.pre);
-	  info->offset = fold_build2_loc (input_location, PLUS_EXPR,
-					  gfc_array_index_type, info->offset,
-					  index);
-	  info->offset = gfc_evaluate_now (info->offset, pblock);
+		  add_array_offset (pblock, loop, ss, ar, i, /* unused */ -1);
+		}
+	    }
 	}
+      else
+	/* Add the offset for the previous loop dimension.  */
+	add_array_offset (pblock, ploop, ss, ar, pss->dim[i], i);
 
       /* Remember this offset for the second loop.  */
-      if (dim == loop->temp_dim - 1)
+      if (dim == loop->temp_dim - 1 && loop->parent == NULL)
         info->saved_offset = info->offset;
     }
 }
@@ -3114,8 +3384,9 @@ gfc_trans_scalarizing_loops (gfc_loopinfo * loop, stmtblock_t * body)
   gfc_add_expr_to_block (&loop->pre, tmp);
 
   /* Clear all the used flags.  */
-  for (ss = loop->ss; ss; ss = ss->loop_chain)
-    ss->useflags = 0;
+  for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
+    if (ss->parent == NULL)
+      ss->info->useflags = 0;
 }
 
 
@@ -3147,15 +3418,22 @@ gfc_trans_scalarized_loop_boundary (gfc_loopinfo * loop, stmtblock_t * body)
   /* Restore the initial offsets.  */
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
     {
-      if ((ss->useflags & 2) == 0)
+      gfc_ss_type ss_type;
+      gfc_ss_info *ss_info;
+
+      ss_info = ss->info;
+
+      if ((ss_info->useflags & 2) == 0)
 	continue;
 
-      if (ss->type != GFC_SS_SECTION
-	  && ss->type != GFC_SS_FUNCTION && ss->type != GFC_SS_CONSTRUCTOR
-	  && ss->type != GFC_SS_COMPONENT)
+      ss_type = ss_info->type;
+      if (ss_type != GFC_SS_SECTION
+	  && ss_type != GFC_SS_FUNCTION
+	  && ss_type != GFC_SS_CONSTRUCTOR
+	  && ss_type != GFC_SS_COMPONENT)
 	continue;
 
-      ss->data.info.offset = ss->data.info.saved_offset;
+      ss_info->data.array.offset = ss_info->data.array.saved_offset;
     }
 
   /* Restart all the inner loops we just finished.  */
@@ -3217,12 +3495,12 @@ gfc_conv_section_startstride (gfc_loopinfo * loop, gfc_ss * ss, int dim)
   gfc_expr *stride = NULL;
   tree desc;
   gfc_se se;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   gfc_array_ref *ar;
 
-  gcc_assert (ss->type == GFC_SS_SECTION);
+  gcc_assert (ss->info->type == GFC_SS_SECTION);
 
-  info = &ss->data.info;
+  info = &ss->info->data.array;
   ar = &info->ref->u.ar;
 
   if (ar->dimen_type[dim] == DIMEN_VECTOR)
@@ -3277,25 +3555,25 @@ gfc_conv_ss_startstride (gfc_loopinfo * loop)
   /* Determine the rank of the loop.  */
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
     {
-      switch (ss->type)
+      switch (ss->info->type)
 	{
 	case GFC_SS_SECTION:
 	case GFC_SS_CONSTRUCTOR:
 	case GFC_SS_FUNCTION:
 	case GFC_SS_COMPONENT:
-	  loop->dimen = ss->data.info.dimen;
+	  loop->dimen = ss->dimen;
 	  goto done;
 
 	/* As usual, lbound and ubound are exceptions!.  */
 	case GFC_SS_INTRINSIC:
-	  switch (ss->expr->value.function.isym->id)
+	  switch (ss->info->expr->value.function.isym->id)
 	    {
 	    case GFC_ISYM_LBOUND:
 	    case GFC_ISYM_UBOUND:
 	    case GFC_ISYM_LCOBOUND:
 	    case GFC_ISYM_UCOBOUND:
 	    case GFC_ISYM_THIS_IMAGE:
-	      loop->dimen = ss->data.info.dimen;
+	      loop->dimen = ss->dimen;
 	      goto done;
 
 	    default:
@@ -3315,21 +3593,31 @@ done:
   /* Loop over all the SS in the chain.  */
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
     {
-      if (ss->expr && ss->expr->shape && !ss->shape)
-	ss->shape = ss->expr->shape;
+      gfc_ss_info *ss_info;
+      gfc_array_info *info;
+      gfc_expr *expr;
+
+      ss_info = ss->info;
+      expr = ss_info->expr;
+      info = &ss_info->data.array;
+
+      if (expr && expr->shape && !info->shape)
+	info->shape = expr->shape;
 
-      switch (ss->type)
+      switch (ss_info->type)
 	{
 	case GFC_SS_SECTION:
-	  /* Get the descriptor for the array.  */
-	  gfc_conv_ss_descriptor (&loop->pre, ss, !loop->array_parameter);
+	  /* Get the descriptor for the array.  If it is a cross loops array,
+	     we got the descriptor already in the outermost loop.  */
+	  if (ss->parent == NULL)
+	    gfc_conv_ss_descriptor (&loop->pre, ss, !loop->array_parameter);
 
-	  for (n = 0; n < ss->data.info.dimen; n++)
-	    gfc_conv_section_startstride (loop, ss, ss->data.info.dim[n]);
+	  for (n = 0; n < ss->dimen; n++)
+	    gfc_conv_section_startstride (loop, ss, ss->dim[n]);
 	  break;
 
 	case GFC_SS_INTRINSIC:
-	  switch (ss->expr->value.function.isym->id)
+	  switch (expr->value.function.isym->id)
 	    {
 	    /* Fall through to supply start and stride.  */
 	    case GFC_ISYM_LBOUND:
@@ -3345,11 +3633,13 @@ done:
 
 	case GFC_SS_CONSTRUCTOR:
 	case GFC_SS_FUNCTION:
-	  for (n = 0; n < ss->data.info.dimen; n++)
+	  for (n = 0; n < ss->dimen; n++)
 	    {
-	      ss->data.info.start[n] = gfc_index_zero_node;
-	      ss->data.info.end[n] = gfc_index_zero_node;
-	      ss->data.info.stride[n] = gfc_index_one_node;
+	      int dim = ss->dim[n];
+
+	      info->start[dim]  = gfc_index_zero_node;
+	      info->end[dim]    = gfc_index_zero_node;
+	      info->stride[dim] = gfc_index_one_node;
 	    }
 	  break;
 
@@ -3366,7 +3656,7 @@ done:
       tree end;
       tree size[GFC_MAX_DIMENSIONS];
       tree stride_pos, stride_neg, non_zerosized, tmp2, tmp3;
-      gfc_ss_info *info;
+      gfc_array_info *info;
       char *msg;
       int dim;
 
@@ -3378,18 +3668,27 @@ done:
       for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
 	{
 	  stmtblock_t inner;
+	  gfc_ss_info *ss_info;
+	  gfc_expr *expr;
+	  locus *expr_loc;
+	  const char *expr_name;
 
-	  if (ss->type != GFC_SS_SECTION)
+	  ss_info = ss->info;
+	  if (ss_info->type != GFC_SS_SECTION)
 	    continue;
 
 	  /* Catch allocatable lhs in f2003.  */
 	  if (gfc_option.flag_realloc_lhs && ss->is_alloc_lhs)
 	    continue;
 
+	  expr = ss_info->expr;
+	  expr_loc = &expr->where;
+	  expr_name = expr->symtree->name;
+
 	  gfc_start_block (&inner);
 
 	  /* TODO: range checking for mapped dimensions.  */
-	  info = &ss->data.info;
+	  info = &ss_info->data.array;
 
 	  /* This code only checks ranges.  Elemental and vector
 	     dimensions are checked later.  */
@@ -3397,7 +3696,7 @@ done:
 	    {
 	      bool check_upper;
 
-	      dim = info->dim[n];
+	      dim = ss->dim[n];
 	      if (info->ref->u.ar.dimen_type[dim] != DIMEN_RANGE)
 		continue;
 
@@ -3411,12 +3710,12 @@ done:
 	      tmp = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node,
 				     info->stride[dim], gfc_index_zero_node);
 	      asprintf (&msg, "Zero stride is not allowed, for dimension %d "
-			"of array '%s'", dim + 1, ss->expr->symtree->name);
+			"of array '%s'", dim + 1, expr_name);
 	      gfc_trans_runtime_check (true, false, tmp, &inner,
-				       &ss->expr->where, msg);
+				       expr_loc, msg);
 	      free (msg);
 
-	      desc = ss->data.info.descriptor;
+	      desc = info->descriptor;
 
 	      /* This is the run-time equivalent of resolve.c's
 		 check_dimension().  The logical is more readable there
@@ -3470,14 +3769,14 @@ done:
 					  non_zerosized, tmp2);
 		  asprintf (&msg, "Index '%%ld' of dimension %d of array '%s' "
 			    "outside of expected range (%%ld:%%ld)",
-			    dim + 1, ss->expr->symtree->name);
+			    dim + 1, expr_name);
 		  gfc_trans_runtime_check (true, false, tmp, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, info->start[dim]),
 		     fold_convert (long_integer_type_node, lbound),
 		     fold_convert (long_integer_type_node, ubound));
 		  gfc_trans_runtime_check (true, false, tmp2, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, info->start[dim]),
 		     fold_convert (long_integer_type_node, lbound),
 		     fold_convert (long_integer_type_node, ubound));
@@ -3492,9 +3791,9 @@ done:
 					 boolean_type_node, non_zerosized, tmp);
 		  asprintf (&msg, "Index '%%ld' of dimension %d of array '%s' "
 			    "below lower bound of %%ld",
-			    dim + 1, ss->expr->symtree->name);
+			    dim + 1, expr_name);
 		  gfc_trans_runtime_check (true, false, tmp, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, info->start[dim]),
 		     fold_convert (long_integer_type_node, lbound));
 		  free (msg);
@@ -3524,14 +3823,14 @@ done:
 					  boolean_type_node, non_zerosized, tmp3);
 		  asprintf (&msg, "Index '%%ld' of dimension %d of array '%s' "
 			    "outside of expected range (%%ld:%%ld)",
-			    dim + 1, ss->expr->symtree->name);
+			    dim + 1, expr_name);
 		  gfc_trans_runtime_check (true, false, tmp2, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, tmp),
 		     fold_convert (long_integer_type_node, ubound), 
 		     fold_convert (long_integer_type_node, lbound));
 		  gfc_trans_runtime_check (true, false, tmp3, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, tmp),
 		     fold_convert (long_integer_type_node, ubound), 
 		     fold_convert (long_integer_type_node, lbound));
@@ -3541,9 +3840,9 @@ done:
 		{
 		  asprintf (&msg, "Index '%%ld' of dimension %d of array '%s' "
 			    "below lower bound of %%ld",
-			    dim + 1, ss->expr->symtree->name);
+			    dim + 1, expr_name);
 		  gfc_trans_runtime_check (true, false, tmp2, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 		     fold_convert (long_integer_type_node, tmp),
 		     fold_convert (long_integer_type_node, lbound));
 		  free (msg);
@@ -3570,10 +3869,10 @@ done:
 					  boolean_type_node, tmp, size[n]);
 		  asprintf (&msg, "Array bound mismatch for dimension %d "
 			    "of array '%s' (%%ld/%%ld)",
-			    dim + 1, ss->expr->symtree->name);
+			    dim + 1, expr_name);
 
 		  gfc_trans_runtime_check (true, false, tmp3, &inner,
-					   &ss->expr->where, msg,
+					   expr_loc, msg,
 			fold_convert (long_integer_type_node, tmp),
 			fold_convert (long_integer_type_node, size[n]));
 
@@ -3587,10 +3886,10 @@ done:
 
 	  /* For optional arguments, only check bounds if the argument is
 	     present.  */
-	  if (ss->expr->symtree->n.sym->attr.optional
-	      || ss->expr->symtree->n.sym->attr.not_always_present)
+	  if (expr->symtree->n.sym->attr.optional
+	      || expr->symtree->n.sym->attr.not_always_present)
 	    tmp = build3_v (COND_EXPR,
-			    gfc_conv_expr_present (ss->expr->symtree->n.sym),
+			    gfc_conv_expr_present (expr->symtree->n.sym),
 			    tmp, build_empty_stmt (input_location));
 
 	  gfc_add_expr_to_block (&block, tmp);
@@ -3600,6 +3899,9 @@ done:
       tmp = gfc_finish_block (&block);
       gfc_add_expr_to_block (&loop->pre, tmp);
     }
+
+  for (loop = loop->nested; loop; loop = loop->next)
+    gfc_conv_ss_startstride (loop);
 }
 
 /* Return true if both symbols could refer to the same data object.  Does
@@ -3643,12 +3945,16 @@ gfc_could_be_alias (gfc_ss * lss, gfc_ss * rss)
 {
   gfc_ref *lref;
   gfc_ref *rref;
+  gfc_expr *lexpr, *rexpr;
   gfc_symbol *lsym;
   gfc_symbol *rsym;
   bool lsym_pointer, lsym_target, rsym_pointer, rsym_target;
 
-  lsym = lss->expr->symtree->n.sym;
-  rsym = rss->expr->symtree->n.sym;
+  lexpr = lss->info->expr;
+  rexpr = rss->info->expr;
+
+  lsym = lexpr->symtree->n.sym;
+  rsym = rexpr->symtree->n.sym;
 
   lsym_pointer = lsym->attr.pointer;
   lsym_target = lsym->attr.target;
@@ -3666,7 +3972,7 @@ gfc_could_be_alias (gfc_ss * lss, gfc_ss * rss)
   /* For derived types we must check all the component types.  We can ignore
      array references as these will have the same base type as the previous
      component ref.  */
-  for (lref = lss->expr->ref; lref != lss->data.info.ref; lref = lref->next)
+  for (lref = lexpr->ref; lref != lss->info->data.array.ref; lref = lref->next)
     {
       if (lref->type != REF_COMPONENT)
 	continue;
@@ -3686,7 +3992,7 @@ gfc_could_be_alias (gfc_ss * lss, gfc_ss * rss)
 	    return 1;
 	}
 
-      for (rref = rss->expr->ref; rref != rss->data.info.ref;
+      for (rref = rexpr->ref; rref != rss->info->data.array.ref;
 	   rref = rref->next)
 	{
 	  if (rref->type != REF_COMPONENT)
@@ -3721,7 +4027,7 @@ gfc_could_be_alias (gfc_ss * lss, gfc_ss * rss)
   lsym_pointer = lsym->attr.pointer;
   lsym_target = lsym->attr.target;
 
-  for (rref = rss->expr->ref; rref != rss->data.info.ref; rref = rref->next)
+  for (rref = rexpr->ref; rref != rss->info->data.array.ref; rref = rref->next)
     {
       if (rref->type != REF_COMPONENT)
 	break;
@@ -3746,6 +4052,14 @@ gfc_could_be_alias (gfc_ss * lss, gfc_ss * rss)
 }
 
 
+void gfc_make_loop_temp_ss (tree type, tree string_length, gfc_loopinfo *loop)
+{
+  loop->temp_ss = gfc_get_temp_ss (type, string_length, loop->dimen);
+  gcc_assert (loop->temp_ss->dimen == loop->dimen);
+  gfc_add_ss_to_loop (loop, loop->temp_ss);
+}
+
+
 /* Resolve array data dependencies.  Creates a temporary if required.  */
 /* TODO: Calc dependencies with gfc_expr rather than gfc_ss, and move to
    dependency.c.  */
@@ -3757,20 +4071,25 @@ gfc_conv_resolve_dependencies (gfc_loopinfo * loop, gfc_ss * dest,
   gfc_ss *ss;
   gfc_ref *lref;
   gfc_ref *rref;
+  gfc_expr *dest_expr;
+  gfc_expr *ss_expr;
   int nDepend = 0;
   int i, j;
 
   loop->temp_ss = NULL;
+  dest_expr = dest->info->expr;
 
   for (ss = rss; ss != gfc_ss_terminator; ss = ss->next)
     {
-      if (ss->type != GFC_SS_SECTION)
+      if (ss->info->type != GFC_SS_SECTION)
 	continue;
 
-      if (dest->expr->symtree->n.sym != ss->expr->symtree->n.sym)
+      ss_expr = ss->info->expr;
+
+      if (dest_expr->symtree->n.sym != ss_expr->symtree->n.sym)
 	{
 	  if (gfc_could_be_alias (dest, ss)
-		|| gfc_are_equivalenced_arrays (dest->expr, ss->expr))
+	      || gfc_are_equivalenced_arrays (dest_expr, ss_expr))
 	    {
 	      nDepend = 1;
 	      break;
@@ -3778,18 +4097,18 @@ gfc_conv_resolve_dependencies (gfc_loopinfo * loop, gfc_ss * dest,
 	}
       else
 	{
-	  lref = dest->expr->ref;
-	  rref = ss->expr->ref;
+	  lref = dest_expr->ref;
+	  rref = ss_expr->ref;
 
 	  nDepend = gfc_dep_resolver (lref, rref, &loop->reverse[0]);
 
 	  if (nDepend == 1)
 	    break;
 
-	  for (i = 0; i < dest->data.info.dimen; i++)
-	    for (j = 0; j < ss->data.info.dimen; j++)
+	  for (i = 0; i < dest->dimen; i++)
+	    for (j = 0; j < ss->dimen; j++)
 	      if (i != j
-		  && dest->data.info.dim[i] == ss->data.info.dim[j])
+		  && dest->dim[i] == ss->dim[j])
 		{
 		  /* If we don't access array elements in the same order,
 		     there is a dependency.  */
@@ -3838,38 +4157,36 @@ temporary:
 
   if (nDepend == 1)
     {
-      tree base_type = gfc_typenode_for_spec (&dest->expr->ts);
+      tree base_type = gfc_typenode_for_spec (&dest_expr->ts);
       if (GFC_ARRAY_TYPE_P (base_type)
 	  || GFC_DESCRIPTOR_TYPE_P (base_type))
 	base_type = gfc_get_element_type (base_type);
-      loop->temp_ss = gfc_get_temp_ss (base_type, dest->string_length,
-				       loop->dimen);
-      gfc_add_ss_to_loop (loop, loop->temp_ss);
+      gfc_make_loop_temp_ss (base_type, dest->info->string_length, loop);
     }
   else
     loop->temp_ss = NULL;
 }
 
 
-/* Initialize the scalarization loop.  Creates the loop variables.  Determines
-   the range of the loop variables.  Creates a temporary if required.
-   Calculates how to transform from loop variables to array indices for each
-   expression.  Also generates code for scalar expressions which have been
-   moved outside the loop.  */
+/* Browse through each array's information from the scalarizer and set the loop
+   bounds according to the "best" one (per dimension), i.e. the one which
+   provides the most information (constant bounds, shape, etc).  */
 
-void
-gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
+static void
+set_loop_bounds (gfc_loopinfo *loop)
 {
   int n, dim, spec_dim;
-  gfc_ss_info *info;
-  gfc_ss_info *specinfo;
+  gfc_array_info *info;
+  gfc_array_info *specinfo;
   gfc_ss *ss;
   tree tmp;
-  gfc_ss *loopspec[GFC_MAX_DIMENSIONS];
+  gfc_ss **loopspec;
   bool dynamic[GFC_MAX_DIMENSIONS];
   mpz_t *cshape;
   mpz_t i;
 
+  loopspec = loop->specloop;
+
   mpz_init (i);
   for (n = 0; n < loop->dimen; n++)
     {
@@ -3879,16 +4196,21 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	 loop for this dimension.  We try to pick the simplest term.  */
       for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
 	{
-	  if (ss->type == GFC_SS_SCALAR || ss->type == GFC_SS_REFERENCE)
+	  gfc_ss_type ss_type;
+
+	  ss_type = ss->info->type;
+	  if (ss_type == GFC_SS_SCALAR
+	      || ss_type == GFC_SS_TEMP
+	      || ss_type == GFC_SS_REFERENCE)
 	    continue;
 
-	  info = &ss->data.info;
-	  dim = info->dim[n];
+	  info = &ss->info->data.array;
+	  dim = ss->dim[n];
 
 	  if (loopspec[n] != NULL)
 	    {
-	      specinfo = &loopspec[n]->data.info;
-	      spec_dim = specinfo->dim[n];
+	      specinfo = &loopspec[n]->info->data.array;
+	      spec_dim = loopspec[n]->dim[n];
 	    }
 	  else
 	    {
@@ -3897,19 +4219,19 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	      spec_dim = 0;
 	    }
 
-	  if (ss->shape)
+	  if (info->shape)
 	    {
-	      gcc_assert (ss->shape[dim]);
+	      gcc_assert (info->shape[dim]);
 	      /* The frontend has worked out the size for us.  */
 	      if (!loopspec[n]
-		  || !loopspec[n]->shape
+		  || !specinfo->shape
 		  || !integer_zerop (specinfo->start[spec_dim]))
 		/* Prefer zero-based descriptors if possible.  */
 		loopspec[n] = ss;
 	      continue;
 	    }
 
-	  if (ss->type == GFC_SS_CONSTRUCTOR)
+	  if (ss_type == GFC_SS_CONSTRUCTOR)
 	    {
 	      gfc_constructor_base base;
 	      /* An unknown size constructor will always be rank one.
@@ -3921,7 +4243,7 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 		 can be determined at compile time.  Prefer not to otherwise,
 		 since the general case involves realloc, and it's better to
 		 avoid that overhead if possible.  */
-	      base = ss->expr->value.constructor;
+	      base = ss->info->expr->value.constructor;
 	      dynamic[n] = gfc_get_array_constructor_size (&i, base);
 	      if (!dynamic[n] || !loopspec[n])
 		loopspec[n] = ss;
@@ -3930,7 +4252,7 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 
 	  /* TODO: Pick the best bound if we have a choice between a
 	     function and something else.  */
-	  if (ss->type == GFC_SS_FUNCTION)
+	  if (ss_type == GFC_SS_FUNCTION)
 	    {
 	      loopspec[n] = ss;
 	      continue;
@@ -3941,7 +4263,7 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	  if (loopspec[n] && ss->is_alloc_lhs)
 	    continue;
 
-	  if (ss->type != GFC_SS_SECTION)
+	  if (ss_type != GFC_SS_SECTION)
 	    continue;
 
 	  if (!loopspec[n])
@@ -3953,7 +4275,7 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	     known lower bound
 	     known upper bound
 	   */
-	  else if ((loopspec[n]->type == GFC_SS_CONSTRUCTOR && dynamic[n])
+	  else if ((loopspec[n]->info->type == GFC_SS_CONSTRUCTOR && dynamic[n])
 		   || n >= loop->dimen)
 	    loopspec[n] = ss;
 	  else if (integer_onep (info->stride[dim])
@@ -3975,16 +4297,16 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	 that's bad news.  */
       gcc_assert (loopspec[n]);
 
-      info = &loopspec[n]->data.info;
-      dim = info->dim[n];
+      info = &loopspec[n]->info->data.array;
+      dim = loopspec[n]->dim[n];
 
       /* Set the extents of this range.  */
-      cshape = loopspec[n]->shape;
+      cshape = info->shape;
       if (cshape && INTEGER_CST_P (info->start[dim])
 	  && INTEGER_CST_P (info->stride[dim]))
 	{
 	  loop->from[n] = info->start[dim];
-	  mpz_set (i, cshape[get_array_ref_dim (info, n)]);
+	  mpz_set (i, cshape[get_array_ref_dim_for_loop_dim (loopspec[n], n)]);
 	  mpz_sub_ui (i, i, 1);
 	  /* To = from + (size - 1) * stride.  */
 	  tmp = gfc_conv_mpz_to_tree (i, gfc_index_integer_kind);
@@ -3999,7 +4321,7 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
       else
 	{
 	  loop->from[n] = info->start[dim];
-	  switch (loopspec[n]->type)
+	  switch (loopspec[n]->info->type)
 	    {
 	    case GFC_SS_CONSTRUCTOR:
 	      /* The upper bound is calculated when we expand the
@@ -4046,65 +4368,98 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	  loop->from[n] = gfc_index_zero_node;
 	}
     }
+  mpz_clear (i);
+
+  for (loop = loop->nested; loop; loop = loop->next)
+    set_loop_bounds (loop);
+}
+
+
+/* Initialize the scalarization loop.  Creates the loop variables.  Determines
+   the range of the loop variables.  Creates a temporary if required.
+   Also generates code for scalar expressions which have been
+   moved outside the loop.  */
+
+void
+gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
+{
+  gfc_ss *tmp_ss;
+  tree tmp;
+
+  set_loop_bounds (loop);
 
   /* Add all the scalar code that can be taken out of the loops.
      This may include calculating the loop bounds, so do it before
      allocating the temporary.  */
-  gfc_add_loop_ss_code (loop, loop->ss, false, where);
+  add_loop_ss_code (loop, loop->ss, false);
 
+  tmp_ss = loop->temp_ss;
   /* If we want a temporary then create it.  */
-  if (loop->temp_ss != NULL)
+  if (tmp_ss != NULL)
     {
-      gcc_assert (loop->temp_ss->type == GFC_SS_TEMP);
+      gfc_ss_info *tmp_ss_info;
+
+      tmp_ss_info = tmp_ss->info;
+      gcc_assert (tmp_ss_info->type == GFC_SS_TEMP);
+      gcc_assert (loop->parent == NULL);
 
       /* Make absolutely sure that this is a complete type.  */
-      if (loop->temp_ss->string_length)
-	loop->temp_ss->data.temp.type
+      if (tmp_ss_info->string_length)
+	tmp_ss_info->data.temp.type
 		= gfc_get_character_type_len_for_eltype
-			(TREE_TYPE (loop->temp_ss->data.temp.type),
-			 loop->temp_ss->string_length);
+			(TREE_TYPE (tmp_ss_info->data.temp.type),
+			 tmp_ss_info->string_length);
 
-      tmp = loop->temp_ss->data.temp.type;
-      n = loop->temp_ss->data.temp.dimen;
-      memset (&loop->temp_ss->data.info, 0, sizeof (gfc_ss_info));
-      loop->temp_ss->type = GFC_SS_SECTION;
-      loop->temp_ss->data.info.dimen = n;
+      tmp = tmp_ss_info->data.temp.type;
+      memset (&tmp_ss_info->data.array, 0, sizeof (gfc_array_info));
+      tmp_ss_info->type = GFC_SS_SECTION;
 
-      gcc_assert (loop->temp_ss->data.info.dimen != 0);
-      for (n = 0; n < loop->temp_ss->data.info.dimen; n++)
-	loop->temp_ss->data.info.dim[n] = n;
+      gcc_assert (tmp_ss->dimen != 0);
 
-      gfc_trans_create_temp_array (&loop->pre, &loop->post, loop,
-				   &loop->temp_ss->data.info, tmp, NULL_TREE,
-				   false, true, false, where);
+      gfc_trans_create_temp_array (&loop->pre, &loop->post, tmp_ss, tmp,
+				   NULL_TREE, false, true, false, where);
     }
 
-  for (n = 0; n < loop->temp_dim; n++)
-    loopspec[loop->order[n]] = NULL;
-
-  mpz_clear (i);
-
   /* For array parameters we don't have loop variables, so don't calculate the
      translations.  */
-  if (loop->array_parameter)
-    return;
+  if (!loop->array_parameter)
+    gfc_set_delta (loop);
+}
+
+
+/* Calculates how to transform from loop variables to array indices for each
+   array: once loop bounds are chosen, sets the difference (DELTA field) between
+   loop bounds and array reference bounds, for each array info.  */
+
+void
+gfc_set_delta (gfc_loopinfo *loop)
+{
+  gfc_ss *ss, **loopspec;
+  gfc_array_info *info;
+  tree tmp;
+  int n, dim;
+
+  loopspec = loop->specloop;
 
   /* Calculate the translation from loop variables to array indices.  */
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
     {
-      if (ss->type != GFC_SS_SECTION && ss->type != GFC_SS_COMPONENT
-	    && ss->type != GFC_SS_CONSTRUCTOR)
+      gfc_ss_type ss_type;
 
+      ss_type = ss->info->type;
+      if (ss_type != GFC_SS_SECTION
+	  && ss_type != GFC_SS_COMPONENT
+	  && ss_type != GFC_SS_CONSTRUCTOR)
 	continue;
 
-      info = &ss->data.info;
+      info = &ss->info->data.array;
 
-      for (n = 0; n < info->dimen; n++)
+      for (n = 0; n < ss->dimen; n++)
 	{
 	  /* If we are specifying the range the delta is already set.  */
 	  if (loopspec[n] != ss)
 	    {
-	      dim = ss->data.info.dim[n];
+	      dim = ss->dim[n];
 
 	      /* Calculate the offset relative to the loop variable.
 		 First multiply by the stride.  */
@@ -4123,6 +4478,9 @@ gfc_conv_loop_setup (gfc_loopinfo * loop, locus * where)
 	    }
 	}
     }
+
+  for (loop = loop->nested; loop; loop = loop->next)
+    gfc_set_delta (loop);
 }
 
 
@@ -5662,15 +6020,17 @@ get_array_charlen (gfc_expr *expr, gfc_se *se)
     }
 }
 
+
 /* Helper function to check dimensions.  */
 static bool
-dim_ok (gfc_ss_info *info)
+transposed_dims (gfc_ss *ss)
 {
   int n;
-  for (n = 0; n < info->dimen; n++)
-    if (info->dim[n] != n)
-      return false;
-  return true;
+
+  for (n = 0; n < ss->dimen; n++)
+    if (ss->dim[n] != n)
+      return true;
+  return false;
 }
 
 /* Convert an array for passing as an actual argument.  Expressions and
@@ -5705,8 +6065,10 @@ dim_ok (gfc_ss_info *info)
 void
 gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
 {
+  gfc_ss_type ss_type;
+  gfc_ss_info *ss_info;
   gfc_loopinfo loop;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   int need_tmp;
   int n;
   tree tmp;
@@ -5716,11 +6078,15 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
   tree offset;
   int full;
   bool subref_array_target = false;
-  gfc_expr *arg;
+  gfc_expr *arg, *ss_expr;
 
   gcc_assert (ss != NULL);
   gcc_assert (ss != gfc_ss_terminator);
 
+  ss_info = ss->info;
+  ss_type = ss_info->type;
+  ss_expr = ss_info->expr;
+
   /* Special case things we know we can pass easily.  */
   switch (expr->expr_type)
     {
@@ -5728,9 +6094,9 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
       /* If we have a linear array section, we can pass it directly.
 	 Otherwise we need to copy it into a temporary.  */
 
-      gcc_assert (ss->type == GFC_SS_SECTION);
-      gcc_assert (ss->expr == expr);
-      info = &ss->data.info;
+      gcc_assert (ss_type == GFC_SS_SECTION);
+      gcc_assert (ss_expr == expr);
+      info = &ss_info->data.array;
 
       /* Get the descriptor for the array.  */
       gfc_conv_ss_descriptor (&se->pre, ss, 0);
@@ -5757,7 +6123,7 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
       else
 	full = gfc_full_array_ref_p (info->ref, NULL);
 
-      if (full && dim_ok (info))
+      if (full && !transposed_dims (ss))
 	{
 	  if (se->direct_byref && !se->byref_noassign)
 	    {
@@ -5807,7 +6173,7 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
 
       if (se->direct_byref)
 	{
-	  gcc_assert (ss->type == GFC_SS_FUNCTION && ss->expr == expr);
+	  gcc_assert (ss_type == GFC_SS_FUNCTION && ss_expr == expr);
 
 	  /* For pointer assignments pass the descriptor directly.  */
 	  if (se->ss == NULL)
@@ -5819,16 +6185,17 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
 	  return;
 	}
 
-      if (ss->expr != expr || ss->type != GFC_SS_FUNCTION)
+      if (ss_expr != expr || ss_type != GFC_SS_FUNCTION)
 	{
-	  if (ss->expr != expr)
+	  if (ss_expr != expr)
 	    /* Elemental function.  */
 	    gcc_assert ((expr->value.function.esym != NULL
 			 && expr->value.function.esym->attr.elemental)
 			|| (expr->value.function.isym != NULL
-			    && expr->value.function.isym->elemental));
+			    && expr->value.function.isym->elemental)
+			|| gfc_inline_intrinsic_function_p (expr));
 	  else
-	    gcc_assert (ss->type == GFC_SS_INTRINSIC);
+	    gcc_assert (ss_type == GFC_SS_INTRINSIC);
 
 	  need_tmp = 1;
 	  if (expr->ts.type == BT_CHARACTER
@@ -5840,19 +6207,19 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
       else
 	{
 	  /* Transformational function.  */
-	  info = &ss->data.info;
+	  info = &ss_info->data.array;
 	  need_tmp = 0;
 	}
       break;
 
     case EXPR_ARRAY:
       /* Constant array constructors don't need a temporary.  */
-      if (ss->type == GFC_SS_CONSTRUCTOR
+      if (ss_type == GFC_SS_CONSTRUCTOR
 	  && expr->ts.type != BT_CHARACTER
 	  && gfc_constant_array_constructor_p (expr->value.constructor))
 	{
 	  need_tmp = 0;
-	  info = &ss->data.info;
+	  info = &ss_info->data.array;
 	}
       else
 	{
@@ -5894,15 +6261,13 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
 	get_array_charlen (expr, se);
 
       /* Tell the scalarizer to make a temporary.  */
-      loop.temp_ss = gfc_get_temp_ss (gfc_typenode_for_spec (&expr->ts),
-				      ((expr->ts.type == BT_CHARACTER)
-				       ? expr->ts.u.cl->backend_decl
-				       : NULL),
-				      loop.dimen);
+      gfc_make_loop_temp_ss (gfc_typenode_for_spec (&expr->ts),
+			     ((expr->ts.type == BT_CHARACTER)
+			      ? expr->ts.u.cl->backend_decl
+			      : NULL),
+			     &loop);
 
-      se->string_length = loop.temp_ss->string_length;
-      gcc_assert (loop.temp_ss->data.temp.dimen == loop.dimen);
-      gfc_add_ss_to_loop (&loop, loop.temp_ss);
+      se->string_length = loop.temp_ss->info->string_length;
     }
 
   gfc_conv_loop_setup (&loop, & expr->where);
@@ -5952,12 +6317,12 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
       /* Finish the copying loops.  */
       gfc_trans_scalarizing_loops (&loop, &block);
 
-      desc = loop.temp_ss->data.info.descriptor;
+      desc = loop.temp_ss->info->data.array.descriptor;
     }
-  else if (expr->expr_type == EXPR_FUNCTION && dim_ok (info))
+  else if (expr->expr_type == EXPR_FUNCTION && !transposed_dims (ss))
     {
       desc = info->descriptor;
-      se->string_length = ss->string_length;
+      se->string_length = ss_info->string_length;
     }
   else
     {
@@ -5974,7 +6339,7 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
       tree to;
       tree base;
 
-      ndim = info->ref ? info->ref->u.ar.dimen : info->dimen;
+      ndim = info->ref ? info->ref->u.ar.dimen : ss->dimen;
 
       if (se->want_coarray)
 	{
@@ -6058,8 +6423,8 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
 	      && info->ref->u.ar.dimen_type[n] == DIMEN_ELEMENT)
 	    {
 	      gcc_assert (info->subscript[n]
-		      && info->subscript[n]->type == GFC_SS_SCALAR);
-	      start = info->subscript[n]->data.scalar.expr;
+			  && info->subscript[n]->info->type == GFC_SS_SCALAR);
+	      start = info->subscript[n]->info->data.scalar.value;
 	    }
 	  else
 	    {
@@ -6089,7 +6454,7 @@ gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
  
 	  /* look for the corresponding scalarizer dimension: dim.  */
 	  for (dim = 0; dim < ndim; dim++)
-	    if (info->dim[dim] == n)
+	    if (ss->dim[dim] == n)
 	      break;
 
 	  /* loop exited early: the DIM being looked for has been found.  */
@@ -7145,6 +7510,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   stmtblock_t fblock;
   gfc_ss *rss;
   gfc_ss *lss;
+  gfc_array_info *linfo;
   tree realloc_expr;
   tree alloc_expr;
   tree size1;
@@ -7175,11 +7541,11 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
       /* Find the ss for the lhs.  */
       lss = loop->ss;
       for (; lss && lss != gfc_ss_terminator; lss = lss->loop_chain)
-	if (lss->expr && lss->expr->expr_type == EXPR_VARIABLE)
+	if (lss->info->expr && lss->info->expr->expr_type == EXPR_VARIABLE)
 	  break;
       if (lss == gfc_ss_terminator)
 	return NULL_TREE;
-      expr1 = lss->expr;
+      expr1 = lss->info->expr;
     }
 
   /* Bail out if this is not a valid allocate on assignment.  */
@@ -7190,17 +7556,19 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   /* Find the ss for the lhs.  */
   lss = loop->ss;
   for (; lss && lss != gfc_ss_terminator; lss = lss->loop_chain)
-    if (lss->expr == expr1)
+    if (lss->info->expr == expr1)
       break;
 
   if (lss == gfc_ss_terminator)
     return NULL_TREE;
 
+  linfo = &lss->info->data.array;
+
   /* Find an ss for the rhs. For operator expressions, we see the
      ss's for the operands. Any one of these will do.  */
   rss = loop->ss;
   for (; rss && rss != gfc_ss_terminator; rss = rss->loop_chain)
-    if (rss->expr != expr1 && rss != loop->temp_ss)
+    if (rss->info->expr != expr1 && rss != loop->temp_ss)
       break;
 
   if (expr2 && rss == gfc_ss_terminator)
@@ -7210,7 +7578,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 
   /* Since the lhs is allocatable, this must be a descriptor type.
      Get the data and array size.  */
-  desc = lss->data.info.descriptor;
+  desc = linfo->descriptor;
   gcc_assert (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)));
   array1 = gfc_conv_descriptor_data_get (desc);
 
@@ -7280,7 +7648,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 
   /* Get the rhs size.  Fix both sizes.  */
   if (expr2)
-    desc2 = rss->data.info.descriptor;
+    desc2 = rss->info->data.array.descriptor;
   else
     desc2 = NULL_TREE;
   size2 = gfc_index_one_node;
@@ -7370,21 +7738,21 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
      running offset.  Use the saved_offset instead.  */
   tmp = gfc_conv_descriptor_offset (desc);
   gfc_add_modify (&fblock, tmp, offset);
-  if (lss->data.info.saved_offset
-	&& TREE_CODE (lss->data.info.saved_offset) == VAR_DECL)
-      gfc_add_modify (&fblock, lss->data.info.saved_offset, tmp);
+  if (linfo->saved_offset
+      && TREE_CODE (linfo->saved_offset) == VAR_DECL)
+    gfc_add_modify (&fblock, linfo->saved_offset, tmp);
 
   /* Now set the deltas for the lhs.  */
   for (n = 0; n < expr1->rank; n++)
     {
       tmp = gfc_conv_descriptor_lbound_get (desc, gfc_rank_cst[n]);
-      dim = lss->data.info.dim[n];
+      dim = lss->dim[n];
       tmp = fold_build2_loc (input_location, MINUS_EXPR,
 			     gfc_array_index_type, tmp,
 			     loop->from[dim]);
-      if (lss->data.info.delta[dim]
-	    && TREE_CODE (lss->data.info.delta[dim]) == VAR_DECL)
-	gfc_add_modify (&fblock, lss->data.info.delta[dim], tmp);
+      if (linfo->delta[dim]
+	  && TREE_CODE (linfo->delta[dim]) == VAR_DECL)
+	gfc_add_modify (&fblock, linfo->delta[dim], tmp);
     }
 
   /* Get the new lhs size in bytes.  */
@@ -7448,11 +7816,11 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   gfc_add_expr_to_block (&fblock, tmp);
 
   /* Make sure that the scalarizer data pointer is updated.  */
-  if (lss->data.info.data
-	&& TREE_CODE (lss->data.info.data) == VAR_DECL)
+  if (linfo->data
+      && TREE_CODE (linfo->data) == VAR_DECL)
     {
       tmp = gfc_conv_descriptor_data_get (desc);
-      gfc_add_modify (&fblock, lss->data.info.data, tmp);
+      gfc_add_modify (&fblock, linfo->data, tmp);
     }
 
   /* Add the exit label.  */
@@ -7636,13 +8004,13 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
       switch (ar->type)
 	{
 	case AR_ELEMENT:
-	  for (n = ar->dimen + ar->codimen - 1; n >= 0; n--)
+	  for (n = ar->dimen - 1; n >= 0; n--)
 	    ss = gfc_get_scalar_ss (ss, ar->start[n]);
 	  break;
 
 	case AR_FULL:
 	  newss = gfc_get_array_ss (ss, expr, ar->as->rank, GFC_SS_SECTION);
-	  newss->data.info.ref = ref;
+	  newss->info->data.array.ref = ref;
 
 	  /* Make sure array is the same as array(:,:), this way
 	     we don't need to special case all the time.  */
@@ -7660,7 +8028,7 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
 
 	case AR_SECTION:
 	  newss = gfc_get_array_ss (ss, expr, 0, GFC_SS_SECTION);
-	  newss->data.info.ref = ref;
+	  newss->info->data.array.ref = ref;
 
 	  /* We add SS chains for all the subscripts in the section.  */
 	  for (n = 0; n < ar->dimen; n++)
@@ -7674,14 +8042,14 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
 		  gcc_assert (ar->start[n]);
 		  indexss = gfc_get_scalar_ss (gfc_ss_terminator, ar->start[n]);
 		  indexss->loop_chain = gfc_ss_terminator;
-		  newss->data.info.subscript[n] = indexss;
+		  newss->info->data.array.subscript[n] = indexss;
 		  break;
 
 		case DIMEN_RANGE:
                   /* We don't add anything for sections, just remember this
                      dimension for later.  */
-		  newss->data.info.dim[newss->data.info.dimen] = n;
-		  newss->data.info.dimen++;
+		  newss->dim[newss->dimen] = n;
+		  newss->dimen++;
 		  break;
 
 		case DIMEN_VECTOR:
@@ -7690,9 +8058,9 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
 		  indexss = gfc_get_array_ss (gfc_ss_terminator, ar->start[n],
 					      1, GFC_SS_VECTOR);
 		  indexss->loop_chain = gfc_ss_terminator;
-		  newss->data.info.subscript[n] = indexss;
-		  newss->data.info.dim[newss->data.info.dimen] = n;
-		  newss->data.info.dimen++;
+		  newss->info->data.array.subscript[n] = indexss;
+		  newss->dim[newss->dimen] = n;
+		  newss->dimen++;
 		  break;
 
 		default:
@@ -7702,8 +8070,8 @@ gfc_walk_array_ref (gfc_ss * ss, gfc_expr * expr, gfc_ref * ref)
 	    }
 	  /* We should have at least one non-elemental dimension,
 	     unless we are creating a descriptor for a (scalar) coarray.  */
-	  gcc_assert (newss->data.info.dimen > 0
-		      || newss->data.info.ref->u.ar.as->corank > 0);
+	  gcc_assert (newss->dimen > 0
+		      || newss->info->data.array.ref->u.ar.as->corank > 0);
 	  ss = newss;
 	  break;
 
@@ -7814,7 +8182,7 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
 	  /* Scalar argument.  */
 	  gcc_assert (type == GFC_SS_SCALAR || type == GFC_SS_REFERENCE);
 	  newss = gfc_get_scalar_ss (head, arg->expr);
-	  newss->type = type;
+	  newss->info->type = type;
 	}
       else
 	scalar = 0;
diff --git a/trans-array.h b/trans-array.h
index 4d737bd..9894b6a 100644
--- a/trans-array.h
+++ b/trans-array.h
@@ -31,9 +31,8 @@ void gfc_set_loop_bounds_from_array_spec (gfc_interface_mapping *,
 					  gfc_se *, gfc_array_spec *);
 
 /* Generate code to create a temporary array.  */
-tree gfc_trans_create_temp_array (stmtblock_t *, stmtblock_t *, gfc_loopinfo *,
-				  gfc_ss_info *, tree, tree, bool, bool, bool,
-				  locus *);
+tree gfc_trans_create_temp_array (stmtblock_t *, stmtblock_t *, gfc_ss *,
+				  tree, tree, bool, bool, bool, locus *);
 
 /* Generate function entry code for allocation of compiler allocated array
    variables.  */
@@ -89,6 +88,8 @@ void gfc_add_ss_to_loop (gfc_loopinfo *, gfc_ss *);
 void gfc_mark_ss_chain_used (gfc_ss *, unsigned);
 /* Free a gfc_ss chain.  */
 void gfc_free_ss_chain (gfc_ss *);
+/* Free a single gfc_ss element.  */
+void gfc_free_ss (gfc_ss *);
 /* Allocate a new array type ss.  */
 gfc_ss *gfc_get_array_ss (gfc_ss *, gfc_expr *, int, gfc_ss_type);
 /* Allocate a new temporary type ss.  */
@@ -112,6 +113,10 @@ void gfc_trans_scalarizing_loops (gfc_loopinfo *, stmtblock_t *);
 void gfc_trans_scalarized_loop_boundary (gfc_loopinfo *, stmtblock_t *);
 /* Initialize the scalarization loop parameters.  */
 void gfc_conv_loop_setup (gfc_loopinfo *, locus *);
+/* Set each array's delta.  */
+void gfc_set_delta (gfc_loopinfo *);
+/* Create and register a new gfc_ss for the loop's temporary.  */
+void gfc_make_loop_temp_ss (tree, tree, gfc_loopinfo *);
 /* Resolve array assignment dependencies.  */
 void gfc_conv_resolve_dependencies (gfc_loopinfo *, gfc_ss *, gfc_ss *);
 /* Build a null array descriptor constructor.  */
diff --git a/trans-const.c b/trans-const.c
index 5fbe765..fa820ef 100644
--- a/trans-const.c
+++ b/trans-const.c
@@ -358,6 +358,8 @@ gfc_conv_constant_to_tree (gfc_expr * expr)
 void
 gfc_conv_constant (gfc_se * se, gfc_expr * expr)
 {
+  gfc_ss *ss;
+
   /* We may be receiving an expression for C_NULL_PTR or C_NULL_FUNPTR.  If
      so, the expr_type will not yet be an EXPR_CONSTANT.  We need to make
      it so here.  */
@@ -380,14 +382,18 @@ gfc_conv_constant (gfc_se * se, gfc_expr * expr)
       return;
     }
 
-  if (se->ss != NULL)
+  ss = se->ss;
+  if (ss != NULL)
     {
-      gcc_assert (se->ss != gfc_ss_terminator);
-      gcc_assert (se->ss->type == GFC_SS_SCALAR);
-      gcc_assert (se->ss->expr == expr);
+      gfc_ss_info *ss_info;
+
+      ss_info = ss->info;
+      gcc_assert (ss != gfc_ss_terminator);
+      gcc_assert (ss_info->type == GFC_SS_SCALAR);
+      gcc_assert (ss_info->expr == expr);
 
-      se->expr = se->ss->data.scalar.expr;
-      se->string_length = se->ss->string_length;
+      se->expr = ss_info->data.scalar.value;
+      se->string_length = ss_info->string_length;
       gfc_advance_se_ss_chain (se);
       return;
     }
diff --git a/trans-expr.c b/trans-expr.c
index 09b98d0..4b40327 100644
--- a/trans-expr.c
+++ b/trans-expr.c
@@ -83,6 +83,7 @@ void
 gfc_advance_se_ss_chain (gfc_se * se)
 {
   gfc_se *p;
+  gfc_ss *ss;
 
   gcc_assert (se != NULL && se->ss != NULL && se->ss != gfc_ss_terminator);
 
@@ -91,9 +92,18 @@ gfc_advance_se_ss_chain (gfc_se * se)
   while (p != NULL)
     {
       /* Simple consistency check.  */
-      gcc_assert (p->parent == NULL || p->parent->ss == p->ss);
+      gcc_assert (p->parent == NULL || p->parent->ss == p->ss
+		  || p->parent->ss->nested_ss == p->ss);
+
+      /* If we were in a nested loop, the next scalarized expression can be
+	 on the parent ss' next pointer.  Thus we should not take the next
+	 pointer blindly, but rather go up one nest level as long as next
+	 is the end of chain.  */
+      ss = p->ss;
+      while (ss->next == gfc_ss_terminator && ss->parent != NULL)
+	ss = ss->parent;
 
-      p->ss = p->ss->next;
+      p->ss = ss->next;
 
       p = p->parent;
     }
@@ -613,6 +623,7 @@ conv_parent_component_references (gfc_se * se, gfc_ref * ref)
 static void
 gfc_conv_variable (gfc_se * se, gfc_expr * expr)
 {
+  gfc_ss *ss;
   gfc_ref *ref;
   gfc_symbol *sym;
   tree parent_decl = NULL_TREE;
@@ -622,16 +633,19 @@ gfc_conv_variable (gfc_se * se, gfc_expr * expr)
   bool entry_master;
 
   sym = expr->symtree->n.sym;
-  if (se->ss != NULL)
+  ss = se->ss;
+  if (ss != NULL)
     {
+      gfc_ss_info *ss_info = ss->info;
+
       /* Check that something hasn't gone horribly wrong.  */
-      gcc_assert (se->ss != gfc_ss_terminator);
-      gcc_assert (se->ss->expr == expr);
+      gcc_assert (ss != gfc_ss_terminator);
+      gcc_assert (ss_info->expr == expr);
 
       /* A scalarized term.  We already know the descriptor.  */
-      se->expr = se->ss->data.info.descriptor;
-      se->string_length = se->ss->string_length;
-      for (ref = se->ss->data.info.ref; ref; ref = ref->next)
+      se->expr = ss_info->data.array.descriptor;
+      se->string_length = ss_info->string_length;
+      for (ref = ss_info->data.array.ref; ref; ref = ref->next)
 	if (ref->type == REF_ARRAY && ref->u.ar.type != AR_ELEMENT)
 	  break;
     }
@@ -2359,7 +2373,7 @@ gfc_conv_subref_array_arg (gfc_se * parmse, gfc_expr * expr, int g77,
   gfc_ss *rss;
   gfc_loopinfo loop;
   gfc_loopinfo loop2;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   tree offset;
   tree tmp_index;
   tree tmp;
@@ -2395,21 +2409,18 @@ gfc_conv_subref_array_arg (gfc_se * parmse, gfc_expr * expr, int g77,
 		|| GFC_DESCRIPTOR_TYPE_P (base_type))
     base_type = gfc_get_element_type (base_type);
 
-  loop.temp_ss = gfc_get_temp_ss (base_type, ((expr->ts.type == BT_CHARACTER)
-					      ? expr->ts.u.cl->backend_decl
-					      : NULL),
-				  loop.dimen);
+  gfc_make_loop_temp_ss (base_type, (expr->ts.type == BT_CHARACTER)
+				    ? expr->ts.u.cl->backend_decl
+				    : NULL,
+			 &loop);
 
-  parmse->string_length = loop.temp_ss->string_length;
-
-  /* Associate the SS with the loop.  */
-  gfc_add_ss_to_loop (&loop, loop.temp_ss);
+  parmse->string_length = loop.temp_ss->info->string_length;
 
   /* Setup the scalarizing loops.  */
   gfc_conv_loop_setup (&loop, &expr->where);
 
   /* Pass the temporary descriptor back to the caller.  */
-  info = &loop.temp_ss->data.info;
+  info = &loop.temp_ss->info->data.array;
   parmse->expr = info->descriptor;
 
   /* Setup the gfc_se structures.  */
@@ -2488,8 +2499,8 @@ gfc_conv_subref_array_arg (gfc_se * parmse, gfc_expr * expr, int g77,
      dimensions, so this is very simple.  The offset is only computed
      outside the innermost loop, so the overall transfer could be
      optimized further.  */
-  info = &rse.ss->data.info;
-  dimen = info->dimen;
+  info = &rse.ss->info->data.array;
+  dimen = rse.ss->dimen;
 
   tmp_index = gfc_index_zero_node;
   for (n = dimen - 1; n > 0; n--)
@@ -2854,7 +2865,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   tree fntype;
   gfc_se parmse;
   gfc_ss *argss;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   int byref;
   int parm_kind;
   tree type;
@@ -2893,8 +2904,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
     {
       if (!sym->attr.elemental)
 	{
-	  gcc_assert (se->ss->type == GFC_SS_FUNCTION);
-	  if (se->ss->useflags)
+	  gcc_assert (se->ss->info->type == GFC_SS_FUNCTION);
+	  if (se->ss->info->useflags)
 	    {
 	      gcc_assert ((!comp && gfc_return_by_reference (sym)
 			   && sym->result->attr.dimension)
@@ -2906,7 +2917,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	      return 0;
 	    }
 	}
-      info = &se->ss->data.info;
+      info = &se->ss->info->data.array;
     }
   else
     info = NULL;
@@ -2979,12 +2990,23 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  gfc_init_se (&parmse, se);
 	  gfc_conv_derived_to_class (&parmse, e, fsym->ts);
 	}
-      else if (se->ss && se->ss->useflags)
+      else if (se->ss && se->ss->info->useflags)
 	{
 	  /* An elemental function inside a scalarized loop.  */
 	  gfc_init_se (&parmse, se);
-	  gfc_conv_expr_reference (&parmse, e);
 	  parm_kind = ELEMENTAL;
+
+	  if (se->ss->dimen > 0
+	      && se->ss->info->data.array.ref == NULL)
+	    {
+	      gfc_conv_tmp_array_ref (&parmse);
+	      if (e->ts.type == BT_CHARACTER)
+		gfc_conv_string_parameter (&parmse);
+	      else
+		parmse.expr = gfc_build_addr_expr (NULL_TREE, parmse.expr);
+	    }
+	  else
+	    gfc_conv_expr_reference (&parmse, e);
 	}
       else
 	{
@@ -3582,7 +3604,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
 	  /* Set the type of the array.  */
 	  tmp = gfc_typenode_for_spec (&comp->ts);
-	  gcc_assert (info->dimen == se->loop->dimen);
+	  gcc_assert (se->ss->dimen == se->loop->dimen);
 
 	  /* Evaluate the bounds of the result, if known.  */
 	  gfc_set_loop_bounds_from_array_spec (&mapping, se, comp->as);
@@ -3602,9 +3624,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	     returns a pointer, the temporary will be a shallow copy and
 	     mustn't be deallocated.  */
 	  callee_alloc = comp->attr.allocatable || comp->attr.pointer;
-	  gfc_trans_create_temp_array (&se->pre, &se->post, se->loop, info, tmp,
-				       NULL_TREE, false, !comp->attr.pointer,
-				       callee_alloc, &se->ss->expr->where);
+	  gfc_trans_create_temp_array (&se->pre, &se->post, se->ss,
+				       tmp, NULL_TREE, false,
+				       !comp->attr.pointer, callee_alloc,
+				       &se->ss->info->expr->where);
 
 	  /* Pass the temporary as the first argument.  */
 	  result = info->descriptor;
@@ -3617,7 +3640,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
 	  /* Set the type of the array.  */
 	  tmp = gfc_typenode_for_spec (&ts);
-	  gcc_assert (info->dimen == se->loop->dimen);
+	  gcc_assert (se->ss->dimen == se->loop->dimen);
 
 	  /* Evaluate the bounds of the result, if known.  */
 	  gfc_set_loop_bounds_from_array_spec (&mapping, se, sym->result->as);
@@ -3637,9 +3660,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	     returns a pointer, the temporary will be a shallow copy and
 	     mustn't be deallocated.  */
 	  callee_alloc = sym->attr.allocatable || sym->attr.pointer;
-	  gfc_trans_create_temp_array (&se->pre, &se->post, se->loop, info, tmp,
-				       NULL_TREE, false, !sym->attr.pointer,
-				       callee_alloc, &se->ss->expr->where);
+	  gfc_trans_create_temp_array (&se->pre, &se->post, se->ss,
+				       tmp, NULL_TREE, false,
+				       !sym->attr.pointer, callee_alloc,
+				       &se->ss->info->expr->where);
 
 	  /* Pass the temporary as the first argument.  */
 	  result = info->descriptor;
@@ -4237,8 +4261,11 @@ is_zero_initializer_p (gfc_expr * expr)
 static void
 gfc_conv_array_constructor_expr (gfc_se * se, gfc_expr * expr)
 {
-  gcc_assert (se->ss != NULL && se->ss != gfc_ss_terminator);
-  gcc_assert (se->ss->expr == expr && se->ss->type == GFC_SS_CONSTRUCTOR);
+  gfc_ss *ss;
+
+  ss = se->ss;
+  gcc_assert (ss != NULL && ss != gfc_ss_terminator);
+  gcc_assert (ss->info->expr == expr && ss->info->type == GFC_SS_CONSTRUCTOR);
 
   gfc_conv_tmp_array_ref (se);
 }
@@ -4342,6 +4369,7 @@ gfc_trans_subarray_assign (tree dest, gfc_component * cm, gfc_expr * expr)
   gfc_se lse;
   gfc_ss *rss;
   gfc_ss *lss;
+  gfc_array_info *lss_array;
   stmtblock_t body;
   stmtblock_t block;
   gfc_loopinfo loop;
@@ -4365,19 +4393,20 @@ gfc_trans_subarray_assign (tree dest, gfc_component * cm, gfc_expr * expr)
   /* Create a SS for the destination.  */
   lss = gfc_get_array_ss (gfc_ss_terminator, NULL, cm->as->rank,
 			  GFC_SS_COMPONENT);
-  lss->shape = gfc_get_shape (cm->as->rank);
-  lss->data.info.descriptor = dest;
-  lss->data.info.data = gfc_conv_array_data (dest);
-  lss->data.info.offset = gfc_conv_array_offset (dest);
+  lss_array = &lss->info->data.array;
+  lss_array->shape = gfc_get_shape (cm->as->rank);
+  lss_array->descriptor = dest;
+  lss_array->data = gfc_conv_array_data (dest);
+  lss_array->offset = gfc_conv_array_offset (dest);
   for (n = 0; n < cm->as->rank; n++)
     {
-      lss->data.info.start[n] = gfc_conv_array_lbound (dest, n);
-      lss->data.info.stride[n] = gfc_index_one_node;
+      lss_array->start[n] = gfc_conv_array_lbound (dest, n);
+      lss_array->stride[n] = gfc_index_one_node;
 
-      mpz_init (lss->shape[n]);
-      mpz_sub (lss->shape[n], cm->as->upper[n]->value.integer,
+      mpz_init (lss_array->shape[n]);
+      mpz_sub (lss_array->shape[n], cm->as->upper[n]->value.integer,
 	       cm->as->lower[n]->value.integer);
-      mpz_add_ui (lss->shape[n], lss->shape[n], 1);
+      mpz_add_ui (lss_array->shape[n], lss_array->shape[n], 1);
     }
   
   /* Associate the SS with the loop.  */
@@ -4420,8 +4449,8 @@ gfc_trans_subarray_assign (tree dest, gfc_component * cm, gfc_expr * expr)
   gfc_add_block_to_block (&block, &loop.pre);
   gfc_add_block_to_block (&block, &loop.post);
 
-  gcc_assert (lss->shape != NULL);
-  gfc_free_shape (&lss->shape, cm->as->rank);
+  gcc_assert (lss_array->shape != NULL);
+  gfc_free_shape (&lss_array->shape, cm->as->rank);
   gfc_cleanup_loop (&loop);
 
   return gfc_finish_block (&block);
@@ -4817,15 +4846,22 @@ gfc_conv_substring_expr (gfc_se * se, gfc_expr * expr)
 void
 gfc_conv_expr (gfc_se * se, gfc_expr * expr)
 {
-  if (se->ss && se->ss->expr == expr
-      && (se->ss->type == GFC_SS_SCALAR || se->ss->type == GFC_SS_REFERENCE))
+  gfc_ss *ss;
+
+  ss = se->ss;
+  if (ss && ss->info->expr == expr
+      && (ss->info->type == GFC_SS_SCALAR
+	  || ss->info->type == GFC_SS_REFERENCE))
     {
+      gfc_ss_info *ss_info;
+
+      ss_info = ss->info;
       /* Substitute a scalar expression evaluated outside the scalarization
          loop.  */
-      se->expr = se->ss->data.scalar.expr;
-      if (se->ss->type == GFC_SS_REFERENCE)
+      se->expr = ss_info->data.scalar.value;
+      if (ss_info->type == GFC_SS_REFERENCE)
 	se->expr = gfc_build_addr_expr (NULL_TREE, se->expr);
-      se->string_length = se->ss->string_length;
+      se->string_length = ss_info->string_length;
       gfc_advance_se_ss_chain (se);
       return;
     }
@@ -4942,10 +4978,12 @@ gfc_conv_expr_type (gfc_se * se, gfc_expr * expr, tree type)
 void
 gfc_conv_expr_reference (gfc_se * se, gfc_expr * expr)
 {
+  gfc_ss *ss;
   tree var;
 
-  if (se->ss && se->ss->expr == expr
-      && se->ss->type == GFC_SS_REFERENCE)
+  ss = se->ss;
+  if (ss && ss->info->expr == expr
+      && ss->info->type == GFC_SS_REFERENCE)
     {
       /* Returns a reference to the scalar evaluated outside the loop
 	 for this case.  */
@@ -6150,7 +6188,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 
       /* Find a non-scalar SS from the lhs.  */
       while (lss_section != gfc_ss_terminator
-	     && lss_section->type != GFC_SS_SECTION)
+	     && lss_section->info->type != GFC_SS_SECTION)
 	lss_section = lss_section->next;
 
       gcc_assert (lss_section != gfc_ss_terminator);
diff --git a/trans-intrinsic.c b/trans-intrinsic.c
index 83fc4fc..973f912 100644
--- a/trans-intrinsic.c
+++ b/trans-intrinsic.c
@@ -1004,7 +1004,7 @@ trans_this_image (gfc_se * se, gfc_expr *expr)
       gcc_assert (!expr->value.function.actual->next->expr);
       gcc_assert (corank > 0);
       gcc_assert (se->loop->dimen == 1);
-      gcc_assert (se->ss->expr == expr);
+      gcc_assert (se->ss->info->expr == expr);
 
       dim_arg = se->loop->loopvar[0];
       dim_arg = fold_build2_loc (input_location, PLUS_EXPR,
@@ -1321,7 +1321,7 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, int upper)
       /* Create an implicit second parameter from the loop variable.  */
       gcc_assert (!arg2->expr);
       gcc_assert (se->loop->dimen == 1);
-      gcc_assert (se->ss->expr == expr);
+      gcc_assert (se->ss->info->expr == expr);
       gfc_advance_se_ss_chain (se);
       bound = se->loop->loopvar[0];
       bound = fold_build2_loc (input_location, MINUS_EXPR,
@@ -1515,7 +1515,7 @@ conv_intrinsic_cobound (gfc_se * se, gfc_expr * expr)
       gcc_assert (!arg2->expr);
       gcc_assert (corank > 0);
       gcc_assert (se->loop->dimen == 1);
-      gcc_assert (se->ss->expr == expr);
+      gcc_assert (se->ss->info->expr == expr);
 
       bound = se->loop->loopvar[0];
       bound = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type,
@@ -2323,7 +2323,7 @@ gfc_conv_intrinsic_funcall (gfc_se * se, gfc_expr * expr)
   gfc_symbol *sym;
   VEC(tree,gc) *append_args;
 
-  gcc_assert (!se->ss || se->ss->expr == expr);
+  gcc_assert (!se->ss || se->ss->info->expr == expr);
 
   if (se->ss)
     gcc_assert (expr->rank > 0);
@@ -2557,6 +2557,20 @@ gfc_conv_intrinsic_count (gfc_se * se, gfc_expr * expr)
   se->expr = resvar;
 }
 
+
+/* Update given gfc_se to have ss component pointing to the nested gfc_ss
+   struct and return the corresponding loopinfo.  */
+
+static gfc_loopinfo *
+enter_nested_loop (gfc_se *se)
+{
+  se->ss = se->ss->nested_ss;
+  gcc_assert (se->ss == se->ss->loop->ss);
+
+  return se->ss->loop;
+}
+
+
 /* Inline implementation of the sum and product intrinsics.  */
 static void
 gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
@@ -2568,20 +2582,23 @@ gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
   stmtblock_t body;
   stmtblock_t block;
   tree tmp;
-  gfc_loopinfo loop;
-  gfc_actual_arglist *actual;
-  gfc_ss *arrayss;
-  gfc_ss *maskss;
+  gfc_loopinfo loop, *ploop;
+  gfc_actual_arglist *arg_array, *arg_mask;
+  gfc_ss *arrayss = NULL;
+  gfc_ss *maskss = NULL;
   gfc_se arrayse;
   gfc_se maskse;
+  gfc_se *parent_se;
   gfc_expr *arrayexpr;
   gfc_expr *maskexpr;
 
-  if (se->ss)
+  if (expr->rank > 0)
     {
-      gfc_conv_intrinsic_funcall (se, expr);
-      return;
+      gcc_assert (gfc_inline_intrinsic_function_p (expr));
+      parent_se = se;
     }
+  else
+    parent_se = NULL;
 
   type = gfc_typenode_for_spec (&expr->ts);
   /* Initialize the result.  */
@@ -2608,52 +2625,66 @@ gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
 
   gfc_add_modify (&se->pre, resvar, tmp);
 
-  /* Walk the arguments.  */
-  actual = expr->value.function.actual;
-  arrayexpr = actual->expr;
-  arrayss = gfc_walk_expr (arrayexpr);
-  gcc_assert (arrayss != gfc_ss_terminator);
+  arg_array = expr->value.function.actual;
+
+  arrayexpr = arg_array->expr;
 
   if (op == NE_EXPR || norm2)
     /* PARITY and NORM2.  */
     maskexpr = NULL;
   else
     {
-      actual = actual->next->next;
-      gcc_assert (actual);
-      maskexpr = actual->expr;
+      arg_mask  = arg_array->next->next;
+      gcc_assert (arg_mask != NULL);
+      maskexpr = arg_mask->expr;
     }
 
-  if (maskexpr && maskexpr->rank != 0)
+  if (expr->rank == 0)
     {
-      maskss = gfc_walk_expr (maskexpr);
-      gcc_assert (maskss != gfc_ss_terminator);
+      /* Walk the arguments.  */
+      arrayss = gfc_walk_expr (arrayexpr);
+      gcc_assert (arrayss != gfc_ss_terminator);
+
+      if (maskexpr && maskexpr->rank > 0)
+	{
+	  maskss = gfc_walk_expr (maskexpr);
+	  gcc_assert (maskss != gfc_ss_terminator);
+	}
+      else
+	maskss = NULL;
+
+      /* Initialize the scalarizer.  */
+      gfc_init_loopinfo (&loop);
+      gfc_add_ss_to_loop (&loop, arrayss);
+      if (maskexpr && maskexpr->rank > 0)
+	gfc_add_ss_to_loop (&loop, maskss);
+
+      /* Initialize the loop.  */
+      gfc_conv_ss_startstride (&loop);
+      gfc_conv_loop_setup (&loop, &expr->where);
+
+      gfc_mark_ss_chain_used (arrayss, 1);
+      if (maskexpr && maskexpr->rank > 0)
+	gfc_mark_ss_chain_used (maskss, 1);
+
+      ploop = &loop;
     }
   else
-    maskss = NULL;
-
-  /* Initialize the scalarizer.  */
-  gfc_init_loopinfo (&loop);
-  gfc_add_ss_to_loop (&loop, arrayss);
-  if (maskss)
-    gfc_add_ss_to_loop (&loop, maskss);
+    /* All the work has been done in the parent loops.  */
+    ploop = enter_nested_loop (se);
 
-  /* Initialize the loop.  */
-  gfc_conv_ss_startstride (&loop);
-  gfc_conv_loop_setup (&loop, &expr->where);
+  gcc_assert (ploop);
 
-  gfc_mark_ss_chain_used (arrayss, 1);
-  if (maskss)
-    gfc_mark_ss_chain_used (maskss, 1);
   /* Generate the loop body.  */
-  gfc_start_scalarized_body (&loop, &body);
+  gfc_start_scalarized_body (ploop, &body);
 
   /* If we have a mask, only add this element if the mask is set.  */
-  if (maskss)
+  if (maskexpr && maskexpr->rank > 0)
     {
-      gfc_init_se (&maskse, NULL);
-      gfc_copy_loopinfo_to_se (&maskse, &loop);
-      maskse.ss = maskss;
+      gfc_init_se (&maskse, parent_se);
+      gfc_copy_loopinfo_to_se (&maskse, ploop);
+      if (expr->rank == 0)
+	maskse.ss = maskss;
       gfc_conv_expr_val (&maskse, maskexpr);
       gfc_add_block_to_block (&body, &maskse.pre);
 
@@ -2663,9 +2694,10 @@ gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
     gfc_init_block (&block);
 
   /* Do the actual summation/product.  */
-  gfc_init_se (&arrayse, NULL);
-  gfc_copy_loopinfo_to_se (&arrayse, &loop);
-  arrayse.ss = arrayss;
+  gfc_init_se (&arrayse, parent_se);
+  gfc_copy_loopinfo_to_se (&arrayse, ploop);
+  if (expr->rank == 0)
+    arrayse.ss = arrayss;
   gfc_conv_expr_val (&arrayse, arrayexpr);
   gfc_add_block_to_block (&block, &arrayse.pre);
 
@@ -2740,7 +2772,7 @@ gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
 
   gfc_add_block_to_block (&block, &arrayse.post);
 
-  if (maskss)
+  if (maskexpr && maskexpr->rank > 0)
     {
       /* We enclose the above in if (mask) {...} .  */
 
@@ -2752,30 +2784,43 @@ gfc_conv_intrinsic_arith (gfc_se * se, gfc_expr * expr, enum tree_code op,
     tmp = gfc_finish_block (&block);
   gfc_add_expr_to_block (&body, tmp);
 
-  gfc_trans_scalarizing_loops (&loop, &body);
+  gfc_trans_scalarizing_loops (ploop, &body);
 
   /* For a scalar mask, enclose the loop in an if statement.  */
-  if (maskexpr && maskss == NULL)
+  if (maskexpr && maskexpr->rank == 0)
     {
-      gfc_init_se (&maskse, NULL);
-      gfc_conv_expr_val (&maskse, maskexpr);
       gfc_init_block (&block);
-      gfc_add_block_to_block (&block, &loop.pre);
-      gfc_add_block_to_block (&block, &loop.post);
+      gfc_add_block_to_block (&block, &ploop->pre);
+      gfc_add_block_to_block (&block, &ploop->post);
       tmp = gfc_finish_block (&block);
 
-      tmp = build3_v (COND_EXPR, maskse.expr, tmp,
-		      build_empty_stmt (input_location));
+      if (expr->rank > 0)
+	{
+	  tmp = build3_v (COND_EXPR, se->ss->info->data.scalar.value, tmp,
+			  build_empty_stmt (input_location));
+	  gfc_advance_se_ss_chain (se);
+	}
+      else
+	{
+	  gcc_assert (expr->rank == 0);
+	  gfc_init_se (&maskse, NULL);
+	  gfc_conv_expr_val (&maskse, maskexpr);
+	  tmp = build3_v (COND_EXPR, maskse.expr, tmp,
+			  build_empty_stmt (input_location));
+	}
+
       gfc_add_expr_to_block (&block, tmp);
       gfc_add_block_to_block (&se->pre, &block);
+      gcc_assert (se->post.head == NULL);
     }
   else
     {
-      gfc_add_block_to_block (&se->pre, &loop.pre);
-      gfc_add_block_to_block (&se->pre, &loop.post);
+      gfc_add_block_to_block (&se->pre, &ploop->pre);
+      gfc_add_block_to_block (&se->pre, &ploop->post);
     }
 
-  gfc_cleanup_loop (&loop);
+  if (expr->rank == 0)
+    gfc_cleanup_loop (ploop);
 
   if (norm2)
     {
@@ -3061,6 +3106,23 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   /* Initialize the loop.  */
   gfc_conv_ss_startstride (&loop);
+
+  /* The code generated can have more than one loop in sequence (see the
+     comment at the function header).  This doesn't work well with the
+     scalarizer, which changes arrays' offset when the scalarization loops
+     are generated (see gfc_trans_preloop_setup).  Fortunately, {min,max}loc
+     are  currently inlined in the scalar case only (for which loop is of rank
+     one).  As there is no dependency to care about in that case, there is no
+     temporary, so that we can use the scalarizer temporary code to handle
+     multiple loops.  Thus, we set temp_dim here, we call gfc_mark_ss_chain_used
+     with flag=3 later, and we use gfc_trans_scalarized_loop_boundary even later
+     to restore offset.
+     TODO: this prevents inlining of rank > 0 minmaxloc calls, so this
+     should eventually go away.  We could either create two loops properly,
+     or find another way to save/restore the array offsets between the two
+     loops (without conflicting with temporary management), or use a single
+     loop minmaxloc implementation.  See PR 31067.  */
+  loop.temp_dim = loop.dimen;
   gfc_conv_loop_setup (&loop, &expr->where);
 
   gcc_assert (loop.dimen == 1);
@@ -3090,9 +3152,17 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
       TREE_USED (lab2) = 1;
     }
 
-  gfc_mark_ss_chain_used (arrayss, 1);
+  /* An offset must be added to the loop
+     counter to obtain the required position.  */
+  gcc_assert (loop.from[0]);
+
+  tmp = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
+			 gfc_index_one_node, loop.from[0]);
+  gfc_add_modify (&loop.pre, offset, tmp);
+
+  gfc_mark_ss_chain_used (arrayss, lab1 ? 3 : 1);
   if (maskss)
-    gfc_mark_ss_chain_used (maskss, 1);
+    gfc_mark_ss_chain_used (maskss, lab1 ? 3 : 1);
   /* Generate the loop body.  */
   gfc_start_scalarized_body (&loop, &body);
 
@@ -3123,16 +3193,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   /* Assign the value to the limit...  */
   gfc_add_modify (&ifblock, limit, arrayse.expr);
 
-  /* Remember where we are.  An offset must be added to the loop
-     counter to obtain the required position.  */
-  if (loop.from[0])
-    tmp = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
-			   gfc_index_one_node, loop.from[0]);
-  else
-    tmp = gfc_index_one_node;
-
-  gfc_add_modify (&block, offset, tmp);
-
   if (nonempty == NULL && HONOR_NANS (DECL_MODE (limit)))
     {
       stmtblock_t ifblock2;
@@ -3188,7 +3248,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   if (lab1)
     {
-      gfc_trans_scalarized_loop_end (&loop, 0, &body);
+      gfc_trans_scalarized_loop_boundary (&loop, &body);
 
       if (HONOR_NANS (DECL_MODE (limit)))
 	{
@@ -3203,7 +3263,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
       gfc_add_expr_to_block (&loop.code[0], build1_v (GOTO_EXPR, lab2));
       gfc_add_expr_to_block (&loop.code[0], build1_v (LABEL_EXPR, lab1));
-      gfc_start_block (&body);
 
       /* If we have a mask, only check this element if the mask is set.  */
       if (maskss)
@@ -3232,16 +3291,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
       /* Assign the value to the limit...  */
       gfc_add_modify (&ifblock, limit, arrayse.expr);
 
-      /* Remember where we are.  An offset must be added to the loop
-	 counter to obtain the required position.  */
-      if (loop.from[0])
-	tmp = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
-			       gfc_index_one_node, loop.from[0]);
-      else
-	tmp = gfc_index_one_node;
-
-      gfc_add_modify (&block, offset, tmp);
-
       tmp = fold_build2_loc (input_location, PLUS_EXPR, TREE_TYPE (pos),
 			     loop.loopvar[0], offset);
       gfc_add_modify (&ifblock, pos, tmp);
@@ -3518,6 +3567,22 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   /* Initialize the loop.  */
   gfc_conv_ss_startstride (&loop);
+
+  /* The code generated can have more than one loop in sequence (see the
+     comment at the function header).  This doesn't work well with the
+     scalarizer, which changes arrays' offset when the scalarization loops
+     are generated (see gfc_trans_preloop_setup).  Fortunately, {min,max}val
+     are  currently inlined in the scalar case only.  As there is no dependency
+     to care about in that case, there is no temporary, so that we can use the
+     scalarizer temporary code to handle multiple loops.  Thus, we set temp_dim
+     here, we call gfc_mark_ss_chain_used with flag=3 later, and we use
+     gfc_trans_scalarized_loop_boundary even later to restore offset.
+     TODO: this prevents inlining of rank > 0 minmaxval calls, so this
+     should eventually go away.  We could either create two loops properly,
+     or find another way to save/restore the array offsets between the two
+     loops (without conflicting with temporary management), or use a single
+     loop minmaxval implementation.  See PR 31067.  */
+  loop.temp_dim = loop.dimen;
   gfc_conv_loop_setup (&loop, &expr->where);
 
   if (nonempty == NULL && maskss == NULL
@@ -3549,9 +3614,9 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
 	}
     }
 
-  gfc_mark_ss_chain_used (arrayss, 1);
+  gfc_mark_ss_chain_used (arrayss, lab ? 3 : 1);
   if (maskss)
-    gfc_mark_ss_chain_used (maskss, 1);
+    gfc_mark_ss_chain_used (maskss, lab ? 3 : 1);
   /* Generate the loop body.  */
   gfc_start_scalarized_body (&loop, &body);
 
@@ -3661,15 +3726,13 @@ gfc_conv_intrinsic_minmaxval (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   if (lab)
     {
-      gfc_trans_scalarized_loop_end (&loop, 0, &body);
+      gfc_trans_scalarized_loop_boundary (&loop, &body);
 
       tmp = fold_build3_loc (input_location, COND_EXPR, type, nonempty,
 			     nan_cst, huge_cst);
       gfc_add_modify (&loop.code[0], limit, tmp);
       gfc_add_expr_to_block (&loop.code[0], build1_v (LABEL_EXPR, lab));
 
-      gfc_start_block (&body);
-
       /* If we have a mask, only add this element if the mask is set.  */
       if (maskss)
 	{
@@ -5269,14 +5332,14 @@ gfc_conv_intrinsic_transfer (gfc_se * se, gfc_expr * expr)
   gfc_actual_arglist *arg;
   gfc_se argse;
   gfc_ss *ss;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   stmtblock_t block;
   int n;
   bool scalar_mold;
 
   info = NULL;
   if (se->loop)
-    info = &se->ss->data.info;
+    info = &se->ss->info->data.array;
 
   /* Convert SOURCE.  The output from this stage is:-
 	source_bytes = length of the source in bytes
@@ -5501,9 +5564,8 @@ gfc_conv_intrinsic_transfer (gfc_se * se, gfc_expr * expr)
 
   /* Build a destination descriptor, using the pointer, source, as the
      data field.  */
-  gfc_trans_create_temp_array (&se->pre, &se->post, se->loop,
-			       info, mold_type, NULL_TREE, false, true, false,
-			       &expr->where);
+  gfc_trans_create_temp_array (&se->pre, &se->post, se->ss, mold_type,
+			       NULL_TREE, false, true, false, &expr->where);
 
   /* Cast the pointer to the result.  */
   tmp = gfc_conv_descriptor_data_get (info->descriptor);
@@ -6634,7 +6696,7 @@ gfc_conv_intrinsic_function (gfc_se * se, gfc_expr * expr)
       break;
 
     case GFC_ISYM_TRANSFER:
-      if (se->ss && se->ss->useflags)
+      if (se->ss && se->ss->info->useflags)
 	/* Access the previously obtained result.  */
 	gfc_conv_tmp_array_ref (se);
       else
@@ -6753,19 +6815,17 @@ walk_inline_intrinsic_transpose (gfc_ss *ss, gfc_expr *expr)
 
   for (tmp_ss = arg_ss; ; tmp_ss = tmp_ss->next)
     {
-      if (tmp_ss->type != GFC_SS_SCALAR
-	  && tmp_ss->type != GFC_SS_REFERENCE)
+      if (tmp_ss->info->type != GFC_SS_SCALAR
+	  && tmp_ss->info->type != GFC_SS_REFERENCE)
 	{
 	  int tmp_dim;
-	  gfc_ss_info *info;
 
-	  info = &tmp_ss->data.info;
-	  gcc_assert (info->dimen == 2);
+	  gcc_assert (tmp_ss->dimen == 2);
 
 	  /* We just invert dimensions.  */
-	  tmp_dim = info->dim[0];
-	  info->dim[0] = info->dim[1];
-	  info->dim[1] = tmp_dim;
+	  tmp_dim = tmp_ss->dim[0];
+	  tmp_ss->dim[0] = tmp_ss->dim[1];
+	  tmp_ss->dim[1] = tmp_dim;
 	}
 
       /* Stop when tmp_ss points to the last valid element of the chain...  */
@@ -6780,12 +6840,127 @@ walk_inline_intrinsic_transpose (gfc_ss *ss, gfc_expr *expr)
 }
 
 
+/* Move the given dimension of the given gfc_ss list to a nested gfc_ss list.
+   This has the side effect of reversing the nested list, so there is no
+   need to call gfc_reverse_ss on it (the given list is assumed not to be
+   reversed yet).   */
+
+static gfc_ss *
+nest_loop_dimension (gfc_ss *ss, int dim)
+{
+  int ss_dim, i;
+  gfc_ss *new_ss, *prev_ss = gfc_ss_terminator;
+  gfc_loopinfo *new_loop;
+
+  gcc_assert (ss != gfc_ss_terminator);
+
+  for (; ss != gfc_ss_terminator; ss = ss->next)
+    {
+      new_ss = gfc_get_ss ();
+      new_ss->next = prev_ss;
+      new_ss->parent = ss;
+      new_ss->info = ss->info;
+      new_ss->info->refcount++;
+      if (ss->dimen != 0)
+	{
+	  gcc_assert (ss->info->type != GFC_SS_SCALAR
+		      && ss->info->type != GFC_SS_REFERENCE);
+
+	  new_ss->dimen = 1;
+	  new_ss->dim[0] = ss->dim[dim];
+
+	  gcc_assert (dim < ss->dimen);
+
+	  ss_dim = --ss->dimen;
+	  for (i = dim; i < ss_dim; i++)
+	    ss->dim[i] = ss->dim[i + 1];
+
+	  ss->dim[ss_dim] = 0;
+	}
+      prev_ss = new_ss;
+
+      if (ss->nested_ss)
+	{
+	  ss->nested_ss->parent = new_ss;
+	  new_ss->nested_ss = ss->nested_ss;
+	}
+      ss->nested_ss = new_ss;
+    }
+
+  new_loop = gfc_get_loopinfo ();
+  gfc_init_loopinfo (new_loop);
+
+  gcc_assert (prev_ss != NULL);
+  gcc_assert (prev_ss != gfc_ss_terminator);
+  gfc_add_ss_to_loop (new_loop, prev_ss);
+  return new_ss->parent;
+}
+
+
+/* Create the gfc_ss list for the SUM/PRODUCT arguments when the function
+   is to be inlined.  */
+
+static gfc_ss *
+walk_inline_intrinsic_arith (gfc_ss *ss, gfc_expr *expr)
+{
+  gfc_ss *tmp_ss, *tail, *array_ss;
+  gfc_actual_arglist *arg1, *arg2, *arg3;
+  int sum_dim;
+  bool scalar_mask = false;
+
+  /* The rank of the result will be determined later.  */
+  arg1 = expr->value.function.actual;
+  arg2 = arg1->next;
+  arg3 = arg2->next;
+  gcc_assert (arg3 != NULL);
+
+  if (expr->rank == 0)
+    return ss;
+
+  tmp_ss = gfc_ss_terminator;
+
+  if (arg3->expr)
+    {
+      gfc_ss *mask_ss;
+
+      mask_ss = gfc_walk_subexpr (tmp_ss, arg3->expr);
+      if (mask_ss == tmp_ss)
+	scalar_mask = 1;
+
+      tmp_ss = mask_ss;
+    }
+
+  array_ss = gfc_walk_subexpr (tmp_ss, arg1->expr);
+  gcc_assert (array_ss != tmp_ss);
+
+  /* Odd thing: If the mask is scalar, it is used by the frontend after
+     the array (to make it array around the nested loop). Thus it shall
+     be after array_ss once the gfc_ss list is reversed.  */
+  if (scalar_mask)
+    tmp_ss = gfc_get_scalar_ss (array_ss, arg3->expr);
+  else
+    tmp_ss = array_ss;
+
+  /* "Hide" the dimension on which we will sum in the first arg's scalarization
+     chain.  */
+  sum_dim = mpz_get_si (arg2->expr->value.integer) - 1;
+  tail = nest_loop_dimension (tmp_ss, sum_dim);
+  tail->next = ss;
+
+  return tmp_ss;
+}
+
+
 static gfc_ss *
 walk_inline_intrinsic_function (gfc_ss * ss, gfc_expr * expr)
 {
 
   switch (expr->value.function.isym->id)
     {
+      case GFC_ISYM_PRODUCT:
+      case GFC_ISYM_SUM:
+	return walk_inline_intrinsic_arith (ss, expr);
+
       case GFC_ISYM_TRANSPOSE:
 	return walk_inline_intrinsic_transpose (ss, expr);
 
@@ -6802,7 +6977,7 @@ walk_inline_intrinsic_function (gfc_ss * ss, gfc_expr * expr)
 void
 gfc_add_intrinsic_ss_code (gfc_loopinfo * loop ATTRIBUTE_UNUSED, gfc_ss * ss)
 {
-  switch (ss->expr->value.function.isym->id)
+  switch (ss->info->expr->value.function.isym->id)
     {
     case GFC_ISYM_UBOUND:
     case GFC_ISYM_LBOUND:
@@ -6847,11 +7022,26 @@ gfc_walk_intrinsic_libfunc (gfc_ss * ss, gfc_expr * expr)
 bool
 gfc_inline_intrinsic_function_p (gfc_expr *expr)
 {
+  gfc_actual_arglist *args;
+
   if (!expr->value.function.isym)
     return false;
 
   switch (expr->value.function.isym->id)
     {
+    case GFC_ISYM_PRODUCT:
+    case GFC_ISYM_SUM:
+      /* Disable inline expansion if code size matters.  */
+      if (optimize_size)
+	return false;
+
+      args = expr->value.function.actual;
+      /* We need to be able to subset the SUM argument at compile-time.  */
+      if (args->next->expr && args->next->expr->expr_type != EXPR_CONSTANT)
+	return false;
+
+      return true;
+
     case GFC_ISYM_TRANSPOSE:
       return true;
 
diff --git a/trans-io.c b/trans-io.c
index bbf5a02..12dfcf8 100644
--- a/trans-io.c
+++ b/trans-io.c
@@ -1937,6 +1937,7 @@ transfer_array_component (tree expr, gfc_component * cm, locus * where)
   int n;
   gfc_ss *ss;
   gfc_se se;
+  gfc_array_info *ss_array;
 
   gfc_start_block (&block);
   gfc_init_se (&se, NULL);
@@ -1948,19 +1949,20 @@ transfer_array_component (tree expr, gfc_component * cm, locus * where)
 
   ss = gfc_get_array_ss (gfc_ss_terminator, NULL, cm->as->rank,
 			 GFC_SS_COMPONENT);
-  ss->shape = gfc_get_shape (cm->as->rank);
-  ss->data.info.descriptor = expr;
-  ss->data.info.data = gfc_conv_array_data (expr);
-  ss->data.info.offset = gfc_conv_array_offset (expr);
+  ss_array = &ss->info->data.array;
+  ss_array->shape = gfc_get_shape (cm->as->rank);
+  ss_array->descriptor = expr;
+  ss_array->data = gfc_conv_array_data (expr);
+  ss_array->offset = gfc_conv_array_offset (expr);
   for (n = 0; n < cm->as->rank; n++)
     {
-      ss->data.info.start[n] = gfc_conv_array_lbound (expr, n);
-      ss->data.info.stride[n] = gfc_index_one_node;
+      ss_array->start[n] = gfc_conv_array_lbound (expr, n);
+      ss_array->stride[n] = gfc_index_one_node;
 
-      mpz_init (ss->shape[n]);
-      mpz_sub (ss->shape[n], cm->as->upper[n]->value.integer,
+      mpz_init (ss_array->shape[n]);
+      mpz_sub (ss_array->shape[n], cm->as->upper[n]->value.integer,
                cm->as->lower[n]->value.integer);
-      mpz_add_ui (ss->shape[n], ss->shape[n], 1);
+      mpz_add_ui (ss_array->shape[n], ss_array->shape[n], 1);
     }
 
   /* Once we got ss, we use scalarizer to create the loop.  */
@@ -1995,8 +1997,8 @@ transfer_array_component (tree expr, gfc_component * cm, locus * where)
   gfc_add_block_to_block (&block, &loop.pre);
   gfc_add_block_to_block (&block, &loop.post);
 
-  gcc_assert (ss->shape != NULL);
-  gfc_free_shape (&ss->shape, cm->as->rank);
+  gcc_assert (ss_array->shape != NULL);
+  gfc_free_shape (&ss_array->shape, cm->as->rank);
   gfc_cleanup_loop (&loop);
 
   return gfc_finish_block (&block);
diff --git a/trans-stmt.c b/trans-stmt.c
index c71eeec..0d793f9 100644
--- a/trans-stmt.c
+++ b/trans-stmt.c
@@ -178,6 +178,41 @@ gfc_trans_entry (gfc_code * code)
 }
 
 
+/* Replace a gfc_ss structure by another both in the gfc_se struct
+   and the gfc_loopinfo struct.  This is used in gfc_conv_elemental_dependencies
+   to replace a variable ss by the corresponding temporary.  */
+
+static void
+replace_ss (gfc_se *se, gfc_ss *old_ss, gfc_ss *new_ss)
+{
+  gfc_ss **sess, **loopss;
+
+  /* The old_ss is a ss for a single variable.  */
+  gcc_assert (old_ss->info->type == GFC_SS_SECTION);
+
+  for (sess = &(se->ss); *sess != gfc_ss_terminator; sess = &((*sess)->next))
+    if (*sess == old_ss)
+      break;
+  gcc_assert (*sess != gfc_ss_terminator);
+
+  *sess = new_ss;
+  new_ss->next = old_ss->next;
+
+
+  for (loopss = &(se->loop->ss); *loopss != gfc_ss_terminator;
+       loopss = &((*loopss)->loop_chain))
+    if (*loopss == old_ss)
+      break;
+  gcc_assert (*loopss != gfc_ss_terminator);
+
+  *loopss = new_ss;
+  new_ss->loop_chain = old_ss->loop_chain;
+  new_ss->loop = old_ss->loop;
+
+  gfc_free_ss (old_ss);
+}
+
+
 /* Check for dependencies between INTENT(IN) and INTENT(OUT) arguments of
    elemental subroutines.  Make temporaries for output arguments if any such
    dependencies are found.  Output arguments are chosen because internal_unpack
@@ -190,15 +225,10 @@ gfc_conv_elemental_dependencies (gfc_se * se, gfc_se * loopse,
   gfc_actual_arglist *arg0;
   gfc_expr *e;
   gfc_formal_arglist *formal;
-  gfc_loopinfo tmp_loop;
   gfc_se parmse;
   gfc_ss *ss;
-  gfc_ss_info *info;
   gfc_symbol *fsym;
-  gfc_ref *ref;
-  int n;
   tree data;
-  tree offset;
   tree size;
   tree tmp;
 
@@ -217,14 +247,9 @@ gfc_conv_elemental_dependencies (gfc_se * se, gfc_se * loopse,
 	continue;
 
       /* Obtain the info structure for the current argument.  */ 
-      info = NULL;
       for (ss = loopse->ss; ss && ss != gfc_ss_terminator; ss = ss->next)
-	{
-	  if (ss->expr != e)
-	    continue;
-	  info = &ss->data.info;
+	if (ss->info->expr == e)
 	  break;
-	}
 
       /* If there is a dependency, create a temporary and use it
 	 instead of the variable.  */
@@ -237,49 +262,17 @@ gfc_conv_elemental_dependencies (gfc_se * se, gfc_se * loopse,
 	{
 	  tree initial, temptype;
 	  stmtblock_t temp_post;
+	  gfc_ss *tmp_ss;
 
-	  /* Make a local loopinfo for the temporary creation, so that
-	     none of the other ss->info's have to be renormalized.  */
-	  gfc_init_loopinfo (&tmp_loop);
-	  tmp_loop.dimen = info->dimen;
-	  for (n = 0; n < info->dimen; n++)
-	    {
-	      tmp_loop.to[n] = loopse->loop->to[n];
-	      tmp_loop.from[n] = loopse->loop->from[n];
-	      tmp_loop.order[n] = loopse->loop->order[n];
-	    }
+	  tmp_ss = gfc_get_array_ss (gfc_ss_terminator, NULL, ss->dimen,
+				     GFC_SS_SECTION);
+	  gfc_mark_ss_chain_used (tmp_ss, 1);
+	  tmp_ss->info->expr = ss->info->expr;
+	  replace_ss (loopse, ss, tmp_ss);
 
 	  /* Obtain the argument descriptor for unpacking.  */
 	  gfc_init_se (&parmse, NULL);
 	  parmse.want_pointer = 1;
-
-	  /* The scalarizer introduces some specific peculiarities when
-	     handling elemental subroutines; the stride can be needed up to
-	     the dim_array - 1, rather than dim_loop - 1 to calculate
-	     offsets outside the loop.  For this reason, we make sure that
-	     the descriptor has the dimensionality of the array by converting
-	     trailing elements into ranges with end = start.  */
-	  for (ref = e->ref; ref; ref = ref->next)
-	    if (ref->type == REF_ARRAY && ref->u.ar.type == AR_SECTION)
-	      break;
-
-	  if (ref)
-	    {
-	      bool seen_range = false;
-	      for (n = 0; n < ref->u.ar.dimen; n++)
-		{
-		  if (ref->u.ar.dimen_type[n] == DIMEN_RANGE)
-		    seen_range = true;
-
-		  if (!seen_range
-			|| ref->u.ar.dimen_type[n] != DIMEN_ELEMENT)
-		    continue;
-
-		  ref->u.ar.end[n] = gfc_copy_expr (ref->u.ar.start[n]);
-		  ref->u.ar.dimen_type[n] = DIMEN_RANGE;
-		}
-	    }
-
 	  gfc_conv_expr_descriptor (&parmse, e, gfc_walk_expr (e));
 	  gfc_add_block_to_block (&se->pre, &parmse.pre);
 
@@ -309,29 +302,15 @@ gfc_conv_elemental_dependencies (gfc_se * se, gfc_se * loopse,
 	  size = gfc_create_var (gfc_array_index_type, NULL);
 	  data = gfc_create_var (pvoid_type_node, NULL);
 	  gfc_init_block (&temp_post);
-	  tmp = gfc_trans_create_temp_array (&se->pre, &temp_post,
-					     &tmp_loop, info, temptype,
-					     initial,
-					     false, true, false,
-					     &arg->expr->where);
+	  tmp = gfc_trans_create_temp_array (&se->pre, &temp_post, tmp_ss,
+					     temptype, initial, false, true,
+					     false, &arg->expr->where);
 	  gfc_add_modify (&se->pre, size, tmp);
-	  tmp = fold_convert (pvoid_type_node, info->data);
+	  tmp = fold_convert (pvoid_type_node, tmp_ss->info->data.array.data);
 	  gfc_add_modify (&se->pre, data, tmp);
 
-	  /* Calculate the offset for the temporary.  */
-	  offset = gfc_index_zero_node;
-	  for (n = 0; n < info->dimen; n++)
-	    {
-	      tmp = gfc_conv_descriptor_stride_get (info->descriptor,
-						    gfc_rank_cst[n]);
-	      tmp = fold_build2_loc (input_location, MULT_EXPR,
-				     gfc_array_index_type,
-				     loopse->loop->from[n], tmp);
-	      offset = fold_build2_loc (input_location, MINUS_EXPR,
-					gfc_array_index_type, offset, tmp);
-	    }
-	  info->offset = gfc_create_var (gfc_array_index_type, NULL);	  
-	  gfc_add_modify (&se->pre, info->offset, offset);
+	  /* Update other ss' delta.  */
+	  gfc_set_delta (loopse->loop);
 
 	  /* Copy the result back using unpack.  */
 	  tmp = build_call_expr_loc (input_location,
@@ -3306,7 +3285,7 @@ gfc_trans_pointer_assign_need_temp (gfc_expr * expr1, gfc_expr * expr2,
   gfc_ss *lss, *rss;
   gfc_se lse;
   gfc_se rse;
-  gfc_ss_info *info;
+  gfc_array_info *info;
   gfc_loopinfo loop;
   tree desc;
   tree parm;
@@ -3388,7 +3367,7 @@ gfc_trans_pointer_assign_need_temp (gfc_expr * expr1, gfc_expr * expr2,
 
       gfc_conv_loop_setup (&loop, &expr2->where);
 
-      info = &rss->data.info;
+      info = &rss->info->data.array;
       desc = info->descriptor;
 
       /* Make a new descriptor.  */
@@ -4048,7 +4027,7 @@ gfc_trans_where_assign (gfc_expr *expr1, gfc_expr *expr2,
 
   /* Find a non-scalar SS from the lhs.  */
   while (lss_section != gfc_ss_terminator
-         && lss_section->type != GFC_SS_SECTION)
+	 && lss_section->info->type != GFC_SS_SECTION)
     lss_section = lss_section->next;
 
   gcc_assert (lss_section != gfc_ss_terminator);
@@ -4062,7 +4041,7 @@ gfc_trans_where_assign (gfc_expr *expr1, gfc_expr *expr2,
     {
       /* The rhs is scalar.  Add a ss for the expression.  */
       rss = gfc_get_scalar_ss (gfc_ss_terminator, expr2);
-      rss->where = 1;
+      rss->info->where = 1;
     }
 
   /* Associate the SS with the loop.  */
@@ -4501,7 +4480,7 @@ gfc_trans_where_3 (gfc_code * cblock, gfc_code * eblock)
   if (tsss == gfc_ss_terminator)
     {
       tsss = gfc_get_scalar_ss (gfc_ss_terminator, tsrc);
-      tsss->where = 1;
+      tsss->info->where = 1;
     }
   gfc_add_ss_to_loop (&loop, tdss);
   gfc_add_ss_to_loop (&loop, tsss);
@@ -4516,7 +4495,7 @@ gfc_trans_where_3 (gfc_code * cblock, gfc_code * eblock)
       if (esss == gfc_ss_terminator)
 	{
 	  esss = gfc_get_scalar_ss (gfc_ss_terminator, esrc);
-	  esss->where = 1;
+	  esss->info->where = 1;
 	}
       gfc_add_ss_to_loop (&loop, edss);
       gfc_add_ss_to_loop (&loop, esss);
diff --git a/trans.h b/trans.h
index 535c207..22033d3 100644
--- a/trans.h
+++ b/trans.h
@@ -108,17 +108,13 @@ typedef enum
 gfc_coarray_type;
 
 
-/* Scalarization State chain.  Created by walking an expression tree before
-   creating the scalarization loops. Then passed as part of a gfc_se structure
-   to translate the expression inside the loop.  Note that these chains are
-   terminated by gfc_se_terminator, not NULL.  A NULL pointer in a gfc_se
-   indicates to gfc_conv_* that this is a scalar expression.
-   Note that some member arrays correspond to scalarizer rank and others
-   are the variable rank.  */
+/* The array-specific scalarization informations.  The array members of
+   this struct are indexed by actual array index, and thus can be sparse.  */
 
-typedef struct gfc_ss_info
+typedef struct gfc_array_info
 {
-  int dimen;
+  mpz_t *shape;
+
   /* The ref that holds information on this section.  */
   gfc_ref *ref;
   /* The descriptor of this array.  */
@@ -139,12 +135,8 @@ typedef struct gfc_ss_info
   tree end[GFC_MAX_DIMENSIONS];
   tree stride[GFC_MAX_DIMENSIONS];
   tree delta[GFC_MAX_DIMENSIONS];
-
-  /* Translation from loop dimensions to actual dimensions.
-     actual_dim = dim[loop_dim]  */
-  int dim[GFC_MAX_DIMENSIONS];
 }
-gfc_ss_info;
+gfc_array_info;
 
 typedef enum
 {
@@ -190,47 +182,82 @@ typedef enum
 }
 gfc_ss_type;
 
-/* SS structures can only belong to a single loopinfo.  They must be added
-   otherwise they will not get freed.  */
-typedef struct gfc_ss
+
+typedef struct gfc_ss_info
 {
+  int refcount;
   gfc_ss_type type;
   gfc_expr *expr;
-  mpz_t *shape;
   tree string_length;
+
   union
   {
     /* If type is GFC_SS_SCALAR or GFC_SS_REFERENCE.  */
     struct
     {
-      tree expr;
+      tree value;
     }
     scalar;
 
     /* GFC_SS_TEMP.  */
     struct
     {
-      /* The rank of the temporary.  May be less than the rank of the
-         assigned expression.  */
-      int dimen;
       tree type;
     }
     temp;
+
     /* All other types.  */
-    gfc_ss_info info;
+    gfc_array_info array;
   }
   data;
 
+  /* This is used by assignments requiring temporaries.  The bits specify which
+     loops the terms appear in.  This will be 1 for the RHS expressions,
+     2 for the LHS expressions, and 3(=1|2) for the temporary.  */
+  unsigned useflags:2;
+
+  /* Suppresses precalculation of scalars in WHERE assignments.  */
+  unsigned where:1;
+}
+gfc_ss_info;
+
+#define gfc_get_ss_info() XCNEW (gfc_ss_info)
+
+
+/* Scalarization State chain.  Created by walking an expression tree before
+   creating the scalarization loops.  Then passed as part of a gfc_se structure
+   to translate the expression inside the loop.  Note that these chains are
+   terminated by gfc_ss_terminator, not NULL.  A NULL pointer in a gfc_se
+   indicates to gfc_conv_* that this is a scalar expression.
+   SS structures can only belong to a single loopinfo.  They must be added
+   otherwise they will not get freed.  */
+
+typedef struct gfc_ss
+{
+  gfc_ss_info *info;
+
+  int dimen;
+  /* Translation from loop dimensions to actual array dimensions.
+     actual_dim = dim[loop_dim]  */
+  int dim[GFC_MAX_DIMENSIONS];
+
   /* All the SS in a loop and linked through loop_chain.  The SS for an
      expression are linked by the next pointer.  */
   struct gfc_ss *loop_chain;
   struct gfc_ss *next;
 
-  /* This is used by assignments requiring temporaries. The bits specify which
-     loops the terms appear in.  This will be 1 for the RHS expressions,
-     2 for the LHS expressions, and 3(=1|2) for the temporary.  The bit
-     'where' suppresses precalculation of scalars in WHERE assignments.  */
-  unsigned useflags:2, where:1, is_alloc_lhs:1;
+  /* Non-null if the ss is part of a nested loop.  */
+  struct gfc_ss *parent;
+
+  /* If the evaluation of an expression requires a nested loop (for example
+     if the sum intrinsic is evaluated inline), this points to the nested
+     loop's gfc_ss.  */
+  struct gfc_ss *nested_ss;
+
+  /* The loop this gfc_ss is in.  */
+  struct gfc_loopinfo *loop;
+
+  unsigned is_alloc_lhs:1;
 }
 gfc_ss;
 #define gfc_get_ss() XCNEW (gfc_ss)
@@ -252,6 +279,12 @@ typedef struct gfc_loopinfo
   /* The SS describing the temporary used in an assignment.  */
   gfc_ss *temp_ss;
 
+  /* Non-null if this loop is nested in another one.  */
+  struct gfc_loopinfo *parent;
+
+  /* Chain of nested loops.  */
+  struct gfc_loopinfo *nested, *next;
+
   /* The scalarization loop index variables.  */
   tree loopvar[GFC_MAX_DIMENSIONS];
 
@@ -277,6 +310,7 @@ typedef struct gfc_loopinfo
 }
 gfc_loopinfo;
 
+#define gfc_get_loopinfo() XCNEW (gfc_loopinfo)
 
 /* Information about a symbol that has been shadowed by a temporary.  */
 typedef struct
@@ -363,9 +397,6 @@ tree gfc_builtin_decl_for_float_kind (enum built_in_function, int);
 tree gfc_conv_intrinsic_subroutine (gfc_code *);
 void gfc_conv_intrinsic_function (gfc_se *, gfc_expr *);
 
-/* Is the intrinsic expanded inline.  */
-bool gfc_inline_intrinsic_function_p (gfc_expr *);
-
 /* Does an intrinsic map directly to an external library call
    This is true for array-returning intrinsics, unless
    gfc_inline_intrinsic_function_p returns true.  */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]