RFC: Patch to implement the fortran matmul/transpose optimisation

Richard Sandiford richard@codesourcery.com
Thu Aug 25 10:34:00 GMT 2005


This patch implements the matmul (transpose (...), ...) optimisation
mentioned on the wiki.  It's just a preview really; it's obviously
not suitable for stage 3.  Feedback welcome!

The patch has two parts.  The first part avoids creating temporaries
for tranpose() where possible, instead creating a new descriptor that
uses the original, untransposed data.  The second part optimises matmul()
for a transposed first argument.

The second part is based on Victor Leikehman's patch:

    http://gcc.gnu.org/ml/fortran/2004-11/msg00242.html

(which is also linked to from the wiki).

The transpose patch
===================

The basic idea is to look at each function call to see if any of the
arguments are a call to transpose (...).  If so, we try to avoid
creating a temporary for the result of the tranpose, instead using
an array descriptor in which the two dimension specifications have
been swapped around.

This transformation is clearly not safe in all cases.  We first have to
make sure that the argument to the transpose does not alias any other
argument to the containing function, unless the latter has intent(in).

For example, given:

    call foo (transpose (a), a)

it's safe to avoid the transpose() temporary if the second argument
to foo() has intent(in), but it isn't safe otherwise.

A policy decision is needed here.  If we have:

    a = foo (transpose (a))

then we'll need a temporary for the result of either transpose() or foo().
Which should it be?  We'd normally avoid using a temporary for the result
of foo(), instead making foo() assign directly to "a".  But would it be
better (or at least OK) to avoid the transpose() temporary instead?

It's a tricky decision, so I'd appreciate any comments that seasoned
Fortran folks might have.  On the one hand, using a temporary for
transpose() is probably going to cause least user surprise, and will
be better if foo() has been optimised for a non-transposed argument.
On the other hand, if we had:

    a = matmul (transpose (a), b)

then it would be better to use the temporary for the matmul() result,
because matmul() is so much more efficient when given a transposed
first argument.

The way the patch is implemented means that using a temporary for
the result of foo() is easier than using a temporary for the result
of transpose(), so that's what I've done.  I can change it if there
are strong objections though.

The code that avoids the temporary in "a = foo (...)" uses a special
dependency function, gfc_check_fncall_dependency, to check whether "a"
aliases any of the arguments to foo().  A large part of the patch is
about generalising gfc_check_fncall_dependency so that it can be used
for the new optimisation as well.  Specifically:

  - The main loop body is split into a separate function,
    gfc_check_argument_var_dependency.

  - gfc_check_argument_var_dependency has a new "intent" argument to 
    to say whether the variable is an input and/or an output to the
    function.  It's an output when checking "a = foo (...)" and an
    input when checking transpose() arguments.

  - A new function, gfc_check_argument_dependency, generalises the
    variable-only check in gfc_check_argument_var_dependency to
    cope with any type of expression.  This is necessary because
    transpose() arguments are less restricted than the left hand
    side of an assignment.

  - gfc_check_fncall_dependency now takes an "intent" argument too,
    for the same reasons as above.  If the expression under test has
    intent "in", the function skips dependence checks for arguments
    that also have intent "in".

  - The "expr" argument to gfc_check_fncall_dependency is replaced
    with separate a function symbol and argument list.  This allows
    the function to be used for subroutine as well as function calls.

At the moment, we don't have an easy way of determining the intent
of an intrinsic function argument.  The patch therefore treats all
such arguements as having unknown intent.  Thus:

    b = matmul (transpose (a), a)

will not be optimised, because we assume that the second argument
to matmul() might not be intent(in).  This can be cleaned up later
if we ever have a direct representation of an intrinic function's
interface.

There are various possible ways of representing optimised calls to
transpose().  For example, we could use a new intrinsic code like
GFC_ISYM_TRANSPOSE_INLINE to distinguish them from normal calls.
However, if we did that, we'd either have to make gfc_is_intrinsic_libcall
pretend that GFC_ISYM_TRANSPOSE_INLINE was a libcall, or we'd have to teach
its caller (gfc_walk_intrinsic_function) about the new special case.
Neither felt right to me.  The whole point of adding the new code would
be to emphasise that the function no longer produces a call instruction,
yet most of the frontend (including gfc_walk_intrinsic_function) doesn't
care about such a low-level detail.

I was also worried that using new codes might not scale.  What if we
wanted to record some other property about the call too?  We might then
need a four-way split.

In the end, I thought it would be better to add a new gfc_expr flag to say
"we're avoiding the temporary here".  We already have an "unsigned int"
bitfield "from_H" (for Hollerith constants), so there's no space penalty
in doing this.  It will then be easy to check whether the optimisation
has been applied, but without affecting code that doesn't care either way.

The optimisation itself is implemented as part of the resolve pass.
This was Paul's suggestion (thanks), and means that we can take
advantage of the existing expression-walking code.

The matmul patch
================

The original matmul code has two alternative loops, one optimised for
when the inner dimensions have a stride of 1, and one that handles any
stride.  Victor removed this distinction, instead using one loop for
transposed arrays and one for non-transposed arrays.  I thought it would
be better to keep the original optimisation too.  The patch therefore
provides four alternative loops:

    {transposed, non-transposed} x {strides are 1, general strides}

Testing
=======

I've bootstrapped and regression-tested the patch on i686-pc-linux-gnu.
I also used the attached transpose-checks.f90 test to make sure that
transpose()s were being optimised as expected.

I also ran galgel on powerpc64-linux-gnu.  The base run used neither
patch and the peak results used both.  The results were:

    178.galgel        2900       364       797      2900       227      1275 
    178.galgel        2900       365       794      2900       228      1275*
    178.galgel        2900       364       796*     2900       228      1274 

Richard


gcc/fortran/
	* Make-lang.in (fortran/trans-expr.o, fortran/trans-resolve.o): Depend
	on fortran/dependency.h.
	* gfortran.h (gfc_expr): Add an "inline_noncopying_intrinsic" flag.
	* dependency.h (gfc_get_noncopying_intrinsic_argument): Declare.
	(gfc_check_fncall_dependency): Change prototype.
	* dependency.c (gfc_get_noncopying_intrinsic_argument): New function.
	(gfc_check_argument_var_dependency): New function, split from
	gfc_check_fncall_dependency.
	(gfc_check_argument_dependency): New function.
	(gfc_check_fncall_dependency): Replace the expression parameter with
	separate symbol and argument list parameters.  Generalize the function
	to handle dependencies for any type of expression, not just variables.
	Accept a further argument giving the intent of the expression being
	tested.  Ignore	intent(in) arguments if that expression is also
	intent(in).
	* resolve.c: Include dependency.h.
	(find_noncopying_intrinsics): New function.
	(resolve_function, resolve_call): Call it on success.
	* trans-array.h (gfc_conv_array_transpose): Declare.
	(gfc_check_fncall_dependency): Remove prototype.
	* trans-array.c (gfc_conv_array_transpose): New function.
	(gfc_conv_expr_descriptor): Expand comments.  Generalise the
	handling of the !want_pointer && !direct_byref case.
	* trans-intrinsic.c (gfc_conv_intrinsic_function): Don't use the
	libcall handling if the expression is to be evaluated inline.
	Add a case for handling inline transpose()s.
	* trans-expr.c: Include dependency.h.
	(gfc_trans_arrayfunc_assign): Adjust for the new interface provided
	by gfc_check_fncall_dependency.

libgfortran/
	* m4/matmul.m4: Use a different order in the special case of a
	transposed first argument.
	* generated/matmul_c4.c, generated/matmul_c8.c,
	* generated/matmul_i4.c, generated/matmul_i8.c,
	* generated/matmul_l4.c, generated/matmul_l8.c,
	* generated/matmul_r4.c, generated/matmul_r8.c: Regenerated.

Index: gcc/fortran/Make-lang.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/Make-lang.in,v
retrieving revision 1.20
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.20 Make-lang.in
*** gcc/fortran/Make-lang.in 13 Jul 2005 13:33:31 -0000 1.20
--- gcc/fortran/Make-lang.in 18 Aug 2005 08:41:04 -0000
*************** fortran/trans-decl.o: $(GFORTRAN_TRANS_D
*** 289,295 ****
  fortran/trans-types.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-types.h \
    real.h toplev.h $(TARGET_H)
  fortran/trans-const.o: $(GFORTRAN_TRANS_DEPS)
! fortran/trans-expr.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-stmt.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-io.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-io.h
  fortran/trans-array.o: $(GFORTRAN_TRANS_DEPS)
--- 289,295 ----
  fortran/trans-types.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-types.h \
    real.h toplev.h $(TARGET_H)
  fortran/trans-const.o: $(GFORTRAN_TRANS_DEPS)
! fortran/trans-expr.o: $(GFORTRAN_TRANS_DEPS) fortran/dependency.h
  fortran/trans-stmt.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-io.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-io.h
  fortran/trans-array.o: $(GFORTRAN_TRANS_DEPS)
*************** fortran/trans-intrinsic.o: $(GFORTRAN_TR
*** 297,300 ****
    gt-fortran-trans-intrinsic.h
  fortran/dependency.o: $(GFORTRAN_TRANS_DEPS) fortran/dependency.h
  fortran/trans-common.o: $(GFORTRAN_TRANS_DEPS)
! 
--- 297,300 ----
    gt-fortran-trans-intrinsic.h
  fortran/dependency.o: $(GFORTRAN_TRANS_DEPS) fortran/dependency.h
  fortran/trans-common.o: $(GFORTRAN_TRANS_DEPS)
! fortran/resolve.o: fortran/dependency.h
Index: gcc/fortran/gfortran.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/gfortran.h,v
retrieving revision 1.78
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.78 gfortran.h
*** gcc/fortran/gfortran.h 9 Aug 2005 17:33:10 -0000 1.78
--- gcc/fortran/gfortran.h 18 Aug 2005 08:41:05 -0000
*************** typedef struct gfc_expr
*** 1079,1084 ****
--- 1079,1087 ----
  
    /* True if it is converted from Hollerith constant.  */
    unsigned int from_H : 1;
+   /* True if the expression is a call to a function that returns an array,
+      and if we have decided not to allocate temporary data for that array.  */
+   unsigned int inline_noncopying_intrinsic : 1;
  
    union
    {
Index: gcc/fortran/dependency.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/dependency.h,v
retrieving revision 1.4
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.4 dependency.h
*** gcc/fortran/dependency.h 25 Jun 2005 00:40:34 -0000 1.4
--- gcc/fortran/dependency.h 18 Aug 2005 08:41:05 -0000
*************** 02110-1301, USA.  */
*** 21,27 ****
  
  
  
! int gfc_check_fncall_dependency (gfc_expr *, gfc_expr *);
  int gfc_check_dependency (gfc_expr *, gfc_expr *, gfc_expr **, int);
  int gfc_is_same_range (gfc_array_ref *, gfc_array_ref *, int, int);
  int gfc_dep_compare_expr (gfc_expr *, gfc_expr *);
--- 21,29 ----
  
  
  
! gfc_expr *gfc_get_noncopying_intrinsic_argument (gfc_expr *);
! int gfc_check_fncall_dependency (gfc_expr *, sym_intent, gfc_symbol *,
! 				 gfc_actual_arglist *);
  int gfc_check_dependency (gfc_expr *, gfc_expr *, gfc_expr **, int);
  int gfc_is_same_range (gfc_array_ref *, gfc_array_ref *, int, int);
  int gfc_dep_compare_expr (gfc_expr *, gfc_expr *);
Index: gcc/fortran/dependency.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/dependency.c,v
retrieving revision 1.8
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.8 dependency.c
*** gcc/fortran/dependency.c 25 Jun 2005 00:40:34 -0000 1.8
--- gcc/fortran/dependency.c 18 Aug 2005 08:41:05 -0000
*************** gfc_is_same_range (gfc_array_ref * ar1, 
*** 175,246 ****
  }
  
  
! /* Dependency checking for direct function return by reference.
!    Returns true if the arguments of the function depend on the
!    destination.  This is considerably less conservative than other
!    dependencies because many function arguments will already be
!    copied into a temporary.  */
  
! int
! gfc_check_fncall_dependency (gfc_expr * dest, gfc_expr * fncall)
  {
-   gfc_actual_arglist *actual;
    gfc_ref *ref;
-   gfc_expr *expr;
    int n;
  
!   gcc_assert (dest->expr_type == EXPR_VARIABLE
! 	  && fncall->expr_type == EXPR_FUNCTION);
!   gcc_assert (fncall->rank > 0);
  
!   for (actual = fncall->value.function.actual; actual; actual = actual->next)
      {
!       expr = actual->expr;
  
!       /* Skip args which are not present.  */
!       if (!expr)
! 	continue;
  
!       /* Non-variable expressions will be allocated temporaries anyway.  */
!       switch (expr->expr_type)
  	{
! 	case EXPR_VARIABLE:
! 	  if (expr->rank > 1)
! 	    {
! 	      /* This is an array section.  */
! 	      for (ref = expr->ref; ref; ref = ref->next)
! 		{
! 		  if (ref->type == REF_ARRAY && ref->u.ar.type != AR_ELEMENT)
! 		    break;
! 		}
! 	      gcc_assert (ref);
! 	      /* AR_FULL can't contain vector subscripts.  */
! 	      if (ref->u.ar.type == AR_SECTION)
! 		{
! 		  for (n = 0; n < ref->u.ar.dimen; n++)
! 		    {
! 		      if (ref->u.ar.dimen_type[n] == DIMEN_VECTOR)
! 			break;
! 		    }
! 		  /* Vector subscript array sections will be copied to a
! 		     temporary.  */
! 		  if (n != ref->u.ar.dimen)
! 		    continue;
! 		}
! 	    }
  
! 	  if (gfc_check_dependency (dest, actual->expr, NULL, 0))
! 	    return 1;
! 	  break;
  
- 	case EXPR_ARRAY:
- 	  if (gfc_check_dependency (dest, expr, NULL, 0))
- 	    return 1;
- 	  break;
  
! 	default:
! 	  break;
  	}
      }
  
    return 0;
--- 175,314 ----
  }
  
  
! /* Some array-returning intrinsics can be implemented by reusing the
!    data from one of the array arguments.  For example, TRANPOSE does
!    not necessarily need to allocate new data: it can be implemented
!    by copying the original array's descriptor and simply swapping the
!    two dimension specifications.
  
!    If EXPR is a call to such an intrinsic, return the argument
!    whose data can be reused, otherwise return NULL.  */
! 
! gfc_expr *
! gfc_get_noncopying_intrinsic_argument (gfc_expr * expr)
! {
!   if (expr->expr_type != EXPR_FUNCTION || !expr->value.function.isym)
!     return 0;
! 
!   switch (expr->value.function.isym->generic_id)
!     {
!     case GFC_ISYM_TRANSPOSE:
!       return expr->value.function.actual->expr;
! 
!     default:
!       return 0;
!     }
! }
! 
! 
! /* Return true if array variable VAR could be passed to the same function
!    as argument EXPR without interfering with EXPR.  INTENT is the intent
!    of VAR.
! 
!    This is considerably less conservative than other dependencies
!    because many function arguments will already be copied into a
!    temporary.  */
! 
! static int
! gfc_check_argument_var_dependency (gfc_expr * var, sym_intent intent,
! 				   gfc_expr * expr)
  {
    gfc_ref *ref;
    int n;
  
!   gcc_assert (var->expr_type == EXPR_VARIABLE);
!   gcc_assert (var->rank > 0);
  
!   switch (expr->expr_type)
      {
!     case EXPR_VARIABLE:
!       if (expr->rank > 1)
! 	{
! 	  /* This is an array section.  */
! 	  for (ref = expr->ref; ref; ref = ref->next)
! 	    if (ref->type == REF_ARRAY && ref->u.ar.type != AR_ELEMENT)
! 	      break;
! 	  gcc_assert (ref);
! 
! 	  /* Vector subscript array sections will be copied to a
! 	     temporary.  AR_FULL can't contain vector subscripts.  */
! 	  if (ref->u.ar.type == AR_SECTION)
! 	    for (n = 0; n < ref->u.ar.dimen; n++)
! 	      if (ref->u.ar.dimen_type[n] == DIMEN_VECTOR)
! 		return false;
! 	}
!       return gfc_check_dependency (var, expr, NULL, 0);
  
!     case EXPR_ARRAY:
!       return gfc_check_dependency (var, expr, NULL, 0);
  
!     case EXPR_FUNCTION:
!       if (intent != INTENT_IN && expr->inline_noncopying_intrinsic)
  	{
! 	  expr = gfc_get_noncopying_intrinsic_argument (expr);
! 	  return gfc_check_argument_var_dependency (var, intent, expr);
! 	}
!       return 0;
  
!     default:
!       return 0;
!     }
! }
  
  
! /* Like gfc_check_argument_var_dependency, but extended to any
!    array expression OTHER, not just variables.  */
! 
! static int
! gfc_check_argument_dependency (gfc_expr * other, sym_intent intent,
! 			       gfc_expr * expr)
! {
!   switch (other->expr_type)
!     {
!     case EXPR_VARIABLE:
!       return gfc_check_argument_var_dependency (other, intent, expr);
! 
!     case EXPR_FUNCTION:
!       if (other->inline_noncopying_intrinsic)
! 	{
! 	  other = gfc_get_noncopying_intrinsic_argument (other);
! 	  return gfc_check_argument_dependency (other, INTENT_IN, expr);
  	}
+       return 0;
+ 
+     default:
+       return 0;
+     }
+ }
+ 
+ 
+ /* Like gfc_check_argument_dependency, but check all the arguments in ACTUAL.
+    FNSYM is the function being called, or NULL if not known.  */
+ 
+ int
+ gfc_check_fncall_dependency (gfc_expr * other, sym_intent intent,
+ 			     gfc_symbol * fnsym, gfc_actual_arglist * actual)
+ {
+   gfc_formal_arglist *formal;
+   gfc_expr *expr;
+ 
+   formal = fnsym ? fnsym->formal : NULL;
+   for (; actual; actual = actual->next, formal = formal ? formal->next : NULL)
+     {
+       expr = actual->expr;
+ 
+       /* Skip args which are not present.  */
+       if (!expr)
+ 	continue;
+ 
+       /* Skip intent(in) arguments if OTHER itself is intent(in).  */
+       if (formal
+ 	  && intent == INTENT_IN
+ 	  && formal->sym->attr.intent == INTENT_IN)
+ 	continue;
+ 
+       if (gfc_check_argument_dependency (other, intent, expr))
+ 	return 1;
      }
  
    return 0;
Index: gcc/fortran/resolve.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/resolve.c,v
retrieving revision 1.50
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.50 resolve.c
*** gcc/fortran/resolve.c 7 Aug 2005 22:56:17 -0000 1.50
--- gcc/fortran/resolve.c 18 Aug 2005 08:41:06 -0000
*************** 02110-1301, USA.  */
*** 24,29 ****
--- 24,30 ----
  #include "system.h"
  #include "gfortran.h"
  #include "arith.h"  /* For gfc_compare_expr().  */
+ #include "dependency.h"
  
  
  /* Stack to push the current if we descend into a block during
*************** resolve_actual_arglist (gfc_actual_argli
*** 749,754 ****
--- 750,773 ----
  }
  
  
+ /* Go through each actual argument in ACTUAL and see if it can be
+    implemented as an inlined, non-copying intrinsic.  FNSYM is the
+    function being called, or NULL if not known.  */
+ 
+ static void
+ find_noncopying_intrinsics (gfc_symbol * fnsym, gfc_actual_arglist * actual)
+ {
+   gfc_actual_arglist *ap;
+   gfc_expr *expr;
+ 
+   for (ap = actual; ap; ap = ap->next)
+     if (ap->expr
+ 	&& (expr = gfc_get_noncopying_intrinsic_argument (ap->expr))
+ 	&& !gfc_check_fncall_dependency (expr, INTENT_IN, fnsym, actual))
+       ap->expr->inline_noncopying_intrinsic = 1;
+ }
+ 
+ 
  /************* Function resolution *************/
  
  /* Resolve a function call known to be generic.
*************** resolve_function (gfc_expr * expr)
*** 1095,1100 ****
--- 1114,1122 ----
  	}
      }
  
+   if (t == SUCCESS)
+     find_noncopying_intrinsics (expr->value.function.esym,
+ 				expr->value.function.actual);
    return t;
  }
  
*************** resolve_call (gfc_code * c)
*** 1317,1343 ****
    if (resolve_actual_arglist (c->ext.actual) == FAILURE)
      return FAILURE;
  
!   if (c->resolved_sym != NULL)
!     return SUCCESS;
! 
!   switch (procedure_kind (c->symtree->n.sym))
!     {
!     case PTYPE_GENERIC:
!       t = resolve_generic_s (c);
!       break;
  
!     case PTYPE_SPECIFIC:
!       t = resolve_specific_s (c);
!       break;
  
!     case PTYPE_UNKNOWN:
!       t = resolve_unknown_s (c);
!       break;
  
!     default:
!       gfc_internal_error ("resolve_subroutine(): bad function type");
!     }
  
    return t;
  }
  
--- 1339,1366 ----
    if (resolve_actual_arglist (c->ext.actual) == FAILURE)
      return FAILURE;
  
!   t = SUCCESS;
!   if (c->resolved_sym == NULL)
!     switch (procedure_kind (c->symtree->n.sym))
!       {
!       case PTYPE_GENERIC:
! 	t = resolve_generic_s (c);
! 	break;
  
!       case PTYPE_SPECIFIC:
! 	t = resolve_specific_s (c);
! 	break;
  
!       case PTYPE_UNKNOWN:
! 	t = resolve_unknown_s (c);
! 	break;
  
!       default:
! 	gfc_internal_error ("resolve_subroutine(): bad function type");
!       }
  
+   if (t == SUCCESS)
+     find_noncopying_intrinsics (c->resolved_sym, c->ext.actual);
    return t;
  }
  
Index: gcc/fortran/trans-array.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/trans-array.h,v
retrieving revision 1.12
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.12 trans-array.h
*** gcc/fortran/trans-array.h 6 Aug 2005 12:56:18 -0000 1.12
--- gcc/fortran/trans-array.h 18 Aug 2005 08:41:06 -0000
*************** void gfc_conv_tmp_ref (gfc_se *);
*** 86,91 ****
--- 86,93 ----
  void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *, gfc_ss *);
  /* Convert an array for passing as an actual function parameter.  */
  void gfc_conv_array_parameter (gfc_se *, gfc_expr *, gfc_ss *, int);
+ /* Evaluate and transpose a matrix expression.  */
+ void gfc_conv_array_transpose (gfc_se *, gfc_expr *);
  
  /* These work with both descriptors and descriptorless arrays.  */
  tree gfc_conv_array_data (tree);
*************** tree gfc_conv_descriptor_ubound (tree, t
*** 107,114 ****
  
  /* Dependency checking for WHERE and FORALL.  */
  int gfc_check_dependency (gfc_expr *, gfc_expr *, gfc_expr **, int);
- /* Dependency checking for function calls.  */
- int gfc_check_fncall_dependency (gfc_expr *, gfc_expr *);
  
  /* Add pre-loop scalarization code for intrinsic functions which require
     special handling.  */
--- 109,114 ----
Index: gcc/fortran/trans-array.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/trans-array.c,v
retrieving revision 1.54
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.54 trans-array.c
*** gcc/fortran/trans-array.c 6 Aug 2005 12:56:18 -0000 1.54
--- gcc/fortran/trans-array.c 18 Aug 2005 08:41:06 -0000
*************** gfc_trans_allocate_temp_array (gfc_loopi
*** 620,625 ****
--- 620,713 ----
  }
  
  
+ /* Generate code to tranpose array EXPR by creating a new descriptor
+    in which the dimension specifications have been reversed.  */
+ 
+ void
+ gfc_conv_array_transpose (gfc_se * se, gfc_expr * expr)
+ {
+   tree dest, src, dest_index, src_index;
+   gfc_loopinfo *loop;
+   gfc_ss_info *dest_info, *src_info;
+   gfc_ss *dest_ss, *src_ss;
+   gfc_se src_se;
+   int n;
+ 
+   loop = se->loop;
+ 
+   src_ss = gfc_walk_expr (expr);
+   dest_ss = se->ss;
+ 
+   src_info = &src_ss->data.info;
+   dest_info = &dest_ss->data.info;
+ 
+   /* Get a descriptor for EXPR.  */
+   gfc_init_se (&src_se, NULL);
+   gfc_conv_expr_descriptor (&src_se, expr, src_ss);
+   gfc_add_block_to_block (&se->pre, &src_se.pre);
+   gfc_add_block_to_block (&se->post, &src_se.post);
+   src = src_se.expr;
+ 
+   /* Allocate a new descriptor for the return value.  */
+   dest = gfc_create_var (TREE_TYPE (src), "atmp");
+   dest_info->descriptor = dest;
+   se->expr = dest;
+ 
+   /* Copy across the dtype field.  */
+   gfc_add_modify_expr (&se->pre,
+ 		       gfc_conv_descriptor_dtype (dest),
+ 		       gfc_conv_descriptor_dtype (src));
+ 
+   /* Copy the dimension information, renumbering dimension 1 to 0 and
+      0 to 1.  */
+   gcc_assert (dest_info->dimen == 2);
+   gcc_assert (src_info->dimen == 2);
+   for (n = 0; n < 2; n++)
+     {
+       dest_info->delta[n] = integer_zero_node;
+       dest_info->start[n] = integer_zero_node;
+       dest_info->stride[n] = integer_one_node;
+       dest_info->dim[n] = n;
+ 
+       dest_index = gfc_rank_cst[n];
+       src_index = gfc_rank_cst[1 - n];
+ 
+       gfc_add_modify_expr (&se->pre,
+ 			   gfc_conv_descriptor_stride (dest, dest_index),
+ 			   gfc_conv_descriptor_stride (src, src_index));
+ 
+       gfc_add_modify_expr (&se->pre,
+ 			   gfc_conv_descriptor_lbound (dest, dest_index),
+ 			   gfc_conv_descriptor_lbound (src, src_index));
+ 
+       gfc_add_modify_expr (&se->pre,
+ 			   gfc_conv_descriptor_ubound (dest, dest_index),
+ 			   gfc_conv_descriptor_ubound (src, src_index));
+ 
+       if (!loop->to[n])
+         {
+ 	  gcc_assert (integer_zerop (loop->from[n]));
+ 	  loop->to[n] = build2 (MINUS_EXPR, gfc_array_index_type,
+ 				gfc_conv_descriptor_ubound (dest, dest_index),
+ 				gfc_conv_descriptor_lbound (dest, dest_index));
+         }
+     }
+ 
+   /* Copy the data pointer.  */
+   dest_info->data = gfc_conv_descriptor_data_get (src);
+   gfc_conv_descriptor_data_set (&se->pre, dest, dest_info->data);
+ 
+   /* Copy the offset.  This is not changed by transposition: the top-left
+      element is still at the same offset as before.  */
+   dest_info->offset = gfc_conv_descriptor_offset (src);
+   gfc_add_modify_expr (&se->pre,
+ 		       gfc_conv_descriptor_offset (dest),
+ 		       dest_info->offset);
+ 
+   if (dest_info->dimen > loop->temp_dim)
+     loop->temp_dim = dest_info->dimen;
+ }
+ 
  /* Make sure offset is a variable.  */
  
  static void
*************** gfc_trans_dummy_array_bias (gfc_symbol *
*** 3458,3468 ****
  }
  
  
! /* Convert an array for passing as an actual parameter.  Expressions and
     vector subscripts are evaluated and stored in a temporary, which is then
     passed.  For whole arrays the descriptor is passed.  For array sections
     a modified copy of the descriptor is passed, but using the original data.
!    Also used for array pointer assignments by setting se->direct_byref.  */
  
  void
  gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
--- 3546,3573 ----
  }
  
  
! /* Convert an array for passing as an actual argument.  Expressions and
     vector subscripts are evaluated and stored in a temporary, which is then
     passed.  For whole arrays the descriptor is passed.  For array sections
     a modified copy of the descriptor is passed, but using the original data.
! 
!    This function is also used for array pointer assignments, and there
!    are three cases:
! 
!      - want_pointer && !se->direct_byref
! 	 EXPR is an actual argument.  On exit, se->expr contains a
! 	 pointer to the array descriptor.
! 
!      - !want_pointer && !se->direct_byref
! 	 EXPR is an actual argument to an intrinsic function or the
! 	 left-hand side of a pointer assignment.  On exit, se->expr
! 	 contains the descriptor for EXPR.
! 
!      - !want_pointer && se->direct_byref
! 	 EXPR is the right-hand side of a pointer assignment and
! 	 se->expr is the descriptor for the previously-evaluated
! 	 left-hand side.  The function creates an assignment from
! 	 EXPR to se->expr.  */
  
  void
  gfc_conv_expr_descriptor (gfc_se * se, gfc_expr * expr, gfc_ss * ss)
*************** gfc_conv_expr_descriptor (gfc_se * se, g
*** 3637,3643 ****
    if (!need_tmp)
      loop.array_parameter = 1;
    else
!     gcc_assert (se->want_pointer && !se->direct_byref);
  
    /* Setup the scalarizing loops and bounds.  */
    gfc_conv_ss_startstride (&loop);
--- 3742,3749 ----
    if (!need_tmp)
      loop.array_parameter = 1;
    else
!     /* The right-hand side of a pointer assignment mustn't use a temporary.  */
!     gcc_assert (!se->direct_byref);
  
    /* Setup the scalarizing loops and bounds.  */
    gfc_conv_ss_startstride (&loop);
*************** gfc_conv_expr_descriptor (gfc_se * se, g
*** 3718,3734 ****
        gfc_add_modify_expr (&loop.pre, tmp, gfc_index_zero_node);
  
        gcc_assert (is_gimple_lvalue (desc));
-       se->expr = gfc_build_addr_expr (NULL, desc);
      }
    else if (expr->expr_type == EXPR_FUNCTION)
      {
        desc = info->descriptor;
  
-       if (se->want_pointer)
- 	se->expr = gfc_build_addr_expr (NULL_TREE, desc);
-       else
- 	se->expr = desc;
- 
        if (expr->ts.type == BT_CHARACTER)
  	se->string_length = expr->symtree->n.sym->ts.cl->backend_decl;
      }
--- 3824,3834 ----
*************** gfc_conv_expr_descriptor (gfc_se * se, g
*** 3879,3893 ****
  	  tmp = gfc_conv_descriptor_offset (parm);
  	  gfc_add_modify_expr (&loop.pre, tmp, gfc_index_zero_node);
  	}
  
!       if (!se->direct_byref)
! 	{
! 	  /* Get a pointer to the new descriptor.  */
!           if (se->want_pointer)
! 	    se->expr = gfc_build_addr_expr (NULL, parm);
!           else
!             se->expr = parm;
! 	}
      }
  
    gfc_add_block_to_block (&se->pre, &loop.pre);
--- 3979,3994 ----
  	  tmp = gfc_conv_descriptor_offset (parm);
  	  gfc_add_modify_expr (&loop.pre, tmp, gfc_index_zero_node);
  	}
+       desc = parm;
+     }
  
!   if (!se->direct_byref)
!     {
!       /* Get a pointer to the new descriptor.  */
!       if (se->want_pointer)
! 	se->expr = gfc_build_addr_expr (NULL, desc);
!       else
! 	se->expr = desc;
      }
  
    gfc_add_block_to_block (&se->pre, &loop.pre);
Index: gcc/fortran/trans-intrinsic.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/trans-intrinsic.c,v
retrieving revision 1.53
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.53 trans-intrinsic.c
*** gcc/fortran/trans-intrinsic.c 9 Aug 2005 17:33:12 -0000 1.53
--- gcc/fortran/trans-intrinsic.c 18 Aug 2005 08:41:07 -0000
*************** gfc_conv_intrinsic_function (gfc_se * se
*** 2690,2696 ****
  
    name = &expr->value.function.name[2];
  
!   if (expr->rank > 0)
      {
        lib = gfc_is_intrinsic_libcall (expr);
        if (lib != 0)
--- 2690,2696 ----
  
    name = &expr->value.function.name[2];
  
!   if (expr->rank > 0 && !expr->inline_noncopying_intrinsic)
      {
        lib = gfc_is_intrinsic_libcall (expr);
        if (lib != 0)
*************** gfc_conv_intrinsic_function (gfc_se * se
*** 2899,2904 ****
--- 2899,2914 ----
        gfc_conv_intrinsic_bound (se, expr, 0);
        break;
  
+     case GFC_ISYM_TRANSPOSE:
+       if (se->ss && se->ss->useflags)
+ 	{
+ 	  gfc_conv_tmp_array_ref (se);
+ 	  gfc_advance_se_ss_chain (se);
+ 	}
+       else
+ 	gfc_conv_array_transpose (se, expr->value.function.actual->expr);
+       break;
+ 
      case GFC_ISYM_LEN:
        gfc_conv_intrinsic_len (se, expr);
        break;
Index: gcc/fortran/trans-expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fortran/trans-expr.c,v
retrieving revision 1.56
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.56 trans-expr.c
*** gcc/fortran/trans-expr.c 6 Aug 2005 12:56:18 -0000 1.56
--- gcc/fortran/trans-expr.c 18 Aug 2005 08:41:07 -0000
*************** 02110-1301, USA.  */
*** 39,44 ****
--- 39,45 ----
  #include "trans-array.h"
  /* Only for gfc_trans_assign and gfc_trans_pointer_assign.  */
  #include "trans-stmt.h"
+ #include "dependency.h"
  
  static tree gfc_trans_structure_assign (tree dest, gfc_expr * expr);
  
*************** gfc_trans_arrayfunc_assign (gfc_expr * e
*** 2168,2174 ****
      return NULL;
  
    /* Check for a dependency.  */
!   if (gfc_check_fncall_dependency (expr1, expr2))
      return NULL;
  
    /* The frontend doesn't seem to bother filling in expr->symtree for intrinsic
--- 2169,2177 ----
      return NULL;
  
    /* Check for a dependency.  */
!   if (gfc_check_fncall_dependency (expr1, INTENT_OUT,
! 				   expr2->value.function.esym,
! 				   expr2->value.function.actual))
      return NULL;
  
    /* The frontend doesn't seem to bother filling in expr->symtree for intrinsic
Index: libgfortran/m4/matmul.m4
===================================================================
RCS file: /cvs/gcc/gcc/libgfortran/m4/matmul.m4,v
retrieving revision 1.14
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.14 matmul.m4
*** libgfortran/m4/matmul.m4 7 Jul 2005 22:08:05 -0000 1.14
--- libgfortran/m4/matmul.m4 15 Aug 2005 13:23:19 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 35,50 ****
  #include "libgfortran.h"'
  include(iparm.m4)dnl
  
! /* This is a C version of the following fortran pseudo-code. The key
!    point is the loop order -- we access all arrays column-first, which
!    improves the performance enough to boost galgel spec score by 50%.
  
     DIMENSION A(M,COUNT), B(COUNT,N), C(M,N)
!    C = 0
!    DO J=1,N
!      DO K=1,COUNT
         DO I=1,M
!          C(I,J) = C(I,J)+A(I,K)*B(K,J)
  */
  
  extern void matmul_`'rtype_code (rtype * retarray, rtype * a, rtype * b);
--- 35,63 ----
  #include "libgfortran.h"'
  include(iparm.m4)dnl
  
! /* The order of loops is different in the case of plain matrix
!    multiplication C=MATMUL(A,B), and in the frequent special case where
!    the argument A is the temporary result of a TRANSPOSE intrinsic:
!    C=MATMUL(TRANSPOSE(A),B).  Transposed temporaries are detected by
!    looking at their strides.
! 
!    The equivalent Fortran pseudo-code is:
  
     DIMENSION A(M,COUNT), B(COUNT,N), C(M,N)
!    IF (.NOT.IS_TRANSPOSED(A)) THEN
!      C = 0
!      DO J=1,N
!        DO K=1,COUNT
!          DO I=1,M
!            C(I,J) = C(I,J)+A(I,K)*B(K,J)
!    ELSE
!      DO J=1,N
         DO I=1,M
!          S = 0
!          DO K=1,COUNT
!            S = S+A(I,K)+B(K,J)
!          C(I,J) = S
!    ENDIF
  */
  
  extern void matmul_`'rtype_code (rtype * retarray, rtype * a, rtype * b);
*************** sinclude(`matmul_asm_'rtype_code`.m4')dn
*** 204,210 ****
  	    }
  	}
      }
!   else
      {
        for (y = 0; y < ycount; y++)
  	for (x = 0; x < xcount; x++)
--- 217,244 ----
  	    }
  	}
      }
!   else if (rxstride == 1 && aystride == 1 && bxstride == 1)
!     {
!       rtype_name *bbase_y;
!       rtype_name *dest_y;
!       rtype_name *abase_x;
!       rtype_name s;
! 
!       for (y = 0; y < ycount; y++)
! 	{
! 	  bbase_y = &bbase[y*bystride];
! 	  dest_y = &dest[y*rystride];
! 	  for (x = 0; x < xcount; x++)
! 	    {
! 	      abase_x = &abase[x*axstride];
! 	      s = (rtype_name) 0;
! 	      for (n = 0; n < count; n++)
! 		s += abase_x[n] * bbase_y[n];
! 	      dest_y[x] = s;
! 	    }
! 	}
!     }
!   else if (axstride < aystride)
      {
        for (y = 0; y < ycount; y++)
  	for (x = 0; x < xcount; x++)
*************** sinclude(`matmul_asm_'rtype_code`.m4')dn
*** 216,219 ****
--- 250,274 ----
  	    /* dest[x,y] += a[x,n] * b[n,y] */
  	    dest[x*rxstride + y*rystride] += abase[x*axstride + n*aystride] * bbase[n*bxstride + y*bystride];
      }
+   else
+     {
+       rtype_name *bbase_y;
+       rtype_name *dest_y;
+       rtype_name *abase_x;
+       rtype_name s;
+ 
+       for (y = 0; y < ycount; y++)
+ 	{
+ 	  bbase_y = &bbase[y*bystride];
+ 	  dest_y = &dest[y*rystride];
+ 	  for (x = 0; x < xcount; x++)
+ 	    {
+ 	      abase_x = &abase[x*axstride];
+ 	      s = (rtype_name) 0;
+ 	      for (n = 0; n < count; n++)
+ 		s += abase_x[n*aystride] * bbase_y[n*bxstride];
+ 	      dest_y[x*rxstride] = s;
+ 	    }
+ 	}
+     }
  }

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: transpose-checks.f90
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20050825/33744d8f/attachment.f90>


More information about the Gcc-patches mailing list