Bug 32512 - efficiency of RESHAPE and SPREAD
efficiency of RESHAPE and SPREAD
Status: NEW
Product: gcc
Classification: Unclassified
Component: fortran
4.3.0
: P3 enhancement
: ---
Assigned To: Not yet assigned to anyone
: missed-optimization
Depends on: 37577
Blocks:
  Show dependency treegraph
 
Reported: 2007-06-26 06:27 UTC by Jaroslav Hajek
Modified: 2012-09-21 18:57 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-06-26 11:54:21


Attachments
speedtester of TRANSPOSE, RESHAPE and SPREAD (1.46 KB, text/plain)
2007-06-26 06:36 UTC, Jaroslav Hajek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jaroslav Hajek 2007-06-26 06:27:53 UTC
gfortran currently implements the TRANSPOSE intrinsic by simply rearranging the array descriptor, which can save considerable time when passed to an assumed-shape argument. Two more intrinsics can be implemented in this way:

RESHAPE can check for contiguity of its argument and if so, only construct a new descriptor sharing the data. 

If, as I assume, the array element reference a(i1,i2,i3) is computed equivalently to:
*((type *)(offset + i1*dim[0].stride + i2*dim[1].stride + i3*dim[2].stride)

then one can also implement a fast version of SPREAD by using zero strides
in the array descriptor
(thus the array reference will ignore some of the indices). This also covers broadcasting a scalar to an array (all zero strides).

For code using whole array manipulation intensively, I'd expect significant savings of both time and memory. Can't find at this moment how to attach a file in this form, so I'll post a test case afterwards.

cheers to all gfortran developers,
Jaroslav Hajek
Comment 1 Jaroslav Hajek 2007-06-26 06:36:36 UTC
Created attachment 13790 [details]
speedtester of TRANSPOSE, RESHAPE and SPREAD

this testcase tests speed of constructing an array via TRANSPOSE, RESHAPE and SPREAD, by passing the result to a computationally non-intensive subroutine (involving only one row and column of the matrix).
Comment 2 Jaroslav Hajek 2007-06-26 10:56:46 UTC
Just an informal note: 
Apparently (using the testcase), EkoPath 3.0 has a fast RESHAPE but not
fast SPREAD, while Intel 9.1 and current g95 have neither.

Comment 3 Tobias Burnus 2007-06-26 11:54:21 UTC
Just to show how much time can be saved, I compared the speed with different compilers (on x86-64/Linux):

gfortran sunstudio12  ifort9.1/10 open64  NAGf95  g95
-------------------------------------------------------
 0.024   0.02575      0.024       0.024   0.03    0.028 (1)
 0.028   0.025365     0.024       0.024   0.03    0.028 (2)
 6.12    0.025514     2.420       0.024   2.59   21.373 (3)
10.47    4.307007   18.4371       2.616  11.58   10.660 (4)
-------------------------------------------------------
(1) matrix_op(a), (2) matrix_op(transpose(a))
(3) matrix_op(reshape(b,shape=(/n,n+1/)))
(4) matrix_op(spread(c,dim=1,ncopies=n))

Thus especially RESHAPE has a huge potential.
Comment 4 Thomas Koenig 2008-03-28 23:23:44 UTC
Subject: Bug 32512

Author: tkoenig
Date: Fri Mar 28 23:22:49 2008
New Revision: 133702

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133702
Log:
2008-03-28  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR libfortran/32972
	PR libfortran/32512
	* Makefile.am:  Add new variable, i_spread_c, containing
	pack_i1.c, pack_i2.c, pack_i4.c, pack_i8.c, spread_i16.c,
	spread_r4.c, spread_r8.c, spread_r10.c, spread_r16.c,
	spread_c4.c, spread_c8.c, spread_c10.c, spread_c16.c.
	* Makefile.in:  Regenerated.
	* libgfortran.h:  Add prototypes for spread_i1, spread_i2,
	spread_i4, spread_i8, spread_i16, spread_r4, spread_r8,
	spread_c4, spread_c8, spread_c10, spread_c16,
	spread_scalar_i1, spread_scalar_i2, spread_scalar_i4,
	spread_scalar_i8, spread_scalar_i16, spread_scalar_r4
	spread_scalar_r8, spread_scalar_c4, spread_scalar_c8,
	spread_scalar_c10 and spread_scalar_c16.
	Add macros to isolate both type and size information
	from array descriptors with a single mask operation.
	* intrinsics/spread_generic.c:  Add calls to specific
	spread functions.
	* m4/spread.m4:  New file.
	* generated/spread_i1.c:  New file.
	* generated/spread_i2.c:  New file.
	* generated/spread_i4.c:  New file.
	* generated/spread_i8.c:  New file.
	* generated/spread_i16.c:  New file.
	* generated/spread_r4.c:  New file.
	* generated/spread_r8.c:  New file.
	* generated/spread_r10.c:  New file.
	* generated/spread_r16.c:  New file.
	* generated/spread_c4.c:  New file.
	* generated/spread_c8.c:  New file.
	* generated/spread_c10.c:  New file.
	* generated/spread_c16.c:  New file.

2008-03-28  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR libfortran/32972
	PR libfortran/32512
	* intrinsic_spread_1.f90:  New file.
	* intrinsic_spread_2.f90:  New file.
	* intrinsic_spread_3.f90:  New file.


Added:
    trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_1.f90
    trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_2.f90
    trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_3.f90
    trunk/libgfortran/generated/spread_c10.c
    trunk/libgfortran/generated/spread_c16.c
    trunk/libgfortran/generated/spread_c4.c
    trunk/libgfortran/generated/spread_c8.c
    trunk/libgfortran/generated/spread_i1.c
    trunk/libgfortran/generated/spread_i16.c
    trunk/libgfortran/generated/spread_i2.c
    trunk/libgfortran/generated/spread_i4.c
    trunk/libgfortran/generated/spread_i8.c
    trunk/libgfortran/generated/spread_r10.c
    trunk/libgfortran/generated/spread_r16.c
    trunk/libgfortran/generated/spread_r4.c
    trunk/libgfortran/generated/spread_r8.c
    trunk/libgfortran/m4/spread.m4
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/Makefile.am
    trunk/libgfortran/Makefile.in
    trunk/libgfortran/intrinsics/spread_generic.c
    trunk/libgfortran/libgfortran.h

Comment 5 Thomas Koenig 2008-04-13 20:16:46 UTC
Subject: Bug 32512

Author: tkoenig
Date: Sun Apr 13 20:15:58 2008
New Revision: 134245

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134245
Log:
2008-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>
	Francois-Xavier Coudert  <fxcoudert@gcc.gnu.org>

	PR libfortran/32972
	PR libfortran/32512
	configure.ac:  Add test for uintptr_t.
	configure:  Regenerated.
	config.h.in:  Regenerated.
	* libgfortran.h: GFC_DTYPE_DERIVED_1:  New macro.
	GFC_DTYPE_DERIVED_2:  New macro.
	GFC_DTYPE_DERIVED_4:  New macro.
	GFC_DTYPE_DERIVED_8:  New macro.
	GFC_DTYPE_DERIVED_16:  New macro.
	GFC_UNALIGNED_2:  New macro.
	GFC_UNALIGNED_4:  New macro.
	GFC_UNALIGNED_8:  New macro.
	GFC_UNALIGNED_16:  New macro.
	intptr_t:  Define if we don't have it.
	uintptr_t:  Likewise.
	* runtime/backtrace.c (show_backtrace):  Use intptr_t.
	* intrinsics/signal.c (signal_sub):  Likewise.
	(signal_sub_int):  Likewise.
	(alarm_sub_int_i4):  Likewise.
	* intrinsics/spread_generic.c (spread):  Use the integer
	routines for handling derived types of sizes 1, 2, 4, 8 and 16
	if the alignment of all pointers is correct.
	(spread_scalar):  Likewise.
	* intrinsics/pack_generic.c (pack):  Likewise.
	Use GFD_DTYPE_TYPE_SIZE to avoid nested switch statements.
	* intrinsics/unpack_generic.c (unpack1):  Likewise.
	(unpack0):  Likewise.
	* runtime/in_pack_generic.c (internal_pack):  Likewise.
	* runtime/in_unpack_generic.c (internal_unpack):  Likewise.

2008-04-13  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR libfortran/32972
	PR libfortran/32512
	* gfortran.dg/internal_pack_1.f90:  Add test for derived type.
	* gfortran.dg/intrinsic_spread_1.f90:  Likewise.
	* gfortran.dg/intrinsic_pack_1.f90:  Likewise.
	* gfortran.dg/intrinsic_unpack_1.f90:  Likewise.


Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/internal_pack_1.f90
    trunk/gcc/testsuite/gfortran.dg/intrinsic_pack_1.f90
    trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_1.f90
    trunk/gcc/testsuite/gfortran.dg/intrinsic_unpack_1.f90
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/config.h.in
    trunk/libgfortran/configure
    trunk/libgfortran/configure.ac
    trunk/libgfortran/intrinsics/pack_generic.c
    trunk/libgfortran/intrinsics/signal.c
    trunk/libgfortran/intrinsics/spread_generic.c
    trunk/libgfortran/intrinsics/unpack_generic.c
    trunk/libgfortran/libgfortran.h
    trunk/libgfortran/runtime/backtrace.c
    trunk/libgfortran/runtime/in_pack_generic.c
    trunk/libgfortran/runtime/in_unpack_generic.c

Comment 6 Paul Thomas 2008-11-30 19:53:29 UTC
Index: libgfortran/generated/reshape_r4.c
===================================================================
--- libgfortran/generated/reshape_r4.c	(revision 142291)
+++ libgfortran/generated/reshape_r4.c	(working copy)
@@ -81,7 +81,7 @@
   const GFC_REAL_4 *src;
   int n;
   int dim;
-  int sempty, pempty, shape_empty;
+  int sempty, pempty, shape_empty, contiguous_data;
   index_type shape_data[GFC_MAX_DIMENSIONS];
 
   rdim = shape->dim[0].ubound - shape->dim[0].lbound + 1;
@@ -100,6 +100,24 @@
       }
     }
 
+  contiguous_data = 1;
+
+  for (n = 0; n < GFC_DESCRIPTOR_RANK(source); n++)
+    {
+      if (n == 0)
+      {
+        if (source->dim[n].stride != 1)
+	  contiguous_data = 0;
+      }
+      else if (contiguous_data)
+      {
+	int del = source->dim[n-1].ubound - source->dim[n-1].lbound + 1;
+        if (source->dim[n].stride != del)
+	  contiguous_data = 0;	
+      }
+    }
+   st_printf ("data %d\n", (int)contiguous_data);
+
   if (ret->data == NULL)
     {
       rs = 1;
@@ -112,6 +130,11 @@
 	  rs *= rex;
 	}
       ret->offset = 0;
+      if (contiguous_data)
+      {
+	ret->data = source->data;
+	return;
+      }
       ret->data = internal_malloc_size ( rs * sizeof (GFC_REAL_4));
       ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim;
     }

Was an experiment to see if an improvement to reshape could easily be implemented in the library.  It fails completely, of course, because the source is freed!  This does show that a flag in the descriptor to say that 'this' cannot be freed would be a boon.

Paul
Comment 7 Daniel Franke 2010-05-09 18:43:08 UTC
(In reply to comment #6)
> Was an experiment to see if an improvement to reshape could easily be
> implemented in the library.  It fails completely, of course, because the source
> is freed!  This does show that a flag in the descriptor to say that 'this'
> cannot be freed would be a boon.

Thus marking as dependent on array-descriptor reform.