gfortran currently implements the TRANSPOSE intrinsic by simply rearranging the array descriptor, which can save considerable time when passed to an assumed-shape argument. Two more intrinsics can be implemented in this way: RESHAPE can check for contiguity of its argument and if so, only construct a new descriptor sharing the data. If, as I assume, the array element reference a(i1,i2,i3) is computed equivalently to: *((type *)(offset + i1*dim[0].stride + i2*dim[1].stride + i3*dim[2].stride) then one can also implement a fast version of SPREAD by using zero strides in the array descriptor (thus the array reference will ignore some of the indices). This also covers broadcasting a scalar to an array (all zero strides). For code using whole array manipulation intensively, I'd expect significant savings of both time and memory. Can't find at this moment how to attach a file in this form, so I'll post a test case afterwards. cheers to all gfortran developers, Jaroslav Hajek
Created attachment 13790 [details] speedtester of TRANSPOSE, RESHAPE and SPREAD this testcase tests speed of constructing an array via TRANSPOSE, RESHAPE and SPREAD, by passing the result to a computationally non-intensive subroutine (involving only one row and column of the matrix).
Just an informal note: Apparently (using the testcase), EkoPath 3.0 has a fast RESHAPE but not fast SPREAD, while Intel 9.1 and current g95 have neither.
Just to show how much time can be saved, I compared the speed with different compilers (on x86-64/Linux): gfortran sunstudio12 ifort9.1/10 open64 NAGf95 g95 ------------------------------------------------------- 0.024 0.02575 0.024 0.024 0.03 0.028 (1) 0.028 0.025365 0.024 0.024 0.03 0.028 (2) 6.12 0.025514 2.420 0.024 2.59 21.373 (3) 10.47 4.307007 18.4371 2.616 11.58 10.660 (4) ------------------------------------------------------- (1) matrix_op(a), (2) matrix_op(transpose(a)) (3) matrix_op(reshape(b,shape=(/n,n+1/))) (4) matrix_op(spread(c,dim=1,ncopies=n)) Thus especially RESHAPE has a huge potential.
Subject: Bug 32512 Author: tkoenig Date: Fri Mar 28 23:22:49 2008 New Revision: 133702 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133702 Log: 2008-03-28 Thomas Koenig <tkoenig@gcc.gnu.org> PR libfortran/32972 PR libfortran/32512 * Makefile.am: Add new variable, i_spread_c, containing pack_i1.c, pack_i2.c, pack_i4.c, pack_i8.c, spread_i16.c, spread_r4.c, spread_r8.c, spread_r10.c, spread_r16.c, spread_c4.c, spread_c8.c, spread_c10.c, spread_c16.c. * Makefile.in: Regenerated. * libgfortran.h: Add prototypes for spread_i1, spread_i2, spread_i4, spread_i8, spread_i16, spread_r4, spread_r8, spread_c4, spread_c8, spread_c10, spread_c16, spread_scalar_i1, spread_scalar_i2, spread_scalar_i4, spread_scalar_i8, spread_scalar_i16, spread_scalar_r4 spread_scalar_r8, spread_scalar_c4, spread_scalar_c8, spread_scalar_c10 and spread_scalar_c16. Add macros to isolate both type and size information from array descriptors with a single mask operation. * intrinsics/spread_generic.c: Add calls to specific spread functions. * m4/spread.m4: New file. * generated/spread_i1.c: New file. * generated/spread_i2.c: New file. * generated/spread_i4.c: New file. * generated/spread_i8.c: New file. * generated/spread_i16.c: New file. * generated/spread_r4.c: New file. * generated/spread_r8.c: New file. * generated/spread_r10.c: New file. * generated/spread_r16.c: New file. * generated/spread_c4.c: New file. * generated/spread_c8.c: New file. * generated/spread_c10.c: New file. * generated/spread_c16.c: New file. 2008-03-28 Thomas Koenig <tkoenig@gcc.gnu.org> PR libfortran/32972 PR libfortran/32512 * intrinsic_spread_1.f90: New file. * intrinsic_spread_2.f90: New file. * intrinsic_spread_3.f90: New file. Added: trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_1.f90 trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_2.f90 trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_3.f90 trunk/libgfortran/generated/spread_c10.c trunk/libgfortran/generated/spread_c16.c trunk/libgfortran/generated/spread_c4.c trunk/libgfortran/generated/spread_c8.c trunk/libgfortran/generated/spread_i1.c trunk/libgfortran/generated/spread_i16.c trunk/libgfortran/generated/spread_i2.c trunk/libgfortran/generated/spread_i4.c trunk/libgfortran/generated/spread_i8.c trunk/libgfortran/generated/spread_r10.c trunk/libgfortran/generated/spread_r16.c trunk/libgfortran/generated/spread_r4.c trunk/libgfortran/generated/spread_r8.c trunk/libgfortran/m4/spread.m4 Modified: trunk/gcc/testsuite/ChangeLog trunk/libgfortran/ChangeLog trunk/libgfortran/Makefile.am trunk/libgfortran/Makefile.in trunk/libgfortran/intrinsics/spread_generic.c trunk/libgfortran/libgfortran.h
Subject: Bug 32512 Author: tkoenig Date: Sun Apr 13 20:15:58 2008 New Revision: 134245 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134245 Log: 2008-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> Francois-Xavier Coudert <fxcoudert@gcc.gnu.org> PR libfortran/32972 PR libfortran/32512 configure.ac: Add test for uintptr_t. configure: Regenerated. config.h.in: Regenerated. * libgfortran.h: GFC_DTYPE_DERIVED_1: New macro. GFC_DTYPE_DERIVED_2: New macro. GFC_DTYPE_DERIVED_4: New macro. GFC_DTYPE_DERIVED_8: New macro. GFC_DTYPE_DERIVED_16: New macro. GFC_UNALIGNED_2: New macro. GFC_UNALIGNED_4: New macro. GFC_UNALIGNED_8: New macro. GFC_UNALIGNED_16: New macro. intptr_t: Define if we don't have it. uintptr_t: Likewise. * runtime/backtrace.c (show_backtrace): Use intptr_t. * intrinsics/signal.c (signal_sub): Likewise. (signal_sub_int): Likewise. (alarm_sub_int_i4): Likewise. * intrinsics/spread_generic.c (spread): Use the integer routines for handling derived types of sizes 1, 2, 4, 8 and 16 if the alignment of all pointers is correct. (spread_scalar): Likewise. * intrinsics/pack_generic.c (pack): Likewise. Use GFD_DTYPE_TYPE_SIZE to avoid nested switch statements. * intrinsics/unpack_generic.c (unpack1): Likewise. (unpack0): Likewise. * runtime/in_pack_generic.c (internal_pack): Likewise. * runtime/in_unpack_generic.c (internal_unpack): Likewise. 2008-04-13 Thomas Koenig <tkoenig@gcc.gnu.org> PR libfortran/32972 PR libfortran/32512 * gfortran.dg/internal_pack_1.f90: Add test for derived type. * gfortran.dg/intrinsic_spread_1.f90: Likewise. * gfortran.dg/intrinsic_pack_1.f90: Likewise. * gfortran.dg/intrinsic_unpack_1.f90: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/internal_pack_1.f90 trunk/gcc/testsuite/gfortran.dg/intrinsic_pack_1.f90 trunk/gcc/testsuite/gfortran.dg/intrinsic_spread_1.f90 trunk/gcc/testsuite/gfortran.dg/intrinsic_unpack_1.f90 trunk/libgfortran/ChangeLog trunk/libgfortran/config.h.in trunk/libgfortran/configure trunk/libgfortran/configure.ac trunk/libgfortran/intrinsics/pack_generic.c trunk/libgfortran/intrinsics/signal.c trunk/libgfortran/intrinsics/spread_generic.c trunk/libgfortran/intrinsics/unpack_generic.c trunk/libgfortran/libgfortran.h trunk/libgfortran/runtime/backtrace.c trunk/libgfortran/runtime/in_pack_generic.c trunk/libgfortran/runtime/in_unpack_generic.c
Index: libgfortran/generated/reshape_r4.c =================================================================== --- libgfortran/generated/reshape_r4.c (revision 142291) +++ libgfortran/generated/reshape_r4.c (working copy) @@ -81,7 +81,7 @@ const GFC_REAL_4 *src; int n; int dim; - int sempty, pempty, shape_empty; + int sempty, pempty, shape_empty, contiguous_data; index_type shape_data[GFC_MAX_DIMENSIONS]; rdim = shape->dim[0].ubound - shape->dim[0].lbound + 1; @@ -100,6 +100,24 @@ } } + contiguous_data = 1; + + for (n = 0; n < GFC_DESCRIPTOR_RANK(source); n++) + { + if (n == 0) + { + if (source->dim[n].stride != 1) + contiguous_data = 0; + } + else if (contiguous_data) + { + int del = source->dim[n-1].ubound - source->dim[n-1].lbound + 1; + if (source->dim[n].stride != del) + contiguous_data = 0; + } + } + st_printf ("data %d\n", (int)contiguous_data); + if (ret->data == NULL) { rs = 1; @@ -112,6 +130,11 @@ rs *= rex; } ret->offset = 0; + if (contiguous_data) + { + ret->data = source->data; + return; + } ret->data = internal_malloc_size ( rs * sizeof (GFC_REAL_4)); ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim; } Was an experiment to see if an improvement to reshape could easily be implemented in the library. It fails completely, of course, because the source is freed! This does show that a flag in the descriptor to say that 'this' cannot be freed would be a boon. Paul
(In reply to comment #6) > Was an experiment to see if an improvement to reshape could easily be > implemented in the library. It fails completely, of course, because the source > is freed! This does show that a flag in the descriptor to say that 'this' > cannot be freed would be a boon. Thus marking as dependent on array-descriptor reform.
Is this still an actual feature request? or have things changed in the past 7 years?