GCC Gathering 2011: Notes from the Fortran BOF

At the GCC Gathering 2011 in London, the three gfortran developers also met in a small group, and there was some discussion with other GCC developers. The notes are a bit shorter than hoped for, as two gfortraners already had to leave in the early afternoon on Saturday.

Debugging issues

While there was no time to discuss issues during the GDB session, the following debugging issues were at least mentioned at some point:

Module debugging: This mostly worked, but failed when renaming was involved. That turned out to be a gdb bug, which is now fixed. For completeness: in gdb one can print use-associated module variables simply with "p var_name", and one can also directly access non-associated module variables with "p module_name::var_name". (Cf. also the now closed PR 24526.)
Character(kind=4): gdb now supports CHARACTER(kind=4) debugging. However, there are still some issues, cf. PR49565.
OOP debugging: For type-bound procedures and VTABLEs, we still have to add the required support, cf. PR49475.
Fortran Arrays: (not discussed) The standard FSF gdb still lacks support for arrays with array descriptors ("Fortran 90 arrays") and also for C99's variable-length arrays (VLA). A mostly working implementation exists in Red Hat/Fedora's gdb (and thus in versions based on it, such as SUSE's gdb), but the current version is not yet mergeable into the FSF gdb, and seemingly no one is currently working on making it mergeable. The patches are in the VLA and Fedora branches of Archer (git view).

Scalarizer

gfortran's scalarizer is rather complex; every time one tries to modify it, one has to try to understand it again. For performance and maintenance reasons, it makes sense to move it out of trans*.c. The plan was to move to a middle-end scalarizer, which also handles all the whole-array operations.

See also the discussion about a new FE scalarizer

Initial middle end patch, enabling assignment including ALL and ANY and early front-end patch
TODO: Move the remaining bits to the new infrastructure, and then remove the front-end scalarizer. One also needs to ensure that all optimizations are available in the ME version; I think it currently lacks the optimization that switches, if possible, to memmove/memcpy for assignments.
The plan was that, for now, all temporaries are still generated in the FE; the middle-end scalarizer/whole-array support will assume that any temporaries that are needed have already been generated.
The idea is that one generates the scalarization loop (__builtin_array_idx), the scalarization expression (__builtin_array_delta), the storage in the scalarization (__builtin_array_ridx) and the glue for all of it (__builtin_array_store), similarly to what is already done in the FE. (See the C example mea-*.c plus some additional required patches.)
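
The memmove/memcpy optimization mentioned in the TODO can be illustrated with a minimal C sketch. This is not gfortran or middle-end code: the helper name assign_1d and its parameters are hypothetical, strides are counted in elements, and the element-wise path assumes that source and destination do not overlap (any required temporary has already been created, as described above).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: assign n floats from src
   to dst.  Strides are given in elements, not bytes.  */
static void assign_1d(float *dst, ptrdiff_t dst_stride,
                      const float *src, ptrdiff_t src_stride, size_t n)
{
  if (dst_stride == 1 && src_stride == 1) {
    /* Both sides are contiguous: a single block copy replaces the loop.
       memmove is used because it tolerates overlapping regions.  */
    memmove(dst, src, n * sizeof *dst);
    return;
  }

  /* General case: the scalarized, element-wise loop.
     Assumes that dst and src do not overlap.  */
  for (size_t i = 0; i < n; i++)
    dst[(ptrdiff_t)i * dst_stride] = src[(ptrdiff_t)i * src_stride];
}
```

With a source array of six elements, assign_1d (b, 1, a, 2, 3) copies every second element through the loop, whereas assign_1d (c, 1, a, 1, 3) takes the block-copy path.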

For the matrix multiplication W = matmul (U, V) of the matrices U(m,n) and V(n,m) with the result matrix W(n,n), one has to calculate w(i,j) = sum_{k=1}^m v(i,k)*u(k,j). [I am sure that I swapped the index order somewhere.] The C version of the whole-array implementation then looks as follows (taken from the patch):

void matmul(float *w, float *u, float *v, int n, int m)
{
  float (*U)[n][m] = (float (*)[n][m])u;
  float (*V)[m][n] = (float (*)[m][n])v;
  float (*W)[n][n] = (float (*)[n][n])w;
  int i, j, k, l;   /* symbolic scalarization indices, never assigned */
  float Ukj, Vil, VUij;

  /* Element expressions U(k,j) and V(i,l).  */
  Ukj = __builtin_array_idx (__builtin_array_select (U, m, 1, n, m), k, j);
  Vil = __builtin_array_idx (__builtin_array_select (V, n, 1, m, n), i, l);
  /* Reduction over k and l, which run together, with extent m.  */
  VUij = __builtin_array_delta (Vil * Ukj, m, k, l);

  /* Rebuild an n x n array from the scalars and store it in W.  */
  __builtin_array_store (W, __builtin_array_ridx (VUij, i, n, j, n), n, 1, n, n);
}

* __builtin_array_select: Array selector with the arguments: (1) variable, (2) extent of first dimension, (3) stride of first dimension, (4) extent of second dimension, ...

Example: __builtin_array_select (U, m, 1, n, m) is a two-dimensional array consisting of the actual data "U", whose first dimension consists of "m" elements and the second one of "n". Following the argument order above, the stride of the first dimension is "1" and that of the second dimension is "m".

* __builtin_array_idx: Takes an array selector as first argument, followed by rank arguments of integral type which denote the index variables.

Example: __builtin_array_idx (__builtin_array_select (V, n, 1, m, n), i, l); generates the array element "V(i,l)", where "i" and "l" are indexes used in the scalarization; V itself is a two-dimensional array of size (n*m).

* __builtin_array_delta: The first argument is the actual expression, followed by the bounds of the expression and the variable.

Example: __builtin_array_delta (Vil * Ukj, m, k, l) matches a loop of for (k = l = 0; k < m; l++, k++) with the loop expression V(i,l)*U(k,j) such that it implements the matrix multiplication w(i,j) = sum_{k=1}^m v(i,k)*u(k,j).

* __builtin_array_ridx (VUij, i, n, j, n): Essentially the same as __builtin_array_select, just for storing the scalarization result for further use, i.e. one generates an array again out of the scalars "w(i,j)". The first argument "VUij" is the result of the scalar expression generated by __builtin_array_delta; the second argument is an index variable (of integral type) and the third argument is its extent; the fourth argument is the index variable of the second dimension and the fifth is the extent of the second dimension.

* __builtin_array_store: Wrap up. Store the result in the variable (first argument), using the array expression (second argument); the array has the extents/strides given as the 3rd/4th, 5th/6th etc. arguments.
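
For comparison, the loops that the builtins above are meant to describe can be written out by hand. The following is only a sketch under assumptions: the names at and matmul_ref are made up for illustration, indices are zero-based, and the extents and strides are read off the __builtin_array_select and __builtin_array_store calls in the example (stride 1 for the first dimension; m, n and n for the second dimension of U, V and W). Given the caveat above about possibly swapped indices, the index order here is an interpretation, not a statement about what the patch generates.

```c
#include <stddef.h>

/* Hypothetical helper: element (i, j), zero-based, of an array whose
   dimensions have the strides s1 and s2 (in elements).  */
static float at(const float *base, int i, int s1, int j, int s2)
{
  return base[(size_t)i * s1 + (size_t)j * s2];
}

/* Hand-written reference loops for w(i,j) = sum_k v(i,k) * u(k,j),
   with U of shape (m,n), V of shape (n,m) and W of shape (n,n).  */
static void matmul_ref(float *w, const float *u, const float *v, int n, int m)
{
  for (int j = 0; j < n; j++)
    for (int i = 0; i < n; i++) {
      float sum = 0.0f;
      /* Counterpart of __builtin_array_delta: k and l run together.  */
      for (int k = 0; k < m; k++)
        sum += at(v, i, 1, k, n) * at(u, k, 1, j, m);
      /* Counterpart of the store: W has the strides 1 and n.  */
      w[(size_t)i + (size_t)j * n] = sum;
    }
}
```

In this sketch the three explicit loops take the place of the index variables i, j and k/l, which the builtin version leaves symbolic.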

Other topics