This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH][ipa-branch]MatrixTransposing Optimization


This patch implements matrix transposing optimization, which is built on 
matrix flattening code, introduced in

The idea of transposing is organizing the matrix in a different layout 
such that the dimensions are reordered.
This could produce better cache behavior in some cases.

 For example, lets look at the matrix accesses in the following loop:

   for (i=0; i<N; i++)
    for (j=0; j<M; j++)
     access to a[i][j]

   This loop can produce good cache behavior because the elements of
   the inner dimension are accessed sequentially.

  However, if the accesses of the matrix were of the following form:

 for (i=0; i<N; i++)
  for (j=0; j<M; j++)
    access to a[j][i]

In this loop we  iterate the columns and not the rows. 
Therefore, replacing the rows and columns 
would have had an organization with better (cache) locality.
This is called matrix transposing.

 This  example, of course, could be enhanced to multiple dimension 
matrices as well.

Since a program could include all kind of accesses, there is a decision 
mechanism, which is a 
heuristic that tries to determine whether to transpose the matrix or not, 
according to the form of the more dominant accesses.
This decision is transferred to the flattening mechanism, and whether the 
matrix was transposed or not, the matrix 
(if possible) is flattened.
This decision making is based on profiling information and loop 
If profiling information is available,decision making mechanism will be 
operated, otherwise the 
matrix will only be flattened (if possible).

Both optimizations are described in a paper which was presented in GCC 
Summit 2006.
I have attached a few testcases that could be performed with profiling. 
Transoping and flattening decisions are dumped into the appropriate 

The performance measured on Linux PowerPC for matrix flattening on art 
benchmark is 35% improvement, and 9% for equake benchmark. 
If profiling is enabled, matrix transposing optimization can be 
additionally performed. In this case, we get 2x improvement on art. 

O.K. for ipa branch?


2006-10-04  Razya Ladelsky  <>

        * matrix-reorg.c: (struct index_info) : Remove.
        (struct access_site_info):  Add field 
        (struct matrix_info): Add dimension_size_orig.
        Remove hottest_dim, reorg_failed_reason, flatten_p,
        flatten_transpose_matrix_p,  flatten_decl.
        (analyze_transpose, mat_free,  sort_dim_hot_level ): New function.
        (check_transpose_p, analyze_transpose_p) : New static variables.
        (record_access_alloc_site_info, check_allocation_function,
        analyze_matrix_accesses, find_escaping_matrices_and_flatten,
        transform_access_sites, matrix_reorg):  Add support for 

Attachment: transpose_diff
Description: Binary data

Attachment: matrix-1.c
Description: Binary data

Attachment: matrix-2.c
Description: Binary data

Attachment: matrix-3.c
Description: Binary data

Attachment: matrix-4.c
Description: Binary data

Attachment: matrix-5.c
Description: Binary data

Attachment: matrix-6.c
Description: Binary data

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]