Bug 29550 - Optimize -fexternal-blas calls for conjg()
Summary: Optimize -fexternal-blas calls for conjg()
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.3.0
: P3 enhancement
Target Milestone: ---
Assignee: Thomas Koenig
URL:
Keywords: deferred, missed-optimization
Depends on:
Blocks: 36854 37131
  Show dependency treegraph
 
Reported: 2006-10-22 16:26 UTC by tobias.burnus
Modified: 2018-09-18 20:18 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.3.0
Last reconfirmed: 2006-10-23 21:56:37


Attachments
Patch which has a problem (3.32 KB, patch)
2018-08-26 18:30 UTC, Thomas Koenig
Details | Diff
Patch which shows the principle (3.28 KB, patch)
2018-08-29 20:50 UTC, Thomas Koenig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description tobias.burnus 2006-10-22 16:26:35 UTC
Often, matrix multiplications contain transpose() or conj(), e.g.
  matmul(transpose(A),B)
or
  matmul(A,conj(transpose(B))
  matmul(A,transpose(conj(B))

The *gemm subroutines of BLAS anticipate this via the TRANSA and TRANSB options:
- 'N' (unchanged)
- 'T' (transpose)
- 'C' (hermitian conjugate / transpose+complex conjugate)

Thus for -fexternal-blas these extra options should be used, if possible.
Comment 1 Francois-Xavier Coudert 2006-10-22 18:45:29 UTC
The current code already recognizes matrix transposition and gives BLAS gemm functions the right TRANSA and TRANSB argument in this case.

Confirmed for CONJG, which we don't currently handle.
Comment 2 Francois-Xavier Coudert 2006-10-22 23:14:53 UTC
I've been thinking a bit about this. It's a common case, and it would probably be worth optimizing it.

We could detect in iresolve.c (gfc_resolve_matmul) that one (or both) of the arguments to MATMUL is a call to CONJ, and then rewrite the code to be MATMUL(A,B,2) instead of MATMUL(A,CONJG(B)), where the 2 is an extra "hidden" integer argument that means here that the second MATMUL arg is to be conjugated during the matrix multiplication.

After that, we could also detect in iresolve.c (gfc_resolve_conjg) when the result of a matmul call is conjugated (C = CONJG(MATMUL(A,B))) and optimize this as well.

Notes for a wannabe coder: that argument a (or b) is the conjg function can be identified by (a->expr_type == EXPR_FUNCTION && a->value.function->isym->generic_id == GFC_ISYM_CONJG). Rewriting the expressions might be a bit subtle, but not so hard; for the extra argument, see how the rrspacing and spacing intrinsics are implemented.
Comment 3 Francois-Xavier Coudert 2006-10-23 21:56:37 UTC
(In reply to comment #2)
> We could detect in iresolve.c (gfc_resolve_matmul) that one (or both) of the
> arguments to MATMUL is a call to CONJ, and then rewrite the code to be
> MATMUL(A,B,2) instead of MATMUL(A,CONJG(B)), where the 2 is an extra "hidden"
> integer argument that means here that the second MATMUL arg is to be conjugated
> during the matrix multiplication.

I've create a patch along these lines, and it was too slow. I'm now writing special functions for every possible case.
Comment 4 Janne Blomqvist 2006-11-04 14:18:23 UTC
Can't you use the same trick that the frontend already uses to detect the matmul(transpose(a),b) thing?
Comment 5 Francois-Xavier Coudert 2006-11-04 18:21:51 UTC
(In reply to comment #4)
> Can't you use the same trick that the frontend already uses to detect the
> matmul(transpose(a),b) thing?

The front-end doesn't "detect" the case you quote. It's only that transposition is done by creating a new array descriptor, pointing to the same data but with inverted strides; matmul being able to operate on all possible strides, it is indeed optimal.

For conjugation, it's a bit more difficult. We really need the front-end to recognize if one argument of matmul is conjugated, and simplify the code itself by 1) removing the conjugation and 2) emitting a different matmul call. The fact that there is no BLAS routine for matmul(conj(a),b) but only matmul(tranpose(conj(a))) makes it a bit more difficult.
Comment 6 Thomas Koenig 2010-09-13 18:53:53 UTC
Sounds like something for front end optimization.

Should we maybe generate the BLAS calls directly, instead of jumping
through the library functions?
Comment 7 Thomas Koenig 2017-05-29 21:12:51 UTC
We now have the function check_conjg_transpose_variable in frontend-passes.c,
which we could use for detecting and counting transpose and conjg.

Hmm... looks like we could do something with that in gfc_conv_intrinsic_funcall.
Comment 8 Thomas Koenig 2017-07-02 16:03:55 UTC
Looks doable.
Comment 9 Thomas Koenig 2018-01-05 11:43:59 UTC
I think I will defer this until gcc-9.
Comment 10 Eric Gallager 2018-03-22 19:13:30 UTC
(In reply to Thomas Koenig from comment #9)
> I think I will defer this until gcc-9.

adding "deferred" keyword then
Comment 11 Eric Gallager 2018-06-22 04:38:19 UTC
(In reply to Thomas Koenig from comment #9)
> I think I will defer this until gcc-9.

It's gcc-9's stage 1 now, so if this is being deferred until now, now's the time...
Comment 12 Thomas Koenig 2018-08-26 18:30:05 UTC
Created attachment 44600 [details]
Patch which has a problem

The attached patch shows how something could be done, but it
has one problem: The handling of -fblas-matmul-limit .

If we change the arguments to the matmul call, it would be
_necessary_ to always call the external function, otherwise
there would be wrong code.
Comment 13 Thomas Koenig 2018-08-29 20:50:53 UTC
Created attachment 44627 [details]
Patch which shows the principle

This shows a way in which this could be done. Bounds checking
and reallocation on assignment are still missing, but should be
doable (like the inline matmul stuff already does).
Comment 14 Thomas Koenig 2018-09-18 20:00:33 UTC
Author: tkoenig
Date: Tue Sep 18 19:59:46 2018
New Revision: 264411

URL: https://gcc.gnu.org/viewcvs?rev=264411&root=gcc&view=rev
Log:
2018-09-18  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/29550
	* gfortran.h (gfc_expr): Add external_blas flag.
	* frontend-passes.c (matrix_case): Add case A2TB2T.
	(optimize_namespace): Handle flag_external_blas by
	calling call_external_blas.
	(get_array_inq_function): Add argument okind. If
	it is nonzero, use it as the kind of argument
	to be used.
	(inline_limit_check): Remove m_case argument, add
	limit argument instead.  Remove assert about m_case.
	Set the limit for inlining from the limit argument.
	(matmul_lhs_realloc): Handle case A2TB2T.
	(inline_matmul_assign): Handle inline limit for other cases with
	two rank-two matrices.  Remove no-op calls to inline_limit_check.
	(call_external_blas): New function.
	* trans-intrinsic.c (gfc_conv_intrinsic_funcall): Do not add
	argument to external BLAS if external_blas is already set.

2018-09-18  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/29550
	* gfortran.dg/inline_matmul_13.f90: Adjust count for
	_gfortran_matmul.
	* gfortran.dg/inline_matmul_16.f90: Likewise.
	* gfortran.dg/promotion_2.f90: Add -fblas-matmul-limit=1.  Scan
	for dgemm instead of dgemm_.  Add call to random_number to make
	standard conforming.
	* gfortran.dg/matmul_blas_1.f90: New test.
	* gfortran.dg/matmul_bounds_14.f: New test.
	* gfortran.dg/matmul_bounds_15.f: New test.
	* gfortran.dg/matmul_bounds_16.f: New test.
	* gfortran.dg/blas_gemm_routines.f: New test / additional file for
	preceding tests.


Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/testsuite/ChangeLog
Comment 15 Thomas Koenig 2018-09-18 20:02:16 UTC
Fixed, closing.
Comment 16 Thomas Koenig 2018-09-18 20:18:41 UTC
Author: tkoenig
Date: Tue Sep 18 20:18:09 2018
New Revision: 264412

URL: https://gcc.gnu.org/viewcvs?rev=264412&root=gcc&view=rev
Log:
2018-09-18  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/29550
	* gfortran.h (gfc_expr): Add external_blas flag.
	* frontend-passes.c (matrix_case): Add case A2TB2T.
	(optimize_namespace): Handle flag_external_blas by
	calling call_external_blas.
	(get_array_inq_function): Add argument okind. If
	it is nonzero, use it as the kind of argument
	to be used.
	(inline_limit_check): Remove m_case argument, add
	limit argument instead.  Remove assert about m_case.
	Set the limit for inlining from the limit argument.
	(matmul_lhs_realloc): Handle case A2TB2T.
	(inline_matmul_assign): Handle inline limit for other cases with
	two rank-two matrices.  Remove no-op calls to inline_limit_check.
	(call_external_blas): New function.
	* trans-intrinsic.c (gfc_conv_intrinsic_funcall): Do not add
	argument to external BLAS if external_blas is already set.

2018-09-18  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/29550
	* gfortran.dg/inline_matmul_13.f90: Adjust count for
	_gfortran_matmul.
	* gfortran.dg/inline_matmul_16.f90: Likewise.
	* gfortran.dg/promotion_2.f90: Add -fblas-matmul-limit=1.  Scan
	for dgemm instead of dgemm_.  Add call to random_number to make
	standard conforming.
	* gfortran.dg/matmul_blas_1.f90: New test.
	* gfortran.dg/matmul_bounds_14.f: New test.
	* gfortran.dg/matmul_bounds_15.f: New test.
	* gfortran.dg/matmul_bounds_16.f: New test.
	* gfortran.dg/blas_gemm_routines.f: New test / additional file for
	preceding tests.

Added:
    trunk/gcc/testsuite/gfortran.dg/blas_gemm_routines.f
    trunk/gcc/testsuite/gfortran.dg/matmul_blas_1.f
    trunk/gcc/testsuite/gfortran.dg/matmul_bounds_14.f
    trunk/gcc/testsuite/gfortran.dg/matmul_bounds_15.f
    trunk/gcc/testsuite/gfortran.dg/matmul_bounds_16.f
Modified:
    trunk/gcc/fortran/frontend-passes.c
    trunk/gcc/fortran/gfortran.h
    trunk/gcc/fortran/trans-intrinsic.c
    trunk/gcc/testsuite/gfortran.dg/inline_matmul_13.f90
    trunk/gcc/testsuite/gfortran.dg/inline_matmul_16.f90
    trunk/gcc/testsuite/gfortran.dg/promotion_2.f90