This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- From: Jeff Law <law at redhat dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Mike Stump <mikestump at comcast dot net>, Bin Cheng <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 31 Oct 2014 13:20:04 -0600
- Subject: Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- Authentication-results: sourceware.org; auth=none
- References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <54384C12 dot 6060401 at redhat dot com> <80EFD85E-49B5-4F71-9401-40F8FA85BD65 at comcast dot net> <CAHFci2__KJST3yCzz4sKge9pN37CZwUZ8BvFL4++SuCJAsAtxw at mail dot gmail dot com> <545294D2 dot 2020502 at redhat dot com> <CAHFci2_PVQ3aaGvjLDG61=WyHMf0NQRppnNWdQomiSGc-Qnm9w at mail dot gmail dot com>
On 10/30/14 23:36, Bin.Cheng wrote:
>> I kicked around using reorg to do stuff like this in the past
>> (combination of unrelated insns). But ultimately I think the way to go
>> is have it happen when insns are on the ready list in the scheduler.
>> #2 would be the best solution for the case I was pondering, but I don't
>> think solving that case is terribly important given the processors for which
>> it was profitable haven't been made for a very long time.
> I am thinking if it's possible to introduce a pattern-directed fusion.
> Something like define_fusion, and adapting haifa-scheduler for it. I
> agree there are two kinds (relevant and irrelevant) fusion types, and
> it's not trivial to support both in one scheme. Do you have a
> specific example that I can have a try?
For fusion of related insns like the load/store pairing, I think your
approach should work pretty well.
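To make the pattern-directed idea concrete, here is a toy sketch in plain Python (not GCC code; the insn tuples and the key function are invented for illustration). The cheap part of such a scheme is giving each fusible insn a sort key of (base register, offset) so that sorting the ready list brings pairable loads next to each other:

```python
# Toy sketch of priority-driven fusion: sort the ready list so that loads
# from the same base register at nearby offsets become adjacent; a later
# step can then emit each adjacent pair as one load-pair insn.

def fusion_key(insn):
    """insn is ("load", base, offset) or ("other", name).
    Loads sort first, grouped by base register, then by offset."""
    if insn[0] == "load":
        return (0, insn[1], insn[2])
    return (1, insn[1], 0)

ready = [("load", "r1", 8), ("other", "add"), ("load", "r2", 0),
         ("load", "r1", 12), ("other", "mul")]
ready.sort(key=fusion_key)
print(ready)
# [('load', 'r1', 8), ('load', 'r1', 12), ('load', 'r2', 0),
#  ('other', 'add'), ('other', 'mul')]
```

The real pass would of course have to respect dependences and alignment before committing to a pair; the sort only creates the opportunity.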
As to specific examples of independent insn fusion, the ones I'm most
familiar with are from the older PA chips. I wouldn't recommend
building something for those processors simply because they're so dated
that I don't believe anyone uses them anymore.
However, if you have cases (arm shift insns?), building for those is
fine. If you just want examples, the ones we tried to exploit on the PA
were fmpyadd/fmpysub, movb,tr and addb,tr.
fmpyadd/fmpysub combined an independent floating-point multiply with an FP
add or sub insn. There are many conditions, but if you want a simple
example to play with, the attached file with -O2 -mschedule=7100LC ought
to generate one of these insns via pa_reorg.
addb,tr can combine an unconditional branch with a reg+reg or reg+imm5
addition operation. movb,tr combines an unconditional branch with a
reg-reg copy or load of a 5 bit immediate value into a general register.
I don't happen to have examples handy, but compiling integer code with
-O2 -mschedule=7100LC ought to trigger some.
The code in pa_reorg is O(n^2) or worse. It predates the hooks to allow
the target to reorder the ready queue. It would probably be relatively
easy to have that code run via those hooks and just look at the ready
queue. So it'd still be O(n^2), but the N would be *much* smaller. But
again, I don't think anyone has used PA7xxx processors in over
a decade, so it hasn't seemed worth the effort to change.
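The ready-queue version of that pa_reorg-style pairing can be sketched in a few lines of plain Python (again not GCC code; the insn representation and the "adjacent word" merge rule are invented for illustration). The point is that the quadratic search only ever runs over the small ready queue, not the whole insn stream:

```python
# Toy model of fusing loads straight off a scheduler's ready queue.
# Each insn loads one 4-byte word from base register + offset; two loads
# from the same base at consecutive offsets fuse into one load-pair.

def fuse_ready_queue(ready):
    """ready: list of (base, offset) loads. Returns the fused schedule."""
    scheduled = []
    while ready:
        base, off = ready.pop(0)
        # O(len(ready)) scan for a partner at the adjacent word.
        partner = next((p for p in ready
                        if p[0] == base and abs(p[1] - off) == 4), None)
        if partner is not None:
            ready.remove(partner)
            scheduled.append(("load_pair", base, min(off, partner[1])))
        else:
            scheduled.append(("load", base, off))
    return scheduled

queue = [("r1", 0), ("r2", 8), ("r1", 4), ("r2", 12)]
print(fuse_ready_queue(queue))
# [('load_pair', 'r1', 0), ('load_pair', 'r2', 8)]
```

So the scan is still quadratic, but in the ready-queue length rather than the function length, which is the distinction made above.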
*> \brief \b CLARSCL2 performs reciprocal diagonal scaling on a vector.
* =========== DOCUMENTATION ===========
* Online html documentation available at
*> Download CLARSCL2 + dependencies
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/clarscl2.f">
*> [TGZ]</a>
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/clarscl2.f">
*> [ZIP]</a>
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/clarscl2.f">
*> [TXT]</a>
* SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
* .. Scalar Arguments ..
* INTEGER M, N, LDX
* .. Array Arguments ..
* COMPLEX X( LDX, * )
* REAL D( * )
*> \par Purpose:
*> CLARSCL2 performs a reciprocal diagonal scaling on a vector:
*> x <-- inv(D) * x
*> where the REAL diagonal matrix D is stored as a vector.
*> Eventually to be replaced by BLAS_cge_diag_scale in the new BLAS
*> standard.
*> \param[in] M
*> M is INTEGER
*> The number of rows of D and X. M >= 0.
*> \param[in] N
*> N is INTEGER
*> The number of columns of D and X. N >= 0.
*> \param[in] D
*> D is REAL array, length M
*> Diagonal matrix D, stored as a vector of length M.
*> \param[in,out] X
*> X is COMPLEX array, dimension (LDX,N)
*> On entry, the vector X to be scaled by D.
*> On exit, the scaled vector.
*> \param[in] LDX
*> LDX is INTEGER
*> The leading dimension of the vector X. LDX >= 0.
*> \author Univ. of Tennessee
*> \author Univ. of California Berkeley
*> \author Univ. of Colorado Denver
*> \author NAG Ltd.
*> \date September 2012
*> \ingroup complexOTHERcomputational
      SUBROUTINE CLARSCL2( M, N, D, X, LDX )
*
*  -- LAPACK computational routine (version 3.4.2) --
*  -- LAPACK is a software package provided by Univ. of Tennessee, --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     September 2012
*
*     .. Scalar Arguments ..
      INTEGER            M, N, LDX
*     .. Array Arguments ..
      COMPLEX            X( LDX, * )
      REAL               D( * )
*     .. Local Scalars ..
      INTEGER            I, J
*     .. Executable Statements ..
      DO J = 1, N
         DO I = 1, M
            X( I, J ) = X( I, J ) / D( I )
         END DO
      END DO
      RETURN
      END
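For reference, the operation CLARSCL2 implements is just an elementwise divide of each row of X by the corresponding entry of D. A plain-Python sketch (not part of LAPACK; X is modeled as a list of rows, and Python's complex type stands in for COMPLEX):

```python
# Sketch of CLARSCL2: x <- inv(D) * x, with D stored as a length-M
# vector and X an M-by-N matrix, here a list of M rows of length N.

def clarscl2(d, x):
    """Divide row i of x by d[i], in place, and return x."""
    for i, di in enumerate(d):
        x[i] = [xij / di for xij in x[i]]
    return x

d = [2.0, 4.0]
x = [[2.0, 8.0], [4.0, 16.0]]
print(clarscl2(d, x))
# [[1.0, 4.0], [1.0, 4.0]]
```

The Fortran routine walks columns in the outer loop for contiguous memory access; the sketch above loops over rows only because that matches the list-of-rows layout.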