This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

From: Jeff Law <law at redhat dot com>
To: "Bin.Cheng" <amker dot cheng at gmail dot com>
Cc: Mike Stump <mikestump at comcast dot net>, Bin Cheng <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
Date: Fri, 31 Oct 2014 13:20:04 -0600
Subject: Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <54384C12 dot 6060401 at redhat dot com> <80EFD85E-49B5-4F71-9401-40F8FA85BD65 at comcast dot net> <CAHFci2__KJST3yCzz4sKge9pN37CZwUZ8BvFL4++SuCJAsAtxw at mail dot gmail dot com> <545294D2 dot 2020502 at redhat dot com> <CAHFci2_PVQ3aaGvjLDG61=WyHMf0NQRppnNWdQomiSGc-Qnm9w at mail dot gmail dot com>

On 10/30/14 23:36, Bin.Cheng wrote:

> > #2 would be the best solution for the case I was pondering, but I don't
> > think solving that case is terribly important given the processors for which
> > it was profitable haven't been made for a very long time.

> I am wondering whether it's possible to introduce pattern-directed fusion,
> something like define_fusion, and adapt the haifa-scheduler for it.  I
> agree there are two kinds of fusion (of related and of unrelated insns),
> and it's not trivial to support both in one scheme.  Do you have a
> specific example I can try?

I kicked around using reorg to do stuff like this in the past (combination of unrelated insns). But ultimately I think the way to go is to have it happen when insns are on the ready list in the scheduler.

For fusion of related insns like the load/store pairing, I think your approach should work pretty well.
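To make the related-insn case concrete, here is a minimal C sketch of one way to find load/store pairing candidates. It assumes a simplified, hypothetical `mem_op` record (base register, byte offset, access size) instead of GCC's RTL, and it is not the code from the patch under discussion: sorting accesses by base register and offset makes pairable accesses neighbours, so pairing becomes a single linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified memory access; not GCC's RTL representation. */
struct mem_op {
  int base;    /* base register number */
  long offset; /* byte offset from the base register */
  int size;    /* access size in bytes */
};

/* Order by base register, then by offset, so that accesses which could
   form a pair become neighbours after sorting. */
static int cmp_mem_op(const void *a, const void *b) {
  const struct mem_op *x = a, *y = b;
  if (x->base != y->base)
    return (x->base > y->base) - (x->base < y->base);
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort OPS in place, then greedily pair neighbours that share a base
   register and access size and have consecutive offsets.  FIRST receives
   the index of the lower access of each pair; returns the pair count. */
size_t pair_mem_ops(struct mem_op *ops, size_t n, size_t *first) {
  size_t pairs = 0;
  assert(ops != NULL || n == 0);
  if (n > 1)
    qsort(ops, n, sizeof ops[0], cmp_mem_op);
  for (size_t i = 0; i + 1 < n; i++) {
    if (ops[i].base == ops[i + 1].base && ops[i].size == ops[i + 1].size &&
        ops[i + 1].offset == ops[i].offset + ops[i].size) {
      first[pairs++] = i;
      i++; /* each access joins at most one pair */
    }
  }
  return pairs;
}
```

For example, 8-byte loads at r1+8, r2+0 and r1+0 sort to r1+0, r1+8, r2+0 and yield one pair starting at index 0. A real pass must also prove that nothing between the two accesses aliases them or redefines the base register; the sketch ignores that.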

As to specific examples of independent insn fusion, the ones I'm most familiar with are from the older PA chips. I wouldn't recommend building something for those processors simply because they're so dated that I don't believe anyone uses them anymore.

However, if you have cases (arm shift insns?), building for those is fine. If you just want examples, the ones we tried to exploit on the PA were fmpyadd/fmpysub, movb,tr and addb,tr.

fmpyadd/fmpysub combined an independent floating point multiply with an FP add or sub insn. There are many conditions, but if you want a simple example to play with, the attached file with -O2 -mschedule=7100LC ought to generate one of these insns via pa_reorg.
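The independence requirement behind fmpyadd/fmpysub can be sketched as a C predicate. The `fp_insn` record and opcode names are hypothetical, and only the data-dependence test is modelled; the many other conditions the real PA instructions impose (operand and register restrictions) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes and insn record; not GCC's or pa.c's data types. */
enum fp_op { FP_MUL, FP_ADD, FP_SUB, FP_OTHER };

struct fp_insn {
  enum fp_op op;
  int dest, src1, src2; /* FP register numbers */
};

/* True if A reads the register that B writes. */
static bool reads_dest_of(const struct fp_insn *a, const struct fp_insn *b) {
  return a->src1 == b->dest || a->src2 == b->dest;
}

/* A multiply and an add/sub are candidates for one combined insn only if
   they are independent: neither consumes the other's result and they do
   not write the same register. */
bool can_fuse_mpy_add(const struct fp_insn *a, const struct fp_insn *b) {
  const struct fp_insn *mul = a->op == FP_MUL ? a : b;
  const struct fp_insn *add = mul == a ? b : a;
  assert(a != b);
  if (mul->op != FP_MUL || (add->op != FP_ADD && add->op != FP_SUB))
    return false;
  return !reads_dest_of(mul, add) && !reads_dest_of(add, mul) &&
         mul->dest != add->dest;
}
```

The argument order does not matter; the function works out which insn is the multiply before checking dependences.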

addb,tr can combine an unconditional branch with a reg+reg or reg+imm5 addition operation. movb,tr combines an unconditional branch with a reg-reg copy or a load of a 5-bit immediate value into a general register. I don't happen to have examples handy, but compiling integer code with -O2 -mschedule=7100LC ought to trigger some.

The code in pa_reorg is O(n^2) or worse. It predates the hooks that allow the target to reorder the ready queue. It would probably be relatively easy to have that code run via those hooks and just look at the ready queue. It'd still be O(n^2), but N would be *much* smaller. But again, I don't think anyone uses PA7xxx processors, or has for over a decade, so it hasn't seemed worth the effort to change.
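A rough C sketch of the ready-queue variant: instead of scanning the whole function, look for a fusible pair only among ready insns and move the partner so the two issue back to back. The `ready_insn` record, `group_fusible_pair` and the example predicate `same_kind` are hypothetical; the only thing borrowed from GCC is the order documented for its TARGET_SCHED_REORDER hook, where the scheduler issues ready[n - 1] first. The double loop is still O(n^2), but n is the ready-list length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn on the ready list. */
struct ready_insn {
  int uid;
  int kind; /* target-defined classification */
};

typedef bool (*fusible_fn)(const struct ready_insn *, const struct ready_insn *);

/* Example predicate (hypothetical): insns of the same kind can fuse. */
bool same_kind(const struct ready_insn *a, const struct ready_insn *b) {
  return a->kind == b->kind;
}

/* Remove READY[K] and append it, preserving the order of the others. */
static void move_to_end(struct ready_insn *ready, size_t n, size_t k) {
  struct ready_insn tmp = ready[k];
  for (size_t m = k; m + 1 < n; m++)
    ready[m] = ready[m + 1];
  ready[n - 1] = tmp;
}

/* READY[N - 1] issues first.  Find the fusible pair whose later member J
   is closest to the end, then place J in slot N - 1 and its partner I in
   slot N - 2 so they issue back to back.  Returns true if reordered. */
bool group_fusible_pair(struct ready_insn *ready, size_t n, fusible_fn fusible) {
  assert(ready != NULL || n == 0);
  for (size_t j = n; j-- > 1;)     /* j = n-1 .. 1 */
    for (size_t i = j; i-- > 0;)   /* i = j-1 .. 0 */
      if (fusible(&ready[i], &ready[j])) {
        move_to_end(ready, n, i);     /* partner first ...              */
        move_to_end(ready, n, j - 1); /* ... then J, shifted down by one */
        return true;
      }
  return false;
}
```

A real hook would also weigh insn priorities and resource conflicts before reordering; this sketch simply promotes the first fusible pair it finds, keeping the higher-priority member of the pair first.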


Cheers,
Jeff

*> \brief \b CLARSCL2 performs reciprocal diagonal scaling on a vector.
*
*  =========== DOCUMENTATION ===========
*
* Online html documentation available at 
*            http://www.netlib.org/lapack/explore-html/ 
*
*> \htmlonly
*> Download CLARSCL2 + dependencies 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TGZ]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [ZIP]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TXT]</a>
*> \endhtmlonly 
*
*  Definition:
*  ===========
*
*       SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
* 
*       .. Scalar Arguments ..
*       INTEGER            M, N, LDX
*       ..
*       .. Array Arguments ..
*       COMPLEX            X( LDX, * )
*       REAL               D( * )
*       ..
*  
*
*> \par Purpose:
*  =============
*>
*> \verbatim
*>
*> CLARSCL2 performs a reciprocal diagonal scaling on a vector:
*>   x <-- inv(D) * x
*> where the REAL diagonal matrix D is stored as a vector.
*>
*> Eventually to be replaced by BLAS_cge_diag_scale in the new BLAS
*> standard.
*> \endverbatim
*
*  Arguments:
*  ==========
*
*> \param[in] M
*> \verbatim
*>          M is INTEGER
*>     The number of rows of D and X. M >= 0.
*> \endverbatim
*>
*> \param[in] N
*> \verbatim
*>          N is INTEGER
*>     The number of columns of X. N >= 0.
*> \endverbatim
*>
*> \param[in] D
*> \verbatim
*>          D is REAL array, length M
*>     Diagonal matrix D, stored as a vector of length M.
*> \endverbatim
*>
*> \param[in,out] X
*> \verbatim
*>          X is COMPLEX array, dimension (LDX,N)
*>     On entry, the vector X to be scaled by D.
*>     On exit, the scaled vector.
*> \endverbatim
*>
*> \param[in] LDX
*> \verbatim
*>          LDX is INTEGER
*>     The leading dimension of the array X. LDX >= max(1,M).
*> \endverbatim
*
*  Authors:
*  ========
*
*> \author Univ. of Tennessee 
*> \author Univ. of California Berkeley 
*> \author Univ. of Colorado Denver 
*> \author NAG Ltd. 
*
*> \date September 2012
*
*> \ingroup complexOTHERcomputational
*
*  =====================================================================
      SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
*
*  -- LAPACK computational routine (version 3.4.2) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     September 2012
*
*     .. Scalar Arguments ..
      INTEGER            M, N, LDX
*     ..
*     .. Array Arguments ..
      COMPLEX            X( LDX, * )
      REAL               D( * )
*     ..
*
*  =====================================================================
*
*     .. Local Scalars ..
      INTEGER            I, J
*     ..
*     .. Executable Statements ..
*
      DO J = 1, N
         DO I = 1, M
            X( I, J ) = X( I, J ) / D( I )
         END DO
      END DO

      RETURN
      END
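For readers more at home in C, a sketch of the same loop nest, assuming C99 `float complex` for COMPLEX and column-major storage, so that X(I,J) maps to x[(I-1) + (J-1)*LDX] with 0-based indices in C. The name `clarscl2_c` is hypothetical, and the `ldx >= max(1, m)` check is an added assumption, not part of the Fortran routine.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* x <- inv(D) * x for an M-by-N complex matrix X stored column-major
   with leading dimension LDX; X(i,j) (0-based) lives at x[i + j*ldx]. */
void clarscl2_c(int m, int n, const float *d, float complex *x, int ldx) {
  assert(m >= 0 && n >= 0 && ldx >= (m > 1 ? m : 1));
  for (int j = 0; j < n; j++)     /* columns outermost, as in the Fortran */
    for (int i = 0; i < m; i++)   /* unit stride down each column */
      x[i + (size_t)j * ldx] /= d[i];
}
```

Keeping the column loop outermost, as the Fortran does, walks memory with unit stride; rows past M in each column (the LDX padding) are left untouched. Like the original, it does not guard against a zero entry in D.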

References:
- Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
  - From: Jeff Law
- Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
  - From: Mike Stump
- Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
  - From: Bin.Cheng
- Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
  - From: Jeff Law
- Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
  - From: Bin.Cheng
