This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass


On 10/30/14 23:36, Bin.Cheng wrote:
>> #2 would be the best solution for the case I was pondering, but I don't
>> think solving that case is terribly important given the processors for which
>> it was profitable haven't been made for a very long time.
>
> I am thinking if it's possible to introduce a pattern-directed fusion.
> Something like define_fusion, and adapting haifa-scheduler for it.  I
> agree there are two kinds (relevant and irrelevant) fusion types, and
> it's not trivial to support both in one scheme.  Do you have a
> specific example that I can have a try?

I kicked around using reorg to do stuff like this in the past (combination of unrelated insns). But ultimately I think the way to go is to have it happen when insns are on the ready list in the scheduler.

For fusion of related insns like the load/store pairing, I think your approach should work pretty well.
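
To make the "related insns" case concrete, here is a minimal sketch of the pairing idea, not the code from the patch. It assumes the loads are mutually independent, that no base register is modified between them, and that two loads are pairable when they use the same base register and sit at consecutive offsets. The Load type and pair_loads function are hypothetical names used only for this illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Load(NamedTuple):
    """Hypothetical model of a load insn: dest = mem[base + offset]."""
    dest: str      # destination register
    base: str      # base address register
    offset: int    # byte offset from the base register
    size: int = 4  # access width in bytes (assumed word-sized)


def pair_loads(loads):
    """Group loads by base register, sort by offset, and pair neighbours
    whose offsets are exactly one access width apart.

    Returns (pairs, leftovers).
    """
    by_base = defaultdict(list)
    for ld in loads:
        by_base[ld.base].append(ld)

    pairs, leftovers = [], []
    for group in by_base.values():
        group.sort(key=lambda ld: ld.offset)
        i = 0
        while i < len(group):
            a = group[i]
            b = group[i + 1] if i + 1 < len(group) else None
            if (b is not None
                    and a.size == b.size
                    and b.offset == a.offset + a.size
                    and a.dest != b.dest):
                pairs.append((a, b))   # candidate for a paired load
                i += 2
            else:
                leftovers.append(a)    # stays a single load
                i += 1
    return pairs, leftovers
```

Sorting by (base, offset) is what brings pairable accesses next to each other; a scheduler-based pass would get a similar effect by giving such insns adjacent priorities rather than by sorting a list.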


As to specific examples of independent insn fusion, the ones I'm most familiar with are from the older PA chips. I wouldn't recommend building something for those processors simply because they're so dated that I don't believe anyone uses them anymore.

However, if you have cases (arm shift insns?), building for those is fine. If you just want examples, the ones we tried to exploit on the PA were fmpyadd/fmpysub, movb,tr and addb,tr.

fmpyadd/fmpysub combined an independent floating point multiply with an FP add or sub insn. There are many conditions, but if you want a simple example to play with, the attached file with -O2 -mschedule=7100LC ought to generate one of these insns via pa_reorg.

addb,tr can combine an unconditional branch with a reg+reg or reg+imm5 addition operation. movb,tr combines an unconditional branch with a reg-reg copy or load of a 5 bit immediate value into a general register. I don't happen to have examples handy, but compiling integer code with -O2 -mschedule=7100LC ought to trigger some.
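
The operand test described above can be sketched as follows. This is only an illustration, not the check pa_reorg performs: it assumes the 5-bit immediate is a signed two's-complement field (range -16..15), and it ignores every other constraint on the real insns. The function names are hypothetical.

```python
def fits_imm5(value):
    """True if value fits a 5-bit immediate.

    Assumption: the field is signed two's complement, i.e. -16..15.
    """
    return -16 <= value <= 15


def branch_fusion(op, src):
    """Pick a combined insn for an unconditional branch plus a simple op.

    op  -- "add" (addition) or "mov" (copy / load immediate)
    src -- a register name (str) or an immediate (int)
    Returns the mnemonic family named in the mail, or None when the
    operation cannot be folded into the branch.
    """
    if isinstance(src, int) and not fits_imm5(src):
        return None                 # immediate too wide for the imm5 field
    if op == "add":
        return "addb,tr"            # reg+reg or reg+imm5 addition
    if op == "mov":
        return "movb,tr"            # reg-reg copy or imm5 load
    return None                     # any other op is not a candidate
```

For example, branch_fusion("add", 15) selects "addb,tr", while branch_fusion("add", 16) returns None because the immediate no longer fits under the assumed signed range.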

The code in pa_reorg is O(n^2) or worse. It predates the hooks to allow the target to reorder the ready queue. It would probably be relatively easy to have that code run via those hooks and just look at the ready queue. So it'd still be O(n^2), but the N would be *much* smaller. But again, I don't think anyone has used PA7xxxx processors for over a decade, so it hasn't seemed worth the effort to change.
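
A rough sketch of what a pairwise scan restricted to the ready queue might look like is given here. It is not GCC code: the Insn type, the op names, the FUSIBLE table and find_fusion_pairs are all hypothetical, and the independence test is a plain register-overlap check rather than the scheduler's dependence information.

```python
from itertools import combinations
from typing import NamedTuple


class Insn(NamedTuple):
    """Hypothetical ready-list entry."""
    name: str
    op: str
    reads: frozenset   # registers read
    writes: frozenset  # registers written


# Assumed mapping from an unordered pair of ops to a combined insn.
FUSIBLE = {
    frozenset({"fmul", "fadd"}): "fmpyadd",
    frozenset({"fmul", "fsub"}): "fmpysub",
}


def independent(a, b):
    """No register flows between a and b, and they write different registers."""
    return not (a.writes & b.reads or b.writes & a.reads or a.writes & b.writes)


def find_fusion_pairs(ready):
    """Quadratic scan over the ready list only; each insn is used at most once."""
    used = set()
    result = []
    for a, b in combinations(ready, 2):
        if a.name in used or b.name in used:
            continue
        fused = FUSIBLE.get(frozenset({a.op, b.op}))
        if fused and independent(a, b):
            result.append((fused, a, b))
            used.update({a.name, b.name})
    return result
```

The loop is still quadratic, but its input is the handful of insns that are ready in a given cycle rather than every insn in the function, which is the point made above.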

Cheers,
Jeff
*> \brief \b CLARSCL2 performs reciprocal diagonal scaling on a vector.
*
*  =========== DOCUMENTATION ===========
*
* Online html documentation available at 
*            http://www.netlib.org/lapack/explore-html/ 
*
*> \htmlonly
*> Download CLARSCL2 + dependencies 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TGZ]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [ZIP]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TXT]</a>
*> \endhtmlonly 
*
*  Definition:
*  ===========
*
*       SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
* 
*       .. Scalar Arguments ..
*       INTEGER            M, N, LDX
*       ..
*       .. Array Arguments ..
*       COMPLEX            X( LDX, * )
*       REAL               D( * )
*       ..
*  
*
*> \par Purpose:
*  =============
*>
*> \verbatim
*>
*> CLARSCL2 performs a reciprocal diagonal scaling on a vector:
*>   x <-- inv(D) * x
*> where the REAL diagonal matrix D is stored as a vector.
*>
*> Eventually to be replaced by BLAS_cge_diag_scale in the new BLAS
*> standard.
*> \endverbatim
*
*  Arguments:
*  ==========
*
*> \param[in] M
*> \verbatim
*>          M is INTEGER
*>     The number of rows of D and X. M >= 0.
*> \endverbatim
*>
*> \param[in] N
*> \verbatim
*>          N is INTEGER
*>     The number of columns of D and X. N >= 0.
*> \endverbatim
*>
*> \param[in] D
*> \verbatim
*>          D is REAL array, length M
*>     Diagonal matrix D, stored as a vector of length M.
*> \endverbatim
*>
*> \param[in,out] X
*> \verbatim
*>          X is COMPLEX array, dimension (LDX,N)
*>     On entry, the vector X to be scaled by D.
*>     On exit, the scaled vector.
*> \endverbatim
*>
*> \param[in] LDX
*> \verbatim
*>          LDX is INTEGER
*>     The leading dimension of the vector X. LDX >= 0.
*> \endverbatim
*
*  Authors:
*  ========
*
*> \author Univ. of Tennessee 
*> \author Univ. of California Berkeley 
*> \author Univ. of Colorado Denver 
*> \author NAG Ltd. 
*
*> \date September 2012
*
*> \ingroup complexOTHERcomputational
*
*  =====================================================================
      SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
*
*  -- LAPACK computational routine (version 3.4.2) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     September 2012
*
*     .. Scalar Arguments ..
      INTEGER            M, N, LDX
*     ..
*     .. Array Arguments ..
      COMPLEX            X( LDX, * )
      REAL               D( * )
*     ..
*
*  =====================================================================
*
*     .. Local Scalars ..
      INTEGER            I, J
*     ..
*     .. Executable Statements ..
*
      DO J = 1, N
         DO I = 1, M
            X( I, J ) = X( I, J ) / D( I )
         END DO
      END DO

      RETURN
      END

