[Bug target/105556] RA assigns an MMA vector input operand to vs0-vs31 causing an MMA accumulator to be spilled

cvs-commit at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed May 18 02:33:59 GMT 2022


--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Peter Bergner <bergner@gcc.gnu.org>:


commit r13-579-gc6e36f05fbb081abb068958d8900ad34b303a70b
Author: Peter Bergner <bergner@linux.ibm.com>
Date:   Tue May 17 21:09:29 2022 -0500

    rs6000: Prefer assigning the MMA vector operands to altivec registers

    When optimizing the DGEMM kernel in OpenBLAS to use MMA, the MMA code
    uses all 8 accumulators, which overlap all vs0-vs31 vector registers.
    Current trunk assigns one of the normal vector inputs of an MMA
    instruction to a register in vs0-vs31, which forces us to spill one of
    the accumulators to memory, leading to poor performance.  The solution
    here is to replace the "wa" constraints for the vector input operands in
    the MMA instruction patterns with "v,?wa" so that we prefer using the
    altivec registers vs32-vs63 over the vs0-vs31 registers.

    2022-05-17  Peter Bergner  <bergner@linux.ibm.com>
                Segher Boessenkool  <segher@kernel.crashing.org>

            PR target/105556
            * config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
            mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
            mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
            mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraints with
            "v,?wa".  Update other operands accordingly.
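
The register pressure described in the commit message can be illustrated with a small model. This is only a sketch and not GCC code: it assumes the Power10 layout in which accumulator N overlaps the four VSX registers vs(4N) through vs(4N+3), and the helper names accumulator_vsrs and free_registers are hypothetical, introduced purely for illustration.

```python
# Illustrative model only, not GCC code. Assumes accumulator N overlaps
# VSX registers vs(4N)..vs(4N+3), so 8 accumulators cover vs0-vs31.

NUM_ACCUMULATORS = 8
VSRS_PER_ACCUMULATOR = 4


def accumulator_vsrs(acc):
    """Return the VSX register numbers overlapped by accumulator `acc`."""
    first = acc * VSRS_PER_ACCUMULATOR
    return set(range(first, first + VSRS_PER_ACCUMULATOR))


def free_registers(live_accumulators, candidates):
    """Return the candidate VSX registers not overlapped by a live accumulator."""
    busy = set()
    for acc in live_accumulators:
        busy |= accumulator_vsrs(acc)
    return sorted(set(candidates) - busy)


all_live = range(NUM_ACCUMULATORS)
lower_half = range(0, 32)     # vs0-vs31, overlapped by the accumulators
altivec_half = range(32, 64)  # vs32-vs63, the altivec registers ("v")

print(free_registers(all_live, lower_half))         # prints []
print(len(free_registers(all_live, altivec_half)))  # prints 32
```

With all eight accumulators live, no vs0-vs31 register is free, so any vector input placed there displaces an accumulator. Every vs32-vs63 register stays available, which is why the patch makes "v" the preferred alternative.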
