[Bug target/79251] PowerPC vec_insert generates store-hit-load if the element number is variable

cvs-commit at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Jan 22 14:04:38 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79251

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:b29225597584b697762585e0b707b7cb4b427650

commit r11-6857-gb29225597584b697762585e0b707b7cb4b427650
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Thu Jan 21 21:01:24 2021 -0600

    rs6000: Support variable insert and Expand vec_insert in expander [PR79251]

    vec_insert accepts 3 arguments: arg0 is the value to be inserted, arg1 is
    the input vector, and arg2 is the position in arg1 at which to insert
    arg0.  The current expander generates stxv+stwx+lxv if arg2 is variable
    instead of constant, which causes a serious store-hit-load performance
    issue on Power.  This patch:
     1) Builds a VIEW_CONVERT_EXPR for vec_insert (i, v, n), like v[n&3] = i,
    to unify the gimple code, so that the expander can use vec_set_optab to
    expand it.
     2) Expands the IFN VEC_SET to fast instructions: lvsr+insert+lvsl.
    In this way, "vec_insert (i, v, n)" and "v[n&3] = i" are not expanded too
    early in the gimple stage if arg2 is variable, which avoids generating
    store-hit-load instructions.
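
As a reading aid, the equivalence between "vec_insert (i, v, n)" and "v[n&3] = i" for V4SI can be sketched with the generic GCC/Clang vector extension. This is only a model of the semantics described above, not the code added by the patch; the names v4si_model, insert_v4si_model and insert_v4si_demo are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* Generic GCC/Clang vector extension: four 32-bit ints in 16 bytes.
   (Assumption: this stands in for the Power "vector signed int" type.) */
typedef int v4si_model __attribute__ ((vector_size (16)));

/* Hypothetical helper modelling vec_insert (i, v, n) for V4SI:
   the index is reduced modulo the element count, so it can never
   address memory outside the vector.  */
static v4si_model
insert_v4si_model (int i, v4si_model v, unsigned long n)
{
  v[n & 3] = i;
  return v;
}

/* Usage: a variable index of 6 wraps to element 2.  */
static void
insert_v4si_demo (void)
{
  v4si_model v = { 10, 20, 30, 40 };
  v4si_model r = insert_v4si_model (99, v, 6);
  assert (r[2] == 99);
  assert (r[0] == 10 && r[1] == 20 && r[3] == 40);
}
```

Because the index is masked, the operation is a pure register-to-register update in principle, which is why a store followed by a reload of the whole vector is avoidable.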

    For Power9 V4SI:
            addi 9,1,-16
            rldic 6,6,2,60
            stxv 34,-16(1)
            stwx 5,9,6
            lxv 34,-16(1)
    =>
            rlwinm 6,6,2,28,29
            mtvsrwz 0,5
            lvsr 1,0,6
            lvsl 0,0,6
            xxperm 34,34,33
            xxinsertw 34,0,12
            xxperm 34,34,32

    Though the instruction count increases from 5 to 7, performance improves
    by 60% in typical cases.
    Tested with V2DI, V2DF, V4SI, V4SF, V8HI, and V16QI on Power9-LE.
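
One way to read the lvsr+insert+lvsl sequence is as "rotate, insert at a fixed slot, rotate back". The sketch below models that idea in portable C on a 16-byte array. It is a conceptual illustration under stated assumptions, not a description of the exact instruction semantics or of rs6000_expand_vector_set_var: the helper names are hypothetical, and the fixed byte offset 12 is taken from the xxinsertw operand in the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16, ELEM_BYTES = 4, FIXED_SLOT = 12 };

/* Rotate 16 bytes left by k: out[i] = in[(i + k) % 16].  */
static void
rotate_left_model (uint8_t out[VEC_BYTES], const uint8_t in[VEC_BYTES],
                   unsigned k)
{
  for (unsigned i = 0; i < VEC_BYTES; i++)
    out[i] = in[(i + k) % VEC_BYTES];
}

/* Hypothetical model of a variable-index word insert without going
   through memory: rotate the target word to a fixed slot, overwrite
   it there, then undo the rotation.  */
static void
insert_var_model (uint8_t v[VEC_BYTES], uint32_t value, unsigned n)
{
  unsigned off = (n & 3u) * ELEM_BYTES;   /* byte offset of element n & 3 */
  unsigned k = (off + VEC_BYTES - FIXED_SLOT) % VEC_BYTES;
  uint8_t tmp[VEC_BYTES];

  rotate_left_model (tmp, v, k);          /* byte `off` now sits at FIXED_SLOT */
  memcpy (tmp + FIXED_SLOT, &value, ELEM_BYTES);
  rotate_left_model (v, tmp, (VEC_BYTES - k) % VEC_BYTES);  /* inverse */
}

/* Usage: the result matches a plain array store for every index.  */
static void
insert_var_demo (void)
{
  for (unsigned n = 0; n < 8; n++)
    {
      uint32_t expect[4] = { 1, 2, 3, 4 };
      uint32_t got[4] = { 1, 2, 3, 4 };
      uint8_t v[VEC_BYTES];

      expect[n & 3u] = 99u;
      memcpy (v, got, sizeof v);
      insert_var_model (v, 99u, n);
      memcpy (got, v, sizeof v);
      assert (memcmp (got, expect, sizeof got) == 0);
    }
}
```

The two rotations cost extra instructions, but every step stays in registers, which is consistent with the trade-off reported above.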

    2021-01-22  Xionghu Luo  <luoxhu@linux.ibm.com>

    gcc/ChangeLog:

            PR target/79251
            PR target/98065

            * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
            Adjust variable index vec_insert from address dereference to
            ARRAY_REF(VIEW_CONVERT_EXPR) tree expression.
            * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
            New declaration.
            * config/rs6000/rs6000.c (rs6000_expand_vector_set_var): New
            function.

    2021-01-22  Xionghu Luo  <luoxhu@linux.ibm.com>

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr79251.p9.c: New test.
            * gcc.target/powerpc/pr79251-run.c: New test.
            * gcc.target/powerpc/pr79251.h: New header.

