[Bug rtl-optimization/71309] Copying fields within a struct followed by use results in load hit store

cvs-commit at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Aug 4 03:12:01 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71309

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:265d817b1eb4644c7a9613ad6920315d98e2e0a4

commit r11-2526-g265d817b1eb4644c7a9613ad6920315d98e2e0a4
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Mon Aug 3 22:09:15 2020 -0500

    dse: Remove partial load after full store for high part access [PR71309]

    v5 updates per review comments:
    1. Move const_rhs out of the loop;
    2. Iterate from int size for read_mode.

    This patch optimizes the following RTL pattern (it works for char/short/int/void*):

    6: r119:TI=[r118:DI+0x10]
    7: [r118:DI]=r119:TI
    8: r121:DI=[r118:DI+0x8]

    =>

    6: r119:TI=[r118:DI+0x10]
    16: r122:DI=r119:TI#8
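
    The RTL pattern above typically comes from copying one struct member
    over another and then immediately reading a field of the destination,
    as in this minimal C sketch (the struct and field names here are only
    illustrative and need not match the actual pr71309.c test):

      struct path
      {
        void *mnt;
        void *dentry;
      };

      struct nameidata
      {
        struct path path;
        struct path root;
      };

      void *
      get_dentry (struct nameidata *nd)
      {
        nd->path = nd->root;     /* 16-byte copy: full-width load + store.  */
        return nd->path.dentry;  /* 8-byte read of the just-stored high part:
                                    a load-hit-store without this patch.  */
      }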

    The final asm is as below, without a partial load after the full store (stxv+ld):
      ld 10,16(3)
      mr 9,3
      ld 3,24(3)
      std 10,0(9)
      std 3,8(9)
      blr

    It can achieve a ~25% performance improvement for typical cases on
    Power9.  Bootstrapped and regression tested on Power9-LE.

    For AArch64, one ldr is replaced by a mov with this patch:

    ldp     x2, x3, [x0, 16]
    stp     x2, x3, [x0]
    ldr     x0, [x0, 8]

    =>

    mov     x1, x0
    ldp     x2, x0, [x0, 16]
    stp     x2, x0, [x1]
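
    At the source level, the effect of the dse.c change can be pictured as
    the following C analogy (a hedged sketch only; the real transformation
    is done on RTL inside find_shift_sequence, not by rewriting C): when a
    narrow read follows a wider store of a value that is still live in a
    register, take the needed part from that register instead of reloading
    it from memory.

      #include <stdint.h>

      typedef struct { uint64_t lo, hi; } pair128;

      /* Before: full 16-byte store, then an 8-byte load from the same
         location, which hits the store queue.  */
      uint64_t
      read_hi_before (pair128 *p, pair128 v)
      {
        *p = v;
        return p->hi;
      }

      /* After: the store is kept, but the high 8 bytes are taken from the
         register copy of v, so the partial load disappears.  */
      uint64_t
      read_hi_after (pair128 *p, pair128 v)
      {
        *p = v;
        return v.hi;
      }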

    gcc/ChangeLog:

    2020-08-04  Xionghu Luo  <luoxhu@linux.ibm.com>

            PR rtl-optimization/71309
            * dse.c (find_shift_sequence): Use a subreg of the shifted
            high-part register to avoid loading from the address.

    gcc/testsuite/ChangeLog:

    2020-08-04  Xionghu Luo  <luoxhu@linux.ibm.com>

            PR rtl-optimization/71309
            * gcc.target/powerpc/pr71309.c: New test.

