[Bug target/72804] Poor code gen with -mvsx-timode

Fri Aug 5 16:53:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72804

--- Comment #1 from Peter Bergner <bergner at gcc dot gnu.org> ---
Using the following patch, I'm able to clean up the first simple test case:

Index: rs6000.c
===================================================================
--- rs6000.c    (revision 239144)
+++ rs6000.c    (working copy)
@@ -7747,7 +7747,6 @@ reg_offset_addressing_ok_p (machine_mode
     case V2DFmode:
     case V2DImode:
     case V1TImode:
-    case TImode:
     case TFmode:
     case KFmode:
       /* AltiVec/VSX vector modes.  Only reg+reg addressing was valid until the

... meaning we end up with just the two ld insns, similar to the
-mno-vsx-timode compile, but it doesn't help the bigger test case at all.  I
have a somewhat smaller test case that still shows the bad code gen (using the
patch above):

bergner@genoa:~/gcc/BUGS/LRA$ cat t2.i
__int128_t
ptr4 (__int128_t *p)
{
  return ~p[1];
}
bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc \
    -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 \
    -mno-vsx-timode -S t2.i

bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s 
ptr4:
        ld 11,24(3)
        ld 10,16(3)
        not 4,11
        not 3,10
        blr
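
As a sanity check on why two ld's and two not's are all this function should need, here is a minimal sketch showing that complementing an __int128_t is the same as complementing its two 64-bit halves independently.  ptr4 is the test case from t2.i; ptr4_by_halves is a hypothetical helper added only for illustration and is not part of the bug report or of GCC.

```c
#include <stdint.h>
#include <string.h>

/* The test case from t2.i, unchanged.  */
__int128_t
ptr4 (__int128_t *p)
{
  return ~p[1];
}

/* Hypothetical helper (illustration only): compute the same value by
   treating p[1] as two independent 64-bit doublewords, mirroring the
   ld/ld/not/not sequence.  Complementing each half is symmetric, so the
   result does not depend on which half is the high one.  */
__int128_t
ptr4_by_halves (__int128_t *p)
{
  uint64_t half[2];
  __int128_t result;

  memcpy (half, &p[1], sizeof half);     /* two 64-bit loads */
  half[0] = ~half[0];                    /* first not */
  half[1] = ~half[1];                    /* second not */
  memcpy (&result, half, sizeof result);
  return result;
}
```

Both functions should return the same value for any input, so nothing in the operation itself requires going through a vector register or the stack.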

bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc \
    -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 \
    -mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s 
ptr4:
        stdu 1,-352(1)
        addi 9,3,16
        lxvd2x 0,0,9
        addi 9,1,32
        stxvd2x 0,0,9
        ori 2,2,0
        lxvd2x 0,0,9
        addi 9,1,48
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,64
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,80
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,96
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,112
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,128
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,144
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,160
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,176
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,192
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,208
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,224
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,240
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,256
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,272
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,288
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,304
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,320
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,336
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        ori 2,2,0
        ld 10,0(9)
        ld 11,8(9)
        addi 1,1,352
        not 3,10
        not 4,11
        blr

Lots of useless code in there! :-(

If I compare the RTL between the two, I see the following for -mno-vsx-timode:

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:DI 160)
        (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
            (const_int 16 [0x10]))) t2.i:4 75 {*adddi3}
     (nil))
(insn 7 6 8 2 (set (reg:TI 159)
        (mem:TI (reg:DI 160) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 568 {*movti_ppc64}
     (nil))
(insn 8 7 9 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

versus for -mvsx-timode:

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:TI 159)
        (mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
                (const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 955 {*vsx_le_perm_load_ti}
     (nil))
(insn 7 6 8 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

Looking at the movti_ppc64 pattern, I see we're disabling it with the
VECTOR_MEM_NONE_P (<MODE>mode) test.  If I remove that, we get closer:

bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc \
    -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 \
    -mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s 
ptr4:
        addi 9,3,16
        lxvd2x 0,0,9
        addi 9,1,-16
        stxvd2x 0,0,9
        ld 10,-16(1)
        ld 11,-8(1)
        not 3,10
        not 4,11
        blr

Still an unnecessary copy to the stack and back.  The RTL now looks like this:

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:TI 159)
        (mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
                (const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 568 {*movti_ppc64}
     (nil))
(insn 7 6 8 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

So it seems the -mno-vsx-timode code doesn't allow the load to contain an
address other than a REG, whereas the -mvsx-timode code is allowing the REG+OFF
addressing.