[Bug target/72804] Poor code gen with -mvsx-timode
bergner at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Aug 5 16:53:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72804
--- Comment #1 from Peter Bergner <bergner at gcc dot gnu.org> ---
Using the following patch, I'm able to clean up the first simple test case:
Index: rs6000.c
===================================================================
--- rs6000.c (revision 239144)
+++ rs6000.c (working copy)
@@ -7747,7 +7747,6 @@ reg_offset_addressing_ok_p (machine_mode
case V2DFmode:
case V2DImode:
case V1TImode:
- case TImode:
case TFmode:
case KFmode:
/* AltiVec/VSX vector modes. Only reg+reg addressing was valid until the
... meaning we end up with just the two ld insns, similar to the
-mno-vsx-timode compile, but it doesn't help the bigger test case at all. I
have a somewhat smaller test case that still shows the bad code gen (using the
patch above):
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.i
__int128_t
ptr4 (__int128_t *p)
{
return ~p[1];
}
bergner@genoa:~/gcc/BUGS/LRA$
/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7
-mno-vsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
ld 11,24(3)
ld 10,16(3)
not 4,11
not 3,10
blr
bergner@genoa:~/gcc/BUGS/LRA$
/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7
-mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
stdu 1,-352(1)
addi 9,3,16
lxvd2x 0,0,9
addi 9,1,32
stxvd2x 0,0,9
ori 2,2,0
lxvd2x 0,0,9
addi 9,1,48
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,64
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,80
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,96
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,112
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,128
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,144
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,160
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,176
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,192
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,208
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,224
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,240
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,256
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,272
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,288
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,304
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,320
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
addi 9,1,336
xxpermdi 0,0,0,2
xxpermdi 0,0,0,2
stxvd2x 0,0,9
ori 2,2,0
ld 10,0(9)
ld 11,8(9)
addi 1,1,352
not 3,10
not 4,11
blr
Lots of useless code in there! :-(
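For reference, a minimal sketch of what the function needs to compute: the 128-bit complement is just two independent 64-bit complements, which is why the two ld plus two not sequence is sufficient. The helpers ptr4_by_halves and check_ptr4 are hypothetical names added for illustration only and are not part of the test case; the sketch models the two register halves with shifts rather than memory offsets, so it says nothing about which doubleword lives at offset 16 versus 24. It assumes a compiler with __int128 support, such as GCC on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* The test case from t2.i.  */
__int128_t
ptr4 (__int128_t *p)
{
  return ~p[1];
}

/* Hypothetical helper (not in the bug report): the same value built
   from two separate 64-bit complements, one per register half.  */
__int128_t
ptr4_by_halves (__int128_t *p)
{
  unsigned __int128 v = (unsigned __int128) p[1];
  uint64_t hi = (uint64_t) (v >> 64);   /* upper doubleword */
  uint64_t lo = (uint64_t) v;           /* lower doubleword */
  unsigned __int128 r = ((unsigned __int128) (uint64_t) ~hi << 64)
                        | (uint64_t) ~lo;
  return (__int128_t) r;
}

/* Hypothetical self-check: both forms must agree for the given input.  */
void
check_ptr4 (__int128_t *p)
{
  assert (ptr4 (p) == ptr4_by_halves (p));
}
```

Calling check_ptr4 on any two-element __int128_t array should pass, since no carry or cross-half dependency is involved in a bitwise NOT.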
If I compare the rtl between the two, I see the following for -mno-vsx-timode:
(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
(reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
(nil))
(insn 6 3 7 2 (set (reg:DI 160)
(plus:DI (reg/v/f:DI 157 [ pD.2334 ])
(const_int 16 [0x10]))) t2.i:4 75 {*adddi3}
(nil))
(insn 7 6 8 2 (set (reg:TI 159)
(mem:TI (reg:DI 160) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128]))
t2.i:4 568 {*movti_ppc64}
(nil))
(insn 8 7 9 2 (set (reg:TI 158)
(not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
(nil))
versus for -mvsx-timode:
(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
(reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
(nil))
(insn 6 3 7 2 (set (reg:TI 159)
(mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
(const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0
S16 A128])) t2.i:4 955 {*vsx_le_perm_load_ti}
(nil))
(insn 7 6 8 2 (set (reg:TI 158)
(not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
(nil))
Looking at the movti_ppc64 pattern, I see we're disabling it with the
VECTOR_MEM_NONE_P (<MODE>mode) test. If I remove that, we get closer:
bergner@genoa:~/gcc/BUGS/LRA$
/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7
-mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
addi 9,3,16
lxvd2x 0,0,9
addi 9,1,-16
stxvd2x 0,0,9
ld 10,-16(1)
ld 11,-8(1)
not 3,10
not 4,11
blr
Still an unnecessary copy to the stack and back.
(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
(reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
(nil))
(insn 6 3 7 2 (set (reg:TI 159)
(mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
(const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0
S16 A128])) t2.i:4 568 {*movti_ppc64}
(nil))
(insn 7 6 8 2 (set (reg:TI 158)
(not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
(nil))
So it seems the -mno-vsx-timode code doesn't allow the load to contain an
address other than a REG, whereas the -mvsx-timode code is allowing the REG+OFF
addressing.