This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
rs6000.md/altivec.md problem in setting of vector registers
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: gcc at gcc dot gnu dot org
- Date: Wed, 3 Mar 2004 18:56:18 +0200
Hi,
I think there is a problem in the way we model the setting of subregs (insn
"insvsi") in rs6000, or rather, a problem in the way the reload phase handles
these patterns when they are generated to initialize a vector register.
Consider the following example:
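typedef int __attribute__((mode(V8HI))) v8hi;
#define N 1024
void bar1 (v8hi);   /* defined elsewhere; consumes the stored data */
void foo5 (short n){
  short a[N];
  v8hi *pa = (v8hi *)a;
  v8hi va = {n,n,n,n,n,n,n,n};   /* >> loop-invariant vector initialization */
  int i;
  for (i=0; i<N/8; i++){
    pa[i] = va;                  /* vector store inside the loop */
  }
  bar1 (pa[2]);
}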
In the RTL, this is expressed as a sequence of 8 insns that copy 'n' (which
resides in a scalar register) into each of 8 subregs of the temporary 'va'.
This happens before the loop; inside the loop there is a vector store of
'va'. Later on, this subreg initialization sequence has to be spilled: the
8 scalar registers (which hold the value of 'n') are spilled to memory, and
a vector load combines the 8 values into one vector register.
This is indeed what happens when I compile the above program for i386 with
-msse2. The resulting code is efficient: 8 scalar stores and one vector
load before the loop, and only a vector store inside the loop.
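For reference, the efficient i386 code shape can be written out at source level. This is a sketch, not compiler output: it assumes a modern GCC or Clang and uses the GNU C `vector_size` extension instead of the 2004-era `mode(V8HI)` attribute, and `fill_hoisted` and the `slot` array are hypothetical names. The invariant "spill" (8 scalar stores plus one vector load) happens once, and the loop body is a single 16-byte store.

```c
#include <string.h>

/* GNU C generic vector: 8 x 16-bit lanes in 16 bytes. */
typedef short v8hi_t __attribute__((vector_size(16)));

#define N 1024

/* Hypothetical hand-hoisted version of foo5's loop. */
void fill_hoisted(short n, short a[N])
{
    short slot[8] __attribute__((aligned(16))); /* stands in for the spill slot */
    v8hi_t va;
    int i;

    for (i = 0; i < 8; i++)          /* the 8 scalar stores, before the loop */
        slot[i] = n;
    memcpy(&va, slot, sizeof va);    /* the single vector load */

    for (i = 0; i < N / 8; i++)      /* loop body: one vector store */
        memcpy(&a[i * 8], &va, sizeof va);
}
```

`memcpy` is used for the final store so the sketch does not depend on the alignment of `a`.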
However, when compiling for powerpc with -maltivec, instead of spilling the 8
scalar registers before the loop, the register allocator generates spill
code around the vector store insn inside the loop. As a result, we get spill
code for loop-invariant data inside the loop. Here is the resulting assembly
(the spill code is marked with '>>'):
L2:
addi r7,r1,2112
slwi r3,r2,4
>> stw r9,0(r7)
addi r2,r2,1
>> stw r10,4(r7)
>> stw r11,8(r7)
>> stw r12,12(r7)
>> lvx v0,0,r7
stvx v0,r3,r8
bdnz L2
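To spell out what loop L2 does each iteration, here is a C rendering of it. It is an illustrative sketch only: the assumption that 'va' occupies four 32-bit general registers (r9..r12) is read off the listing, `fill_spill_in_loop` is a hypothetical name, and an optimizing C compiler would itself hoist these copies. The point is the shape: four word stores, a vector load, and a vector store, all repeated N/8 times.

```c
#include <stdint.h>
#include <string.h>

typedef short v8hi_t __attribute__((vector_size(16)));

#define N 1024

/* gpr[] models r9..r12, which hold 'va' as four 32-bit words. */
void fill_spill_in_loop(const uint32_t gpr[4], short a[N])
{
    uint32_t slot[4] __attribute__((aligned(16)));  /* stack slot at 2112(r1) */

    for (int i = 0; i < N / 8; i++) {
        v8hi_t v0;
        for (int k = 0; k < 4; k++)          /* stw r9..r12: the spill */
            slot[k] = gpr[k];
        memcpy(&v0, slot, sizeof v0);        /* lvx v0,0,r7 */
        memcpy(&a[i * 8], &v0, sizeof v0);   /* stvx v0,r3,r8 */
    }
}
```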
More detail is below. My question is: how should the machine description be
fixed so that the reload phase spills the subreg initialization insns
(outside the loop), as it does for i386?
thanks,
dorit
More detail:
====================
Actually, if you compile the above program with -maltivec, the compiler
ICEs during reload with the following error:
simd-inv.c: In function `foo5':
simd-inv.c:29: error: unrecognizable insn:
(insn 88 87 89 0 (set (mem:V8HI (reg:SI 9 r9) [0 S16 A8])
(reg:V8HI 2 r2 [126])) -1 (nil)
(nil))
simd-inv.c:29: internal compiler error: in extract_insn, at recog.c:2083
This is because of a restriction I added a month ago to the following
define_insn in altivec.md (the last 2 lines):
(define_insn "*movv8hi_internal1"
[(set (match_operand:V8HI 0 "nonimmediate_operand" "=m,v,v,o,r,r,v")
(match_operand:V8HI 1 "input_operand" "v,m,v,r,o,r,W"))]
"TARGET_ALTIVEC
>> && (altivec_register_operand (operands[0], V8HImode)
>> || altivec_register_operand (operands[1], V8HImode))"
If I remove these 2 lines, compilation succeeds. However, we then get the
same inefficiencies that led us to add them in the first place (loop
invariants don't get hoisted out of the loop; see
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg02884.html).
A related question is how to model "*movv8hi_internal1": we want to keep
the new restriction for constants, but it looks too strict for
non-constant inputs.
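To make the two questions concrete, here is a simplified C model of the insn condition, restricted to hard-register and memory operands (the situation during reload, where insn 88, a V8HI store from GPR r2 to memory, is rejected). The enum and function names are hypothetical, and TARGET_ALTIVEC is ignored. The relaxed variant is only one possible reading of "keep the restriction for constants"; it is an untested sketch, not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical operand classification; the real pattern uses RTL
   predicates and constraint letters, not this enum. */
enum opkind { OP_ALTIVEC_REG, OP_GPR, OP_MEM, OP_CONST };

/* Current condition on "*movv8hi_internal1": at least one operand
   must be an AltiVec register. */
bool movv8hi_current_ok(enum opkind dst, enum opkind src)
{
    return dst == OP_ALTIVEC_REG || src == OP_ALTIVEC_REG;
}

/* Possible relaxation: require an AltiVec destination only for constant
   sources, and leave other moves to the constraint alternatives
   ("=m,v,v,o,r,r,v" / "v,m,v,r,o,r,W"). */
bool movv8hi_relaxed_ok(enum opkind dst, enum opkind src)
{
    if (src == OP_CONST)
        return dst == OP_ALTIVEC_REG;
    return true;
}
```

Under the current condition the memory-from-GPR move (the form of insn 88) is rejected, which is why reload's spill insn becomes unrecognizable; the relaxed variant would accept it.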
Here is what happens when compiling the above example program after I
remove the 2 restriction lines from the define_insn:
Up to phase .c.24.lreg,
=======================
we have a sequence of 8 insns in the loop prologue that initialize the
vector temporary 'va'; each of these insns looks roughly like:
(insn:HI ... (set (zero_extract:SI (subreg:SI (reg/v:V8HI 120 [ va ]) 0)
(const_int 16 [0x10])
(const_int 0 [0x0]))
(reg/v:SI 118 [ n ])) 106 {insvsi} (insn_list 11 (insn_list 3 (nil)))
(nil))
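The effect of such an insvsi insn can be modelled in C. This sketch assumes the rs6000 bit numbering (BITS_BIG_ENDIAN, so bit position 0 is the most significant bit) and assumes, beyond the single insn shown, that the other 7 insns target subreg byte offsets 0, 4, 8, 12 with positions 0 and 16. `insv_si` and `init_va_words` are hypothetical names. Note that each insert is a read-modify-write of its SI word.

```c
#include <stdint.h>

/* Model of (set (zero_extract:SI word (const_int width) (const_int pos)) val),
   assuming bit position 0 is the most significant bit.
   Requires 0 < width < 32 and pos + width <= 32. */
uint32_t insv_si(uint32_t word, uint32_t val, unsigned width, unsigned pos)
{
    unsigned shift = 32u - pos - width;             /* offset from the LSB */
    uint32_t mask = ((1u << width) - 1u) << shift;  /* field being replaced */
    return (word & ~mask) | ((val << shift) & mask);
}

/* Hypothetical: two 16-bit inserts fill one SI subreg, and four SI words
   make up the V8HI pseudo, giving 8 insns. words[] must hold the
   previous contents, since each insert keeps the other half. */
void init_va_words(uint32_t words[4], uint16_t n)
{
    for (int k = 0; k < 4; k++) {
        words[k] = insv_si(words[k], n, 16, 0);   /* high halfword */
        words[k] = insv_si(words[k], n, 16, 16);  /* low halfword  */
    }
}
```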
Inside the loop we have the store of 'va' into memory:
LOOP:
(insn:HI 25 24 27 1 (set (mem:V8HI (plus:SI (reg:SI 123)
(reg/v/f:SI 119 [ pa ])) [4 S16 A128])
(reg/v:V8HI 120 [ va ])) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
(expr_list:REG_DEAD (reg:SI 123)
(nil)))
During phase .c.25.greg,
========================
the compiler reports no spills for the initialization insns; however, it
reports a spill for the vector store insn inside the loop (insn 25):
Reloads for insn # 25
Reload 0: GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), optional,
can't combine, secondary_reload_p
Reload 1: reload_out (V8HI) = (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
(reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
reload_out_reg: (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
(reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
secondary_out_reload = 0
Reload 2: reload_in (SI) = (plus:SI (reg/f:SI 1 r1)
(const_int 2112 [0x840]))
BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine
reload_in_reg: (plus:SI (reg/f:SI 1 r1)
(const_int 2112 [0x840]))
reload_reg_rtx: (reg:SI 7 r7)
Reload 3: reload_in (V8HI) = (reg/v:V8HI 9 r9 [orig:120 va ] [120])
ALTIVEC_REGS, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v:V8HI 9 r9 [orig:120 va ] [120])
reload_reg_rtx: (reg:V8HI 77 v0)
As a result, we now have a spill in the loop:
==============================================
LOOP:
[1] (insn 62 61 63 1 (set (mem:V8HI (reg:SI 7 r7) [0 S16 A8])
(reg/v:V8HI 9 r9 [orig:120 va ] [120])) 558 {*movv8hi_internal1}
(nil) (nil))
[2] (insn 63 62 25 1 (set (reg:V8HI 77 v0)
(mem:V8HI (reg:SI 7 r7) [0 S16 A8])) 550 {altivec_lvx_8hi} (nil)
(nil))
[3] (insn:HI 25 63 27 1 (set (mem:V8HI (plus:SI (reg:SI 0 r0 [123])
(reg/v/f:SI 8 r8 [orig:119 pa ] [119])) [4 S16 A128])
(reg:V8HI 77 v0)) 554 {altivec_stvx_8hi} (insn_list 24 (nil))
(nil))
Insns [1] and [2] are the new spill code (insn [1] has the same form as
insn 88, which causes the ICE described above when the restriction is in
place). Finally, during phase .c.29.rnreg, insn [1] is expanded into a
sequence of scalar insns, each of which looks like:
(insn 64 61 65 1 (set (mem:SI (reg:SI 7 r7) [0 S4 A8])
(reg:SI 9 r9 [ va ])) 309 {*movsi_internal1} (nil)
(nil))
On i386, the RTL of the subreg initialization insns looks as follows:
(insn:HI 41 40 43 0 (parallel [
(set (subreg:SI (reg/v:V8HI 61 [ va ]) 8)
(ior:SI (reg:SI 76)
(reg:SI 65)))
(clobber (reg:CC 17 flags))
]) 209 {*iorsi_1} (insn_list 39 (nil))
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_DEAD (reg:SI 76)
(nil))))
These insns do get spilled during .c.25.greg, and the spill code stays
outside the loop.
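For comparison with the rs6000 zero_extract form, one i386 insn of this kind can be modelled in C. The sketch assumes that one ior operand holds a zero-extended halfword and the other the neighbouring halfword already shifted left by 16 (which of reg 76 and reg 65 plays which role is not shown in the dump), and that x86 is little-endian so the lower-numbered lane is the low half. `ior_word` is a hypothetical name. Unlike an insv, this sets the whole SI subreg in one insn without reading its previous value.

```c
#include <stdint.h>

/* Model of (set (subreg:SI va off) (ior:SI a b)), assuming one operand
   is the zero-extended low halfword and the other the high halfword
   shifted left by 16. */
uint32_t ior_word(uint16_t lo, uint16_t hi)
{
    uint32_t low_part  = (uint32_t)lo;          /* zero-extended halfword */
    uint32_t high_part = (uint32_t)hi << 16;    /* shifted halfword */
    return low_part | high_part;                /* full 32-bit subreg value */
}
```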