This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



GCC : how to add VFPU to PSP Allegrex (MIPS target) ?


Hello,

As I don't really know where to send this email, I thought you might help redirect it to the right place.

I have some questions about vectorization, which I would like to have for a special MIPS architecture: Allegrex with VFPU.

First, an explanation of what Allegrex is:

   Allegrex is based on MIPS II with some MIPS III additions and has a
   standard FPU. It has no 64-bit ISA for either the CPU or the FPU (so
   long long and double need to be handled in software).
   Some instructions usually found in the OPCODE3 map are instead in
   the OPCODE2 map on Allegrex (min, max, madd(u), msub(u), etc.).

Allegrex also has a coprocessor 2, which we call the VFPU. This coprocessor is quite powerful and easy to use, and can even be used as a replacement for the FPU:

Details about VFPU :

   It has 128 32-bit single-precision floating-point registers, which
   are organized in 8 banks of 4x4 matrices:
   Bank 0                      Bank 1
   $0    $1    $2    $3        $4    $5    $6    $7
   $32   $33   $34   $35       $36   $37   $38   $39
   $64   $65   $66   $67       $68   $69   $70   $71
   $96   $97   $98   $99       $100  $101  $102  $103

   Bank 2                      Bank 3
   $8    $9    $10   $11       $12   $13   $14   $15
   $40   $41   $42   $43       $44   $45   $46   $47
   $72   $73   $74   $75       $76   $77   $78   $79
   $104  $105  $106  $107      $108  $109  $110  $111

   Bank 4                      Bank 5
   $16   $17   $18   $19       $20   $21   $22   $23
   $48   $49   $50   $51       $52   $53   $54   $55
   $80   $81   $82   $83       $84   $85   $86   $87
   $112  $113  $114  $115      $116  $117  $118  $119

   Bank 6                      Bank 7
   $24   $25   $26   $27       $28   $29   $30   $31
   $56   $57   $58   $59       $60   $61   $62   $63
   $88   $89   $90   $91       $92   $93   $94   $95
   $120  $121  $122  $123      $124  $125  $126  $127


$0..$127 : 32-bit scalar floats.


   Those registers can be accessed in scalar format, in row or column
   vector format, and in matrix or transposed matrix format.

   When accessing a row or a column, we can deal with 2, 3 or 4
   components as long as the numbers of those registers stay inside
   the bank.

   When accessing a matrix, we can deal with 2, 3 or 4 rows or
   columns as long as their numbers stay inside the bank.
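
The register numbering in the bank listing above follows a regular pattern, which can be sketched in C. The helper name vfpu_reg and its (bank, row, column) parameter order are illustrative assumptions, not part of any VFPU toolchain; the formula is simply read off the listing.

```c
#include <assert.h>

/* Hypothetical helper: register number of element (row, col) in a bank.
   Read off the bank listing in this message: within a bank, moving one
   column to the right adds 1, moving one row down adds 32, and each
   bank starts 4 registers after the previous one. */
int vfpu_reg(int bank, int row, int col)
{
    assert(bank >= 0 && bank < 8);
    assert(row >= 0 && row < 4);
    assert(col >= 0 && col < 4);
    return bank * 4 + row * 32 + col;
}
```

For instance, vfpu_reg(0, 1, 0) gives 32 and vfpu_reg(7, 3, 3) gives 127, which matches the first and last banks in the listing.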

Basically, operations on the VFPU may look like this:

   vadd.s S000, S010, S020 : $0 = $1 + $2

   vadd.t R000, R001, C030 : {$0, $1, $2} = {$32, $33, $34} + {$3, $35,
   $67}  <=> par { $0 = $32 + $3; $1 = $33 + $35; $2 = $34 + $67; }

   vtfm.p R200, M000, C100 : {$8, $9} = {{$0, $1}, {$32, $33}} x {$4,
   $36} <=> par { $8 = $0 * $4 + $1 * $36; $9 = $32 * $4 + $33 * $36; }

etc.
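
To make the operand names concrete, here is a small C sketch that expands a name such as R001 or C030 into the registers it covers. The reading of the three digits as bank, column and row of the first element is an assumption inferred from the vadd examples above, not taken from VFPU documentation, and the function name vfpu_operand_regs is hypothetical.

```c
#include <assert.h>

/* Assumed reading of the operand names used in the examples above:
   a letter (S = scalar, R = row, C = column) followed by three digits
   giving the bank, column and row of the first element. */
int vfpu_operand_regs(const char *name, int count, int regs[4])
{
    assert(count >= 1 && count <= 4);
    int bank = name[1] - '0';
    int col  = name[2] - '0';
    int row  = name[3] - '0';
    assert(bank >= 0 && bank < 8);

    for (int k = 0; k < count; k++) {
        int r = row, c = col;
        if (name[0] == 'R')
            c += k;             /* walk along the row */
        else if (name[0] == 'C')
            r += k;             /* walk down the column */
        else if (k > 0)
            return -1;          /* a scalar holds a single element */
        if (r > 3 || c > 3)
            return -1;          /* the access must stay inside the bank */
        regs[k] = bank * 4 + r * 32 + c;
    }
    return 0;
}
```

With this reading, vfpu_operand_regs("C030", 3, regs) fills regs with 3, 35 and 67, the same registers as in the vadd.t example, while a request that would leave the bank (for example two components starting at R030) is rejected with -1.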

The questions now :

I would like to extend the use of the VFPU in psp-gcc, the free PSP port of gcc, through vectorization.

   * How can I make SF mode coexist between the FPU registers and the
     VFPU registers in the argument list of a function? There is no
     direct transfer between them; you need to use a GPR or a memory
     slot as an intermediate, which is very slow. I would like to be
     able to distinguish them through an attribute. Are there any
     examples which address this problem?
   * Another way to distinguish a VFPU scalar is to use "typedef float
     __attribute__((vector_size(4))) V1SF;". Would it be difficult to
     make that possible (right now, gcc refuses it)?
   * Same question for V3SF: would it be difficult to make it possible?
   * V2SF and V4SF are possible (they are row vectors of two and four
     components respectively). If I choose to let gcc allocate only the
     first 32 VFPU registers, the others being associated with one of
     the first 32 registers, would it be difficult to have combined
     V2V1SF, V3V1SF, V4V1SF modes to define column vectors of two,
     three or four components? And to have V2V2SF, V3V3SF, V4V4SF as
     matrices?

Right now, a vector needs a power-of-two number of components, so autovectorization is not possible for 3D.
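
As an illustration of what this restriction means in practice, the sketch below uses gcc's generic vector extension with a 16-byte (four float) type and carries a 3D vector in it by leaving the fourth lane at zero. The padding approach and the names v4sf, add3_padded and lane3 are assumptions made for this example; it is only one possible workaround, and it wastes a lane that a real V3SF would not need.

```c
#include <assert.h>

/* gcc generic vector extension: 4 floats, 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Assumed workaround: a 3D vector stored in a 4-component type,
   with the last lane kept at zero by the caller. */
v4sf add3_padded(v4sf a, v4sf b)
{
    return a + b;   /* element-wise add on all four lanes */
}

/* Read component i (0..2) of the 3D payload. */
float lane3(v4sf v, int i)
{
    assert(i >= 0 && i < 3);
    return v[i];
}
```

Adding {1, 2, 3, 0} and {10, 20, 30, 0} this way yields {11, 22, 33, 0}; the fourth lane is computed as well, which is exactly the overhead a native three-component mode would avoid.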

V4SF means allocation of 4 contiguous registers amongst the first 32 registers: { i, i+1, i+2, i+3 } with 0 <= i < 32 and i%4 = 0. Its size is 4*sizeof(float). For the VFPU, it would describe a typical 4D row vector.

V4V1SF means allocation of 1 register amongst the first 32 registers: { i, i+32, i+64, i+96 } with 0 <= i < 32. But its size is 4*sizeof(float). For the VFPU, it would describe a typical 4D column vector.

V2V2SF means allocation of 2 contiguous registers amongst the first 32 registers: { { i, i+32 }, { i+1, i+33 } } with 0 <= i < 32 and i%2 = 0. But its size is 2*2*sizeof(float). For the VFPU, it would describe a typical 2D matrix.

V3SF would be equivalent to V4SF for the allocation, but its size is 3*sizeof(float) and its operations are different. For the VFPU, it would describe a typical 3D row vector.
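
The register sets behind these proposed modes can be written down as a short C sketch. The function names are hypothetical and only restate the index formulas given above; nothing here reflects how gcc actually represents machine modes.

```c
#include <assert.h>

/* i is the register allocated amongst the first 32 VFPU registers. */

/* V4SF: row vector { i, i+1, i+2, i+3 }, 0 <= i < 32, i % 4 == 0. */
void v4sf_regs(int i, int regs[4])
{
    assert(i >= 0 && i < 32 && i % 4 == 0);
    for (int k = 0; k < 4; k++)
        regs[k] = i + k;
}

/* V4V1SF: column vector { i, i+32, i+64, i+96 }, 0 <= i < 32. */
void v4v1sf_regs(int i, int regs[4])
{
    assert(i >= 0 && i < 32);
    for (int k = 0; k < 4; k++)
        regs[k] = i + 32 * k;
}

/* V2V2SF: { { i, i+32 }, { i+1, i+33 } }, 0 <= i < 32, i % 2 == 0. */
void v2v2sf_regs(int i, int regs[2][2])
{
    assert(i >= 0 && i < 32 && i % 2 == 0);
    for (int c = 0; c < 2; c++)
        for (int r = 0; r < 2; r++)
            regs[c][r] = i + c + 32 * r;
}
```

For i = 4, the V4SF set is $4..$7 (a row of the second bank), while the V4V1SF set for i = 5 is $5, $37, $69, $101 (a column of the same bank). In both cases gcc would only need to track one of the first 32 registers.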

I really hope some of these are feasible, as the VFPU has a lot of operations for 2D, 3D and 4D which could greatly boost gcc maths.


Regards




