This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Vector registers on MIPS arch


On 06/04/16 10:44, Ilya Enkovich wrote:
> 2016-04-06 1:50 GMT+03:00 David Guillen Fandos <david@davidgf.net>:
>>
>> Thanks again Ilya,
>>
>> That seems to help to solve the problem. Now I'm facing another issue.
>> It seems the tree-vec-generic pass is promoting my vector operations to
>> BLKmode and therefore the VECTOR_MODE_P macro evaluates to false,
>> falling back to scalar mode.
>> I thought I got it working for a moment when I forgot to fix the
>> HARD_REGNO_MODE_OK macro that evaluated to false for vector registers.
>> In that case I managed to dodge this issue but I saw another issue
>> regarding register allocation (obviously! :P)
>>
>> So the bottom line would be, how do I make sure that my "compute_type"
>> is V4SF instead of BLKmode? Where does this promotion happen?
> 
> TYPE_MODE macro for vectors is actually a call to vector_type_mode.  You
> should probably look at it first.  You may also check what mode_for_vector
> returns for float vector in your case.
> 
> Ilya
> 
>>
>> Thanks a lot!
>> David

Thanks a lot Ilya!

I managed to get it working. There were some bugs regarding register
allocation that ended up promoting the class to be BLKmode instead of
V4SFmode. I had to debug it a bit, which is tricky, but in the end I
found my way through it.

Just to finish this: do you think, from your experience, that it is
difficult to implement vector instructions that have variable sizes? This
particular VFU has 4, 3, 2 and 1 element operations with arbitrary
swizzling. That is, we can load a V3SF and perform a dot product
operation with another V3SF to get a V1SF, for instance. Of course the
elements might overlap, so if a vreg is A B C D we can have a 4-element
vector ABCD or a pair of 3-element vregs ABC and BCD; the same logic
applies to have 3 registers of V2SF type, and so forth. It is very
flexible. It also allows column and row arranging, so we can load 4
vectors in a 4x4 matrix and multiply them with another matrix,
transposing them on the fly.
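
To make the overlapping views concrete, here is a minimal sketch in plain
C. It only models the idea on the host and is not code for the actual
hardware or for a GCC backend; the names vreg_t and dot_view are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 4-lane float register holding A B C D. */
typedef struct { float lane[4]; } vreg_t;

/* Dot product of two n-lane views, each starting at its own lane offset.
   With n = 3, offset 0 selects ABC and offset 1 selects BCD. */
static float dot_view(const vreg_t *x, size_t xoff,
                      const vreg_t *y, size_t yoff, size_t n)
{
    /* Each view must fit inside the 4-lane register. */
    assert(xoff + n <= 4 && yoff + n <= 4);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x->lane[xoff + i] * y->lane[yoff + i];
    return acc;
}
```

For a register holding 1 2 3 4, dot_view(&r, 0, &r, 1, 3) multiplies the
ABC view by the BCD view of the same register, which is the kind of
overlap described above.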

I guess this is too difficult to expose to gcc, which is more used to
Intel SIMD stuff. In the past I wrote most of the kernels in assembly
and wrapped them in C functions, but if you use classes and inline
functions, having gcc on your side helps a lot (register allocation and
therefore fewer loads/stores to memory).
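
As a rough illustration of the inline-function style, the sketch below
uses GCC's generic vector extension (the vector_size attribute). It
assumes a compiler that supports that extension; the type name v4sf and
the helpers v4sf_madd and v4sf_hsum are hypothetical names chosen here,
not part of any existing port.

```c
#include <assert.h>

/* GCC generic vector type: four floats packed in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

static_assert(sizeof(v4sf) == 16, "v4sf must be 16 bytes");

/* Element-wise a * b + c, written with ordinary operators so the
   compiler is free to keep the values in vector registers. */
static inline v4sf v4sf_madd(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* Horizontal sum of the four lanes, using vector subscripting. */
static inline float v4sf_hsum(v4sf v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

Because the helpers are ordinary inline functions rather than opaque
assembly kernels, the register allocator can see across the calls.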

Thanks a lot for your help!

David


