This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Vector registers on MIPS arch
- From: David Guillen Fandos <david at davidgf dot net>
- To: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- Cc: Gcc Mailing List <gcc at gcc dot gnu dot org>
- Date: Wed, 6 Apr 2016 22:49:09 +0100
- Subject: Re: Vector registers on MIPS arch
- References: <56FF1331 dot 9080103 at davidgf dot net> <CAMbmDYYGvvWWABBwXJ+yQdxvvrZbt4Gd6zsTaGRa1nGd8ndZPg at mail dot gmail dot com> <5702F1CA dot 5040605 at davidgf dot net> <CAMbmDYb7tMPbmf1OdxF+RmTGxWc1377wE0rqyeu1cDtdfmKEGg at mail dot gmail dot com> <57044146 dot 5030206 at davidgf dot net> <CAMbmDYYtsxHBikUCybpXQKCRMDj2BudZWxUK-14jYRoqpwwZrQ at mail dot gmail dot com>
On 06/04/16 10:44, Ilya Enkovich wrote:
> 2016-04-06 1:50 GMT+03:00 David Guillen Fandos <david@davidgf.net>:
>>
>> Thanks again Ilya,
>>
>> That seems to help to solve the problem. Now I'm facing another issue.
>> It seems the tree-vect-generic pass is promoting my vector operations
>> to BLKmode and therefore the VECTOR_MODE_P macro evaluates to false,
>> falling back to scalar mode.
>> I thought I got it working for a moment when I forgot to fix the
>> HARD_REGNO_MODE_OK macro that evaluated to false for vector registers.
>> In that case I managed to dodge this issue, but I see another issue
>> regarding register allocation (obviously! :P)
>>
>> So the bottom line would be, how do I make sure that my "compute_type"
>> is V4SF instead of BLKmode? Where does this promotion happen?
>
> The TYPE_MODE macro for vectors is actually a call to vector_type_mode.
> You should probably look at it first. You may also check what
> mode_for_vector returns for a float vector in your case.
>
> Ilya
>
>>
>> Thanks a lot!
>> David
Thanks a lot Ilya!
I managed to get it working. There were some bugs regarding register
allocation that ended up promoting the class to be BLKmode instead of
V4SFmode. I had to debug it a bit, which is tricky, but in the end I
found my way through it.
Just to finish this: do you think, from your experience, that it is
difficult to implement vector instructions that have variable sizes?
This particular VFU has 4-, 3-, 2- and 1-element operations with
arbitrary swizzling. That is, we can load a V3SF and perform a dot
product with another V3SF to get a V1SF, for instance. Of course the
elements might overlap, so if a vreg is A B C D we can have one
4-element vector ABCD or a pair of 3-element vectors ABC and BCD; the
same logic applies to get 3 registers of V2SF type, and so forth. It is
very flexible. It also allows column and row arrangement, so we can load
4 vectors as a 4x4 matrix and multiply it by another matrix, transposing
on the fly.
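
To make the overlapping views concrete, here is a minimal sketch in
portable C. It is only a software model of the behaviour described
above, not the hardware's instruction set or anything GCC provides: the
type names (vreg4, vmat4) and functions (dot_view, mat_elem, mat_mul)
are hypothetical and chosen for illustration. A "view" is a lane offset
plus an element count, so ABC is (offset 0, 3 elements) and BCD is
(offset 1, 3 elements) of the same register.

```c
#include <stddef.h>

/* Hypothetical model of one 4-lane vector register holding A B C D. */
typedef struct { float lane[4]; } vreg4;

/* Hypothetical model of a 4x4 matrix stored as four row registers. */
typedef struct { vreg4 row[4]; } vmat4;

/* Dot product of n consecutive lanes, starting at lane off_a of a and
   lane off_b of b.  The caller must keep off + n <= 4.  With n = 3,
   offset 0 selects the "ABC" view and offset 1 selects "BCD". */
static float dot_view(const vreg4 *a, size_t off_a,
                      const vreg4 *b, size_t off_b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a->lane[off_a + i] * b->lane[off_b + i];
    return acc;
}

/* Read element (r, c), optionally swapping the indices so the matrix
   is seen transposed without moving any data. */
static float mat_elem(const vmat4 *m, size_t r, size_t c, int transposed)
{
    return transposed ? m->row[c].lane[r] : m->row[r].lane[c];
}

/* Compute a * b, or a * transpose(b) when transpose_b is nonzero. */
static vmat4 mat_mul(const vmat4 *a, const vmat4 *b, int transpose_b)
{
    vmat4 out;
    for (size_t i = 0; i < 4; i++) {
        for (size_t j = 0; j < 4; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < 4; k++)
                acc += a->row[i].lane[k] * mat_elem(b, k, j, transpose_b);
            out.row[i].lane[j] = acc;
        }
    }
    return out;
}
```

The difficulty for a compiler is visible even in this model: the ABC and
BCD views alias the same storage, so a write through one view changes
what the other reads, which is not how independent V3SF registers would
normally be expected to behave.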
I guess this is too difficult to expose to gcc, which is more used to
Intel-style SIMD. In the past I wrote most of the kernels in assembly
and wrapped them in C functions, but if you use classes and inline
functions, having gcc on your side helps a lot (it handles register
allocation, and therefore there are fewer loads/stores to memory).
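
As an illustration of the second approach, the sketch below uses GCC's
generic vector extension (the vector_size attribute) with small inline
functions. The helper names (v4sf_madd, v4sf_hsum, v4sf_dot) are
hypothetical, and nothing here is specific to this MIPS port. It only
shows the style in which values stay in compiler-managed vector
variables instead of being passed through memory around hand-written
assembly kernels.

```c
/* Four packed floats, using GCC's generic vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise a * b + c.  The compiler picks the registers and, once
   inlined, can keep intermediates out of memory. */
static inline v4sf v4sf_madd(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* Horizontal sum of the four lanes, using vector subscripting. */
static inline float v4sf_hsum(v4sf v)
{
    return v[0] + v[1] + v[2] + v[3];
}

/* 4-element dot product built from the helpers above. */
static inline float v4sf_dot(v4sf a, v4sf b)
{
    return v4sf_hsum(a * b);
}
```

How well this maps onto real vector instructions depends on the backend
defining the relevant modes and patterns (V4SF here); without them the
operations are lowered to scalar code.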
Thanks a lot for your help!
David