This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Vector registers on MIPS arch


2016-04-10 3:34 GMT+03:00 David Guillen Fandos <david@davidgf.net>:
> On 07/04/16 09:09, Ilya Enkovich wrote:
>> 2016-04-07 0:49 GMT+03:00 David Guillen Fandos <david@davidgf.net>:
>>>
>>> Thanks a lot Ilya!
>>>
>>> I managed to get it working. There were some bugs regarding register
>>> allocation that ended up promoting the class to be BLKmode instead of
>>> V4SFmode. I had to debug it a bit, which is tricky, but in the end I
>>> found my way through it.
>>>
>>> One last question: in your experience, is it difficult to implement
>>> vector instructions that have variable sizes?
>>
>> Having implemented an instruction in one mode, you shouldn't have much
>> trouble extending it to other modes using mode iterators.  There are a lot
>> of examples in GCC.
>>
>>> This
>>> particular VFU has 4, 3, 2 and 1 element operations with arbitrary
>>> swizzling. That is, we can load a V3SF and perform a dot product
>>> operation with another V3SF to get a V1SF for instance. Of course the
>>> elements might overlap, so if a vreg is A B C D we can have a 4 element
>>> vector ABCD or a pair of 3 element vregs ABC and BCD, the same logic
>>> applies to have 3 registers of V2SF type and so forth. It is very
>>> flexible. It also allows column and row arranging, so we can load 4
>>> vectors in a 4x4 matrix and multiply them with another matrix
>>> transposing them on the fly.
>>
>> Unfortunately GCC doesn't expect a vector to have a number of elements
>> that is not a power of two.  Thus you can't write
>>
>> float var __attribute__ ((vector_size (12)));
>>
>> and expect it to get V3SF mode.
>>
>>
>> The target instruction set doesn't affect the way vector code is
>> represented in GIMPLE.  This means complex instructions like matrix
>> multiplication have no expression with corresponding semantics and can't
>> simply be generated from a single GIMPLE statement.
>>
>> You can still take advantage of your ISA when expanding vector code.
>> E.g. vec_extract_[lo|hi] may be expanded into a simple SUBREG in your case.
>> Advanced vector instructions may be generated by RTL optimizers.  E.g.
>> combine may merge a few vector instructions into a single one.
>>
>>>
>>> I guess this is too difficult to expose to gcc, which is more used to
>>> Intel SIMD stuff. In the past I wrote most of the kernels in assembly
>>> and wrapped them in C functions, but if you use classes and inline
>>> functions, having gcc on your side helps a lot (register allocation and
>>> therefore fewer loads/stores to memory).
>>
>> There are instructions which are never generated by the compiler and exist
>> mostly to be used manually.  The AES instruction set is a good example of
>> such instructions.  Intrinsics (builtin functions) are a better alternative
>> to assembler code for manually writing vector code with such instructions.
>> Using intrinsics you get register allocation and RTL optimizations working.
>>
>> Ilya
>>
>>>
>>> Thanks a lot for your help!
>>>
>>> David
>>>
>>>
>
> Cool, I wasn't aware of some of the things you mention.
> To be a bit more specific:
>
>  - How would you define a template that takes 2 V4SF, calculates the dot
> product and outputs a SF that is a subreg of a V4SF? That is, the
> operation could be any of the four:
>
>  r.x = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
>
> or
>
>  r.y = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
>
> and so forth.
> The idea would be to tell gcc that a V4SF has 4 SF that it can address
> as subregs and define operations like the dot product one.

You can use vec_select to get the vector elements and compute the sum.  Then
you can use vec_concat or vec_merge to build up the resulting vector.  I
would not expect GCC to autogenerate this instruction though.
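
At the C level, the effect such a pattern would describe can be sketched
with GCC's generic vector extension.  This only illustrates the semantics
(select the elements, sum the products, merge the result into one lane); it
is not the machine description pattern itself, and the names v4sf and
dot4_into_lane are made up for the example.

```c
/* Four packed floats, declared with GCC's generic vector extension.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: compute the 4-element dot product of a and b and
   write it into lane `lane` (0..3) of r, leaving the other lanes of r
   unchanged.  */
static v4sf
dot4_into_lane (v4sf r, v4sf a, v4sf b, int lane)
{
  v4sf p = a * b;                          /* element-wise products */
  float sum = p[0] + p[1] + p[2] + p[3];   /* pick each element and add */
  r[lane] = sum;                           /* merge the sum into one lane */
  return r;
}
```

Calling dot4_into_lane (r, a, b, 0) corresponds to the r.x case above and
dot4_into_lane (r, a, b, 1) to the r.y case.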

> It's a pain not to have V3SF though...

AVX-512 instructions use masks to perform operations on parts of a vector.
vec_merge is used to describe that in patterns.  Would it perhaps be
easier to treat a V3SF instruction as a V4SF instruction with a mask
applied?
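
As a rough source-level illustration of that idea, a 3-element dot product
can be written as a 4-element operation whose last lane is masked off.
This is a sketch using GCC's generic vector extension, not a description
of how the pattern would look in the .md file; the names v4sf, v4si and
dot3_masked are hypothetical.

```c
/* Four packed floats and four packed ints of the same total size.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: dot product of the first three elements of a and b,
   computed as a 4-element operation with lane 3 masked to zero.  */
static float
dot3_masked (v4sf a, v4sf b)
{
  const v4si mask = { -1, -1, -1, 0 };   /* all-ones keeps a lane, 0 drops it */
  v4sf p = a * b;                        /* full 4-element multiply */
  v4sf kept = (v4sf) ((v4si) p & mask);  /* reinterpret bits, clear lane 3 */
  return kept[0] + kept[1] + kept[2] + kept[3];
}
```

Whatever is stored in the fourth lane of a and b does not contribute to the
result, which is the behaviour a V3SF operation would have.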


Ilya

>
> Thanks a lot again!
> David

