This is the mail archive of the
mailing list for the GCC project.
Re: Vector registers on MIPS arch
- From: David Guillen Fandos <david at davidgf dot net>
- To: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- Cc: Gcc Mailing List <gcc at gcc dot gnu dot org>
- Date: Sun, 10 Apr 2016 01:34:12 +0100
- Subject: Re: Vector registers on MIPS arch
On 07/04/16 09:09, Ilya Enkovich wrote:
> 2016-04-07 0:49 GMT+03:00 David Guillen Fandos <email@example.com>:
>> Thanks a lot Ilya!
>> I managed to get it working. There were some bugs regarding register
>> allocation that ended up promoting the class to be BLKmode instead of
>> V4SFmode. I had to debug it a bit, which is tricky, but in the end I
>> found my way through it.
>> Just to finish this: in your experience, is it difficult to implement
>> vector instructions that have variable sizes?
> Having implemented an instruction in one mode, you shouldn't have much trouble
> extending it to other modes using mode iterators. There are a lot of
> examples in GCC.
>> particular VFU has 4, 3, 2 and 1 element operations with arbitrary
>> swizzling. That is, we can load a V3SF and perform a dot product
>> operation with another V3SF to get a V1SF for instance. Of course the
>> elements might overlap, so if a vreg is A B C D we can have a 4 element
>> vector ABCD or a pair of 3 element vregs ABC and BCD, the same logic
>> applies to have 3 registers of V2SF type and so forth. It is very
>> flexible. It also allows column and row arranging, so we can load 4
>> vectors in a 4x4 matrix and multiply them with another matrix
>> transposing them on the fly.
> Unfortunately, GCC doesn't expect a vector to have a number of elements
> that is not a power of two. Thus you can't write
> float var __attribute__ ((vector_size (12)));
> and expect it to get V3SF mode.
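A minimal sketch of one possible workaround at the C level, assuming GCC's generic vector extension: declare a 16-byte type (four floats, so V4SF-sized) and keep the fourth lane at zero so it can stand in for a 3-element vector. The names v4sf, make_v3 and dot3 are made up for illustration and are not part of GCC.

```c
/* 16 bytes = 4 floats; the element count is a power of two,
   which is what the vector_size attribute expects.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: build a "3-element" vector padded with 0 in lane 3.  */
static inline v4sf make_v3 (float x, float y, float z)
{
  v4sf v = { x, y, z, 0.0f };
  return v;
}

/* Hypothetical helper: dot product over the first three lanes only.  */
static inline float dot3 (v4sf a, v4sf b)
{
  v4sf p = a * b;              /* element-wise multiply */
  return p[0] + p[1] + p[2];   /* lane 3 is ignored */
}
```

This only shows the source-level shape; it says nothing about which machine mode or instructions a given backend would pick.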
> The target instruction set doesn't affect the way vector code is represented
> in GIMPLE. This means complex instructions like matrix multiplication
> have no expression with corresponding semantics and can't simply be
> generated out of a single GIMPLE statement.
> You can still take advantage of your ISA when expanding vector code.
> E.g. vec_extract_[lo|hi] may be expanded into a simple SUBREG in your case.
> Advanced vector instructions may be generated by RTL optimizers. E.g.
> combine may merge a few vector instructions into a single one.
>> I guess this is too difficult to expose to gcc, which is more used to
>> Intel SIMD stuff. In the past I wrote most of the kernels in assembly
>> and wrapped them in C functions, but if you use classes and inline
>> functions, having gcc on your side helps a lot (register allocation and
>> therefore fewer loads/stores to memory).
> There are instructions which are never generated by the compiler and exist
> mostly to be used manually. The AES instruction set is a good example of such
> instructions. Intrinsics (builtin functions) are a better alternative to
> assembler code for manually writing vector code with such instructions.
> Using intrinsics you get register allocation and RTL optimizations working.
>> Thanks a lot for your help!
Cool, I wasn't aware of some of the things you mention.
To be a bit more specific:
- How would you define a template that takes two V4SF operands, calculates the dot
product and outputs an SF that is a subreg of a V4SF? That is, the
operation could be any of the four:
r.x = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
r.y = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
and so forth.
The idea would be to tell gcc that a V4SF has 4 SF elements that it can address
as subregs, and to define operations like the dot product on them.
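At the C level, the semantics I have in mind could be sketched roughly like this, assuming GCC's generic vector extension. The names v4sf and dot4_to_lane are hypothetical, and this is not a machine description pattern, only the behaviour such a pattern would need to express:

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: compute the 4-element dot product of a and b and
   store it in lane `lane` of r (0..3, i.e. x..w).  The other lanes of r
   are left untouched.  */
static inline v4sf dot4_to_lane (v4sf r, v4sf a, v4sf b, int lane)
{
  v4sf p = a * b;                            /* element-wise multiply */
  r[lane] = p[0] + p[1] + p[2] + p[3];       /* horizontal sum into one lane */
  return r;
}
```

Whether the backend then matches this to a single instruction depends on the patterns in the machine description, which this sketch does not cover.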
It's a pain not to have V3SF though...
Thanks a lot again!