This is the mail archive of the mailing list for the GCC project.


Re: Vector registers on MIPS arch

2016-04-07 0:49 GMT+03:00 David Guillen Fandos <>:
> Thanks a lot Ilya!
> I managed to get it working. There were some bugs regarding register
> allocation that ended up promoting the class to be BLKmode instead of
> V4SFmode. I had to debug it a bit, which is tricky, but in the end I
> found my way through it.
> Just to finish this. Do you think from your experience that it is difficult
> to implement vector instructions that have variable sizes?

Once you have implemented an instruction in one mode, you shouldn't have
much trouble extending it to other modes using mode iterators.  There are
a lot of examples in GCC.

> This
> particular VFU has 4, 3, 2 and 1 element operations with arbitrary
> swizzling. That is, we can load a V3SF and perform a dot product
> operation with another V3SF to get a V1SF for instance. Of course the
> elements might overlap, so if a vreg is A B C D we can have a 4 element
> vector ABCD or a pair of 3 element vregs ABC and BCD, the same logic
> applies to have 3 registers of V2SF type and so forth. It is very
> flexible. It also allows column and row arranging, so we can load 4
> vectors in a 4x4 matrix and multiply them with another matrix
> transposing them on the fly.

Unfortunately GCC expects a vector to have a power-of-two number of
elements.  Thus you can't write

float var __attribute__ ((vector_size (12)));

and expect it to get V3SF mode.
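
For illustration, a minimal sketch of the usual workaround with GCC's
generic vector extension: keep the three floats in a 16-byte vector and
treat the fourth lane as zero padding.  The names v4sf, make_vec3 and
add_vec3 are made up for this example, and whether a padded vector maps
well onto a particular VFU depends on the target.

```c
#include <assert.h>

/* 16 bytes = 4 floats: a power-of-two element count, so GCC accepts it.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: store a 3-element vector in lanes 0..2 and keep
   lane 3 as zero padding.  */
static inline v4sf
make_vec3 (float x, float y, float z)
{
  v4sf v = { x, y, z, 0.0f };
  return v;
}

/* Hypothetical helper: element-wise add of two padded vectors.  The
   padding lane stays zero as long as both inputs were built with
   make_vec3; the assert checks that assumption.  */
static inline v4sf
add_vec3 (v4sf a, v4sf b)
{
  v4sf r = a + b;
  assert (r[3] == 0.0f);
  return r;
}
```

With this, add_vec3 (make_vec3 (1, 2, 3), make_vec3 (10, 20, 30)) holds
11, 22, 33 in lanes 0..2, at the cost of one unused lane per vector.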

The target instruction set doesn't affect the way vector code is
represented in GIMPLE.  This means complex instructions like matrix
multiplication have no expression with corresponding semantics and can't
simply be generated from a single GIMPLE statement.

You may still take advantage of your ISA when expanding vector code.
E.g. vec_extract_[lo|hi] may be expanded into a simple SUBREG in your case.
Advanced vector instructions may be generated by RTL optimizers.  E.g.
combine may merge a few vector instructions into a single one.
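
Seen from the source level, extracting the low or high half of a vector
is just a reinterpretation of part of its bytes, which is why a SUBREG can
be enough on a target whose registers overlap this way.  A sketch assuming
GCC's generic vector extension; low_half and high_half are made-up names,
and whether the copy really becomes a plain register reference depends on
the target's expanders.

```c
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v2sf __attribute__ ((vector_size (8)));

/* Hypothetical helper: elements 0 and 1 are the first 8 bytes of the
   vector in memory.  */
static inline v2sf
low_half (v4sf v)
{
  v2sf lo;
  memcpy (&lo, &v, sizeof lo);
  return lo;
}

/* Hypothetical helper: elements 2 and 3 are the last 8 bytes.  */
static inline v2sf
high_half (v4sf v)
{
  v2sf hi;
  memcpy (&hi, (const char *) &v + sizeof hi, sizeof hi);
  return hi;
}
```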

> I guess this is too difficult to expose to gcc, which is more used to
> intel SIMD stuff. In the past I wrote most of the kernels in assembly
> and wrapped them in C functions, but if you use classes and inline
> functions having gcc on your side helps a lot (register allocation and
> therefore less load/stores to memory).

There are instructions which are never generated by the compiler and exist
mostly to be used manually.  The AES instruction set is a good example.
Intrinsics (builtin functions) are a better alternative to assembler code
for writing vector code with such instructions by hand.  With intrinsics
you still get register allocation and RTL optimizations.
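
A rough sketch of what an intrinsic-style wrapper for the 3-element dot
product could look like.  HAVE_VFU_DOT3 and __builtin_vfu_dot3 are
hypothetical names standing in for whatever builtin a port would define;
they do not exist in GCC.  The fallback branch uses only the generic
vector extension, so the sketch compiles without such a port.

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* Dot product of lanes 0..2; lane 3 is ignored.  */
static inline float
vfu_dot3 (v4sf a, v4sf b)
{
#ifdef HAVE_VFU_DOT3
  /* Hypothetical target builtin, assumed to be provided by the port.  */
  return __builtin_vfu_dot3 (a, b);
#else
  /* Generic fallback: element-wise multiply, then sum three lanes.  */
  v4sf p = a * b;
  return p[0] + p[1] + p[2];
#endif
}
```

Because the wrapper is an ordinary inline function, callers keep their
operands in vector registers and the compiler remains free to schedule
and allocate around the call, which an asm kernel behind a C function
would prevent.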


> Thanks a lot for your help!
> David
