Scatter/Gather vector operations
Segher Boessenkool
segher@kernel.crashing.org
Sun Apr 8 13:10:00 GMT 2007
> union
> {
>   float f[4];
>   v4sf v;
> } vector[100];
>
> for (i = 100; --i >= 0;) {
>   row1[i] = vector[i].f[0];
>   row2[i] = vector[i].f[1];
>   row3[i] = vector[i].f[2];
>   row4[i] = vector[i].f[3];
> }
Unroll by four: load four vectors, swap data around in
registers (in effect a 4x4 transpose), store four vectors.
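
A minimal sketch of that unrolled loop, assuming the GCC/Clang generic
vector extension (vector_size plus element subscripting). The helper name
unpack_rows and the flat float input layout are hypothetical, chosen for
illustration, and n is assumed to be a multiple of four. Whether the
compiler turns the element shuffles into real permute instructions
depends on the target.

```c
#include <assert.h>
#include <string.h>

/* Four packed floats, using the GCC/Clang vector extension (an assumption
   of this sketch, not standard C). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: "in" holds n records of four floats each;
   field k of record i is written to rowK[i]. */
static void unpack_rows(const float *in, float *row1, float *row2,
                        float *row3, float *row4, int n)
{
    assert(n % 4 == 0);   /* sketch only handles whole groups of four */

    for (int i = 0; i < n; i += 4) {
        v4sf a, b, c, d;

        /* Load four vectors; memcpy avoids any alignment assumption. */
        memcpy(&a, in + 4 * (i + 0), sizeof a);
        memcpy(&b, in + 4 * (i + 1), sizeof b);
        memcpy(&c, in + 4 * (i + 2), sizeof c);
        memcpy(&d, in + 4 * (i + 3), sizeof d);

        /* Swap data around: a 4x4 transpose. */
        v4sf r1 = { a[0], b[0], c[0], d[0] };
        v4sf r2 = { a[1], b[1], c[1], d[1] };
        v4sf r3 = { a[2], b[2], c[2], d[2] };
        v4sf r4 = { a[3], b[3], c[3], d[3] };

        /* Store four vectors. */
        memcpy(row1 + i, &r1, sizeof r1);
        memcpy(row2 + i, &r2, sizeof r2);
        memcpy(row3 + i, &r3, sizeof r3);
        memcpy(row4 + i, &r4, sizeof r4);
    }
}
```

With the 100-element array from the question, n = 100 is already a
multiple of four, so no remainder loop is needed there.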
> Is this the only portable way to do a pack/unpack without asm()?
If you want fully portable at the C level without using any
conditionals, this is pretty much it. If you just don't want
to use asm(), there are intrinsics you can use.
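
As one illustration of the intrinsics route, here is a sketch for x86
SSE using _mm_loadu_ps, _mm_storeu_ps and the _MM_TRANSPOSE4_PS macro
from <xmmintrin.h>. The helper name unpack_rows_intrin and the scalar
fallback are assumptions added for this example; other processors have
their own intrinsics and would need their own branch, which is exactly
the kind of conditional the plain loop avoids.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: "in" holds n records of four floats each;
   field k of record i is written to rowK[i]. */
static void unpack_rows_intrin(const float *in, float *row1, float *row2,
                               float *row3, float *row4, int n)
{
    assert(n % 4 == 0);   /* sketch only handles whole groups of four */

    for (int i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Load four vectors (unaligned loads, no alignment assumed). */
        __m128 a = _mm_loadu_ps(in + 4 * (i + 0));
        __m128 b = _mm_loadu_ps(in + 4 * (i + 1));
        __m128 c = _mm_loadu_ps(in + 4 * (i + 2));
        __m128 d = _mm_loadu_ps(in + 4 * (i + 3));

        /* In-register 4x4 transpose; a..d now hold fields 0..3. */
        _MM_TRANSPOSE4_PS(a, b, c, d);

        /* Store four vectors. */
        _mm_storeu_ps(row1 + i, a);
        _mm_storeu_ps(row2 + i, b);
        _mm_storeu_ps(row3 + i, c);
        _mm_storeu_ps(row4 + i, d);
#else
        /* Fallback for targets without SSE: the plain scalar loop. */
        for (int k = 0; k < 4; k++) {
            row1[i + k] = in[4 * (i + k) + 0];
            row2[i + k] = in[4 * (i + k) + 1];
            row3[i + k] = in[4 * (i + k) + 2];
            row4[i + k] = in[4 * (i + k) + 3];
        }
#endif
    }
}
```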
> How do I set it up differently to trigger a pack/unpack optimization?
Perhaps the auto-vectorisers aren't smart enough (yet) to
do this for you. If your goal is great performance, you
really have to write a special version for every processor;
although auto-vectorisation certainly can speed up things
quite a bit, hand-written vector code can be *much* faster.
A big part of the problem is that many vector instruction sets
are very limited, or just "different".
Segher