Scatter/Gather vector operations

Segher Boessenkool segher@kernel.crashing.org
Sun Apr 8 13:10:00 GMT 2007


> union
> {
>     float f[4];
>     v4sf  v;
> } vector[100];
>
> for (i = 100; --i >= 0;) {
>     row1[i] = vector[i].f[0];
>     row2[i] = vector[i].f[1];
>     row3[i] = vector[i].f[2];
>     row4[i] = vector[i].f[3];
> }

Unroll by four: load four vectors, swap data around in
registers, store four vectors.
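
As a rough sketch of what that unrolling could look like (not code from
this thread): the version below assumes GCC's vector extension, which
Clang also accepts, and the names vec4 and unpack_rows are invented for
the example.  It also assumes the element count is a multiple of four.
Whether the compiler turns the element shuffling into good register
permutes depends on the target and the compiler version.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension (not ISO C): four packed floats. */
typedef float v4sf __attribute__((vector_size(16)));

typedef union {
    float f[4];
    v4sf  v;
} vec4;

/* Hypothetical helper: split n packed vectors into four rows.
   Assumes n is a multiple of 4.  The rows need not be aligned. */
void unpack_rows(const vec4 *in, size_t n,
                 float *row1, float *row2, float *row3, float *row4)
{
    for (size_t i = 0; i < n; i += 4) {
        /* Load four vectors. */
        v4sf a = in[i].v;
        v4sf b = in[i + 1].v;
        v4sf c = in[i + 2].v;
        v4sf d = in[i + 3].v;

        /* Swap data around: this is a 4x4 transpose. */
        v4sf r1 = { a[0], b[0], c[0], d[0] };
        v4sf r2 = { a[1], b[1], c[1], d[1] };
        v4sf r3 = { a[2], b[2], c[2], d[2] };
        v4sf r4 = { a[3], b[3], c[3], d[3] };

        /* Store four vectors; memcpy avoids any alignment assumption. */
        memcpy(row1 + i, &r1, sizeof r1);
        memcpy(row2 + i, &r2, sizeof r2);
        memcpy(row3 + i, &r3, sizeof r3);
        memcpy(row4 + i, &r4, sizeof r4);
    }
}
```

With the declarations from the question, a call would look something
like unpack_rows(vector, 100, row1, row2, row3, row4), given that
vector is declared as an array of vec4.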

> Is this the only portable way to do a pack/unpack without asm()?

If you want it fully portable at the C level, without using any
conditionals, this is pretty much it.  If you just don't want
to use asm(), there are intrinsics you can use.
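
To illustrate the intrinsics route, here is a minimal sketch for one
target only, x86 with SSE.  It is an assumption that this is the
target of interest; other processors need their own intrinsics.  The
function name unpack_rows_intrin is made up for the example.  It uses
the _MM_TRANSPOSE4_PS macro from <xmmintrin.h> and falls back to the
plain scalar loop where SSE is not available, which is exactly the
kind of conditional the fully portable version avoids.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: "in" holds n packed groups of four floats
   (4*n floats in total); element k of group i goes to row(k+1)[i]. */
void unpack_rows_intrin(const float *in, size_t n,
                        float *row1, float *row2,
                        float *row3, float *row4)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Four groups at a time: load, transpose in registers, store. */
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(in + 4 * i);
        __m128 b = _mm_loadu_ps(in + 4 * i + 4);
        __m128 c = _mm_loadu_ps(in + 4 * i + 8);
        __m128 d = _mm_loadu_ps(in + 4 * i + 12);

        _MM_TRANSPOSE4_PS(a, b, c, d);   /* in-place 4x4 transpose */

        _mm_storeu_ps(row1 + i, a);
        _mm_storeu_ps(row2 + i, b);
        _mm_storeu_ps(row3 + i, c);
        _mm_storeu_ps(row4 + i, d);
    }
#endif

    /* Scalar loop: handles the tail, or everything without SSE. */
    for (; i < n; i++) {
        row1[i] = in[4 * i];
        row2[i] = in[4 * i + 1];
        row3[i] = in[4 * i + 2];
        row4[i] = in[4 * i + 3];
    }
}
```

The unaligned load/store forms are used so the sketch makes no
assumption about how the arrays are allocated; with 16-byte aligned
data the aligned forms may be preferable.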

> How do I set it up differently to trigger a pack/unpack optimization?

Perhaps the auto-vectorisers aren't smart enough (yet) to
do this for you.  If your goal is great performance, you
really have to write a special version for every processor;
although auto-vectorisation certainly can speed up things
quite a bit, hand-written vector code can be *much* faster.
A big part of the problem is that many vector instruction sets
are very limited, or just "different".


Segher


