Scatter/Gather vector operations
Dzonatas
dzonatas@dzonux.net
Sun Apr 8 08:04:00 GMT 2007
Tim Prince wrote:
> Dzonatas wrote:
>> Is this the only portable way to do a pack/unpack without asm()?
>> How do I set it up differently to trigger a pack/unpack optimization?
>>
>> Thank you.
>>
> If you're talking about optimization for a specific CPU, but you don't
> want to reveal which CPU that is, why even post this?
No. I'm just trying to get an idea of what direction such code may take
in the future, and I also wonder what the best form for it is right now.
> This code looks OK to me. There isn't any special hardware support
> for this on commonly available CPUs, like Opteron or Xeon. Scalar
> moves should work as well as anything, and you are within the limits
> for efficient Write Combine buffering. If you have problems, you
> won't get any help if you can't describe them more specifically.
>
The problem is bandwidth. Vector processing helps greatly with that
alone, regardless of the matrix math.
Currently, the immediate targets are SSE2- and AltiVec-enabled
architectures. I could probably write assembly code that unpacks a
vector and scatters the data with SSE2- or AltiVec-specific
instructions, but I don't want to aim that short. I would like to
avoid assembly code if possible.
For example, is there a formal way to use a vector register as a set of
pointers into main memory and fetch that data into another vector
register? I know this is beyond the basic vector operations implemented
now, but something like:
for (i = 0; i < 100; i++) {
    /* scatter each data element from reg2 into the memory
       pointed to by the associated element in reg1 */
    *vector_reg1 = vector_reg2;
    vector_reg1++;
}
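To make the intended semantics concrete, here is a minimal sketch in plain C, assuming a four-lane float vector and an array of indices standing in for the register of pointers. The names gather4, scatter4 and LANES are hypothetical and are not GCC built-ins. Whether a compiler turns these loops into vector instructions depends on the target.

```c
#include <stddef.h>

#define LANES 4  /* assumed vector width: four floats, as on SSE2/AltiVec */

/* Gather: load base[idx[k]] into lane k of dst. */
static void gather4(float dst[LANES], const float *base,
                    const size_t idx[LANES])
{
    size_t k;
    for (k = 0; k < LANES; k++)
        dst[k] = base[idx[k]];
}

/* Scatter: store lane k of src to base[idx[k]].
   If two lanes carry the same index, the higher lane is written last. */
static void scatter4(float *base, const size_t idx[LANES],
                     const float src[LANES])
{
    size_t k;
    for (k = 0; k < LANES; k++)
        base[idx[k]] = src[k];
}
```

Using indices relative to one base pointer instead of raw pointers keeps the sketch portable, since an index fits in a lane regardless of the pointer size on the target.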
Thanks for the response.