This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: upcoming SSE/SSE2 support in 3.1
- From: Jim Wray <wray at rivit dot cs dot byu dot edu>
- To: Geert Bosch <bosch at gnat dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 9 May 2002 20:27:51 -0600 (MDT)
- Subject: Re: upcoming SSE/SSE2 support in 3.1
On Thu, 9 May 2002, Geert Bosch wrote:
>
> On Thursday, May 9, 2002, at 05:07 , Jim Wray wrote:
> > In other words, a typical operation would be to
> > iteratively go through data that is consecutive in memory with two
> > sources and a destination. Is this likely to get SSE(2) code generated
> > automatically, or should I spend time looking at writing custom
> > assembly?
>
> In order to make it possible to have any vectorization, you should start
> by making sure all your data is properly aligned and that the compiler
> knows about this. Also you should take care to write the code such that
> it is clear (for the compiler) that there can be no possible aliasing
> issues. This is not trivial, but will likely result in speedups already,
> even without using specific SSE/SSE2 instructions.
>
> The last step of actually using the vector instructions is relatively
> easy, and can be done using asm inserts for now, or automatically by the
> compiler later. However, the step of laying out your data and designing
> your functions to meet the aliasing and alignment requirements described
> above is something that compilers in general will not be able to do for
> you.
>
> -Geert
>
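As a rough illustration of the alignment and aliasing advice quoted above, a minimal C sketch might look like the following. The names add_arrays and N are hypothetical and not from the thread; restrict is a C99 keyword and __attribute__((aligned(16))) is a GCC extension. The sketch only shows how to state these guarantees to the compiler; it makes no claim about which compiler releases will vectorize the loop.

```c
#include <stddef.h>

#define N 1024  /* hypothetical array size, chosen for illustration */

/* GCC extension: place each array on a 16-byte boundary, the natural
   alignment for 128-bit SSE loads and stores. */
float src1[N] __attribute__((aligned(16)));
float src2[N] __attribute__((aligned(16)));
float dest[N] __attribute__((aligned(16)));

/* C99 restrict: the caller promises that dest, a and b never overlap,
   so the compiler does not have to assume a store to dest[i] could
   change a later a[j] or b[j]. */
void add_arrays(float *restrict dest_p,
                const float *restrict a,
                const float *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dest_p[i] = a[i] + b[i];
}
```

A call such as add_arrays(dest, src1, src2, N) then operates on data whose alignment is fixed at the definition, with the no-overlap promise carried by the function signature.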
So, as a follow-up, code such as this:

... (T is an atomic type)

    T src1[BIGARRAYSIZE];
    T src2[BIGARRAYSIZE];
    T dest[BIGARRAYSIZE];

    for (int i = 0; i < BIGARRAYSIZE; ++i)
        dest[i] = src1[i] + src2[i];

would fit your conditions, but /for now/ there is no auto-vectorizing of
this code into SSE(2) instructions, so I would have to code up my own
version for each type. For example, if T were char, a 128-bit register
holds 16 elements, so I could process 16 operations per iteration: set my
loop increment to 16 and plug in the SSE(2) instructions. Have I
understood correctly?
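
To make the char case concrete, here is a hedged sketch of the hand-written loop using SSE2 intrinsics rather than inline assembly. It assumes a compiler that provides the <emmintrin.h> header (and -msse2 on 32-bit x86), which may not match the GCC release discussed in this thread. The function name add_chars_sse2 is hypothetical. Unaligned loads and stores are used so the sketch is safe for any pointers; with data known to be 16-byte aligned, the aligned variants could be substituted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (assumed available) */
#include <stddef.h>

/* Adds src1 and src2 element-wise into dest, 16 chars per iteration. */
void add_chars_sse2(char *dest, const char *src1, const char *src2, size_t n)
{
    size_t i = 0;

    /* Main loop: each 128-bit register holds 16 one-byte elements. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
        /* Byte-wise wrapping add of all 16 lanes at once. */
        _mm_storeu_si128((__m128i *)(dest + i), _mm_add_epi8(a, b));
    }

    /* Scalar tail for the last n % 16 elements. */
    for (; i < n; ++i)
        dest[i] = (char)(src1[i] + src2[i]);
}
```

The scalar tail loop is there because the array length need not be a multiple of 16; other element types would need a different lane count and a different add intrinsic.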
Additionally, with my limited understanding, it seems that this particular
type of loop operation on bulk data /should/ be easy to auto-vectorize. Is
auto-vectorization at some level planned for a particular release, just
sort of "on the docket," or not even really being considered right now?
Thanks a bunch. :-)
--
Jim Wray