This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: upcoming SSE/SSE2 support in 3.1

From: Jim Wray <wray at rivit dot cs dot byu dot edu>
To: Geert Bosch <bosch at gnat dot com>
Cc: gcc at gcc dot gnu dot org
Date: Thu, 9 May 2002 20:27:51 -0600 (MDT)
Subject: Re: upcoming SSE/SSE2 support in 3.1

On Thu, 9 May 2002, Geert Bosch wrote:

> 
> On Thursday, May 9, 2002, at 05:07 , Jim Wray wrote:
> > In other words, a typical operation would be to iteratively go through
> > data that is consecutive in memory with two sources and a destination.
> > Is this likely to get SSE(2) code generated automatically, or should I
> > spend time looking at writing custom assembly.
> 
> In order to make it possible to have any vectorization, you should start
> to make sure all your data is properly aligned and that the compiler
> knows about this. Also you should take care to write the code such that
> it is clear (for the compiler) that there can be no possible aliasing
> issues. This is not trivial, but will likely result in speedups already
> without using specific SSE/SSE2 instructions.
> 
> The last step of actually using the vector instructions is relatively
> easy, and can be done using Asm inserts for now, or automatically by the
> compiler later. The step of laying out your data and designing your
> functions to meet the aliasing and alignment requirements as described
> above is something that compilers in general will not be able to do for
> you however.
> 
>    -Geert
> 
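To make sure I follow the alignment and aliasing advice, here is a minimal sketch of how I read it. The assumptions are mine, not Geert's: T is float, N is a hypothetical stand-in for the array size, alignment is requested with GCC's `aligned` attribute, and the no-aliasing promise is made with C99 `restrict` (so it needs a C99 mode such as -std=c99). The function name add_arrays is made up for illustration.

```c
#include <stddef.h>

#define N 1024  /* hypothetical array size, for illustration only */

/* GCC-specific attribute: request 16-byte alignment for each array. */
static float src1[N] __attribute__((aligned(16)));
static float src2[N] __attribute__((aligned(16)));
static float dest[N] __attribute__((aligned(16)));

/*
 * `restrict` is a promise to the compiler that out, a and b never
 * overlap, so a store to out[i] cannot change a later a[j] or b[j].
 */
static void add_arrays(float *restrict out,
                       const float *restrict a,
                       const float *restrict b,
                       size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

A call would look like add_arrays(dest, src1, src2, N). Whether a given compiler version actually exploits the alignment and the restrict promise is a separate question that this sketch does not answer.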

So, as a follow-up, code such as this:

... (T is an atomic type)

T src1[BIGARRAYSIZE];
T src2[BIGARRAYSIZE];
T dest[BIGARRAYSIZE];

for( int i = 0; i < BIGARRAYSIZE; ++i )
    dest[i] = src1[i] + src2[i];

would fit your conditions, but /for now/ there is no auto-vectorization of
this code into SSE(2) instructions, so I would have to write my own version
for each type.  For example, if T were char, I could process 16 elements
per iteration: set my loop increment to 16 and plug in the SSE(2)
instructions.  Have I understood correctly?
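
For concreteness, the hand-written version for T = char might look roughly like the sketch below. It is an illustration under assumptions of my own: it uses the SSE2 intrinsics from <emmintrin.h> rather than Asm inserts, it assumes the compiler ships that header and defines __SSE2__ when SSE2 is enabled (e.g. with -msse2), and the function name add_bytes is made up. It uses unaligned loads and stores so that it stays correct even if the alignment requirement is not met; aligned variants exist for data known to be 16-byte aligned.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dest[i] = src1[i] + src2[i] for n bytes. */
void add_bytes(unsigned char *dest,
               const unsigned char *src1,
               const unsigned char *src2,
               size_t n)
{
    size_t i = 0;

#ifdef __SSE2__
    /* Main loop: one 128-bit register holds 16 chars, so step by 16. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
        /* Sixteen byte-wise additions at once, wrapping modulo 256. */
        _mm_storeu_si128((__m128i *)(dest + i), _mm_add_epi8(a, b));
    }
#endif

    /* Scalar tail for the leftover elements; also the whole loop
       when SSE2 is not available. */
    for (; i < n; ++i)
        dest[i] = (unsigned char)(src1[i] + src2[i]);
}
```

The scalar tail matters whenever the array size is not a multiple of 16, which the loop as I described it above glosses over.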

Additionally, with my limited understanding, it seems that this particular
type of loop over bulk data /should/ be easy to auto-vectorize.  Is
auto-vectorization at some level planned for a particular release, just
sort of "on the docket," or not even really being considered right now?

Thanks a bunch.  :-)

-- 
Jim Wray

References:
- Re: upcoming SSE/SSE2 support in 3.1
  - From: Geert Bosch
