This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Re: what optimization can be expected?

From: Tim Prince <TimothyPrince at sbcglobal dot net>
To: Burlen Loring <burlen dot loring at gmail dot com>
Cc: tprince at computer dot org, GCC-help <gcc-help at gcc dot gnu dot org>
Date: Fri, 24 Apr 2009 07:43:53 -0700
Subject: Re: what optimization can be expected?
References: <49F080C8.5020200@gmail.com> <49F0918A.4080300@sbcglobal.net> <49F1AF64.3070300@gmail.com>
Reply-to: tprince at computer dot org

Burlen Loring wrote:
> Tim Prince wrote:
>> burlen wrote:
>>
>>  
>>> Can loops with a non-unit stride be automagically optimized by compiler
>>> with SSE?
>>>
>>> template <int nComp>
>>> void norm(double *result, double *data, size_t n)
>>> {
>>>  double *pDat=data;
>>>  double *pRes=result;
>>>
>>>  for (size_t i=0; i<n; ++i)
>>>  {
>>>    *pRes=*pDat**pDat;
>>>    for (int j=1; j<nComp; ++j)
>>>    {
>>>      *pRes+=pDat[j]*pDat[j];
>>>    }
>>>    *pRes=sqrt(*pRes);
>>>
>>>    pRes+=1;
>>>    pDat+=nComp;
>>>  }
>>> }
>>>     
>>
>> Your inner loop appears to have unit stride, and might be optimized
>> easily if you didn't write it with potential aliases.  If you meant
>> inner_product(), why not use that?
>>   
> The inner loop does have unit stride, but it's usually short, between 1
> and 12 iterations, and the outer loop is usually long, in the tens to
> hundreds of thousands. That example is just one simple situation that I
> encounter. I want to understand how the compiler applies SSE
> optimization. What can be automatically SSE optimized by g++? Is this
> documented somewhere?
> 
> I want to write in such a way to take advantage of g++ capability. It's
> important for me to let g++ do optimization because the code needs to be
> cross platform.
> 

SSE vectorization works only on a stride-1 inner loop (apart from a few
limited cases with SSE4).  If that loop had a known constant trip count,
you could instruct the compiler to unroll it entirely, but according to
your follow-up that is not possible.

You should still get a significant optimization by writing the apparent
aliasing out of the inner loop, if in fact there is no such aliasing, as
inner_product() would do: accumulate the sum in a local variable and store
it to *pRes once, rather than updating *pRes on every iteration.  If the
loop did vectorize, which requires g++ -O3 -ffast-math (a vectorized sum
reorders the floating-point additions), and that slowed it down because
the loop is so short, you could turn vectorization off again by removing
-ffast-math or by other means such as -fno-tree-vectorize.

To optimize the outer loop, you would need to declare the operands as
double * __restrict__, if that is valid for your data, and be willing to
deal with the different spellings each C++ compiler gives the restrict
extension.  That may not make much difference as long as you optimize the
inner loop.

You haven't told us whether the operands are in fact aliased.  If they
are, the compiler would break your code by optimizing it as though they
were not.
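
As a rough sketch of the two suggestions above (not tested against any
particular g++ release), the loop could be rewritten as follows.  The names
norm_noalias, norm_inner_product and RESTRICT are made up for illustration,
and the first variant assumes result and data never overlap.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>

// RESTRICT is a hypothetical macro name. __restrict__ (GCC, Clang) and
// __restrict (MSVC) are compiler extensions, not standard C++.
#if defined(_MSC_VER)
#  define RESTRICT __restrict
#else
#  define RESTRICT __restrict__
#endif

// Assumption: result and data do not overlap. If they do, declaring them
// restrict makes the behavior undefined.
template <int nComp>
void norm_noalias(double *RESTRICT result, const double *RESTRICT data,
                  std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    const double *p = data + i * nComp;
    // Accumulate in a local so the compiler does not have to store to
    // *result on every inner iteration.
    double sum = 0.0;
    for (int j = 0; j < nComp; ++j)
      sum += p[j] * p[j];
    result[i] = std::sqrt(sum);
  }
}

// Same computation with std::inner_product, which also accumulates by
// value; no restrict qualifier is used here.
template <int nComp>
void norm_inner_product(double *result, const double *data, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    const double *p = data + i * nComp;
    result[i] = std::sqrt(std::inner_product(p, p + nComp, p, 0.0));
  }
}
```

Either variant is called like the original, e.g. norm_noalias<3>(result,
data, n).  Whether the inner loop is then vectorized or simply unrolled
depends on the compiler version and flags, so it is worth checking the
generated code.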

