This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 0/2] Loop distribution for memset zero


Hi,

On Sat, 31 Jul 2010, Joseph S. Myers wrote:

> On Sat, 31 Jul 2010, Michael Matz wrote:
> 
> > Some math libraries (the one from AMD, for instance) provide not 
> > only vectorized intrinsics for a fixed vector size (e.g. 4 float 
> > elements), but also for a generic arbitrarily sized array.  
> > For instance:
> > 
> >   void vrsa_expf(int n, float *src, float *dest);
> > 
> > is equivalent to:
> > 
> >   for (i = 0; i < n; i++)
> >     dest[i] = expf (src[i]);
> 
> Exactly equivalent to that C code even with overlaps, or is it really (int 
> n, const float *restrict src, float *restrict dest) with no overlap 
> permitted?

The current vrsa_expf happens to be implemented as a forward walk 
through the two arrays, without checking for overlap.  I think it's safe to 
assume that nobody thought about this aspect, and hence the specification 
should state that no partial overlap is permitted (exact overlap, i.e. 
src == dest, would still work).
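
To illustrate why partial overlap matters, here is a minimal sketch.  It 
is not the actual vrsa_expf code: op() is a hypothetical stand-in for 
expf (so the example needs no libm), and blocked_walk() is an assumed 
model of a vectorized forward walk that loads four elements before 
storing four.  The scalar loop and the blocked walk agree when src and 
dest are disjoint or identical, but can differ when dest is src + 1.

```c
/* Hypothetical stand-in for expf, chosen so results are exact floats. */
static float op(float x) { return 2.0f * x + 1.0f; }

/* The scalar reference loop quoted above. */
static void scalar_loop(int n, const float *src, float *dest)
{
    for (int i = 0; i < n; i++)
        dest[i] = op(src[i]);
}

/* Assumed model of a vectorized forward walk: each block of four
   elements is fully loaded and computed before any of it is stored. */
static void blocked_walk(int n, const float *src, float *dest)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float tmp[4];
        for (int k = 0; k < 4; k++)
            tmp[k] = op(src[i + k]);
        for (int k = 0; k < 4; k++)
            dest[i + k] = tmp[k];
    }
    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        dest[i] = op(src[i]);
}
```

With a five-element buffer of ones and dest == src + 1, the scalar loop 
feeds each freshly written value into the next iteration, while the 
blocked walk reads the original values for the whole block.  With 
dest == src, each element is read before it is overwritten in both 
versions, so they agree.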


Ciao,
Michael.

