This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 0/2] Loop distribution for memset zero
- From: Michael Matz <matz at suse dot de>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Sebastian Pop <sebpop at gmail dot com>, Richard Guenther <richard dot guenther at gmail dot com>, gcc-patches at gcc dot gnu dot org
- Date: Sun, 1 Aug 2010 18:19:33 +0200 (CEST)
- Subject: Re: [PATCH 0/2] Loop distribution for memset zero
- References: <1280522440-27919-1-git-send-email-sebpop@gmail.com> <AANLkTi=mz6HzE5HRu98U_FAwSVp7_zxuKunSjC+PUvUu@mail.gmail.com> <AANLkTinRf_40ebd6fY_pT9L+L7EZ3c9s2FBjF-uF_-1K@mail.gmail.com> <Pine.LNX.4.64.1007311835370.5995@wotan.suse.de> <Pine.LNX.4.64.1007311925520.15802@digraph.polyomino.org.uk>
Hi,
On Sat, 31 Jul 2010, Joseph S. Myers wrote:
> On Sat, 31 Jul 2010, Michael Matz wrote:
>
> > Some math libraries (the one from AMD at least for instance) provide not
> > only vectorized intrinsics for a fixed vector size (e.g. 4 float
> > elements), but also for a generic arbitrarily sized array.
> > For instance:
> >
> > void vrsa_expf(int n, float *src, float *dest);
> >
> > is equivalent to:
> >
> > for (i = 0; i < n; i++)
> >   dest[i] = expf (src[i]);
>
> Exactly equivalent to that C code even with overlaps, or is it really (int
> n, const float *restrict src, float *restrict dest) with no overlap
> permitted?
The current vrsa_expf happens to be implemented as a forward walk
through the two arrays without checking for overlap.  I think it's safe to
assume that nobody thought about this aspect, and hence the specification
should state that no partial overlap is permitted (exact overlap, i.e.
src == dest, would still work).
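
To illustrate the overlap point, here is a rough sketch, not the actual
AMD code.  Everything in it is an assumption made for illustration:
blocked_walk is a hypothetical model of a vectorized forward walk (load
four inputs, then store four outputs), elem_op is a cheap stand-in for
expf so the sketch needs no math library, and same_result is just a
helper to compare against the plain loop.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for expf (assumption), so the sketch links without libm. */
static float elem_op(float x) { return 2.0f * x + 1.0f; }

/* Reference semantics: the plain C loop from the quoted mail. */
static void scalar_loop(int n, float *src, float *dest)
{
    for (int i = 0; i < n; i++)
        dest[i] = elem_op(src[i]);
}

/* Hypothetical model of a vectorized forward walk: read a block of
   four inputs, then write four outputs, with no overlap check. */
static void blocked_walk(int n, float *src, float *dest)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float tmp[4];
        for (int k = 0; k < 4; k++)   /* "vector load" and compute */
            tmp[k] = elem_op(src[i + k]);
        for (int k = 0; k < 4; k++)   /* "vector store" */
            dest[i + k] = tmp[k];
    }
    for (; i < n; i++)                /* scalar tail */
        dest[i] = elem_op(src[i]);
}

/* Returns 1 if both routines leave the buffer in the same state when
   dest = src + shift inside one buffer (shift 0 is exact overlap). */
static int same_result(int shift)
{
    float a[16], b[16];
    for (int i = 0; i < 16; i++)
        a[i] = b[i] = (float)i;
    scalar_loop(8, a, a + shift);
    blocked_walk(8, b, b + shift);
    return memcmp(a, b, sizeof a) == 0;
}
```

With shift 0 each element is read before it is overwritten in both
routines, so they agree.  With shift 1 the plain loop feeds each result
into the next iteration, while the blocked walk has already loaded the
old values, so the results differ.  That is why a restrict-style "no
partial overlap" rule seems the sensible thing to specify.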
Ciao,
Michael.