This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)
On Fri, 28 Oct 2011, Richard Guenther wrote:
> On Fri, 28 Oct 2011, Jakub Jelinek wrote:
>
> > On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > > It is also because of re-use of memory via memcpy (yes, some dubious
> > > TBAA case from C, but essentially we don't want to break that). Thus
> > > we can't use TBAA on anonymous memory.
> >
> > No, IMHO we always use a ref_all mem access in that case.
> > If you meant something like:
> >
> > void
> > foo (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> >     {
> >       int tem;
> >       __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
> >       floatptr[i] = (float) tem;
> >     }
> > }
> >
> > which is valid C even if intptr == floatptr, then we have:
> >
> > <bb 2>:
> >
> > <bb 3>:
> >   # i_21 = PHI <i_14(4), 0(2)>
> >   # ivtmp.12_27 = PHI <ivtmp.12_26(4), 256(2)>
> >   D.2709_3 = (long unsigned int) i_21;
> >   D.2710_4 = D.2709_3 * 4;
> >   D.2711_6 = intptr_5(D) + D.2710_4;
> >   D.2712_7 = MEM[(char * {ref-all})D.2711_6];
> >   D.2713_11 = floatptr_10(D) + D.2710_4;
> >   D.2715_13 = (float) D.2712_7;
> >   *D.2713_11 = D.2715_13;
> >   i_14 = i_21 + 1;
> >   ivtmp.12_26 = ivtmp.12_27 - 1;
> >   if (ivtmp.12_26 != 0)
> >     goto <bb 4>;
> >   else
> >     goto <bb 5>;
> >
> > <bb 4>:
> >   goto <bb 3>;
> >
> > which is just fine even with TBAA.
> > And similarly for
> > void
> > bar (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> >     {
> >       float tem;
> >       tem = (float) intptr[i];
> >       __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
> >     }
> > }
> >
> > where the ref-all is used for the store rather than for the load.
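A minimal sketch of why foo and bar stay valid C with intptr == floatptr: the hypothetical driver convert_in_place (an illustrative name, not part of the patch) converts a malloc'd buffer from int to float in place. It assumes sizeof (int) == sizeof (float). Because allocated storage has no declared type, each float store just gives that element a new effective type, and the int reads only ever touch elements that have not been overwritten yet.

```c
#include <assert.h>
#include <stdlib.h>

/* foo () and bar () as posted above.  */
void
foo (int *intptr, float *floatptr)
{
  int i;
  for (i = 0; i < 256; ++i)
    {
      int tem;
      __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
      floatptr[i] = (float) tem;
    }
}

void
bar (int *intptr, float *floatptr)
{
  int i;
  for (i = 0; i < 256; ++i)
    {
      float tem;
      tem = (float) intptr[i];
      __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
    }
}

/* Hypothetical driver: fill a malloc'd buffer with the ints 0..255,
   then call foo () or bar () with intptr == floatptr.  Returns 1 if
   every element afterwards holds the matching float value.  */
int
convert_in_place (int use_bar)
{
  void *buf;
  int *ip;
  float *fp;
  int i, ok = 1;

  /* The in-place conversion relies on same-sized elements.  */
  assert (sizeof (int) == sizeof (float));

  buf = malloc (256 * sizeof (int));
  if (buf == NULL)
    return 0;
  ip = buf;
  fp = buf;

  for (i = 0; i < 256; ++i)
    ip[i] = i;                  /* effective type int */

  if (use_bar)
    bar (ip, fp);               /* intptr == floatptr */
  else
    foo (ip, fp);               /* intptr == floatptr */

  for (i = 0; i < 256; ++i)
    if (fp[i] != (float) i)     /* effective type is now float */
      ok = 0;

  free (buf);
  return ok;
}
```

Both convert_in_place (0) and convert_in_place (1) should return 1, since every int in 0..255 is exactly representable as a float.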
>
> Well, yeah. I said it's probably difficult to generate a
> C testcase. It's still valid middle-end IL (and well-defined) to have
> intptr == floatptr and MEM[(int *)..] and MEM[(float *)...].
Btw, only the exact-overlap case is critical. For a non-exact overlap
like
  for (i)
    {
      float[i] = int[i-1] + int[i];
    }
you can reason that there cannot be aliasing: if the loop iterates
more than once(!), you get
  float[i] = int[i-1] + int[i];
  float[i+1] = int[i] + int[i+1];
  ...
where the second load of int[i] would read memory that the previous
iteration stored as float, which is undefined. Thus you can assume
that float != int. But that requires a more thorough analysis than we
do at the moment, plus the knowledge that the loop iterates at least
N times (when called from the vectorizer, N is the vectorization
factor, which is at least 2).
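To make the effective-type argument concrete, here is a small model (hypothetical, not GCC code) that replays the loop above on a shadow array recording the type of the last store to each element. It assumes sizeof (int) == sizeof (float), so the two arrays share element slots, and places float[] OFFSET elements after int[]; OFFSET == 0 is the float == int case. The name first_tbaa_violation is illustrative.

```c
#include <assert.h>

enum eff_type { EFF_INT, EFF_FLOAT };

#define SHADOW_LEN 64
#define INT0 (SHADOW_LEN / 2)   /* shadow slot of int[0] */

/* Replays
     for (i = 1; i < n; ++i)
       float[i] = int[i-1] + int[i];
   with float[i] aliasing int[i + offset].  Returns the first iteration
   whose int load reads an element last stored as float (undefined
   under TBAA), or -1 if no such load happens.  */
int
first_tbaa_violation (int offset, int n)
{
  enum eff_type shadow[SHADOW_LEN];
  int k, i;

  /* Keep every accessed slot inside the shadow buffer.  */
  assert (n >= 1 && n <= INT0);
  assert (INT0 + 1 + offset >= 0 && INT0 + n - 1 + offset < SHADOW_LEN);

  for (k = 0; k < SHADOW_LEN; ++k)
    shadow[k] = EFF_INT;        /* everything starts out as int */

  for (i = 1; i < n; ++i)
    {
      /* Loads of int[i-1] and int[i].  */
      if (shadow[INT0 + i - 1] == EFF_FLOAT || shadow[INT0 + i] == EFF_FLOAT)
        return i;
      /* Store to float[i], i.e. to the slot of int[i + offset].  */
      shadow[INT0 + i + offset] = EFF_FLOAT;
    }
  return -1;
}
```

With offset 0 and n == 3 (two iterations) it reports iteration 2, the second load of int[1]; with n == 2 (a single iteration) it reports -1. That is why the argument needs a lower bound on the iteration count.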
Richard.