This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug tree-optimization/50789] Gather vectorization

From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 24 Oct 2011 08:39:16 +0000
Subject: [Bug tree-optimization/50789] Gather vectorization
Auto-submitted: auto-generated
References: <bug-50789-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-24 08:39:16 UTC ---
Not exactly: -ftree-loop-if-convert-stores is apparently a language-changing
option, only a subset of valid C/C++ programs remains valid with it.  With
V*GATHER* insns, and, as I found out during the weekend, with the VMASKMOVP[SD]
and VPMASKMOV[DQ] instructions too, we can handle both conditional loads and
conditional stores.  So testcases like:

float a[N], b[N], c[N], d[N], e[N], g[N];

void
f6 (void)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? c[i] : d[i];
}

void
f7 (float *p, float *q)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? p[i] : q[i];
}

void
f8 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] + d[i];
      if (a[i] < b[i])
        e[i] = f;
    }
}

void
f9 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] * d[i];
      if (a[i] < b[i])
        e[i] = f;
      else
        g[i] = f;
    }
}

should be vectorizable (even with just -mavx).  I haven't checked whether any
other CPU (PPC, ARM, ...) has anything similar.
In fact, f6 ought to be vectorizable always: we can easily find out that for
any i that can appear in the loop (0 through 999) neither c[i] nor d[i] will
trap or fault.  The question is whether the same is true for
extern float c[N];
instead (e.g. if the actual definition were then float c[N / 2];, though I'd
hope that is invalid C).  For f7, however, you can't know whether p and q are
valid pointers at all, whether they are correctly aligned, or whether e.g. p[i]
or q[i] points beyond the end of an mmapped region.  So f7/f8/f9 are only
vectorizable using these v*maskmov* instructions (or f7 using v*gather*, but
that would be unnecessary additional overhead).  I've verified that SNB (Sandy
Bridge) CPUs don't require any alignment and don't fault on completely invalid
addresses with a zero mask.
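
To make the difference concrete, here is a minimal hand-written sketch of f8
(not compiler output).  f8_rmw shows the unconditional-store form that store
if-conversion relies on, where every e[i] is read and written back.  f8_avx
uses the AVX intrinsics _mm256_cmp_ps and _mm256_maskstore_ps, which map to
VCMPPS and VMASKMOVPS, so unselected lanes are never written.  The function
names, the explicit pointer parameters, the length n, the HAVE_X86 macro and
the runtime dispatch via __builtin_cpu_supports are assumptions made for
illustration, and GCC or Clang is assumed.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1   /* hypothetical guard macro, for this sketch only */
#endif

/* For contrast: the unconditional-store form.  Every e[i] is loaded and
   stored back, even where the source program never writes to it.  */
void
f8_rmw (const float *a, const float *b, const float *c, const float *d,
        float *e, size_t n)
{
  for (size_t i = 0; i < n; i++)
    e[i] = a[i] < b[i] ? c[i] + d[i] : e[i];
}

#ifdef HAVE_X86
/* Masked-store form, 8 floats per iteration.  */
__attribute__ ((target ("avx")))
static void
f8_avx (const float *a, const float *b, const float *c, const float *d,
        float *e, size_t n)
{
  size_t i = 0;
  for (; i + 8 <= n; i += 8)
    {
      __m256 va = _mm256_loadu_ps (a + i);
      __m256 vb = _mm256_loadu_ps (b + i);
      /* All-ones in each lane where a[i] < b[i], all-zeros elsewhere.  */
      __m256 lt = _mm256_cmp_ps (va, vb, _CMP_LT_OQ);
      /* c and d are read unconditionally in the source loop as well.  */
      __m256 f = _mm256_add_ps (_mm256_loadu_ps (c + i),
                                _mm256_loadu_ps (d + i));
      /* Lanes whose mask sign bit is clear are left untouched in memory.  */
      _mm256_maskstore_ps (e + i, _mm256_castps_si256 (lt), f);
    }
  for (; i < n; i++)   /* scalar tail */
    if (a[i] < b[i])
      e[i] = c[i] + d[i];
}
#endif

/* Uses the AVX path when the CPU supports it, else the plain scalar loop.  */
void
f8_masked (const float *a, const float *b, const float *c, const float *d,
           float *e, size_t n)
{
#ifdef HAVE_X86
  if (__builtin_cpu_supports ("avx"))
    {
      f8_avx (a, b, c, d, e, n);
      return;
    }
#endif
  for (size_t i = 0; i < n; i++)
    if (a[i] < b[i])
      e[i] = c[i] + d[i];
}
```

On ordinary writable arrays both functions leave the same values in e.  They
differ only in which elements are stored to, which matters when e is read-only
or shared with another thread.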

The question is how to represent this in the IL.  IMHO it should be something
that is either present solely during vectorization (i.e. a pattern
recognizer-like thing), or something that we convert the IL into right before
the vectorizer (e.g. during ifcvt) and then convert back to the original
multiple-BB IL either at the end of the vectorizer or in a pass right after
it.  For the conditional loads we could perhaps represent them by COND_EXPRs
with some flag on the gimple stmt that allows memory references instead of
SSA_NAMEs in one or both of the then/else operands, or by a new tree code; for
conditional stores we'd need a new tree code.
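
As a rough scalar model of that idea (not actual GIMPLE), the sketch below
flattens the bodies of f7 and f9 into a single basic block.  cond_load2 and
cond_store are hypothetical helpers that stand in for a COND_EXPR with memory
operands and for a conditional-store tree code.  The names f7_flat and
f9_flat, the pointer parameters and the length n are likewise assumptions made
for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a COND_EXPR whose arms may be memory:
   only the selected arm is dereferenced.  */
static inline float
cond_load2 (int mask, const float *then_p, const float *else_p)
{
  return *(mask ? then_p : else_p);
}

/* Hypothetical stand-in for a conditional-store tree code:
   *p is written only when mask is nonzero.  */
static inline void
cond_store (int mask, float *p, float val)
{
  if (mask)
    *p = val;
}

/* f7 with a straight-line loop body.  */
void
f7_flat (const float *a, const float *b, const float *p, const float *q,
         float *e, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int m = a[i] < b[i];
      e[i] = cond_load2 (m, &p[i], &q[i]);
    }
}

/* f9 with a straight-line loop body: two stores under complementary masks.  */
void
f9_flat (const float *a, const float *b, const float *c, const float *d,
         float *e, float *g, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float f = c[i] * d[i];
      int m = a[i] < b[i];
      cond_store (m, &e[i], f);
      cond_store (!m, &g[i], f);
    }
}
```

With the control flow folded into the mask operand, each statement maps
naturally onto one masked vector load or store, and turning it back into the
original if/else form is mechanical.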
