This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Masked load/store vectorization (take 5)

On Wed, Nov 13, 2013 at 01:21:03PM +0100, Jakub Jelinek wrote:
> Sergey has kindly tested the patch on SPEC2k6, but on 4 tests it revealed
> an ICE.  Here is an incremental fix for that, for now it just punts on
> those.  In theory the invariant conditional loads could be handled e.g.
> using gather, or perhaps V{,P}MOVMSKP{S,D,B} on the mask followed by
> conditional scalar load of the value if any bits in the mask are set,
> then broadcasting it.

And another issue, ICE in predcom.  The problem is that the patch creates
DR_READ data ref for the MASK_LOAD internal call and !DR_READ data ref
for the MASK_STORE, which is handled properly in the vectorizer, but
apparently other data ref consumers aren't prepared for it.

So, in the spirit of the recommended change to do some dataref processing
privately in tree-vect-data-refs.c, the first patch (bootstrapped/regtested
on x86_64-linux and i686-linux) handles MASK_LOAD and MASK_STORE solely in
the vectorizer and leaves those as failing the data ref analysis otherwise.

Or, alternatively, we would need to handle those or punt in the other
consumers.  Untested second patch does the punting in predcom and phiopt,
but there are others still unverified.  Apparently e.g. predcom isn't
prepared to handle data refs on any calls at all, which can happen already
before my patches, and apparently other passes aren't either.
E.g. I've tried to create a testcase for pcom (which asserts that DR_STMT
is either gimple assign or PHI):

struct S { int i; };
__attribute__((const, noinline, noclone))
struct S foo (int x)
  struct S s;
  s.i = x;
  return s;

int a[2048], b[2048], c[2048], d[2048];
struct S e[2048];

__attribute__((noinline, noclone)) void
bar (void)
  int i;
  for (i = 0; i < 1024; i++)
      e[i] = foo (i);
      a[i+2] = a[i] + a[i+1];
      b[10] = b[10] + i;
      c[i] = c[2047 - i];
      d[i] = d[i + 1];

main ()
  int i;
  bar ();
  for (i = 0; i < 1024; i++)
    if (e[i].i != i)
      __builtin_abort ();
  return 0;

but to my surprise pcom didn't fail on it (because the data reference
doesn't have a gimple type and thus isn't suitable_reference_p),
but e.g. ldist miscompiled it (completely ignored it as if there wasn't
e[i] = foo (i); and let it out of the loop, so now it fails at runtime).

Richard, any preferences on whether you'd prefer MASK_LOAD/MASK_STORE
to be handled as data references only for the vectorization, or also
other passes?  Even in the former case, we need to investigate the
non-vectorizer data ref users how they handle calls and fix the ldist


Attachment: V023
Description: Text document

Attachment: V022
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]