This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2 "basic block vectorized using SLP" 1


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175

--- Comment #24 from Martin Sebor <msebor at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> Why is the loop bound to i != 16 / sizeof *s?

The upper bound is intended to make the copied sequence fit into one vector
register, irrespective of the size of the array element.
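
As an illustration of that bound, here is a minimal sketch. It is not the code
from the proposed tests: the DEFINE_COPY macro and the copy_* function names
are hypothetical, and the iteration counts in the comments assume a 2-byte
short and a 4-byte int.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the proposed tests: each generated
   function copies 16 bytes' worth of elements, whatever the element type,
   because the trip count is 16 / sizeof *s.  */
#define DEFINE_COPY(T, name)                        \
  void name (T *d, const T *s)                      \
  {                                                 \
    for (size_t i = 0; i != 16 / sizeof *s; ++i)    \
      d[i] = s[i];                                  \
  }

DEFINE_COPY (char, copy_char)    /* 16 iterations */
DEFINE_COPY (short, copy_short)  /* 8 iterations, assuming 2-byte short */
DEFINE_COPY (int, copy_int)      /* 4 iterations, assuming 4-byte int */
```

In each case the copied sequence is 16 bytes long, i.e., the width of one
Altivec vector register.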

The vector load and store instructions tolerate unaligned accesses and there
are permute instructions that combine the contents of two vector registers into
a single one to compensate for unaligned reads or writes.  I'm not sure it
makes sense to expect unaligned copies involving a single vector register's
worth of data to be vectorized (as done in my proposed tests for char and
short), but I would expect larger unaligned copies (i.e., multiples of 16
bytes) to benefit from it.  In my experiments I've seen no evidence of GCC
attempting to vectorize such copies but I need to do some more research to
understand why.
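
For concreteness, a larger unaligned copy of the kind described above might
look like the following sketch.  The array names, the size N, and the one-byte
offset are assumptions chosen for illustration, not taken from the actual
experiments.

```c
#include <stddef.h>

enum { N = 64 };  /* four 16-byte vectors' worth of chars (assumed size) */

/* Both arrays are 16-byte aligned (GCC attribute syntax); the extra
   element leaves room for the offset read below.  */
char src[N + 1] __attribute__ ((aligned (16)));
char dst[N + 1] __attribute__ ((aligned (16)));

void
copy_unaligned (void)
{
  /* Reading from src + 1 makes the source misaligned relative to the
     destination, so every 16-byte chunk straddles two aligned vectors.  */
  for (size_t i = 0; i != N; ++i)
    dst[i] = src[i + 1];
}
```

Whether the vectorizer handles a loop like this can be checked with a dump
option, for example -fdump-tree-vect-details or -fdump-tree-slp-details.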

(In reply to comment #23)

The test uses -maltivec and that's what I've been using as well.  But I see in
the Power ISA book that lxvw4x and stxvw4x are classified as VSX instructions,
so perhaps they shouldn't be emitted without -mvsx.  That said, 5.0 doesn't
emit them even with -mvsx.

