This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

How are we supposed to play along the autovectorizer in c++? (alignment issues)

From: tbp <tbptbp at gmail dot com>
To: GCC <gcc at gcc dot gnu dot org>
Date: Tue, 29 Jul 2008 13:17:26 +0200
Subject: How are we supposed to play along the autovectorizer in c++? (alignment issues)

Hello.
The autovectorizer is enabled by default at -O3 in g++ 4.3 and does a
fine job most of the time. Except it gets mightily pissed off if you
dare to tweak the alignment, and after much experimentation I still
haven't worked out how to plug all the holes.
This silly example shows where things start to get ugly:
# cat autovec.cc
enum { N = 4, align_to = 16/sizeof(char) };
typedef float scalar_type;
struct foo_t {
	scalar_type m[N];
	foo_t operator +(const foo_t &rhs) const {
		foo_t v(*this);
		for (unsigned i=0; i<N; ++i) v.m[i] += rhs.m[i];
		return v;
	}
};
struct bar_t {
	scalar_type __attribute__((aligned(sizeof(char)*align_to))) m[N];
	bar_t operator +(const bar_t &rhs) const {
		bar_t v(*this);
		for (unsigned i=0; i<N; ++i) v.m[i] += rhs.m[i];
		return v;
	}
};

template<typename T> __attribute__((noinline))
void foobar(T &dst, const T *src) {
	T v = {{ 0 }};
	for (unsigned i=0; i<64; ++i) v = v + src[i];
	dst = v;
}

int main(int argc, char *argv[]) {
	foo_t *p((foo_t*) argv);
	bar_t *q((bar_t*) argv);
	foobar(*p, p + 1);
	foobar(*q, q + 1);
	return 0;
}
# g++ -O3 -march=native autovec.cc # g++ 4.3.1, x86_64

There's not much to say about foobar<foo_t>. The addition in
foobar<bar_t> gets somewhat vectorized, but the loop comes out as:
  400620:       89 54 24 f4             mov    %edx,-0xc(%rsp)
  400624:       89 4c 24 f0             mov    %ecx,-0x10(%rsp)
  400628:       44 89 44 24 ec          mov    %r8d,-0x14(%rsp)
  40062d:       44 89 4c 24 e8          mov    %r9d,-0x18(%rsp)
  400632:       0f 28 c1                movaps %xmm1,%xmm0
  400635:       0f 12 04 06             movlps (%rsi,%rax,1),%xmm0
  400639:       0f 16 44 06 08          movhps 0x8(%rsi,%rax,1),%xmm0
  40063e:       48 83 c0 10             add    $0x10,%rax
  400642:       41 0f 58 02             addps  (%r10),%xmm0
  400646:       48 3d 00 04 00 00       cmp    $0x400,%rax
  40064c:       41 0f 29 02             movaps %xmm0,(%r10)
  400650:       8b 54 24 f4             mov    -0xc(%rsp),%edx
  400654:       8b 4c 24 f0             mov    -0x10(%rsp),%ecx
  400658:       44 8b 44 24 ec          mov    -0x14(%rsp),%r8d
  40065d:       44 8b 4c 24 e8          mov    -0x18(%rsp),%r9d
  400662:       75 bc                   jne    400620 <void foobar<bar_t>(bar_t&, bar_t const*)+0x20>

As you can see there's a lot of undue load/store: every iteration
spills four 32-bit registers to the stack and reloads them around a
single addps. And that's for a POD (or something really looking like
one).
So, you start fixing that with a looping copy ctor/assignment operator
(surely losing the POD property in the process) and so on. Doing that I
can fix most of the reload issues, but the stores are much more elusive
(note that it depends on the underlying type & its natural alignment).
Ideally I'd like PODs to remain PODs, and synthesized ctors/operators
to be efficient (i.e. not falling back to a GPR-based memcpy when
everything is in an XMM register already), or at least to have a
consistent way to write such ctors/operators (and get the dead stores
removed).
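
For illustration, the "looping copy ctor/operator" workaround mentioned
above might look roughly like the sketch below. The names baz_t and
accumulate are made up for this example and are not part of the test
case; the explicit element count is also an assumption, added so the
sketch can be run on small arrays. Whether this actually gets rid of the
extra loads and stores depends on the g++ version and the element type,
so the generated assembly still needs checking.

```cpp
#include <cassert>

enum { N = 4, align_to = 16 };
typedef float scalar_type;

// Hypothetical variant of bar_t: same aligned member, but with hand-written
// element-wise copy operations instead of the compiler-synthesized ones.
// Declaring them makes the type a non-POD.
struct baz_t {
	scalar_type __attribute__((aligned(align_to))) m[N];

	baz_t() { for (unsigned i=0; i<N; ++i) m[i] = 0; }
	baz_t(const baz_t &rhs) { for (unsigned i=0; i<N; ++i) m[i] = rhs.m[i]; }
	baz_t &operator =(const baz_t &rhs) {
		for (unsigned i=0; i<N; ++i) m[i] = rhs.m[i];
		return *this;
	}
	baz_t operator +(const baz_t &rhs) const {
		baz_t v(*this);
		for (unsigned i=0; i<N; ++i) v.m[i] += rhs.m[i];
		return v;
	}
};

// Same shape as foobar() in the test case, but with an explicit element
// count instead of the hard-coded 64 (an assumption of this sketch).
__attribute__((noinline))
void accumulate(baz_t &dst, const baz_t *src, unsigned count) {
	assert(src != 0 || count == 0);
	baz_t v; // zero-initialized by the default ctor
	for (unsigned i=0; i<count; ++i) v = v + src[i];
	dst = v;
}
```

Built with the same flags as the test case (g++ -O3 -march=native), the
loop in accumulate() can be disassembled and compared against the one
shown earlier.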

Briefly: how am I supposed to decorate my structures with larger
alignment and not royally piss off the autovectorizer (and g++ in
general)?
