[Bug target/68928] AVX loops on unaligned arrays could generate more efficient startup/cleanup code when peeling
peter at cordes dot ca
gcc-bugzilla@gcc.gnu.org
Wed Dec 16 21:46:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68928
--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
Richard wrote:
> [...] avoid peeling for alignment on x86_64 and just use unaligned ops
Yeah, that's what clang does, and may be optimal. Certainly it's easy, and
gives optimal performance when buffers *are* in fact aligned, even when the
programmer has neglected to inform the compiler of any guarantee.
However, with vector sizes getting closer to the cache-line size, unaligned
accesses will cross cache lines more of the time (e.g. a 32-byte AVX loop over
an unaligned buffer will have a cache-line split on every other iteration,
given 64-byte lines). Iff we can *cheaply* avoid this, it may be worth it.
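
To make the "every other iteration" arithmetic concrete, here is a minimal
sketch in plain C. It assumes 32-byte vectors and 64-byte cache lines, and the
names (crosses_line, count_splits, peel_elems) are made up for illustration;
none of this is GCC's actual peeling logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 32-byte AVX vectors, 64-byte cache lines. */
enum { VEC_BYTES = 32, LINE_BYTES = 64 };

/* Returns 1 if a VEC_BYTES-wide access starting at addr touches two
 * cache lines, i.e. its first and last byte fall in different lines. */
static int crosses_line(uintptr_t addr)
{
    return (addr / LINE_BYTES) != ((addr + VEC_BYTES - 1) / LINE_BYTES);
}

/* Counts how many of n_vecs back-to-back vector accesses, starting at
 * base, are split across a cache-line boundary. */
static size_t count_splits(uintptr_t base, size_t n_vecs)
{
    size_t splits = 0;
    for (size_t i = 0; i < n_vecs; i++)
        splits += (size_t)crosses_line(base + i * VEC_BYTES);
    return splits;
}

/* Hypothetical helper: number of leading elements a peeled prologue would
 * handle before addr becomes VEC_BYTES-aligned. Assumes addr is already a
 * multiple of elem_size and that elem_size divides VEC_BYTES. */
static size_t peel_elems(uintptr_t addr, size_t elem_size)
{
    uintptr_t peel_bytes = ((uintptr_t)0 - addr) & (VEC_BYTES - 1);
    return (size_t)(peel_bytes / elem_size);
}
```

With these assumptions, a float buffer starting 4 bytes past a 64-byte
boundary splits on 4 of every 8 vector accesses, and peeling 7 scalar elements
would bring the pointer to 32-byte alignment so that no access splits. Whether
the prologue is cheap enough to pay for itself is exactly the open question.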
IIRC, all modern x86 / x86-64 CPUs have no penalty for unaligned loads, as long
as they don't actually cross a cache-line boundary (true for Intel since
Nehalem). Store-forwarding doesn't work well if the stores don't line up with
the loads, though.