This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC: [ARM] Disable peeling


On 11/12/12 09:56, Richard Biener wrote:
On Tue, Dec 11, 2012 at 10:48 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
On 11/12/12 09:45, Richard Biener wrote:

On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen <andi@firstfloor.org> wrote:

Jan Hubicka <hubicka@ucw.cz> writes:


Note that I think Core has similar characteristics - at least for string
operations it fares well with unaligned accesses.


Nehalem and later have very fast unaligned vector loads. There's still
some penalty when they cross cache lines, however.

IIRC the rule of thumb is to use unaligned accesses for 128-bit vectors,
but avoid them for 256-bit vectors, because the cache-line-crossing
penalty is larger on Sandy Bridge and a crossing is more likely with the
larger vectors.
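
To make the "more likely with the larger vectors" point concrete, here is a minimal sketch that counts how many element-aligned start offsets within one cache line make a vector access spill into the next line. The 64-byte line size, the 4-byte element size used in the note below, and the helper names are assumptions for illustration, not anything stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes (typical for the x86 parts discussed). */
#define CACHE_LINE 64u

/* Returns 1 if an access of `size` bytes starting at `addr` touches
   two cache lines, 0 if it stays within one. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    return (addr % CACHE_LINE) + size > CACHE_LINE;
}

/* Hypothetical helper: counts the start offsets within one line, stepping
   by the element size, at which a vector of `vec_bytes` crosses a line. */
static unsigned crossing_offsets(size_t vec_bytes, size_t elem_bytes)
{
    unsigned count = 0;
    for (size_t off = 0; off < CACHE_LINE; off += elem_bytes)
        count += (unsigned)crosses_cache_line((uintptr_t)off, vec_bytes);
    return count;
}
```

Under these assumptions, with 4-byte elements there are 16 possible start offsets per line: a 128-bit (16-byte) access crosses at 3 of them, a 256-bit (32-byte) access at 7.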


Yes, I think the rule was that using the unaligned instruction variants
carries no penalty when the actual access is aligned, but that aligned
accesses are still faster than unaligned accesses.  Thus peeling for
alignment _is_ a win.  I also seem to remember that the story for
unaligned stores vs. unaligned loads is usually different.
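
For readers unfamiliar with the term, peeling for alignment can be sketched by hand as follows: a few scalar iterations run first until one pointer reaches a vector boundary, then the main loop proceeds in vector-sized chunks, and a scalar epilogue handles the tail. This is only an illustration, not GCC's implementation; the 16-byte vector width, the choice to align the store pointer, and the names `peel_count` and `scale` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u                       /* assumed 128-bit vectors */
#define VEC_ELEMS (VEC_BYTES / sizeof(float))

/* Number of leading scalar iterations needed before p + peel is
   VEC_BYTES-aligned, capped at n. */
static size_t peel_count(const float *p, size_t n)
{
    uintptr_t mis = (uintptr_t)p % VEC_BYTES;
    if (mis == 0)
        return 0;
    if (mis % sizeof(float) != 0)
        return n;   /* whole elements can never reach alignment: stay scalar */
    size_t peel = (VEC_BYTES - mis) / sizeof(float);
    return peel < n ? peel : n;
}

/* dst[i] = src[i] * k, with the store stream peeled to alignment. */
static void scale(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;
    size_t peel = peel_count(dst, n);

    /* Prologue: peeled scalar iterations. */
    for (; i < peel; i++)
        dst[i] = src[i] * k;

    /* Main loop: dst + i is aligned here; each chunk stands in for one
       aligned vector store (loads from src may still be unaligned). */
    for (; i + VEC_ELEMS <= n; i += VEC_ELEMS)
        for (size_t j = 0; j < VEC_ELEMS; j++)
            dst[i + j] = src[i + j] * k;

    /* Epilogue: leftover elements. */
    for (; i < n; i++)
        dst[i] = src[i] * k;
}
```

Only one pointer can be aligned this way unless both happen to share the same misalignment, which is why the relative cost of unaligned loads and unaligned stores matters when deciding which access to peel for.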


Yes, it's generally the case that unaligned loads are slightly more
expensive than unaligned stores, since the stores can often merge in a store
buffer with little or no penalty.

It was the other way around on AMD CPUs AFAIK - unaligned stores forced
flushes of the store buffers, which is why the vectorizer first and
foremost tries to align stores.


In which case, which access to align should be a question that the
middle end (ME) asks the back end (BE).


R.


