This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC: [ARM] Disable peeling


On 11/12/12 09:45, Richard Biener wrote:
On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen <andi@firstfloor.org> wrote:
Jan Hubicka <hubicka@ucw.cz> writes:

Note that I think Core has similar characteristics - at least for string operations
it fares well with unaligned accesses.

Nehalem and later have very fast unaligned vector loads. There's still some penalty when they cross cache lines, however.

IIRC the rule of thumb is to use unaligned accesses for 128-bit vectors,
but avoid them for 256-bit vectors, because the cache-line-crossing
penalty is larger on Sandy Bridge and a crossing is more likely with the
larger vectors.

Yes, I think the rule was that using the unaligned instruction variants carries no penalty when the actual access is aligned, but that aligned accesses are still faster than unaligned accesses. Thus peeling for alignment _is_ a win. I also seem to remember that the story for unaligned stores vs. unaligned loads is usually different.

Yes, it's generally the case that unaligned loads are slightly more expensive than unaligned stores, since the stores can often merge in a store buffer with little or no penalty.
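
For readers unfamiliar with the term, "peeling for alignment" can be sketched by hand roughly as below. This is only an illustration of the shape of the transformation, not what GCC actually emits: the function name add_one_peeled is hypothetical, and the 16-byte boundary is an assumption chosen to match the 128-bit vectors discussed above. A scalar prologue runs until the pointer reaches the boundary, the main loop then works on aligned groups of four floats, and a scalar epilogue handles the remainder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add 1.0f to every element of a[0..n). */
void add_one_peeled(float *a, size_t n)
{
    size_t i = 0;

    /* Prologue (the "peel"): scalar iterations until a + i sits on an
       assumed 16-byte boundary, or until the array is exhausted. */
    while (i < n && ((uintptr_t)(a + i) & 15) != 0) {
        a[i] += 1.0f;
        i++;
    }

    /* Main loop: four floats (128 bits) per iteration, starting from an
       aligned address, so a vectorizer could use aligned accesses here. */
    for (; i + 4 <= n; i += 4) {
        a[i]     += 1.0f;
        a[i + 1] += 1.0f;
        a[i + 2] += 1.0f;
        a[i + 3] += 1.0f;
    }

    /* Epilogue: the leftover elements that do not fill a whole vector. */
    for (; i < n; i++)
        a[i] += 1.0f;
}
```

The cost being weighed in this thread is the prologue and epilogue: they add code size and a few scalar iterations, which only pays off if aligned accesses are measurably faster than unaligned ones on the target.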


R.


