RFC: [ARM] Disable peeling

Richard Earnshaw rearnsha@arm.com
Tue Dec 11 10:14:00 GMT 2012


On 11/12/12 09:56, Richard Biener wrote:
> On Tue, Dec 11, 2012 at 10:48 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> On 11/12/12 09:45, Richard Biener wrote:
>>>
>>> On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>>>
>>>> Jan Hubicka <hubicka@ucw.cz> writes:
>>>>
>>>>> Note that I think Core has similar characteristics - at least for string
>>>>> operations it fares well with unaligned accesses.
>>>>
>>>>
>>>> Nehalem and later have very fast unaligned vector loads. There's still
>>>> some penalty when they cross cache lines, however.
>>>>
>>>> iirc the rule of thumb is to do unaligned for 128-bit vectors,
>>>> but avoid it for 256-bit vectors because the cache line cross
>>>> penalty is larger on Sandy Bridge and more likely with the larger
>>>> vectors.
>>>
>>>
>>> Yes, I think the rule was that using the unaligned instruction variants
>>> carries no penalty when the actual access is aligned but that aligned
>>> accesses are still faster than unaligned accesses.  Thus peeling for
>>> alignment _is_ a win.
>>> I also seem to remember that the story for unaligned stores vs. unaligned
>>> loads is usually different.
>>
>>
>> Yes, it's generally the case that unaligned loads are slightly more
>> expensive than unaligned stores, since the stores can often merge in a store
>> buffer with little or no penalty.
>
> It was the other way around on AMD CPUs AFAIK - unaligned stores forced
> flushes of the store buffers.  Which is why the vectorizer first and
> foremost tries to align stores.
>

In which case, which access to align should be a question that the
middle end (ME) asks the back end (BE).
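
To make that concrete, here is a rough sketch of what such a query might
look like.  None of these names exist in GCC: peel_pref, misalign_costs
and backend_peel_preference are hypothetical, and the cost units are
arbitrary.  The tie-break towards stores only mirrors the current
vectorizer behaviour described above; it is an assumption, not a
measured result.

```c
#include <stddef.h>

/* Hypothetical answer the back end gives the middle end: which kind of
   memory reference, if any, is worth peeling the loop to align.  */
enum peel_pref { PEEL_NONE, PEEL_FOR_LOAD, PEEL_FOR_STORE };

/* Hypothetical per-target penalties, in arbitrary units.  */
struct misalign_costs {
    unsigned unaligned_load;   /* extra cost of a misaligned vector load  */
    unsigned unaligned_store;  /* extra cost of a misaligned vector store */
};

/* Hypothetical hook: report which access the target would rather see
   aligned.  If neither access is penalised, peeling buys nothing.  */
enum peel_pref backend_peel_preference(const struct misalign_costs *c)
{
    if (c->unaligned_load == 0 && c->unaligned_store == 0)
        return PEEL_NONE;

    /* On a tie, keep the existing "align stores first" behaviour.  */
    if (c->unaligned_store >= c->unaligned_load)
        return PEEL_FOR_STORE;

    return PEEL_FOR_LOAD;
}
```

A target where store buffers absorb misaligned stores might fill in
something like { 2, 0 } and get PEEL_FOR_LOAD back, whereas one where
misaligned stores force store-buffer flushes might use { 1, 4 } and get
PEEL_FOR_STORE.  Those numbers are purely illustrative.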

R.



