[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

Fang, Changpeng Changpeng.Fang@amd.com
Tue Jun 14 23:22:00 GMT 2011


A similar argument applies to software prefetching, for which we observed a ~2% benefit on Greyhound
(not as much on Bulldozer). We would also prefer turning on software prefetching at -O3 for -mtune=generic.
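
For readers unfamiliar with the technique, here is a minimal hand-written sketch of what software prefetching does in a loop. It is not what GCC emits: the function name and the PREFETCH_AHEAD distance are hypothetical, chosen for illustration only, and __builtin_prefetch is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements (an assumption for this
   sketch); a compiler would derive it from per-CPU cost parameters. */
#define PREFETCH_AHEAD 64

/* Sum an array while hinting that data a fixed distance ahead will be
   read soon. The hint never changes the result, only cache behavior. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */, 3 /* high locality */);
        total += a[i];
    }
    return total;
}
```

Whether such hints pay off depends on the processor's hardware prefetcher, which is consistent with the different gains reported above for the two CPUs.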

--Changpeng






________________________________________
From: H.J. Lu [hjl.tools@gmail.com]
Sent: Tuesday, June 14, 2011 8:05 AM
To: Jakub Jelinek; sergos.gnu@gmail.com
Cc: Richard Guenther; Fang, Changpeng; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) introduces splitting of avx256 unaligned loads.
>> > However, we found that it causes significant regressions for cpu2006 ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this work, we introduce a tune option that enables splitting of unaligned loads by default only on CPUs where such splitting
>> > is beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on an x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big an improvement is it on Intel chips, and how big a degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off the 32-byte unaligned load split, which introduces
performance regressions on Intel Sandy Bridge processors, isn't an
appropriate solution.

I am proposing a different approach so that we can improve
-mtune=generic performance on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in
2005 for Intel Pentium 4, Core 2 and AMD K8 processors.  Many of its
optimization choices are no longer applicable to current Intel or AMD
processors.

We should choose a set of optimization choices for -mtune=generic,
including the 32-byte unaligned load split, for the current Intel and
AMD processors, which should improve performance with no performance
regressions.
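
To make the per-processor tuning idea in this thread concrete, here is a minimal sketch of a tune flag stored as a bitmask of the CPUs that enable it. All names (the enums, tune_masks, tune_enabled) are hypothetical and are not GCC's actual definitions. The table reflects the behavior proposed in the patch subject: split on Sandy Bridge, but not on bdver1 or generic.

```c
#include <stdbool.h>

/* Hypothetical processor list for illustration; not GCC's real enum. */
enum cpu { CPU_GENERIC, CPU_SANDYBRIDGE, CPU_BDVER1, CPU_COUNT };

#define CPU_MASK(c) (1u << (c))

/* Hypothetical tuning features; each table entry below is the set of
   CPUs (as a bitmask) for which the feature is on by default. */
enum tune { TUNE_SPLIT_UNALIGNED_LOAD_256, TUNE_COUNT };

static const unsigned tune_masks[TUNE_COUNT] = {
    /* Assumption for this sketch: splitting helps Sandy Bridge only. */
    [TUNE_SPLIT_UNALIGNED_LOAD_256] = CPU_MASK(CPU_SANDYBRIDGE),
};

/* Return true if feature t is enabled by default when tuning for c. */
bool tune_enabled(enum tune t, enum cpu c)
{
    return (tune_masks[t] & CPU_MASK(c)) != 0;
}
```

Under this scheme, choosing the -mtune=generic behavior amounts to deciding whether CPU_MASK(CPU_GENERIC) belongs in each entry, which is the open question in this thread.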


--
H.J.



