[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
H.J. Lu
hjl.tools@gmail.com
Wed Jun 15 00:21:00 GMT 2011
On Tue, Jun 14, 2011 at 4:01 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> A similar argument applies to software prefetching, for which we observed a ~2% benefit on Greyhound (not that much
> for Bulldozer). We would also prefer turning on software prefetching at -O3 for -mtune=generic.
Sure, we can put everything on the table and take a look.
> Simply turning off the 32-byte unaligned load split, which introduces
> performance regressions on Intel Sandy Bridge processors, isn't an
> appropriate solution.
>
> I am proposing a different approach so that we can improve
> -mtune=generic performance on current Intel and AMD processors.
>
> The current default GCC tuning, -mtune=generic, was implemented in
> 2005 for Intel Pentium 4, Core 2 and AMD K8 processors. Many of its
> optimization choices are no longer applicable to current Intel or
> AMD processors.
>
> We should choose a set of optimization choices for -mtune=generic,
> including the 32-byte unaligned load split, for current Intel and
> AMD processors, which should improve performance with no
> performance regressions.
>
>
--
H.J.