[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

H.J. Lu hjl.tools@gmail.com
Wed Jun 15 00:21:00 GMT 2011

On Tue, Jun 14, 2011 at 4:01 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> A similar argument applies to software prefetching, for which we observed a ~2% benefit on Greyhound (less on
> Bulldozer). We would also prefer turning on software prefetching at -O3 for -mtune=generic.

Sure, we can put everything on the table and take a look.

> Simply turning off the 32-byte unaligned load split, which introduces
> performance regressions on Intel Sandy Bridge processors, isn't an
> appropriate solution. I am proposing a different approach so that we
> can improve -mtune=generic performance on current Intel and AMD
> processors.
>
> The current default GCC tuning, -mtune=generic, was implemented in 2005
> for Intel Pentium 4, Core 2 and AMD K8 processors. Many of its
> optimization choices no longer apply to current Intel or AMD
> processors.
>
> We should choose a set of optimization choices for -mtune=generic,
> including the 32-byte unaligned load split, for current Intel and AMD
> processors, which should improve performance with no performance
> regressions.

