[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

Fang, Changpeng Changpeng.Fang@amd.com
Wed Jun 15 22:07:00 GMT 2011

>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.

Something must be broken in the unaligned load splitting in generic mode.

While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for -mtune=bdver1, splitting
unaligned loads in generic mode is KILLING us:

For 459.GemsFDTD (ref) on Bulldozer:
  -Ofast -mavx -mno-avx256-split-unaligned-load:    480s
  -Ofast -mavx:                                    2527s

So, splitting unaligned loads makes the program run roughly 5.3 times slower (2527s vs. 480s)!

For the 434.zeusmp train run:
  -Ofast -mavx -mno-avx256-split-unaligned-load:   32.5s
  -Ofast -mavx:                                    106s
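
For reference, the slowdown factors implied by the two sets of timings above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the timings are copied from the runs reported here, and the helper name `slowdown` and the dictionary layout are made up for illustration.

```python
# Wall-clock times in seconds, copied from the runs reported above.
# "no_split" = -Ofast -mavx -mno-avx256-split-unaligned-load
# "split"    = -Ofast -mavx (unaligned load splitting left enabled)
timings = {
    "459.GemsFDTD (ref)": {"no_split": 480.0, "split": 2527.0},
    "434.zeusmp (train)": {"no_split": 32.5, "split": 106.0},
}


def slowdown(split_seconds, no_split_seconds):
    """Hypothetical helper: how many times slower the split build runs."""
    return split_seconds / no_split_seconds


for name, t in timings.items():
    factor = slowdown(t["split"], t["no_split"])
    print(f"{name}: {factor:.2f}x slower with splitting")
```

With the numbers above this gives about 5.26x for 459.GemsFDTD and about 3.26x for 434.zeusmp.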

Other tests are ongoing!

