[PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

Richard Guenther richard.guenther@gmail.com
Thu Jun 16 07:51:00 GMT 2011


On Wed, Jun 15, 2011 at 11:06 PM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>I have no problem with the -mtune=Bulldozer change.  But I object to the
>>-mtune=generic change and did suggest a different approach for -mtune=generic.
>
> Something must be broken in the unaligned load splitting in generic mode.
>
> While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for -mtune=bdver1, splitting
> unaligned loads in generic mode is KILLING us:
>
> For 459.GemsFDTD (ref) on Bulldozer,
> -Ofast -mavx -mno-avx256-split-unaligned-load:   480s
> -Ofast -mavx:                                   2527s
>
> So splitting unaligned loads makes the program run more than 5 times slower!
>
> For 434.zeusmp train run
> -Ofast -mavx -mno-avx256-split-unaligned-load:  32.5s
> -Ofast -mavx:                                    106s
>
> Other tests are ongoing!

I suspect that the 128-bit halves produced by the split get split further
into mov[lh]ps pieces?  IIRC we do that for SSE moves with generic tuning, at least.

Richard.

>
> Changpeng.
>
>
>


