This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic


On Wed, Jun 15, 2011 at 11:06 PM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>I have no problems with -mtune=Bulldozer. But I object to the -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
> Something must be broken in the unaligned load splitting in generic mode.
>
> While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for -mtune=bdver1, splitting
> unaligned loads in generic mode is KILLING us:
>
> For 459.GemsFDTD (ref) on Bulldozer,
> -Ofast -mavx -mno-avx256-split-unaligned-load:  480s
> -Ofast -mavx                                 : 2527s
>
> So splitting unaligned loads makes the program run 5 to 6 times slower!
>
> For 434.zeusmp train run
> -Ofast -mavx -mno-avx256-split-unaligned-load: 32.5s
> -Ofast -mavx                                 :  106s
>
> Other tests are on-going!

I suspect that the split loads get further split into mov[lh]ps pieces?
We do that for SSE moves with generic tuning at least IIRC.

Richard.

>
> Changpeng.
>
>
>
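One way to check the suspicion about further splitting would be to compile a
small loop with possibly unaligned accesses under both flag settings and compare
the generated assembly. The following is only a minimal sketch, assuming a GCC
with AVX support; the function name add_unaligned and the file name t.c are made
up for illustration and do not come from the benchmarks discussed above.

```c
#include <stddef.h>

/*
 * Hypothetical test kernel (not taken from SPEC): a loop the vectorizer
 * can handle, where neither pointer is known to be 32-byte aligned.
 *
 * Assumed way to compare the two settings from the thread:
 *   gcc -Ofast -mavx -S t.c -o split.s
 *   gcc -Ofast -mavx -mno-avx256-split-unaligned-load -S t.c -o nosplit.s
 *
 * Then look at how the 256-bit loads from src are emitted in each file:
 * a single unaligned ymm load, a 128-bit load followed by an insert of
 * the upper half, or even smaller mov[lh]ps pieces.  What actually shows
 * up depends on the compiler version and its tuning defaults.
 */
void add_unaligned(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Calling it with pointers offset by one element (for example a + 1 and b + 1)
guarantees that the accesses are not 32-byte aligned at run time.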

