[PATCH, i386 tuning] Generate 128-bit AVX by default for bdver1

Richard Guenther rguenther@suse.de
Fri Feb 11 09:47:00 GMT 2011


On Thu, 10 Feb 2011, Fang, Changpeng wrote:

> Hi, 
> 
>  Attached is the patch to force GCC to generate 128-bit AVX instructions for bdver1. We found that for
> the current Bulldozer processors, AVX128 performs better than AVX256. For example, AVX128 is 3%
> faster than AVX256 on CFP2006, and 2-3% faster than AVX256 on Polyhedron.
> 
> As a result, we prefer GCC 4.6 to generate 128-bit AVX instructions only (for bdver1).
> 
> The patch passed bootstrapping on x86_64-unknown-linux-gnu with "-O3 -g -march=bdver1" and
> the necessary correctness and performance tests.
> 
> Is it OK to commit to trunk?

I think there was no attempt to tune anything for AVX256; in particular,
the vectorizer cost model may be completely off.  HJ and Andi also
hinted at some alignment problems (at least Sandy Bridge seems to have a
large penalty when loads cross a cacheline boundary).  So, did you do any
investigation on why 256-bit vectors are slower for you?  Are these
cases that the cost model could easily catch?

Thanks,
Richard.
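
To make the cacheline-crossing concern above concrete, here is a minimal
sketch in C that counts how many vector loads in a contiguous stream would
span two cache lines.  The 64-byte line size and the helper names
(crosses_cache_line, count_split_loads) are assumptions for illustration
only; nothing here is taken from the patch or from the GCC cost model, and
it says nothing about the actual penalty on any given processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is common on x86 but not guaranteed. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: returns 1 if a load of `width` bytes starting at
   `addr` touches two different cache lines, 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, size_t width)
{
    if (width == 0)
        return 0;
    return (addr / CACHE_LINE_BYTES)
           != ((addr + width - 1) / CACHE_LINE_BYTES);
}

/* Hypothetical helper: counts the line-crossing loads when reading `n`
   consecutive vectors of `width` bytes, starting at address `base`. */
static size_t count_split_loads(uintptr_t base, size_t width, size_t n)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        splits += (size_t)crosses_cache_line(base + i * width, width);
    return splits;
}
```

For a buffer that is 16-byte aligned but not 32-byte aligned, for example
count_split_loads(16, 32, 8), every second 32-byte (256-bit) load crosses a
line, while 16-byte (128-bit) loads from the same base never do.  Whether
this accounts for any of the measured difference would still need to be
checked on real hardware.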


