This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Fri, Feb 11, 2011 at 1:46 AM, Richard Guenther <rguenther@suse.de> wrote:
> On Thu, 10 Feb 2011, Fang, Changpeng wrote:
>
>> Hi,
>>
>> Attached is the patch to force gcc to generate 128-bit avx instructions for bdver1. We found that for
>> the current Bulldozer processors, AVX128 performs better than AVX256. For example, AVX128 is 3%
>> faster than AVX256 on CFP2006, and 2~3% faster than AVX256 on polyhedron.
>>
>> As a result, we prefer gcc 4.6 to generate 128-bit avx instructions only (for bdver1).
>>
>> The patch passed bootstrapping on x86_64-unknown-linux-gnu with "-O3 -g -march=bdver1" and
>> the necessary correctness and performance.
>>
>> Is it OK to commit to trunk?
>
> I think there was no attempt to tune anything for AVX256, in particular
> the vectorizer cost model may be completely off. HJ and Andi also
> hinted at some alignment problems (at least SB seems to have a large
> penalty when loads cross a cacheline boundary). So - did you do any
> investigation on why 256bit vectors are slower for you? Are these
> cases that the cost model could easily catch?
>

Here is a patch to split 32-byte unaligned loads and stores. I don't have
performance numbers for it.

--
H.J.
----
gcc/

2011-02-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (flag_opts): Add -mavx256-split-unaligned-load
	and -mavx256-split-unaligned-store.
	(ix86_option_override_internal): Split 32-byte AVX unaligned
	load/store by default.
	(ix86_avx256_split_vector_move_misalign): New.
	(ix86_expand_vector_move_misalign): Use it.

	* config/i386/i386.opt: Add -mavx256-split-unaligned-load and
	-mavx256-split-unaligned-store.

	* config/i386/sse.md (*avx_mov<mode>_internal): Verify unaligned
	256bit load/store. Generate unaligned store on misaligned memory
	operand.
	(*avx_movu<ssemodesuffix><avxmodesuffix>): Verify unaligned
	256bit load/store.
	(*avx_movdqu<avxmodesuffix>): Likewise.

	* doc/invoke.texi: Document -mavx256-split-unaligned-load and
	-mavx256-split-unaligned-store.

gcc/testsuite/

2011-02-11  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx256-unaligned-load-1.c: New.
	* gcc.target/i386/avx256-unaligned-load-2.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-3.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-4.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-5.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-6.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-7.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-2.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-5.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-6.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-7.c: Likewise.
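
For readers unfamiliar with what "splitting" a 32-byte unaligned load means, the following is a rough, portable model of the idea and not code from the attached patch. It assumes the split amounts to reading the low and high 16-byte halves separately and combining them; in intrinsics terms, two `_mm_loadu_ps` loads joined with `_mm256_insertf128_ps` in place of one `_mm256_loadu_ps`. The types `vec128` and `vec256` and all function names are hypothetical stand-ins for vector registers, and `memcpy` stands in for the actual move instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for 128-bit and 256-bit vector registers. */
typedef struct { unsigned char b[16]; } vec128;
typedef struct { unsigned char b[32]; } vec256;

/* Models a single 32-byte unaligned load. */
static vec256 load256_unaligned(const void *p)
{
    vec256 v;
    memcpy(v.b, p, sizeof v.b);
    return v;
}

/* Models a single 16-byte unaligned load. */
static vec128 load128_unaligned(const void *p)
{
    vec128 v;
    memcpy(v.b, p, sizeof v.b);
    return v;
}

/* Models the split form: load each 16-byte half on its own, then
   place the second half in the upper lane of the 32-byte result. */
static vec256 load256_split(const void *p)
{
    const unsigned char *q = p;
    vec128 lo = load128_unaligned(q);
    vec128 hi = load128_unaligned(q + 16);
    vec256 v;
    memcpy(v.b, lo.b, sizeof lo.b);
    memcpy(v.b + 16, hi.b, sizeof hi.b);
    return v;
}

/* Returns 1 when both strategies read the same 32 bytes starting at
   the given byte offset into a test buffer. */
static int split_matches_full(size_t offset)
{
    unsigned char buf[96];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i * 7 + 3);
    assert(offset + 32 <= sizeof buf);

    vec256 full = load256_unaligned(buf + offset);
    vec256 split = load256_split(buf + offset);
    return memcmp(full.b, split.b, sizeof full.b) == 0;
}
```

Calling `split_matches_full` with any offset up to 64 checks that the two forms yield identical bytes. The model only demonstrates that the split preserves the result; whether it is faster depends on the hardware, and as noted above no performance numbers are available yet.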

Attachment: gcc-avx256-unaligned-1.patch
Description: Text document