This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386 tuning] Generate 128-bit AVX by default for bdver1


On Fri, Feb 11, 2011 at 1:46 AM, Richard Guenther <rguenther@suse.de> wrote:
> On Thu, 10 Feb 2011, Fang, Changpeng wrote:
>
>> Hi,
>>
>> Attached is the patch to force gcc to generate 128-bit avx instructions for bdver1. We found that for
>> the current Bulldozer processors, AVX128 performs better than AVX256. For example, AVX128 is 3%
>> faster than AVX256 on CFP2006, and 2~3% faster than AVX256 on polyhedron.
>>
>> As a result, we prefer gcc 4.6 to generate 128-bit avx instructions only (for bdver1).
>>
>> The patch passed bootstrapping on x86_64-unknown-linux-gnu with "-O3 -g -march=bdver1" and
>> the necessary correctness and performance tests.
>>
>> Is it OK to commit to trunk?
>
> I think there was no attempt to tune anything for AVX256, in particular
> the vectorizer cost model may be completely off.  HJ and Andi also
> hinted at some alignment problems (at least SB seems to have a large
> penalty when loads cross a cacheline boundary).  So - did you do any
> investigation on why 256bit vectors are slower for you?  Are these
> cases that the cost model could easily catch?
>
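
To make the cacheline remark above concrete, here is a small sketch of how
one could test whether an access straddles a line boundary.  The helper
name crosses_cache_line is hypothetical, and the 64-byte line size used in
the usage note is an assumption; the thread does not state a line size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the `size` bytes starting at `addr`
   touch more than one cache line of `line` bytes, 0 otherwise.
   Assumes size > 0 and line > 0.  */
int
crosses_cache_line (uintptr_t addr, size_t size, size_t line)
{
  uintptr_t first = addr / line;              /* line holding the first byte */
  uintptr_t last = (addr + size - 1) / line;  /* line holding the last byte */
  return first != last;
}
```

With an assumed 64-byte line, a 32-byte load at offset 32 stays inside one
line, while the same load at offset 48 spans two.  A 16-byte load at offset
48 does not cross, which is one reason splitting a 256-bit access into two
128-bit halves may help.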

Here is a patch to split 32-byte unaligned loads/stores.  I don't have
performance numbers for it.
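
For readers who want to see the idea at source level, here is a rough
sketch of what "splitting" a 32-byte unaligned access means: the 256-bit
value is moved as two 128-bit halves.  This is not code from the patch.
The function names load256_split and store256_split are made up for
illustration, and the instructions the compiler actually emits may differ.
When built without AVX enabled, the sketch falls back to plain memcpy so
that it still compiles and behaves the same.

```c
#include <string.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: read 8 floats from a possibly misaligned `src`
   as two 16-byte halves and write them to `dst`.  */
void
load256_split (float dst[8], const float *src)
{
#ifdef __AVX__
  __m128 lo = _mm_loadu_ps (src);      /* low 128 bits */
  __m128 hi = _mm_loadu_ps (src + 4);  /* high 128 bits */
  /* Combine the two halves into one 256-bit register.  */
  __m256 v = _mm256_insertf128_ps (_mm256_castps128_ps256 (lo), hi, 1);
  _mm256_storeu_ps (dst, v);
#else
  /* Portable model: two 16-byte copies (assumes 4-byte float).  */
  memcpy (dst, src, 16);
  memcpy (dst + 4, src + 4, 16);
#endif
}

/* Hypothetical helper: write 8 floats to a possibly misaligned `dst`
   as two 16-byte halves.  */
void
store256_split (float *dst, const float src[8])
{
#ifdef __AVX__
  __m256 v = _mm256_loadu_ps (src);
  _mm_storeu_ps (dst, _mm256_castps256_ps128 (v));        /* low half */
  _mm_storeu_ps (dst + 4, _mm256_extractf128_ps (v, 1));  /* high half */
#else
  memcpy (dst, src, 16);
  memcpy (dst + 4, src + 4, 16);
#endif
}
```

Both helpers give the same result as a single 32-byte unaligned move; only
the access width changes.  Whether the split is a win depends on the
processor, which is why the patch puts it behind options.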

-- 
H.J.
----
gcc/

2011-02-11  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (flag_opts): Add -mavx256-split-unaligned-load
	and -mavx256-split-unaligned-store.
	(ix86_option_override_internal): Split 32-byte AVX unaligned
	load/store by default.
	(ix86_avx256_split_vector_move_misalign): New.
	(ix86_expand_vector_move_misalign): Use it.

	* config/i386/i386.opt: Add -mavx256-split-unaligned-load and
	-mavx256-split-unaligned-store.

	* config/i386/sse.md (*avx_mov<mode>_internal): Verify unaligned
	256bit load/store.  Generate unaligned store on misaligned memory
	operand.
	(*avx_movu<ssemodesuffix><avxmodesuffix>): Verify unaligned
	256bit load/store.
	(*avx_movdqu<avxmodesuffix>): Likewise.

	* doc/invoke.texi: Document -mavx256-split-unaligned-load and
	-mavx256-split-unaligned-store.

gcc/testsuite/

2011-02-11  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx256-unaligned-load-1.c: New.
	* gcc.target/i386/avx256-unaligned-load-2.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-3.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-4.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-5.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-6.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-7.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-2.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-5.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-6.c: Likewise.
	* gcc.target/i386/avx256-unaligned-store-7.c: Likewise.

Attachment: gcc-avx256-unaligned-1.patch
Description: Text document

