Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads


On Wed, Oct 30, 2013 at 11:00:13AM +0100, Jakub Jelinek wrote:
> But the above is a 16 byte unaligned load.  Furthermore, GCC supports
> -mavx256-split-unaligned-load and can emit 32 byte loads either as an
> unaligned 32 byte load, or merge of 16 byte unaligned loads.  The patch
> affects only the cases where we were already emitting 16 byte or 32 byte
> unaligned loads rather than split loads.
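
(For concreteness, here is an illustrative intrinsics sketch of the two
strategies, not taken from the original testcase; the function names
are invented for the example.)

#include <immintrin.h>

/* One 32 byte unaligned load (a single vmovdqu of a %ymm register).  */
__m256i
load_unsplit (const int *p)
{
  return _mm256_loadu_si256 ((const __m256i *) p);
}

/* What -mavx256-split-unaligned-load generates instead: two 16 byte
   unaligned loads merged with vinserti128.  */
__m256i
load_split (const int *p)
{
  __m128i lo = _mm_loadu_si128 ((const __m128i *) p);
  __m128i hi = _mm_loadu_si128 ((const __m128i *) (p + 4));
  return _mm256_inserti128_si256 (_mm256_castsi128_si256 (lo), hi, 1);
}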

With my patch, the differences (in all cases only in f1) are as follows.
For -O2 -mavx -ftree-vectorize (16 byte unaligned load, not split):
-	vmovdqu	(%rsi,%rax), %xmm0
-	vpmulld	%xmm1, %xmm0, %xmm0
+	vpmulld	(%rsi,%rax), %xmm1, %xmm0
 	vmovups	%xmm0, (%rdi,%rax)
with -O2 -mavx2 -ftree-vectorize (again, the load wasn't split):
-	vmovdqu	(%rsi,%rax), %ymm0
-	vpmulld	%ymm1, %ymm0, %ymm0
+	vpmulld	(%rsi,%rax), %ymm1, %ymm0
 	vmovups	%ymm0, (%rdi,%rax)
and with -O2 -mavx2 -mavx256-split-unaligned-load:
 	vmovdqu	(%rsi,%rax), %xmm0
 	vinserti128	$0x1, 16(%rsi,%rax), %ymm0, %ymm0
-	vpmulld	%ymm1, %ymm0, %ymm0
+	vpmulld	%ymm0, %ymm1, %ymm0
 	vmovups	%ymm0, (%rdi,%rax)
(the last change just gives the RTL optimizers more freedom by not
doing the SUBREG on the lhs).
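
(f1 itself is not quoted in this excerpt; a minimal loop of the
following shape, my guess at the testcase rather than the original
source, vectorizes into the sequences shown above when compiled with
the flags listed.)

void
f1 (int *__restrict dst, const int *__restrict src, int mul, int n)
{
  int i;
  /* Multiplication by a loop-invariant value: the vectorizer emits an
     unaligned load of src feeding vpmulld, which the patch allows to
     be folded into the multiply's memory operand.  */
  for (i = 0; i < n; i++)
    dst[i] = src[i] * mul;
}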

	Jakub

