This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Emit vperm2[if]128 $0x12/$0x20 as vinsert[if]128 $0/$1


Hi!

I think emitting a more specialized insn instead of a more generic one
is at least more readable, and it could perhaps be faster on some CPUs
(on SandyBridge the speed is the same).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The attachment contains my first attempt, which did this in the
expander. Unfortunately that turned out to be a pessimization: IRA or
reload seems to have issues with the subregs, and on
#include <immintrin.h>
#include <stdio.h>

__m256i a, b, c, d, e, f;

__attribute__((noinline, noclone)) void
f1 (void)
{
  a = _mm256_permute2f128_si256 (e, f, 0x12);
  b = _mm256_permute2f128_si256 (e, f, 0x20);
}
both vinsert* insns were using a memory operand instead of
loading it into a register first (as done in vanilla gcc as well
as with the patch right below).
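
For anyone who wants to check the lane arithmetic behind the two
special cases, here is a minimal scalar sketch in C. It is not part of
the patch and does not use the real intrinsics: the lane128/vec256
types and the model_* and masks_match names are hypothetical, made up
for this illustration, and the model assumes the immediate has the
zeroing bits (3 and 7) clear, as in the _nozero pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: a 256-bit vector as two 128-bit lanes.  */
typedef struct { uint64_t q[2]; } lane128;
typedef struct { lane128 lane[2]; } vec256;	/* lane[0] is the low half */

/* Model of vperm2[fi]128 with the zeroing bits clear: imm[1:0] picks
   the low result lane and imm[5:4] the high one, where
   0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high.  */
static vec256
model_vperm2x128 (vec256 a, vec256 b, int imm)
{
  const lane128 src[4] = { a.lane[0], a.lane[1], b.lane[0], b.lane[1] };
  vec256 r;

  assert ((imm & 0x88) == 0);	/* assumption: no lane zeroing */
  r.lane[0] = src[imm & 3];
  r.lane[1] = src[(imm >> 4) & 3];
  return r;
}

/* Model of vinsert[fi]128: copy a, then replace lane idx with x.  */
static vec256
model_vinsertx128 (vec256 a, lane128 x, int idx)
{
  vec256 r = a;

  assert (idx == 0 || idx == 1);
  r.lane[idx] = x;
  return r;
}

static int
same (vec256 x, vec256 y)
{
  return memcmp (&x, &y, sizeof x) == 0;
}

/* Returns 1 if masks 0x12 and 0x20 match vinsert with $0 and $1.  */
int
masks_match (void)
{
  vec256 a = { .lane = { { .q = { 1, 2 } }, { .q = { 3, 4 } } } };
  vec256 b = { .lane = { { .q = { 5, 6 } }, { .q = { 7, 8 } } } };

  return same (model_vperm2x128 (a, b, 0x12),
	       model_vinsertx128 (a, b.lane[0], 0))
	 && same (model_vperm2x128 (a, b, 0x20),
		  model_vinsertx128 (a, b.lane[0], 1));
}
```

In this model mask 0x12 yields { b.low, a.high } and mask 0x20 yields
{ a.low, b.low }, i.e. the low half of the second operand inserted
into lane 0 or lane 1 of the first, which is why only %x2 is needed in
the vinsert templates.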

2011-11-07  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/sse.md (*avx_vperm2f128<mode>_nozero): Emit mask
	0x12 and 0x20 as vinsert[fi]128 instead of vperm2[fi]128.

--- gcc/config/i386/sse.md.jj	2011-11-07 12:40:55.000000000 +0100
+++ gcc/config/i386/sse.md	2011-11-07 17:50:37.000000000 +0100
@@ -12073,6 +12073,10 @@ (define_insn "*avx_vperm2f128<mode>_noze
    && avx_vperm2f128_parallel (operands[3], <MODE>mode)"
 {
   int mask = avx_vperm2f128_parallel (operands[3], <MODE>mode) - 1;
+  if (mask == 0x12)
+    return "vinsert<i128>\t{$0, %x2, %1, %0|%0, %1, %x2, 0}";
+  if (mask == 0x20)
+    return "vinsert<i128>\t{$1, %x2, %1, %0|%0, %1, %x2, 1}";
   operands[3] = GEN_INT (mask);
   return "vperm2<i128>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }

	Jakub

Attachment: X321
Description: Text document

