This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Fix various avx512 extraction issues (PR target/80206)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 30 Mar 2017 00:36:20 +0200
- Subject: [PATCH] Fix various avx512 extraction issues (PR target/80206)
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx03.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx03.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=jakub at redhat dot com
- Dkim-filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 69EF380F8E
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 69EF380F8E
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
Hi!
As the testcase shows, we ICE with -mavx512f -ffloat-store, because
at -O0 during expansion the destination is MEM, and the corresponding dup
operand is some pseudo. There are *_mask patterns that have just
register_operand / =v for the desination and vector_move_operand / 0C
for the corresponding dup operand (but this doesn't apply when the
destination is MEM), and then *_maskm patterns, that have
memory_operand / =m and corresponding dup operand memory_operand / 0,
but also requires rtx_equal_p between them in the condition, so that
doesn't match either.
The expanders have weirdo:
if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
operands[0] = force_reg (<ssequartermode>mode, operands[0]);
which can't really ever work, because the expander's caller expects
the output to be stored in the original operands[0], but that is not
where it stores it. Furthermore, force_reg makes no sense for the
output operand.
The following patch should fix that, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?
There are still some remaining issues that can perhaps be resolved
incrementally, e.g. some insns use:
(define_insn "vec_extract_hi_<mode><mask_name>"
[(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>,vm")
If <mask_applied>, <store_mask_predicate> is register_operand, so
having vm constraint for it is strange. Not really sure how well
it can work with vector_move_operand and 0C constraint, what will
LRA do with it if the input isn't in memory but dest is, or if both
are memory, but not the same one.
2017-03-28 Jakub Jelinek <jakub@redhat.com>
PR target/80206
* config/i386/sse.md
(<extract_type>_vextract<shuffletype><extract_suf>_mask): Force
dest into register whenever it is a MEM not rtx_equal_p to the
corresponding dup operand, and when forcing into reg move the
reg into the memory afterwards.
(<extract_type_2>_vextract<shuffletype><extract_suf_2>_mask):
Likewise. Use <ssehalfvecmode> instead of <ssequartermode>
for the force_reg mode.
(avx512vl_vextractf128<mode>): Force dest into register either
always when a MEM, or when it is a MEM not rtx_equal_p to the
corresponding dup operand, or even not when it is a CONST_VECTOR
depending on the mode and lo vs. hi.
(avx512dq_vextract<shuffletype>64x2_1_maskm): Remove extraneous
parens.
(avx512f_vextract<shuffletype>32x4_1_maskm): Likewise.
(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name>):
Likewise. Require that operands[2] is even.
(<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>):
Remove extraneous parens. Require that operands[2] is a multiple
of 4.
(vec_extract_lo_<mode><mask_name>): Don't bother testing if
operands[0] is a MEM if <mask_applied>, the predicates/constraints
disallow memory then.
* gcc.target/i386/pr80206.c: New test.
--- gcc/config/i386/sse.md.jj 2017-03-07 09:10:56.946428168 +0100
+++ gcc/config/i386/sse.md 2017-03-29 19:22:37.394215557 +0200
@@ -7135,19 +7135,22 @@ (define_expand "<extract_type>_vextract<
{
int mask;
mask = INTVAL (operands[2]);
+ rtx dest = operands[0];
- if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
- operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+ if (MEM_P (operands[0]) && !rtx_equal_p (operands[0], operands[3]))
+ dest = force_reg (<ssequartermode>mode, dest);
if (<MODE>mode == V16SImode || <MODE>mode == V16SFmode)
- emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+ emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (dest,
operands[1], GEN_INT (mask * 4), GEN_INT (mask * 4 + 1),
GEN_INT (mask * 4 + 2), GEN_INT (mask * 4 + 3), operands[3],
operands[4]));
else
- emit_insn (gen_avx512dq_vextract<shuffletype>64x2_1_mask (operands[0],
+ emit_insn (gen_avx512dq_vextract<shuffletype>64x2_1_mask (dest,
operands[1], GEN_INT (mask * 2), GEN_INT (mask * 2 + 1), operands[3],
operands[4]));
+ if (dest != operands[0])
+ emit_move_insn (operands[0], dest);
DONE;
})
@@ -7161,8 +7164,8 @@ (define_insn "avx512dq_vextract<shufflet
(match_operand:<ssequartermode> 4 "memory_operand" "0")
(match_operand:QI 5 "register_operand" "Yk")))]
"TARGET_AVX512DQ
- && (INTVAL (operands[2]) % 2 == 0)
- && (INTVAL (operands[2]) == INTVAL (operands[3]) - 1)
+ && INTVAL (operands[2]) % 2 == 0
+ && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
&& rtx_equal_p (operands[4], operands[0])"
{
operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1);
@@ -7187,13 +7190,13 @@ (define_insn "avx512f_vextract<shufflety
(match_operand:<ssequartermode> 6 "memory_operand" "0")
(match_operand:QI 7 "register_operand" "Yk")))]
"TARGET_AVX512F
- && ((INTVAL (operands[2]) % 4 == 0)
- && INTVAL (operands[2]) == (INTVAL (operands[3]) - 1)
- && INTVAL (operands[3]) == (INTVAL (operands[4]) - 1)
- && INTVAL (operands[4]) == (INTVAL (operands[5]) - 1))
+ && INTVAL (operands[2]) % 4 == 0
+ && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
+ && INTVAL (operands[3]) == INTVAL (operands[4]) - 1
+ && INTVAL (operands[4]) == INTVAL (operands[5]) - 1
&& rtx_equal_p (operands[6], operands[0])"
{
- operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
+ operands[2] = GEN_INT (INTVAL (operands[2]) >> 2);
return "vextract<shuffletype>32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}";
}
[(set_attr "type" "sselog")
@@ -7209,9 +7212,11 @@ (define_insn "<mask_codefor>avx512dq_vex
(match_operand:V8FI 1 "register_operand" "v")
(parallel [(match_operand 2 "const_0_to_7_operand")
(match_operand 3 "const_0_to_7_operand")])))]
- "TARGET_AVX512DQ && (INTVAL (operands[2]) == INTVAL (operands[3]) - 1)"
+ "TARGET_AVX512DQ
+ && INTVAL (operands[2]) % 2 == 0
+ && INTVAL (operands[2]) == INTVAL (operands[3]) - 1"
{
- operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1);
+ operands[2] = GEN_INT (INTVAL (operands[2]) >> 1);
return "vextract<shuffletype>64x2\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}";
}
[(set_attr "type" "sselog1")
@@ -7229,11 +7234,12 @@ (define_insn "<mask_codefor>avx512f_vext
(match_operand 4 "const_0_to_15_operand")
(match_operand 5 "const_0_to_15_operand")])))]
"TARGET_AVX512F
- && (INTVAL (operands[2]) == (INTVAL (operands[3]) - 1)
- && INTVAL (operands[3]) == (INTVAL (operands[4]) - 1)
- && INTVAL (operands[4]) == (INTVAL (operands[5]) - 1))"
+ && INTVAL (operands[2]) % 4 == 0
+ && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
+ && INTVAL (operands[3]) == INTVAL (operands[4]) - 1
+ && INTVAL (operands[4]) == INTVAL (operands[5]) - 1"
{
- operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
+ operands[2] = GEN_INT (INTVAL (operands[2]) >> 2);
return "vextract<shuffletype>32x4\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
}
[(set_attr "type" "sselog1")
@@ -7260,9 +7266,10 @@ (define_expand "<extract_type_2>_vextrac
"TARGET_AVX512F"
{
rtx (*insn)(rtx, rtx, rtx, rtx);
+ rtx dest = operands[0];
- if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
- operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+ if (MEM_P (dest) && !rtx_equal_p (dest, operands[3]))
+ dest = force_reg (<ssehalfvecmode>mode, dest);
switch (INTVAL (operands[2]))
{
@@ -7276,7 +7283,9 @@ (define_expand "<extract_type_2>_vextrac
gcc_unreachable ();
}
- emit_insn (insn (operands[0], operands[1], operands[3], operands[4]));
+ emit_insn (insn (dest, operands[1], operands[3], operands[4]));
+ if (dest != operands[0])
+ emit_move_insn (operands[0], dest);
DONE;
})
@@ -7317,7 +7326,8 @@ (define_insn "vec_extract_lo_<mode><mask
(match_operand:V8FI 1 "nonimmediate_operand" "v,m")
(parallel [(const_int 0) (const_int 1)
(const_int 2) (const_int 3)])))]
- "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+ "TARGET_AVX512F
+ && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
{
if (<mask_applied> || !TARGET_AVX512VL)
return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
@@ -7411,10 +7421,19 @@ (define_expand "avx512vl_vextractf128<mo
"TARGET_AVX512DQ && TARGET_AVX512VL"
{
rtx (*insn)(rtx, rtx, rtx, rtx);
+ rtx dest = operands[0];
- if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
- operands[0] = force_reg (<ssehalfvecmode>mode, operands[0]);
-
+ if (MEM_P (dest)
+ && (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) == 4
+ /* For V8S[IF]mode there are maskm insns with =m and 0
+ constraints. */
+ ? !rtx_equal_p (dest, operands[3])
+ /* For V4D[IF]mode, hi insns don't allow memory, and
+ lo insns have =m and 0C constraints. */
+ : (operands[2] != const0_rtx
+ || (!rtx_equal_p (dest, operands[3])
+ && GET_CODE (operands[3]) != CONST_VECTOR))))
+ dest = force_reg (<ssehalfvecmode>mode, dest);
switch (INTVAL (operands[2]))
{
case 0:
@@ -7427,7 +7446,9 @@ (define_expand "avx512vl_vextractf128<mo
gcc_unreachable ();
}
- emit_insn (insn (operands[0], operands[1], operands[3], operands[4]));
+ emit_insn (insn (dest, operands[1], operands[3], operands[4]));
+ if (dest != operands[0])
+ emit_move_insn (operands[0], dest);
DONE;
})
--- gcc/testsuite/gcc.target/i386/pr80206.c.jj 2017-03-29 19:25:00.167347884 +0200
+++ gcc/testsuite/gcc.target/i386/pr80206.c 2017-03-29 19:24:35.000000000 +0200
@@ -0,0 +1,14 @@
+/* PR target/80206 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -ffloat-store" } */
+
+#include <immintrin.h>
+
+__m512d a;
+__m256d b;
+
+void
+foo (__m256d *p)
+{
+ *p = _mm512_mask_extractf64x4_pd (b, 1, a, 1);
+}
Jakub