Bug 117072 - [15 Regression] FAIL: gcc.target/i386/cond_op_fma_{float,double,_Float16}-1.c since r15-3509-gd34cda72098867
Summary: [15 Regression] FAIL: gcc.target/i386/cond_op_fma_{float,double,_Float16}-1.c...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 15.0
Importance: P1 normal
Target Milestone: 15.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, testsuite-fail
Duplicates: 117073
Depends on:
Blocks: vectorizer
 
Reported: 2024-10-10 23:46 UTC by H.J. Lu
Modified: 2024-10-17 02:02 UTC (History)
CC List: 4 users

See Also:
Host:
Target: x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-10-11 00:00:00


Description H.J. Lu 2024-10-10 23:46:12 UTC
On GCC 15 branch, I got

FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
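For readers unfamiliar with scan-assembler-times: each FAIL line above means the regex did not match the expected number of times (1) in the generated assembly. The following is only a rough sketch of that check using std::regex instead of the Tcl regex engine the testsuite uses. The helper name count_masked_fma and the two assembler snippets are made up for illustration, and the braces are escaped so that std::regex does not read them as a repetition count.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Count how often a masked 256-bit FMA with the given mnemonic appears.
// The pattern follows the scan-assembler-times regex from the report:
// mnemonic, operands without '{', a %ymm destination, then a {%kN} mask.
int count_masked_fma(const std::string& asm_text, const std::string& mnemonic) {
    const std::regex re(mnemonic +
        R"([ \t]+[^{\n]*%ymm[0-9]+\{%k[1-7]\}(?:\n|[ \t]+#))");
    return static_cast<int>(std::distance(
        std::sregex_iterator(asm_text.begin(), asm_text.end(), re),
        std::sregex_iterator()));
}

// Hypothetical output where the mask is folded into the FMA instruction.
const std::string kFused =
    "\tvfnmsub132ps\te-32(%rax), %ymm2, %ymm0{%k1}\n";

// Hypothetical output where the FMA is unmasked and a separate masked move
// performs the blend; the scanned mnemonic never appears with a mask.
const std::string kSplit =
    "\tvfnmsub213ps\ta-32(%rax), %ymm0, %ymm1\n"
    "\tvmovaps\t%ymm1, %ymm0{%k1}\n";
```

With these inputs the fused form is counted once and the split form not at all, which is the kind of difference the failing tests report.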
Comment 1 Andrew Pinski 2024-10-11 00:04:28 UTC
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662523.html

d34cda720988674bcf8a24267c9e1ec61335d6de is the first bad commit
commit d34cda720988674bcf8a24267c9e1ec61335d6de
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Sep 29 12:54:17 2023 +0200

    Handle non-grouped stores as single-lane SLP
Comment 2 Andrew Pinski 2024-10-11 00:04:55 UTC
*** Bug 117073 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Pinski 2024-10-11 00:10:49 UTC
See https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662257.html which mentions this failure explicitly.
Comment 4 Andrew Pinski 2024-10-11 00:10:55 UTC
.
Comment 5 Richard Biener 2024-10-11 11:03:58 UTC
Compared to GCC 14, I now get, for example for cond_op_fma__Float16-1.c:

foo1_fnms:
.LFB7:
        .cfi_startproc
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L24:
        vmovdqa b(%rax), %ymm1
        vmovdqa d(%rax), %ymm0
        addq    $32, %rax
        vcmpph  $1, c-32(%rax), %ymm1, %k1
        vmovdqa e-32(%rax), %ymm1
        vfnmsub213ph    a-32(%rax), %ymm0, %ymm1
        vmovdqu16       %ymm1, %ymm0{%k1}
        vmovdqa %ymm0, a-32(%rax)
        cmpq    $1600, %rax
        jne     .L24
        vzeroupper
        ret

instead of the expected

foo1_fnms:
.LFB7:
        .cfi_startproc
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L24:
        vmovdqa b(%rax), %ymm1
        vmovdqa a(%rax), %ymm2
        addq    $32, %rax
        vmovdqa d-32(%rax), %ymm0
        vcmpph  $1, c-32(%rax), %ymm1, %k1
        vfnmsub132ph    e-32(%rax), %ymm2, %ymm0{%k1}
        vmovdqa %ymm0, a-32(%rax)
        cmpq    $1600, %rax
        jne     .L24
        vzeroupper
        ret

The .combine dump shows in GCC 14:

Trying 15 -> 16:
   15: r113:V16HF={-r102:V16HF*[r98:DI+`e']+-[r98:DI+`a']}
   16: r99:V16HF=vec_merge(r113:V16HF,r102:V16HF,r110:HI)
      REG_DEAD r113:V16HF
      REG_DEAD r110:HI
      REG_DEAD r102:V16HF
Successfully matched this instruction:
(set (reg:V16HF 99 [ _37 ])
    (vec_merge:V16HF (fma:V16HF (neg:V16HF (reg:V16HF 102 [ vect_pretmp_14.315 ]))
            (mem:V16HF (plus:DI (reg:DI 98 [ ivtmp.333 ])
                    (symbol_ref:DI ("e") [flags 0x2]  <var_decl 0x7ffff6810ea0 e>)) [1 MEM <vector(16) _Float16> [(_Float16 *)&e + ivtmp.333_9 * 1]+0 S32 A256])
            (neg:V16HF (mem:V16HF (plus:DI (reg:DI 98 [ ivtmp.333 ])
                        (symbol_ref:DI ("a") [flags 0x2]  <var_decl 0x7ffff6810c60 a>)) [1 MEM <vector(16) _Float16> [(_Float16 *)&a + ivtmp.333_9 * 1]+0 S32 A256])))
        (reg:V16HF 102 [ vect_pretmp_14.315 ])
        (reg:HI 110 [ mask__11.325_55 ])))

but in GCC 15:

Trying 15 -> 16:
   15: r113:V16HF={-[r98:DI+`e']*r104:V16HF+-[r98:DI+`a']}
   16: r99:V16HF=vec_merge(r113:V16HF,r104:V16HF,r110:HI)
      REG_DEAD r113:V16HF
      REG_DEAD r110:HI
      REG_DEAD r104:V16HF
Failed to match this instruction:
(set (reg:V16HF 99 [ _37 ])
    (vec_merge:V16HF (fma:V16HF (neg:V16HF (mem:V16HF (plus:DI (reg:DI 98 [ ivtmp.329 ])
                        (symbol_ref:DI ("e") [flags 0x2]  <var_decl 0x7ffff6810ea0 e>)) [1 MEM <vector(16) _Float16> [(_Float16 *)&e + ivtmp.329_9 * 1]+0 S32 A256]))
            (reg:V16HF 104 [ vect_pretmp_14.315 ])
            (neg:V16HF (mem:V16HF (plus:DI (reg:DI 98 [ ivtmp.329 ])
                        (symbol_ref:DI ("a") [flags 0x2]  <var_decl 0x7ffff6810c60 a>)) [1 MEM <vector(16) _Float16> [(_Float16 *)&a + ivtmp.329_9 * 1]+0 S32 A256])))
        (reg:V16HF 104 [ vect_pretmp_14.315 ])
        (reg:HI 110 [ mask__11.309_43 ])))

see how the commutative multiply part of insn 15 differs and causes the
matching to fail:

good:     15: r113:V16HF={-r102:V16HF*[r98:DI+`e']+-[r98:DI+`a']}
bad:      15: r113:V16HF={-[r98:DI+`e']*r104:V16HF+-[r98:DI+`a']}

this ordering is already present on GIMPLE:

  vect_pretmp_14.315_45 = MEM <vector(16) _Float16> [(_Float16 *)&d + ivtmp.333_9 * 1];
  vect__5.322_52 = MEM <vector(16) _Float16> [(_Float16 *)&e + ivtmp.333_9 * 1];
  _37 = .COND_FNMS (mask__11.325_55, vect_pretmp_14.315_45, vect__5.322_52, vect__3.318_48, vect_pretmp_14.315_45);

vs.

  vect_pretmp_14.315_49 = MEM <vector(16) _Float16> [(_Float16 *)&d + ivtmp.329_9 * 1];
  vect__5.312_46 = MEM <vector(16) _Float16> [(_Float16 *)&e + ivtmp.329_9 * 1];
  _37 = .COND_FNMS (mask__11.309_43, vect__5.312_46, vect_pretmp_14.315_49, vect__3.319_53, vect_pretmp_14.315_49);

Both are canonicalized correctly (commutative operands are ordered by SSA name version).

This is a spurious difference. If we rely on these combines for the now
missed micro-optimization, we need to beef up the patterns to allow both
orders (avx512vl_fnmsub_v16hf_mask).

A target issue IMO?

Alternatively make sure RTL canonicalizes (fma (neg non-reg) (reg) ...)
to (fma (neg reg) (non-reg) ...) or stop matching that as pattern and
thus force RTL expansion + combine to arrive at the correct variant?
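To make the failure mode above concrete, here is a toy model of why the operand order matters once a pattern uses (match_dup 1). It is only a sketch: the types Operand and MaskedFma and the function matches_fnmsub_mask are invented for illustration and do not exist in GCC, and the matcher only mimics the shape of a *_fnmsub_*_mask pattern.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for RTL operands; not GCC data structures.
struct Operand {
    std::string name;   // e.g. "r102" or "mem:e"
    bool negated;       // true if wrapped in (neg ...)
};

// Models (vec_merge (fma op0 op1 op2) passthru mask).
struct MaskedFma {
    Operand op0, op1, op2;
    std::string passthru;
    std::string mask;
};

// Operand 1 sits under the first (neg ...), operand 3 under the second,
// and (match_dup 1) requires the pass-through value to be the same rtx
// as operand 1.
bool matches_fnmsub_mask(const MaskedFma& x) {
    return x.op0.negated && !x.op1.negated && x.op2.negated
        && x.passthru == x.op0.name;
}

// The combined insn as seen in the GCC 14 dump.
MaskedFma gcc14_form() {
    return {{"r102", true}, {"mem:e", false}, {"mem:a", true}, "r102", "r110"};
}

// The combined insn as seen in the failing dump: multiplicands swapped.
MaskedFma gcc15_form() {
    return {{"mem:e", true}, {"r104", false}, {"mem:a", true}, "r104", "r110"};
}
```

In this model matches_fnmsub_mask accepts gcc14_form() and rejects gcc15_form(), although both compute the same value.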
Comment 6 Richard Biener 2024-10-11 11:26:04 UTC
Btw, simplify-rtx does

      /* Canonicalize the two multiplication operands.  */
      /* a * -b + c  =>  -b * a + c.  */
      if (swap_commutative_operands_p (op0, op1))
        std::swap (op0, op1), any_change = true;

but it doesn't apply swap_commutative_operands_p to the argument of the
negate and the non-negated operand, i.e. -a * b -> -b * a.

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index e8e60404ef6..0c86c204529 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -6835,6 +6835,16 @@ simplify_context::simplify_ternary_operation (rtx_code code, machine_mode mode,
       if (swap_commutative_operands_p (op0, op1))
        std::swap (op0, op1), any_change = true;
 
+      /* Canonicalize -a * b + c to -b * a + c if a is not a register
+        but b is.  */
+      if (GET_CODE (op0) == NEG && REG_P (op1) && !REG_P (XEXP (op0, 0)))
+       {
+         op0 = XEXP (op0, 0);
+         op1 = simplify_gen_unary (NEG, mode, op1, mode);
+         std::swap (op0, op1);
+         any_change = true;
+       }
+
       if (any_change)
        return gen_rtx_FMA (mode, op0, op1, op2);
       return NULL_RTX;

fixes part of the observed regressions,

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index e8e60404ef6..13cb2cc0f5c 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -6832,9 +6832,20 @@ simplify_context::simplify_ternary_operation (rtx_code code, machine_mode mode,
 
       /* Canonicalize the two multiplication operands.  */
       /* a * -b + c  =>  -b * a + c.  */
-      if (swap_commutative_operands_p (op0, op1))
+      if (swap_commutative_operands_p (op0, op1)
+         || (REG_P (op1) && GET_CODE (op0) != NEG && !REG_P (op0)))
        std::swap (op0, op1), any_change = true;
 

fixes the rest.  I'm going to propose this.
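As a rough illustration of what the two hunks above do to the multiplication operands, here is a self-contained toy version. It is a sketch under simplifying assumptions: Term and canonicalize_mul_operands are hypothetical names, a term is either a register or not, and the existing swap_commutative_operands_p check is not modeled at all.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy multiplication operand; not a GCC rtx.
struct Term {
    std::string name;
    bool is_reg;    // underlying value is a register
    bool negated;   // wrapped in (neg ...)
};

// Returns true if the operands were changed.
bool canonicalize_mul_operands(Term& op0, Term& op1) {
    // First hunk: -a * b  =>  -b * a  if a is not a register but b is.
    if (op0.negated && !op0.is_reg && op1.is_reg && !op1.negated) {
        op0.negated = false;
        op1.negated = true;
        std::swap(op0, op1);
        return true;
    }
    // Second hunk: a * b  =>  b * a  if b is a plain register and a is
    // neither negated nor a register.
    if (op1.is_reg && !op1.negated && !op0.negated && !op0.is_reg) {
        std::swap(op0, op1);
        return true;
    }
    return false;
}
```

Applied to the failing case, a negated memory operand times a register becomes a negated register times a memory operand, which is the order seen in the GCC 14 dump.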
Comment 7 Richard Biener 2024-10-11 11:39:36 UTC
OTOH I'll note that no other simplify_* treats canonicalization as simplification and the existing swap_commutative_operands_p transform for FMA
is highly uncommon.

So why do we recognize (fma (neg (mem...)) ...) and not only (neg (register_operand))?
Comment 8 Hongtao Liu 2024-10-11 12:32:56 UTC
(In reply to Richard Biener from comment #7)
> OTOH I'll note that no other simplify_* treats canonicalization as
> simplification and the existing swap_commutative_operands_p transform for FMA
> is highly uncommon.
> 
> So why do we recognize (fma (neg (mem...)) ...) and not only (neg
> (register_operand))?

I think we can relax register_operand to nonimmediate_operand and rely on RA to reload it into a reg, just like we did in <sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>. So a backend fix should be better?
Comment 9 rguenther@suse.de 2024-10-11 12:37:39 UTC
On Fri, 11 Oct 2024, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072
> 
> --- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #7)
> > OTOH I'll note that no other simplify_* treats canonicalization as
> > simplification and the existing swap_commutative_operands_p transform for FMA
> > is highly uncommon.
> > 
> > So why do we recognize (fma (neg (mem...)) ...) and not only (neg
> > (register_operand))?
> 
> I think we can relex register_operand to nonimmediate_operand and rely on RA to
> reload it into a reg just like we did in
> <sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>. So a backend fix
> shou be better?

I think currently the backend isn't consistent with itself and sure,
a backend fix would be better (if it doesn't mean bloating the .md
with many more patterns).
Comment 10 Hongtao Liu 2024-10-11 12:38:56 UTC
(In reply to rguenther@suse.de from comment #9)
> On Fri, 11 Oct 2024, liuhongt at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072
> > 
> > --- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #7)
> > > OTOH I'll note that no other simplify_* treats canonicalization as
> > > simplification and the existing swap_commutative_operands_p transform for FMA
> > > is highly uncommon.
> > > 
> > > So why do we recognize (fma (neg (mem...)) ...) and not only (neg
> > > (register_operand))?
> > 
> > I think we can relex register_operand to nonimmediate_operand and rely on RA to
> > reload it into a reg just like we did in
> > <sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>. So a backend fix
> > shou be better?
> 
> I think currently the backend isn't consistent with itself and sure,
> a backend fix would be better (if it doesn't mean bloating the .md
> with many more patterns).

No, just adjusting the existing pattern should be ok.
Comment 11 Hongtao Liu 2024-10-13 10:08:50 UTC
(In reply to Hongtao Liu from comment #10)
> (In reply to rguenther@suse.de from comment #9)
> > On Fri, 11 Oct 2024, liuhongt at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072
> > > 
> > > --- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #7)
> > > > OTOH I'll note that no other simplify_* treats canonicalization as
> > > > simplification and the existing swap_commutative_operands_p transform for FMA
> > > > is highly uncommon.
> > > > 
> > > > So why do we recognize (fma (neg (mem...)) ...) and not only (neg
> > > > (register_operand))?
> > > 
> > > I think we can relex register_operand to nonimmediate_operand and rely on RA to
> > > reload it into a reg just like we did in
> > > <sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>. So a backend fix
> > > shou be better?
> > 
> > I think currently the backend isn't consistent with itself and sure,
> > a backend fix would be better (if it doesn't mean bloating the .md
> > with many more patterns).
> 
> No, just adjust the existed pattern should be ok.

Relaxing the predicate doesn't help, since the mask pattern additionally checks (match_dup 1) and the operands would need to be swapped. We once tried to replace it with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0"), but that triggered an ICE in reload (reload can handle at most one operand with a "0" constraint).


(define_insn "<avx512>_fnmsub_<mode>_mask<round_name>"
  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
        (vec_merge:VFH_AVX512VL
          (fma:VFH_AVX512VL
            (neg:VFH_AVX512VL
              (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0"))
            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
            (neg:VFH_AVX512VL
              (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")))
          (match_dup 1)
          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
  "TARGET_AVX512F && <round_mode_condition>"


So a backend fix would need to add at least 8 patterns to handle that; in that case, middle-end canonicalization may be better.
Comment 12 Hongtao Liu 2024-10-13 10:14:44 UTC
> 
> So the backend fix should at least add 8 patterns to handle that, in that
> case, maybe the middle-end canonicalization would be better.

And I will still submit a patch to make the FMA predicates more consistent.
Comment 13 GCC Commits 2024-10-17 01:59:02 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:330782a1b6cfe881ad884617ffab441aeb1c2b5c

commit r15-4398-g330782a1b6cfe881ad884617ffab441aeb1c2b5c
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Oct 14 17:16:13 2024 +0800

    Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).
    
    For x86 masked fma, there're 2 rtl representations
    1) (vec_merge (fma op2 op1 op3) op1 mask)
    2) (vec_merge (fma op1 op2 op3) op1 mask).
    
     (define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
       [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
             (vec_merge:VFH_AVX512VL
               (fma:VFH_AVX512VL
                 (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
                 (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
                 (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
               (match_dup 1)
               (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
       "TARGET_AVX512F && <round_mode_condition>"
       "@
        vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
        vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
       [(set_attr "type" "ssemuladd")
        (set_attr "prefix" "evex")
        (set_attr "mode" "<MODE>")])
    
    Here op1 has constraint "0", and the second op1 is (match_dup 1).
    We once tried to replace it with (match_operand:M 5
    "nonimmediate_operand" "0") to enable more flexibility for pattern
    match and recog, but it triggered an ICE in reload (reload can handle
    at most one operand with "0" constraint).
    
    So we need to either add 2 patterns in the backend or just do the
    canonicalization in the middle-end.
    
    gcc/ChangeLog:
    
            PR middle-end/117072
            * combine.cc (maybe_swap_commutative_operands):
            Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
            to (vec_merge (fma op1 op2 op3) op1 mask).
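The canonicalization described in this commit message can be pictured with a small toy model. This is not the code that was committed to combine.cc; MaskedFma and canonicalize_masked_fma are hypothetical names, operands are plain strings, and the negation is modeled as a flag on the first multiplicand slot so that it stays in place when the operands trade positions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Models (vec_merge (fma op0 op1 op2) passthru mask); not a GCC rtx.
struct MaskedFma {
    std::string op0;       // first multiplicand
    bool op0_negated;      // (neg ...) around the first multiplicand slot
    std::string op1;       // second multiplicand
    std::string op2;       // addend
    std::string passthru;  // value kept where the mask bit is clear
};

// If the pass-through value is the second multiplicand, swap the two
// multiplicands so that it becomes the first one. Multiplication is
// commutative, so the computed value is unchanged. Returns true on change.
bool canonicalize_masked_fma(MaskedFma& x) {
    if (x.op1 != x.passthru || x.op0 == x.passthru)
        return false;
    std::swap(x.op0, x.op1);
    return true;
}
```

After the swap, the first multiplicand equals the pass-through value, which is the form a pattern with (match_dup 1) on operand 1 expects.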
Comment 14 GCC Commits 2024-10-17 01:59:07 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:edf4db8355dead3413bad64f6a89bae82dabd0ad

commit r15-4399-gedf4db8355dead3413bad64f6a89bae82dabd0ad
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Oct 14 13:09:59 2024 +0800

    Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to (vec_merge (fma: op1 op2 op3) (match_dup 1)) mask)
    
    For masked FMA, there're 2 forms of RTL representation
    1) (vec_merge (fma: op2 op1 op3) op1) mask)
    2) (vec_merge (fma: op1 op2 op3) op1) mask)
    This is because op1 and op2 are commutative in RTL (the second op1 is
    written as (match_dup 1)).
    
    We once tried to replace (match_dup 1)
    with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0"), but
    that triggered an ICE in reload (reload can handle at most one operand
    with "0" constraint).
    
    So the patch does the canonicalization for the backend part.
    
    gcc/ChangeLog:
    
            PR target/117072
            * config/i386/sse.md (<avx512>_fmadd_<mode>_mask<round_name>):
            Relax predicates of fma operands from register_operand to
            nonimmediate_operand.
            (<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
            (<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
            (<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
            (<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
            (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
            (*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
            (*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_maskz_1<round_name>): Ditto.
            (avx10_2_fmaddnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fmsubnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto.
            (fmai_vmfmadd_<mode><round_name>): Swap operands[1] and operands[2].
            (fmai_vmfmsub_<mode><round_name>): Ditto.
            (fmai_vmfnmadd_<mode><round_name>): Ditto.
            (fmai_vmfnmsub_<mode><round_name>): Ditto.
            (*fmai_fmadd_<mode>): Swap operands[1] and operands[2], adjust
            operands[1] predicates from register_operand to
            nonimmediate_operand.
            (*fmai_fmsub_<mode>): Ditto.
            (*fmai_fnmadd_<mode><round_name>): Ditto.
            (*fmai_fnmsub_<mode><round_name>): Ditto.
Comment 15 Hongtao Liu 2024-10-17 02:02:12 UTC
Tests that now work, but didn't before (24 tests):

gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmadd132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmsub132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmadd132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmsub132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmadd132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmsub132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmadd132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmsub132pd[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmadd132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
unix/-m32: gcc: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmsub132ps[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1