Bug 107647 - [12 Regression] GCC 12.2.0 may produce FMAs even with -ffp-contract=off
Summary: [12 Regression] GCC 12.2.0 may produce FMAs even with -ffp-contract=off
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.2.0
: P2 normal
Target Milestone: 12.3
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on: 107766
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2022-11-11 16:59 UTC by bartoldeman
Modified: 2022-12-12 11:21 UTC (History)
5 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build: x86_64-pc-linux-gnu
Known to work: 11.3.0, 12.2.1, 13.0
Known to fail: 12.2.0
Last reconfirmed: 2022-11-11 00:00:00


Attachments
patch I am testing (1.26 KB, patch)
2022-11-17 09:10 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description bartoldeman 2022-11-11 16:59:53 UTC
I stumped upon an example where GCC generates FMA instruction even when FMAs are disabled using -ffp-contract=off (extracted from https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/cscal.c)

$ cat cscal.c
void cscal(int n, float da_r, float *x)
{
  for (int i = 0; i < n; i += 4)
    {
      float temp0  =  da_r * x[i]   - x[i+1];
      float temp1  =  da_r * x[i+2] - x[i+3];
      x[i+1]       =  da_r * x[i+1] + x[i];
      x[i+3]       =  da_r * x[i+3] + x[i+2];
      x[i]         =  temp0;
      x[i+2]       =  temp1;
    }
}
$ gcc -S -march=haswell -O2 -ffp-contract=off cscal.c
$ grep fma cscal.s
        vfmaddsub231ps  %xmm0, %xmm2, %xmm1

I would expect there to be no FMA instructions in there.
Comment 1 bartoldeman 2022-11-11 17:05:44 UTC
According to godbolt it's still producing FMAs on trunk:
https://godbolt.org/z/aWh6d1E4E
Comment 2 Andrew Pinski 2022-11-11 17:12:10 UTC
Confirmed.

It is SLP that is doing it.
At -O3 even GCC 11 is working ok.
Comment 3 Alexander Monakov 2022-11-11 17:22:52 UTC
Nice catch, thanks for the report. This is due to g:7d810646d421

The documentation should clarify that patterns correspond to basic fma instructions (without intermediate rounding), and SLP pattern matching should check flag_fp_contract_mode != FP_CONTRACT_OFF.
Comment 4 Andrew Pinski 2022-11-11 17:27:54 UTC
(In reply to Alexander Monakov from comment #3)
> Nice catch, thanks for the report. This is due to g:7d810646d421
> 
> The documentation should clarify that patterns correspond to basic fma
> instructions (without intermediate rounding), and SLP pattern matching
> should check flag_fp_contract_mode != FP_CONTRACT_OFF.

I don't think they should. Because __builtin_fma (and fma and fmaf) should be able to be used from an user program and still get FMA instructions. And even the SLP instruction patterns can be generating using those.
Comment 5 Andrew Pinski 2022-11-11 17:33:31 UTC
(In reply to Andrew Pinski from comment #4)
> (In reply to Alexander Monakov from comment #3)
> > Nice catch, thanks for the report. This is due to g:7d810646d421
> > 
> > The documentation should clarify that patterns correspond to basic fma
> > instructions (without intermediate rounding), and SLP pattern matching
> > should check flag_fp_contract_mode != FP_CONTRACT_OFF.
> 
> I don't think they should. Because __builtin_fma (and fma and fmaf) should
> be able to be used from an user program and still get FMA instructions. And
> even the SLP instruction patterns can be generating using those.

That is:
void f(float *a, float *b, float *c)
{
  float t0 = __builtin_fmaf(a[0], b[0], c[0]);
  float t1 = __builtin_fmaf(a[1], b[1], c[1]);
  a[0] = t0;
  a[1] = t1;
}
Should produce:
        vmovq   xmm0, QWORD PTR [rdi]
        vmovq   xmm2, QWORD PTR [rsi]
        vmovq   xmm1, QWORD PTR [rdx]
        vfmadd132ps     xmm0, xmm1, xmm2
        vmovlps QWORD PTR [rdi], xmm0
        ret
Even with  -ffp-contract=off -march=haswell -O3
Comment 6 Alexander Monakov 2022-11-11 17:41:49 UTC
Sure, but I was talking specifically about the pattern matching introduced by that commit.
Comment 7 Andrew Pinski 2022-11-11 17:54:27 UTC
(In reply to Alexander Monakov from comment #6)
> Sure, but I was talking specifically about the pattern matching introduced
> by that commit.

The general rule for pattern matching is if you don't have a FMA (or FMA-like) don't try to generate a FMA when -ffp-contract=off is supplied. The target backend should NOT know about flag_fp_contract_mode for the FMA like patterns. It is up to the middle-end optimizers to produce the correct thing.
Comment 8 Richard Biener 2022-11-14 11:43:14 UTC
I think the SLP pattern detection doesn't even work with a FMADD, FMSUB combo. We don't generate FMSUB early, not sure if we could somehow trick us into
folding one with __builtin_fma, maybe __builtin_fma (x, y, -z) would do, I'd have to check.

So yes, the SLP pattern detection introduces a FMA where none was before
(and then vectorizes it).

I'll fix it.
Comment 9 Richard Biener 2022-11-14 11:49:25 UTC
Ah, we only fold those to internal functions _after_ vectorization.  SLP will
see

double x[2];

void foo (double a, double b, double * __restrict c)
{
  x[0] = __builtin_fma (a, b, c[0]);
  x[1] = __builtin_fma (a, b, -c[1]);
}

as two calls to FMA and thus fail to optimially vectorize it.
Comment 10 Richard Biener 2022-11-17 08:38:42 UTC
Tamar, are the IFN_COMPLEX_FMA and IFN_COMPLEX_FMA_CONJ FP contracting operations as well?
Comment 11 Tamar Christina 2022-11-17 08:59:46 UTC
(In reply to Richard Biener from comment #10)
> Tamar, are the IFN_COMPLEX_FMA and IFN_COMPLEX_FMA_CONJ FP contracting
> operations as well?

Yes, they have no intermediate rounding.
Comment 12 Tamar Christina 2022-11-17 09:07:49 UTC
Note that the same IFN is used for integer MLA as well. We didn't split them apart.
Comment 13 Richard Biener 2022-11-17 09:10:29 UTC
Created attachment 53917 [details]
patch I am testing

OK, I'm testing the following then - can you see if that works for the complex fmas and if the aarch64 testsuite needs any adjustments?  (-ffp-contract=fast is default)
Comment 14 Tamar Christina 2022-11-17 09:15:16 UTC
(In reply to Richard Biener from comment #13)
> Created attachment 53917 [details]
> patch I am testing
> 
> OK, I'm testing the following then - can you see if that works for the
> complex fmas and if the aarch64 testsuite needs any adjustments? 
> (-ffp-contract=fast is default)

Will do, this needs a check on the type no? SVE2 adds complex integer MLAs which should be fine (we didn't split the IFNs).
Comment 15 Alexander Monakov 2022-11-17 09:21:25 UTC
I'm confused about the first hunk in the attached patch:

--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1035,8 +1035,10 @@ complex_mul_pattern::matches (complex_operation_t op,
   auto_vec<slp_tree> left_op, right_op;
   slp_tree add0 = NULL;
 
-  /* Check if we may be a multiply add.  */
+  /* Check if we may be a multiply add.  It's only valid to form FMAs
+     with -ffp-contract=fast.  */
   if (!mul0
+      && flag_fp_contract_mode != FP_CONTRACT_FAST
       && vect_match_expression_p (l0node[0], PLUS_EXPR))
     {
       auto vals = SLP_TREE_CHILDREN (l0node[0]);


Shouldn't it be ' == FP_CONTRACT_FAST' rather than '!='? It seems we are checking that a match is found and contracting across statement boundaries is allowed.
Comment 16 Richard Biener 2022-11-17 13:16:35 UTC
(In reply to Alexander Monakov from comment #15)
> I'm confused about the first hunk in the attached patch:
> 
> --- a/gcc/tree-vect-slp-patterns.cc
> +++ b/gcc/tree-vect-slp-patterns.cc
> @@ -1035,8 +1035,10 @@ complex_mul_pattern::matches (complex_operation_t op,
>    auto_vec<slp_tree> left_op, right_op;
>    slp_tree add0 = NULL;
>  
> -  /* Check if we may be a multiply add.  */
> +  /* Check if we may be a multiply add.  It's only valid to form FMAs
> +     with -ffp-contract=fast.  */
>    if (!mul0
> +      && flag_fp_contract_mode != FP_CONTRACT_FAST
>        && vect_match_expression_p (l0node[0], PLUS_EXPR))
>      {
>        auto vals = SLP_TREE_CHILDREN (l0node[0]);
> 
> 
> Shouldn't it be ' == FP_CONTRACT_FAST' rather than '!='? It seems we are
> checking that a match is found and contracting across statement boundaries
> is allowed.

whoops yes, I'll fix and add a check for the type.
Comment 17 GCC Commits 2022-11-18 07:37:47 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c5df8392c5848c0462558f41cdf6eab31db301cf

commit r13-4137-gc5df8392c5848c0462558f41cdf6eab31db301cf
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 17 09:43:31 2022 +0100

    tree-optimization/107647 - avoid FMA from SLP with -ffp-contract=off
    
    Only with -ffp-contract=fast we can synthesize FMA operations like
    vfmaddsub231ps, so properly guard the transform in SLP pattern
    detection.
    
            PR tree-optimization/107647
            * tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
            allow FMA generation with -ffp-contract=fast for FP types.
            (complex_mul_pattern::matches): Likewise.
    
            * gcc.target/i386/pr107647.c: New testcase.
Comment 18 Richard Biener 2022-11-18 07:48:56 UTC
Fixed on trunk sofar (whoops, pushed before giving you a chance to test on aarch64 - hopefully it goes OK).
Comment 19 Tamar Christina 2022-11-21 09:45:33 UTC
FWIW, the testsuite on AArch64 was clean after the patch.
Comment 20 GCC Commits 2022-12-12 11:20:43 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a9fafa2f533e25c57528c0294e19a154197848dd

commit r12-8974-ga9fafa2f533e25c57528c0294e19a154197848dd
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 17 09:43:31 2022 +0100

    tree-optimization/107647 - avoid FMA from SLP with -ffp-contract=off
    
    Only with -ffp-contract=fast we can synthesize FMA operations like
    vfmaddsub231ps, so properly guard the transform in SLP pattern
    detection.
    
            PR tree-optimization/107647
            * tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
            allow FMA generation with -ffp-contract=fast for FP types.
            (complex_mul_pattern::matches): Likewise.
    
            * gcc.target/i386/pr107647.c: New testcase.
    
    (cherry picked from commit c5df8392c5848c0462558f41cdf6eab31db301cf)
Comment 21 Richard Biener 2022-12-12 11:21:27 UTC
Fixed.