Bug 111306 - [12,13] macro-fusion makes error on conjugate complex multiplication fp16
Summary: [12,13] macro-fusion makes error on conjugate complex multiplication fp16
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 13.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 111333 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-09-06 12:18 UTC by joony.wie
Modified: 2023-09-12 11:09 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-09-08 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description joony.wie 2023-09-06 12:18:54 UTC
It seems that the operands src1 and src2 of "_mm512_fcmul_pch" are swapped for macro-fusion with optimize option.

If the operands are swapped, the imag value of result will have incorrect sign bit.

So, the operands should not be swapped in these conjugate complex multiplication intrinsics.

Let me show the example and the output.

output: 
3.000000 -4.000000 // w/o optimize.
3.000000 4.000000 // w/ optimize.

https://godbolt.org/z/df9Gz18hc // but may not executable
```
#include <immintrin.h>
#include <cstdio>

__attribute__((optimize("O0")))
auto func0(_Float16 *a, _Float16 *b, int n, _Float16 *c) {
  __m512h rA = _mm512_loadu_ph(a);
  for (int i = 0; i < n; i += 32) {
    __m512h rB = _mm512_loadu_ph(b + i);
    _mm512_storeu_ph(c + i, _mm512_fcmul_pch(rB, rA));
  }
}

__attribute__((optimize("O")))
auto func1(_Float16 *a, _Float16 *b, int n, _Float16 *c) {
  __m512h rA = _mm512_loadu_ph(a);
  for (int i = 0; i < n; i += 32) {
    __m512h rB = _mm512_loadu_ph(b + i);
    _mm512_storeu_ph(c + i, _mm512_fcmul_pch(rB, rA));
  }
}

int main() {
  int n = 32;

  _Float16 a[n], b[n], c[n];
  for (int i = 1; i <= n; i++) {
    a[i - 1] = i & 1 ? -i : i;
    b[i - 1] = i;
  }

  func0(a, b, n, c);
    for (int i = 0; i < n / 32 * 2; i++) {
      printf("%f ", (float)c[i]);
    }
    printf("\n");

  func1(a, b, n, c);
    for (int i = 0; i < n / 32 * 2; i++) {
      printf("%f ", (float)c[i]);
    }
    printf("\n");

  return 0;
}
```
Comment 1 Andrew Pinski 2023-09-08 02:18:13 UTC
*** Bug 111333 has been marked as a duplicate of this bug. ***
Comment 2 Andrew Pinski 2023-09-08 02:19:13 UTC
.
Comment 3 Hongtao.liu 2023-09-08 03:24:33 UTC
A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629650.html
Comment 4 Hongtao.liu 2023-09-08 06:02:52 UTC
A related PR111335 for fmaddcph , similar but not the same, PR111335 is due to precision difference for complex _Float16 fma, fmaddcph a, b, c is not equal to fmaddcph b, a, c
Comment 5 GCC Commits 2023-09-11 01:17:07 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:f197392a16ffb1327f1d12ff8ff05f9295e015cb

commit r14-3831-gf197392a16ffb1327f1d12ff8ff05f9295e015cb
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Sep 8 09:22:43 2023 +0800

    Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative.
    
    gcc/ChangeLog:
    
            PR target/111306
            PR target/111335
            * config/i386/sse.md (int_comm): New int_attr.
            (fma_<complexopname>_<mode><sdc_maskz_name><round_name>):
            Remove % for Complex conjugate operations since they're not
            commutative.
            (fma_<complexpairopname>_<mode>_pair): Ditto.
            (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
            (cmul<conj_op><mode>3): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr111306.c: New test.
Comment 6 GCC Commits 2023-09-11 01:20:10 UTC
The releases/gcc-13 branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:162731529e4dd10970880c369471229735dc3e9b

commit r13-7789-g162731529e4dd10970880c369471229735dc3e9b
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Sep 8 09:22:43 2023 +0800

    Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative.
    
    gcc/ChangeLog:
    
            PR target/111306
            PR target/111335
            * config/i386/sse.md (int_comm): New int_attr.
            (fma_<complexopname>_<mode><sdc_maskz_name><round_name>):
            Remove % for Complex conjugate operations since they're not
            commutative.
            (fma_<complexpairopname>_<mode>_pair): Ditto.
            (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
            (cmul<conj_op><mode>3): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr111306.c: New test.
    
    (cherry picked from commit f197392a16ffb1327f1d12ff8ff05f9295e015cb)
Comment 7 GCC Commits 2023-09-11 01:22:12 UTC
The releases/gcc-12 branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:82c1ff396e49b706d5baa11f4c884810f6350e95

commit r12-9852-g82c1ff396e49b706d5baa11f4c884810f6350e95
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Sep 8 09:22:43 2023 +0800

    Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative.
    
    gcc/ChangeLog:
    
            PR target/111306
            PR target/111335
            * config/i386/sse.md (int_comm): New int_attr.
            (fma_<complexopname>_<mode><sdc_maskz_name><round_name>):
            Remove % for Complex conjugate operations since they're not
            commutative.
            (fma_<complexpairopname>_<mode>_pair): Ditto.
            (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
            (cmul<conj_op><mode>3): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr111306.c: New test.
    
    (cherry picked from commit f197392a16ffb1327f1d12ff8ff05f9295e015cb)
Comment 8 Hongtao.liu 2023-09-11 01:23:26 UTC
Fixed in GCC14.1 GCC13.3 GCC12.4
Comment 9 Richard Biener 2023-09-12 11:09:37 UTC
Fixed.