It seems that the operands src1 and src2 of "_mm512_fcmul_pch" are swapped for macro-fusion with optimize option. If the operands are swapped, the imag value of result will have incorrect sign bit. So, the operands should not be swapped in these conjugate complex multiplication intrinsics. Let me show the example and the output. output: 3.000000 -4.000000 // w/o optimize. 3.000000 4.000000 // w/ optimize. https://godbolt.org/z/df9Gz18hc // but may not executable ``` #include <immintrin.h> #include <cstdio> __attribute__((optimize("O0"))) auto func0(_Float16 *a, _Float16 *b, int n, _Float16 *c) { __m512h rA = _mm512_loadu_ph(a); for (int i = 0; i < n; i += 32) { __m512h rB = _mm512_loadu_ph(b + i); _mm512_storeu_ph(c + i, _mm512_fcmul_pch(rB, rA)); } } __attribute__((optimize("O"))) auto func1(_Float16 *a, _Float16 *b, int n, _Float16 *c) { __m512h rA = _mm512_loadu_ph(a); for (int i = 0; i < n; i += 32) { __m512h rB = _mm512_loadu_ph(b + i); _mm512_storeu_ph(c + i, _mm512_fcmul_pch(rB, rA)); } } int main() { int n = 32; _Float16 a[n], b[n], c[n]; for (int i = 1; i <= n; i++) { a[i - 1] = i & 1 ? -i : i; b[i - 1] = i; } func0(a, b, n, c); for (int i = 0; i < n / 32 * 2; i++) { printf("%f ", (float)c[i]); } printf("\n"); func1(a, b, n, c); for (int i = 0; i < n / 32 * 2; i++) { printf("%f ", (float)c[i]); } printf("\n"); return 0; } ```
*** Bug 111333 has been marked as a duplicate of this bug. ***
.
A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629650.html
A related PR111335 for fmaddcph , similar but not the same, PR111335 is due to precision difference for complex _Float16 fma, fmaddcph a, b, c is not equal to fmaddcph b, a, c
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>: https://gcc.gnu.org/g:f197392a16ffb1327f1d12ff8ff05f9295e015cb commit r14-3831-gf197392a16ffb1327f1d12ff8ff05f9295e015cb Author: liuhongt <hongtao.liu@intel.com> Date: Fri Sep 8 09:22:43 2023 +0800 Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative. gcc/ChangeLog: PR target/111306 PR target/111335 * config/i386/sse.md (int_comm): New int_attr. (fma_<complexopname>_<mode><sdc_maskz_name><round_name>): Remove % for Complex conjugate operations since they're not commutative. (fma_<complexpairopname>_<mode>_pair): Ditto. (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto. (cmul<conj_op><mode>3): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111306.c: New test.
The releases/gcc-13 branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>: https://gcc.gnu.org/g:162731529e4dd10970880c369471229735dc3e9b commit r13-7789-g162731529e4dd10970880c369471229735dc3e9b Author: liuhongt <hongtao.liu@intel.com> Date: Fri Sep 8 09:22:43 2023 +0800 Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative. gcc/ChangeLog: PR target/111306 PR target/111335 * config/i386/sse.md (int_comm): New int_attr. (fma_<complexopname>_<mode><sdc_maskz_name><round_name>): Remove % for Complex conjugate operations since they're not commutative. (fma_<complexpairopname>_<mode>_pair): Ditto. (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto. (cmul<conj_op><mode>3): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111306.c: New test. (cherry picked from commit f197392a16ffb1327f1d12ff8ff05f9295e015cb)
The releases/gcc-12 branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>: https://gcc.gnu.org/g:82c1ff396e49b706d5baa11f4c884810f6350e95 commit r12-9852-g82c1ff396e49b706d5baa11f4c884810f6350e95 Author: liuhongt <hongtao.liu@intel.com> Date: Fri Sep 8 09:22:43 2023 +0800 Remove constraint modifier % for fcmaddcph/fmaddcph/fcmulcph since there're not commutative. gcc/ChangeLog: PR target/111306 PR target/111335 * config/i386/sse.md (int_comm): New int_attr. (fma_<complexopname>_<mode><sdc_maskz_name><round_name>): Remove % for Complex conjugate operations since they're not commutative. (fma_<complexpairopname>_<mode>_pair): Ditto. (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto. (cmul<conj_op><mode>3): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111306.c: New test. (cherry picked from commit f197392a16ffb1327f1d12ff8ff05f9295e015cb)
Fixed in GCC14.1 GCC13.3 GCC12.4
Fixed.