The traditional Altivec single precision vector floating point instructions should not be used if the VSX instructions are available. This is because the Altivec instructions do not use the current rounding mode, while the VSX instructions do. I took a quick look through vsx.md and altivec.md, and I believe the only places we use the Altivec instructions by default are the 4 operand fused multiply-add and 4 operand fused negate multiply-subtract patterns, when the destination operand does not overlap with the input operands.
Note, I meant V4SFmode (i.e. vector float), not V4DFmode.
Do you have a testcase?
This is showing up in some of the binaries generated by Eigen (with GCC13).
It shows up as a rounding difference on BE machines.
Created attachment 54814: Test case. This is a test case that shows the generation of vmaddfp and vnmsubfp.
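For readers without access to the attachment, here is a minimal sketch of the kind of source that exercises the V4SFmode multiply-add and negated multiply-subtract patterns. It is not the attached test case. The type name `v4sf` and the function names are hypothetical, and it uses the generic GCC vector extension rather than `<altivec.h>` so that it builds on any target. On a PowerPC target, one would compile it with something like `-O2 -mvsx -S` and look in the assembly for `vmaddfp`/`vnmsubfp` versus the VSX `xvmadd*sp`/`xvnmsub*sp` forms. Which of these appears depends on the compiler version, the contraction setting, and register allocation.

```c
#include <assert.h>

/* Four-lane single-precision vector via the GCC/Clang vector extension
   (hypothetical type name; corresponds to V4SFmode, i.e. vector float).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Multiply-add: a * b + c in each lane.  With floating-point contraction
   enabled, the compiler may turn this into a fused multiply-add.  */
v4sf
madd (v4sf a, v4sf b, v4sf c)
{
  return a * b + c;
}

/* Negated multiply-subtract: -(a * b - c) in each lane.  */
v4sf
nmsub (v4sf a, v4sf b, v4sf c)
{
  return -(a * b - c);
}

/* Return 1 if all four lanes compare equal, 0 otherwise.  */
int
lanes_equal (v4sf x, v4sf y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}

/* Sanity check with small exact inputs, so that fused and unfused
   evaluation give the same answer.  */
void
check_madd_lanes (void)
{
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 2.0f, 2.0f, 2.0f, 2.0f };
  v4sf c = { 1.0f, 1.0f, 1.0f, 1.0f };
  v4sf want_madd = { 3.0f, 5.0f, 7.0f, 9.0f };
  v4sf want_nmsub = { -1.0f, -3.0f, -5.0f, -7.0f };

  assert (lanes_equal (madd (a, b, c), want_madd));
  assert (lanes_equal (nmsub (a, b, c), want_nmsub));
}
```

The inputs in `check_madd_lanes` are chosen to be exactly representable, so the check only confirms the arithmetic. It does not expose the rounding difference discussed in this report.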
We should not use any VMX insn unless explicitly asked for it, since those do not work as expected if VSCR[NJ]=1, which unfortunately is the default on Linux (but not on powerpc64le-linux; that is a separate (kernel) bug). Rounding mode does not matter too much if we have some subset of fast-math anyway (the only rounding mode in VMX is round-to-nearest-ties-to-even, which is the default for most everything else). But NJ=1 makes arithmetic behave completely unexpectedly, and it isn't actually faster than NJ=0 on modern hardware anyway. We cannot change the default for setting NJ because some code might rely on it, unfortunately. Luckily, not generating VMX insns automatically (i.e. without them being explicitly asked for) isn't all that expensive; it just ends up as a few more move instructions here and there. This isn't a regression, but we should have this in GCC 13.
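To make the NJ=1 concern concrete, here is a rough scalar model in C. It assumes that non-Java mode treats denormal inputs as zero and flushes denormal results to zero, keeping the sign, which is my understanding of the AltiVec definition. It is a sketch of that behavior, not the hardware, and all function names are hypothetical. The multiply-add is computed in double and rounded to float, which only approximates a single-rounding fused operation.

```c
#include <assert.h>
#include <float.h>

/* Model of flush-to-zero: a subnormal value becomes a zero of the same
   sign; everything else (including NaN and infinity) is unchanged.  */
static float
flush_subnormal (float x)
{
  float mag = x < 0.0f ? -x : x;
  if (mag != 0.0f && mag < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* IEEE-style multiply-add.  The float * float product is exact in double;
   the final conversion to float approximates a single rounding.  */
float
madd_ieee_model (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);
}

/* Assumed NJ=1 behavior: flush the inputs, compute, flush the result.  */
float
madd_nj_model (float a, float b, float c)
{
  float r = madd_ieee_model (flush_subnormal (a),
                             flush_subnormal (b),
                             flush_subnormal (c));
  return flush_subnormal (r);
}

/* Show the two models diverging on subnormal results and inputs.  */
void
check_flush_model (void)
{
  float tiny = FLT_MIN * 0.5f;	/* a subnormal value */

  /* Subnormal result: kept by the IEEE model, flushed by the NJ model.  */
  assert (madd_ieee_model (FLT_MIN, 0.5f, 0.0f) != 0.0f);
  assert (madd_nj_model (FLT_MIN, 0.5f, 0.0f) == 0.0f);

  /* Subnormal input scaled back into the normal range: the IEEE model
     returns 2 * FLT_MIN, the NJ model returns zero.  */
  assert (madd_ieee_model (tiny, 4.0f, 0.0f) == 2.0f * FLT_MIN);
  assert (madd_nj_model (tiny, 4.0f, 0.0f) == 0.0f);
}
```

The second pair of assertions is the more surprising case: the mathematically correct result is an ordinary normal number, yet the flushed model returns zero because one input was denormal.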
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:725bcdeec60771cb9ee387978716028b64ea1b7f

commit r13-7132-g725bcdeec60771cb9ee387978716028b64ea1b7f
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Sun Apr 9 23:32:27 2023 -0400

    Do not generate vmaddfp and vnmsubfp

    This is version 3 of the patch.  This is essentially version 1 with the
    removal of changes to altivec.md, and cleanup of the comments.

    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was
    used, and those changes are deleted in this patch.

    The Altivec instructions vmaddfp and vnmsubfp have different rounding
    behaviors than the VSX xvmaddsp and xvnmsubsp instructions due to
    VSCR[NJ] and other corner cases.  In particular, generating these
    instructions seems to break Eigen on big endian systems.

    2023-04-09  Michael Meissner  <meissner@linux.ibm.com>

    gcc/

            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.

    gcc/testsuite/

            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
The releases/gcc-12 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:3bb91d31a272d7fd9f02301df101e3041d5aeb5d

commit r12-9635-g3bb91d31a272d7fd9f02301df101e3041d5aeb5d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:08:13 2023 -0400

    Do not generate vmaddfp and vnmsubfp

    This is version 3 of the patch.  This is essentially version 1 with the
    removal of changes to altivec.md, and cleanup of the comments.

    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was
    used, and those changes are deleted in this patch.

    The Altivec instructions vmaddfp and vnmsubfp have different rounding
    behaviors than the VSX xvmaddsp and xvnmsubsp instructions.  In
    particular, generating these instructions seems to break Eigen on big
    endian systems.

    I have done bootstrap builds on power9 little endian (with both IEEE
    long double and IBM long double).  I have also done the builds and test
    on a power8 big endian system (testing both 32-bit and 64-bit code
    generation).  Chip has verified that it fixes the problem that Eigen
    encountered.  Can I check this into the master GCC branch?  After a
    burn-in period, can I check this patch into the active GCC branches?

    Thanks in advance.

    2023-05-22  Michael Meissner  <meissner@linux.ibm.com>

    gcc/

            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.  Back port from
            master 04/10/2023 change.

    gcc/testsuite/

            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from
            master 04/10/2023 change.
The releases/gcc-11 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:d7d25bcfbd5ee5ef17fabeb67ad5e093cd975a36

commit r11-10807-gd7d25bcfbd5ee5ef17fabeb67ad5e093cd975a36
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:17:01 2023 -0400

    Do not generate vmaddfp and vnmsubfp

    This is version 3 of the patch.  This is essentially version 1 with the
    removal of changes to altivec.md, and cleanup of the comments.

    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was
    used, and those changes are deleted in this patch.

    The Altivec instructions vmaddfp and vnmsubfp have different rounding
    behaviors than the VSX xvmaddsp and xvnmsubsp instructions.  In
    particular, generating these instructions seems to break Eigen on big
    endian systems.

    I have done bootstrap builds on power9 little endian (with both IEEE
    long double and IBM long double).  I have also done the builds and test
    on a power8 big endian system (testing both 32-bit and 64-bit code
    generation).  Chip has verified that it fixes the problem that Eigen
    encountered.  Can I check this into the master GCC branch?  After a
    burn-in period, can I check this patch into the active GCC branches?

    Thanks in advance.

    2023-04-07  Michael Meissner  <meissner@linux.ibm.com>

    gcc/

            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            Back port from master 04/10/2023.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.

    gcc/testsuite/

            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from
            master 04/10/2023.
The releases/gcc-10 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:c970030226341f0c7fa9f319b37786ca81703c6d

commit r10-11421-gc970030226341f0c7fa9f319b37786ca81703c6d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:26:08 2023 -0400

    Do not generate vmaddfp and vnmsubfp

    This is version 3 of the patch.  This is essentially version 1 with the
    removal of changes to altivec.md, and cleanup of the comments.

    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was
    used, and those changes are deleted in this patch.

    The Altivec instructions vmaddfp and vnmsubfp have different rounding
    behaviors than the VSX xvmaddsp and xvnmsubsp instructions.  In
    particular, generating these instructions seems to break Eigen on big
    endian systems.

    I have done bootstrap builds on power9 little endian (with both IEEE
    long double and IBM long double).  I have also done the builds and test
    on a power8 big endian system (testing both 32-bit and 64-bit code
    generation).  Chip has verified that it fixes the problem that Eigen
    encountered.  Can I check this into the master GCC branch?  After a
    burn-in period, can I check this patch into the active GCC branches?

    Thanks in advance.

    2023-05-22  Michael Meissner  <meissner@linux.ibm.com>

    gcc/

            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            Back port from master 04/10/2023.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.

    gcc/testsuite/

            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from
            master 04/10/2023.
Mike, can we mark this as FIXED now? ...or are there other changes needed?
Mike said offline we can mark this as FIXED.