70243 – PowerPC V4SFmode should not use Altivec instructions on VSX systems

Summary: PowerPC V4SFmode should not use Altivec instructions on VSX systems

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	6.0

Importance:	P1 normal
Target Milestone:	---
Assignee:	Michael Meissner

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-03-15 17:33 UTC by Michael Meissner
Modified:	2023-06-02 16:16 UTC
CC List:	6 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2016-03-15 00:00:00

Attachments
Test case (196 bytes, text/plain) 2023-04-05 23:06 UTC, Michael Meissner

Description Michael Meissner 2016-03-15 17:33:38 UTC

The traditional Altivec single precision vector floating point instructions should not be used if the VSX instructions are available.  This is because the Altivec instructions do not use the current rounding mode, while the VSX instructions do.

I did a quick glance through vsx.md and altivec.md, and I believe the only places we use the Altivec instructions by default are the 4-operand fused multiply-add and the 4-operand fused negate multiply-subtract, when the destination operand does not overlap with the input operands.
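
For illustration, the two operations named above have the following source-level shape. This is a minimal sketch, not the test case attached to this bug: it uses GCC's generic vector extension instead of <altivec.h> so that it builds on any target, and the type and function names are hypothetical. Which instruction a PowerPC compiler actually emits for these expressions depends on the GCC version, the target flags, and register allocation.

```c
/* Sketch only: generic GCC vector extension, so this builds on any
   target.  The names v4sf, fma_v4sf and nfms_v4sf are hypothetical.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Shape of the 4-operand fused multiply-add: result = a * b + c.
   The result is a separate value from all three inputs.  */
v4sf
fma_v4sf (v4sf a, v4sf b, v4sf c)
{
  return a * b + c;
}

/* Shape of the 4-operand fused negate multiply-subtract:
   result = -(a * b - c).  */
v4sf
nfms_v4sf (v4sf a, v4sf b, v4sf c)
{
  return -(a * b - c);
}
```

The 3-operand VSX forms overwrite one of their inputs, so when the destination does not overlap any input the compiler needs an extra register move to use them.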

Comment 1 Michael Meissner 2016-03-15 17:40:20 UTC

Note, I meant V4SFmode (i.e. vector float), not V4DFmode.

Comment 2 Segher Boessenkool 2022-01-13 16:02:27 UTC

Do you have a testcase?

Comment 3 Chip Kerchner 2023-04-05 22:27:28 UTC

This is showing up in some of the binaries generated by Eigen (with GCC13).

Comment 4 Chip Kerchner 2023-04-05 22:52:01 UTC

It shows up as a rounding difference on BE machines.

Comment 5 Michael Meissner 2023-04-05 23:06:01 UTC

Created attachment 54814
Test case

This is a test case that shows the generation of vmaddfp and vnmsubfp.

Comment 6 Segher Boessenkool 2023-04-06 18:23:11 UTC

We should not use any VMX insn unless explicitly asked for it, since those
do not work as expected if VSCR[NJ]=1, which unfortunately is the default on
Linux (but not on powerpc64le-linux; that is a separate (kernel) bug).

Rounding mode does not matter too much, if we have some subset of fast-math
anyway (the only rounding mode in VMX is round-to-nearest-ties-to-even, which
is the default for most everything else).

But NJ=1 makes arithmetic behave completely unexpectedly, and it isn't
actually faster than NJ=0 on modern hardware anyway.  We cannot change the
default for setting NJ because some code might rely on it, unfortunately.
Luckily, no longer generating VMX insns automatically (i.e. without their
being explicitly asked for) isn't all that expensive; it just ends up as a few
more move instructions here and there.

This isn't a regression, but we should have this in GCC 13.
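
As a rough illustration of why NJ=1 is surprising, here is a scalar software model. It is only a sketch, and it assumes that with VSCR[NJ]=1 denormal inputs are treated as zero and denormal results are flushed to zero (keeping the sign). The helper names are hypothetical, and the model uses plain unfused scalar arithmetic rather than the real vector instruction.

```c
#include <float.h>

/* Hypothetical helper: replace a denormal value with a zero of the
   same sign; leave every other value (including NaN) unchanged.  */
static float
flush_denormal (float x)
{
  float mag = x < 0.0f ? -x : x;

  if (mag != 0.0f && mag < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* Multiply-add under the assumed NJ=1 model: flush the inputs,
   compute, then flush the result.  */
float
madd_nj1_model (float a, float b, float c)
{
  float r = flush_denormal (a) * flush_denormal (b) + flush_denormal (c);

  return flush_denormal (r);
}

/* Multiply-add with ordinary IEEE arithmetic, denormals kept.  */
float
madd_ieee (float a, float b, float c)
{
  return a * b + c;
}
```

With inputs such as FLT_MIN, 0.5f and 0.0f, the IEEE version returns a small nonzero denormal while the model returns zero. Code that later divides by or compares against that value then behaves differently.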

Comment 7 GCC Commits 2023-04-10 03:34:30 UTC

The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:725bcdeec60771cb9ee387978716028b64ea1b7f

commit r13-7132-g725bcdeec60771cb9ee387978716028b64ea1b7f
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Sun Apr 9 23:32:27 2023 -0400

    Do not generate vmaddfp and vnmsubfp
    
    This is version 3 of the patch.  This is essentially version 1 with the removal
    of changes to altivec.md, and cleanup of the comments.
    
    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
    and those changes are deleted in this patch.
    
    The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions due to VSCR[NJ] and other
    corner cases.  In particular, generating these instructions seems to break
    Eigen on big endian systems.
    
    2023-04-09   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.

Comment 8 GCC Commits 2023-05-22 15:13:57 UTC

The releases/gcc-12 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:3bb91d31a272d7fd9f02301df101e3041d5aeb5d

commit r12-9635-g3bb91d31a272d7fd9f02301df101e3041d5aeb5d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:08:13 2023 -0400

    Do not generate vmaddfp and vnmsubfp
    
    This is version 3 of the patch.  This is essentially version 1 with the removal
    of changes to altivec.md, and cleanup of the comments.
    
    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
    and those changes are deleted in this patch.
    
    The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen on big endian systems.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-05-22   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.  Back port from master
            04/10/2023 change.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from master
            04/10/2023 change.

Comment 9 GCC Commits 2023-05-22 15:18:19 UTC

The releases/gcc-11 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:d7d25bcfbd5ee5ef17fabeb67ad5e093cd975a36

commit r11-10807-gd7d25bcfbd5ee5ef17fabeb67ad5e093cd975a36
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:17:01 2023 -0400

    Do not generate vmaddfp and vnmsubfp
    
    This is version 3 of the patch.  This is essentially version 1 with the removal
    of changes to altivec.md, and cleanup of the comments.
    
    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
    and those changes are deleted in this patch.
    
    The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen on big endian systems.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.  Back
            port from master 04/10/2023.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from master
            04/10/2023.

Comment 10 GCC Commits 2023-05-22 15:28:40 UTC

The releases/gcc-10 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:c970030226341f0c7fa9f319b37786ca81703c6d

commit r10-11421-gc970030226341f0c7fa9f319b37786ca81703c6d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 22 11:26:08 2023 -0400

    Do not generate vmaddfp and vnmsubfp
    
    This is version 3 of the patch.  This is essentially version 1 with the removal
    of changes to altivec.md, and cleanup of the comments.
    
    Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
    and those changes are deleted in this patch.
    
    The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen on big endian systems.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-05-22   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.  Back
            port from master 04/10/2023.
            (vsx_nfmsv4sf4): Do not generate vnmsubfp.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.  Back port from master
            04/10/2023.

Comment 11 Peter Bergner 2023-05-23 13:19:08 UTC

Mike, can we mark this as FIXED now?  ...or are there other changes needed?

Comment 12 Peter Bergner 2023-06-02 16:16:42 UTC

Mike said offline we can mark this as FIXED.