Created attachment 38420
small test program to verify vec_cts/vec_ctf working on doubles

I've noticed some tests in Eigen failing for VSX 64-bit double code, even though the algorithm looked perfectly normal. Further investigation, including a comparison with the direct output of an SSE test program using the same inputs, showed that the conversion to integer yielded 0 results in the VSX case (vec_cts basically returned all-zero vectors).

I've written a small test case that verifies this on both big- and little-endian VSX-capable systems (compiled with -m64 -mvsx). When using the intrinsic, the result is wrong; when using the inline asm version, it works as expected. I could not test it with a more recent gcc, so it may well be fixed already; however, it would be great if the fix could be backported to gcc 5.

Some asm output from the test program (attached) follows:

vec_cts:
  1000066c: 60 67 00 f0   xvcvdpsxds vs0,vs12
  10000670: 50 02 00 f0   xxswapd vs0,vs0
asm:
  10000674: 51 02 00 f0   xxswapd vs32,vs0
  10000678: 60 07 00 f0   xvcvdpsxds vs0,vs0
  1000067c: 56 02 00 f0   xxswapd vs0,vs32

vec_ctf:
  100006f8: 50 02 00 f0   xxswapd vs0,vs0
  100006fc: e0 07 00 f0   xvcvsxddp vs0,vs0
  10000700: 50 02 00 f0   xxswapd vs0,vs0
asm:
  100006f8: 51 02 00 f0   xxswapd vs32,vs0
  100006fc: e0 07 00 f0   xvcvsxddp vs0,vs0
  10000700: 56 02 00 f0   xxswapd vs0,vs32
Confirmed.
The xxswapd instructions are a bit of a red herring. They are part of the little-endian normalization code required by the funky lxvd2x and stxvd2x instructions. The problem appears to be the register assignment on the instructions generated for vec_cts and vec_ctf. The use of vs12 in the vec_cts sequence is an obvious problem, since vs12 is never assigned a value in the function. The code for vec_ctf looks fine. So we need to figure out what happened with the register number on xvcvdpsxds. The problem still exists on trunk.
Note also that your asm constraints are wrong. You need VSX registers, not Altivec registers, so you should be using the "wa" constraint instead of the "v" constraint. This is why you get some apparently wrong register numbers with your asm results.
OK, there is an obvious bug in the define_expand for vsx_xvcvdpsxds_scale. If the scale factor is 0, wrong code is always generated. I'll get a patch going.
Ack, thanks for the heads-up on VSX registers; it does print more reasonable results now.
Author: wschmidt
Date: Tue May 10 14:27:12 2016
New Revision: 236082

URL: https://gcc.gnu.org/viewcvs?rev=236082&root=gcc&view=rev
Log:
[gcc]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * config/rs6000/vsx.md (vsx_xvcvdpsxds_scale): Generate correct
    code for a zero scale factor.
    (vsx_xvcvdpuxds_scale): Likewise.

[gcc/testsuite]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * gcc.target/powerpc/pr70963.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/powerpc/pr70963.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/vsx.md
    trunk/gcc/testsuite/ChangeLog
Author: wschmidt
Date: Tue May 10 16:07:04 2016
New Revision: 236089

URL: https://gcc.gnu.org/viewcvs?rev=236089&root=gcc&view=rev
Log:
[gcc]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * config/rs6000/vsx.md (vsx_xvcvdpsxds_scale): Generate correct
    code for a zero scale factor.
    (vsx_xvcvdpuxds_scale): Likewise.

[gcc/testsuite]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * gcc.target/powerpc/pr70963.c: New.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.target/powerpc/pr70963.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/rs6000/vsx.md
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
Author: wschmidt
Date: Tue May 10 16:09:28 2016
New Revision: 236091

URL: https://gcc.gnu.org/viewcvs?rev=236091&root=gcc&view=rev
Log:
[gcc]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * config/rs6000/vsx.md (vsx_xvcvdpsxds_scale): Generate correct
    code for a zero scale factor.
    (vsx_xvcvdpuxds_scale): Likewise.

[gcc/testsuite]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * gcc.target/powerpc/pr70963.c: New.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/powerpc/pr70963.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/config/rs6000/vsx.md
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Author: wschmidt
Date: Tue May 10 17:24:32 2016
New Revision: 236097

URL: https://gcc.gnu.org/viewcvs?rev=236097&root=gcc&view=rev
Log:
[gcc]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * config/rs6000/vsx.md (vsx_xvcvdpsxds_scale): Generate correct
    code for a zero scale factor.
    (vsx_xvcvdpuxds_scale): Likewise.

[gcc/testsuite]

2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2016-05-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/70963
    * gcc.target/powerpc/pr70963.c: New.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/powerpc/pr70963.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/rs6000/vsx.md
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Fixed everywhere.