The __builtin_mma_disassemble_acc built-in doesn't correctly account for little-endian byte ordering when the pointer passed to it is not a __vector_quad pointer, as the following test case shows:

bergner@pike:~/$ cat disassemble.c
void buggy (void *dst)
{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  __builtin_mma_disassemble_acc (dst, &acc);
}

void foo (__vector_quad *dst)
{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  __builtin_mma_disassemble_acc (dst, &acc);
}
bergner@pike:~/$ gcc -S -O2 -mcpu=power10 disassemble.c
bergner@pike:~/$ cat disassemble.s
buggy:
        xxsetaccz 0
        xxmfacc 0
        stxv 0,0(3)
        stxv 1,16(3)
        stxv 2,32(3)
        stxv 3,48(3)
        blr
foo:
        xxsetaccz 0
        xxmfacc 0
        stxvp 2,0(3)
        stxvp 0,32(3)
        blr
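To make the difference between the two outputs concrete, here is a small portable C model that replays both store sequences against a 64-byte buffer. It treats vs0 to vs3 (the registers that xxmfacc 0 makes visible) as 16-byte arrays tagged with their register number. The stxvp model assumes the little-endian behavior in which the odd register of the pair (XSp+1) is written at the lower address; all type and helper names are hypothetical. Under that assumption, buggy leaves vs0 in the first 16 bytes while foo leaves vs3 there, so the two functions write the same four rows in opposite orders. Because xxsetaccz zeroes the accumulator, the test case itself stores all zeros either way; only the layout differs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vs0..vs3 after "xxmfacc 0".  */
typedef struct { uint8_t vsr[4][16]; } vsr_file;

/* Fill vsN with the byte value N so rows can be told apart in memory.  */
static void tag_rows (vsr_file *r)
{
  for (int n = 0; n < 4; n++)
    memset (r->vsr[n], n, 16);
}

/* stxv XS,off(base): store one 16-byte VSR at base+off.  */
static void model_stxv (uint8_t *mem, int off, const vsr_file *r, int xs)
{
  memcpy (mem + off, r->vsr[xs], 16);
}

/* stxvp XSp,off(base) in little-endian mode (assumed semantics):
   VSR[XSp+1] goes to the lower address, VSR[XSp] to the higher one.  */
static void model_stxvp_le (uint8_t *mem, int off, const vsr_file *r, int xsp)
{
  assert (xsp % 2 == 0);  /* a register pair starts at an even VSR */
  memcpy (mem + off, r->vsr[xsp + 1], 16);
  memcpy (mem + off + 16, r->vsr[xsp], 16);
}

/* Replay the stores generated for buggy (void *dst).  */
static void replay_buggy (uint8_t *dst, const vsr_file *r)
{
  model_stxv (dst, 0, r, 0);
  model_stxv (dst, 16, r, 1);
  model_stxv (dst, 32, r, 2);
  model_stxv (dst, 48, r, 3);
}

/* Replay the stores generated for foo (__vector_quad *dst).  */
static void replay_foo (uint8_t *dst, const vsr_file *r)
{
  model_stxvp_le (dst, 0, r, 2);
  model_stxvp_le (dst, 32, r, 0);
}

/* Return the register number whose row landed in 16-byte slot `slot`.  */
static int row_in_slot (const uint8_t *dst, int slot)
{
  return dst[slot * 16];
}
```

Replaying both sequences and reading back the four slots gives vs0, vs1, vs2, vs3 for buggy and vs3, vs2, vs1, vs0 for foo.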
Mine. This is broken in the FSF GCC 10 branch as well.
The master branch has been updated by Peter Bergner <bergner@gcc.gnu.org>:

https://gcc.gnu.org/g:ae575662833d70cb7d74b9538096c7becc79af14

commit r11-2278-gae575662833d70cb7d74b9538096c7becc79af14
Author: Peter Bergner <bergner@linux.ibm.com>
Date:   Wed Jul 22 11:44:35 2020 -0500

    rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode

    PR96236 shows a problem where we don't correctly store our 512-bit
    accumulators correctly in little-endian mode.  The patch below detects
    when we're doing a little-endian memory access and stores to the correct
    memory locations.

    2020-07-22  Peter Bergner  <bergner@linux.ibm.com>

    gcc/
            PR target/96236
            * config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
            little-endian memory ordering.

    gcc/testsuite/
            PR target/96236
            * gcc.target/powerpc/mma-double-test.c: Update storing results for
            correct little-endian ordering.
            * gcc.target/powerpc/mma-single-test.c: Likewise.
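The commit message describes the fix only as storing "to the correct memory locations" for little-endian accesses. A minimal sketch of such an ordering rule, assuming (consistently with the foo output in the original report) that on little-endian the four 16-byte accumulator rows are simply written in reverse order, might look like the portable C below. `dst_slot` and `disassemble_model` are hypothetical names used only for illustration; this is not the code in rs6000-call.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { ACC_ROWS = 4, ROW_BYTES = 16 };

/* Destination slot for row i: identity on big-endian, reversed on
   little-endian (assumed rule, mirroring foo's stxvp stores).  */
static unsigned dst_slot (unsigned i, unsigned nvec, bool big_endian)
{
  assert (i < nvec);
  return big_endian ? i : nvec - 1 - i;
}

/* Hypothetical model of disassembling an accumulator into a plain
   buffer: copy each 16-byte row to its endian-correct offset.
   `rows` and `dst` each hold ACC_ROWS * ROW_BYTES bytes.  */
static void disassemble_model (uint8_t *dst, const uint8_t *rows,
                               bool big_endian)
{
  for (unsigned i = 0; i < ACC_ROWS; i++)
    memcpy (dst + dst_slot (i, ACC_ROWS, big_endian) * ROW_BYTES,
            rows + i * ROW_BYTES, ROW_BYTES);
}
```

Keeping the index mapping in its own function makes the endian decision a single expression, so the big-endian path is unchanged and only the little-endian path reverses the slot order.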
Fixed on trunk. I will backport to the GCC 10 release branch once it reopens. I would have set the target milestone to 10.3, but that version isn't an option right now.
The releases/gcc-10 branch has been updated by Peter Bergner <bergner@gcc.gnu.org>:

https://gcc.gnu.org/g:5497677b497b95a261089d19f5295cc80f99a2b6

commit r10-8522-g5497677b497b95a261089d19f5295cc80f99a2b6
Author: Peter Bergner <bergner@linux.ibm.com>
Date:   Wed Jul 22 11:44:35 2020 -0500

    rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode

    PR96236 shows a problem where we don't correctly store our 512-bit
    accumulators correctly in little-endian mode.  The patch below detects
    when we're doing a little-endian memory access and stores to the correct
    memory locations.

    2020-07-22  Peter Bergner  <bergner@linux.ibm.com>

    gcc/
            PR target/96236
            * config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
            little-endian memory ordering.

    gcc/testsuite/
            PR target/96236
            * gcc.target/powerpc/mma-double-test.c: Update storing results for
            correct little-endian ordering.
            * gcc.target/powerpc/mma-single-test.c: Likewise.

    (cherry picked from commit ae575662833d70cb7d74b9538096c7becc79af14)
Fixed everywhere.