Created attachment 51037 [details] reduced testcase Output: $ powerpc64le-unknown-linux-gnu-gcc -O testcase.c -static $ qemu-ppc64le -- ./a.out qemu: uncaught target signal 6 (Aborted) - core dumped Aborted $ powerpc64le-unknown-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-trunk/binary-latest-powerpc64le/bin/powerpc64le-unknown-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-1607-20210617224441-g2786064d91f-checking-yes-rtl-df-extra-powerpc64le/bin/../libexec/gcc/powerpc64le-unknown-linux-gnu/12.0.0/lto-wrapper Target: powerpc64le-unknown-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/powerpc64le-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=powerpc64le-unknown-linux-gnu --with-ld=/usr/bin/powerpc64le-unknown-linux-gnu-ld --with-as=/usr/bin/powerpc64le-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r12-1607-20210617224441-g2786064d91f-checking-yes-rtl-df-extra-powerpc64le Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.0.0 20210618 (experimental) (GCC)
Started with r11-5839-g3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1
Optimized difference: --- pr101129.c.244t.optimized.r11-5838 2021-06-18 22:40:20.915971652 +0200 +++ pr101129.c.244t.optimized.r11-5839 2021-06-18 22:40:34.739789553 +0200 @@ -8,6 +8,7 @@ void foo0 (u32 u32_0, U * ret) __int128 _1; vector(16) unsigned char _2; unsigned char _3; + vector(16) <signed-boolean:8> _4; vector(16) signed char _5; __int128 _6; __int128 _7; @@ -31,7 +32,8 @@ void foo0 (u32 u32_0, U * ret) _14 = u32_0_13(D) & 4; _3 = (unsigned char) _14; _2 = {_3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3, _3}; - _5 = .VCONDU (_2, { 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, 108); + _4 = _2 < { 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; + _5 = VIEW_CONVERT_EXPR<vector(16) signed char>(_4); _6 = VIEW_CONVERT_EXPR<__int128>(_5); u128_2_15 = _1 * _6; _7 = u128_2_15 >> 64; Assembly difference: --- pr101129.s.r11-5838 2021-06-18 22:40:20.919971599 +0200 +++ pr101129.s.r11-5839 2021-06-18 22:40:34.743789500 +0200 @@ -14,19 +14,16 @@ foo0: .localentry foo0,.-foo0 rlwinm 9,3,0,29,29 mtvsrd 32,9 - vspltb 0,0,7 + vspltb 1,0,7 addis 9,2,.LC0@toc@ha addi 9,9,.LC0@toc@l - lxvd2x 33,0,9 - vcmpgtub 0,1,0 - addi 9,1,-16 - xxpermdi 32,32,32,2 - stxvd2x 32,0,9 - ld 9,-8(1) - ld 10,-16(1) + lxvd2x 32,0,9 + vcmpgtub 0,0,1 + mfvsrd 11,32 + xxpermdi 0,32,32,3 + mfvsrd 10,0 + add 10,11,10 mulld 10,10,3 - mulhdu 3,3,9 - add 10,10,3 addis 9,2,.LANCHOR0@toc@ha lbz 9,.LANCHOR0@toc@l(9) add 10,10,9
The GIMPLE doesn't look wrong but eventually + _4 = _2 < { 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; + _5 = VIEW_CONVERT_EXPR<vector(16) signed char>(_4); goes wrong somewhere in RTL expansion?
I think this is a bug in the swaps rs6000 specific pass. It in particular changes (insn 20 19 21 2 (set (reg:DI 155) (mult:DI (reg:DI 224) - (subreg:DI (reg:V16QI 148) 0))) "pr101129.c":14:8 154 {muldi3} + (subreg:DI (reg:V16QI 148) 8))) "pr101129.c":14:8 154 {muldi3} (nil)) (insn 21 20 22 2 (set (reg:DI 156) (subreg:DI (mult:TI (zero_extend:TI (reg:DI 224)) - (zero_extend:TI (subreg:DI (reg:V16QI 148) 0))) 8)) "pr101129.c":14:8 166 {umuldi3_highpart_le} + (zero_extend:TI (subreg:DI (reg:V16QI 148) 8))) 0)) "pr101129.c":14:8 166 {umuldi3_highpart_le} (nil)) The SUBREG_BYTE changes in subregs of pseudo 148 look reasonable as the pass removed a xxswapd_v16qi that was swapping the value set to that pseudo. But the 8 to 0 change of SUBREG_BYTE with SUBREG_REG of MULT looks wrong, that wouldn't even match unless simplified and turns a highpart multiply into lowpart multiply. So either the swaps pass needs to be fixed not to do that, or the highpart multiply instructions should be changed to e.g. use lshiftrt by 64 instead of highpart subreg of the result.
Uh, yeah, that is completely unexpected behavior for swaps. I'll try to look at this soon.
Testing this patch. diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c index 21cbcb2e28a..00693e6dc60 100644 --- a/gcc/config/rs6000/rs6000-p8swap.c +++ b/gcc/config/rs6000/rs6000-p8swap.c @@ -1523,6 +1523,20 @@ replace_swap_with_copy (swap_web_entry *insn_entry, unsigned i) insn->set_deleted (); } +/* INSN is known to contain a SUBREG, which we can normally handle, + but if the SUBREG itself contains a MULT then we need to leave it alone + to avoid turning a mult_hipart into a mult_lopart, for example. */ +static bool +has_part_mult (rtx_insn *insn) +{ + rtx body = PATTERN (insn); + rtx src = SET_SRC (body); + if (GET_CODE (src) != SUBREG) + return false; + rtx inner = XEXP (src, 0); + return (GET_CODE (inner) == MULT); +} + /* Make NEW_MEM_EXP's attributes and flags resemble those of ORIGINAL_MEM_EXP. */ static void @@ -2501,6 +2515,9 @@ rs6000_analyze_swaps (function *fun) insn_entry[uid].is_swappable = 0; else if (special != SH_NONE) insn_entry[uid].special_handling = special; + else if (insn_entry[uid].contains_subreg + && has_part_mult (insn)) + insn_entry[uid].is_swappable = 0; else if (insn_entry[uid].contains_subreg) insn_entry[uid].special_handling = SH_SUBREG; }
That patch is pre-approved (if it works ;-) ) Bill. Thanks!
Small change required to actually check that it's a SET insn. (oops) Otherwise looks like it passed regstrap. Testing the revised patch now. Thanks for the pre-review!
The master branch has been updated by William Schmidt <wschmidt@gcc.gnu.org>: https://gcc.gnu.org/g:ad5f8ac1d2f2dc92d43663243b52f9e9eb3cf7c0 commit r12-2325-gad5f8ac1d2f2dc92d43663243b52f9e9eb3cf7c0 Author: Bill Schmidt <wschmidt@linux.ibm.com> Date: Thu Jul 15 10:16:17 2021 -0500 rs6000: Don't let swaps pass break multiply low-part (PR101129) 2021-07-15 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ PR target/101129 * config/rs6000/rs6000-p8swap.c (has_part_mult): New. (rs6000_analyze_swaps): Insns containing a subreg of a mult are not swappable. gcc/testsuite/ PR target/101129 * gcc.target/powerpc/pr101129.c: New.
Fixed on trunk, but will require backports.
The releases/gcc-11 branch has been updated by William Schmidt <wschmidt@gcc.gnu.org>: https://gcc.gnu.org/g:ac0efe3c6fc6231b20ffd684956a4a5c3c54a96b commit r11-8780-gac0efe3c6fc6231b20ffd684956a4a5c3c54a96b Author: Bill Schmidt <wschmidt@linux.ibm.com> Date: Mon Jul 19 12:49:17 2021 -0500 rs6000: Don't let swaps pass break multiply low-part (PR101129) Backport from mainline. 2021-07-15 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ PR target/101129 * config/rs6000/rs6000-p8swap.c (has_part_mult): New. (rs6000_analyze_swaps): Insns containing a subreg of a mult are not swappable. gcc/testsuite/ PR target/101129 * gcc.target/powerpc/pr101129.c: New.
The releases/gcc-10 branch has been updated by William Schmidt <wschmidt@gcc.gnu.org>: https://gcc.gnu.org/g:ecad28561c47f5b2652a03e9b67cf37c13d88f53 commit r10-9990-gecad28561c47f5b2652a03e9b67cf37c13d88f53 Author: Bill Schmidt <wschmidt@linux.ibm.com> Date: Mon Jul 19 14:38:23 2021 -0500 rs6000: Don't let swaps pass break multiply low-part (PR101129) Backport from mainline. 2021-07-15 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ PR target/101129 * config/rs6000/rs6000-p8swap.c (has_part_mult): New. (rs6000_analyze_swaps): Insns containing a subreg of a mult are not swappable. gcc/testsuite/ PR target/101129 * gcc.target/powerpc/pr101129.c: New.
The releases/gcc-9 branch has been updated by William Schmidt <wschmidt@gcc.gnu.org>: https://gcc.gnu.org/g:3e4b43ec9a937866ead88e996bd200b116a1766c commit r9-9632-g3e4b43ec9a937866ead88e996bd200b116a1766c Author: Bill Schmidt <wschmidt@linux.ibm.com> Date: Mon Jul 19 16:14:36 2021 -0500 rs6000: Don't let swaps pass break multiply low-part (PR101129) Backport from mainline. 2021-07-15 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ PR target/101129 * config/rs6000/rs6000-p8swap.c (has_part_mult): New. (rs6000_analyze_swaps): Insns containing a subreg of a mult are not swappable. gcc/testsuite/ PR target/101129 * gcc.target/powerpc/pr101129.c: New.
Now fixed everywhere.