On x86-64, I got:

FAIL: gcc.target/i386/pr105493.c scan-tree-dump-times slp1 " MEM <vector\\(4\\) unsigned int> \\[[^]]*\\] = " 4
I see it too.
https://inbox.sourceware.org/gcc-patches/202408081706.478H6mY51198181@shliclel4214.sh.intel.com/
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659911.html
The reason that we don't have "MEM <vector(4) unsigned int>" in the dump anymore is that we now have "MEM <vector(8) unsigned char>". Further, the size of the function in the test case shrinks from 225 instructions down to 109 (almost all vector instructions).

I tried to measure a performance difference on my 5950X (-march=native) when calling the test function four times in a loop with 1024l * 1024 * 1024 * 1024 iterations. However, I did not see enough evidence to claim that the new code is better (memory bandwidth is probably the limit):

* old: 4m34.405s, 4m47.825s, 4m38.187s
* new: 4m34.722s, 4m34.936s, 4m34.922s

I propose to fix the failing test case by fixing the test condition. A patch for that is on the list:
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673551.html

FWIW, here is a small code change that will bring back the old behavior for analysis:

--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2595,7 +2595,7 @@ out:
   auto_vec<unsigned> two_op_perm_indices[2];
   vec<stmt_vec_info> two_op_scalar_stmts[2] = {vNULL, vNULL};
 
-  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
+  if (false && two_operators && oprnds_info.length () == 2 && group_size > 2)
     {
       unsigned idx = 0;
       hash_map<gimple *, unsigned> seen;
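
For anyone who wants to repeat the timing experiment described above, a minimal sketch of such a harness might look like the following. Everything in it is an assumption for illustration: `kernel_under_test` is a hypothetical stand-in and not the function from pr105493.c, `run_benchmark` is a made-up name, and the input data is arbitrary. The empty asm statement is a GCC/Clang extension that keeps the compiler from folding the repeated calls together.

```c
#include <stdint.h>

/* Hypothetical stand-in for the function from the test case; replace it
   with the real function before measuring anything.  */
__attribute__ ((noinline))
static uint32_t
kernel_under_test (const uint8_t *a, const uint8_t *b)
{
  uint32_t sum = 0;
  for (int i = 0; i < 16; i++)
    sum += (uint32_t) (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  return sum;
}

/* Call the kernel four times per iteration and accumulate the results so
   that the calls are not dead code.  */
static uint64_t
run_benchmark (uint64_t iterations)
{
  uint8_t a[16], b[16];
  for (int i = 0; i < 16; i++)
    {
      a[i] = (uint8_t) i;
      b[i] = (uint8_t) (2 * i);
    }

  uint64_t acc = 0;
  for (uint64_t n = 0; n < iterations; n++)
    for (int call = 0; call < 4; call++)
      {
        /* Compiler barrier: tell the compiler that a and b may have
           changed, so each call is really executed.  */
        __asm__ volatile ("" : : "r" (a), "r" (b) : "memory");
        acc += kernel_under_test (a, b);
      }
  return acc;
}
```

To reproduce numbers comparable to the ones above, one would call `run_benchmark (1024l * 1024 * 1024 * 1024)` from `main`, print the result, build once with and once without the tree-vect-slp.cc change, and run each binary under `time`.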
Fixed in https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=120a37008222bf6fe17658af3d1ba1b384642905