Bug 117079 - [15 Regression] FAIL: gcc.target/i386/pr105493.c since r15-2820-gab18785840d7b8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P1 normal
Target Milestone: 15.0
Assignee: Christoph Müllner
URL:
Keywords: testsuite-fail
Depends on:
Blocks: vectorizer
Reported: 2024-10-11 00:18 UTC by H.J. Lu
Modified: 2025-01-15 11:28 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-10-11 00:00:00


Description H.J. Lu 2024-10-11 00:18:54 UTC
On x86-64, I got:

FAIL: gcc.target/i386/pr105493.c scan-tree-dump-times slp1 "  MEM <vector\\(4\\) unsigned int> \\[[^]]*\\] = " 4
Comment 1 Sam James 2024-10-11 00:35:49 UTC
I see it too.
Comment 4 Christoph Müllner 2025-01-14 12:36:24 UTC
The dump no longer contains "MEM <vector(4) unsigned int>" because we now get "MEM <vector(8) unsigned char>" instead, so the scan pattern in the test no longer matches.
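
To illustrate the mismatch, here is a minimal Python sketch that applies the pattern from the failing directive to two dump lines. The dump lines are made-up stand-ins (the SSA names and address expressions are assumptions, not copied from a real slp1 dump), NEW_PATTERN is a hypothetical adjusted pattern that may differ from the one in the actual patch, and Python's re module only approximates the Tcl regex engine that DejaGnu uses.

```python
import re

# Pattern from the failing scan-tree-dump-times directive, with the
# Tcl string escaping (\\) reduced to plain regex escaping (\).
OLD_PATTERN = r"  MEM <vector\(4\) unsigned int> \[[^]]*\] = "

# Hypothetical variant that looks for the new vector type instead;
# the pattern used in the actual patch may differ.
NEW_PATTERN = r"  MEM <vector\(8\) unsigned char> \[[^]]*\] = "

# Made-up dump lines, for illustration only.
OLD_DUMP = "  MEM <vector(4) unsigned int> [(unsigned int *)_1] = vect__2;\n"
NEW_DUMP = "  MEM <vector(8) unsigned char> [(unsigned char *)_1] = vect__2;\n"


def count_matches(pattern, dump):
    """Count non-overlapping matches, similar in spirit to scan-tree-dump-times."""
    return len(re.findall(pattern, dump))


if __name__ == "__main__":
    for name, dump in (("old", OLD_DUMP), ("new", NEW_DUMP)):
        print(name,
              count_matches(OLD_PATTERN, dump),
              count_matches(NEW_PATTERN, dump))
```

With these stand-in lines, the original pattern matches only the vector(4) unsigned int store, so a count-based check against the new dump falls short of the expected 4.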

Further, the size of the function in the test case shrinks from 225 instructions down to 109 (almost all vector instructions).

I tried to measure a performance difference on my 5950X (-march=native) when calling the test function four times in a loop with 1024l * 1024 * 1024 * 1024 (i.e. 2^40) iterations.
However, I did not see enough evidence to claim that the new code is better (memory bandwidth is probably the limit):

* old: 4m34.405s, 4m47.825s, 4m38.187s
* new: 4m34.722s, 4m34.936s, 4m34.922s
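
For reference, a small Python sketch that converts the times above to seconds and compares the means with the run-to-run spread. The helper names (to_seconds, spread) are made up for this illustration, and three runs per variant are too few for a real statistical test, so this is only a rough sanity check.

```python
from statistics import mean


def to_seconds(stamp):
    """Convert a time-style stamp such as '4m34.405s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Times quoted in the list above.
OLD_RUNS = [to_seconds(s) for s in ("4m34.405s", "4m47.825s", "4m38.187s")]
NEW_RUNS = [to_seconds(s) for s in ("4m34.722s", "4m34.936s", "4m34.922s")]


def spread(runs):
    """Difference between the slowest and the fastest run."""
    return max(runs) - min(runs)


if __name__ == "__main__":
    print(f"old: mean {mean(OLD_RUNS):.3f}s, spread {spread(OLD_RUNS):.3f}s")
    print(f"new: mean {mean(NEW_RUNS):.3f}s, spread {spread(NEW_RUNS):.3f}s")
    print(f"mean difference: {mean(OLD_RUNS) - mean(NEW_RUNS):.3f}s")
```

The means differ by about 5.3 s (roughly 2%), while the old runs alone vary by about 13.4 s, which is consistent with not claiming a speedup from these numbers.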

I propose to fix the failing test case by updating the test's scan condition.
A patch for that is on the gcc-patches list:
  https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673551.html

FWIW, here is a small code change that will bring back the old behavior for analysis:

--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2595,7 +2595,7 @@ out:
   auto_vec<unsigned> two_op_perm_indices[2];
   vec<stmt_vec_info> two_op_scalar_stmts[2] = {vNULL, vNULL};
 
-  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
+  if (false && two_operators && oprnds_info.length () == 2 && group_size > 2)
     {
       unsigned idx = 0;
       hash_map<gimple *, unsigned> seen;