Created attachment 47085 [details] 32-bit sparc-sun-solaris2.11 bb-slp-22.c.168t.slp2 Between 20191016 (r277071) and 20191021 (r277261), gcc.dg/vect/bb-slp-22.c began to FAIL on 32 and 64-bit SPARC: +FAIL: gcc.dg/vect/bb-slp-22.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block vectorized" 1 +FAIL: gcc.dg/vect/bb-slp-22.c scan-tree-dump-times slp2 "basic block vectorized" 1 The message now appears twice, so this is clearly progress, but the guards need to be adapted. Dump attached.
Ah, thanks - that might be an undesided side-effect of r277241 We now vectorize out[0] = a0 * x; out[1] = a1 * y; out[2] = a2 * x; out[3] = a3 * y; as _5 = a0_40 * x_44(D); _6 = a1_41 * y_45(D); _7 = a2_42 * x_44(D); _8 = a3_43 * y_45(D); _66 = {_7, _8}; vect_cst__67 = _66; _68 = {_5, _6}; vect_cst__69 = _68; MEM <vector(2) unsigned int> [(unsigned int *)&out] = vect_cst__69; _71 = &out[0] + 8; MEM <vector(2) unsigned int> [(unsigned int *)_71] = vect_cst__67; since we're no longer fencing the build-from-scalar code via && !SLP_TREE_CHILDREN (child).is_empty () (previously we had no SLP children nodes for the SLP node representing the multiplication). So the test is no longer testing vectorization of multiplications. It also shows that BB vectorizing this function at strict basic-block boundaries is suboptimal. I'll see what to best do here, clearly a sort-term fix would be to change the code to make vectorization of the multiplication more obviously profitable.
The master branch has been updated by Richard Guenther <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:9847df2c9573f1e4b948b5a7272c6aadf8e01c22 commit r10-6449-g9847df2c9573f1e4b948b5a7272c6aadf8e01c22 Author: Richard Biener <rguenther@suse.de> Date: Wed Feb 5 14:10:50 2020 +0100 testsuite/92177 fix for SLP build changes We're now consistently building SLP operations with only scalar defs from scalars which makes the testcase no longer testing multiplication vectorization. The following smuggles in a constant making the vector variant profitable for SLP build. 2020-02-05 Richard Biener <rguenther@suse.de> PR testsuite/92177 * gcc.dg/vect/bb-slp-22.c: Adjust.
We should test multiplication again, so hopefully fixed (unless Richards recent improvement makes such adventure moot again by pruning the SLP tree based on unsupported ops...)
Created attachment 48429 [details] 32-bit sparc-sun-solaris2.11 bb-slp-22.c.171.slp2 I just noticed that the test still FAILs, both 32 and 64-bit, and apparently never stopped.
Not yet fixed.
Doesn't look like a release blocker though, it seems this isn't a wrong-code issue. sparc*-*-solaris* is not a vect_int_mult target, as it can't vectorize integer multiplication, but on the testcase in question it doesn't vectorize that: [bb-slp-22.c:29:24] _5 = x_48(D) + 1; [bb-slp-22.c:29:19] _6 = _5 * a0_44; - [bb-slp-22.c:29:14] [bb-slp-22.c:29:10] out[0] = _6; [bb-slp-22.c:30:24] _7 = y_49(D) + 1; [bb-slp-22.c:30:19] _8 = _7 * a1_45; - [bb-slp-22.c:30:14] [bb-slp-22.c:30:10] out[1] = _8; [bb-slp-22.c:31:19] _9 = _5 * a2_46; - [bb-slp-22.c:31:14] [bb-slp-22.c:31:10] out[2] = _9; [bb-slp-22.c:32:19] _10 = _7 * a3_47; - [bb-slp-22.c:32:14] [bb-slp-22.c:32:10] out[3] = _10; + _70 = {_9, _10}; + [bb-slp-22.c:29:14] vect_cst__71 = _70; + _72 = {_6, _8}; + [bb-slp-22.c:29:14] vect_cst__73 = _72; + [bb-slp-22.c:29:14] MEM <vector(2) unsigned int> [(unsigned int *)&out] = vect_cst__73; + [bb-slp-22.c:29:14] _75 = &[bb-slp-22.c:29:10] out[0] + 8; + [bb-slp-22.c:29:14] MEM <vector(2) unsigned int> [(unsigned int *)_75] = vect_cst__71; It just vectorizes the stores. So looks like a testsuite issue to me. Either it needs to look for a different message in the dump file etc.
The SLP reorgs made making this testcase target agnostic really really hard. It was supposed to test multiplication vectorization but now tests nothing. I don't see how to reliably test this unless we scan for details in the vectorizer dumps that are prone to change. Currently we have for example on x86_64: /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: SLPing BB part /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: ------>vectorizing SLP node starting from: _5 = x_48(D) + 1; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand x_48(D), type of def: external /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand 1, type of def: constant /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: transform binary/unary operation. /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: created new init_stmt: vect_cst__63 = _42; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: created new init_stmt: vect_cst__12 = { 1, 1, 1, 1 }; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: add new stmt: vect__5.19_13 = vect_cst__63 + vect_cst__12; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: ------>vectorizing SLP node starting from: _6 = _5 * a0_44; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand x_48(D) + 1, type of def: internal /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: vectype vector(4) unsigned int /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand _1 + 23, type of def: external /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: transform binary/unary operation. /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: created new init_stmt: vect_cst__71 = _70; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: add new stmt: vect__6.20_72 = vect__5.19_13 * vect_cst__71; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: ------>vectorizing SLP node starting from: out[0] = _6; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand _5 * a0_44, type of def: internal /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: vectype vector(4) unsigned int /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand _7 * a1_45, type of def: internal /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand _5 * a2_46, type of def: internal /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vect_is_simple_use: operand _7 * a3_47, type of def: internal /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: transform store. ncopies = 1 /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: create vector_type-pointer variable to type: vector(4) unsigned int vectorizing a pointer ref: out[0] /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: created &out[0] /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: add new stmt: MEM <vector(4) unsigned int> [(unsigned int *)&out] = vect__6.20_72; /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: note: vectorizing stmts using SLP. /home/rguenther/src/trunk/gcc/testsuite/gcc.dg/vect/bb-slp-22.c:35:14: optimized: basic block part vectorized using 16 byte vectors we could match 'vectorizing SLP node starting from: _6 = _5 * a0_44; which means we are actually vectorizing a multiplication. Like with the following. Rainer - can you test this? diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-22.c b/gcc/testsuite/gcc.dg/vect/bb-slp-22.c index 6dc2375f5d1..f25a225666e 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-22.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-22.c @@ -63,6 +63,6 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" { target { ! {vect_int_mult } } } } } */ -/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" { target vect_int_mult } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */ +/* { dg-final { scan-tree-dump "vectorizing SLP node starting from: _\[0-9\]+ = _\[0-9\]+ \\\* a0" "slp2" { target vect_int_mult } } } */
> --- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> --- [...] > which means we are actually vectorizing a multiplication. Like with > the following. Rainer - can you test this? [...] Works for me: tested on sparc-sun-solaris2.11 (32 and 64-bit) and (to verify that I didn't mess up the mangled patch) on i386-pc-solaris2.11.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:3fbf43b9bc060e2904abe64e870868b9a4bfce13 commit r11-71-g3fbf43b9bc060e2904abe64e870868b9a4bfce13 Author: Richard Biener <rguenther@suse.de> Date: Tue May 5 15:38:24 2020 +0200 testsuite/92177 - adjust expected patterns for gcc.dg/vect/bb-slp-22.c We now always vectorize two BBs, adjust the selector to also scan for integer multiplication vectorization explicitely. 2020-05-05 Richard Biener <rguenther@suse.de> PR testsuite/92177 * gcc.dg/vect/bb-slp-22.c: Adjust.
Fixed on trunk (hopefully).
GCC 10.1 has been released.
.
The releases/gcc-10 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:ff9d4e09566e24d4dff10adeaef109823266c7bd commit r10-8140-gff9d4e09566e24d4dff10adeaef109823266c7bd Author: Richard Biener <rguenther@suse.de> Date: Tue May 5 15:38:24 2020 +0200 testsuite/92177 - adjust expected patterns for gcc.dg/vect/bb-slp-22.c We now always vectorize two BBs, adjust the selector to also scan for integer multiplication vectorization explicitely. 2020-05-05 Richard Biener <rguenther@suse.de> PR testsuite/92177 * gcc.dg/vect/bb-slp-22.c: Adjust.