Summary: | outer loop auto-vectorization fails for exponentiation code | ||
---|---|---|---|
Product: | gcc | Reporter: | Trass3r <trass3r> |
Component: | tree-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | rguenth |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 9.0 | ||
Target Milestone: | --- | ||
Host: | | Target: | |
Build: | | Known to work: | |
Known to fail: | | Last reconfirmed: | 2018-10-16 00:00:00 |
Bug Depends on: | |||
Bug Blocks: | 53947 |
Description
Trass3r
2018-10-16 10:27:48 UTC
Interestingly it happily unrolls the loop even with -fno-unroll-loops.

The issue is the unsupported reduction. We can't vectorize an x = x*x; reduction, and I don't see how we could. We could eventually vectorize the outer loop, but outer loop vectorization is "confused" by the if-conversion we need to do to the inner loop. Fixing that (y *= n%2 ? x : 1) yields an outer loop vectorization failure like

t.ii:20:20: note: vect_is_simple_use: operand y_36 = PHI <1(3), prephitmp_27(10)>, type of def: unknown
t.ii:20:20: missed: Unsupported pattern.
t.ii:17:6: missed: not vectorized: unsupported use in stmt.
t.ii:20:20: missed: unexpected pattern.
t.ii:20:20: missed: couldn't vectorize loop

That is because we "simplified" the multiplication by 1, and thus the reduction op becomes y = n%2 ? new_y : y; and apparently we do not like this (not sure why the reduction structure is relevant for outer loop vectorization).

We do not actually detect this as a reduction, but we could simply identify inner loop reductions by looking for the loop-closed PHIs.

So - were you expecting outer loop vectorization to happen?

Yes, see the godbolt link. clang compiles it down to a few vpmulld's.

Fixed on trunk.

Author: rguenth
Date: Fri Nov 9 10:53:31 2018
New Revision: 265959

URL: https://gcc.gnu.org/viewcvs?rev=265959&root=gcc&view=rev
Log:
2018-11-09 Richard Biener <rguenther@suse.de>

PR tree-optimization/87621
* tree-vect-loop.c (vectorizable_reduction): Handle reduction op with only phi inputs.
* tree-ssa-loop-ch.c: Include tree-ssa-sccvn.h.
(ch_base::copy_headers): Run CSE on copied loop headers.
(pass_ch_vect::process_loop_p): Simplify.

* g++.dg/vect/pr87621.cc: New testcase.

Added:
trunk/gcc/testsuite/g++.dg/vect/pr87621.cc
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-ch.c
trunk/gcc/tree-vect-loop.c
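
The reporter's test case is not reproduced on this page, so the following is only a sketch of the kind of code the comments discuss, assuming integer exponentiation by squaring applied to every element of an array. The names `pow_branch`, `pow_select` and `pow_all` are hypothetical and are not taken from the PR or from pr87621.cc. It shows the two inner-loop forms mentioned above (the conditional multiply, and the `y *= n%2 ? x : 1` rewrite) and an outer loop whose iterations are independent, which is the loop one would hope to see vectorized.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: power by repeated squaring with a conditional multiply.
// The inner loop carries two recurrences: x = x * x and the product into y.
constexpr std::uint32_t pow_branch(std::uint32_t x, std::uint32_t n) {
    std::uint32_t y = 1;
    while (n != 0) {
        if (n % 2 != 0)
            y *= x;      // only executed for odd n; needs if-conversion
        x *= x;          // the "x = x*x" recurrence from the discussion
        n /= 2;
    }
    return y;
}

// Same computation with the condition folded into the multiplicand,
// i.e. the "y *= n%2 ? x : 1" form mentioned in the comments.
constexpr std::uint32_t pow_select(std::uint32_t x, std::uint32_t n) {
    std::uint32_t y = 1;
    while (n != 0) {
        y *= (n % 2 != 0) ? x : 1u;
        x *= x;
        n /= 2;
    }
    return y;
}

// Outer loop (assumed shape): each element is independent and all elements
// share the exponent n, so every lane would run the same inner trip count.
inline void pow_all(std::uint32_t* v, std::size_t len, std::uint32_t n) {
    for (std::size_t i = 0; i < len; ++i)
        v[i] = pow_select(v[i], n);
}
```

Whether a given GCC version vectorizes the outer loop in this sketch is not something the page states; compiling with `-O3 -fopt-info-vec-missed` is one way to see diagnostics of the kind quoted above.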