Bug 87621 - outer loop auto-vectorization fails for exponentiation code
Summary: outer loop auto-vectorization fails for exponentiation code
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2018-10-16 10:27 UTC by Trass3r
Modified: 2018-11-09 10:54 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-10-16 00:00:00


Description Trass3r 2018-10-16 10:27:48 UTC
https://godbolt.org/z/bgieBT

template <typename T>
T pow(T x, unsigned int n)
{
	if (!n)
		return 1;

	T y = 1;
	while (n > 1)
	{
		if (n%2)
			y *= x;
		x = x*x; // unsupported use in stmt
		n /= 2;
	}
	return x*y;
}

void testVec(int* x)
{
	// loop nest containing two or more consecutive inner loops cannot be vectorized
	for (int i = 0; i < 8; ++i)
		x[i] = pow(x[i], 10);
}
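The testcase computes x^n by exponentiation by squaring. The sketch below restates it as ipow (a hypothetical name, chosen so it cannot clash with std::pow) next to a hypothetical naive reference ipow_naive, so the two can be compared. For n = 10 (binary 1010) the while loop runs three times: x becomes x^2; then y becomes x^2 and x becomes x^4; then x becomes x^8. The final return x*y yields x^8 * x^2 = x^10.

```cpp
#include <cassert>

// Same exponentiation-by-squaring scheme as the testcase, renamed ipow
// (hypothetical name) so it cannot collide with std::pow.
template <typename T>
T ipow(T x, unsigned int n)
{
    if (!n)
        return 1;
    T y = 1;            // accumulates the powers of x for the low set bits of n
    while (n > 1) {
        if (n % 2)      // low bit set: fold the current power of x into y
            y *= x;
        x = x * x;      // after k iterations, x holds x0^(2^k)
        n /= 2;
    }
    return x * y;       // the most significant bit of n is applied here
}

// Hypothetical naive reference: multiply x into the result n times.
template <typename T>
T ipow_naive(T x, unsigned int n)
{
    T r = 1;
    for (unsigned int i = 0; i < n; ++i)
        r *= x;
    return r;
}
```

With these definitions, ipow(3, 10) returns 59049, the same value as ipow_naive(3, 10).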
Comment 1 Trass3r 2018-10-16 10:30:54 UTC
Interestingly, it happily unrolls the loop even with -fno-unroll-loops.
Comment 2 Richard Biener 2018-10-16 11:56:23 UTC
The issue is the unsupported reduction.  We can't vectorize a

 x = x*x;

reduction.  And I don't see how we could.
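To illustrate why a statement like x = x*x does not fit the reduction pattern, here is a sketch (all function names are hypothetical; unsigned arithmetic is assumed so overflow wraps instead of being undefined). A product reduction y = y*a[k] only ever combines the accumulator with independent terms, so it can be split into per-lane partial products that are combined at the end. In x = x*x both operands are the carried value itself, so every step needs the full previous result. This is a reading of the comment, not a description of the vectorizer's internal checks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A genuine product reduction: each a[k] is independent of the accumulator.
unsigned product_sequential(const std::array<unsigned, 8>& a)
{
    unsigned y = 1;
    for (unsigned v : a)
        y *= v;
    return y;
}

// Same reduction split over two hypothetical "lanes". Unsigned multiplication
// (mod 2^32) is associative and commutative, so the result is identical.
unsigned product_two_lanes(const std::array<unsigned, 8>& a)
{
    unsigned lane0 = 1, lane1 = 1;
    for (std::size_t k = 0; k < a.size(); k += 2) {
        lane0 *= a[k];
        lane1 *= a[k + 1];
    }
    return lane0 * lane1;   // final combine of the partial products
}

// x = x * x: the only operand is the carried value, so there are no
// independent terms to distribute over lanes; step k needs step k - 1.
unsigned square_chain(unsigned x, unsigned steps)
{
    for (unsigned k = 0; k < steps; ++k)
        x = x * x;          // after this step, x == x0^(2^(k+1))
    return x;
}
```

For a = {1, ..., 8} both product functions return 40320, while square_chain(3, 2) has to go 3, 9, 81 strictly in order.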

We could possibly vectorize the outer loop instead, but outer-loop vectorization
is "confused" by the if-conversion we need to do on the inner loop.

Fixing that (by rewriting the update as y *= n%2 ? x : 1) yields an outer-loop vectorization failure like

t.ii:20:20: note:   vect_is_simple_use: operand y_36 = PHI <1(3), prephitmp_27(10)>, type of def: unknown
t.ii:20:20: missed:   Unsupported pattern.
t.ii:17:6: missed:   not vectorized: unsupported use in stmt.
t.ii:20:20: missed:  unexpected pattern.
t.ii:20:20: missed: couldn't vectorize loop

That is because we "simplified" the multiplication by 1, and thus the
reduction op becomes

 y = n%2 ? new_y : y;

and apparently we do not like this (not sure why the reduction structure
is relevant for outer-loop vectorization).  We do not actually detect this
as a reduction, but we could simply identify inner-loop reductions by
looking for the loop-closed PHIs.
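The two inner-loop shapes discussed in this comment can be written out side by side. This is a sketch: pow_mul_select and pow_value_select are hypothetical names, int is kept from the testcase, and inputs are assumed small enough not to overflow. The first is the hand-written y *= n%2 ? x : 1 form; the second is the select form y = n%2 ? new_y : y that the comment describes after the multiplication by 1 has been folded away. Both compute the same value.

```cpp
#include <cassert>

// Hand-written branchless update: always multiply, by either x or 1.
int pow_mul_select(int x, unsigned int n)
{
    if (!n)
        return 1;
    int y = 1;
    while (n > 1) {
        y *= (n % 2) ? x : 1;
        x = x * x;
        n /= 2;
    }
    return x * y;
}

// Select form: compute the updated value, then choose between it and the old y.
int pow_value_select(int x, unsigned int n)
{
    if (!n)
        return 1;
    int y = 1;
    while (n > 1) {
        int new_y = y * x;
        y = (n % 2) ? new_y : y;
        x = x * x;
        n /= 2;
    }
    return x * y;
}
```

For example, both return 59049 for x = 3, n = 10.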


So - were you expecting outer loop vectorization to happen?
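What outer-loop vectorization of testVec would amount to can be sketched by running all eight iterations of the i loop as lanes in lockstep. This is an emulation, not compiler output: Lanes, pow_lanes and testVec_lanes are hypothetical names, std::array stands in for a vector register, and each inner for loop over l stands for one vector instruction. The inner while loop can be shared by all lanes because n is the same constant (10) for every i, so its trip count and the n%2 tests are uniform; only x and y vary per lane. Here the uniform branch is simply kept, whereas according to the comment above GCC has to if-convert the inner loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-lane "vector" of 32-bit ints.
using Lanes = std::array<int, 8>;

Lanes pow_lanes(Lanes x, unsigned int n)
{
    Lanes y;
    y.fill(1);
    if (!n)
        return y;                            // every lane: x^0 == 1
    while (n > 1) {                          // control flow is the same for all lanes
        if (n % 2) {
            for (std::size_t l = 0; l < x.size(); ++l)
                y[l] *= x[l];                // one lane-wise multiply
        }
        for (std::size_t l = 0; l < x.size(); ++l)
            x[l] *= x[l];                    // one lane-wise multiply
        n /= 2;
    }
    for (std::size_t l = 0; l < x.size(); ++l)
        x[l] *= y[l];                        // x*y, lane-wise
    return x;
}

// The outer loop of testVec, "vectorized": one pass covers all 8 iterations.
void testVec_lanes(int* x)
{
    Lanes v;
    for (std::size_t l = 0; l < v.size(); ++l)
        v[l] = x[l];                         // stands for a vector load
    v = pow_lanes(v, 10);
    for (std::size_t l = 0; l < v.size(); ++l)
        x[l] = v[l];                         // stands for a vector store
}
```

Called on {0, 1, ..., 7}, it leaves x[i] equal to i^10, for example 59049 for i = 3. The inputs are kept small because the testcase uses int, and signed overflow would be undefined.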
Comment 3 Trass3r 2018-10-16 14:33:54 UTC
Yes, see the godbolt link: clang compiles it down to a few vpmulld instructions.
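For reference, code of the kind described here can be written by hand with the GCC/Clang generic vector extension (the vector_size attribute). This is a hypothetical hand-written equivalent, not clang's actual output, and v8si and testVec_unrolled are hypothetical names. With the exponent fixed at 10, x^10 = x^8 * x^2, so the whole outer loop becomes four multiplies of 8 x 32-bit ints, each of which an AVX2 target (for example -O2 -mavx2 on x86-64) can implement with a single vpmulld.

```cpp
#include <cassert>
#include <cstring>

// Generic vector of 8 x 32-bit ints (256 bits).
typedef int v8si __attribute__((vector_size(32)));

// Exponent 10 fully unrolled: x^2, x^4, x^8, then x^8 * x^2.
void testVec_unrolled(int* x)
{
    v8si v;
    std::memcpy(&v, x, sizeof v);   // load x[0..7] without alignment assumptions
    v8si x2 = v * v;
    v8si x4 = x2 * x2;
    v8si x8 = x4 * x4;
    v8si r  = x8 * x2;
    std::memcpy(x, &r, sizeof r);   // store the results back to x[0..7]
}
```

On targets without 256-bit vectors the compiler splits or scalarizes these operations, so the snippet still runs, just without the one-instruction-per-multiply mapping.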
Comment 4 Richard Biener 2018-11-09 10:53:48 UTC
Fixed on trunk.
Comment 5 Richard Biener 2018-11-09 10:54:08 UTC
Author: rguenth
Date: Fri Nov  9 10:53:31 2018
New Revision: 265959

URL: https://gcc.gnu.org/viewcvs?rev=265959&root=gcc&view=rev
Log:
2018-11-09  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/87621
	* tree-vect-loop.c (vectorizable_reduction): Handle reduction
	op with only phi inputs.
	* tree-ssa-loop-ch.c: Include tree-ssa-sccvn.h.
	(ch_base::copy_headers): Run CSE on copied loop headers.
	(pass_ch_vect::process_loop_p): Simplify.

	* g++.dg/vect/pr87621.cc: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/vect/pr87621.cc
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ch.c
    trunk/gcc/tree-vect-loop.c