This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems
- From: David Malcolm <dmalcolm at redhat dot com>
- To: Richard Sandiford <richard dot sandiford at arm dot com>
- Cc: Richard Biener <rguenther at suse dot de>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 11 Jul 2018 14:28:39 -0400
- Subject: Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems
- References: <alpine.LSU.2.20.1806251101311.5043@zhemvz.fhfr.qr> <1531185096-17113-1-git-send-email-dmalcolm@redhat.com> <87k1q17omd.fsf@arm.com>
On Wed, 2018-07-11 at 16:56 +0100, Richard Sandiford wrote:
> David Malcolm <dmalcolm@redhat.com> writes:
> > On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
> > > On Fri, 22 Jun 2018, David Malcolm wrote:
> > >
> > > > NightStrike and I were chatting on IRC last week about
> > > > issues with trying to vectorize the following code:
> > > >
> > > > #include <vector>
> > > > std::size_t f(std::vector<std::vector<float>> const & v) {
> > > > std::size_t ret = 0;
> > > > for (auto const & w: v)
> > > > ret += w.size();
> > > > return ret;
> > > > }
> > > >
> > > > icc could vectorize it, but gcc couldn't, but neither of us
> > > > could
> > > > immediately figure out what the problem was.
> > > >
> > > > Using -fopt-info leads to a wall of text.
> > > >
> > > > I tried using my patch here:
> > > >
> > > > "[PATCH] v3 of optinfo, remarks and optimization records"
> > > > https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> > > >
> > > > It improved things somewhat, by showing:
> > > > (a) the nesting structure via indentation, and
> > > > (b) the GCC line at which each message is emitted (by using the
> > > > "remark" output)
> > > >
> > > > but it's still a wall of text:
> > > >
> > > > https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.rema
> > > > rks.
> > > > html
> > > > https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..
> > > > %7C.
> > > > .%7Csrc%7Ctest.cc.html#line-4
> > > >
> > > > It doesn't yet provide a simple high-level message to a
> > > > tech-savvy user on what they need to do to get GCC to
> > > > vectorize their loop.
> > >
> > > Yeah, in particular the vectorizer is way too noisy in its low-
> > > level
> > > functions. IIRC -fopt-info-vec-missed is "somewhat" better:
> > >
> > > t.C:4:26: note: step unknown.
> > > t.C:4:26: note: vector alignment may not be reachable
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: no array mode for V2DI[3]
> > > t.C:4:26: note: Data access with gaps requires scalar epilogue
> > > loop
> > > t.C:4:26: note: can't use a fully-masked loop because the target
> > > doesn't
> > > have the appropriate masked load or store.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: no array mode for V2DI[3]
> > > t.C:4:26: note: Data access with gaps requires scalar epilogue
> > > loop
> > > t.C:4:26: note: op not supported by target.
> > > t.C:4:26: note: not vectorized: relevant stmt not supported: _15
> > > =
> > > _14
> > > /[ex] 4;
> > > t.C:4:26: note: bad operation or unsupported loop bound.
> > > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > > t.C:6:12: note: not vectorized: not enough data-refs in basic
> > > block.
> > >
> > >
> > > > The pertinent dump messages are:
> > > >
> > > > test.cc:4:23: remark: === try_vectorize_loop_1 ===
> > > > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> > > > cc1plus: remark:
> > > > Analyzing loop at test.cc:4
> > > > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> > > > test.cc:4:23: remark: === analyze_loop_nest ===
> > > > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> > > > [...snip...]
> > > > test.cc:4:23: remark: === vect_analyze_loop_operations ===
> > > > [../../src/gcc/tree-vect-
> > > > loop.c:1520:vect_analyze_loop_operations]
> > > > [...snip...]
> > > > test.cc:4:23: remark: ==> examining statement: ‘_15 = _14
> > > > /[ex]
> > > > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> > > > test.cc:4:23: remark: vect_is_simple_use: operand ‘_14’
> > > > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > > > test.cc:4:23: remark: def_stmt: ‘_14 = _8 - _7;’
> > > > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> > > > test.cc:4:23: remark: type of def: internal
> > > > [../../src/gcc/tree-
> > > > vect-stmts.c:10112:vect_is_simple_use]
> > > > test.cc:4:23: remark: vect_is_simple_use: operand ‘4’
> > > > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > > > test.cc:4:23: remark: op not supported by target.
> > > > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> > > > test.cc:4:23: remark: not vectorized: relevant stmt not
> > > > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
> > > > stmts.c:9565:vect_analyze_stmt]
> > > > test.cc:4:23: remark: bad operation or unsupported loop
> > > > bound.
> > > > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> > > > cc1plus: remark: vectorized 0 loops in function.
> > > > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> > > >
> > > > In particular, that complaint from
> > > > [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> > > > is coming from:
> > > >
> > > > if (!ok)
> > > > {
> > > > if (dump_enabled_p ())
> > > > {
> > > > dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > > > vect_location,
> > > > "not vectorized: relevant stmt not
> > > > ");
> > > > dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> > > > dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > > > stmt, 0);
> > > > }
> > > >
> > > > return false;
> > > > }
> > > >
> > > > This got me thinking: the user presumably wants to know several
> > > > things:
> > > >
> > > > * the location of the loop that can't be vectorized
> > > > (vect_location
> > > > captures this)
> > > > * location of the problematic statement
> > > > * why it's problematic
> > > > * the problematic statement itself.
> > > >
> > > > The following is an experiment at capturing that information,
> > > > by
> > > > recording an "opt_problem" instance describing what the
> > > > optimization
> > > > problem is, created deep in the callstack when it occurs, for
> > > > use
> > > > later on back at the top of the vectorization callstack.
> > >
> > > Nice idea. Of course the issue is then for which issues to
> > > exactly create those. Like all of the MSG_MISSED_OPTIMIZATION
> > > dumpings?
> > >
> > > I guess the vectorizer needs some axing of useless messages
> > > and/or we need a MSG_DEBUG to have an additional level below
> > > MSG_NOTE.
> > >
> > > > This extra work is only done if dump_enabled_p.
> > > >
> > > > It feels vaguely analogous to an exception object (in terms of
> > > > packaging up a problem that occurs deep in the stack for
> > > > reporting
> > > > back at a higher level).
> > > >
> > > > With this patch, we emit:
> > > >
> > > > ../../src/test.cc: In function ‘std::size_t f(const
> > > > std::vector<std::vector<float> >&)’:
> > > > ../../src/test.cc:4:23: remark: couldn't vectorize loop
> > > > for (auto const & w: v)
> > > > ^
> > > > In file included from ../x86_64-pc-linux-gnu/libstdc++-
> > > > v3/include/vector:64,
> > > > from ../../src/test.cc:1:
> > > > ../x86_64-pc-linux-gnu/libstdc++-
> > > > v3/include/bits/stl_vector.h:806:50:
> > > > note: relevant stmt not supported: ‘_15 = _14 /[ex] 4;’
> > > > { return size_type(this->_M_impl._M_finish - this-
> > > > > _M_impl._M_start); }
> > > >
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
> > > > ~~~~
> > > > ~~~~~~~
> > > >
> > > > which reports both the location of the loop and the statement
> > > > that's
> > > > problematic (if I'm reading it right, the pointer arithmetic
> > > > leads
> > > > to a
> > > > division by 4, and presumably we're not able to handle that).
> > >
> > > Quite likely because we only handle TRUNC_DIV_EXPR and not
> > > EXACT_DIV_EXPR
> > > which we can handle the same semantically (EXACT_DIV_EXPR just
> > > gives
> > > us stronger guarantees).
> >
> > [...snip...]
> >
> > I've been experimenting with the idea; here's an updated version of
> > the patch (on top of the optinfo patch kit, as it happens).
> >
> > I tried adding more uses of opt_problem, but ran into the issue
> > that it's
> > hard to find "by hand" the places deep in the callstack where
> > things fail;
> > I spent a chunk of time stepping through failures, trying to figure
> > out
> > the best place to capture the opt_problem.
> >
> > It seemed like something that we could track in C++'s type system.
> >
> > The following kit solves this by introducing a new class
> > opt_result, which
> > looks a lot like a bool: it has the same representation in memory.
> >
> > For instance, at the deepest point of the callstack where the
> > failure
> > happens, rather than:
> >
> > if (!check_something ())
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "foo is unsupported.\n");
> > return false;
> > }
> > [...lots more checks...]
> >
> > // All checks passed:
> > return true;
> >
> > we can capture the cause of the failure via:
> >
> > if (!check_something ())
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "foo is unsupported.\n");
> > return opt_result::failure ("foo is unsupported",
> > stmt);
> > }
> > [...lots more checks...]
> >
> > // All checks passed:
> > return opt_result::success ();
> >
> > which effectively returns true or false, whilst recording any
> > problem.
> >
> > opt_result::success and opt_result::failure return opt_result
> > values
> > which "looks like" true/false respectively, via operator bool().
> > If dump_enabled_p, then opt_result::failure also creates an
> > opt_problem *,
> > capturing the pertinent data (here, "foo is unsupported" and
> > "stmt").
> > If dumps are disabled, then opt_problem instances aren't
> > created, and it's equivalent to just returning a bool (false for
> > failure).
>
> Since the creation of the opt_problem depends on dump_enabled_p,
> would
> it make sense for the dump_printf_loc to happen automatically on
> opt_result::failure, rather than have both? E.g.:
>
> if (!check_something ())
> // Also calls dump_printf_loc if appropriate.
> return opt_result::failure ("foo is unsupported", stmt);
>
> I think the aim should be for opt_problem to capture any information
> that's useful at the user level, so doing the dump_printf_loc
> automatically would help to keep the MISSED_OPTIMIZATIONS messages
> clean of information that's too low-level.
Interesting idea, thanks.
The existing dump_* calls are "time-ordered": they can be thought of as
log of the events that happened as the optimizer ran. The time-ordered
log looks like:
[...lots of messages...]
note: foo is unsupported
[...probably more messages...]
note: loop could not be vectorized
This patch kit inverts the message ordering somewhat: it's storing away
details for later use, so that we can print:
remark: loop could not be vectorized
note: foo is unsupported: DETAILS OF STMT
where the note "happened" before the remark, but providing a good
message to the user suggests doing it in that
"what happened" then
"why (with details)"
order.
One of the "unknowns" in this patch kit is the relationship between the
"high-level messages" that I'm proposing to eventually emit at the top
level, and the existing dump_* messages.
I'm wondering if the existing dump_* messages should continue to be
time-ordered, as a log for debugging purposes, and to have these high-
level messages as a more user-facing thing.
Then the opt_result::failure would internally do a dump_printf_loc for
logging purposes, but an end-user might be looking at the remarks/notes
thing as a separate stream of messages.
(I'm thinking aloud here)
> > The opt_problem can be propagated via opt_result values back up
> > the call stack to where it makes most sense to the user.
> > For instance, rather than:
> >
> > bool ok = try_something_that_might_fail ();
> > if (!ok)
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "some message.\n");
> > return false;
> > }
> >
> > we can replace the bool with an opt_result, so if dump_enabled_p,
> > we
> > assume that if try_something_that_might_fail, an opt_problem * will
> > be
> > created, and we can propagate it up the call chain:
> >
> > opt_result ok = try_something_that_might_fail ();
> > if (!ok)
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "some message.\n");
> > return ok; // propagating the opt_result
> > }
> >
> > There's currently a way to implicitly construct an opt_result from
> > a
> > bool, but this is scaffolding: commenting it out allows the
> > compiler to
> > tell us where we need to capture failure information.
>
> I guess this is bike-shedding, but personally I'd prefer an explicit
> test for success rather than operator bool, so that:
>
> opt_result foo = ...;
> bool bar = foo;
>
> is ill-formed. The danger otherwise might be that we drop a useful
> opt_problem and replace it with something more generic. E.g. the
> opt_result form of:
>
> if (!ok)
> {
> if (dump_enabled_p ())
> {
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "not vectorized: relevant stmt not ");
> dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> 0);
> }
>
> return false;
> }
>
> in vect_analyze_stmt could silently drop the information provided by
> the subroutine if we forgot to change "ok" from "bool" to
> "opt_result".
I think that's handled already, kind-of. If the function's return type
is opt_result, then an attempt to "return false;" will only work if an
opt_result can be constructed from bool.
Currently, opt_result does have this ctor from bool:
/* Deprecated ctor. During transition, allow construction from bool.
We want to eliminate this, as it doesn't capture the reason for
failures. */
opt_result (bool result,
const dump_impl_location_t &impl_location
= dump_impl_location_t ())
: opt_wrapper <bool> (result,
opt_problem::make ("UNKNOWN", NULL, impl_location))
{
}
and I've been working towards eliminating it. If I comment it out, the
compiler immediately provides an error on all of the places where we
would silent discard the opt_result.
So I think I just need to finish eliminating that transitional
constructor; hopefully that would alleviate your concern here.
> Same for opt_wrapper<T>, but defining operator* and operator-> would
> be useful.
>
> LGTM otherwise (but obviously it's Richard B's call).
Thanks for taking a look.
Dave
> Thanks,
> Richard
>
> > As well as opt_result for "bool", there's a template for wrapping
> > pointers
> > where non-NULL is "success" and NULL "failure", so that:
> >
> > loop_vinfo = vect_analyze_loop_form (loop, shared);
> > if (!loop_vinfo)
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "bad loop form.\n");
> > return NULL;
> > }
> >
> > can simply become:
> >
> > loop_vinfo = vect_analyze_loop_form (loop, shared);
> > if (!loop_vinfo)
> > {
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "bad loop form.\n");
> > return loop_vinfo;
> > }
> >
> > where the "loop_vinfo" is now an opt_wrapper<loop_vec_info> and can
> > capture exactly what went wrong inside vect_analyze_loop_form.
> >
> > In all cases, the classes act as if the opt_problem were one of its
> > fields, but the opt_problem is actually stored in a global, so that
> > when
> > compiled, an opt_wrapper<T> is effectively just a T, so that we're
> > still just passing e.g. a bool around; the opt_wrapper<T> classes
> > simply provide type-checking and an API to ensure that we provide
> > error-messages deep in the callstack at the places where problems
> > occur, and that we propagate them. This also avoids having
> > to manage the ownership of the opt_problem instances.
> >
> > Using opt_result and opt_wrapper<T> documents the intent of the
> > code
> > for the places where we represent success values, and allows the
> > C++ type
> > system to track where the deepest points in the callstack are where
> > we
> > need to emit the failure messages from.
> >
> > Some examples, showing how it immediately leads to more meaningful
> > diagnostics from the vectorizer:
> >
> > gcc.dg/vect/vect-widen-mult-1.c: In function ‘f’:
> > gcc.dg/vect/vect-widen-mult-1.c:16:3: remark: loop vectorized using
> > 16 byte vectors
> > for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > ^~~
> > gcc.dg/vect/vect-widen-mult-1.c: In function ‘main’:
> > gcc.dg/vect/vect-widen-mult-1.c:42:10: remark: couldn't vectorize
> > loop
> > if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
> > ~^~~
> > gcc.dg/vect/vect-widen-mult-1.c:42:10: note: control flow in loop
> > [../../src/gcc/tree-vect-loop.c:1200:vect_analyze_loop_form_1]
> > gcc.dg/vect/vect-widen-mult-1.c:34:3: remark: couldn't vectorize
> > loop
> > for (int i = 0; i < N; ++i)
> > ^~~
> > gcc.dg/vect/vect-widen-mult-1.c:38:7: note: statement clobbers
> > memory: ‘__asm__ __volatile__("" : : : "memory");’
> > [../../src/gcc/tree-data-ref.c:5086:find_data_references_in_stmt]
> > asm volatile ("" ::: "memory");
> > ^~~
> >
> > by showing the location of the loop that can't be vectorized,
> > the problematic statement within the loop, and describing what the
> > problem is.
> >
> > Note how it also captures the location of where in GCC the problem
> > was
> > encountered. Ultimately this would also show hotness information
> > for the
> > loop in the remark.
> >
> > I'm not sure exactly what an opt_problem should capture. There are
> > a few
> > places in the kit where it looks the "failure" calls might benefit
> > from
> > being a formatted print API, with format codes for various middle-
> > end
> > entities (e.g. gimple, tree, --param values, etc). It's also not
> > clear
> > to me how this would interact with the optinfo work (e.g. for
> > capturing
> > things in optimization records), or what the API should look like
> > (though
> > I like the one in the patch).
> >
> > (Not yet bootstrapped or regrtested, and has plenty of FIXMEs: I'm
> > posting
> > this for comment/discussion).
> >
> > Thoughts?
> > Does the basic idea look useful and sane? (I think so)
> >
> > Thanks
> > Dave
> >
> > [...snip...]
> >
> > David Malcolm (5):
> > Add opt-problem.h
> > Use opt-problem.h in try_vectorize_loop_1
> > Add some test coverage
> > Use opt_result throughout vectorizer
> > Add opt-problem.cc
> >
> > gcc/Makefile.in | 1 +
> > gcc/opt-problem.cc | 96 ++++++++
> > gcc/opt-problem.h | 326
> > +++++++++++++++++++++++++
> > gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c | 11 +-
> > gcc/testsuite/gcc.dg/vect/vect-remarks-1.c | 18 ++
> > gcc/tree-data-ref.c | 33 +--
> > gcc/tree-data-ref.h | 10 +-
> > gcc/tree-vect-data-refs.c | 189 ++++++++-----
> > -
> > gcc/tree-vect-loop.c | 212 +++++++++--
> > -----
> > gcc/tree-vect-slp.c | 4 +-
> > gcc/tree-vect-stmts.c | 130 ++++++----
> > gcc/tree-vectorizer.c | 34 ++-
> > gcc/tree-vectorizer.h | 41 ++--
> > 13 files changed, 842 insertions(+), 263 deletions(-)
> > create mode 100644 gcc/opt-problem.cc
> > create mode 100644 gcc/opt-problem.h
> > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c