This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: David Malcolm <dmalcolm at redhat dot com>
- Cc: Richard Biener <rguenther at suse dot de>, <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 11 Jul 2018 16:56:10 +0100
- Subject: Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems
- References: <alpine.LSU.2.20.1806251101311.5043@zhemvz.fhfr.qr> <1531185096-17113-1-git-send-email-dmalcolm@redhat.com>
David Malcolm <dmalcolm@redhat.com> writes:
> On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
>> On Fri, 22 Jun 2018, David Malcolm wrote:
>>
>> > NightStrike and I were chatting on IRC last week about
>> > issues with trying to vectorize the following code:
>> >
>> > #include <vector>
>> > std::size_t f(std::vector<std::vector<float>> const & v) {
>> > std::size_t ret = 0;
>> > for (auto const & w: v)
>> > ret += w.size();
>> > return ret;
>> > }
>> >
>> > icc could vectorize it, but gcc couldn't, but neither of us could
>> > immediately figure out what the problem was.
>> >
>> > Using -fopt-info leads to a wall of text.
>> >
>> > I tried using my patch here:
>> >
>> > "[PATCH] v3 of optinfo, remarks and optimization records"
>> > https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>> >
>> > It improved things somewhat, by showing:
>> > (a) the nesting structure via indentation, and
>> > (b) the GCC line at which each message is emitted (by using the
>> > "remark" output)
>> >
>> > but it's still a wall of text:
>> >
>> > https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.
>> > html
>> > https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C.
>> > .%7Csrc%7Ctest.cc.html#line-4
>> >
>> > It doesn't yet provide a simple high-level message to a
>> > tech-savvy user on what they need to do to get GCC to
>> > vectorize their loop.
>>
>> Yeah, in particular the vectorizer is way too noisy in its low-level
>> functions. IIRC -fopt-info-vec-missed is "somewhat" better:
>>
>> t.C:4:26: note: step unknown.
>> t.C:4:26: note: vector alignment may not be reachable
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: no array mode for V2DI[3]
>> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
>> t.C:4:26: note: can't use a fully-masked loop because the target
>> doesn't
>> have the appropriate masked load or store.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: no array mode for V2DI[3]
>> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
>> t.C:4:26: note: op not supported by target.
>> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 =
>> _14
>> /[ex] 4;
>> t.C:4:26: note: bad operation or unsupported loop bound.
>> t.C:4:26: note: not vectorized: no grouped stores in basic block.
>> t.C:4:26: note: not vectorized: no grouped stores in basic block.
>> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
>>
>>
>> > The pertinent dump messages are:
>> >
>> > test.cc:4:23: remark: === try_vectorize_loop_1 ===
>> > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
>> > cc1plus: remark:
>> > Analyzing loop at test.cc:4
>> > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
>> > test.cc:4:23: remark: === analyze_loop_nest ===
>> > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
>> > [...snip...]
>> > test.cc:4:23: remark: === vect_analyze_loop_operations ===
>> > [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
>> > [...snip...]
>> > test.cc:4:23: remark: ==> examining statement: ‘_15 = _14 /[ex]
>> > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
>> > test.cc:4:23: remark: vect_is_simple_use: operand ‘_14’
>> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> > test.cc:4:23: remark: def_stmt: ‘_14 = _8 - _7;’
>> > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
>> > test.cc:4:23: remark: type of def: internal [../../src/gcc/tree-
>> > vect-stmts.c:10112:vect_is_simple_use]
>> > test.cc:4:23: remark: vect_is_simple_use: operand ‘4’
>> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> > test.cc:4:23: remark: op not supported by target.
>> > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
>> > test.cc:4:23: remark: not vectorized: relevant stmt not
>> > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
>> > stmts.c:9565:vect_analyze_stmt]
>> > test.cc:4:23: remark: bad operation or unsupported loop bound.
>> > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
>> > cc1plus: remark: vectorized 0 loops in function.
>> > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>> >
>> > In particular, that complaint from
>> > [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> > is coming from:
>> >
>> > if (!ok)
>> > {
>> > if (dump_enabled_p ())
>> > {
>> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> > "not vectorized: relevant stmt not ");
>> > dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>> > dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
>> > stmt, 0);
>> > }
>> >
>> > return false;
>> > }
>> >
>> > This got me thinking: the user presumably wants to know several
>> > things:
>> >
>> > * the location of the loop that can't be vectorized (vect_location
>> > captures this)
>> > * location of the problematic statement
>> > * why it's problematic
>> > * the problematic statement itself.
>> >
>> > The following is an experiment at capturing that information, by
>> > recording an "opt_problem" instance describing what the
>> > optimization
>> > problem is, created deep in the callstack when it occurs, for use
>> > later on back at the top of the vectorization callstack.
>>
>> Nice idea. Of course the issue is then for which issues to
>> exactly create those. Like all of the MSG_MISSED_OPTIMIZATION
>> dumpings?
>>
>> I guess the vectorizer needs some axing of useless messages
>> and/or we need a MSG_DEBUG to have an additional level below
>> MSG_NOTE.
>>
>> > This extra work is only done if dump_enabled_p.
>> >
>> > It feels vaguely analogous to an exception object (in terms of
>> > packaging up a problem that occurs deep in the stack for reporting
>> > back at a higher level).
>> >
>> > With this patch, we emit:
>> >
>> > ../../src/test.cc: In function ‘std::size_t f(const
>> > std::vector<std::vector<float> >&)’:
>> > ../../src/test.cc:4:23: remark: couldn't vectorize loop
>> > for (auto const & w: v)
>> > ^
>> > In file included from ../x86_64-pc-linux-gnu/libstdc++-
>> > v3/include/vector:64,
>> > from ../../src/test.cc:1:
>> > ../x86_64-pc-linux-gnu/libstdc++-
>> > v3/include/bits/stl_vector.h:806:50:
>> > note: relevant stmt not supported: ‘_15 = _14 /[ex] 4;’
>> > { return size_type(this->_M_impl._M_finish - this-
>> > >_M_impl._M_start); }
>> > ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
>> > ~~~~~~~
>> >
>> > which reports both the location of the loop and the statement
>> > that's
>> > problematic (if I'm reading it right, the pointer arithmetic leads
>> > to a
>> > division by 4, and presumably we're not able to handle that).
>>
>> Quite likely because we only handle TRUNC_DIV_EXPR and not
>> EXACT_DIV_EXPR
>> which we can handle the same semantically (EXACT_DIV_EXPR just gives
>> us stronger guarantees).
>
> [...snip...]
>
> I've been experimenting with the idea; here's an updated version of
> the patch (on top of the optinfo patch kit, as it happens).
>
> I tried adding more uses of opt_problem, but ran into the issue that it's
> hard to find "by hand" the places deep in the callstack where things fail;
> I spent a chunk of time stepping through failures, trying to figure out
> the best place to capture the opt_problem.
>
> It seemed like something that we could track in C++'s type system.
>
> The following kit solves this by introducing a new class opt_result, which
> looks a lot like a bool: it has the same representation in memory.
>
> For instance, at the deepest point of the callstack where the failure
> happens, rather than:
>
> if (!check_something ())
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "foo is unsupported.\n");
> return false;
> }
> [...lots more checks...]
>
> // All checks passed:
> return true;
>
> we can capture the cause of the failure via:
>
> if (!check_something ())
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "foo is unsupported.\n");
> return opt_result::failure ("foo is unsupported",
> stmt);
> }
> [...lots more checks...]
>
> // All checks passed:
> return opt_result::success ();
>
> which effectively returns true or false, whilst recording any problem.
>
> opt_result::success and opt_result::failure return opt_result values
> which "looks like" true/false respectively, via operator bool().
> If dump_enabled_p, then opt_result::failure also creates an opt_problem *,
> capturing the pertinent data (here, "foo is unsupported" and "stmt").
> If dumps are disabled, then opt_problem instances aren't
> created, and it's equivalent to just returning a bool (false for failure).
Since the creation of the opt_problem depends on dump_enabled_p, would
it make sense for the dump_printf_loc to happen automatically on
opt_result::failure, rather than have both? E.g.:
if (!check_something ())
// Also calls dump_printf_loc if appropriate.
return opt_result::failure ("foo is unsupported", stmt);
I think the aim should be for opt_problem to capture any information
that's useful at the user level, so doing the dump_printf_loc
automatically would help to keep the MISSED_OPTIMIZATIONS messages
clean of information that's too low-level.
> The opt_problem can be propagated via opt_result values back up
> the call stack to where it makes most sense to the user.
> For instance, rather than:
>
> bool ok = try_something_that_might_fail ();
> if (!ok)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "some message.\n");
> return false;
> }
>
> we can replace the bool with an opt_result, so if dump_enabled_p, we
> assume that if try_something_that_might_fail, an opt_problem * will be
> created, and we can propagate it up the call chain:
>
> opt_result ok = try_something_that_might_fail ();
> if (!ok)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "some message.\n");
> return ok; // propagating the opt_result
> }
>
> There's currently a way to implicitly construct an opt_result from a
> bool, but this is scaffolding: commenting it out allows the compiler to
> tell us where we need to capture failure information.
I guess this is bike-shedding, but personally I'd prefer an explicit
test for success rather than operator bool, so that:
opt_result foo = ...;
bool bar = foo;
is ill-formed. The danger otherwise might be that we drop a useful
opt_problem and replace it with something more generic. E.g. the
opt_result form of:
if (!ok)
{
if (dump_enabled_p ())
{
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"not vectorized: relevant stmt not ");
dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
}
return false;
}
in vect_analyze_stmt could silently drop the information provided by
the subroutine if we forgot to change "ok" from "bool" to "opt_result".
Same for opt_wrapper<T>, but defining operator* and operator-> would
be useful.
LGTM otherwise (but obviously it's Richard B's call).
Thanks,
Richard
> As well as opt_result for "bool", there's a template for wrapping pointers
> where non-NULL is "success" and NULL "failure", so that:
>
> loop_vinfo = vect_analyze_loop_form (loop, shared);
> if (!loop_vinfo)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "bad loop form.\n");
> return NULL;
> }
>
> can simply become:
>
> loop_vinfo = vect_analyze_loop_form (loop, shared);
> if (!loop_vinfo)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "bad loop form.\n");
> return loop_vinfo;
> }
>
> where the "loop_vinfo" is now an opt_wrapper<loop_vec_info> and can
> capture exactly what went wrong inside vect_analyze_loop_form.
>
> In all cases, the classes act as if the opt_problem were one of its
> fields, but the opt_problem is actually stored in a global, so that when
> compiled, an opt_wrapper<T> is effectively just a T, so that we're
> still just passing e.g. a bool around; the opt_wrapper<T> classes
> simply provide type-checking and an API to ensure that we provide
> error-messages deep in the callstack at the places where problems
> occur, and that we propagate them. This also avoids having
> to manage the ownership of the opt_problem instances.
>
> Using opt_result and opt_wrapper<T> documents the intent of the code
> for the places where we represent success values, and allows the C++ type
> system to track where the deepest points in the callstack are where we
> need to emit the failure messages from.
>
> Some examples, showing how it immediately leads to more meaningful
> diagnostics from the vectorizer:
>
> gcc.dg/vect/vect-widen-mult-1.c: In function ‘f’:
> gcc.dg/vect/vect-widen-mult-1.c:16:3: remark: loop vectorized using 16 byte vectors
> for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> ^~~
> gcc.dg/vect/vect-widen-mult-1.c: In function ‘main’:
> gcc.dg/vect/vect-widen-mult-1.c:42:10: remark: couldn't vectorize loop
> if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
> ~^~~
> gcc.dg/vect/vect-widen-mult-1.c:42:10: note: control flow in loop [../../src/gcc/tree-vect-loop.c:1200:vect_analyze_loop_form_1]
> gcc.dg/vect/vect-widen-mult-1.c:34:3: remark: couldn't vectorize loop
> for (int i = 0; i < N; ++i)
> ^~~
> gcc.dg/vect/vect-widen-mult-1.c:38:7: note: statement clobbers memory: ‘__asm__ __volatile__("" : : : "memory");’ [../../src/gcc/tree-data-ref.c:5086:find_data_references_in_stmt]
> asm volatile ("" ::: "memory");
> ^~~
>
> by showing the location of the loop that can't be vectorized,
> the problematic statement within the loop, and describing what the
> problem is.
>
> Note how it also captures the location of where in GCC the problem was
> encountered. Ultimately this would also show hotness information for the
> loop in the remark.
>
> I'm not sure exactly what an opt_problem should capture. There are a few
> places in the kit where it looks the "failure" calls might benefit from
> being a formatted print API, with format codes for various middle-end
> entities (e.g. gimple, tree, --param values, etc). It's also not clear
> to me how this would interact with the optinfo work (e.g. for capturing
> things in optimization records), or what the API should look like (though
> I like the one in the patch).
>
> (Not yet bootstrapped or regrtested, and has plenty of FIXMEs: I'm posting
> this for comment/discussion).
>
> Thoughts?
> Does the basic idea look useful and sane? (I think so)
>
> Thanks
> Dave
>
> [...snip...]
>
> David Malcolm (5):
> Add opt-problem.h
> Use opt-problem.h in try_vectorize_loop_1
> Add some test coverage
> Use opt_result throughout vectorizer
> Add opt-problem.cc
>
> gcc/Makefile.in | 1 +
> gcc/opt-problem.cc | 96 ++++++++
> gcc/opt-problem.h | 326 +++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c | 11 +-
> gcc/testsuite/gcc.dg/vect/vect-remarks-1.c | 18 ++
> gcc/tree-data-ref.c | 33 +--
> gcc/tree-data-ref.h | 10 +-
> gcc/tree-vect-data-refs.c | 189 ++++++++------
> gcc/tree-vect-loop.c | 212 +++++++++-------
> gcc/tree-vect-slp.c | 4 +-
> gcc/tree-vect-stmts.c | 130 ++++++----
> gcc/tree-vectorizer.c | 34 ++-
> gcc/tree-vectorizer.h | 41 ++--
> 13 files changed, 842 insertions(+), 263 deletions(-)
> create mode 100644 gcc/opt-problem.cc
> create mode 100644 gcc/opt-problem.h
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c