[PATCH 09/10] Experiment with using optinfo in gimple-loop-interchange.cc

Richard Biener richard.guenther@gmail.com
Fri Jun 1 15:32:00 GMT 2018

On June 1, 2018 3:40:15 PM GMT+02:00, David Malcolm <dmalcolm@redhat.com> wrote:
>On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote:
>> On Tue, May 29, 2018 at 10:33 PM David Malcolm <dmalcolm@redhat.com>
>> wrote:
>> > 
>> > This was an experiment to try to capture information on a
>> > loop optimization.
>> > 
>> > gcc/ChangeLog:
>> >         * gimple-loop-interchange.cc (should_interchange_loops):
>> > Add
>> >         optinfo note when interchange gives better data locality
>> > behavior.
>> >         (tree_loop_interchange::interchange): Add OPTINFO_SCOPE.
>> >         Add optinfo for successful and unsuccessful interchanges.
>> >         (prepare_perfect_loop_nest): Add OPTINFO_SCOPE.  Add
>> >         optinfo note.
>> >         (pass_linterchange::execute): Add OPTINFO_SCOPE.
>> > ---
>> >  gcc/gimple-loop-interchange.cc | 36
>> > +++++++++++++++++++++++++++++++++++-
>> >  1 file changed, 35 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-
>> > interchange.cc
>> > index eb35263..cd32288 100644
>> > --- a/gcc/gimple-loop-interchange.cc
>> > +++ b/gcc/gimple-loop-interchange.cc
>> > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx,
>> > unsigned o_idx,
>> >    ratio = innermost_loops_p ? INNER_STRIDE_RATIO :
>> >    /* Do interchange if it gives better data locality behavior.  */
>> >    if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio)))
>> > -    return true;
>> > +    {
>> > +      if (optinfo_enabled_p ())
>> > +       OPTINFO_NOTE ((gimple *)NULL) // FIXME
>> > +         << "interchange gives better data locality behavior: "
>> > +         << "iloop_strides: "
>> > +         << decu (iloop_strides)
>> > +         << " > (oloop_strides: "
>> > +         << decu (oloop_strides)
>> > +         << " * ratio: "
>> > +         << decu (ratio)
>> > +         << ")";
>> Just randomly inside the thread.
>> :/
>> Please do _not_ add more stream-like APIs.  How do you expect
>> translators to deal with those?
>> Yes, I'm aware of the graphite-* ones and I dislike those very much.
>> What's wrong with the existing dump API?
>The existing API suffers from a "wall of text" problem:
>* although it's possible to filter based on various criteria (the
>optgroup tags, specific passes, success vs failure), it's not possible
>to filter base on code hotness: the -fopt-info API works purely in
>terms of location_t.  So all of the messages about the hottest
>functions in the workload are intermingled with all of the other
>messages about all of the other functions.


>* some of the text notes refer to function entry, but all of these are
>emitted "at the same level": there's no way to see the nesting of these
>function-entry logs, and where other notes are in relation to them. 
>For example, in:
>  test.c:8:3: note: === analyzing loop ===
>  test.c:8:3: note: === analyze_loop_nest ===
>  test.c:8:3: note: === vect_analyze_loop_form ===
>  test.c:8:3: note: === get_loop_niters ===
>test.c:8:3: note: symbolic number of iterations is (unsigned int)
>test.c:8:3: note: not vectorized: loop contains function calls or data
>references that cannot be analyzed
>  test.c:8:3: note: vectorized 0 loops in function
>there's no way to tell that the "vect_analyze_loop_form" is in fact
>inside the call to "analyze_loop_nest", and where the "symbolic number
>of iterations" messages is coming from in relation to them.  This may
>not seem significant here, but I'm quoting a small example;
>vectorization typically leads to dozens of messages, with a deep
>nesting structure (where that structure isn't visible in the -fopt-info

True. The same issue exists for diagnostics BTW. Indeed, being able to collapse 'sections' in dump files, opt-info or diagnostics sounds useful. 

Note that for dump files and opt-info the level argument was sort of designed to do that. 

>The existing API is throwing data away:
>* as noted above, by working purely with a location_t, the execution
>count isn't associated with the messages.  The output format purely
>gives file/line/column information, but doesn't cover the inlining
>chain.   For C++ templates it doesn't specify which instance of a
>template is being optimized.

It might be useful to enhance the interface by allowing different kind of 'locations'. 

>* there's no metadata about where the messages are coming from.  It's
>easy to get at the current pass internally, but the messages don't
>capture that.  Figuring out where a message came from requires grepping
>the GCC source code.  The prototype I posted captures the __FILE__ and
>__LINE__ within the gcc source for every message emitted, and which
>pass instance emitted it.

The opt info group was supposed to captures this to the level interesting for a user. 

>* The current output format is of the form:
>     "FILE:LINE:COLUMN: free-form text\n"
>This is only machine-readable up to a point: if a program is parsing
>it, all it has is the free-form text.  The prototype I posted captures
>what kinds of things are in the text (statements, trees, symtab_nodes),
>and captures location information for them, so that visualizations of
>the dumps can provide useful links.
>There's no API-level grouping of messages, beyond looking for newline
>I'm probably re-hashing a lot of the material in the cover letter here:
>"[PATCH 00/10] RFC: Prototype of compiler-assisted performance
>  https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html
>I'd like to provide a machine-readable output format that covers the
>above - in particular the profile data (whilst retaining -fopt-info for
>compatibility).  Separation of the data from its presentation.
>Clearly you don't like the stream-like API from the prototype :)

Yes :) I wasn't so much complaining about the content but the presentation /API. 

>So I'm wondering what the API ought to look like, one that would allow
>for the kind of "richer" machine-readable output.
>Consider this "-fopt-info" code (from "vect_create_data_ref_ptr"; this
>is example 2 from the cover-letter):
>  if (dump_enabled_p ())
>     {
>       tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
>       dump_printf_loc (MSG_NOTE, vect_location,
>                        "create %s-pointer variable to type: ",
>                        get_tree_code_name (TREE_CODE (aggr_type)));
>       dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type);
>       if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
>         dump_printf (MSG_NOTE, "  vectorizing an array ref: ");
>       else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
>         dump_printf (MSG_NOTE, "  vectorizing a vector ref: ");
>       else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
>    dump_printf (MSG_NOTE, "  vectorizing a record based array ref: ");
>       else
>         dump_printf (MSG_NOTE, "  vectorizing a pointer ref: ");
>       dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr));
>       dump_printf (MSG_NOTE, "\n");
>     }
>where the information is built up piecewise, with conditional logic.
>(This existing code isn't particularly amenable to translation either).
>One option would be for "vect_location" to become something other than
>a location_t, from which a execution count can be gained (e.g. a gimple
>*, from which the location_t and the BB can be accessed).

Yes. Like a container that can be initiated from other kind of contexts. 

>"dump_printf_loc" would signify starting an optimization record, and
>the final "\n" would signify the end of an optimization record; the
>various dump_generic_expr and dump_printf would add structured
>information and text entries to theoptimization record.

A push/pop style API would maybe work as well. (pushing a level with some meta data) 

>This has the advantage of retaining the look of the existing API (so
>the existing callsites don't need changing), but it changes their
>meaning, so that as well as writing to -fopt-info's destinations, it's
>also in a sense parsing the dump_* calls and building optimization
>records from them.
>AIUI, dump_printf_loc can't be a macro, as it's variadic, so this
>approach doesn't allow for capturing the location within GCC for where
>the message is emitted.

True, though we have __builtin_FILE and friends that can be used as default args. 

Note a push/pop level API can also buffer intermediate printf style output and annotate/indent it (supporting a dump file emacs/vim mode that can do collapse and expand) 

>Another approach: there could be something like:
>  if (optinfo_enabled_p ())
>    {
>       tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
>       OPTINFO_VECT_NOTE note;
>       note.printf ("create %s-pointer variable to type: ",
>                    get_tree_code_name (TREE_CODE (aggr_type)));
>       note.add_slim (aggr_type);
>       if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
>         note.printf ("  vectorizing an array ref: ");
>       else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
>         note.printf (MSG_NOTE, "  vectorizing a vector ref: ");
>       else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
>    note.printf (MSG_NOTE, "  vectorizing a record based array ref: ");
>       else
>         note.printf (MSG_NOTE, "  vectorizing a pointer ref: ");
>       note.add_slim (DR_BASE_OBJECT (dr));
>    }
>which uses a macro to get the gcc source location, and avoids the
>special meaning for "\n".  This avoids the "<<" - but is kind of just
>different syntax for the same - it's hard with this example to avoid

Maybe the following raii style that encapsulates the enabling /disabling checks? 

 If (optinfo o = push (msg_optimization,...))
     O.print (...) ;
     Destructor of o 'pops' 

>Yet another approach: reworking things to support i18n via a pure
>printf-style interface, using, say "%S" to mean "TDF_SLIM", it could be
>  if (optinfo_enabled_p ())
>    {
>       tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
>       if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
>         OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
>                            "  vectorizing an array ref: %S",
>                            get_tree_code_name (TREE_CODE (aggr_type))
>	                    aggr_type,
>                            DR_BASE_OBJECT (dr));
>      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
>         OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
>                            "  vectorizing a vector ref: %S",
>                            get_tree_code_name (TREE_CODE (aggr_type))
>	                    aggr_type,
>                            DR_BASE_OBJECT (dr));
>      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
>         OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
>                          "  vectorizing a record based array ref: %S",
>                            get_tree_code_name (TREE_CODE (aggr_type))
>	                    aggr_type,
>                            DR_BASE_OBJECT (dr));
>      else
>         OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
>                            "  vectorizing a pointer ref: %S",
>                            get_tree_code_name (TREE_CODE (aggr_type))
>	                    aggr_type,
>                            DR_BASE_OBJECT (dr));
>    }
>or somesuch.  The %S would allow for the data to be captured when
>emitting optimization records for the messages (not just plain text,
>e.g. locations of the expressions).

Certainly when exposing things in opt-info we have to be more disciplined. The vectorizer is a bad example here. 
The original goal of having one set of outputs for both dump files and opt-info is good but I guess the result is less than optimal. Maybe additional detail levels would help here (MSG_Dumpfile_only?) 

>So those are some ideas.  I'm sure there are other ways to fix the
>issues with -fopt-info; I'm brainstorming here.

Likewise. As said I applaud the attempt improve the situation but I detest a stream API ;) 



More information about the Gcc-patches mailing list