This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Documentation for loop infrastructure


Zdenek Dvorak wrote on 07/24/06 14:57:

> this patch adds documentation for loop analysis and representation of
> loops, as well as other loop-related analyses, and for functions to
> manipulate loops.  As I am not a native English speaker, there are
> probably many language errors in the document, so I would appreciate
> if someone could read it and fix them.
>
Awesome!  Thanks for adding this documentation.  I'm not terribly
familiar with texi, so I skipped over the markup stuff.  Hopefully
Gerald or Joseph will chime in.

> + This chapter describes the representation of loops in GCC, and functions
> + that can be used to analyze them and construct and update this representation.
>
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         to build, modify and analyze this representation.

> + Most of the interfaces and data structures are declared in @file{cfgloop.h}.
> + At the moment, loop structures are analyzed and this information is updated
> + only by the optimization passes that deal with loops, but some efforts are
> + being made to make it available throughout most of the optimization passes.
> +
> + In general form, the natural loop has one entry block (header) and possibly
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    In general, a natural loop

> + several back edges (latches) leading to the header from the inside of the loop.
> + Loops with several latches may appear if several loops share a single header,
> + or if there is a branching in the middle of the loop.  The representation of
> + loops in GCC however allows only loops with a single latch.  During the loop
                                                                 ^^^^^^^^^^^^^^^
                                                                 During loop

> + analysis, headers of such loops are split and forwarder blocks are created in
> + order to disambiguate their structures.  A heuristic based on profile
> + information is used to determine whether the latches correspond to sub-loops or
> + to control flow in a single loop.  This in particular means that the analysis
                                       ^^^^^^^^^^^^^^^^^^^^^^^^
                                       This means
> + sometimes changes the CFG, and if you run it in the middle of an optimization
> + pass, you must be able to deal with the new blocks.
> +
> + Natural loops form a tree defined by the subset relation, i.e., sons of a loop
> + are its immediate sub-loops.
>
Maybe something along the lines of 'Natural loops are organized in a
containment hierarchy such that all the loops immediately contained
inside loop L are its children in the hierarchy.'

>  This tree is represented by the @code{struct
> + loops} structure. The root of this tree is a fake loop that contains all blocks
>
You should probably say that each loop structure contains all the
blocks that are contained in the loop.  But, that much can be deduced
from context, so I guess it's fine.

> + in the function.  Each of the loops is represented in a @code{struct loop}
> + structure.  Each loop is assigned an index (@code{num} field of the
> + @code{struct loop} structure), and the pointer to the loop is stored in the
> + corresponding field of the @code{parray} field of the loops structure.  Index
> + of a sub-loop is always greater than the index of its super-loop.  The indices do
> + not have to be continuous, there may be empty (@code{NULL}) entries in the
> + @code{parray} created by deleting loops.  The index of a loop never changes.
> + The first unused index is stored in the @code{num} field of the loops
> + structure.
> +
> + Each basic block contains the reference to the innermost loop it belongs to
> + (@code{loop_father}).  For this reason, it is only possible to have one
> + @code{struct loops} structure initialized at the same time for each CFG.  It is
> + recommended to use the global variable @code{current_loops} to contain the
> + @code{struct loops} structure, especially if the loop structures are updated
> + throughout several passes.  Many of the loop manipulation functions assume that
> + dominance information is up-to-date.
> +
> + The loops are analyzed through @code{loop_optimizer_init} function.  The
> + argument of this function is an or of flags that specify what other properties
>                             ^^^^^^^^^^^^^^^^^
                              ???  Are you trying to say that the
                              flags should be OR'd together?  How
                              about something like 'is a set of flags
                              represented in an integer bitmask'.

> + of the loop structures should be calculated/enforced and preserved later:
> +
> + @itemize
> + @item @code{LOOPS_HAVE_PREHEADERS}: Each loop has only one entry edge.
> + Additionally, the source block of this entry edge has only one successor.
> + This creates a natural place where the code can be moved out of the loop,
> + and ensures that the entry edge of the loop leads from its immediate super-loop.
>
It's not clear from the text whether specifying these flags will
actually modify the CFG to comply with the restriction.  You did
mention before that the CFG will be changed by loop discovery, but
maybe it should be explicitly mentioned again.

> + @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: The latch block of a loop has only one
> + successor.  This ensures that the latch of the loop does not belong to
> + any of its sub-loops, and makes manipulation with the loops significantly
> + easier.  Most of the loop manipulation functions assume that the loops
> + are in this shape.  Note that with this flag, the ``normal'' loop without
> + any control flow inside and with one exit consists of two basic blocks.
> + @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and edges
> + in the strongly connected components that are not natural loops (have more
> + than one entry block) are marked with @code{BB_IRREDUCIBLE_LOOP} and
> + @code{EDGE_IRREDUCIBLE_LOOP} flags.  The flag is not set for blocks and edges that
> + belong to natural loops that are in such an irreducible region (but it is
> + set for the entry and exit edges of such a loop, if they lead to/from this
> + region).
> + @item @code{LOOPS_HAVE_MARKED_SINGLE_EXITS}: If a loop has exactly one exit
> + edge, this edge is stored in @code{single_exit} field of the loop structure.
> + @code{NULL} is stored there otherwise.
> + @end itemize
> +
> + These properties may also be computed/enforced later, using functions
> + @code{create_preheaders}, @code{force_single_succ_latches},
> + @code{mark_irreducible_loops} and @code{mark_single_exit_loops}.
> +
> + The memory occupied by the loops structures should be freed with
> + @code{loop_optimizer_finalize} function.
> +
> + The CFG manipulation functions in general do not update loop structures.
> + Specialized versions that additionally do so are provided for the most
> + common tasks.  On trees, @code{cleanup_tree_cfg_loop} function can be
> + used to cleanup CFG while updating the loops structures if @code{current_loops}
> + is set.
> +
> + @node Loop querying
> + @section Loop querying
> + @cindex Loop querying
> +
> + The functions to query the information about loops are declared in
> + @file{cfgloop.h}.  Some of the information can be taken directly from
> + the structures.  @code{loop_father} field of each basic block contains
> + the innermost loop to that the block belongs.  The most useful fields of
> + loop structure (that are kept up-to-date at all times) are:
> +
> + @itemize
> + @item @code{header}, @code{latch}: Header and latch basic blocks of the loop.
> + @item @code{num_nodes}: Number of basic blocks in the loop (including
> + the basic blocks of the sub-loops).
> + @item @code{depth}: The depth of the loop in the loops tree, i.e., the number
> + of super-loops of the loop.
> + @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
> + sub-loop, and the sibling of the loop in the loops tree.
> + @item @code{single_exit}: The exit edge of the loop, if the loop has exactly
> + one exit and the loops were analyzed with LOOPS_HAVE_MARKED_SINGLE_EXITS.
> + @end itemize
> +
> + There are other fields in the loop structures, many of them used only by some
> + of the passes, or not updated during CFG changes; in general, they should not
> + be accessed directly.
> +
> + The most important functions to query loop structures are:
> +
> + @itemize
> + @item @code{flow_loops_dump}: Dumps the information about loops to file.
> + @item @code{verify_loop_structure}: Checks consistency of the loop structures.
> + @item @code{loop_latch_edge}: Returns the latch edge of a loop.
> + @item @code{loop_preheader_edge}: If loops have preheaders, returns
> + the preheader edge of a loop.
> + @item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of other loop.
> + @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs to
> +   a loop (including its sub-loops).
> + @item @code{find_common_loop}: Finds the common super-loop of two loops.
> + @item @code{superloop_at_depth}: Returns the super-loop of a loop with the given
> +   depth.
> + @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates number of
> +   insns in the loop, on trees and on RTL.
> + @item @code{loop_exit_edge_p}: Tests whether edge is an exit from a loop.
> + @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops with
> +   @code{EDGE_LOOP_EXIT} flag.
> + @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
> +   @code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the loop in
> +   dfs order, ordered by dominance relation, and bfs order, respectively.
> + @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
> + @item @code{just_once_each_iteration_p}: Returns true if the basic block is
> + executed exactly once during each iteration of a loop (that is, it does not
> + belong to a sub-loop, and it dominates the latch of the loop).
> + @end itemize
> +
> + @node Loop manipulation
> + @section Loop manipulation
> + @cindex Loop manipulation
> +
> + The loops tree can be manipulated using the following functions:
> +
> + @itemize
> + @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
> + @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
> + @item @code{add_bb_to_loop}: Adds a basic block to a loop.
> + @item @code{remove_bb_from_loops}: Removes a basic block from loops.
> + @end itemize
> +
> + The specialized versions of several low-level CFG functions that also update
> + loop structures are provided:
> +
> + @itemize
> + @item @code{loop_split_edge_with}: Splits an edge, and places a specified RTL
> + code on it.  On trees, the function can still be used, but the code must be
> + NULL.
> + @item @code{bsi_insert_on_edge_immediate_loop}: Inserts code on edge, splitting
> + it if necessary.  Only works on trees.
> + @item @code{remove_path}: Removes an edge and all blocks it dominates.
> + @item @code{loop_commit_inserts}: Commits insertions scheduled on edges, and
> + sets loops for the new blocks.  This function can only be used on trees.
> + @item @code{split_loop_exit_edge}: Splits exit edge of the loop, ensuring
> + that PHI node arguments remain in the loop (this ensures that loop-closed
> + SSA form is preserved).  Only useful on trees.
> + @end itemize
> +
> + Finally, there are some higher-level loop transformations implemented.  While
> + some of them are written so that they should work on non-innermost loops,
> + they are mostly untested in that case, and at the moment, they are only
> + reliable for the innermost loops:
> +
> + @itemize
> + @item @code{create_iv}: Creates a new induction variable.  Only works on trees.
> + @code{standard_iv_increment_position} can be used to find a suitable place for
> + the iv increment.
> + @item @code{duplicate_loop_to_header_edge},
> + @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and on
> + trees) duplicate the body of the loop prescribed number of times on one of
> + the edges entering loop header, thus performing either loop unrolling or
> + loop peeling.  @code{can_duplicate_loop_p} (@code{can_unroll_loop_p}
> + on trees) must be true for the duplicated loop.
> + @item @code{loop_version}, @code{tree_ssa_loop_version}: These function create
> + a copy of a loop, and a branch before them that selects one of them depending
> + on the prescribed condition.  This is useful for optimizations that need to
> + verify some assumptions in runtime (one of the copies of the loop is usually
> + left unchanged, while the other one is transformed in some way).
> + @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the extra
> + iterations to make the number of iterations divisible by unroll factor,
> + updating the exit condition, and removing the exits that now cannot be taken.
> + Works only on trees.
> + @end itemize
> +
> + @node LCSSA
> + @section Loop-closed SSA form
> + @cindex LCSSA
> + @cindex Loop-closed SSA form
> +
> + Throughout the loop optimizations on tree level, one extra condition is
> + enforced on the SSA form:  No SSA name is used outside of the loop in that
> + it is defined.  The SSA form satisfying this condition is called
> + ``loop-closed SSA form'' -- LCSSA.  To enforce LCSSA, PHI nodes must be
> + created at the exits of the loops for the SSA names that are used outside
> + of them.  Only the real operands (not virtual SSA names) are held in LCSSA,
> + in order to save memory.
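
Maybe worth a small before/after example here (GIMPLE-like pseudocode, names
invented):

```text
  # Before LCSSA: x_1 is defined in the loop and used outside it.
  loop:
    x_1 = ...;
    if (cond) goto loop;
  exit:
    use (x_1);

  # After LCSSA: a PHI node on the exit closes the name; only x_2
  # is visible outside the loop.
  loop:
    x_1 = ...;
    if (cond) goto loop;
  exit:
    x_2 = phi (x_1);
    use (x_2);
```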
> +
> + There are various benefits of LCSSA:
> +
> + @itemize
> + @item Various optimizations (value range analysis, final value replacement)
> + are interested in the values that are defined in the loop and used outside
> + of it, i.e., exactly those for that we create new PHI nodes.
> + @item In induction variable analysis, it is not necessary to specify the
> + loop with respect to that the behavior of the SSA name is inspected.
> + @item It makes updating of SSA form during loop transformations simpler.
> + Without LCSSA, operations like loop unrolling may force creation of phi
> + nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
> + updated locally.  However, since we only keep real operands in LCSSA, we
> + cannot use this advantage (we could have local updating of real operands,
> + but it is not much more efficient than to use generic SSA form updating
> + for it as well; the amount of changes to SSA is the same).
> + @end itemize
> +
> + However, it also means LCSSA must be updated.  This is usually straightforward,
> + unless you create a new value in loop and use it outside, or unless you
> + manipulate loop exit edges (functions are provided to make these manipulations
> + simple).  @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
> + LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of LCSSA
> + is preserved.
> +
> + @node Scalar evolutions
> + @section Scalar evolutions
> + @cindex Scalar evolutions
> + @cindex IV analysis on trees
> +
> + Scalar evolutions (SCEV) are used to represent results of induction variable
> + analysis on trees.  They enable us to represent variables with complicated
> + behavior in a simple and consistent way (we only use it to express values of
> + polynomial induction variables, but it is possible to extend it).  The
> + interfaces to SCEV analysis are declared in @file{tree-scalar-evolution.h}.
> + To use scalar evolutions analysis, @code{scev_initialize} must be used.
> + To stop using SCEV, @code{scev_finalize} should be used.  SCEV analysis caches
> + results in order to save time and memory.  This cache however is made invalid
> + by most of the loop transformations, including removal of code.  If such
> + a transformation is performed, @code{scev_reset} must be called to clean
> + the caches.
> +
> + Given a SSA name, its behavior in loops can be analyzed using
> + @code{analyze_scalar_evolution} function.  The returned SCEV however
> + does not have to be fully analyzed and it may contain references to other
> + SSA names defined in the loop.  To resolve these (potentially recursive)
> + references, @code{instantiate_parameters} or @code{resolve_mixers} functions
> + must be used.  @code{instantiate_parameters} is useful when you use the
> + results of SCEV only for some analysis, and when you work with whole nest
> + of loops at once.  It will try replacing all SSA names by their SCEV in
> + all loops, including the super-loops of the current loop, thus providing
> + a complete information about the behavior of the variable in the loop nest.
> + @code{resolve_mixers} is useful if you work with only one loop at a time,
> + and if you possibly need to create code based on the value of the induction
> + variable.  It will only resolve the SSA names defined in the current loop,
> + leaving the SSA names defined outside unchanged, even if their evolution in
> + the outer loops is known.
> +
> + The SCEV is a normal tree expression, except for the fact that it may contain
> + several special tree nodes.  One of them is @code{SCEV_NOT_KNOWN}, used
> + for SSA names whose value cannot be expressed.  The other one is
> + @code{POLYNOMIAL_CHREC}.  Polynomial chrec has three arguments -- base, step
> + and loop (both base and step may contain further polynomial chrecs).  Type of
> + the expression and of base and step must be the same.  A variable has evolution
> + @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified loop)
> + equivalent to @code{x_1} in the following example
> +
> + @smallexample
> + while (...)
> +   @{
> +     x_1 = phi (base, x_2);
> +     x_2 = x_1 + step;
> +   @}
> + @end smallexample
> +
> + Note that this includes the language restrictions on the operations.  For
> + example, if we compile C code and @code{x} has signed type, then the overflow
> + in addition would cause undefined behavior, and we may assume that this does
> + not happen.  Hence, the value with this SCEV cannot overflow (which restricts
> + the number of iterations of such a loop).
> +
> + In many cases, one wants to restrict the attention just to affine induction
> + variables.  In this case, the extra expressive power of SCEV is not useful,
> + and may complicate the optimizations.  In this case, @code{simple_iv} function
> + may be used to analyze a value -- the result is a loop-invariant base and step.
> +
> + @node loop-iv
> + @section IV analysis on RTL
> + @cindex IV analysis on RTL
> +
> + The induction variable on RTL is simple and only allows analysis of affine
> + induction variables, and only in one loop at once.  The interface is declared
> + in @file{cfgloop.h}.  Before start of the analysis of the induction variables
> + in a loop, @code{iv_analysis_loop_init} function must be called for this loop.
> + After the analysis (possibly calling @code{iv_analysis_loop_init} for several
> + loops) is finished, @code{iv_analysis_done} should be called.  The following
> + functions can be used to access the results of the analysis:
> +
> + @itemize
> + @item @code{iv_analyze}: Analyzes a single register used in the given insn.
> + If no use of the register in this insn is found, following insns are scanned,
> + so that this function can be called on the insn returned by get_condition.
> + @item @code{iv_analyze_result}: Analyzes result of the assignment in
> + the given insn.
> + @item @code{iv_analyze_expr}: Analyzes a more complicated expression.  All its
> + operands are analyzed by @code{iv_analyze}, and hence they must be used in
> + the specified insn or one of the following insns.
> + @end itemize
> +
> + The description of the induction variable is provided in @code{struct rtx_iv}.
> + In order to handle subregs, the representation is a bit complicated; the value
> + of the induction variable in the i-th iteration is
> +
> + @smallexample
> + delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
> + @end smallexample
> +
> + with the following exception:  if @code{first_special} is true, then the value
> + in the first iteration (when @code{i} is zero) is @code{delta + mult * base}.
> + However, if @code{extend} is equal to @code{UNKNOWN}, then @code{first_special}
> + must be false, @code{delta} 0, @code{mult} 1 and the value in the i-th
> + iteration is
> +
> + @smallexample
> + subreg_@{mode@} (base + i * step)
> + @end smallexample
> +
> + The function @code{get_iv_value} can be used to perform these calculations.
> +
> + @node Number of iterations
> + @section Number of iterations analysis
> + @cindex Number of iterations analysis
> +
> + Both on trees and on RTL, there are functions available to determine
> + the number of iterations of a loop, with a similar interface.  In many
> + cases, it is not possible to determine number of iterations unconditionally --
> + the determined number is correct only if some assumptions are satisfied.
> + The analysis tries to verify these conditions using the information contained
> + in the program; if it fails, the conditions are returned together with the
> + result.  The following information and conditions are provided by the analysis:
> +
> + @itemize
> + @item @code{assumptions}: If this condition is false, the rest of
> + the information is invalid.
> + @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on trees: If this
> + condition is true, the loop exits in the first iteration.
> + @item @code{infinite}: If this condition is true, the loop is infinite.
> + This condition is only available on RTL.  On trees, this condition is OR-ed
> + with the @code{assumptions}.
> + @item @code{niter_expr} on RTL, @code{niter} on trees: The expression that
> + gives number of iterations.  The number of iterations is defined as the number
> + of executions of the loop latch.
> + @end itemize
> +
> + Both on trees and on RTL, it necessary for the induction variable analysis
> + framework to be initialized (SCEV on trees, loop-iv on RTL).  On trees, the
> + results are stored to @code{struct tree_niter_desc} structure.  Number of
> + iterations before the loop is exited through a given exit can be determined
> + using @code{number_of_iterations_exit} function.  On RTL, the results are
> + returned in @code{struct niter_desc} structure.  The corresponding function is
> + named @code{check_simple_exit}.  There are also functions that pass through all
> + the exits of a loop and try to find one with easy to determine number of
> + iterations -- @code{find_loop_niter} on trees and @code{find_simple_exit} on
> + RTL.  Finally, there are functions that provide the same information, but
> + additionally cache it, so that repeated calls to number of iterations are not
> + so costly -- @code{number_of_iterations_in_loop} on trees and
> + @code{get_simple_loop_desc} on RTL.
> +
> + Note that some of these functions may behave slightly differently than others
> + -- some of them return only the expression for the number of iterations, and
> + fail if there are some assumptions.  @code{number_of_iterations_in_loop} works
> + only for single-exit loops, and it returns the value for number of iterations
> + higher by one with respect to all other functions (i.e., it returns number of
> + executions of the exit statement, not of the loop latch).
> +
> + @node Lambda
> + @section Linear loop transformations framework
> + @cindex Linear loop transformations framework
> +
> + TODO
> +
> + @node Dependency analysis
> + @section Data Dependency Analysis
> + @cindex Data Dependency Analysis
> +
> + TODO
> Index: doc/gccint.texi
> ===================================================================
> *** doc/gccint.texi	(revision 115709)
> --- doc/gccint.texi	(working copy)
> *************** Additional tutorial information is linke
> *** 111,116 ****
> --- 111,117 ----
>   * RTL::             The intermediate representation that most passes work on.
>   * Control Flow::    Maintaining and manipulating the control flow graph.
>   * Tree SSA::        Analysis and optimization of the tree representation.
> + * Loop Representation:: Analysis and representation of loops
>   * Machine Desc::    How to write machine description instruction patterns.
>   * Target Macros::   How to write the machine description C macros and functions.
>   * Host Config::     Writing the @file{xm-@var{machine}.h} file.
> *************** Additional tutorial information is linke
> *** 141,146 ****
> --- 142,148 ----
>   @include passes.texi
>   @include c-tree.texi
>   @include tree-ssa.texi
> + @include loop.texi
>   @include rtl.texi
>   @include cfg.texi
>   @include md.texi
> Index: Makefile.in
> ===================================================================
> *** Makefile.in	(revision 115709)
> --- Makefile.in	(working copy)
> *************** TEXI_GCCINT_FILES = gccint.texi gcc-comm
> *** 3379,3385 ****
>   	 rtl.texi md.texi tm.texi hostconfig.texi fragments.texi	\
>   	 configfiles.texi collect2.texi headerdirs.texi funding.texi	\
>   	 gnu.texi gpl.texi fdl.texi contrib.texi languages.texi		\
> ! 	 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi
>
>   TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi
>
> --- 3379,3386 ----
>   	 rtl.texi md.texi tm.texi hostconfig.texi fragments.texi	\
>   	 configfiles.texi collect2.texi headerdirs.texi funding.texi	\
>   	 gnu.texi gpl.texi fdl.texi contrib.texi languages.texi		\
> ! 	 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi	\
> ! 	 loop.texi
>
>   TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi
>

