This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Looping pattern docs
- To: gcc-patches at gcc dot gnu dot org
- Subject: Looping pattern docs
- From: Michael Hayes <m dot hayes at elec dot canterbury dot ac dot nz>
- Date: Sun, 17 Dec 2000 15:13:42 +1300 (NZDT)
Yonks ago I submitted the following documentation on looping patterns.
Is it OK to commit?
Michael.
Index: md.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/md.texi,v
retrieving revision 1.52
diff -c -3 -p -r1.52 md.texi
*** md.texi 2000/12/04 18:42:59 1.52
--- md.texi 2000/12/17 00:10:10
*************** See the next chapter for information on
*** 32,37 ****
--- 32,38 ----
* Pattern Ordering:: When the order of patterns makes a difference.
* Dependent Patterns:: Having one pattern may make you need another.
* Jump Patterns:: Special considerations for patterns for jump insns.
+ * Looping Patterns:: How to define patterns for special looping insns.
* Insn Canonicalizations::Canonicalization of Instructions
* Expander Definitions::Generating a sequence of several RTL insns
for a standard operation.
*************** table it uses. Its assembler code norma
*** 2597,2602 ****
--- 2598,2644 ----
second operand, but you should incorporate it in the RTL pattern so
that the jump optimizer will not delete the table as unreachable code.
+
+ @cindex @code{decrement_and_branch_until_zero} instruction pattern
+ @item @samp{decrement_and_branch_until_zero}
+ Conditional branch instruction that decrements a register and
+ jumps if the register is non-zero. Operand 0 is the register to
+ decrement and test; operand 1 is the label to jump to if the
+ register is non-zero. @xref{Looping Patterns}.
+
+ This optional instruction pattern is only used by the combiner,
+ typically for loops reversed by the loop optimizer when strength
+ reduction is enabled.
+
+ @cindex @code{doloop_end} instruction pattern
+ @item @samp{doloop_end}
+ Conditional branch instruction that decrements a register and jumps if
+ the register is non-zero. This instruction takes five operands: operand
+ 0 is the register to decrement and test; operand 1 is the number of loop
+ iterations as a @code{const_int} or @code{const0_rtx} if this cannot be
+ determined until run-time; operand 2 is the actual or estimated maximum
+ number of iterations as a @code{const_int}; operand 3 is the number of
+ enclosed loops as a @code{const_int} (an innermost loop has a value of
+ 1); operand 4 is the label to jump to if the register is non-zero.
+ @xref{Looping Patterns}.
+
+ This optional instruction pattern should be defined for machines with
+ low-overhead looping instructions as the loop optimizer will try to
+ modify suitable loops to utilize it. If nested low-overhead looping is
+ not supported, use a @code{define_expand} (@pxref{Expander Definitions})
+ and make the pattern fail if operand 3 is not @code{const1_rtx}.
+ Similarly, if the actual or estimated maximum number of iterations is
+ too large for this instruction, make it fail.
+
+ @cindex @code{doloop_begin} instruction pattern
+ @item @samp{doloop_begin}
+ Companion instruction to @code{doloop_end} required for machines that
+ need to perform some initialisation, such as loading special registers
+ used by a low-overhead looping instruction. If initialisation insns do
+ not always need to be emitted, use a @code{define_expand}
+ (@pxref{Expander Definitions}) and make it fail.
+
+
@cindex @code{canonicalize_funcptr_for_compare} instruction pattern
@item @samp{canonicalize_funcptr_for_compare}
Canonicalize the function pointer in operand 1 and store the result
*************** discussed above, we have the pattern
*** 3075,3080 ****
--- 3117,3227 ----
The @code{SELECT_CC_MODE} macro on the Sparc returns @code{CC_NOOVmode}
for comparisons whose argument is a @code{plus}.
+
+ @node Looping Patterns
+ @section Defining Looping Instruction Patterns
+ @cindex looping instruction patterns
+ @cindex defining looping instruction patterns
+
+ Some machines have special jump instructions that can be utilised to
+ make loops more efficient. A common example is the 68000 @samp{dbra}
+ instruction, which decrements a register and branches if the result is
+ greater than or equal to zero. Other machines, in particular digital
+ signal processors (DSPs), have special block repeat instructions to
+ provide low-overhead loop support. For example, the TI TMS320C3x/C4x
+ DSPs have a block repeat instruction that loads special registers to
+ mark the top and end of a loop and to count the number of loop
+ iterations. This avoids the need for fetching and executing a
+ @samp{dbra}-like instruction and avoids pipeline stalls associated with
+ the jump.
+
+ GNU CC has three special named patterns to support low-overhead looping:
+ @samp{decrement_and_branch_until_zero}, @samp{doloop_begin}, and
+ @samp{doloop_end}. The first pattern,
+ @samp{decrement_and_branch_until_zero}, is not emitted during RTL
+ generation but may be emitted during the instruction combination phase.
+ This requires the assistance of the loop optimizer, using information
+ collected during strength reduction, to reverse a loop to count down to
+ zero. Some targets also require the loop optimizer to add a
+ @code{REG_NONNEG} note to indicate that the iteration count is always
+ positive. This is needed if the target performs a signed loop
+ termination test. For example, the 68000 uses a pattern similar to the
+ following for its @code{dbra} instruction:
+
+ @smallexample
+ @group
+ (define_insn "decrement_and_branch_until_zero"
+   [(set (pc)
+         (if_then_else
+           (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
+                        (const_int -1))
+               (const_int 0))
+           (label_ref (match_operand 1 "" ""))
+           (pc)))
+    (set (match_dup 0)
+         (plus:SI (match_dup 0)
+                  (const_int -1)))]
+   "find_reg_note (insn, REG_NONNEG, 0)"
+   "...")
+ @end group
+ @end smallexample
+
+ Note that since the insn is both a jump insn and has an output, it must
+ deal with its own reloads, hence the `m' constraints. Also, since this
+ insn is generated by the instruction combination phase combining two
+ sequential insns into an implicit parallel insn, the iteration
+ counter needs to be biased by the same amount as the decrement
+ operation, in this case -1. By contrast, the following similar pattern
+ will not be matched by the combiner.
+
+ @smallexample
+ @group
+ (define_insn "decrement_and_branch_until_zero"
+   [(set (pc)
+         (if_then_else
+           (ge (match_operand:SI 0 "general_operand" "+d*am")
+               (const_int 1))
+           (label_ref (match_operand 1 "" ""))
+           (pc)))
+    (set (match_dup 0)
+         (plus:SI (match_dup 0)
+                  (const_int -1)))]
+   "find_reg_note (insn, REG_NONNEG, 0)"
+   "...")
+ @end group
+ @end smallexample
+
+ The other two special looping patterns, @samp{doloop_begin} and
+ @samp{doloop_end}, are emitted by the loop optimizer for certain
+ well-behaved loops with a finite number of loop iterations using
+ information collected during strength reduction.
+
+ The @samp{doloop_end} pattern describes the actual looping instruction
+ (or the implicit looping operation) and the @samp{doloop_begin} pattern
+ is an optional companion pattern that can be used for initialisation
+ needed for some low-overhead looping instructions.
+
+ Note that some machines require the actual looping instruction to be
+ emitted at the top of the loop (e.g., the TMS320C3x/C4x DSPs). Emitting
+ the true RTL for a looping instruction at the top of the loop can cause
+ problems with flow analysis. So instead, a dummy @code{doloop} insn is
+ emitted at the end of the loop. The machine-dependent reorg pass checks
+ for the presence of this @code{doloop} insn and then searches back to
+ the top of the loop, where it inserts the true looping insn (provided
+ there are no instructions in the loop which would cause problems). Any
+ additional labels can be emitted at this point. In addition, if the
+ desired special iteration counter register was not allocated, this
+ machine-dependent reorg pass could emit a traditional compare and jump
+ instruction pair.
+
+ The essential difference between the
+ @samp{decrement_and_branch_until_zero} and the @samp{doloop_end}
+ patterns is that the loop optimizer allocates an additional pseudo
+ register for the latter as an iteration counter. This pseudo register
+ cannot be used within the loop (i.e., general induction variables cannot
+ be derived from it); however, in many cases the loop induction variable
+ may become redundant and be removed by the flow pass.
+
@node Insn Canonicalizations
@section Canonicalization of Instructions
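
For readers following the patch, the loop reversal and the biased termination test described in the Looping Patterns node can be modelled in plain C. This is only an illustrative sketch and is not part of the patch or of GCC. The function names (sum_up, sum_down, test_biased, test_unbiased) are hypothetical, and the assert merely stands in for the guarantee that a REG_NONNEG note would provide.

```c
#include <assert.h>

/* Reference loop: counts up and runs the body n times. */
long sum_up(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical model of the reversed loop.  The counter starts at
   n - 1 and the body runs n times.  The branch test and the decrement
   both read the OLD counter value, as the two sets of the parallel
   pattern do.  */
long sum_down(const int *a, int n)
{
    long sum = 0;
    int counter = n - 1;
    int taken;

    /* Stand-in for REG_NONNEG: a signed termination test is only
       valid if the counter starts out nonnegative.  */
    assert(counter >= 0);

    do {
        sum += a[counter];
        taken = (counter - 1) >= 0;   /* (ge (plus x -1) 0) */
        counter = counter - 1;        /* (set x (plus x -1)) */
    } while (taken);
    return sum;
}

/* The two termination tests from the two patterns.  They agree for
   every x where x - 1 does not overflow; the difference discussed in
   the text is about which RTL shape the combiner produces, not about
   the arithmetic.  */
int test_biased(int x)   { return (x - 1) >= 0; }
int test_unbiased(int x) { return x >= 1; }
```

With int a[] = {3, 1, 4, 1, 5}, both sum_up (a, 5) and sum_down (a, 5) return 14. The sketch reads the array in reverse order in sum_down, which is harmless here only because addition is order-independent; a real loop is reversed only when the optimizer can show the iteration order does not matter.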