This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: 2nd try for patch for automaton based pipeline hazard recognizer (part #1)

To: jsm28 at cam dot ac dot uk
Subject: Re: 2nd try for patch for automaton based pipeline hazard recognizer (part #1)
From: Vladimir Makarov <vmakarov at toke dot toronto dot redhat dot com>
Date: Thu, 14 Jun 2001 13:35:01 -0400
CC: gcc-patches at gcc dot gnu dot org

> On Wed, 13 Jun 2001, Vladimir Makarov wrote:
> 
> >   I considerably modified the sources according to the most of Bernd
> > Schmidt's comments.  This is the second try to get approval of the
> > patch.  The patch has been successfully tested on i386 Linux and
> > Solaris machines.
> 
> I'll comment on the documentation only.
> 

Joseph, thank you for the valuable comments.

> You need to add something to the documentation on passes and source files 
> of the compiler in gcc.texi (once Daniel Berlin's patch to update that 
> section is in).
>

  I've added some comments.  On the whole, that section is a very brief
description, so practically nothing more needs to be added.
  
> You should add more index entries.  For example, everywhere you define a 
> term, or document a construct, there should be an appropriate index entry.  
> There are some, but could be more.
> 
> > ! To achieve better productivity the most of modern processors
> > ! (super-pipelined, superscalar RISC, and VLIW processors) have many
> > ! @dfn{functional units} on which several instructions can be executed
> > ! simultaneously.  An instruction execution can be started only if its
> > ! issue conditions are satisfied.  If not, instruction is interlocked
> > ! until its conditions are satisfied.  Such an @dfn{interlock (pipeline)
> > ! delay} causes interruption of the fetching of successor instructions
> 
> E.g., this paragraph should have index entries for both "functional units" 
> and "interlock (pipeline) delay".  Other entries, such as for VLIW, might 
> be a good idea as well.
> 

I added a lot of them.

> > ! (or demands @var{nop} instructions, e.g. for some MIPS processors).
> 
> Is "nop" here really a metasyntactic variable?
> 
> > ! data are not evaluated but will be ready till given time after the
> 
> "will be ready till given time" is bad English, or doesn't mean what you 
> want it to mean - it implies that the data stop being available at the 
> given time, not start being available.
> 
> > ! instruction execution start).  Taking into account of the data
> > ! dependence delays is simple.  Data dependence (true, output, and
> 
> "Taking the data dependence delays into account ..."
> 
> > ! anti-dependence) delay between two instructions is given by constant.
> 
> "by a constant"?
> 
> > ! In the most cases this approach is adequate.  The second kind of
> > ! interlock delays is reservation delay.  Two such way dependent
> 
> "such way dependent" is bad English.
> 
> > ! which are reserved for some time.  Taking into account of this kind of
> > ! delay is complex especially for modern RISC processors.
> 
> "Taking this kind of delay into account ..."
> 
> > ! The task of exploiting more processor parallelism is solved by
> > ! instruction scheduler.  For better solution of this problem, the
> 
> "by the instruction scheduler" or "by an instruction scheduler".
> 
> > ! instruction scheduler has to have adequate description of processor
> > ! parallelism (or @dfn{pipeline description}).  Currently GCC has two
> > ! ways to describe processor parallelism.  The first one is old and
> > ! originated from instruction scheduler written by Michael Tiemann and
> > ! described in the first subsequent section.  The second one is new and
> 
> "next section".
> 
> Do not say "new" here, or people will be reading this section in 10 years 
> and still seeing it say it is "new".  It would be better to give dates, 
> references to the literature on which this work is based, and describe the 
> advantages of the new model.  And you should update your entry in 
> contrib.texi to describe this work.
> 
> > ! Gcc instruction scheduler uses @dfn{pipeline hazard recognizer} to
> 
> "The GCC instruction scheduler uses a".
> 
> I've omitted further comments on missing "a" and "the" and grammar below.
>

  I changed everything mentioned above.  Actually, you are being polite.  I
know my English is terrible.  It is not my native language.  I especially
have problems with the articles.  I don't feel them.  But I've tried
to change/add/remove the articles as I feel them.  I am not sure about
the result.

> > + (define_cpu_unit @var{unit-names} [@var{automaton-name}])
> > + @end smallexample
> > + 
> > + @var{names} is a string giving the names of the functional units
> 
> You use @var{unit-names} then @var{names}.  You need to be consistent 
> here.  The same applies below; further such comments omitted.
> 
> > + separated by commas.  Don't use name @dfn{nothing}, it is reserved for
> 
> You aren't defining the word "nothing" so @dfn is inappropriate.  Use 
> @samp or @code.
> 
> > + @var{regexp} is string describing reservation of cpu functional units
> > + by the instruction.  The reservations are described by a regular
> > + expression according the following syntax:
> 
> It would be better to have a Texinfo mechanism designed for this sort of 
> syntax production, but unless someone wants to add one to Texinfo the 
> approach here seems reasonable.
> 
> > + @samp{","} is used for describing start of the next cycle in
> > + reservation.
> 
> Note that @samp adds (in the DVI output) quotes around the contained text.  
> Unless the quotes already in "," are a literal part of the syntax - and 
> from the examples they don't seem to be, you should be using just @samp{,} 
> and similarly for the items below.
> 
> > + first regular expression *or* the reservation described by the second
> > + regular expression *or* etc.
> 
> Don't use ASCII markup when Texinfo provides proper facilities.  Use @emph
> or @strong.  I think you need to re-read the "Marking Text" section in the
> Texinfo manual, and try to use its facilities better.  Also, run "make
> dvi" and look at the formatted output for your text.  TeX is designed to
> produce very high quality typeset output, but you need to put a fair
> amount of effort into the text you produce to make the typeset results
> look good.
> 
> > + @samp{reservation_name} -- see description of construction
> > + @samp{define_reservation}.
> 
> Use TeX dashes - that is, --- (not surrounded by spaces) for an em dash.
> 
> > + unit names and can not be reserved name @dfn{nothing}.
> 
> Again, this isn't a definition of the word "nothing".
> 
> > + treatment of operator `|' in the regular expressions.  The usual
> 
> @samp{|}.
> 
> > + @findex MD_AUTOMATON_SCHED_INIT
> > + @item MD_AUTOMATON_SCHED_INIT (@var{file}, @var{verbose})
> > + Like @samp{MD_SCHED_INIT} but used only for automaton based
> > + pipeline description.
> 
> In this sort of case, you should describe the macros together, using 
> @itemx.
> 
> > + be executed in pipelines @samp{A} or @samp{B}, some insns only in
> > + pipeline @samp{B} or @samp{C}, and one insn in pipeline @samp{B}.  The
> > + processor may issue the 1st insn into @samp{A}, the 2nd one into
> > + @samp{B}.  In this case, the 3rd insn will wait for freeing @samp{B}
> > + until the next cycle.  If the scheduler issues the 3rd insn the first,
> > + the processor could issue all 3 insns per cycle.
> 
> In this case, I think @var would be better than @samp.
> 

I took all your comments into account.  Here is the result.

Vladimir

Here is the documentation part of the changelog entry:

	* doc/md.texi: Description of automaton based model.
	
	* doc/tm.texi (USE_AUTOMATON_PIPELINE_INTERFACE,
	MD_AUTOMATON_SCHED_INIT, MD_AUTOMATON_SCHED_REORDER,
	DFA_SCHEDULER_PRE_CYCLE_INSN, DFA_SCHEDULER_POST_CYCLE_INSN,
	FIRST_CYCLE_MULTIPASS_SCHEDULING,
	FIRST_CYCLE_MULTIPASS_SCHEDULING_LOOKAHEAD,
	INIT_SCHEDULER_BUBBLES, SCHEDULER_BUBBLE): The new macro
	descriptions.
	(ISSUE_RATE, MD_SCHED_INIT, MD_SCHED_REORDER, MD_SCHED_REORDER2,
	MD_SCHED_VARIABLE_ISSUE): Add comment.
	
	* doc/contrib.texi: Add dfa based scheduler contribution.

	* doc/gcc.texi: Add more information about genattrtab.
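
As a rough illustration of the idea behind the automaton based
recognizer described in the md.texi part below (an insn can be issued
when the automaton has a transition from the current state), here is a
toy Python sketch.  It is not the code that genattrtab generates.  The
insn classes, unit names and reservations are hypothetical and were
chosen only for illustration.

```python
from collections import deque

# Hypothetical insn classes; each reservation is one set of units per cycle.
RESERVATIONS = {
    "add": (frozenset({"alu"}),),
    "div": (frozenset({"alu"}), frozenset({"div_unit"}), frozenset({"div_unit"})),
    "load": (frozenset({"mem"}),),
}
ADVANCE = "<advance cycle>"  # pseudo input symbol: move to the next cycle


def normalize(state):
    """Drop trailing empty cycles so that equal states compare equal."""
    cycles = list(state)
    while cycles and not cycles[-1]:
        cycles.pop()
    return tuple(cycles)


def issue(state, insn):
    """Return the state after issuing insn now, or None on a unit conflict."""
    need = RESERVATIONS[insn]
    n = max(len(state), len(need))
    busy = state + (frozenset(),) * (n - len(state))
    need = need + (frozenset(),) * (n - len(need))
    if any(b & r for b, r in zip(busy, need)):
        return None
    return normalize(tuple(b | r for b, r in zip(busy, need)))


def advance(state):
    """Return the state at the start of the next cycle."""
    return normalize(state[1:])


def build_automaton():
    """Enumerate reachable states and record every legal transition."""
    start = ()
    table = {}
    seen = {start}
    todo = deque([start])
    while todo:
        state = todo.popleft()
        for symbol in list(RESERVATIONS) + [ADVANCE]:
            nxt = advance(state) if symbol == ADVANCE else issue(state, symbol)
            if nxt is None:
                continue  # no transition: issuing here would be a hazard
            table[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, table


start, table = build_automaton()
state = table[(start, "div")]
# Once the table is built, a hazard check is a single lookup.
print("add can issue with div:", (state, "add") in table)
print("load can issue with div:", (state, "load") in table)
```

The point of the sketch is only that the check costs one table lookup
regardless of how complicated the reservations are.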
	

Index: contrib.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/contrib.texi,v
retrieving revision 1.5
diff -u -p -r1.5 contrib.texi
--- contrib.texi	2001/06/13 15:15:24	1.5
+++ contrib.texi	2001/06/13 23:31:21
@@ -313,9 +313,10 @@ Andrew MacLeod for his ongoing work in b
 various code generation improvements, work on the global optimizer, etc.
 
 @item
-Vladimir Makarov for hacking some ugly i960 problems, PowerPC
-hacking improvements to compile-time performance and overall knowledge
-and direction in the area of instruction scheduling.
+Vladimir Makarov for hacking some ugly i960 problems, PowerPC hacking
+improvements to compile-time performance, overall knowledge and
+direction in the area of instruction scheduling, and design and
+implementation of automaton based instruction scheduler.
 
 @item
 Bob Manson for his behind the scenes work on dejagnu.
Index: gcc.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/gcc.texi,v
retrieving revision 1.8
diff -u -p -r1.8 gcc.texi
--- gcc.texi	2001/06/12 22:40:00	1.8
+++ gcc.texi	2001/06/13 23:31:22
@@ -3611,8 +3611,10 @@ Several passes use instruction attribute
 attributes defined for a particular machine is in file
 @file{insn-attr.h}, which is generated from the machine description by
 the program @file{genattr}.  The file @file{insn-attrtab.c} contains
-subroutines to obtain the attribute values for insns.  It is generated
-from the machine description by the program @file{genattrtab}.@refill
+subroutines to obtain the attribute values for insns and information
+about processor pipeline characteristics for the instruction scheduler.
+It is generated from the machine description by the program
+@file{genattrtab}.@refill
 @end itemize
 @end ifset
 
bash$ cvs diff -c -p md.texi tm.texi contrib.texi gcc.texi
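
The reservation syntax documented in the md.texi diff below (`,` starts
the next cycle, `|` gives alternatives, `+` reserves units together,
`*` repeats) can be illustrated with a small Python sketch that expands
a reservation string into its cycle-by-cycle alternatives.  This is
only a model of the documented semantics under my reading of the
grammar, not the parser in genattrtab.  The unit names in the usage
line are hypothetical, and `result_name` from the grammar is treated
like any other name.

```python
import re
from itertools import product

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[,|+*()])")


def tokenize(text):
    """Split a reservation string into names, numbers and operators."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"bad character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expand(text):
    """Return a list of alternatives; each is a tuple of per-cycle unit sets."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def sequence(a, b):  # ",": b starts on the cycle after a ends
        return [x + y for x, y in product(a, b)]

    def together(a, b):  # "+": both reservations start on the same cycle
        out = []
        for x, y in product(a, b):
            n = max(len(x), len(y))
            x = x + (frozenset(),) * (n - len(x))
            y = y + (frozenset(),) * (n - len(y))
            out.append(tuple(p | q for p, q in zip(x, y)))
        return out

    def regexp():
        alts = oneof()
        while peek() == ",":
            take()
            alts = sequence(alts, oneof())
        return alts

    def oneof():
        alts = allof()
        while peek() == "|":
            take()
            alts = alts + allof()
        return alts

    def allof():
        alts = repeat()
        while peek() == "+":
            take()
            alts = together(alts, repeat())
        return alts

    def repeat():
        alts = element()
        if peek() == "*":
            take()
            count = int(take())
            if count < 1:
                raise ValueError("repeat count must be positive")
            once = alts
            for _ in range(count - 1):
                alts = sequence(alts, once)
        return alts

    def element():
        tok = take()
        if tok == "(":
            alts = regexp()
            take(")")
            return alts
        if tok == "nothing":
            return [(frozenset(),)]
        if tok[0].isalpha() or tok[0] == "_":
            return [(frozenset({tok}),)]
        raise ValueError(f"unexpected token {tok!r}")

    result = regexp()
    if peek() is not None:
        raise ValueError(f"trailing token {peek()!r}")
    return result


for alt in expand("(a | b), c*2 + d"):
    print([sorted(cycle) for cycle in alt])
```

Note that in this grammar `|` binds tighter than `,`, so `a, b | c`
reads as `a` followed by either `b` or `c`.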
Index: md.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/md.texi,v
retrieving revision 1.3
diff -c -p -r1.3 md.texi
*** md.texi	2001/06/11 20:52:30	1.3
--- md.texi	2001/06/14 17:20:20
*************** in the compiler.@refill
*** 3607,3616 ****
  There are two cases where you should specify how to split a pattern into
  multiple insns.  On machines that have instructions requiring delay
  slots (@pxref{Delay Slots}) or that have instructions whose output is
! not available for multiple cycles (@pxref{Function Units}), the compiler
! phases that optimize these cases need to be able to move insns into
! one-instruction delay slots.  However, some insns may generate more than one
! machine instruction.  These insns cannot be placed into a delay slot.
  
  Often you can rewrite the single insn as a list of individual insns,
  each corresponding to one machine instruction.  The disadvantage of
--- 3607,3617 ----
  There are two cases where you should specify how to split a pattern into
  multiple insns.  On machines that have instructions requiring delay
  slots (@pxref{Delay Slots}) or that have instructions whose output is
! not available for multiple cycles (@pxref{Processor pipeline description}),
! the compiler phases that optimize these cases need to be able to move
! insns into one-instruction delay slots.  However, some insns may
! generate more than one machine instruction.  These insns cannot be
! placed into a delay slot.
  
  Often you can rewrite the single insn as a list of individual insns,
  each corresponding to one machine instruction.  The disadvantage of
*************** to track the condition codes.
*** 4140,4146 ****
  * Insn Lengths::        Computing the length of insns.
  * Constant Attributes:: Defining attributes that are constant.
  * Delay Slots::         Defining delay slots required for a machine.
! * Function Units::      Specifying information for insn scheduling.
  @end menu
  
  @node Defining Attributes
--- 4141,4147 ----
  * Insn Lengths::        Computing the length of insns.
  * Constant Attributes:: Defining attributes that are constant.
  * Delay Slots::         Defining delay slots required for a machine.
! * Processor pipeline description::      Specifying information for insn scheduling.
  @end menu
  
  @node Defining Attributes
*************** overrides on specific instructions (@pxr
*** 4230,4238 ****
  @cindex @code{const_string} and attributes
  @item (const_string @var{value})
  The string @var{value} specifies a constant attribute value.
! If @var{value} is specified as @samp{"*"}, it means that the default value of
  the attribute is to be used for the insn containing this expression.
! @samp{"*"} obviously cannot be used in the @var{default} expression
  of a @code{define_attr}.@refill
  
  If the attribute whose value is being specified is numeric, @var{value}
--- 4231,4239 ----
  @cindex @code{const_string} and attributes
  @item (const_string @var{value})
  The string @var{value} specifies a constant attribute value.
! If @var{value} is specified as @samp{*}, it means that the default value of
  the attribute is to be used for the insn containing this expression.
! @samp{*} obviously cannot be used in the @var{default} expression
  of a @code{define_attr}.@refill
  
  If the attribute whose value is being specified is numeric, @var{value}
*************** branch is true, we might represent this 
*** 4770,4783 ****
  @end smallexample
  @c the above is *still* too long.  --mew 4feb93
  
! @node Function Units
! @subsection Specifying Function Units
  @cindex function units, for scheduling
  
! On most RISC machines, there are instructions whose results are not
! available for a specific number of cycles.  Common cases are instructions
! that load data from memory.  On many machines, a pipeline stall will result
! if the data is referenced too soon after the load instruction.
  
  In addition, many newer microprocessors have multiple function units, usually
  one for integer and one for floating point, and often will incur pipeline
--- 4771,4871 ----
  @end smallexample
  @c the above is *still* too long.  --mew 4feb93
  
! @node Processor pipeline description
! @subsection Specifying processor pipeline description
! @cindex processor pipeline description
! @cindex processor functional units
! @cindex instruction latency time
! @cindex interlock delays
! @cindex data dependence delays
! @cindex reservation delays
! @cindex pipeline hazard recognizer
! @cindex automaton based pipeline description
! @cindex regular expressions
! @cindex deterministic finite state automaton
! @cindex automaton based scheduler
! @cindex RISC
! @cindex VLIW
! 
! To achieve better productivity, most modern processors
! (super-pipelined, superscalar @acronym{RISC}, and @acronym{VLIW}
! processors) have many @dfn{functional units} on which several
! instructions can be executed simultaneously.  An instruction execution
! can be started only if its issue conditions are satisfied.  If not,
! the instruction is interlocked until its conditions are satisfied.
! Such an @dfn{interlock (pipeline) delay} causes interruption of the
! fetching of successor instructions (or demands nop instructions,
! e.g. for some MIPS processors).
! 
! There are two major kinds of interlock delays in modern processors.
! The first one is a data dependence delay determining @dfn{instruction
! latency time}.  The instruction execution is not started until all
! source data have been evaluated by the previous instructions (there
! are more complex cases when the instruction execution starts even when
! the data are not available but will be ready in a given time after the
! instruction execution start).  Taking the data dependence delays into
! account is simple.  The data dependence (true, output, and
! anti-dependence) delay between two instructions is given by a
! constant.  In most cases this approach is adequate.  The second
! kind of interlock delay is a reservation delay.  The reservation delay
! means that two instructions under execution will be in need of shared
! processor resources, i.e. buses, internal registers, and/or
! functional units, which are reserved for some time.  Taking this kind
! of delay into account is complex especially for modern @acronym{RISC}
! processors.
! 
! The task of exploiting more processor parallelism is solved by an
! instruction scheduler.  For a better solution of this problem, the
! instruction scheduler has to have an adequate description of the
! processor parallelism (or @dfn{pipeline description}).  Currently GCC
! has two ways to describe processor parallelism.  The first one is old
! and originated from the instruction scheduler written by Michael Tiemann;
! it is described in the next section.  The second one was
! created later.  It is based on a description of functional unit
! reservations by processor instructions with the aid of @dfn{regular
! expressions}.  This is the so-called @dfn{automaton based description}.
! 
! The GCC instruction scheduler uses a @dfn{pipeline hazard recognizer} to
! figure out whether the processor can issue an instruction
! on a given simulated processor cycle.  The pipeline hazard recognizer is
! code generated from the processor pipeline description.  The
! pipeline hazard recognizer generated from the automaton based
! description is more sophisticated and is based on a deterministic finite
! state automaton (@acronym{DFA}), and is therefore faster than the one
! generated from the old description.  Also, its speed does not depend on
! processor complexity.  The instruction issue is possible if there is
! a transition from one automaton state to another one.
! 
! You can use either model to describe processor pipeline characteristics
! or even a mix of them.  You could use the old description for some
! processor submodels and the @acronym{DFA}-based one for the rest of the
! processor submodels.
! 
! In general, the automaton based description is
! preferable.  Its model is richer.  It permits describing the
! pipeline characteristics of processors more accurately, which results in
! improved code quality (although sometimes only by a fraction of a
! percent).  It will also be used as an infrastructure to implement
! sophisticated and practical insn scheduling which will try many
! instruction sequences to choose the best one.
! 
! 
! @menu
! * Old pipeline description:: Specifying information for insn scheduling.
! * Automaton pipeline description:: Describing insn pipeline characteristics.
! * Comparison of the two descriptions:: Drawbacks of the old pipeline description
! @end menu
! 
! @node Old pipeline description
! @subsubsection Specifying Function Units
! @cindex old pipeline description
  @cindex function units, for scheduling
  
! On most @acronym{RISC} machines, there are instructions whose results
! are not available for a specific number of cycles.  Common cases are
! instructions that load data from memory.  On many machines, a pipeline
! stall will result if the data is referenced too soon after the load
! instruction.
  
  In addition, many newer microprocessors have multiple function units, usually
  one for integer and one for floating point, and often will incur pipeline
*************** due to function unit conflicts.
*** 4791,4803 ****
  
  For the purposes of the specifications in this section, a machine is
  divided into @dfn{function units}, each of which execute a specific
! class of instructions in first-in-first-out order.  Function units that
! accept one instruction each cycle and allow a result to be used in the
! succeeding instruction (usually via forwarding) need not be specified.
! Classic RISC microprocessors will normally have a single function unit,
! which we can call @samp{memory}.  The newer ``superscalar'' processors
! will often have function units for floating point operations, usually at
! least a floating point adder and multiplier.
  
  @findex define_function_unit
  Each usage of a function units by a class of insns is specified with a
--- 4879,4892 ----
  
  For the purposes of the specifications in this section, a machine is
  divided into @dfn{function units}, each of which execute a specific
! class of instructions in first-in-first-out order.  Function units
! that accept one instruction each cycle and allow a result to be used
! in the succeeding instruction (usually via forwarding) need not be
! specified.  Classic @acronym{RISC} microprocessors will normally have
! a single function unit, which we can call @samp{memory}.  The newer
! ``superscalar'' processors will often have function units for floating
! point operations, usually at least a floating point adder and
! multiplier.
  
  @findex define_function_unit
  Each usage of a function units by a class of insns is specified with a
*************** Typical uses of this vector are where a 
*** 4860,4869 ****
  pipeline either single- or double-precision operations, but not both, or
  where a memory unit can pipeline loads, but not stores, etc.
  
! As an example, consider a classic RISC machine where the result of a
! load instruction is not available for two cycles (a single ``delay''
! instruction is required) and where only one load instruction can be executed
! simultaneously.  This would be specified as:
  
  @smallexample
  (define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)
--- 4949,4958 ----
  pipeline either single- or double-precision operations, but not both, or
  where a memory unit can pipeline loads, but not stores, etc.
  
! As an example, consider a classic @acronym{RISC} machine where the
! result of a load instruction is not available for two cycles (a single
! ``delay'' instruction is required) and where only one load instruction
! can be executed simultaneously.  This would be specified as:
  
  @smallexample
  (define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)
*************** units.  These insns will cause a potenti
*** 4888,4893 ****
--- 4977,5350 ----
  used during their execution and there is no way of representing that
  conflict.  We welcome any examples of how function unit conflicts work
  in such processors and suggestions for their representation.
+ 
+ @node Automaton pipeline description
+ @subsubsection Describing instruction pipeline characteristics
+ @cindex automaton based pipeline description
+ 
+ This section describes constructions of the automaton based processor
+ pipeline description.  The order of the constructions mentioned below
+ within the machine description file is not important.
+ 
+ @findex define_automaton
+ @cindex pipeline hazard recognizer
+ The following optional construction describes names of automata
+ generated and used for the pipeline hazards recognition.  Sometimes
+ the generated finite state automaton used by the pipeline hazard
+ recognizer is large.  If we use more than one automaton and bind functional
+ units to the automata, the total size of the automata is usually
+ less than the size of the single automaton.  If there is no such
+ construction, only one finite state automaton is generated.
+ 
+ @smallexample
+ (define_automaton @var{automata-names})
+ @end smallexample
+ 
+ @var{automata-names} is a string giving names of the automata.  The
+ names are separated by commas.  All the automata should have unique names.
+ The automaton name is used in construction @code{define_cpu_unit} and
+ @code{define_query_cpu_unit}.
+ 
+ @findex define_cpu_unit
+ @cindex processor functional units
+ Each processor functional unit used in description of instruction
+ reservations should be described by the following construction.
+ 
+ @smallexample
+ (define_cpu_unit @var{unit-names} [@var{automaton-name}])
+ @end smallexample
+ 
+ @var{unit-names} is a string giving the names of the functional units
+ separated by commas.  Don't use the name @samp{nothing}; it is reserved
+ for other purposes.
+ 
+ @var{automaton-name} is a string giving the name of the automaton with
+ which the unit is bound.  The automaton should be described in
+ construction @code{define_automaton}.  You should give
+ @var{automaton-name}, if there is a defined automaton.
+ 
+ @findex define_query_cpu_unit
+ @cindex querying function unit reservations
+ The following construction describes CPU functional units analogously
+ to @code{define_cpu_unit}.  If we use automata without their
+ minimization, the reservation of such units can be queried for an
+ automaton state.  The instruction scheduler never queries reservation
+ of functional units for given automaton state.  So as a rule, you
+ don't need this construction.  This construction could be used for
+ future code generation goals (e.g. to generate @acronym{VLIW} insn
+ templates).
+ 
+ @smallexample
+ (define_query_cpu_unit @var{unit-names} [@var{automaton-name}])
+ @end smallexample
+ 
+ @var{unit-names} is a string giving names of the functional units
+ separated by commas.
+ 
+ @var{automaton-name} is a string giving name of the automaton with
+ which the unit is bound.
+ 
+ @findex define_insn_reservation
+ @cindex instruction latency time
+ @cindex regular expressions
+ @cindex data bypass
+ The following construction is the major one to describe pipeline
+ characteristics of an instruction.
+ 
+ @smallexample
+ (define_insn_reservation @var{insn-name} @var{default_latency}
+                          @var{condition} @var{regexp})
+ @end smallexample
+ 
+ @var{default_latency} is a number giving latency time of the
+ instruction.
+ 
+ @var{insn-name} is a string giving the internal name of the insn.  The
+ internal names are used in the construction @code{define_bypass} and in
+ the automaton description file generated for debugging.  The internal
+ name has nothing in common with the names in @code{define_insn}.  It is
+ good practice to use insn classes described in the processor manual.
+ 
+ @var{condition} defines what RTL insns are described by this
+ construction.
+ 
+ @var{regexp} is a string describing reservation of the cpu functional
+ units by the instruction.  The reservations are described by a regular
+ expression according to the following syntax:
+ 
+ @smallexample
+        regexp = regexp "," oneof
+               | oneof
+ 
+        oneof = oneof "|" allof
+              | allof
+ 
+        allof = allof "+" repeat
+              | repeat
+  
+        repeat = element "*" number
+               | element
+ 
+        element = cpu_function_unit_name
+                | reservation_name
+                | result_name
+                | "nothing"
+                | "(" regexp ")"
+ @end smallexample
+ 
+ @itemize @bullet
+ @item
+ @samp{,} is used for describing the start of the next cycle in
+ the reservation.
+ 
+ @item
+ @samp{|} is used for describing a reservation described by the first
+ regular expression @strong{or} a reservation described by the second
+ regular expression @strong{or} etc.
+ 
+ @item
+ @samp{+} is used for describing a reservation described by the first
+ regular expression @strong{and} a reservation described by the
+ second regular expression @strong{and} etc.
+ 
+ @item
+ @samp{*} is used for convenience and simply means a sequence in which
+ the regular expression is repeated @var{number} times with cycle
+ advancing (see @samp{,}).
+ 
+ @item
+ @samp{cpu_function_unit_name} denotes reservation of the named
+ functional unit.
+ 
+ @item
+ @samp{reservation_name} --- see description of construction
+ @samp{define_reservation}.
+ 
+ @item
+ @samp{nothing} denotes no unit reservations.
+ @end itemize
+ 
+ @findex define_reservation
+ Sometimes unit reservations for different insns contain common parts.
+ In such a case, you can simplify the pipeline description by describing
+ the common part with the following construction:
+ 
+ @smallexample
+ (define_reservation @var{reservation-name} @var{regexp})
+ @end smallexample
+ 
+ @var{reservation-name} is a string giving the name of @var{regexp}.
+ Functional unit names and reservation names are in the same name
+ space.  So the reservation names should be different from the
+ functional unit names and cannot be the reserved name @samp{nothing}.
+ 
+ @findex define_bypass
+ @cindex instruction latency time
+ @cindex data bypass
+ The following construction is used to describe exceptions in the
+ latency time for a given instruction pair.  These are so-called bypasses.
+ 
+ @smallexample
+ (define_bypass @var{number} @var{out_insn_names} @var{in_insn_names}
+                [@var{guard}])
+ @end smallexample
+ 
+ @var{number} defines when the result generated by the instructions
+ given in string @var{out_insn_names} will be ready for the
+ instructions given in string @var{in_insn_names}.  The instructions in
+ the string are separated by commas.
+ 
+ @var{guard} is an optional string giving the name of a C function which
+ defines an additional guard for the bypass.  The function will get the
+ two insns as parameters.  If the function returns zero the bypass will
+ be ignored for this case.  The additional guard is necessary to
+ recognize complicated bypasses, e.g. when the consumer is only the address
+ of a @samp{store} insn (not the stored value).
+ 
+ @findex exclusion_set
+ @findex presence_set
+ @findex absence_set
+ @cindex VLIW
+ @cindex RISC
+ Usually the following three constructions are used to describe
+ @acronym{VLIW} processors (more precisely, to describe the placement of
+ small insns into @acronym{VLIW} insn slots), although they can be
+ used for @acronym{RISC} processors too.
+ 
+ @smallexample
+ (exclusion_set @var{unit-names} @var{unit-names})
+ (presence_set @var{unit-names} @var{unit-names})
+ (absence_set @var{unit-names} @var{unit-names})
+ @end smallexample
+ 
+ @var{unit-names} is a string giving names of functional units
+ separated by commas.
+ 
+ The first construction (@samp{exclusion_set}) means that each
+ functional unit in the first string can not be reserved simultaneously
+ with a unit whose name is in the second string and vice versa.  For
+ example, the construction is useful for describing processors
+ (e.g. some SPARC processors) with a fully pipelined floating point
+ functional unit which can execute simultaneously only single floating
+ point insns or only double floating point insns.
+ 
+ The second construction (@samp{presence_set}) means that each
+ functional unit in the first string can not be reserved unless at
+ least one of units whose names are in the second string is reserved.
+ This is an asymmetric relation.  For example, it is useful for
+ describing that @acronym{VLIW} @samp{slot1} is reserved after
+ @samp{slot0} reservation.
+ 
+ The third construction (@samp{absence_set}) means that each functional
+ unit in the first string can be reserved only if each unit whose name
+ is in the second string is not reserved.  This is an asymmetric
+ relation (actually @samp{exclusion_set} is analogous to this one but
+ it is symmetric).  For example, it is useful for describing that
+ @acronym{VLIW} @samp{slot0} can not be reserved after @samp{slot1} or
+ @samp{slot2} reservation.
+ 
+ @findex automata_option
+ @cindex deterministic finite state automaton
+ @cindex nondeterministic finite state automaton
+ @cindex finite state automaton minimization
+ You can control the generator of the pipeline hazard recognizer with
+ the following construction.
+ 
+ @smallexample
+ (automata_option @var{options})
+ @end smallexample
+ 
+ @var{options} is a string giving options which affect the generated
+ code.  Currently there are the following options:
+ 
+ @itemize @bullet
+ @item
+ @dfn{no-minimization} suppresses minimization of the automaton.  This
+ is only worth doing when we are going to query CPU functional unit
+ reservations in an automaton state.
+ 
+ @item
+ @dfn{w} generates a file describing the resulting automaton.  The
+ file can be used to verify the description.
+ 
+ @item
+ @dfn{ndfa} generates nondeterministic finite state automata.  This
+ affects the treatment of operator @samp{|} in the regular expressions.
+ The usual treatment of the operator is to try the first alternative
+ and, if the reservation is not possible, the second alternative.  The
+ nondeterministic treatment means trying all alternatives, some of
+ which may be rejected by reservations in the subsequent insns.  You
+ cannot query functional unit reservations in nondeterministic
+ automaton states.
+ @end itemize
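The difference between the usual and the @samp{ndfa} treatment of @samp{|} can be illustrated with a toy model.  This is a sketch under strong assumptions, not the generated recognizer: each insn is reduced to a tuple of alternative units for a single cycle, and the function names are invented for this note.

```python
from itertools import product

def issue_deterministic(insns):
    """Usual '|': each insn takes its first alternative that is still free."""
    reserved, issued = set(), 0
    for alternatives in insns:
        for unit in alternatives:
            if unit not in reserved:
                reserved.add(unit)
                issued += 1
                break
        else:
            break  # this insn cannot issue, so issue stops for the cycle
    return issued

def issue_nondeterministic(insns):
    """'ndfa': try every combination of alternatives, keep the best prefix."""
    best = 0
    for choice in product(*insns):
        reserved, issued = set(), 0
        for unit in choice:
            if unit in reserved:
                break  # this combination is rejected by a later insn
            reserved.add(unit)
            issued += 1
        best = max(best, issued)
    return best
```

For the insn list `[("a", "b"), ("a",)]` the deterministic treatment gives unit `a` to the first insn and then cannot issue the second, while the nondeterministic treatment finds the combination where the first insn takes `b`.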
+ 
+ As an example, consider a superscalar @acronym{RISC} machine which can
+ issue three insns (two integer insns and one floating point insn) per
+ cycle but can finish only two insns per cycle.  To describe this, we
+ define the following functional units.
+ 
+ @smallexample
+ (define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
+ (define_cpu_unit "port_0, port1")
+ @end smallexample
+ 
+ All simple integer insns can be executed in any integer pipeline and
+ their result is ready in two cycles.  The simple integer insns are
+ issued into the first pipeline unless it is reserved, otherwise they
+ are issued into the second pipeline.  Integer division and
+ multiplication insns can be executed only in the second integer
+ pipeline and their results are ready in 8 and 4 cycles respectively.
+ The integer division is not pipelined, i.e. the subsequent integer
+ division insn cannot be issued until the current division insn has
+ finished.  Floating point insns are fully pipelined and their results
+ are ready in 3 cycles.  There is also an additional one cycle delay
+ when integer insns use a result produced by floating point insns.  To
+ describe all of this we could specify
+ 
+ @smallexample
+ (define_cpu_unit "div")
+ 
+ (define_insn_reservation "simple" 2 (eq_attr "cpu" "int")
+                          "(i0_pipeline | i1_pipeline), (port_0 | port1)")
+ 
+ (define_insn_reservation "mult" 4 (eq_attr "cpu" "mult")
+                          "i1_pipeline, nothing*3, (port_0 | port1)")
+ 
+ (define_insn_reservation "div" 8 (eq_attr "cpu" "div")
+                          "i1_pipeline, div*7, (port_0 | port1)")
+ 
+ (define_insn_reservation "float" 3 (eq_attr "cpu" "float")
+                          "f_pipeline, nothing, (port_0 | port1)")
+ 
+ (define_bypass 4 "float" "simple,mult,div")
+ @end smallexample
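How the default latencies and the bypass combine can be shown with a small lookup.  This is a hedged illustration only: the table and the function name `result_latency` are made up for this note, and the bypass is read as applying to the three integer reservations (simple, mult, div).

```python
# Default latencies taken from the define_insn_reservation examples.
DEFAULT_LATENCY = {"simple": 2, "mult": 4, "div": 8, "float": 3}

# The bypass: a float result consumed by an integer insn costs 4 cycles
# (the 3 cycle float latency plus the additional one cycle delay).
BYPASS = {("float", consumer): 4 for consumer in ("simple", "mult", "div")}

def result_latency(producer, consumer):
    """Latency seen by `consumer` for a result produced by `producer`."""
    return BYPASS.get((producer, consumer), DEFAULT_LATENCY[producer])
```

A float insn feeding another float insn keeps the default latency of 3, since only the listed producer and consumer pairs are overridden.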
+ 
+ To simplify the description we could define the following reservation
+ 
+ @smallexample
+ (define_reservation "finish" "port_0 | port1")
+ @end smallexample
+ 
+ and use it in all @code{define_insn_reservation} constructions, as in
+ the following one
+ 
+ @smallexample
+ (define_insn_reservation "simple" 2 (eq_attr "cpu" "int")
+                          "(i0_pipeline | i1_pipeline), finish")
+ @end smallexample
+ 
+ 
+ @node Comparison of the two descriptions
+ @subsubsection Drawbacks of the old pipeline description
+ @cindex old pipeline description
+ @cindex automaton based pipeline description
+ @cindex processor functional units
+ @cindex interlock delays
+ @cindex instruction latency time
+ @cindex pipeline hazard recognizer
+ @cindex data bypass
+ 
+ The old instruction level parallelism description and the pipeline
+ hazard recognizer based on it have the following drawbacks in
+ comparison with the @acronym{DFA}-based ones:
+   
+ @itemize @bullet
+ @item
+ Each functional unit is assumed to be reserved at the start of
+ instruction execution.  This is a very inaccurate model for modern
+ processors.
+ 
+ @item
+ An inadequate description of instruction latency times.  The latency
+ time is bound to a functional unit reserved by an instruction, not
+ to the instruction itself.  In other words, the description is
+ oriented towards describing at most one unit reservation by each
+ instruction.  It also does not permit describing special bypasses
+ between instruction pairs.
+ 
+ @item
+ The implementation of the pipeline hazard recognizer interface
+ limits the number of functional units to the number of bits in an
+ integer on the host machine.
+ 
+ @item
+ The interface to the pipeline hazard recognizer is more complex than
+ the one to the automaton based pipeline hazard recognizer.
+ 
+ @item
+ An unnatural description, in which you write a unit and a condition
+ which selects instructions using the unit.  Writing all unit
+ reservations for an instruction (an instruction class) is more natural.
+ 
+ @item
+ The recognition of the interlock delays has a slow implementation.
+ The GCC scheduler maintains structures which describe the unit
+ reservations.  The more functional units a processor has, the slower
+ the pipeline hazard recognizer.  Such an implementation would become
+ even slower if we allowed functional units to be reserved at times
+ other than the instruction execution start.  The speed of the
+ automaton based pipeline hazard recognizer does not depend on
+ processor complexity.
+ @end itemize
  @end ifset
  
  @node Conditional Execution
Index: tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.2
diff -c -p -r1.2 tm.texi
*** tm.texi	2001/06/07 16:41:27	1.2
--- tm.texi	2001/06/14 17:20:21
*************** symbols must be explicitly imported from
*** 8343,8363 ****
  A C statement that adds to @var{CLOBBERS} @code{STRING_CST} trees for
  any hard regs the port wishes to automatically clobber for all asms.
  
  @findex ISSUE_RATE
  @item ISSUE_RATE
  A C expression that returns how many instructions can be issued at the
  same time if the machine is a superscalar machine.
  
  @findex MD_SCHED_INIT
  @item MD_SCHED_INIT (@var{file}, @var{verbose}, @var{max_ready})
! A C statement which is executed by the scheduler at the
! beginning of each block of instructions that are to be scheduled.
! @var{file} is either a null pointer, or a stdio stream to write any
! debug output to.  @var{verbose} is the verbose level provided by
  @samp{-fsched-verbose-}@var{n}.  @var{max_ready} is the maximum number
  of insns in the current scheduling region that can be live at the same
  time.  This can be used to allocate scratch space if it is needed.
  
  @findex MD_SCHED_FINISH
  @item MD_SCHED_FINISH (@var{file}, @var{verbose})
  A C statement which is executed by the scheduler at the end of each block
--- 8343,8383 ----
  A C statement that adds to @var{CLOBBERS} @code{STRING_CST} trees for
  any hard regs the port wishes to automatically clobber for all asms.
  
+ @findex USE_AUTOMATON_PIPELINE_INTERFACE
+ @item USE_AUTOMATON_PIPELINE_INTERFACE
+ @cindex automaton based pipeline description
+ @cindex old pipeline description
+ A C expression that is used only when the machine description file
+ contains both the old pipeline description and the automaton based one
+ (@pxref{Processor pipeline description,,Specifying processor pipeline
+ description}).  If the expression returns nonzero, the automaton based
+ pipeline description is used for insn scheduling, otherwise the old
+ pipeline description is used.  The default value is one.  In other
+ words, by default the automaton based pipeline description is always
+ used.
+ 
  @findex ISSUE_RATE
  @item ISSUE_RATE
  A C expression that returns how many instructions can be issued at the
  same time if the machine is a superscalar machine.
  
+ This is used only for the old pipeline description.
+ 
  @findex MD_SCHED_INIT
+ @findex MD_AUTOMATON_SCHED_INIT
  @item MD_SCHED_INIT (@var{file}, @var{verbose}, @var{max_ready})
! @itemx MD_AUTOMATON_SCHED_INIT (@var{file}, @var{verbose})
! C statements which are executed by the scheduler at the beginning of
! each block of instructions that are to be scheduled.  @var{file} is
! either a null pointer, or a stdio stream to write any debug output to.
! @var{verbose} is the verbose level provided by
  @samp{-fsched-verbose-}@var{n}.  @var{max_ready} is the maximum number
  of insns in the current scheduling region that can be live at the same
  time.  This can be used to allocate scratch space if it is needed.
  
+ The first macro is used only for the old pipeline description.  The
+ second one is used only for the automaton based pipeline description.
+ 
  @findex MD_SCHED_FINISH
  @item MD_SCHED_FINISH (@var{file}, @var{verbose})
  A C statement which is executed by the scheduler at the end of each block
*************** debug output to.  @var{verbose} is the v
*** 8368,8387 ****
  @samp{-fsched-verbose-}@var{n}.
  
  @findex MD_SCHED_REORDER
  @item MD_SCHED_REORDER (@var{file}, @var{verbose}, @var{ready}, @var{n_ready}, @var{clock}, @var{can_issue_more})
! A C statement which is executed by the scheduler after it
! has scheduled the ready list to allow the machine description to reorder
  it (for example to combine two small instructions together on
! @samp{VLIW} machines).  @var{file} is either a null pointer, or a stdio
! stream to write any debug output to.  @var{verbose} is the verbose level
! provided by @samp{-fsched-verbose-}@var{n}.  @var{ready} is a pointer to
! the ready list of instructions that are ready to be scheduled.
! @var{n_ready} is the number of elements in the ready list.  The
! scheduler reads the ready list in reverse order, starting with
  @var{ready}[@var{n_ready}-1] and going to @var{ready}[0].  @var{clock}
  is the timer tick of the scheduler.  @var{can_issue_more} is an output
! parameter that is set to the number of insns that can issue this clock;
! normally this is just @code{issue_rate}.  See also @samp{MD_SCHED_REORDER2}.
  
  @findex MD_SCHED_REORDER2
  @item MD_SCHED_REORDER2 (@var{file}, @var{verbose}, @var{ready}, @var{n_ready}, @var{clock}, @var{can_issue_more})
--- 8388,8415 ----
  @samp{-fsched-verbose-}@var{n}.
  
  @findex MD_SCHED_REORDER
+ @findex MD_AUTOMATON_SCHED_REORDER
+ @cindex RISC
+ @cindex VLIW
  @item MD_SCHED_REORDER (@var{file}, @var{verbose}, @var{ready}, @var{n_ready}, @var{clock}, @var{can_issue_more})
! @itemx MD_AUTOMATON_SCHED_REORDER (@var{file}, @var{verbose}, @var{ready}, @var{n_ready}, @var{clock})
! C statements which are executed by the scheduler after it has
! scheduled the ready list to allow the machine description to reorder
  it (for example to combine two small instructions together on
! @acronym{VLIW} machines).  @var{file} is either a null pointer, or a
! stdio stream to write any debug output to.  @var{verbose} is the
! verbose level provided by @samp{-fsched-verbose-}@var{n}.  @var{ready}
! is a pointer to the ready list of instructions that are ready to be
! scheduled.  @var{n_ready} is the number of elements in the ready list.
! The scheduler reads the ready list in reverse order, starting with
  @var{ready}[@var{n_ready}-1] and going to @var{ready}[0].  @var{clock}
  is the timer tick of the scheduler.  @var{can_issue_more} is an output
! parameter that is set to the number of insns that can issue this
! clock; normally this is just @code{issue_rate}.  See also
! @samp{MD_SCHED_REORDER2}.
! 
! The first macro is used only for the old pipeline description.  The
! second one is used only for the automaton based pipeline description.
  
  @findex MD_SCHED_REORDER2
  @item MD_SCHED_REORDER2 (@var{file}, @var{verbose}, @var{ready}, @var{n_ready}, @var{clock}, @var{can_issue_more})
*************** Defining this macro can be useful if the
*** 8394,8399 ****
--- 8422,8429 ----
  scheduling one insn causes other insns to become ready in the same cycle,
  these other insns can then be taken into account properly.
  
+ This macro is used only for the old pipeline description.
+ 
  @findex MD_SCHED_VARIABLE_ISSUE
  @item MD_SCHED_VARIABLE_ISSUE (@var{file}, @var{verbose}, @var{insn}, @var{more})
  A C statement which is executed by the scheduler after it
*************** is the verbose level provided by @samp{-
*** 8404,8409 ****
--- 8434,8525 ----
  number of instructions that can be issued in the current cycle.  The
  @samp{MD_SCHED_VARIABLE_ISSUE} macro is responsible for updating the
  value of @var{more} (typically by @var{more}--).
+ 
+ This macro is used only for the old pipeline description.
+ 
+ @findex DFA_SCHEDULER_PRE_CYCLE_INSN
+ @findex DFA_SCHEDULER_POST_CYCLE_INSN
+ @item DFA_SCHEDULER_PRE_CYCLE_INSN
+ @itemx DFA_SCHEDULER_POST_CYCLE_INSN
+ C expressions which return an RTL insn.  The automaton state used in
+ the pipeline hazard recognizer is changed as if the insn were
+ scheduled when the new simulated processor cycle starts and finishes,
+ respectively.  Usage of the macros may simplify the automaton
+ pipeline description for some @acronym{VLIW} processors.  If the
+ macros are defined, they are used only for the automaton based
+ pipeline description.
+ 
+ @findex FIRST_CYCLE_MULTIPASS_SCHEDULING
+ @item FIRST_CYCLE_MULTIPASS_SCHEDULING
+ @cindex multi-pass scheduling
+ This macro enables a better choice of an insn from the ready insn
+ queue for the @acronym{DFA}-based insn scheduler.  Usually the
+ scheduler chooses the first insn from the queue.  If
+ @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING} is not zero, additional
+ scheduler code tries all permutations of
+ @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING_LOOKAHEAD} subsequent ready
+ insns to choose an insn whose issue will result in the maximal number
+ of insns issued on the same cycle.  For a @acronym{VLIW} processor,
+ the code could actually solve the problem of packing simple insns
+ into the @acronym{VLIW} insn, provided that the rules of
+ @acronym{VLIW} packing are described in the automaton.
+ 
+ This code could also be used for superscalar @acronym{RISC}
+ processors.  Let us consider a superscalar @acronym{RISC} processor
+ with 3 pipelines.  Some insns can be executed in pipelines @var{A} or
+ @var{B}, some insns can be executed only in pipelines @var{B} or
+ @var{C}, and one insn can be executed only in pipeline @var{B}.  The
+ processor may issue the 1st insn into @var{A} and the 2nd one into
+ @var{B}.  In this case, the 3rd insn will wait until the next cycle
+ for @var{B} to become free.  If the scheduler issues the 3rd insn
+ first, the processor could issue all 3 insns per cycle.
+ 
+ Actually this code demonstrates the advantages of the automaton based
+ pipeline hazard recognizer.  We can quickly and easily try many insn
+ schedules to choose the best one.
+ 
+ The default value of the macro is zero.
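The three-pipeline example for this macro can be reproduced with a brute-force sketch.  This is a toy model for illustration, not the scheduler code in the patch: the names `issued_in_cycle` and `best_order`, the insn labels `i1`, `i2`, `i3`, and the rule that each insn takes its first free pipeline are all assumptions.

```python
from itertools import permutations

def issued_in_cycle(order, allowed):
    """Issue insns in `order`; each takes its first free pipeline.

    Issue stops at the first insn that finds no free pipeline.
    """
    busy = set()
    for insn in order:
        free = [pipe for pipe in allowed[insn] if pipe not in busy]
        if not free:
            break
        busy.add(free[0])
    return len(busy)

def best_order(ready, allowed, lookahead):
    """Try all permutations of the first `lookahead` ready insns and
    return one that issues the most insns in the current cycle."""
    window = ready[:lookahead]
    return max(permutations(window),
               key=lambda order: issued_in_cycle(order, allowed))
```

With `allowed = {"i1": "AB", "i2": "BC", "i3": "B"}`, the original order issues two insns in the cycle, while an order in which `i3` claims pipeline B before `i2` issues all three.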
+ 
+ @findex FIRST_CYCLE_MULTIPASS_SCHEDULING_LOOKAHEAD
+ @item FIRST_CYCLE_MULTIPASS_SCHEDULING_LOOKAHEAD
+ See the description of @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING}.  The
+ default value of the macro is zero, which means no multi-pass
+ scheduling.
+ 
+ @findex INIT_SCHEDULER_BUBBLES
+ @item INIT_SCHEDULER_BUBBLES ()
+ The @acronym{DFA}-based scheduler can take the insertion of nop
+ operations into account for better insn scheduling.  This can be done
+ only if multi-pass insn scheduling works (see macro
+ @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING}).
+ 
+ Let us consider a @acronym{VLIW} processor insn with 3 slots.  Each
+ insn can be placed only in one of the three slots.  We have 3 ready
+ insns @var{A}, @var{B}, and @var{C}.  @var{A} and @var{C} can be
+ placed only in the 1st slot, @var{B} can be placed only in the 3rd
+ slot.  We described the automaton so that it does not permit empty
+ slot gaps between insns (usually such a description is simpler).
+ Without this code the scheduler would place the insns in 3 separate
+ @acronym{VLIW} insns.  If the scheduler places a nop insn into the 2nd
+ slot, it could place the 3 insns into 2 @acronym{VLIW} insns.  The
+ nop insn is defined by macro @samp{SCHEDULER_BUBBLE}.  If macro
+ @samp{INIT_SCHEDULER_BUBBLES} is defined it can be used to initialize
+ or create the nop insns.
+ 
+ You should remember that the scheduler does not insert the nop insns.
+ Doing so would be unwise because of the optimizations that follow.
+ The scheduler only considers such a possibility to improve the
+ resulting schedule.  The nop insns should be inserted later, e.g. in
+ the final phase.
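The slot example can be checked with a small packing model.  This is only a sketch under assumptions made for this note: the function `pack`, the greedy in-order packing, and the reading of "no empty slot gaps" as "occupied slots of one VLIW insn must be contiguous" are not taken from the patch.

```python
def pack(insns, slot_of, allow_nops=False):
    """Greedily pack insns, in order, into VLIW insns (dicts slot -> name).

    Occupied slots of one VLIW insn must be contiguous.  With
    allow_nops, a gap may be filled by a "nop" bubble instead of
    starting a new VLIW insn.
    """
    words = [{}]
    for insn in insns:
        slot = slot_of[insn]
        word = words[-1]
        occupied = set(word) | {slot}
        gaps = [s for s in range(min(occupied), max(occupied) + 1)
                if s not in occupied]
        if slot in word or (gaps and not allow_nops):
            words.append({slot: insn})  # start a new VLIW insn
            continue
        for s in gaps:
            word[s] = "nop"  # bubble the scheduler only considers
        word[slot] = insn
    return words
```

For `slot_of = {"A": 1, "B": 3, "C": 1}` the model needs three VLIW insns without bubbles and two when a nop may fill the 2nd slot, matching the counts in the text.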
+ 
+ @findex SCHEDULER_BUBBLE
+ @item SCHEDULER_BUBBLE (@var{index})
+ If this macro and macro @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING} are
+ defined, the @acronym{DFA}-based scheduler can take the insertion of
+ nop operations into account for better insn scheduling (see also the
+ description of macro @samp{INIT_SCHEDULER_BUBBLES}).  This macro
+ returns a nop insn with the given @var{index}.  The indexes start with
+ zero.  The macro should return @code{NULL} if there are no more nop
+ insns with indexes greater than the given index.
  
  @findex MAX_INTEGER_COMPUTATION_MODE
  @item MAX_INTEGER_COMPUTATION_MODE
Index: contrib.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/contrib.texi,v
retrieving revision 1.5
diff -c -p -r1.5 contrib.texi
*** contrib.texi	2001/06/13 15:15:24	1.5
--- contrib.texi	2001/06/14 17:20:21
*************** Andrew MacLeod for his ongoing work in b
*** 313,321 ****
  various code generation improvements, work on the global optimizer, etc.
  
  @item
! Vladimir Makarov for hacking some ugly i960 problems, PowerPC
! hacking improvements to compile-time performance and overall knowledge
! and direction in the area of instruction scheduling.
  
  @item
  Bob Manson for his behind the scenes work on dejagnu.
--- 313,322 ----
  various code generation improvements, work on the global optimizer, etc.
  
  @item
! Vladimir Makarov for hacking some ugly i960 problems, PowerPC hacking
! improvements to compile-time performance, overall knowledge and
! direction in the area of instruction scheduling, and design and
! implementation of the automaton based instruction scheduler.
  
  @item
  Bob Manson for his behind the scenes work on dejagnu.
Index: gcc.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/gcc.texi,v
retrieving revision 1.8
diff -c -p -r1.8 gcc.texi
*** gcc.texi	2001/06/12 22:40:00	1.8
--- gcc.texi	2001/06/14 17:20:22
*************** Several passes use instruction attribute
*** 3611,3618 ****
  attributes defined for a particular machine is in file
  @file{insn-attr.h}, which is generated from the machine description by
  the program @file{genattr}.  The file @file{insn-attrtab.c} contains
! subroutines to obtain the attribute values for insns.  It is generated
! from the machine description by the program @file{genattrtab}.@refill
  @end itemize
  @end ifset
  
--- 3611,3620 ----
  attributes defined for a particular machine is in file
  @file{insn-attr.h}, which is generated from the machine description by
  the program @file{genattr}.  The file @file{insn-attrtab.c} contains
! subroutines to obtain the attribute values for insns and information
! about processor pipeline characteristics for the instruction scheduler.
! It is generated from the machine description by the program
! @file{genattrtab}.@refill
  @end itemize
  @end ifset
Follow-Ups:
- Re: 2nd try for patch for automaton based pipeline hazard recognizer (part #1)
  - From: Neil Booth