This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: RTL definition

From: Abhijat Vichare <amvichare at iitb dot ac dot in>
To: Fran Baena <franbaena at gmail dot com>
Cc: Ian Lance Taylor <iant at google dot com>, Jim Wilson <wilson at tuliptree dot org>, gcc at gcc dot gnu dot org
Date: Wed, 12 Mar 2008 13:37:14 +0530
Subject: Re: RTL definition
References: <m3y78p8iwu.fsf@google.com>
Reply-to: amvichare at iitb dot ac dot in

Hello all,

Maybe I can add a few comments here. This is the way I see the RTL
within GCC. Details are at:

http://www.cfdvs.iitb.ac.in/~amv/gcc-int-docs/

and in particular, I'd like to point to

http://www.cfdvs.iitb.ac.in/~amv/gcc-int-docs/html/gcc-conceptual-structure.html.

(BTW, community feedback requested for the above documents.)

On Tue, 2008-03-11 at 06:55 -0700, Ian Lance Taylor wrote: 
> "Fran Baena" <franbaena@gmail.com> writes:
> 
> >>  By the way, RTL is not really machine-independent.  The data
> >>  structures are machine independent.  But the contents are not.  You
> >>  can not, even in principle, take the RTL generated for one processor
> >>  and compile it on another processor.
> >
> > I thought that RTL represented something close to the target machine,
> > but not machine-dependent. I firstly thought that the output of the
> > middle-end was an RTL machine-independent representation, to which is
> > applied a few low-optimization machine-independent passes, and after
> > that is translated to a RTL machine-dependent to be applied other
> > optimization passes.
> 
> RTL is created using named patterns in the MD file, so even the
> creation process is machine-dependent.  This is most obvious in the
> use of unspec, but it is true in general.

Adding to the above, I think it's best to see the RTL constructs as a
specification language. The semantics of the target instructions are
expressed in this language. RTL expresses machine dependent issues in a
machine independent manner. This process of capturing machine issues
occurs when the machine descriptions are written. Machine descriptions
mainly contain the semantics of each target instruction (the
"define_insn" construct) and information (pattern names) to be used to
associate them with Gimple objects. During the compilation of a program,
these target specific descriptions are used directly by the Gimple->RTL
translation phase, and hence the output is an RTL representation that has
captured target semantics. The Gimple->RTL translation does not emit a
machine independent RTL representation. Quite the opposite.
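
To make the point concrete, here is a toy model in Python (not GCC code).
Both "targets" and their tables are invented for illustration; only the
pattern name "addsi3" is a real standard name. It sketches how looking up
the same named pattern on two targets can yield different RTL for the
same source-level operation:

```python
# Toy model: two hypothetical targets, each mapping a pattern name to a
# function that builds the RTL text for that operation.
TARGET_A = {
    "addsi3": lambda d, a, b: (
        f"(set (reg:SI {d}) (plus:SI (reg:SI {a}) (reg:SI {b})))"
    ),
}
TARGET_B = {
    # This imaginary target also clobbers a flags register on every add.
    "addsi3": lambda d, a, b: (
        f"(parallel [(set (reg:SI {d}) (plus:SI (reg:SI {a}) (reg:SI {b})))"
        f" (clobber (reg:CC 17))])"
    ),
}


def expand(target, pattern_name, *operands):
    """Look up the named pattern and instantiate it with the operands."""
    try:
        gen = target[pattern_name]
    except KeyError:
        raise LookupError(f"target has no pattern named {pattern_name!r}")
    return gen(*operands)


rtl_a = expand(TARGET_A, "addsi3", 60, 58, 59)
rtl_b = expand(TARGET_B, "addsi3", 60, 58, 59)
print(rtl_a)
print(rtl_b)
```

The two printed forms differ even though the request (add two SImode
registers) is the same, which is the sense in which the generated RTL is
already target specific.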

> > I read the rtl.def and rtl.h files, are very interesting, and i better
> > understand the whole process. But reading the output files by debuggin
> > options (-fdump-rtl-all) i have seen instructions like this:
> >
> > (insn 8 6 10 1 (set (mem/c/i:SI (plus:DI (reg/f:DI 54 virtual-stack-vars)
> >                 (const_int -8 [0xfffffffffffffff8])) [0 a+0 S4 A64])
> >         (const_int 1 [0x1])) -1 (nil)
> >     (nil))
> >
> > Among the multiple questions that appears i have a particular one,
> > what does "8 6 10 1" represents? Is it the "print format" defined in
> > rtl.def?
> 
> 8 is in the INSN uid.  6 is the previous INSN uid.  10 is the next
> insn UID.  1 is the number of the basic block holding the insn.  In
> general RTL is printed according to the format in rtl.def.  There are
> a couple of exceptions; one of those exceptions is that field 4 of an
> insn, INSN_LOCATOR, is only printed if it is present.  See line 391 of
> print-rtl.c.

It helps to see the above RTL as the written representation of an
internal linear linked list of RTL objects. The "(set (mem...))" is an
_instance_ of the specification in the <target>.md file. It describes
the semantics of an instruction available on the target. The entire
linked list may be thought of as a machine independent representation of
machine specific instructions, i.e. the ASM syntax has been stripped
off. The RTL dump given above describes this chaining structure: the
position of the current instruction, the UIDs of the previous and next
instructions, and so on (as described in the reply above).
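
As a small illustration of that chaining information, the following
Python sketch pulls the four leading numbers out of a dumped insn. It
assumes the dump layout quoted above (UID, previous UID, next UID, basic
block number); other GCC versions may print the header differently, and
the function name is my own:

```python
import re

# Matches the leading "(insn UID PREV NEXT BB" of a dumped insn.
HEADER = re.compile(
    r"\(\s*(insn|jump_insn|call_insn)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def insn_header(dump_text):
    """Return the insn code and its chaining fields as a dict."""
    m = HEADER.match(dump_text.lstrip())
    if m is None:
        raise ValueError("not an insn dump in the assumed layout")
    code, uid, prev_uid, next_uid, bb = m.groups()
    return {
        "code": code,
        "uid": int(uid),
        "prev": int(prev_uid),
        "next": int(next_uid),
        "bb": int(bb),
    }


dump = """(insn 8 6 10 1 (set (mem/c/i:SI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0 a+0 S4 A64])
        (const_int 1 [0x1])) -1 (nil)
    (nil))"""

header = insn_header(dump)
print(header)
# {'code': 'insn', 'uid': 8, 'prev': 6, 'next': 10, 'bb': 1}
```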

General remarks:

I prefer to view the objects in rtl.def as three disjoint subsets: one
set of objects appears exclusively in machine descriptions, another set
appears exclusively in dumps during a compilation run, and the third set
appears in both. The first set contains constructs like define_insn and
match_operand, the second constructs like insn and jump_insn, and the
third constructs like set and plus. Objects from the first and the third
sets are used to write machine descriptions. Objects from the second and
the third sets are used to dump the RTL representation of a compilation
run.
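
The partition can be written down directly. This Python sketch uses only
the six example constructs named above (the real rtl.def has many more),
and the variable and function names are mine:

```python
# Example members of the three subsets; not an exhaustive listing.
MD_ONLY = {"define_insn", "match_operand"}   # machine descriptions only
DUMP_ONLY = {"insn", "jump_insn"}            # compilation dumps only
SHARED = {"set", "plus"}                     # appear in both

# The subsets are pairwise disjoint.
assert MD_ONLY.isdisjoint(DUMP_ONLY)
assert MD_ONLY.isdisjoint(SHARED)
assert DUMP_ONLY.isdisjoint(SHARED)

# Vocabulary available when writing a machine description,
# and vocabulary that can show up in a dump.
MD_VOCAB = MD_ONLY | SHARED
DUMP_VOCAB = DUMP_ONLY | SHARED


def where_used(name):
    """Classify a construct name against the example subsets."""
    in_md = name in MD_VOCAB
    in_dump = name in DUMP_VOCAB
    if in_md and in_dump:
        return "both"
    if in_md:
        return "machine description"
    if in_dump:
        return "dump"
    return "unknown"


for name in ("define_insn", "insn", "set"):
    print(name, "->", where_used(name))
```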

It thus seems best to think of the RTL (as the gcc community usually
refers to it) as two different languages. One language, which we call
MD-RTL, is used to capture target instruction semantics while developing
a machine description <target>.md file, and is made up of constructs
from the first and the third sets from rtl.def. The other, which we call
IR-RTL, is used to _express_ a given compilation in a target specific
manner without the target (asm) syntax, and is made up of constructs
from the second and the third sets from rtl.def. The written form of
both is Lisp-like. MD-RTL has only a written representation, while
IR-RTL is dumped in written form only when so requested during a
particular compilation run. IR-RTL normally lives in an internal form,
a linear linked list of objects of type "rtx" (a pointer to "struct
rtx_def", declared in rtl.h).

Because we can look at MD-RTL and IR-RTL as specification languages, it
is possible to think of their grammar. About two years ago I wrote one
for the machine description system of GCC 3.3.3 (not upgraded to 4.x,
though :( ). The bison code was mostly generated from rtl.def (and
the rules therein), and was useful for checking the syntactic
correctness of machine descriptions. The grammar would also be useful
for writing a machine description mode for emacs (partially done; please
see
http://www.cse.iitb.ac.in/~uday/gcc-workshop/downloads/install-md-mode.sh ).
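
To give a flavour of what a syntax check involves, here is a much
simpler Python sketch. It is not the bison grammar mentioned above. It
only reads the Lisp-like written form and checks that parentheses and
brackets balance; it knows nothing about the per-construct rules in
rtl.def. The define_insn is for an imaginary target:

```python
import re

# One token: a ';' comment, a quoted string, a bracket, or a bare atom.
TOKEN = re.compile(
    r'''\s*(?:(;[^\n]*)|("(?:\\.|[^"\\])*")|([()\[\]])|([^\s()\[\]";]+))'''
)
CLOSER = {"(": ")", "[": "]"}


def tokenize(text):
    """Split the written form into tokens, dropping comments."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at offset {pos}")
        pos = m.end()
        comment, string, bracket, atom = m.groups()
        if comment is None:
            tokens.append(string or bracket or atom)
    return tokens


def parse(tokens):
    """Build nested lists; raise SyntaxError on unbalanced brackets."""
    stack, expected = [[]], []
    for tok in tokens:
        if tok in CLOSER:
            stack.append([])
            expected.append(CLOSER[tok])
        elif tok in (")", "]"):
            if not expected or expected.pop() != tok:
                raise SyntaxError(f"unexpected {tok!r}")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if expected:
        raise SyntaxError(f"missing {expected[-1]!r}")
    return stack[0]


md = '''
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %0, %1, %2")
'''

forms = parse(tokenize(md))
print(forms[0][0], forms[0][1])
```

Lists and vectors both end up as Python lists here, so the sketch cannot
tell "(...)" from "[...]" after parsing. A real checker would keep that
distinction and validate each construct against its format in rtl.def.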

HTH,

- amv

--
 +---------------------------------------------------------------------+
 | Abhijat M. Vichare       | Email: amvichare@iitb.ac.in              |
 |--------------------------|                                          |
 | CFDVS, IIT Powai,        | WWW:   http://cfdvs.iitb.ac.in/~amv      |
 | Mumbai 400076, INDIA.    |------------------------------------------|
 |                          | The truest perception of ignorance is at |
 |--------------------------| the summit of knowledge.                 |
 | Phone(Off):(22) 2576 8701|                      Keep climbing ...   |
 +---------------------------------------------------------------------+

References:
- Re: RTL definition
  - From: Ian Lance Taylor
