= GIMPLE FE: A Gimple Front End = The Gimple FE project, that started as part of Google Summer of code 2010, is aimed at converting the GIMPLE IR to a fixed format textual representation of GIMPLE tuples and to develop a front end which can recognize this textual GIMPLE and parse it. The essence of the project is to enable a user to write the Gimple IR as text (in the form of code) and be able to compile it. == Code Repository == The project is being implemented in the {{{gimple-front-end}}} SVN branch ([[http://gcc.gnu.org/viewcvs/branches/gimple-front-end/|browse]]). To get the latest version: {{{$ svn co svn://gcc.gnu.org/svn/gcc/branches/gimple-front-end}}} The branch is maintained by Diego Novillo < dnovillo@google.com >. The usual rules for contributing to branches apply to this branch: 1. Messages and patches to the lists should have their subject prefixed with {{{[gimplefe]}}}. 1. ChangeLog entries should be written to {{{ChangeLog.gimplefe}}}. == Motivation == GCC internally uses GIMPLE. The compile sequence converts text (source code) into GIMPLE and the Gimple is then lowered to RTL. Internally, GIMPLE is structured in the forms of tuples. A C-like representation of GIMPLE can be dumped using the compiler flag -fdump-tree-gimple and in the tuple form using the flag -fdump-tree-gimple-raw. During development or debugging, GIMPLE is often dumped this way. Most of the GCC development is done in the middle end or the back end of GCC. The compilation sequence is then reexecuted if any change to the source code is made. This repeats the execution of those steps in the compilation sequence - the language front ends - which are not directly related to the middle end. Most of the changes during development are also changes directly related to GIMPLE. Thus, an alternative to remove the redundancy of reexecution is to have the ability to start and stop compilation from an arbitrary point in the compilation process. Part of that is to start the compilation from the middle end itself. Thus, the Gimple FE is aimed to provide a front end that will compile GIMPLE. == Relevance == Such a feature will be a valuable addition to the compiler for the following reasons: * It allows to treat GIMPLE as a type of source code and to pass it to the next phase in compilation (optimizers). A user can write GIMPLE directly into a file. Thus, it can be helpful for testing and maintainability. Especially in relation to the LTO FE, it can ease the process of Unit testing. Since the bytestream will be converted into textual GIMPLE the developers will have a much simpler interface to test with textual GIMPLE rather than deal with the cumbersome bytestream. * By the added interface, the development process of GCC internals can be faster. * The development of the GIMPLE FE will strengthen the cause of achieving modularization in GCC. * In the long run, it might be suitable for the developers to apply their optimizations a lot better. It might be possible that because of a simple way of compiling GIMPLE directly, the developers come up with newer optimizations altogether in the future. == Gimple Grammar == The syntax of GIMPLE tuples is kept similar to the format given in gcc/gimple.def. The tuple format is extended to include types and declarations. The structure and the syntax of types and declarations can be represented as follows: === A) Global Declaration Section === ==== A.1) Type Section ==== ===== a ) Explicit Types: ===== . This section consists of the descriptions of the types that are defined in the source files. Since they are clearly defined in some of the source files they are categorized as ''Explicit types ''. For Ex: '''struct ''', '''union '''and '''enum '''can be laid out in the form of tuples here. The format for these tuples are: * Representation of a '''struct ''' . {{{ > }}} * Representation of a '''union ''' . {{{ > }}} Here each field is a Field_Decl or a Var_Decl and can itself be laid out as explained later. * Representation of an '''enum''' . {{{ > }}} Here each field is actually expanded as, field = < name,value >. ===== b ) Implicit types ===== . Although, these do not find a direct entry in the type section they are a part of Decl (declaration) tuples and are nested inside them. * Representation of the basic types. . {{{ > }}} <
> {{{ > }}} <
> {{{ > }}} <
> {{{ }}} <
> Note: For the basic types the size is important because it serves to distinguish types like {{{ int }}} and {{{ char }}} etc. <
> . {{{ char }}} is equivalent to {{{ > }}} while {{{ int }}} is equivalent to {{{ > }}} * Representation of aggregate types. . These are types which can be given an entry in the type section but it is better to have them nested in the Decl tuples. {{{ > }}} <
> . The min_val and max_val can give the minimum and maximum value of the index of the array. This is a good indication of the size of the array (Can be obtained from the TYPE_DOMAIN macro). Here, the type is the base type of the array. <
> For Ex. * There is an array of integers, say, {{{ int A[20]; }}} then we have in gimple, {{{ >>> }}} * There is an array of records, say, {{{ struct some_struct A[20] }}} then we have in gimple, {{{ >>> }}} Note: In this case, there is no need to expand the complete tuple of Record. Its name will be the right indicator. * Representation of pointer types. . These are again not separate independent tuples but are instead nested in the Decl tuples. . {{{ > }}} <
> Here, the type of the pointer is the type to which it points. <
> For Ex. * There is a character pointer, say, {{{ char *P; }}} then we have in gimple, {{{ >>> }}} * There is a pointer to a record, say, {{{ struct some_struct *P; }}} then we have in gimple, {{{ >>> }}} ===== c) Ignored Types ===== . Finally, there are a few types which have been ignored for now completely. These are, * Fixed_point_type * Reference_type * Complex_type * Vector_type * Function_type * Method_type * lang_type . Some of these will be easy to handle since they are analogous to the earlier types. ==== A.2) Decl Section ==== . This section deals with the declared variables and functions. * Representation of variable declarations {{{ >>> }}} <
> The categories of the data type and their corresponding syntax for gimple statements are as follows: * Integer or character . {{{ , readonly=>>> }}} For instance, {{{ int a; >>> unsigned long b; >>> const char c; >>> }}} * Single and double precision floating point . {{{ , readonly=>>>> }}} . For instance, {{{ float a; >>> double b; >>> }}} * Structure and Union . {{{ >>> >>> }}} For instance, {{{ struct some_struct S; >>> union some_union U; >>> }}} * Array . {{{ , size>>> }}} For instance, {{{ int arr[10]; >, 10>>> }}} * Pointer . {{{ >>> }}} For instance, {{{ float *p; >>>> }}} * Representation of function declarations. . {{{ > }}} Here, the parms are Parm_decls while result is a Result_decl having the format respectively as : <
> {{{ > }}} <
> {{{ > }}} <
> For ex: * Suppose the Function declaration reads, {{{ float division (int dividend, int divisor); }}} then we have in gimple, {{{ >>>,>>>,>>>>> }}} * Some implicit decls * Parm_decl (handled in Function_decl) * Result_decl (handled in Function_decl) * Some decls which have not been handled yet. * Const_decl * Label_decl * Type_decl === B) Function body === This section itself is divided as: * Local declaration section * Multiple basic blocks in a function body. The local declaration section will be same as the global declaration section. The Gimple statements in the multiple basic blocks can be laid out in the tuple format as per the tuple descriptions given in gimple.def . Note: This is just a preliminary grammar. The missing bits will be added as we go on. == Current Implementation State == The entire code can be found in gcc/gimple directory. The file parser.c in this directory consists of the entire functional logic of the parser. == TODO == 1. Currently the Gimple FE reads the tokens and interprets them. The types of tokens are obtained from the CPP_ttypes table. It should be possible now to implement a seperate gimple_ttypes table that the preprocessor can directly use for Gimple FE. This will make the current code in gcc/gimple/parser.c more compact and clean. 2. The Gimple FE relies on string comparisions a lot. Moreover, as we add new components to the front end the comparisions will increase. There needs to be something to do this in a clean way. 3. Extend the recognizer with error handling and error recovery. == Contact == Sandeep Soni ( sandeep@gcc.gnu.org ) Diego Novillo ( dnovillo@google.com )