[PATCH 0/4] RFC: RTL frontend

Mon May 16 22:42:00 GMT 2016

On 05/04/2016 02:49 PM, David Malcolm wrote:
>
> * The existing RTL code is structured around a single function being
>   optimized, so, as a simplification, the RTL frontend can only handle
>   one function per input file.  Also, the dump format currently uses
>   comments to separate functions::
>
>     ;; Function test_1 (test_1, funcdef_no=0, decl_uid=1758, cgraph_uid=0, symbol_order=0)
ISTM we can fix this by adding more true structure to the RTL dump. 
IMHO we have the freedom to extend the RTL dumper to make it easier to 
read the RTL dumps in for this kind of work.
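
To illustrate why the comment-only header is a weak separator, here is
a minimal sketch of what a reader has to do today to split a dump into
per-function sections.  The helper name split_functions is hypothetical
and not part of any existing tool; it assumes the header keeps the
";; Function NAME (" shape shown in the quoted dump.

```python
import re

# Matches the header comment the dumper currently emits before each
# function, e.g. ";; Function test_1 (test_1, funcdef_no=0, ...)".
FUNCTION_HEADER = re.compile(r"^;; Function (\S+) \(")


def split_functions(dump_text):
    """Return a dict mapping function name -> lines of its section.

    Hypothetical helper: everything between one header comment and the
    next is attributed to the earlier function, including pass-specific
    output that is not RTL at all.
    """
    sections = {}
    current = None
    for line in dump_text.splitlines():
        match = FUNCTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections
```

Anything that merely looks like the header comment would start a new
section, which is the kind of fragility real structure in the dump
would avoid.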

>
>     ... various pass-specific things, sometimes expressed as comments,
>     sometimes not
Which seems like a bug to me.

>
>     ;;
>     ;; Full RTL generated for this function:
>     ;;
>     (note 1 0 6 NOTE_INSN_DELETED)
>     ;; etc, insns for function "test_1" go here
>     (insn 27 26 0 6 (use (reg/i:SI 0 ax)) ../../src/gcc/testsuite/rtl.dg/test.c:7 -1
>          (nil))
>
>     ;; Function test_2 (test_2, funcdef_no=1, decl_uid=1765, cgraph_uid=1, symbol_order=1)
>     ... various pass-specific things, sometimes expressed as comments,
>     sometimes not
>     ;;
>     ;; Full RTL generated for this function:
>     ;;
>     (note 1 0 5 NOTE_INSN_DELETED)
>     ;; etc, insns for function "test_2" go here
>     (insn 59 58 0 8 (use (reg/i:SF 21 xmm0)) ../../src/gcc/testsuite/rtl.dg/test.c:31 -1
>          (nil))
>
>   so that there's no clear separation of the instructions between the
>   two functions (and no metadata e.g. function names).
>
>   This could be fixed by adding a new clause to the dump e.g.::
Which would seem like a good idea to me.

>
> * Similarly, there are no types beyond the built-in ones; all expressions
>   are treated as being of type int.  I suspect that this approach
>   will be too simplistic when it comes to e.g. aliasing.
Well, we have pointers back to the tree IL for this kind of thing, but 
it's far from ideal because of the lack of separation that implies.

I wouldn't lose a ton of sleep if we punted this for a while, perhaps 
just dumping the alias set splay tree so we can at least carry that 
information around.

>
> * There's no support for running more than one pass; fixing this would
>   require being able to run passes from a certain point onwards.
I think that's OK at this stage.

>
> * Roundtripping of recognized instructions may be an issue (i.e. those
>   with INSN_CODE != -1), such as the "667 {jump}" in the following::
>
>     (jump_insn 50 49 51 10
>       (set (pc)
>            (label_ref:DI 59)) ../../src/test-switch.c:18 667 {jump}
>            (nil) -> 59)
>
>   since the integer ID can change when the .md files are changed
>   (and the associated pattern name is very much target-specific).
>   It may be best to reset them to -1 in the input files (and delete the
>   operation name), giving::
Just ignore the index and the pretty name.  When you're done reading the 
file, call recog on each insn to get that information filled in.
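
As a rough sketch of the "ignore" half of that, the snippet below
normalizes a dump line by replacing a trailing "CODE {name}" pair with
-1.  It is only a text-level illustration: the function name and the
regex are assumptions, and it relies on the code following a
"file:line" location as in the quoted example.  The recog step itself
happens inside the compiler and is not modelled here.

```python
import re

# A source location's ":LINE", then the numeric insn code, then the
# pattern name in braces, e.g. ":18 667 {jump}".
INSN_CODE = re.compile(r"(:\d+) \d+ \{[^}]*\}")


def strip_insn_code(line):
    """Replace 'CODE {name}' after a location with -1 (hypothetical helper)."""
    return INSN_CODE.sub(r"\1 -1", line)
```

Lines that already carry -1 have no braces after the location, so they
pass through unchanged.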

>
>     (jump_insn 50 49 51 10
>       (set (pc)
>            (label_ref:DI 59)) ../../src/test-switch.c:18 -1
>            (nil) -> 59)
>
> * Currently there's no explicit CFG edge information in the dumps.
>   The rtl1 frontend reconstructs the edges based on jump instructions.
>   As I understand the distinction between cfgrtl and cfglayout modes
>   https://gcc.gnu.org/wiki/cfglayout_mode , this is OK for "cfgrtl" mode,
>   but isn't going to work for "cfglayout" mode - in the latter,
>   unconditional jumps are represented purely by edges in the CFG, and this
>   information isn't currently present in the dumps  (perhaps we could add it
>   if it's an issue).
We could either add the CFG information or you could extract it from the 
guts of the RTL you read.  The former leads to the possibility of an 
inconsistent view of the CFG.  The latter is more up-front work and has 
to deal with the differences between cfgrtl and cfglayout modes.

>
> Open Questions
> **************
>
> * Register numbering: consider this fragment of RTL emitted during
>   expansion::
>
>     (reg/f:DI 82 virtual-stack-vars)
>
>   At the time of emission, register 82 is the VIRTUAL_STACK_VARS_REGNUM,
>   and this value is effectively hardcoded into the dump.  Presumably this
>   is baking in assumptions about the target into the test.  Also, how likely is
>   this value to change?  When we reload the dump, should we notice that this
>   is tagged with virtual-stack-vars and override the specific register
>   number to use the current value of VIRTUAL_STACK_VARS_REGNUM on the
>   target rtl1 was built for?
Those change semi-regularly, essentially any time a new version of the
ISA shows up with new register numbers.

My instinct is to drop raw numbers and just output them symbolically.
We can map them back into the hard register numbers easily enough.  We
would want some convention to identify pseudo regs, e.g. P1...PN in the
dumps, which we'd map to FIRST_PSEUDO_REGISTER+N when we read the file
in.
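
A minimal sketch of that mapping, assuming the reader is handed a table
of hard register names for the current target and that target's
FIRST_PSEUDO_REGISTER value.  The function name, the table contents and
the value 76 in the usage below are all illustrative assumptions, not
taken from any real target description.

```python
import re

# Symbolic pseudo register names of the form P1, P2, ... PN.
PSEUDO_NAME = re.compile(r"^P(\d+)$")


def resolve_regno(name, hard_regs, first_pseudo_register):
    """Map a symbolic register name from a dump to a register number.

    hard_regs: dict of hard register name -> number for the target the
    reader was built for (assumed to be supplied by the caller).
    first_pseudo_register: that target's FIRST_PSEUDO_REGISTER.
    """
    match = PSEUDO_NAME.match(name)
    if match:
        # Pseudo PN becomes FIRST_PSEUDO_REGISTER + N.
        return first_pseudo_register + int(match.group(1))
    # Anything else must be a hard register known to this target.
    return hard_regs[name]
```

For example, with hard_regs = {"ax": 0, "xmm0": 21} and an assumed
FIRST_PSEUDO_REGISTER of 76, "ax" resolves to 0 and "P1" to 77.  An
unknown name raises KeyError, so a dump written for a different target
fails loudly instead of silently picking the wrong register.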

Jeff