PORT MAINTAINER'S GUIDE TO THE DATAFLOW BRANCH
We expect to merge the dataflow branch into the mainline soon.
The merge plan accepted by the GCC Steering Committee specified bootstrap and no testsuite regressions on five targets. By the time we expect to merge, we expect to have exceeded that number by quite a bit, but we will not have tested the branch on some platforms.
This document is a guide for anyone who wishes either to make their platform work before the merge or to wait until after the merge.
The dataflow branch is now stable enough that we encourage port maintainers to get involved in making their ports work with it. At the time of this writing, the dataflow branch works completely on:
- x86-32 and x86-64 (linux,darwin)
- ppc-32 (linux,darwin)
- ia-64 (linux)
- sparc (solaris)
- spu (linux)
- s390 and s390x (linux)
Work is in progress on the ppc-64, arm and pa ports. The three people who are most knowledgeable about the state of the port are:
Kenneth Zadeck ( zadeck@naturalbridge.com , irc:zadeck)
Danny Berlin ( dberlin@dberlin.org , irc:dberlin)
Seongbae Park ( seongbae.park@gmail.com , irc:spark).
SPEEDUP AREAS FOR DATAFLOW
- Bitmaps are slow. A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01758.html
- On df-branch, we can move cfglayout past combine. Paolo Bonzini has a patch for it, but it has regressions. It is also waiting for the next mainline merge (it currently needs some patches that are in mainline only).
- Building du/ud chains uses df_reorganize_refs_by_reg, which is a cache killer. (Note: fwprop accesses the table in insn order, which is not bad; the fault is not in df_maybe_reorganize_{def,use}s per se.)
- regrename forces scanning the entire function again. A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01349.html (Kenny said it does not work, but it may be worth trying on a few architectures to confirm that it still does not).
- Maybe we can make ur and live optional problems. In many cases it might be that we only need lr, despite what some comments say.
ISSUES IN GETTING A PORT TO WORK WITH DATAFLOW
- auto inc and dec instructions: The dataflow branch contains a separate pass for finding auto inc like instructions. This pass is only enabled if the machine supports this kind of instruction, so if your architecture does not have these instructions, you may skip this section. In the old system, auto inc like instructions were found in flow.c as a side effect of doing the flow analysis. The mechanism used was quite ad hoc and missed many opportunities. The new mechanism is quite powerful and finds many cases that the older version missed.
In the case of the post increment forms, the new code makes some minor improvements but does not find any new categories of matches, so there have been fewer merging issues. The story is quite different for the pre increment forms. Here there was no code to find pre modify forms at all, and many cases of the simpler pre increment forms were missed. In the case of the ppc (aka rs6000), the back end was not ready for the full set of operations, and a modest amount of machine description hacking was required to actually make this work. In most cases the code was there, but since the flow.c solution never generated these forms, this code was never exercised and thus did not work. It would not be surprising to find that other machines that support pre increment are in the same position as the ppc.
Additionally, the new pass only does the transformation if it is profitable to do so given the standard cost modeling mechanisms in the backends. In the old code, this transformation was done unconditionally. If the machine does not have its costs properly set, there may be a performance regression because the transformations are not made. Currently there is still one problem with this code: reload does not know how to properly handle a fully general pre-modify instruction. This will be fixed shortly.
- The dataflow processing is based on a separate and abstract representation of each insn. Rather than rescanning every insn in the function at the beginning of each pass, the dataflow representation is kept up to date as insns are added, deleted or changed. There are four places where this affects the backends:
- (a) Insn manipulation: Most of the core RTL passes use the validate_change/apply_change_group API when changing insns. The implementation of this API has been enhanced to call the proper df insn rescanning functions when insns are modified. However, it is common for the backends to ignore this and just modify the insn directly. To keep the dataflow processing current, the back ends must be modified either to call df_insn_rescan directly or to use the validate_change/apply_change_group API.
- (b) As a special case of (a), the REGNO macro now only works on the RHS. If it is used on the LHS, it will generate a compile-time error. We tried to find all occurrences of REGNO on the LHS, but it is possible that we missed some. Any places that do use REGNO on the LHS should be changed to SET_REGNO. This calls the appropriate dataflow routine to update the affected part of the insn(s), so it is not necessary to call df_insn_rescan after a call to SET_REGNO unless other changes have been made to the insn.
- (c) Insn sequences: There is an API in emit-rtl.c for adding to the stream of insns. These calls should be used rather than inserting the sequence directly. The calls in emit-rtl.c have been enhanced to rescan insns as they are added to the stream. Likewise, delete_insn should be used to delete any insn.
- (d) REG_EQUAL and REG_EQUIV notes must be added using set_unique_reg_note and deleted using remove_note. These calls have been modified to keep the dataflow info up to date. Steven Bosscher just committed a patch to do this. However, it is possible that some calls were missed or done improperly, since the patch was only visually checked. One place that is not automatically covered by Steven's patch for (d) is where notes are moved from one insn to another. These cases require manual insertion of a call to df_notes_rescan. There are likely to be more of these cases out there because they are difficult to search for mechanically.
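The REGNO/SET_REGNO split described above can be illustrated with a small standalone model. This is only a sketch of the idea, not GCC's actual definitions: the names toy_reg, toy_ref_count, TOY_REGNO and TOY_SET_REGNO are hypothetical. It assumes that the read accessor expands to a function call (so it cannot appear on the left of an assignment) and that the setter updates a side table before storing the new number.

```c
#include <assert.h>

#define TOY_NUM_REGS 8

/* Hypothetical stand-in for a REG rtx; not GCC's real rtx layout.  */
struct toy_reg
{
  unsigned int regno;
};

/* Stand-in for the dataflow side tables: references per register.  */
static unsigned int toy_ref_count[TOY_NUM_REGS];

/* A function call yields an rvalue, so "TOY_REGNO (r) = n" is rejected
   by the compiler instead of silently bypassing the side tables.  */
static unsigned int
toy_rhs_regno (const struct toy_reg *r)
{
  return r->regno;
}
#define TOY_REGNO(R) (toy_rhs_regno (R))

/* Create a register reference and record it in the side table.  */
static void
toy_reg_init (struct toy_reg *r, unsigned int regno)
{
  assert (regno < TOY_NUM_REGS);
  r->regno = regno;
  toy_ref_count[regno]++;
}

/* The only way to renumber: move the reference from the old table
   entry to the new one, then store the new register number.  */
static void
toy_set_regno (struct toy_reg *r, unsigned int new_regno)
{
  assert (new_regno < TOY_NUM_REGS);
  toy_ref_count[r->regno]--;
  toy_ref_count[new_regno]++;
  r->regno = new_regno;
}
#define TOY_SET_REGNO(R, N) (toy_set_regno ((R), (N)))
```

In this model, port code that used to write the equivalent of "TOY_REGNO (r) = 5" would be changed to "TOY_SET_REGNO (r, 5)", after which the side table is already consistent and no separate rescan is needed for that change alone.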
- Unconnected code. It seems to have been common practice to insert the parts of a computation that define some register in one pass and the parts that use the register in another pass. Such a practice depends on the dataflow analysis either not running between the passes or being too dumb to catch the fact that dangling code was there. The dataflow branch version now runs more often and is very precise. If the def side is inserted first, it is now quite likely that the new dead code eliminators (dce) will see that it is not connected to anything and remove it before the use side is inserted. If the use side is inserted first, the dataflow will notice that uninitialized variables are being referenced and do bad things (the kind of bad thing varies with the passes that are run, but they are generally bad). These problems are difficult to track down. The dce and dead store elimination (dse) passes do a good job of logging what they have deleted, so a good place to look is in the logs for these passes to see code that you did not expect to be deleted. The unfortunate side is that dce is often run as a side effect of calling df_analyze, so there are a lot of places to check. The dataflow branch provides a lot more freedom to reorder the passes in the back end, but this freedom comes at the price that each pass must keep the code stream legal. Several of the ports have had to be modified to accommodate this.
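The def-side hazard can be seen in a toy model of dead code elimination. The sketch below is not GCC's dce pass; toy_insn, TOY_NO_REG and toy_dce are hypothetical names. It assumes straight-line code, at most one def and one use per insn, no registers live at the end of the sequence, and that an insn without a def has a side effect and must be kept.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_REG (-1)

/* Hypothetical insn: writes at most one register, reads at most one.  */
struct toy_insn
{
  int def;       /* register written, or TOY_NO_REG */
  int use;       /* register read, or TOY_NO_REG */
  bool deleted;  /* set by toy_dce */
};

/* Delete every insn whose result is never read by a later surviving
   insn.  Walking backward lets chains of dead code die in one sweep.
   Returns the number of insns deleted.  */
static int
toy_dce (struct toy_insn *insns, int n)
{
  int n_deleted = 0;

  assert (n >= 0);
  for (int i = n - 1; i >= 0; i--)
    {
      /* Insns without a def (e.g. stores) are assumed to be needed.  */
      bool needed = insns[i].def == TOY_NO_REG;

      for (int j = i + 1; j < n && !needed; j++)
        {
          if (insns[j].deleted)
            continue;
          if (insns[j].use == insns[i].def)
            needed = true;
          else if (insns[j].def == insns[i].def)
            break;  /* Overwritten before any use.  */
        }

      if (!needed)
        {
          insns[i].deleted = true;
          n_deleted++;
        }
    }
  return n_deleted;
}
```

Running toy_dce on a sequence that contains only the def of a register deletes that def, while running it after both the def and the use are present keeps both. This is the reason the two halves need to be inserted together.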
- Access to dataflow information. On the trunk, flow information for a basic block is accessed through global_live_at_{start,end}. This information is produced by a backwards propagation. Almost all of these accesses have been replaced with calls to DF_LIVE_{IN,OUT}, which is different in two ways: (a) it is more accurate and (b) it is the AND of two dataflow problems. Point (a) is the result of a large number of bug fixes, the correction of places where we discovered that flow.c was just too conservative and, finally, flow.c just not being called often enough to keep the information accurate. Point (b) arises because DF_LIVE is computed as the logical AND of two dataflow problems. The first is the same backwards propagation from the uses; the second is a forwards propagation from the defs. Thus, a register is live at a point in the program only if it can both reach a use and be reached from a definition. In the vast majority of cases, replacing global_live_at with DF_LIVE has been correct. There are a few places, like reg-stack.c, that depend on only the backwards problem. The backwards problem is available with the DF_LR_{IN,OUT} macros. While it is unlikely, there may be places in the ports where our change to DF_LIVE must be changed to DF_LR.
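The difference between the backwards problem and the ANDed result can be shown on a toy example. The sketch below is not the df implementation; toy_block and toy_solve are hypothetical names. It assumes a straight-line chain of blocks (block i falls through to block i + 1), so one sweep in each direction is enough; a real CFG needs iteration to a fixed point. Registers are bits in a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per register.  */
struct toy_block
{
  uint32_t use;        /* registers read before any write in the block */
  uint32_t def;        /* registers written in the block */
  uint32_t lr_in;      /* backward problem, block entry */
  uint32_t lr_out;     /* backward problem, block exit */
  uint32_t reach_in;   /* forward problem, block entry */
  uint32_t reach_out;  /* forward problem, block exit */
  uint32_t live_in;    /* lr_in AND reach_in */
  uint32_t live_out;   /* lr_out AND reach_out */
};

static void
toy_solve (struct toy_block *bb, int n)
{
  assert (n > 0);

  /* Backward propagation from the uses.  */
  for (int i = n - 1; i >= 0; i--)
    {
      bb[i].lr_out = (i + 1 < n) ? bb[i + 1].lr_in : 0;
      bb[i].lr_in = bb[i].use | (bb[i].lr_out & ~bb[i].def);
    }

  /* Forward propagation from the defs.  */
  for (int i = 0; i < n; i++)
    {
      bb[i].reach_in = (i > 0) ? bb[i - 1].reach_out : 0;
      bb[i].reach_out = bb[i].reach_in | bb[i].def;
    }

  /* A register is live only where both problems agree.  */
  for (int i = 0; i < n; i++)
    {
      bb[i].live_in = bb[i].lr_in & bb[i].reach_in;
      bb[i].live_out = bb[i].lr_out & bb[i].reach_out;
    }
}
```

With three blocks where block 0 defines register 0, block 1 uses register 1 (which is never defined) and block 2 uses register 0, the backward sets report register 1 as live on entry to blocks 0 and 1, while the ANDed sets never contain it. Code that needs the first behaviour is the kind that would have to use the backwards problem alone.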
- REGS_EVER_LIVE: This is certainly one of the most problematic data structures touched by the dataflow branch, if not the most problematic data structure in the compiler. The problem is that the data structure should really be named REGS_POSSIBLY_LIVE_OR_THAT_MAY_BECOME_LIVE_OR_WERE_LIVE_SOME_TIME_IN_THE_PAST_OR_THAT_WE_FANTASIZE_ABOUT_BEING_LIVE. On the surface, regs_ever_live was supposed to reflect the hard registers that are used in the current function. In practice, this data structure does not reflect that some optimization may have removed the instructions that at some previous time caused the bit to be set, and there are many places in the compiler that assume that the various entries can just be changed without regard to what the code stream contains. Flow.c would periodically set some of the fields in this structure based on finding insns that used the register. Ports would also reset regs_ever_live at various times. It has been impossible to actually replicate all of the changes to this data structure on the dataflow branch. We are close and have no intention of trying to get closer, since every port expects something different. In the short term we have encapsulated the regs_ever_live array with df_regs_ever_live_p () and df_set_regs_ever_live() to make it easy to distinguish between setters and getters. The medium term plan is to replace regs_ever_live with df_hard_reg_used_count(reg) and df_hard_reg_used_p(reg). Both are based on an array that accurately reflects exactly how many uses and defs there are of a given hard register in the insn stream. Unless the rescanning has been deferred, this count is updated with every change to any insn. There are no functions to directly set this array outside of the dataflow scanning. We have noticed that many ports depend on being able to manipulate regs_ever_live directly, and we strongly suggest that any changes that are made be made to use df_hard_reg_used_count(reg) and df_hard_reg_used_p(reg).
It is in general a much better practice to insert a USE or an UNSPEC into the insn stream that uses or defines the register than depend on regs_ever_live not being changed by some third party. As mentioned above, one of the problems addressed by the dataflow port is the difficulty of reordering the phases of the back end. Removing this kind of usage of regs_ever_live will go a long way to enhancing that process.
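The contrast between a sticky regs_ever_live style flag and a count derived from the insn stream can be sketched as follows. This is a standalone model, not the df code: toy_insn, toy_insn_added, toy_insn_deleted and the toy_ arrays are hypothetical, and the two query functions only mimic the roles the text gives to df_hard_reg_used_count and df_hard_reg_used_p.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_HARD_REGS 16
#define TOY_MAX_REFS 4

/* Sticky flag: set when a register is first seen, never cleared here.  */
static bool toy_regs_ever_live[TOY_NUM_HARD_REGS];

/* Exact number of references currently in the toy insn stream.  */
static unsigned int toy_hard_regs_live_count[TOY_NUM_HARD_REGS];

/* Hypothetical insn: just the list of hard registers it mentions.  */
struct toy_insn
{
  unsigned int regs[TOY_MAX_REFS];
  unsigned int n_regs;
};

/* Called whenever an insn enters the stream.  */
static void
toy_insn_added (const struct toy_insn *insn)
{
  assert (insn->n_regs <= TOY_MAX_REFS);
  for (unsigned int i = 0; i < insn->n_regs; i++)
    {
      assert (insn->regs[i] < TOY_NUM_HARD_REGS);
      toy_regs_ever_live[insn->regs[i]] = true;
      toy_hard_regs_live_count[insn->regs[i]]++;
    }
}

/* Called whenever an insn leaves the stream; only the count changes.  */
static void
toy_insn_deleted (const struct toy_insn *insn)
{
  for (unsigned int i = 0; i < insn->n_regs; i++)
    toy_hard_regs_live_count[insn->regs[i]]--;
}

static unsigned int
toy_hard_reg_used_count (unsigned int reg)
{
  assert (reg < TOY_NUM_HARD_REGS);
  return toy_hard_regs_live_count[reg];
}

static bool
toy_hard_reg_used_p (unsigned int reg)
{
  return toy_hard_reg_used_count (reg) != 0;
}
```

After an insn mentioning a register is added and then deleted, the sticky flag still claims the register is used while the count-based predicate reports that it is not. There is deliberately no function that sets the count directly; the only way to make a register appear used in this model is to put a reference to it into the stream.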