  1. Make the tree loop-versioning usable in RTL cfg-layout mode by making it a cfg hook.
  2. Move SMS to use cfg-layout mode.
  3. Use loop information to detect simple loops.
  4. Replace SMS's own loop versioning with the RTL loop versioning.
  5. Several other improvements:
    1. Check whether the SMSed loop kernel is more efficient in terms of number of cycles; if it is not, undo the changes.
      • We do this by feeding the loop kernel into the DFA and counting the number of cycles before and after SMS. If SMS did not improve the cycle count (which can happen because of the register copies we add), we keep the original loop.
    2. Ignore register anti-dependencies and use register copies instead.
    3. Add backtracking to the scheduling algorithm:
      • When we failed to find a cycle for a given node within a kernel of II cycles, we used to restart the whole process with a kernel of II + 1 cycles. Now we try to unschedule some of the nodes that the failing node depends on, schedule the failing node first, and then try to reschedule the other nodes.
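Item 1 turns loop versioning into a cfg hook so the same versioning logic can run on trees and on RTL in cfglayout mode. The sketch below shows that pattern in miniature: an IR-independent driver that only reaches the IR through a per-IR table of function pointers. All names (`VersioningHooks`, `version_loop`, the `_tree_*`/`_rtl_*` helpers) are hypothetical and the "emitted code" is plain text; this is an illustration of hook dispatch, not GCC's actual hook interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VersioningHooks:
    """Hypothetical per-IR hook table; field names are illustrative, not GCC's."""
    ir: str
    duplicate_loop: Callable[[str], str]        # loop label -> label of the copy
    emit_check: Callable[[str, str, str], str]  # (cond, then_label, else_label) -> text


def _tree_duplicate(loop: str) -> str:
    return loop + ".copy"


def _tree_check(cond: str, then_label: str, else_label: str) -> str:
    return f"if ({cond}) goto {then_label}; else goto {else_label};"


def _rtl_duplicate(loop: str) -> str:
    return loop + "_dup"


def _rtl_check(cond: str, then_label: str, else_label: str) -> str:
    return f"jump if ({cond}) to {then_label}, else to {else_label}"


TREE_HOOKS = VersioningHooks("tree", _tree_duplicate, _tree_check)
RTL_CFGLAYOUT_HOOKS = VersioningHooks("rtl-cfglayout", _rtl_duplicate, _rtl_check)


def version_loop(hooks: VersioningHooks, loop: str, cond: str) -> List[str]:
    """IR-independent driver: it only touches the IR through `hooks`."""
    copy = hooks.duplicate_loop(loop)
    return [f"{hooks.ir}: duplicated {loop} as {copy}",
            f"{hooks.ir}: {hooks.emit_check(cond, loop, copy)}"]


# The same driver runs unchanged on both IRs; only the hook table differs.
for hooks in (TREE_HOOKS, RTL_CFGLAYOUT_HOOKS):
    print("\n".join(version_loop(hooks, "loop1", "n > 16")))
```

The design point is that `version_loop` never branches on the IR; switching IRs means switching which table is passed in (or, in a compiler, which table a global hook pointer refers to).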
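Sub-item 1 of item 5 keeps the SMS result only if the kernel issues in fewer cycles than the original loop body. The sketch below models that decision under loud assumptions: GCC's DFA pipeline hazard recognizer is replaced by a toy table of per-cycle functional-unit slots, issue is strictly in order, and the unit names, latencies, and the functions `count_cycles` and `keep_sms_result` are hypothetical. It shows why the register copies SMS adds can make the kernel lose.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Insn:
    name: str
    unit: str                   # functional-unit class, e.g. "alu" or "mem"
    deps: Tuple[str, ...] = ()  # earlier insns in the same block whose result it reads
    latency: int = 1


def count_cycles(insns: List[Insn], units: Dict[str, int]) -> int:
    """Cycles needed to issue `insns` in order, with units[u] slots per cycle
    for unit class u (a toy stand-in for a pipeline-hazard DFA)."""
    ready_at: Dict[str, int] = {}          # insn name -> cycle its result is ready
    used: Dict[Tuple[int, str], int] = {}  # (cycle, unit) -> slots taken
    cycle = 0
    for insn in insns:
        # In-order issue: not before the previous insn, not before operands are ready.
        t = max([cycle] + [ready_at[d] for d in insn.deps if d in ready_at])
        while used.get((t, insn.unit), 0) >= units[insn.unit]:
            t += 1                         # structural hazard: try the next cycle
        used[(t, insn.unit)] = used.get((t, insn.unit), 0) + 1
        ready_at[insn.name] = t + insn.latency
        cycle = t
    return cycle + 1 if insns else 0


def keep_sms_result(original: List[Insn], kernel: List[Insn], units: Dict[str, int]) -> bool:
    """Keep the modulo-scheduled kernel only if it is strictly faster."""
    return count_cycles(kernel, units) < count_cycles(original, units)


# load -> add -> store on a machine with one memory port and one ALU.
units = {"mem": 1, "alu": 1}
original = [Insn("load", "mem", (), 3), Insn("add", "alu", ("load",)),
            Insn("store", "mem", ("add",))]
# In the kernel each insn works on a different iteration, so there are no
# intra-kernel dependences; "mov" is a register copy added by SMS.
kernel = [Insn("load", "mem", (), 3), Insn("add", "alu"), Insn("store", "mem"),
          Insn("mov", "alu")]
# Same kernel, but with five register copies competing for the single ALU.
bloated = kernel[:3] + [Insn(f"mov{i}", "alu") for i in range(5)]
print(count_cycles(original, units), count_cycles(kernel, units), count_cycles(bloated, units))
# prints: 5 2 6
```

Here the kernel with one copy wins (2 cycles versus 5), while the kernel with five copies loses (6 versus 5), so the original loop would be kept in that case.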
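Sub-item 2 of item 5 drops register anti-dependencies. An anti-dependence would force the next iteration's write of a register to come after the current iteration's last read, which constrains the schedule; without it, a value can live longer than II cycles and be overwritten before it is read, so copies must preserve it. The sketch below counts those copies under assumptions that are ours, not the plan's: a true dependence from a definition at cycle t_def to a use at cycle t_use, `distance` iterations later, has lifetime L = t_use + distance·II − t_def; a rewrite issued in the same cycle as the read is treated as clobbering it; and all copies of one definition form a single chain. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def reg_copies_needed(def_cycle: int, use_cycle: int, distance: int, ii: int) -> int:
    """Register copies needed so a value written at `def_cycle` is still intact
    when read at `use_cycle`, `distance` iterations later (modulo schedule, II = ii).

    Conservative assumption: a later iteration's write clobbers the value if it
    issues at or before the cycle of the read."""
    lifetime = use_cycle + distance * ii - def_cycle
    if lifetime < 0:
        raise ValueError("use scheduled before its definition")
    # The register is rewritten at def_cycle + k*ii for k = 1, 2, ...;
    # each rewrite at or before the read needs one extra copy.
    return lifetime // ii


def copies_for_def(def_cycle: int, uses: Iterable[Tuple[int, int]], ii: int) -> int:
    """Assuming one chain of moves serves every (use_cycle, distance) use of a
    definition, the chain must be as long as the longest-lived use requires."""
    return max((reg_copies_needed(def_cycle, t, d, ii) for t, d in uses), default=0)


# A value defined at cycle 0 with II = 4 and reads at cycles 2, 5 and 9:
print(copies_for_def(0, [(2, 0), (5, 0), (9, 0)], 4))  # prints: 2
```

The read at cycle 9 sees rewrites at cycles 4 and 8, hence two copies; the reads at cycles 2 and 5 are served by the same chain.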
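Sub-item 3 of item 5 adds backtracking: when a node cannot be placed at the current II, unschedule nodes it depends on, place the failing node first, then re-place them. The toy modulo scheduler below is not GCC's SMS; the node order, the scheduling window (ASAP after scheduled predecessors, ALAP before scheduled successors, at most II cycles wide), the per-unit modulo reservation table, and the policy of unscheduling all scheduled predecessors of the failing node (the plan only says "some of the nodes") are assumptions. `Edge`, `try_schedule`, and `modulo_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    latency: int
    distance: int = 0  # iteration distance (0 = same iteration)


def try_schedule(order: List[str], unit_of: Dict[str, str], edges: List[Edge],
                 capacity: Dict[str, int], ii: int, backtrack: bool) -> Optional[Dict[str, int]]:
    """Modulo-schedule `order` with initiation interval `ii`.
    Returns {node: cycle}, or None if some node cannot be placed."""
    sched: Dict[str, int] = {}
    mrt: Dict[Tuple[int, str], int] = {}  # modulo reservation table: (row, unit) -> slots used

    def place(n: str) -> bool:
        early = [sched[e.src] + e.latency - e.distance * ii
                 for e in edges if e.dst == n and e.src in sched]
        late = [sched[e.dst] - e.latency + e.distance * ii
                for e in edges if e.src == n and e.dst in sched]
        if early:   # ASAP after scheduled predecessors, bounded by successors
            lo = max(early)
            cycles = range(lo, min([lo + ii - 1] + late) + 1)
        elif late:  # ALAP before scheduled successors
            hi = min(late)
            cycles = range(hi, hi - ii, -1)
        else:
            cycles = range(ii)
        for c in cycles:
            key = (c % ii, unit_of[n])
            if mrt.get(key, 0) < capacity[unit_of[n]]:
                mrt[key] = mrt.get(key, 0) + 1
                sched[n] = c
                return True
        return False

    def unplace(n: str) -> None:
        t = sched.pop(n)
        mrt[(t % ii, unit_of[n])] -= 1

    for n in order:
        if place(n):
            continue
        if not backtrack:
            return None
        # Backtrack: unschedule n's already-scheduled predecessors, place n
        # first, then re-place the predecessors relative to n.
        preds = list(dict.fromkeys(e.src for e in edges
                                   if e.dst == n and e.src in sched))
        for p in preds:
            unplace(p)
        if not place(n) or not all(place(p) for p in preds):
            return None
    return sched


def modulo_schedule(order, unit_of, edges, capacity, ii_start, ii_max, backtrack=True):
    """Increase II until scheduling succeeds; shift times so the earliest is 0."""
    for ii in range(ii_start, ii_max + 1):
        sched = try_schedule(order, unit_of, edges, capacity, ii, backtrack)
        if sched is not None:
            base = min(sched.values(), default=0)
            return ii, {n: t - base for n, t in sched.items()}
    return None


order = ["P", "S", "M", "N"]
unit_of = {"P": "alu", "S": "alu", "M": "mem", "N": "mem"}
edges = [Edge("P", "N", latency=2), Edge("N", "S", latency=1, distance=1)]
capacity = {"alu": 1, "mem": 1}
print(modulo_schedule(order, unit_of, edges, capacity, 2, 6))
print(modulo_schedule(order, unit_of, edges, capacity, 2, 6, backtrack=False))
```

With backtracking the example schedules at II = 2. Without it, N's window at II = 2 shrinks to cycle 2, whose memory-unit row is already taken by M, so the scheduler restarts and settles for II = 3.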


