This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)


On 05/13/2013 02:16 PM, Steve Ellcey wrote:
Here is the latest version of my SSA optimization pass to do the switch
statement optimization described in PR 54742 (core_state_transition from
coremark).

I have tested this optimization with a x86 bootstrap and GCC test run
with no errors and tested the MIPS cross compiler with no errors.
Because of that I decided to submit it as a statically linked
optimization pass instead of a dynamically loaded one, though I did keep
the ifdefs for using it as a dynamic pass in the code.  They could be
removed if this patch is approved as a statically linked pass.  Also,
while this patch shows the optimization only being turned on with the
-ftree-switch-shortcut flag, my bootstrap and testing had it turned on
for all -O2 optimizations in order to maximize the testing.
We may want to turn this on for -O3 and/or for
-fexpensive-optimizations.

I had to make one change to dominance.c in order to avoid some compiler
aborts where it was dereferencing a null pointer.  I believe this was
happening because I am calling gimple_duplicate_sese_region with regions
that are not really SESE.  Because I am doing this, I regenerate the cfg
and SSA information after each call, but I also had to change
iterate_fix_dominators to fix the abort.  Another way we might want to
fix this would be to pass a flag to gimple_duplicate_sese_region that
tells it whether or not we want it to recalculate the dominance
information at all.  If set to false, it would assume the caller will
take care of it.

Opinions?  OK to checkin?

Steve Ellcey
sellcey@imgtec.com


2013-05-13  Steve Ellcey  <sellcey@imgtec.com>

	PR tree-optimization/54742
	* Makefile.in (OBJS): Add tree-switch-shortcut.o.
	* common.opt (ftree-switch-shortcut): New.
	* dominance.c (iterate_fix_dominators): Add null check.
	* params.def (PARAM_MAX_SWITCH_INSNS): New.
	(PARAM_MAX_SWITCH_PATHS): New.
	* passes.c (init_optimization_passes): Add pass_switch_shortcut.
	* timevar.def (TV_SWITCH_SHORTCUT): New.
	* tree-pass.c (pass_switch_shortcut): New.
	* tree-switch-shortcut.c: New file.
I was looking at this last week (stuck for hours on tarmac at BWI).

I think we should fix this in the threader rather than doing a special purpose pass. This is primarily because we get it for free if we address one limitation in the threader (some of the other issues I was concerned about don't apply).

Specifically if we look at thread_around_empty_block we have:

  /* This block must have a single predecessor (E->dest).  */
  if (!single_pred_p (bb))
    return NULL;

  /* This block must have more than one successor.  */
  if (single_succ_p (bb))
    return NULL;

  /* This block can have no PHI nodes.  This is overly conservative.  */
  if (!gsi_end_p (gsi_start_phis (bb)))
    return NULL;


The test that the block have > 1 successor and no PHI nodes are the keys.

;;   basic block 17, loop depth 1
;;    pred:       9
;;                13
;;                16
;;                15
;;                4
;;                12
;;                7
;;                14
;;                10
;;                5
;;                6
;;                8
;;                11
# state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)> # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5), .MEM_29(6), .MEM_29(8), .MEM_29(11)>
<L32>:
  str_25 = str_37 + 1;
  # VUSE <.MEM_4>
  _8 = MEM[base: str_25, offset: 0B];
  if (_8 != 0)
    goto <bb 18>;
  else
    goto <bb 19>;
;;    succ:       18
;;                19

;;   basic block 18, loop depth 1
;;    pred:       17
  goto <bb 4>;
;;    succ:       4



bb will be block #18.  In theory we just do
if (single_succ_p (bb))
  bb = single_succ_edge (bb)->dest;

And allow PHI nodes in this specific instance.

That's enough to allow existing code to identify the potential jump threading candidates -- and more importantly, it's much more general.



The updater needs copy bb17 & bb4. bb17' will transfer to bb4' which will directly transfer to the destination of the switch.

The advantage of doing this in the threader is it'll help more than just the switch statement case. It's probably highly likely that we're missing cases to thread through empty blocks with PHIs.

I've started cobbling together some of the updating code. So far it doesn't look too terrible.


Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]