trivial RTL fast_dce speedup
Richard Guenther
rguenther@suse.de
Fri Jun 4 11:46:00 GMT 2010
On Fri, 4 Jun 2010, Jan Hubicka wrote:
> Hi,
> with my local patch to partially inline bitmap_bit_p and bitmap_set
> so one can see better where the time is going (they are good on collecting
> cache misses right now) I get the following profile on cc1 build with LTO
>
> 54122 3.7557 lto1 htab_find_slot_with_hash
> 31179 2.1636 lto1 bitmap_ior_into
> 20588 1.4287 lto1 fast_dce.463190.constprop.674
> 20432 1.4178 lto1 bitmap_set_bit_1
> 19827 1.3759 lto1 df_note_compute.40936
> 18915 1.3126 lto1 bitmap_ior_and_compl
> 18269 1.2677 lto1 df_worklist_dataflow
> 16676 1.1572 lto1 bitmap_copy
> 16560 1.1492 lto1 bitmap_and
> 14233 0.9877 lto1 bitmap_bit_p_1
> 13698 0.9506 lto1 bitmap_and_into
> 12530 0.8695 lto1 htab_find_with_hash
> 12373 0.8586 lto1 record_reg_classes.102576.constprop.811.3366
> 12307 0.8540 lto1 ggc_alloc_stat
> 11962 0.8301 lto1 bitmap_clear_bit
> 10863 0.7538 lto1 vt_find_locations.357140.3452
> 10762 0.7468 lto1 execute_function_todo.149789.4241
> 10495 0.7283 lto1 sorted_array_from_bitmap_set.306751.4292
> 10307 0.7152 lto1 walk_tree_1.constprop.778
> 9854 0.6838 lto1 lto_input_tree
>
> The following patch reduce fast_dce.463190.constprop.674 contribution
> by about 1/3rd
>
> 17105 1.2949 bitmap_ior_and_compl
> 16745 1.2676 df_worklist_dataflow
> 14965 1.1329 bitmap_copy
> 14910 1.1287 bitmap_and
> 14352 1.0865 fast_dce.463190.constprop.674
> 13234 1.0018 bitmap_bit_p_1
> 12599 0.9538 bitmap_and_into
> 12022 0.9101 htab_find_with_hash
>
> The change is simple, we do not scan trivially needed instructions if they are needed
> because of liveness and avoid multiple acceses to marked array.
> Currently WHOPR cc1 build is 1m46 of real time and 10m57 of cpu time for me.
>
> Without bitmap inlining patch (included for anyone who is interested) the time
> is dominated by bitmap_bit_p and bitmap_set. I think it is mostly cache
> mispredict since all the time goes in both profiles to first header check.
> Will try to cure some of those by aviding some indirection in df datastructures
> by embedding bitmap_header structures so we get similar layout to what former
> flow.c had.
>
> Also the large time spent in fast_dce seems bogus. There can be two reasons
> for this. First the algorihm is not really fast - it is a dataflow algorithm
> that in each iteration invoke dataflow, so its time is D(CFG)*O(dataflow solver
> of CFG). THere is DU/UD based DCE too that does not have this wrost case
> issue. Yet this DCE is executed only post reload and I would not expect too
> many loop carried dead regs to be there, so it should work fast. THe reason
> for overly large time spent there might be also invocation from crossjumping.
> Will experiment with this later today.
>
> Bootstrapped/regtested x86_64-linux, OK?
Ok.
Thanks,
Richard.
> Honza
>
> * dce.c (dce_process_block): Do not re-scan already marked instructions.
>
> Index: dce.c
> ===================================================================
> --- dce.c (revision 160215)
> +++ dce.c (working copy)
> @@ -904,27 +904,26 @@
> FOR_BB_INSNS_REVERSE (bb, insn)
> if (INSN_P (insn))
> {
> - bool needed = false;
> + bool needed = marked_insn_p (insn);
>
> /* The insn is needed if there is someone who uses the output. */
> - for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
> - if (bitmap_bit_p (local_live, DF_REF_REGNO (*def_rec))
> - || bitmap_bit_p (au, DF_REF_REGNO (*def_rec)))
> - {
> - needed = true;
> - break;
> - }
> + if (!needed)
> + for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
> + if (bitmap_bit_p (local_live, DF_REF_REGNO (*def_rec))
> + || bitmap_bit_p (au, DF_REF_REGNO (*def_rec)))
> + {
> + needed = true;
> + mark_insn (insn, true);
> + break;
> + }
>
> - if (needed)
> - mark_insn (insn, true);
> -
> /* No matter if the instruction is needed or not, we remove
> any regno in the defs from the live set. */
> df_simulate_defs (insn, local_live);
>
> /* On the other hand, we do not allow the dead uses to set
> anything in local_live. */
> - if (marked_insn_p (insn))
> + if (needed)
> df_simulate_uses (insn, local_live);
> }
>
>
>
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
More information about the Gcc-patches
mailing list