This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi, contrary to my mail volume level I'm not dead. Yet. Over the last months I was working on the allocator some more, and this is the current version on my HD. I wanted to commit it already multiple times, but sometimes I first wanted to "just" fix "that" small annoyance, and one week later I realized the whole thing is inconsistent again. That last thing was, that DSL wasn't working the last weeks for me, and before it now breaks again, I submit it, how unfinished and ugly it may be. In fact it's not that ugly, only the work of the last week or so is. It currently contains some dead code, which I'm not yet sure if it really should be replaced by the replacement now done or not, so I commit that too for reference. Of course the positive (or negative) thing when working on something but not committing it is, that the next version tends to have changed really much, and ergo might be entertaining to someone else; like a new toy (in fact I wanted to committ it on christmas, but DSL wasn't working). So the patch is a heavy change, but contains some nice goodies: - incremental building of conflict graph: after emitting spill code the conflict graph is not completely rebuilt from scratch, but instead the portion not connected to spilled webs is left intact, as is most other information from the current round (like the ref chains from df.*, or coloring state of webs). This is the fragile part. - general improvements to graph building: when a regno isn't mentioned in a block, but a use of it flows into it, there is no use in looking at every insn in that block. The use simply conflicts with every def in the block. Those use-transparent blocks are used for fast skipping of large portions of a live range. This and the above incremental building make the build_web_parts_and_conflicts() thing take around 2-3% of compiler time (with a profiled build), less than df_analyse() or e.g. calculate_global_regs_live() alone. - rematerialization: It now detect rematerializable webs better. Currently only webs not depending on other pseudo registers are dealt with, but otherwise everything stable or constant or self-created stack-slots is considered rematerializable. To also deal with webs depending on pseudos different things are still needed. E.g. it doesn't every time make sense for (p1 <= p2 + 2) to have p1 rematerializable if p1 and p2 do _not_ conflict. Although we then are sure, that p2 doesn't change it's contents while p1 is live, we would extend p2's livetime if we add another use for it in p1's range. Given that p1 was spilled for a reason this often is no good idea. A better check is to test, that p1 and p2 _do_ conflict, but there is no def of p2 in p1's range. Then p2 is at least also partially live while p1 is, and added uses of p2 wouldn't extend p2's livetime as much (in fact we would test just before emitting the rematerialization, if p2 is currently live, and if not just use a stack-load). Later. But it already is a huge win. - recoloring of spills: This pass tries to make the overall spill-cost of the completely colored graph smaller, by trying to recolor spilled webs. For that either the neighbors need to be recolored or spilled. If by that method a color can be made free (and if for that spilling was necessary, the overall spill-cost is smaller than the cost of the web in question) we can color the spilled web. This often changes no webs at all, but sometimes the win is huge. - spill propagation: If two spilled webs are connected by copies over a colored web, and all three webs don't conflict, it might be better to just spill also the colored web, and then coalesce all spill-locations. This saves some needless loads/store together with spill_coalescing (which still is done). Currently I disabled it because the implmentation was thrown together in half an hour, and sucks, and is slow, while the effect is small (partly due to wrong cost measurements) If some webs were spilled by that method, other spilled webs might become colorable, which currently isn't checked. - interference region spilling: this is not the same as in Bergners thesis, but similar, and the idea comes from that. Spilled nodes also are assigned a color and while emitting loads a local live analysis is done to detect how and when that color gets used by a non-spill web. When that happens the interference region of the spilled web is entered, and a reload after it is necessary. As long as that color isn't in use no loads need to be emitted (it must become used at one point otherwise it would be a free color for that web, which means it wouldn't have to be spilled). In difference to Bergner I do not consider spilled webs as opening IR's for other spilled webs (i.e. they do not make the color allocated when they become live). Another difference is, that I consider each basic block entry an IR boundary, i.e. emit all still active reloads (read Bergners thesis to know what I mean). Otherwise much dead code is emitted which would need to be optimized away (and relocated to more infrequent locations) before the next round, as it would take up register resources needlessly. A further thing to look out for, is to make sure, the next round all webs get the same color as the current round (if possible), as for that the colors were choosen for spill webs, which influenced where the spill code is emitted. A new method I implemented last week, is to not stop at BB boundaries, but only at labels, which serves a similar purpose (creating webs which still are covering more than one BB), but without creating any dead code. Again a win. - optimistic+ coalescing: iterated coalescing is still supported, but inferiour (set flag_ra_optimistic_coalesce to 0). This aggressively combines all combinable webs (first those connected by moves, afterwards those only related by being used as inputs and outputs in the same insn (called extended coalescing, therefore the '+')). This generally makes the graph easier to color (most neighbors of coalesced nodes loose one neighbor), except for the coalesced node itself. Also this removes most copy insns (or better said the most costly set of copy insns). For the coalesced node we have: - coalesce breaking: This works for iterated coalescing (when not testing conservativeness before combining) and of course for optimistic/extended coalescing. If after coloring some spilled webs remain which are coalesce targets, all these coalesces are broken up again (level by level), and the webs involved are recolored. If still all these webs are spilled they will be coalesced again by spill-coalescing, so nothing is lost. But often at least some of the webs which were coalesced together can be colored, reducing the overall spillcost. As checked in it fully bootstraps all languages on x86, and check-gcc shows no regressions. If and how SPEC2000 is compiled I didn't check. It wouldn't be a wonder if I broke other platforms though (if they were working before). With that allocator I now frequently get smaller code than with -fno-new-ra (which uses local/global and reload), which is promising, as still all loads/stores are done explicitely to registers, instead of emitting the memrefs directly to the insns, and ergo take up extra registers, which are sometimes not strictly necessary, so there still is room for improvement. Btw. I often test things with -O2 -fpic to make only few registers available, but -O2 alone also looks good. The patch itself is gziped because of it's size. The ChangeLog below also is big, although synthezised by looking at the diff (yeah, sorry). Nevertheless happy christmas, as I wanted to say initially. -- 2002-01-04 Michael Matz <matzmich@cs.tu-berlin.de> * df.c (df_ref_record): Correctly calculate SUBREGs of hardregs. (df_bb_reg_def_chain_create, df_bb_reg_use_chain_create): Only add new refs. (df_bb_refs_update): Don't clear insns_modified here, ... (df_analyse): ... but here. * longlong.h (count_trailing_zeros): Fix macro multiline continuation. * sbitmap.c (dump_sbitmap_file): New. (debug_sbitmap): Use it. * sbitmap.h (dump_sbitmap_file): Add prototype. * toplev.c (rest_of_compilation): Do regclass() only when old-ra, don't dump info when new-ra. * varray.h (struct varray_data_tag): New member 'web'. (VARRAY_INIT_WEB, VARRAY_WEB, VARRAY_PUSH_WEB, VARRAY_TOP_WEB): New. * ra.c (enum node_type): New members 'FREE', 'SIMPLIFY_SPILL' and 'SIMPLIFY_FAT'. (struct tagged_conflict): Add 'size_word', delete 'size' and 'word'. (struct web_part): s/spanned_insns/spanned_deaths/. (struct web.orig_spill_temp, span_deaths, orig_spill_temp, orig_spill_cost, num_aliased, old_color, old_web, in_load, one_load, target_of_spilled_move, have_orig_conflicts, parent_web, orig_conflict_list, orig_usable_regs, bias_colors, prefer_colors, last_use_insn, pattern, temp_refs): New members. (struct web::span_insns, has_sub_conflicts): Remove members. (struct conflict): Delete. (ra_obstack): New static variable. (rtx_to_bits): Return an unsigned int. (set_undefined, free_all_lists, rematerializable, only_one_reaching_def): Delete functions. (unbrokengraph): Delete. (insns_with_death, death_insns_max_uid, ra_pass, live_at_end): New. (struct bb_begin_info): Rename to ra_bb_info, new members 'regnos_mentioned' and 'live_throughout'. (all_defs_for_web, all_uses_for_web): Delete. (web_lists[]): New. (WEBS): Macro for accessing it. (simplify_wl, freeze_wl, spill_wl, simplify_spilled_wl, coalesced_nodes, colored_nodes, spilled_nodes, select_stack, simplify_fat_wl): Delete, now in web_lists[]. All users changed. (last_def_id, last_use_id, last_num_webs, last_max_uid, last_check_uses, remember_conflicts): New. (hardregs_for_mode[], byte2bitcount[]): New. (ID2WEB): New. (DUMP_*): New. (flag_ra_dump_only_costs, flag_ra_biased, flag_ra_ir_spilling, flag_ra_optimistic_coalescing, flag_ra_break_aliases, flag_ra_merge_spill_costs, flag_ra_spill_every_use, flag_ra_dump_notes): New. (struct ra_insn_info): New. (insn_df_max_uid, insn_df, refs_for_insn_df): New. (hard_regs_count, rtx_to_undefined, create_insn_info, free_insn_info, undef_to_size_word, defuse_overlap_p_1, find_web_for_subweb_1, copy_web, compare_and_free_webs, init_webs_defs_uses, parts_to_webs_1, check_conflict_numbers, free_all_mem, push_list_end, put_web_at_end, remove_web_from_list, simplify_p, dump_igraph_machine, dump_cost, dump_graph_cost, setup_renumber, check_df): Various new functions. (debug_msg): Use a bitmap as debug level. All callers changed to use some level. (find_sub_conflicts, get_sub_conflicts): Use a combined size and word arguments. Change callers. (undef_table[]): According changes to above. (undef_to_bitmap): Use undef_to_size_word. (prune_hardregs_for_mode): Use hardregs_for_mode[]. (find_subweb): web->subreg_next is no circular list anymore. (find_subweb_2): Dito and use one argument for size/word. (find_web_for_subweb): Macro, plus use parent_web. (add_subweb): Set parent_web. (add_subweb_2): One size/word argument. (remember_move): Move sanity checking for copy insns to here ... (live_out_1): ... from here. Use insn_df[] instead DF_INSN_DEFS(). Update spanned_deaths according to insns_with_deaths. Don't check defs of fixed registers. (defuse_overlap_p): Now a macro calling ... (defuse_overlap_p_1): ... this, when necessary. Use rtx_to_undefined. (ra_print_rtx_1op, ra_print_rtx_2op, ra_print_rtx_3op, ra_print_rtx_object, ra_print_rtx, ra_print_rtx_top, ra_debug_rtx, ra_debug_insns, ra_debug_bbi, ra_print_rtl_with_bb): My version of pretty-printing RTL. Use them in the various dumps. (ra_alloc, ra_calloc): New. Use the ra_obstack for allocating memory. Use it everywhere, where small objects were allocated by xmalloc or xcalloc. At the same time never free that memory while working, but simply forget it. Instead it's freed once at the end of reg_alloc. -- Optimisations for graph building -- (update_regnos_mentioned, livethrough_conflicts_bb, init_bb_info, free_bb_info, live_in_edge): New. Handle use-transparent blocks. (live_in): Don't use prev_real_insn(), too many function calls. Call live_in_edge() for handling basic block boundaries, which has optimisations, when the entered block is transparent for the use in question. (build_web_parts_and_conflicts): Don't allocate copy_cache here, nor the per basic block info. Call update_regnos_mentioned() for conservative detection of transparency. Don't trace uses of fixed registers. Only trace new or marked uses. Call livethrough_conflicts_bb for dealing with transparent uses/blocks. -- Incremental building of interference graph -- (init_one_web_common, reinit_one_web, reset_lists, reset_conflicts, realloc_web_parts, detect_web_parts_to_rebuild): New. (init_one_web_common): Old init_one_web. Initialize also dlink. Put web directly in the correct list. (init_one_web): Simply memset() everything, then call init_one_web_common. (reinit_one_web): Clears only necessary members. Deal with new members. (init_web_parts): Sanity checks. Sometimes clear web_parts[].ref member (incremental building). (parts_to_webs): Break out in parts_to_webs_1 and init_webs_defs_uses. Generally deal with incremental building. Reuse free webs. Create artifical inverse webs. use2web[] and def2web[] now contain the subwebs instead of webs. All users of those arrays changed. (conflicts_between_webs): Use reset_conflicts() for later passes for incremental building. Cache already created conflicts. (alloc_mem): Use mostly realloc_web_parts. Only allocate move_handled and insn_df[] (per create_insn_info) each round. (free_mem): Similar. Don't free all memory here, but ... (free_all_mem): ... here. -- Interference region spilling -- (build_inverse_webs, spill_same_color_p, is_partly_live_1, update_spill_colors, spill_is_free, emit_loads, detect_bbs_for_rewrite, detect_deaths_in_bb, reloads_to_loads, rewrite_program2): New. Interference region spilling. (remember_slot, slots_overlap_p, delete_overlapping_slots, slot_member_p, insert_stores): New. Emit stores slightly more clever than after each def. -- Rematerialisation -- (memref_is_stack_slot, contains_pseudo, want_to_remat, detect_remat_webs): New. Detect rematerializable webs. Set web->pattern to the source rtx. Emit that instead of a reload in emit_loads() later. (delete_useless_defs): New. Delete insns which defined rematerializable webs which got spilled, and are now useless. (spill_coalescing): Don't coalesce rematerializable webs. Update savings due to insns deleting. -- Local spill cost improvements -- (spill_prop_savings, spill_prop_insert, spill_propagation, spill_coalprop): New. Spill propagation and integration with spill coalescing. (try_recolor_web, recolor_spills): New. Recoloring spilled webs if neighbors can be colored differently or spilled and that's cheaper. (actual_spill): Use spill_coalprop(), choose_spill_colors(), rewrite_program2(), insert_stores(), delete_useless_defs() and detect_web_parts_to_rebuild(). -- Optimistic coalescing / coalesce breaking -- (insert_coalesced_conflicts, comp_webs_maxcost, restore_conflicts_from_coalesce, unalias_web, copy_conflict_list, init_web_pairs, add_web_pair_cost, comp_web_pairs, sort_and_combine_web_pairs, aggressive_coalesce, aggressive_coalesce_2, extended_coalesce, extended_coalesce_2, check_uncoalesced_moves): New. Optimistically coalesce nonconflicting webs. (add_conflict_edge): Remember current conflict list when needed. Update num_conflicts only when other web has correct type. (coalesce breaking) (moves_to_webs): Only fill web's moves list, when !flag_ra_optimistic_coalescing. Otherwise that member isn't needed. This way no iterated coalescing is done at all when we want optimistic coalescing. (combine): Can combine webs with any type. Merge spill cost into coalesce target of requested. Merge _all_ conflicts into coalesce target (not only non-selected non-coalesced). -- Various stuff -- (remember_web_was_spilled): Use hard_regs_count. Maintain orig_usable_regs. Web doesn't yet have any neighbors. (detect_spill_temps): Better classification of when webs can't be spilled, or are spill-temps but _can_ be spilled. (determine_web_costs): New. Broken out of parts_to_webs. Correctly calculate spill costs in terms of basic-block frequency (although this seems wrong sometimes). (make_webs): Use detect_remat_webs() and determine_web_costs(). (put_web): Use web_lists[] instead of a directly named list header. (decrement_degree): Use remove_web_from_list. (combine): In case of partial conflicts try to find a corresponding part in the target. Relax spill_temp flag when combining constrained with less constrained webs. (default_spill_heuristic): Coalesced webs are cheaper. Prefer also webs spanning more deaths. (color_usable_p, get_biased_reg): New. Checks if a certain color is usable for a web, and get a available color for a web, prefering biased colors. (calculate_dont_begin): New. Detect colors which are forbidden as begin colors for a web according to already colored neighbors. Broken out from ... (colorize_one_web): ... here. Use calculate_dont_begin. Try to use the color this web got the last round first. Then try biased colors. Better and more methods to respill neighbors for spill_temps or non-spills. Maintain bias_colors for neighbors when requested. When optimistic coloring already break coalescings for spilled webs here and recolor. (rewrite_program): Maintain cost variables. Swap order of load and store emitting. Emit uses here only if flag_ra_spill_every_use. Emit stores only when a stack-slot was allocated. Now not called anymore. Stores are emitted by insert_stores. (one_pass): If requested call aggressive_coalesce() and extended_coalesce_2(). Use dump_igraph_machine(). Check the graph after coloring (check_colors()), break any remaining coalesces to spills if reqeusted and try to recolor spilled webs (recolor_spills). (dump_constraints): Temporarily change final colors into pseudo rtx's. Call constrain_operands() to check if the insns already strictly match, or not. (init_ra): Don't add eliminable registers into never_use_colors. Reload will fix this up, if it's finally not usable. Initialize byte2bitcount[], hardregs_for_mode[], insns_with_deaths and ra_obstack. (setup_renumber): Free ra_reg_renumber only if necessary. (reg_alloc): Initialize debug level to some usefull values. Call regclass() here (instead in toplev.c). Reset some global vars.
Attachment:
ra.diff.gz
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |