[RFC/RFA] Interface for profile info and size optimization
Richard Guenther
richard.guenther@gmail.com
Sun Jun 15 22:55:00 GMT 2008
On Fri, Jun 6, 2008 at 7:09 AM, Jan Hubicka <jh@suse.cz> wrote:
> Hi,
> this patch should make the interface for profile information easier to use.
> As discussed a few months ago
> (http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01371.html), the
> current predicates provided by the profile infrastructure (maybe_hot,
> probably_cold, probably_never_executed) are somewhat hard to map to the
> question we are usually interested in, namely whether to optimize for size or
> for speed.
>
> This patch adds optimize_for_size_p and optimize_for_speed_p predicates for
> BBs, edges and RTL expansion. They implement the rule that everything that
> might be hot is optimized for speed unless -Os is specified or the function is
> explicitly marked as cold.
>
> There are contexts where we are really interested in the notion of
> coldness/hotness (i.e. when the tradeoff does not involve code size, just
> optimizing one path at the cost of another), so I am keeping both interfaces.
>
> For RTL I've also added the maybe_hot_insn_p global flag. This was
> discussed in the past and Roger suggested that a global current_bb pointer
> is a better interface. I still think that is a mistake, since the current basic
> block is either wrong or inaccurate for several reasons:
>
> 1) During expansion current_bb points to the tree-level basic block, not
> to the basic block the instruction will end up in after splitting. Cold
> sections of expanded tree constructs would then be considered hot.
> With a separate flag, we can change it based on knowledge of what we are
> expanding now (when expanding, for instance, a switch or string operation).
> 2) We emit instructions on edges, where the current basic block is not defined.
> 3) Low-level RTL bits are not aware of the CFG and will probably stay that way.
>
> So instead of current_bb, I've tried to hide the information in the API
> rtl_profile_for_bb, rtl_profile_for_edge and default_rtl_profile, so if we
> decide we want to pass more information currently present in the CFG than just
> the hotness bits, we can easily do that in those functions.
>
> The basic idea is that profile-aware passes will use the proper set function
> and reset the profile when done. I would like to update all existing
> passes to be profile aware once we settle on the API. It should not
> be that difficult.
>
> Bootstrapped/regtested i686-linux, OK?
This looks reasonable, but I'd prefer the simple accessors in predict.c to
move to predict.h.
Please wait for additional comments,
Thanks,
Richard.
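
For reference, the decision the new predicates make can be modeled outside GCC. The sketch below is a standalone approximation, not the patch itself: struct fn_model, struct bb_model and all model_* names are hypothetical stand-ins for cfun, basic_block and the real predicates, and the accessors are written as static inline functions, roughly the shape they might take if moved into a header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-function state the patch consults
   (optimize_size and cfun->function_frequency).  */
struct fn_model
{
  bool optimize_size;      /* Models -Os.  */
  bool unlikely_executed;  /* Models FUNCTION_FREQUENCY_UNLIKELY_EXECUTED.  */
};

/* Hypothetical stand-in for a basic block; maybe_hot models the result
   of maybe_hot_bb_p.  */
struct bb_model
{
  bool maybe_hot;
};

/* Model of always_optimize_for_size_p: size wins for the whole function
   under -Os or when the function itself is cold.  */
static inline bool
model_always_optimize_for_size_p (const struct fn_model *fn)
{
  assert (fn != NULL);
  return fn->optimize_size || fn->unlikely_executed;
}

/* Model of optimize_bb_for_size_p: a block is optimized for size when the
   function always is, or when the block cannot be hot.  */
static inline bool
model_optimize_bb_for_size_p (const struct fn_model *fn,
                              const struct bb_model *bb)
{
  assert (bb != NULL);
  return model_always_optimize_for_size_p (fn) || !bb->maybe_hot;
}

/* Model of optimize_bb_for_speed_p: the exact negation of the size
   predicate, so every block falls into exactly one category.  */
static inline bool
model_optimize_bb_for_speed_p (const struct fn_model *fn,
                               const struct bb_model *bb)
{
  return !model_optimize_bb_for_size_p (fn, bb);
}
```

In this model only a possibly-hot block in a function that is neither compiled with -Os nor marked cold is optimized for speed; every other combination selects size.
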
> * predict.c (always_optimize_for_size_p): New function.
> (optimize_bb_for_size_p, optimize_bb_for_speed_p,
> optimize_edge_for_size_p, optimize_edge_for_speed_p,
> optimize_insn_for_size_p, optimize_insn_for_speed_p): New global
> functions.
> (rtl_profile_for_bb, rtl_profile_for_edge, default_rtl_profile): New.
> * function.c (prepare_function_start): Set default profile.
> * function.h (rtl_data): Add maybe_hot_insn_p.
> * cfgexpand.c (expand_gimple_basic_block): Set RTL profile.
> (construct_exit_block): Likewise.
> (tree_expand_cfg): Likewise.
> * basic-block.h
> (optimize_bb_for_size_p, optimize_bb_for_speed_p,
> optimize_edge_for_size_p, optimize_edge_for_speed_p,
> optimize_insn_for_size_p, optimize_insn_for_speed_p): Declare.
> (rtl_profile_for_bb, rtl_profile_for_edge, default_rtl_profile):
> Declare.
> Index: predict.c
> ===================================================================
> *** predict.c (revision 136426)
> --- predict.c (working copy)
> *************** probably_never_executed_bb_p (const_basi
> *** 178,183 ****
> --- 178,263 ----
> return false;
> }
>
> + /* Return true when current function should always be optimized for size. */
> +
> + static bool
> + always_optimize_for_size_p (void)
> + {
> + return (optimize_size
> + || cfun->function_frequency == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED);
> + }
> +
> + /* Return TRUE when BB should be optimized for size. */
> +
> + bool
> + optimize_bb_for_size_p (basic_block bb)
> + {
> + return always_optimize_for_size_p () || !maybe_hot_bb_p (bb);
> + }
> +
> + /* Return TRUE when BB should be optimized for speed. */
> +
> + bool
> + optimize_bb_for_speed_p (basic_block bb)
> + {
> + return !optimize_bb_for_size_p (bb);
> + }
> +
> + /* Return TRUE when edge E should be optimized for size. */
> +
> + bool
> + optimize_edge_for_size_p (edge e)
> + {
> + return always_optimize_for_size_p () || !maybe_hot_edge_p (e);
> + }
> +
> + /* Return TRUE when edge E should be optimized for speed. */
> +
> + bool
> + optimize_edge_for_speed_p (edge e)
> + {
> + return !optimize_edge_for_size_p (e);
> + }
> +
> + /* Return TRUE when the insn being expanded should be optimized for size. */
> +
> + bool
> + optimize_insn_for_size_p (void)
> + {
> + return always_optimize_for_size_p () || !crtl->maybe_hot_insn_p;
> + }
> +
> + /* Return TRUE when the insn being expanded should be optimized for speed. */
> +
> + bool
> + optimize_insn_for_speed_p (void)
> + {
> + return !optimize_insn_for_size_p ();
> + }
> +
> + /* Set RTL expansion for BB profile. */
> +
> + void
> + rtl_profile_for_bb (basic_block bb)
> + {
> + crtl->maybe_hot_insn_p = maybe_hot_bb_p (bb);
> + }
> +
> + /* Set RTL expansion for edge profile. */
> +
> + void
> + rtl_profile_for_edge (edge e)
> + {
> + crtl->maybe_hot_insn_p = maybe_hot_edge_p (e);
> + }
> +
> + /* Set RTL expansion to default mode (i.e. when profile info is not known). */
> + void
> + default_rtl_profile (void)
> + {
> + crtl->maybe_hot_insn_p = true;
> + }
> +
> /* Return true if the one of outgoing edges is already predicted by
> PREDICTOR. */
>
> Index: function.c
> ===================================================================
> *** function.c (revision 136426)
> --- function.c (working copy)
> *************** prepare_function_start (void)
> *** 3908,3913 ****
> --- 3908,3914 ----
> init_emit ();
> init_varasm_status ();
> init_expr ();
> + default_rtl_profile ();
>
> cse_not_expected = ! optimize;
>
> Index: function.h
> ===================================================================
> *** function.h (revision 136426)
> --- function.h (working copy)
> *************** struct rtl_data GTY(())
> *** 397,402 ****
> --- 397,405 ----
> Set in stmt.c if anything is allocated on the stack there.
> Set in reload1.c if anything is allocated on the stack there. */
> bool frame_pointer_needed;
> +
> + /* When set, expand should optimize for speed. */
> + bool maybe_hot_insn_p;
> };
>
> #define return_label (crtl->x_return_label)
> Index: cfgexpand.c
> ===================================================================
> *** cfgexpand.c (revision 136426)
> --- cfgexpand.c (working copy)
> *************** expand_gimple_basic_block (basic_block b
> *** 1478,1483 ****
> --- 1478,1484 ----
> }
>
> bb->il.tree = NULL;
> + rtl_profile_for_bb (bb);
> init_rtl_bb_info (bb);
> bb->flags |= BB_RTL;
>
> *************** construct_exit_block (void)
> *** 1710,1715 ****
> --- 1711,1718 ----
> edge_iterator ei;
> rtx orig_end = BB_END (EXIT_BLOCK_PTR->prev_bb);
>
> + rtl_profile_for_bb (EXIT_BLOCK_PTR);
> +
> /* Make sure the locus is set to the end of the function, so that
> epilogue line numbers and warnings are set properly. */
> if (cfun->function_end_locus != UNKNOWN_LOCATION)
> *************** tree_expand_cfg (void)
> *** 1843,1848 ****
> --- 1846,1853 ----
> /* Some backends want to know that we are expanding to RTL. */
> currently_expanding_to_rtl = 1;
>
> + rtl_profile_for_bb (ENTRY_BLOCK_PTR);
> +
> insn_locators_alloc ();
> if (!DECL_BUILT_IN (current_function_decl))
> set_curr_insn_source_location (DECL_SOURCE_LOCATION (current_function_decl));
> *************** tree_expand_cfg (void)
> *** 1906,1911 ****
> --- 1911,1919 ----
> lab_rtx_for_bb = pointer_map_create ();
> FOR_BB_BETWEEN (bb, init_block->next_bb, EXIT_BLOCK_PTR, next_bb)
> bb = expand_gimple_basic_block (bb);
> +
> + /* Expansion is used by optimization passes too, set maybe_hot_insn_p
> + conservatively to true until they are all profile aware. */
> pointer_map_destroy (lab_rtx_for_bb);
> free_histograms ();
>
> *************** tree_expand_cfg (void)
> *** 1974,1979 ****
> --- 1982,1988 ----
> /* Tag the blocks with a depth number so that change_scope can find
> the common parent easily. */
> set_block_levels (DECL_INITIAL (cfun->decl), 0);
> + default_rtl_profile ();
> return 0;
> }
>
> Index: basic-block.h
> ===================================================================
> *** basic-block.h (revision 136426)
> --- basic-block.h (working copy)
> *************** extern bool maybe_hot_bb_p (const_basic_
> *** 830,835 ****
> --- 830,841 ----
> extern bool maybe_hot_edge_p (edge);
> extern bool probably_cold_bb_p (const_basic_block);
> extern bool probably_never_executed_bb_p (const_basic_block);
> + extern bool optimize_bb_for_size_p (basic_block);
> + extern bool optimize_bb_for_speed_p (basic_block);
> + extern bool optimize_edge_for_size_p (edge);
> + extern bool optimize_edge_for_speed_p (edge);
> + extern bool optimize_insn_for_size_p (void);
> + extern bool optimize_insn_for_speed_p (void);
> extern bool tree_predicted_by_p (const_basic_block, enum br_predictor);
> extern bool rtl_predicted_by_p (const_basic_block, enum br_predictor);
> extern void tree_predict_edge (edge, enum br_predictor, int);
> *************** bb_has_abnormal_pred (basic_block bb)
> *** 987,992 ****
>
> /* In cfgloopmanip.c. */
> extern edge mfb_kj_edge;
> ! bool mfb_keep_just (edge);
>
> #endif /* GCC_BASIC_BLOCK_H */
> --- 993,1003 ----
>
> /* In cfgloopmanip.c. */
> extern edge mfb_kj_edge;
> ! extern bool mfb_keep_just (edge);
> !
> ! /* In predict.c. */
> ! extern void rtl_profile_for_bb (basic_block);
> ! extern void rtl_profile_for_edge (edge);
> ! extern void default_rtl_profile (void);
>
> #endif /* GCC_BASIC_BLOCK_H */
> Index: config/i386/i386.c
> ===================================================================
> *** config/i386/i386.c (revision 136426)
> --- config/i386/i386.c (working copy)
> *************** standard_80387_constant_p (rtx x)
> *** 5746,5752 ****
> /* For XFmode constants, try to find a special 80387 instruction when
> optimizing for size or on those CPUs that benefit from them. */
> if (mode == XFmode
> ! && (optimize_size || TARGET_EXT_80387_CONSTANTS))
> {
> int i;
>
> --- 5746,5752 ----
> /* For XFmode constants, try to find a special 80387 instruction when
> optimizing for size or on those CPUs that benefit from them. */
> if (mode == XFmode
> ! && (optimize_insn_for_size_p () || TARGET_EXT_80387_CONSTANTS))
> {
> int i;
>
> *************** decide_alg (HOST_WIDE_INT count, HOST_WI
> *** 15447,15458 ****
> || (alg != rep_prefix_1_byte \
> && alg != rep_prefix_4_byte \
> && alg != rep_prefix_8_byte))
>
> *dynamic_check = -1;
> if (memset)
> ! algs = &ix86_cost->memset[TARGET_64BIT != 0];
> else
> ! algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
> if (stringop_alg != no_stringop && ALG_USABLE_P (stringop_alg))
> return stringop_alg;
> /* rep; movq or rep; movl is the smallest variant. */
> --- 15447,15461 ----
> || (alg != rep_prefix_1_byte \
> && alg != rep_prefix_4_byte \
> && alg != rep_prefix_8_byte))
> + const struct processor_costs *cost;
> +
> + cost = optimize_insn_for_size_p () ? &size_cost : ix86_cost;
>
> *dynamic_check = -1;
> if (memset)
> ! algs = &cost->memset[TARGET_64BIT != 0];
> else
> ! algs = &cost->memcpy[TARGET_64BIT != 0];
> if (stringop_alg != no_stringop && ALG_USABLE_P (stringop_alg))
> return stringop_alg;
> /* rep; movq or rep; movl is the smallest variant. */
>
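
The set/reset discipline described in the patch, together with the cost-table choice made in decide_alg, can be sketched as a small standalone model. Everything here is an assumption made for illustration: model_crtl stands in for crtl, the model_* functions for the new API, and struct cost_model with its two made-up tables for size_cost and ix86_cost. None of it is GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the maybe_hot_insn_p flag added to rtl_data.  */
struct rtl_state_model
{
  bool maybe_hot_insn_p;
};

static struct rtl_state_model model_crtl = { true };

/* Model of rtl_profile_for_bb: record the hotness of the block whose
   instructions are about to be emitted.  */
static void
model_rtl_profile_for_bb (bool bb_maybe_hot)
{
  model_crtl.maybe_hot_insn_p = bb_maybe_hot;
}

/* Model of default_rtl_profile: with no profile known, assume hot.  */
static void
model_default_rtl_profile (void)
{
  model_crtl.maybe_hot_insn_p = true;
}

/* Model of optimize_insn_for_size_p; ALWAYS_SIZE models the result of
   always_optimize_for_size_p.  */
static bool
model_optimize_insn_for_size_p (bool always_size)
{
  return always_size || !model_crtl.maybe_hot_insn_p;
}

/* Hypothetical cost tables; the numbers are made up.  */
struct cost_model
{
  int memcpy_cost;
};

static const struct cost_model model_size_cost = { 1 };
static const struct cost_model model_speed_cost = { 4 };

/* A pass-like helper: set the profile for the block, pick the cost table
   the way the decide_alg hunk does, then reset to the default profile.  */
static const struct cost_model *
model_expand_in_bb (bool bb_maybe_hot, bool always_size)
{
  const struct cost_model *cost;

  model_rtl_profile_for_bb (bb_maybe_hot);
  cost = model_optimize_insn_for_size_p (always_size)
         ? &model_size_cost : &model_speed_cost;
  model_default_rtl_profile ();

  /* The default profile must be restored for passes that are not yet
     profile aware.  */
  assert (model_crtl.maybe_hot_insn_p);
  return cost;
}
```

Resetting at the end matters in this model because later callers that never set a profile would otherwise inherit the hotness of whichever block was expanded last.
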