[PATCH 2/3] ipa-reorder-for-locality - Address compile time issues for locality cloning pass

Prachi Godbole pgodbole@nvidia.com
Tue Jan 6 09:16:34 GMT 2026


Ping

> On 18 Dec 2025, at 9:04 PM, Prachi Godbole <pgodbole@nvidia.com> wrote:
> 
> Ping
> 
>> On 9 Dec 2025, at 3:12 PM, Prachi Godbole <pgodbole@nvidia.com> wrote:
>> 
>> A gentle ping
>> 
>>> On 30 Oct 2025, at 10:42 AM, Prachi Godbole <pgodbole@nvidia.com> wrote:
>>> 
>>> This patch aims to reduce compile time for the locality cloning pass by
>>> cutting down on recursive calls to partition_callchain ().  It does so by
>>> precomputing caller/callee information into locality_info.  locality_info
>>> stores all callees of a node, reached either directly or via inlined nodes,
>>> which avoids calling partition_callchain () for inlined nodes that are
>>> already partitioned with their inlined_to nodes.  locality_info also stores
>>> the accumulated incoming edge frequency per unique caller, so it is no
>>> longer recomputed inside partition_callchain (), along with preaccumulated
>>> and sorted outgoing edge frequencies for unique callees.
>>> 
>>> This patch also refines the is_entry_node_p () check by calling local_p ()
>>> instead of only checking for aliases.
>>> 
>>> A compile time improvement of approximately 45% is observed for the
>>> bootstrap-lto-locality config, which now takes 2-5% more time than
>>> bootstrap-lto.
>>> 
>>> This patch also handles memory management of the pass-specific data
>>> structures.
>>> 
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> Ok for mainline?
>>> 
>>> Thanks,
>>> Prachi
>>> 
>>> Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
>>> 
>>> gcc/ChangeLog:
>>> 
>>> 	* ipa-locality-cloning.cc (struct locality_callee_info): New struct.
>>> 	(struct locality_info): Ditto.
>>> 	(loc_infos): Ditto.
>>> 	(get_locality_info): New function.
>>> 	(sort_all_callees_default): Ditto.
>>> 	(callee_default_cmp): Ditto.
>>> 	(populate_callee_locality_info): Ditto.
>>> 	(populate_caller_locality_info): Ditto.
>>> 	(create_locality_info): Ditto.
>>> 	(adjust_recursive_callees): Access node_to_clone by reference.
>>> 	(inline_clones): Access node_to_clone and clone_to_node by reference.
>>> 	(clone_node_as_needed): Ditto.
>>> 	(accumulate_incoming_edge_frequency): Remove function.
>>> 	(clone_node_p): New function.
>>> 	(partition_callchain): Refactor the function.
>>> 	(is_entry_node_p): Call local_p ().
>>> 	(locality_determine_ipa_order): Call create_locality_info ().
>>> 	(locality_determine_static_order): Ditto.
>>> 	(locality_partition_and_clone): Update call to partition_callchain ()
>>> 	according to the new prototype.
>>> 	(lc_execute): Allocate and free node_to_ch_info, node_to_clone,
>>> 	clone_to_node.
>>> 
>>> <0002-PATCH-2-3-ipa-reorder-for-locality-Address-compile-t.patch>
>> 
> 
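
For readers following the thread, the precomputation described in the quoted
patch description can be sketched roughly as below.  This is an illustrative
sketch only, not code from the patch: Edge, Node, LocalityInfo,
collect_callees and build_locality_info are hypothetical stand-ins rather
than GCC's cgraph types or the patch's functions, and it assumes that edge
frequencies inside an inlined body are already expressed relative to the
function it was inlined into, so they can simply be added up.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for call graph edges and nodes (not GCC's types).
struct Edge {
  int callee;    // index of the called node
  double freq;   // estimated frequency of this call edge
  bool inlined;  // true if the callee body is inlined into the caller
};

struct Node {
  std::vector<Edge> calls;
};

// Hypothetical per-node summary, loosely modelled on the description of
// locality_info in the patch text.
struct LocalityInfo {
  // Accumulated incoming edge frequency per unique caller.
  std::map<int, double> caller_freq;
  // Unique callees with accumulated frequency, hottest first.
  std::vector<std::pair<int, double> > callees;
};

// Accumulate the non-inlined callees reachable from node N, looking through
// inlined edges.  Assumes the inline edges form a tree, so this terminates.
static void collect_callees(const std::vector<Node> &g, int n,
                            std::map<int, double> &out) {
  for (const Edge &e : g[n].calls) {
    if (e.inlined)
      collect_callees(g, e.callee, out);
    else
      out[e.callee] += e.freq;
  }
}

static bool hotter(const std::pair<int, double> &a,
                   const std::pair<int, double> &b) {
  return a.second > b.second;
}

// Build the summaries once, up front, so that a later partitioning walk can
// look them up instead of rescanning edges on every visit.
std::vector<LocalityInfo> build_locality_info(const std::vector<Node> &g) {
  std::vector<LocalityInfo> infos(g.size());

  // Inlined nodes get no summary of their own; their callees are folded
  // into the node they are inlined into.
  std::vector<bool> is_inlined(g.size(), false);
  for (const Node &n : g)
    for (const Edge &e : n.calls)
      if (e.inlined)
        is_inlined[e.callee] = true;

  for (std::size_t n = 0; n < g.size(); ++n) {
    if (is_inlined[n])
      continue;
    std::map<int, double> out;
    collect_callees(g, static_cast<int>(n), out);
    for (const auto &kv : out) {
      infos[n].callees.push_back(kv);
      infos[kv.first].caller_freq[static_cast<int>(n)] += kv.second;
    }
    std::stable_sort(infos[n].callees.begin(), infos[n].callees.end(),
                     hotter);
  }
  return infos;
}
```

With summaries like these, a partitioning routine would only need to visit
non-inlined nodes and could read each node's hottest callees and per-caller
incoming frequencies directly.  How the actual patch lays out and consults
locality_info may well differ.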


