[PATCH] Add splay-tree "view" for bitmap

Richard Sandiford richard.sandiford@arm.com
Fri Oct 19 08:44:00 GMT 2018


Richard Biener <rguenther@suse.de> writes:
> On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>Richard Biener <rguenther@suse.de> writes:
>>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>>>
>>>> Richard Biener <rguenther@suse.de> writes:
>>>> > PR63155 made me pick up this old work from Steven, it turns our
>>>> > linked-list implementation to a two-mode one with one being a
>>>> > splay tree featuring O(log N) complexity for find/remove.
>>>> >
>>>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
>>>> > that I use from the debug/print methods to avoid changing view
>>>> > there.  In theory the bitmap iterator could get a "stack"
>>>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>>>> >
>>>> > This can be used to fix the two biggest bottlenecks in the PRs
>>>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>>>> > coalesce list building.  perf shows the following data, first
>>>> > unpatched, second patched - also watch the thrid coulumn (samples)
>>>> > when comparing percentages.
>>>> >
>>>> > -O0
>>>> > -   18.19%    17.35%           407  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 8.77% create_coalesce_list_for_region                     
>>>>>> >       + 4.21% calculate_live_ranges                               
>>>>>> >       + 2.02% build_ssa_conflict_graph                            
>>>>>> >       + 1.66% insert_phi_nodes_for                                
>>>>>> >       + 0.86% coalesce_ssa_name                      
>>>> > patched:
>>>> > -   12.39%    10.48%           129  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 5.27% calculate_live_ranges                               
>>>>>> >       + 2.76% insert_phi_nodes_for                                
>>>>>> >       + 1.90% create_coalesce_list_for_region                     
>>>>>> >       + 1.63% build_ssa_conflict_graph                            
>>>>>> >       + 0.35% coalesce_ssa_name                           
>>>> >
>>>> > -O1
>>>> > -   17.53%    17.53%           842  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 12.39% add_ssa_edge                                       
>>>>>> >       + 1.48% create_coalesce_list_for_region                     
>>>>>> >       + 0.82% solve_constraints                                   
>>>>>> >       + 0.71% calculate_live_ranges                               
>>>>>> >       + 0.64% add_implicit_graph_edge                             
>>>>>> >       + 0.41% insert_phi_nodes_for                                
>>>>>> >       + 0.34% build_ssa_conflict_graph                      
>>>> > patched:
>>>> > -    5.79%     5.00%           167  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 1.41% add_ssa_edge                                        
>>>>>> >       + 0.88% calculate_live_ranges                               
>>>>>> >       + 0.75% add_implicit_graph_edge                             
>>>>>> >       + 0.68% solve_constraints                                   
>>>>>> >       + 0.48% insert_phi_nodes_for                                
>>>>>> >       + 0.45% build_ssa_conflict_graph                   
>>>> >
>>>> > -O3
>>>> > -   12.37%    12.34%          1145  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 9.14% add_ssa_edge                                        
>>>>>> >       + 0.80% create_coalesce_list_for_region                     
>>>>>> >       + 0.69% add_implicit_graph_edge                             
>>>>>> >       + 0.54% solve_constraints                                   
>>>>>> >       + 0.34% calculate_live_ranges                               
>>>>>> >       + 0.27% insert_phi_nodes_for                                
>>>>>> >       + 0.21% build_ssa_conflict_graph                 
>>>> > -    4.36%     3.86%           227  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>>>>> >       + 0.98% add_ssa_edge                                        
>>>>>> >       + 0.86% add_implicit_graph_edge                             
>>>>>> >       + 0.64% solve_constraints                                   
>>>>>> >       + 0.57% calculate_live_ranges                               
>>>>>> >       + 0.32% build_ssa_conflict_graph                            
>>>>>> >       + 0.29% mark_all_vars_used_1                                
>>>>>> >       + 0.20% insert_phi_nodes_for                                
>>>>>> >       + 0.16% create_coalesce_list_for_region             
>>>> >
>>>> >
>>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>>> 
>>>> What's the performance like for more reasonable testcases?  E.g. is
>>there
>>>> a measurable change either way in --enable-checking=release for some
>>gcc
>>>> .iis at -g or -O2 -g?
>>>
>>> I did a quick check on my set of cc1files (still .i, .ii ones tend to
>>> be unusable quite quickly...).  Most of them compile too quickly
>>> to make any difference appear other than noise.  Multi-second
>>differences
>>> like for PR63155 should be the exception but our O(n) bitmap
>>> implementation really makes some parts of GCC quadratic where it
>>> doesn't appear so.
>>>
>>> Is there a reason you expect it to be ever slower?
>>
>>During recent stage3s I've tried to look at profiles of cc1plus
>>to see whether there was something easy we could do to improve
>>compile times.  And bitmap operations always showed up near the
>>top of the profile.  There were always some pathological queries
>>in which the linear search really hurt, but whenever I tried "simple"
>>ways to avoid the obvious cases, they made those queries quicker
>>but slowed things down overall.  It seemed that adding any extra logic
>>to the queries hurted because even a small slowdown in common lookups
>>overwhelmed a big saving in less common lookups.
>
> Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> I guess in the end well predicted branches in the out of line code are
> important...
>
>>
>>But there again I was looking to speed up more "normal" cases, not ones
>>like the PR.
>>
>>FWIW I've tried it on a local x86_64 box and it slows down current
>>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
>>for the other files I tried.  But that's hardly a huge amount, and
>>probably a price worth paying for the speed-up in the PR.
>
> Just to make sure what to reproduce - this is with checking disabled?
> And with or without the hunks actually making use of the splay tree
> path?

Yeah, with an --enable-checking=release stage3:

   ./cc1plus optabs.ii -o /dev/null -O2 -g

using the optabs.ii from the unpatched --enable-checking=release build.

It was the whole patch vs. without the patch.

Thanks,
Richard



More information about the Gcc-patches mailing list