Using the program in PR 10060, to generate 1000 and 2000, I see that "IV OPT" goes from: tree iv optimization : 1.79 (14%) usr 0.15 ( 5%) sys 2.60 (13%) wall to tree iv optimization : 6.92 (13%) usr 0.29 ( 5%) sys 7.30 (12%) wall which looks like PHI iv opt does not scale O(n). This is not as importran as PR 18594 though (which is the same testcase).
if IV opts doesn't scale, then why is the compile time ~14% in both cases?
use this awk script to generate testcase (first arg is number of loops): ---------------------------------------------------------------------------------------- BEGIN { ORS="" print "int f ()\n{\tint " for (j = 0; j < ARGV [1]; j++) print "j" j ", " print "a;\n\ta = 0;\n" print "\tfor (j0 = 0; j0 < 2; j0 ++)\n" for (j = 1; j < ARGV [1]; j++) print "\tfor (j" j " = j" j-1 "; j" j " < 2; j" j" ++)\n" print "\ta += " for (j = 0; j < ARGV [1]-1; j++) print "j" j " + " print "j" j ";\n\treturn a;\n}\n" } ---------------------------------------------------------------------------------------- N loops Time, s 5 0.05 10 0.17 15 0.38 20 1.14 25 2.81 30 4.68 35 7.52 40 13.6 45 21.8 50 25.6 55 33.9
Of course it's not very useful to claim that some algorithm is O(N^x) when you don't say what N is...
(In reply to comment #3) > Of course it's not very useful to claim that some algorithm > is O(N^x) when you don't say what N is... N is number of nested loops.
IVOPTs should behave quadratically on this type of tests; I do not know where does the extra O(N) factor come from, and I would not expect the times to grow this fast.
It seems that extra O(N) factor comes from high memory usage. For smaller N (0..30) compiler behaves like O(2) and for larger (> 50) like O(2.5 .. 3). gnuplot> fit [0:30] A*x**k+B "data" via A,k,B [snip] Final set of parameters Asymptotic Standard Error ======================= ========================== A = 0.000465461 +/- 2.625e-05 (5.64%) k = 1.90612 +/- 0.01652 (0.8667%) B = 0.0328722 +/- 0.0007677 (2.335%) gnuplot> fit [50:175] A*x**k+B "data" via A,k,B [snip] Final set of parameters Asymptotic Standard Error ======================= ========================== A = 4.76329e-06 +/- 3.493e-07 (7.332%) k = 3.0033 +/- 0.01405 (0.4678%) B = 0.392746 +/- 0.02972 (7.567%) For N=175 memory usage on my machine exceedes 600 MB
hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing)
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing) that might be the case, but I don't think it is likely (also just guessing :-) As far as I can tell all bitmaps in ivopts should be small for this testcase.
It's definitely not the bitmaps: time seconds seconds calls s/call s/call name 10.28 25.14 25.14 24356740 0.00 0.00 ggc_alloc_stat 5.39 38.32 13.18 3133403 0.00 0.00 instantiate_parameters_1 5.25 51.15 12.83 34260186 0.00 0.00 htab_find_slot_with_hash 4.41 61.94 10.79 40175413 0.00 0.00 build_int_cst_wide 4.35 72.58 10.64 1139473 0.00 0.00 chrec_convert 3.79 81.86 9.28 1810 0.01 0.01 htab_empty 3.63 90.73 8.87 22863854 0.00 0.00 build3_stat 3.41 99.07 8.34 800 0.01 0.01 for_each_insn_in_loop 2.74 105.76 6.69 415264 0.00 0.00 follow_ssa_edge 2.64 112.21 6.45 3085512 0.00 0.00 find_interesting_uses_stmt 2.14 117.45 5.24 3197505 0.00 0.00 record_invariant 2.13 122.66 5.21 9695984 0.00 0.00 analyze_scalar_evolution_1 1.65 126.70 4.04 11769469 0.00 0.00 comptypes 1.64 130.70 4.00 11710741 0.00 0.00 fold_convert_const 1.63 134.68 3.98 23551936 0.00 0.00 make_node_stat 1.44 138.19 3.51 11464308 0.00 0.00 force_fit_type 1.28 141.31 3.12 12009984 0.00 0.00 fold_convert 1.25 144.37 3.06 21935414 0.00 0.00 eq_scev_info 1.18 147.26 2.89 9695984 0.00 0.00 analyze_scalar_evolution 1.09 149.93 2.67 326865 0.00 0.00 chrec_fold_plus_1 1.01 152.41 2.48 200 0.01 0.94 tree_ssa_iv_optimize_loop 1.01 154.88 2.47 29473273 0.00 0.00 is_gimple_min_invariant 0.99 157.30 2.43 26291603 0.00 0.00 flow_bb_inside_loop_p 0.98 159.70 2.40 22605936 0.00 0.00 flow_loop_nested_p 0.95 162.03 2.33 3135046 0.00 0.00 htab_delete 0.95 164.35 2.32 164459 0.00 0.00 htab_expand
More profile data: ----------------------------------------------- 0.00 187.04 1/1 execute_pass_list [6] [7] 76.5 0.00 187.04 1 tree_ssa_iv_optimize [7] 2.48 184.56 200/200 tree_ssa_iv_optimize_loop [8] 0.00 0.00 1/201 free_loop_data [349] 0.00 0.00 2/30137 bitmap_obstack_free [178] 0.00 0.00 201/7042572 xcalloc [127] 0.00 0.00 3/2369 varray_init [1063] 0.00 0.00 2/36950 bitmap_obstack_alloc [794] ----------------------------------------------- 2.48 184.56 200/200 tree_ssa_iv_optimize [7] [8] 76.5 2.48 184.56 200 tree_ssa_iv_optimize_loop [8] 30.61 134.21 309011/311017 simple_iv <cycle 2> [35] 6.45 9.73 3085512/3085512 find_interesting_uses_stmt [25] 0.04 1.03 200/202 scev_reset [93] 0.00 0.76 200/200 determine_use_iv_costs [107] 0.00 0.50 200/200 rewrite_uses [142] 0.00 0.35 200/202 loop_commit_inserts [181] 0.11 0.11 20100/20100 find_interesting_uses_cond [241] 0.02 0.09 200/311017 number_of_iterations_exit <cycle 2> [41] 0.06 0.04 200/201 free_loop_data [349] 0.00 0.09 200/200 create_new_ivs [360] 0.00 0.08 200/401 get_loop_body_in_dom_order [274] 0.00 0.06 200/2601 get_loop_body [111] 0.05 0.00 200/200 remove_unused_ivs [440] 0.02 0.03 19703/99904 find_interesting_uses_outer_or_nonlin [223] 0.00 0.04 2004/2804 add_candidate [413] 0.00 0.02 200/200 find_optimal_iv_set [609] 0.00 0.02 20507/40407 set_iv [503] 0.00 0.02 800/800 add_iv_value_candidates [689] 0.01 0.01 80200/26291603 flow_bb_inside_loop_p [44] 0.01 0.00 800/30137 bitmap_obstack_free [178] 0.00 0.00 1006/5012 force_var_cost [590] 0.00 0.00 200/200 iv_ca_free [1067] 0.00 0.00 201/3005 add_candidate_1 [530] 0.00 0.00 20708/370258 zero_p [517] 0.00 0.00 1399/3525413 get_iv [69] 0.00 0.00 402/12009984 fold_convert [17] 0.00 0.00 1198/3329551 is_gimple_reg [71] 0.00 0.00 1202/40175413 build_int_cst_wide [26] 0.00 0.00 200/200 single_dom_exit [1344] 0.00 0.00 402/132285 find_edge [436] 0.00 0.00 1003/8376514 bitmap_set_bit [92] 0.00 0.00 20507/20507 contains_abnormal_ssa_name_p [1427] 0.00 0.00 201/57945 int_cst_value [804] 0.00 0.00 1202/22889750 build_int_cst [141] 0.00 0.00 1006/5034 add_cost [1463] 0.00 0.00 1007/22165 cst_and_fits_in_hwi [1748] 0.00 0.00 402/1401 loop_latch_edge [2001] 0.00 0.00 201/1597 loop_preheader_edge [1984] ----------------------------------------------- [9] 67.8 30.81 135.08 311017+24191806 <cycle 2 as a whole> [9] 13.18 43.06 3133403+18788021 instantiate_parameters_1 <cycle 2> [11] 5.21 44.15 9695984 analyze_scalar_evolution_1 <cycle 2> [12] 0.22 20.27 703698 interpret_rhs_modify_expr <cycle 2> [22] 2.89 7.22 9695984 analyze_scalar_evolution <cycle 2> [32] 0.57 6.74 313469+81390 follow_ssa_edge_in_rhs <cycle 2> [38] 6.69 0.56 415264+1352605 follow_ssa_edge <cycle 2> [39] 0.22 4.74 21906 number_of_iterations_exit <cycle 2> [41] 0.05 1.16 30495 number_of_iterations_in_loop <cycle 2> [86] 0.04 0.07 43784 instantiate_parameters <cycle 2> [347] 0.05 0.01 65718 simplify_using_outer_evolutions <cycle 2> [441] 0.05 0.00 30295 compute_overall_effect_of_inner_loop <cycle 2> [459] ----------------------------------------------- 22088230 chrec_convert [10] 0.95 4.12 101392/1139473 instantiate_parameters_1 <cycle 2> [11] 1.23 5.36 131992/1139473 follow_ssa_edge_in_rhs <cycle 2> [38] 3.78 16.43 404782/1139473 interpret_rhs_modify_expr <cycle 2> [22] 4.68 20.35 501307/1139473 analyze_scalar_evolution_1 <cycle 2> [12] [10] 23.3 10.64 46.26 1139473+22088230 chrec_convert [10] 3.01 17.81 11575896/12009984 fold_convert [17] 4.15 13.69 10684608/22863854 build3_stat [13] 2.97 1.31 11044115/40175413 build_int_cst_wide [26] 0.76 0.94 340478/806987 fold <cycle 7> [47] 1.37 0.00 23227703/23318505 chrec_type [77] 0.25 0.00 11044115/22889750 build_int_cst [141] 22088230 chrec_convert [10] ----------------------------------------------- 18788021 instantiate_parameters_1 <cycle 2> [11] 43784 instantiate_parameters <cycle 2> [347] 3089619 simple_iv <cycle 2> [35] [11] 23.0 13.18 43.06 3133403+18788021 instantiate_parameters_1 <cycle 2> [11] 1.28 8.71 11465180/11465180 set_instantiated_value [33] 2.25 7.43 5793593/22863854 build3_stat [13] 0.85 5.09 104374/326865 chrec_fold_plus_1 [24] 0.95 4.12 101392/1139473 chrec_convert [10] 1.51 0.67 5628417/40175413 build_int_cst_wide [26] 1.92 0.03 5773179/5820723 htab_find_with_hash [67] 1.84 0.00 21921424/29473273 is_gimple_min_invariant [56] 0.82 0.70 8874753/26291603 flow_bb_inside_loop_p [44] 0.90 0.28 5773179/5784085 htab_find [88] 1.14 0.00 5732590/6202135 bitmap_clear_bit [85] 0.85 0.00 5732590/5756097 find_common_loop [104] 0.78 0.00 5732590/8376514 bitmap_set_bit [92] 0.67 0.00 5732590/10054625 bitmap_bit_p [89] 0.13 0.00 5628417/22889750 build_int_cst [141] 0.05 0.06 21892/806987 fold <cycle 7> [47] 0.01 0.03 21892/609615 build2_stat [98] 0.01 0.01 82482/3352077 chrec_fold_plus [122] 0.00 0.00 21892/2867796 chrec_fold_minus [154] 5732590 analyze_scalar_evolution <cycle 2> [32] 18788021 instantiate_parameters_1 <cycle 2> [11] ----------------------------------------------- 9695984 analyze_scalar_evolution <cycle 2> [32] [12] 20.2 5.21 44.15 9695984 analyze_scalar_evolution_1 <cycle 2> [12] 4.68 20.35 501307/1139473 chrec_convert [10] 1.65 9.88 202391/326865 chrec_fold_plus_1 [24] 0.45 4.61 6482915/15734611 find_var_scev_info [29] 0.88 0.75 9534492/26291603 flow_bb_inside_loop_p [44] 0.17 0.55 428920/22863854 build3_stat [13] 0.05 0.06 21785/806987 fold <cycle 7> [47] 0.01 0.03 202389/3352077 chrec_fold_plus [122] 0.02 0.00 70699/70699 chrec_merge [642] 0.01 0.00 70699/1575094 loop_phi_node_p [202] 0.00 0.00 2/2867796 chrec_fold_minus [154] 703698 interpret_rhs_modify_expr <cycle 2> [22] 70699 follow_ssa_edge <cycle 2> [39] -----------------------------------------------
Looks to me like Seb may want to look at this bug too...
One more, because I also run out of memory pretty quickly: ----------------------------------------------- 0.00 0.00 1/22863854 c_build_bind_expr [1608] 0.00 0.00 1/22863854 thread_across_edge [392] 0.00 0.00 199/22863854 estimate_numbers_of_iterations [184] 0.00 0.00 200/22863854 c_finish_loop [754] 0.00 0.01 9801/22863854 follow_ssa_edge_in_rhs <cycle 2> [38] 0.02 0.06 50602/22863854 add_to_evolution [324] 0.10 0.32 247913/22863854 simple_iv <cycle 2> [35] 0.17 0.55 428920/22863854 analyze_scalar_evolution_1 <cycle 2> [12] 2.19 7.24 5648016/22863854 chrec_fold_plus_1 [24] 2.25 7.43 5793593/22863854 instantiate_parameters_1 <cycle 2> [11] 4.15 13.69 10684608/22863854 chrec_convert [10] [13] 15.6 8.87 29.30 22863854 build3_stat [13] 3.86 25.44 22863854/23551936 make_node_stat [15] -----------------------------------------------
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) I believe seb/zdenek already submitted patches for speeding up scev quite recently, with the goal of alleviating this problem. I'm pretty sure they have not been applied yet.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > I believe seb/zdenek already submitted patches for speeding up scev quite > recently, with the goal of alleviating this problem. > I'm pretty sure they have not been applied yet. the Sebastian's patch is not in yet; it might help a bit in this case, a believe, although I suspect a real source of the problem is elsewhere.
Sebastian's patch does not help significantly :-(
One extra N factor seems to be coming from simple_iv (analyze_scalar_evolution_in_loop). The function was created before loop closed ssa form, under assumptions of lcssa it should be possible to get rid of this. I am working on patch
Also did not help much.
The patch that causes ivopts to reset just the relevant parts of the scev cache (just regtesting) instead of clearing it completely helps a bit -- the compile time for N=100 gets about 2x better and memory consumption about 10x. Still there remain some inefficiences within the scev analysis itself.
On a side note, PRE also seems to have problems with the testcase. With the patch mentioned above, the largest consumers of compile time are ivopts (45%) and pre (20%).
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) On Sun, 24 Jan 2005, rakdver at gcc dot gnu dot org wrote: > > ------- Additional Comments From rakdver at gcc dot gnu dot org 2005-01-24 01:46 ------- > On a side note, PRE also seems to have problems with the testcase. With the > patch mentioned above, the largest consumers of compile time are ivopts (45%) > and pre (20%). Uh, there was a bug filed about this, and i fixed it, last i looked.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) >> On a side note, PRE also seems to have problems with the testcase. With the >> patch mentioned above, the largest consumers of compile time are ivopts (45%) >> and pre (20%). > > Uh, there was a bug filed about this, and i fixed it, last i looked. Here it is. Bug 18673. N=100 takes .25 seconds now. I can't make PRE much faster than that, that's almost all hash function time. --Dan
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > Still there remain some inefficiences within the scev analysis itself. > Zdenek, have you tried to revert the patch that caches the instantiation information? This could speed up things a little. On a side note, I wonder how many times we're using this instantiation cache, in other words whether we pay a high price for the scev analysis for some (probably) rare cases...
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > Still there remain some inefficiences within the scev analysis itself. > > > > Zdenek, have you tried to revert the patch that caches the > instantiation information? This could speed up things a little. > > On a side note, I wonder how many times we're using this instantiation > cache, in other words whether we pay a high price for the scev > analysis for some (probably) rare cases... Adding the instantiation cache was compile time neutral on normal code, so I don't think the effect on scev cost is significant. The problem is that we end up calling the instantiate_parameters_1 function 83214+2453273 (outside + recursive) times (for N=100). We also call analyze_scalar_evolution_1 1146317 times. Many of these calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat). Large part of 5142869 of build_int_cst_wide calls is very likely also due to scev analysis. This is not going to be cheap. Removing the instantiation cache definitely would not decrease the number of instantiate_parameters_1 calls (might increase it, in fact, although I don't know how significantly). One part of the problem is that loop optimizers need to throw away the scev caches after each change to loops, which leads to recounting the information too much. Allowing to invalidate only changed parts helps, but seems to be relatively costly on normal code -- you need to scan the hash table for things that refer to removed or from some other reason invalidated ssa names, which is slow. And to make things worse, this change of course means that most of the information is left in the hash table, i.e. the size of the table grows and these scans get slower and slower. The alternative -- checking the validity of entries only when querying the cache -- is not possible, since we reuse the released ssa names, so we would be unable to recognize the validity of the information. Other part is that scev tries to be too clever. Without need to represent nonaffine induction variables (that we do not use anywhere) we could use more memory efficient representation of ivs. Without need to handle symbolic references (that we also do not use anywhere, we could store evolutions in a fully instantiated form, and we would not need instantiate_parameters/resolve_mixers functions at all.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > Other part is that scev tries to be too clever. Without need to > represent nonaffine induction variables (that we do not use anywhere) > we could use more memory efficient representation of ivs. > Without need to handle symbolic references (that we also do not use > anywhere, we could store evolutions in a fully instantiated form, and > we would not need instantiate_parameters/resolve_mixers functions atall. Uh, symbolic references are or will be used to do data dependence when MEM_REF and ARRAY_REF couldn't be generated from the pointers.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > Other part is that scev tries to be too clever. Without need to > > represent nonaffine induction variables (that we do not use anywhere) > > we could use more memory efficient representation of ivs. > > Without need to handle symbolic references (that we also do not use > > anywhere, we could store evolutions in a fully instantiated form, and > > we would not need instantiate_parameters/resolve_mixers functions atall. > > Uh, symbolic references are or will be used to do data dependence when > MEM_REF and ARRAY_REF couldn't be generated from the pointers. How? If the reference is left in symbolic form, it means that you know nothing about its value, so the only thing you can do with it is to check its equality/inequality, in code like while (...) { i = something_weird (); a[i] = ...; (a) a[i+1] = ...; (b) a[i] = ...; (c) } to find that (b) is independent on (a) and (c) and to find the dependence between (a) and (c), but you do not need scev for this -- value numbering info is enough.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) On Mon, 24 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-01-24 23:09 ------- > Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > >>> Other part is that scev tries to be too clever. Without need to >>> represent nonaffine induction variables (that we do not use anywhere) >>> we could use more memory efficient representation of ivs. >>> Without need to handle symbolic references (that we also do not use >>> anywhere, we could store evolutions in a fully instantiated form, and >>> we would not need instantiate_parameters/resolve_mixers functions atall. >> >> Uh, symbolic references are or will be used to do data dependence when >> MEM_REF and ARRAY_REF couldn't be generated from the pointers. > > How? If the reference is left in symbolic form, it means that you know > nothing about its value, Wrong. We could know things about it's value, such as ranges. We just don't know an exact number.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > >>> Other part is that scev tries to be too clever. Without need to > >>> represent nonaffine induction variables (that we do not use anywhere) > >>> we could use more memory efficient representation of ivs. > >>> Without need to handle symbolic references (that we also do not use > >>> anywhere, we could store evolutions in a fully instantiated form, and > >>> we would not need instantiate_parameters/resolve_mixers functions atall. > >> > >> Uh, symbolic references are or will be used to do data dependence when > >> MEM_REF and ARRAY_REF couldn't be generated from the pointers. > > > > How? If the reference is left in symbolic form, it means that you know > > nothing about its value, > > Wrong. > > We could know things about it's value, such as ranges. We just don't know > an exact number. OK. This is work for VRP, not scev.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) >>>> >>>> Uh, symbolic references are or will be used to do data dependence when >>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers. >>> >>> How? If the reference is left in symbolic form, it means that you know >>> nothing about its value, >> >> Wrong. >> >> We could know things about it's value, such as ranges. We just don't know >> an exact number. > > OK. This is work for VRP, not scev. And once we have it, we can talk about removing the symbolic stuff from SCEV :)
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > >>>> Uh, symbolic references are or will be used to do data dependence when > >>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers. > >>> > >>> How? If the reference is left in symbolic form, it means that you know > >>> nothing about its value, > >> > >> Wrong. > >> > >> We could know things about it's value, such as ranges. We just don't know > >> an exact number. > > > > OK. This is work for VRP, not scev. > > And once we have it, we can talk about removing the symbolic stuff from > SCEV :) Once we use any value range info from SCEV, we can speak about leaving symbolic references in SCEV until we have a proper VRP :-)
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) On Mon, 25 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-01-25 00:49 ------- > Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > >>>>>> Uh, symbolic references are or will be used to do data dependence when >>>>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers. >>>>> >>>>> How? If the reference is left in symbolic form, it means that you know >>>>> nothing about its value, >>>> >>>> Wrong. >>>> >>>> We could know things about it's value, such as ranges. We just don't know >>>> an exact number. >>> >>> OK. This is work for VRP, not scev. >> >> And once we have it, we can talk about removing the symbolic stuff from >> SCEV :) > > Once we use any value range info from SCEV, we can speak about leaving > symbolic references in SCEV until we have a proper VRP :-) See autovec branch.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > See autovec branch. You could also look at recent patches posted by sebastian and i for the autovect branch that have been adding this support.
Which one? I cannot find anything relevant in changelog.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > Which one? I cannot find anything relevant in changelog. > * tree-data-ref.c (analyze_subscript_affine_affine): Implement a solution for the FIXME concerning the symbolic affine dependence testing; remove the FIXME. (can_use_analyze_subscript_affine_affine): New function. (analyze_siv_subscript): Use it. and 2004-12-15 Daniel Berlin <dberlin@dberlin.org> * Makefile.in (tree-chrec.o): Add cfgloop.h * tree-chrec.c: Add cfgloop.h, tree-flow.h. (evolution_function_is_invariant_p): New function. (evolution_function_is_affine_multivariate_p): Use evolution_function_is_invariant_p instead of evolution_function_is_constant_p. * tree-chrec.h: Add prototype for evolution_function_is_invariant_p. (evolution_function_is_affine_p): Use evolution_function_is_invariant_p. * tree-data-ref.c (analyze_overlapping_iterations): chrecs that are equal overlap on every iteration. This stuff is just simple symbolic differencing, and checking of invariantness of symbols, But it is indeed starting to se the symbolic scev info
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > Which one? I cannot find anything relevant in changelog. > > > > > * tree-data-ref.c (analyze_subscript_affine_affine): Implement a > solution for the FIXME concerning the symbolic affine > dependence testing; remove the FIXME. > (can_use_analyze_subscript_affine_affine): New function. > (analyze_siv_subscript): Use it. > > and > > > 2004-12-15 Daniel Berlin <dberlin@dberlin.org> > > * Makefile.in (tree-chrec.o): Add cfgloop.h > > * tree-chrec.c: Add cfgloop.h, tree-flow.h. > (evolution_function_is_invariant_p): New function. > (evolution_function_is_affine_multivariate_p): Use > evolution_function_is_invariant_p instead of > evolution_function_is_constant_p. > > * tree-chrec.h: Add prototype for > evolution_function_is_invariant_p. > (evolution_function_is_affine_p): Use > evolution_function_is_invariant_p. > > * tree-data-ref.c (analyze_overlapping_iterations): chrecs that > are equal overlap on every iteration. > > This stuff is just simple symbolic differencing, and checking of > invariantness of symbols, > But it is indeed starting to se the symbolic scev info Ugh... OK, if you think it is a right way... Anyway, I am seriously considering resurrecting the simple iv analysis for purposes of passes that are not interested in this fancy stuff. Overhead of scev is probably acceptable if it is only used for this type of analysis, but for other purposes it is clearly overkill.
Do remember that this bug is about bad behavior with deeply nested loops and we already have other algorithms that are quadratic in the loop nest depth. So I really wouldn't consider this to be a very serious problem, but rather something that could be improved. It is a shame that Seb has so far not commented on the behavior of his scev algorithm with respect to the loop nest depth. Is it expected to be cubic in the loop nest depth? Perhaps he can improve the algorithm?
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) >> >> * tree-data-ref.c (analyze_overlapping_iterations): chrecs that >> are equal overlap on every iteration. >> >> This stuff is just simple symbolic differencing, and checking of >> invariantness of symbols, >> But it is indeed starting to se the symbolic scev info > > Ugh... OK, if you think it is a right way... Anyway, I am seriously > considering resurrecting the simple iv analysis for purposes of passes > that are not interested in this fancy stuff. Overhead of scev is > probably acceptable if it is only used for this type of analysis, > but for other purposes it is clearly overkill. Uh, almost all high level loop optimizations are going to do very badly in the depth of the loop nest. The fact that scev doesn't do so well on a 100 deep loop nest doesn't really concern me at all.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > Do remember that this bug is about bad behavior with deeply nested loops > and we already have other algorithms that are quadratic in the loop nest > depth. So I really wouldn't consider this to be a very serious problem, > but rather something that could be improved. > > It is a shame that Seb has so far not commented on the behavior of his > scev algorithm with respect to the loop nest depth. Is it expected to > be cubic in the loop nest depth? Perhaps he can improve the algorithm? the algorithm itself is not cubic with loop depth, but worst case linear (per query) (*). This time complexity is acceptable, IMHO. (*) I hope; scev is a mess of mutualy recursive functions -- analyze_scalar_evolution calling number_of_iterations calling instantiate_parameters calling analyze_scalar_evolution again, with instantiate_parameters hacked so that the infinite cycle cannot occur is my favourite one. Nobody can say anything sure about behavior of scev -- it is not even well defined what analyze_scalar_evolutions will return to you, unless you call instantiate_parameters or resolve_mixers on the result.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > >> * tree-data-ref.c (analyze_overlapping_iterations): chrecs that > >> are equal overlap on every iteration. > >> > >> This stuff is just simple symbolic differencing, and checking of > >> invariantness of symbols, > >> But it is indeed starting to se the symbolic scev info > > > > Ugh... OK, if you think it is a right way... Anyway, I am seriously > > considering resurrecting the simple iv analysis for purposes of passes > > that are not interested in this fancy stuff. Overhead of scev is > > probably acceptable if it is only used for this type of analysis, > > but for other purposes it is clearly overkill. > > Uh, almost all high level loop optimizations are going to do very badly in > the depth of the loop nest. Remove "almost" and "high level" from this sentence and I will agree with you. This really does not worry me that much. > The fact that scev doesn't do so well on a 100 deep loop nest doesn't > really concern me at all. Me neither. Its time and memory consumption on normal testcases however is not entirely satisfactory (not critical, but there definitely is a place for improvement), and implementation could (should?) also be significantly cleaned up.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > Adding the instantiation cache was compile time neutral on normal code, > so I don't think the effect on scev cost is significant. > How that? You end up querying and caching an evolution for every instantiation of an SSA_NAME! I will quantify the number of queries wrt the number of times this information is useful. I think that with my patch, the instantiation cache information is used zero times on a bootstrap. > The problem is that we end up calling the instantiate_parameters_1 > function 83214+2453273 (outside + recursive) times (for N=100). > We also call analyze_scalar_evolution_1 1146317 times. Many of these > calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat). > Large part of 5142869 of build_int_cst_wide calls is very likely also > due to scev analysis. This is not going to be cheap. Removing the > instantiation cache definitely would not decrease the number of > instantiate_parameters_1 calls (might increase it, in fact, although > I don't know how significantly). > This also could be a bad use of the scev analysis. For Steven: you can have a O(N**3) algorithm very simply: loop O(N) times loop O(N) times algorithm in O(N) > One part of the problem is that loop optimizers need to throw away the > scev caches after each change to loops, which leads to recounting the > information too much. Allowing to invalidate only changed parts helps, > but seems to be relatively costly on normal code -- you need to scan the > hash table for things that refer to removed or from some other reason > invalidated ssa names, which is slow. Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time you'll ask for the evolution of this SSA_NAME you'll analyze the evolution from scratch.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > How? If the reference is left in symbolic form, it means that you know > nothing about its value, so the only thing you can do with it is to > check its equality/inequality, in code like > > while (...) > { > i = something_weird (); > > a[i] = ...; (a) > a[i+1] = ...; (b) > a[i] = ...; (c) > } > The following is probably a more frequent case: i = 0 x = something_weird () or a function parameter loop i = i + 1 a[i] = ... ... = a[i + x] endloop where you end with a symbolic distance vector.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > (*) I hope; scev is a mess of mutualy recursive functions -- > analyze_scalar_evolution calling number_of_iterations calling > instantiate_parameters calling analyze_scalar_evolution again, with > instantiate_parameters hacked so that the infinite cycle cannot occur is > my favourite one. Nobody can say anything sure about behavior of scev > -- it is not even well defined what analyze_scalar_evolutions will > return to you, It returns to you an evolution that might contain SSA_NAMEs. > unless you call instantiate_parameters or resolve_mixers > on the result. > And once you call instantiate_parameters on the result you're guaranteed that what you get does contain only determined constants, or otherwise the result is chrec_dont_know.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > Adding the instantiation cache was compile time neutral on normal code, > > so I don't think the effect on scev cost is significant. > > > > How that? You end up querying and caching an evolution for every > instantiation of an SSA_NAME! which is pretty cheap (constant time operation); note that we do an equivalent lookup in the analyze_scalar_evolution call in any case, also without causing any compile time problems. I haven't measured any slowdown on normal testcases. > I will quantify the number of queries wrt the number of times this > information is useful. I think that with my patch, the instantiation > cache information is used zero times on a bootstrap. I don't think so. Please come up with some numbers, otherwise this discussion is pointless. > > The problem is that we end up calling the instantiate_parameters_1 > > function 83214+2453273 (outside + recursive) times (for N=100). > > We also call analyze_scalar_evolution_1 1146317 times. Many of these > > calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat). > > Large part of 5142869 of build_int_cst_wide calls is very likely also > > due to scev analysis. This is not going to be cheap. Removing the > > instantiation cache definitely would not decrease the number of > > instantiate_parameters_1 calls (might increase it, in fact, although > > I don't know how significantly). > > > > This also could be a bad use of the scev analysis. partly -- see the analysis at the PR. However one O(n) per query comes from scev itself. > For Steven: you can have a O(N**3) algorithm very simply: > > loop O(N) times > loop O(N) times > algorithm in O(N) > > > One part of the problem is that loop optimizers need to throw away the > > scev caches after each change to loops, which leads to recounting the > > information too much. Allowing to invalidate only changed parts helps, > > but seems to be relatively costly on normal code -- you need to scan the > > hash table for things that refer to removed or from some other reason > > invalidated ssa names, which is slow. > > Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time > you'll ask for the evolution of this SSA_NAME you'll analyze the > evolution from scratch. This does not work, of course. When the ssa name is removed, you must remove all references to it from the cache. Otherwise you will ICE when you try to process the relevant entry in (say) instantiate_parameters. What's worse, the ssa name may get reused by ssa name management, thus causing you not even be able to note the change, and thus possibly to misscompilations.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > How? If the reference is left in symbolic form, it means that you know > > nothing about its value, so the only thing you can do with it is to > > check its equality/inequality, in code like > > > > while (...) > > { > > i = something_weird (); > > > > a[i] = ...; (a) > > a[i+1] = ...; (b) > > a[i] = ...; (c) > > } > > > > The following is probably a more frequent case: > > i = 0 > x = something_weird () or a function parameter > loop > i = i + 1 > a[i] = ... > ... = a[i + x] > endloop > > where you end with a symbolic distance vector. But here x is a loop invariant. Nothing would change here if we were keeping the evolutions fully instantiated.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > (*) I hope; scev is a mess of mutualy recursive functions -- > > analyze_scalar_evolution calling number_of_iterations calling > > instantiate_parameters calling analyze_scalar_evolution again, with > > instantiate_parameters hacked so that the infinite cycle cannot occur is > > my favourite one. Nobody can say anything sure about behavior of scev > > -- it is not even well defined what analyze_scalar_evolutions will > > return to you, > > It returns to you an evolution that might contain SSA_NAMEs. Hmm... if this is all you can tell, I start fear even more; this does not define any semantics at all :-) More seriously -- which of the possibilities? If I have loops like while (...) { while (...) { x_1 = something (); } x_2 = phi (x_1); x_3 = x_2 + 1; } What will analyze_scalar_evolutions return for x_3? There are (at least) three possible valid values: x_3 x_2 + 1 x_1 + 1 In this example probably the last one will be chosen, but in more complicated cases you cannot tell (without simulating what scev analysis does). It might be better to not export analyze_scalar_evolution at all and to force it to be used through instantiate_parameters/resolve_mixers, that have at least a well defined semantics of return value -- I hope. Till recently I believed that I made the functions to behave reasonably in cases when values defined inside loop are used outside of it (through chain of lcssa phi nodes), but my attempt to remove the hacks that ensure sane behavior of the results for ivopts causes misscompilations; I am investigating the cause now.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-01-25 11:14 ------- > Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > > rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > > Adding the instantiation cache was compile time neutral on normal code, > > > so I don't think the effect on scev cost is significant. > > > > > > > How that? You end up querying and caching an evolution for every > > instantiation of an SSA_NAME! > > which is pretty cheap (constant time operation); note that we do an > equivalent lookup in the analyze_scalar_evolution call in any case, also > without causing any compile time problems. I haven't measured any slowdown on > normal testcases. > > > I will quantify the number of queries wrt the number of times this > > information is useful. I think that with my patch, the instantiation > > cache information is used zero times on a bootstrap. > > I don't think so. Please come up with some numbers, otherwise this > discussion is pointless. > during a bootstrap: instantiation cache queries: 1146452 instantiation cache uses: 247452 of which 143977 were scev_not_known, the other were SSA_NAMEs. this was counted with a grep|wc with the following patch: Index: tree-scalar-evolution.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.16 diff -d -u -p -r2.16 tree-scalar-evolution.c --- tree-scalar-evolution.c 18 Jan 2005 11:36:26 -0000 2.16 +++ tree-scalar-evolution.c 25 Jan 2005 12:00:57 -0000 @@ -1898,8 +1898,15 @@ get_instantiated_value (htab_t cache, tr pattern.var = version; info = htab_find (cache, &pattern); + fprintf (stdout, "IC_query \n"); + if (info) - return info->chrec; + { + fprintf (stdout, "IC_used_once \n"); + print_generic_expr (stdout, info->chrec, 0); + fprintf (stdout, "\n"); + return info->chrec; + } else return NULL_TREE; } @@ -1915,6 +1922,8 @@ set_instantiated_value (htab_t cache, tr pattern.var = version; slot = htab_find_slot (cache, &pattern, INSERT); + fprintf (stdout, "IC_query \n"); + if (*slot) info = *slot; else @@ -1990,7 +1999,8 @@ instantiate_parameters_1 (struct loop *l result again. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); - res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache); + if (res != chrec_dont_know) + res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); /* Store the correct value to the cache. */ @@ -2000,8 +2010,14 @@ instantiate_parameters_1 (struct loop *l case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); @@ -2010,8 +2026,14 @@ instantiate_parameters_1 (struct loop *l case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1); @@ -2020,8 +2042,14 @@ instantiate_parameters_1 (struct loop *l case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1); @@ -2030,8 +2058,14 @@ instantiate_parameters_1 (struct loop *l case MULT_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_multiply (TREE_TYPE (chrec), op0, op1); @@ -2065,13 +2099,17 @@ instantiate_parameters_1 (struct loop *l case 3: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2), allow_superloop_chrecs, cache); - if (op0 == chrec_dont_know - || op1 == chrec_dont_know - || op2 == chrec_dont_know) + if (op2 == chrec_dont_know) return chrec_dont_know; if (op0 == TREE_OPERAND (chrec, 0) @@ -2085,10 +2123,12 @@ instantiate_parameters_1 (struct loop *l case 2: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); - if (op0 == chrec_dont_know - || op1 == chrec_dont_know) + if (op1 == chrec_dont_know) return chrec_dont_know; if (op0 == TREE_OPERAND (chrec, 0)
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > More seriously -- which of the possibilities? If I have loops like > > while (...) > { > while (...) > { > x_1 = something (); > } > x_2 = phi (x_1); > x_3 = x_2 + 1; > } > > What will analyze_scalar_evolutions return for x_3? There are (at least) > three possible valid values: > > x_3 This would be the answer if analyze_scalar_evolutions would be the identity function. If you want, you could change analyze_scalar_evolutions such that it behaves like that, and decide that the instantiation do the rest of the work (I mean moving the code that is currently in analyze_scalar_evolutions to the instantiation phase). > x_2 + 1 If you decide to reconstruct the tree expression, there is no reason to stop on a phi node that has a single argument. Why would you like to get this answer as the reconstructed tree? > x_1 + 1 > IMO this would be the answer, although I didn't checked.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > More seriously -- which of the possibilities? If I have loops like > > > > while (...) > > { > > while (...) > > { > > x_1 = something (); > > } > > x_2 = phi (x_1); > > x_3 = x_2 + 1; > > } > > > > What will analyze_scalar_evolutions return for x_3? There are (at least) > > three possible valid values: > > > > x_3 > > This would be the answer if analyze_scalar_evolutions would be the > identity function. If you want, you could change analyze_scalar_evolutions > such that it behaves like that, and decide that the instantiation do > the rest of the work (I mean moving the code that is currently in > analyze_scalar_evolutions to the instantiation phase). > > > x_2 + 1 > > If you decide to reconstruct the tree expression, there is no reason > to stop on a phi node that has a single argument. Why would you like > to get this answer as the reconstructed tree? because this answer preserves loop closed ssa form -- the answer x_1 + 1 copy propagates the value outsied of the loop. In some applications this choice could make sense.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > If you decide to reconstruct the tree expression, there is no reason > > to stop on a phi node that has a single argument. Why would you like > > to get this answer as the reconstructed tree? > > because this answer preserves loop closed ssa form -- the answer x_1 + 1 > copy propagates the value outsied of the loop. In some applications > this choice could make sense. > Okay, if it makes sense, you could modify the analyzer such that it stops reconstructing the tree once it is on the phi following a loop. This won't change the information provided after the instantiation.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) With the following patch I got some speedup for depth 100. from: tree iv optimization : 2.62 (62%) usr 0.27 (82%) sys 2.92 (62%) wall tree iv optimization : 2.33 (59%) usr 0.25 (83%) sys 2.63 (54%) wall to: tree iv optimization : 1.19 (46%) usr 0.04 (40%) sys 1.26 (45%) wall tree iv optimization : 1.21 (47%) usr 0.05 (45%) sys 1.30 (46%) wall Basically we're reseting too much information each time scev_reset is called. It would even be possible to reset only the part of the scev database that contains symbols instead of erasing everything. Do somebody know how to iterate over all the elements of the hash setting to NULL_TREE the elements that answer true to chrec_contains_symbols? Index: tree-scalar-evolution.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.16 diff -d -u -p -r2.16 tree-scalar-evolution.c --- tree-scalar-evolution.c 18 Jan 2005 11:36:26 -0000 2.16 +++ tree-scalar-evolution.c 27 Jan 2005 13:12:36 -0000 @@ -2490,7 +2490,7 @@ scev_reset (void) for (i = 1; i < current_loops->num; i++) { loop = current_loops->parray[i]; - if (loop) + if (loop && chrec_contains_symbols (loop->nb_iterations)) loop->nb_iterations = NULL_TREE; } }
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > ------- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-27 13:18 ------- > Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > > With the following patch I got some speedup for depth 100. > > from: > tree iv optimization : 2.62 (62%) usr 0.27 (82%) sys 2.92 (62%) wall > tree iv optimization : 2.33 (59%) usr 0.25 (83%) sys 2.63 (54%) wall > > to: > tree iv optimization : 1.19 (46%) usr 0.04 (40%) sys 1.26 (45%) wall > tree iv optimization : 1.21 (47%) usr 0.05 (45%) sys 1.30 (46%) wall > > Basically we're reseting too much information each time scev_reset is > called. It would even be possible to reset only the part of the scev > database that contains symbols instead of erasing everything. > > Do somebody know how to iterate over all the elements of the hash > setting to NULL_TREE the elements that answer true to > chrec_contains_symbols? the patch is below (in stronger form -- only removing entries that contain invalidated symbols), and indeed helps with this testcase. It however causes measurable slow down on normal code (see analysis in one of the previous mails). Note that your patch is not entirely correct -- in case loop is unrolled, its number of iterations is changed, so in this case scev_reset should clear its number of iterations unconditionally (I think something similar occurs with vectorizer, so we need to be a bit careful). Index: tree-loop-linear.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-loop-linear.c,v retrieving revision 2.6 diff -c -3 -p -r2.6 tree-loop-linear.c *** tree-loop-linear.c 18 Jan 2005 11:36:24 -0000 2.6 --- tree-loop-linear.c 24 Jan 2005 01:34:04 -0000 *************** linear_transform_loops (struct loops *lo *** 373,379 **** free_data_refs (datarefs); } free_df (); ! scev_reset (); rewrite_into_loop_closed_ssa (); #ifdef ENABLE_CHECKING verify_loop_closed_ssa (); --- 373,379 ---- free_data_refs (datarefs); } free_df (); ! scev_reset (NULL); rewrite_into_loop_closed_ssa (); #ifdef ENABLE_CHECKING verify_loop_closed_ssa (); Index: tree-scalar-evolution.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.16 diff -c -3 -p -r2.16 tree-scalar-evolution.c *** tree-scalar-evolution.c 18 Jan 2005 11:36:26 -0000 2.16 --- tree-scalar-evolution.c 24 Jan 2005 01:34:05 -0000 *************** scev_initialize (struct loops *loops) *** 2475,2484 **** loops->parray[i]->nb_iterations = NULL_TREE; } ! /* Cleans up the information cached by the scalar evolutions analysis. */ void ! scev_reset (void) { unsigned i; struct loop *loop; --- 2475,2539 ---- loops->parray[i]->nb_iterations = NULL_TREE; } ! /* Returns true if EXPR references one of the ssa names in set ! INVALIDATED_NAMES. */ ! ! static bool ! tree_references_names (tree expr, bitmap invalidated_names) ! { ! if (!expr) ! return false; ! ! if (TREE_CODE (expr) == SSA_NAME) ! return bitmap_bit_p (invalidated_names, SSA_NAME_VERSION (expr)); ! ! switch (TREE_CODE_LENGTH (TREE_CODE (expr))) ! { ! case 4: ! if (tree_references_names (TREE_OPERAND (expr, 3), invalidated_names)) ! return true; ! case 3: ! if (tree_references_names (TREE_OPERAND (expr, 2), invalidated_names)) ! return true; ! case 2: ! if (tree_references_names (TREE_OPERAND (expr, 1), invalidated_names)) ! return true; ! case 1: ! if (tree_references_names (TREE_OPERAND (expr, 0), invalidated_names)) ! return true; ! case 0: ! return false; ! ! default: ! /* Don't worry about handling strange cases. This function is only used ! in a way in that it does not matter if we return true here. */ ! return true; ! } ! } ! ! /* If the SLOT in the scev cache contains ssa name belonging to the set ! passed in DATA, the function removes it from the cache. Callback ! for htab_traverse. */ ! ! static int ! invalidate_names_reference (void **slot, void *data) ! { ! bitmap invalidated_names = data; ! struct scev_info_str *elt = *slot; ! ! if (tree_references_names (elt->var, invalidated_names) ! || tree_references_names (elt->chrec, invalidated_names)) ! htab_clear_slot (scalar_evolution_info, slot); ! ! return 1; ! } ! ! /* Cleans up the information cached by the scalar evolutions analysis. ! If INVALIDATED_NAMES is provided, only the references to invalidated ! ssa names are removed. */ void ! scev_reset (bitmap invalidated_names) { unsigned i; struct loop *loop; *************** scev_reset (void) *** 2486,2497 **** if (!scalar_evolution_info || !current_loops) return; ! htab_empty (scalar_evolution_info); for (i = 1; i < current_loops->num; i++) { loop = current_loops->parray[i]; ! if (loop) ! loop->nb_iterations = NULL_TREE; } } --- 2541,2564 ---- if (!scalar_evolution_info || !current_loops) return; ! if (invalidated_names) ! htab_traverse (scalar_evolution_info, invalidate_names_reference, ! invalidated_names); ! else ! htab_empty (scalar_evolution_info); ! for (i = 1; i < current_loops->num; i++) { loop = current_loops->parray[i]; ! if (!loop ! || !loop->nb_iterations) ! continue; ! ! if (invalidated_names ! && !tree_references_names (loop->nb_iterations, invalidated_names)) ! continue; ! ! loop->nb_iterations = NULL_TREE; } } Index: tree-scalar-evolution.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.h,v retrieving revision 2.3 diff -c -3 -p -r2.3 tree-scalar-evolution.h *** tree-scalar-evolution.h 17 Nov 2004 22:06:00 -0000 2.3 --- tree-scalar-evolution.h 24 Jan 2005 01:34:05 -0000 *************** extern tree number_of_iterations_in_loop *** 26,32 **** extern tree get_loop_exit_condition (struct loop *); extern void scev_initialize (struct loops *loops); ! extern void scev_reset (void); extern void scev_finalize (void); extern tree analyze_scalar_evolution (struct loop *, tree); extern tree instantiate_parameters (struct loop *, tree); --- 26,32 ---- extern tree get_loop_exit_condition (struct loop *); extern void scev_initialize (struct loops *loops); ! extern void scev_reset (bitmap); extern void scev_finalize (void); extern tree analyze_scalar_evolution (struct loop *, tree); extern tree instantiate_parameters (struct loop *, tree); Index: tree-ssa-loop-ivcanon.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivcanon.c,v retrieving revision 2.5 diff -c -3 -p -r2.5 tree-ssa-loop-ivcanon.c *** tree-ssa-loop-ivcanon.c 1 Oct 2004 18:26:31 -0000 2.5 --- tree-ssa-loop-ivcanon.c 24 Jan 2005 01:34:05 -0000 *************** canonicalize_induction_variables (struct *** 262,268 **** /* Clean up the information about numbers of iterations, since brute force evaluation could reveal new information. */ ! scev_reset (); #if 0 /* The necessary infrastructure is not in yet. */ --- 262,268 ---- /* Clean up the information about numbers of iterations, since brute force evaluation could reveal new information. */ ! scev_reset (NULL); #if 0 /* The necessary infrastructure is not in yet. */ *************** tree_unroll_loops_completely (struct loo *** 294,300 **** /* Clean up the information about numbers of iterations, since complete unrolling might have invalidated it. */ ! scev_reset (); #if 0 /* The necessary infrastructure is not in yet. */ --- 294,300 ---- /* Clean up the information about numbers of iterations, since complete unrolling might have invalidated it. */ ! scev_reset (NULL); #if 0 /* The necessary infrastructure is not in yet. */ Index: tree-ssa-loop-ivopts.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v retrieving revision 2.42 diff -c -3 -p -r2.42 tree-ssa-loop-ivopts.c *** tree-ssa-loop-ivopts.c 19 Jan 2005 22:50:04 -0000 2.42 --- tree-ssa-loop-ivopts.c 24 Jan 2005 01:34:05 -0000 *************** struct ivopts_data *** 208,213 **** --- 208,216 ---- /* The bitmap of indices in version_info whose value was changed. */ bitmap relevant; + /* The set of ssa names whose scev info might get invalidated. */ + bitmap invalidated_names; + /* The maximum invariant id. */ unsigned max_inv_id; *************** tree_ssa_iv_optimize_init (struct loops *** 651,656 **** --- 654,660 ---- data->version_info = xcalloc (data->version_info_size, sizeof (struct version_info)); data->relevant = BITMAP_XMALLOC (); + data->invalidated_names = BITMAP_XMALLOC (); data->important_candidates = BITMAP_XMALLOC (); data->max_inv_id = 0; *************** remove_unused_ivs (struct ivopts_data *d *** 5016,5022 **** && !info->inv_id && !info->iv->have_use_for && !info->preserve_biv) ! remove_statement (SSA_NAME_DEF_STMT (info->iv->ssa_name), true); } } --- 5020,5029 ---- && !info->inv_id && !info->iv->have_use_for && !info->preserve_biv) ! { ! bitmap_set_bit (data->invalidated_names, j); ! remove_statement (SSA_NAME_DEF_STMT (info->iv->ssa_name), true); ! } } } *************** free_loop_data (struct ivopts_data *data *** 5041,5046 **** --- 5048,5054 ---- info->inv_id = 0; } bitmap_clear (data->relevant); + bitmap_clear (data->invalidated_names); bitmap_clear (data->important_candidates); for (i = 0; i < n_iv_uses (data); i++) *************** tree_ssa_iv_optimize_finalize (struct lo *** 5104,5109 **** --- 5112,5118 ---- free_loop_data (data); free (data->version_info); BITMAP_XFREE (data->relevant); + BITMAP_XFREE (data->invalidated_names); BITMAP_XFREE (data->important_candidates); VARRAY_FREE (decl_rtl_to_reset); *************** tree_ssa_iv_optimize_loop (struct ivopts *** 5177,5183 **** /* We have changed the structure of induction variables; it might happen that definitions in the scev database refer to some of them that were eliminated. */ ! scev_reset (); finish: free_loop_data (data); --- 5186,5192 ---- /* We have changed the structure of induction variables; it might happen that definitions in the scev database refer to some of them that were eliminated. */ ! scev_reset (data->invalidated_names); finish: free_loop_data (data); Index: tree-ssa-loop-manip.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-manip.c,v retrieving revision 2.21 diff -c -3 -p -r2.21 tree-ssa-loop-manip.c *** tree-ssa-loop-manip.c 29 Nov 2004 17:53:47 -0000 2.21 --- tree-ssa-loop-manip.c 24 Jan 2005 01:34:05 -0000 *************** tree_duplicate_loop_to_header_edge (stru *** 633,639 **** set_phi_def_stmts (bb->rbi->original); } ! scev_reset (); #ifdef ENABLE_CHECKING verify_loop_closed_ssa (); #endif --- 633,639 ---- set_phi_def_stmts (bb->rbi->original); } ! scev_reset (NULL); #ifdef ENABLE_CHECKING verify_loop_closed_ssa (); #endif Index: tree-ssa-loop.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop.c,v retrieving revision 2.23 diff -c -3 -p -r2.23 tree-ssa-loop.c *** tree-ssa-loop.c 26 Nov 2004 06:42:25 -0000 2.23 --- tree-ssa-loop.c 24 Jan 2005 01:34:05 -0000 *************** tree_ssa_loop_bounds (void) *** 306,312 **** return; estimate_numbers_of_iterations (current_loops); ! scev_reset (); } struct tree_opt_pass pass_record_bounds = --- 306,312 ---- return; estimate_numbers_of_iterations (current_loops); ! scev_reset (NULL); } struct tree_opt_pass pass_record_bounds = Index: tree-vectorizer.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-vectorizer.c,v retrieving revision 2.62 diff -c -3 -p -r2.62 tree-vectorizer.c *** tree-vectorizer.c 18 Jan 2005 11:36:29 -0000 2.62 --- tree-vectorizer.c 24 Jan 2005 01:34:05 -0000 *************** vect_do_peeling_for_loop_bound (loop_vec *** 3258,3264 **** vect_update_ivs_after_vectorizer (loop, ratio_mult_vf_name, update_e); /* After peeling we have to reset scalar evolution analyzer. */ ! scev_reset (); return; } --- 3258,3264 ---- vect_update_ivs_after_vectorizer (loop, ratio_mult_vf_name, update_e); /* After peeling we have to reset scalar evolution analyzer. */ ! scev_reset (NULL); return; } *************** vect_do_peeling_for_alignment (loop_vec_ *** 3442,3448 **** vect_update_inits_of_drs (loop_vinfo, niters_of_prolog_loop); /* After peeling we have to reset scalar evolution analyzer. */ ! scev_reset (); return; } --- 3442,3448 ---- vect_update_inits_of_drs (loop_vinfo, niters_of_prolog_loop); /* After peeling we have to reset scalar evolution analyzer. */ ! scev_reset (NULL); return; }
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > the patch is below (in stronger form -- only removing entries that > contain invalidated symbols), and indeed helps with this testcase. Many thanks. > It however causes measurable slow down on normal code (see analysis > in one of the previous mails). > I see. Another idea: would it be possible to insert the invalidated names during the optimization pass instead of invalidating all the symbols? This would avoid the first pass over the scev database that search for symbols. > Note that your patch is not entirely correct -- in case loop is > unrolled, its number of iterations is changed, so in this case > scev_reset should clear its number of iterations unconditionally or the unroller should do that work on the unrolled loops?
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) > Another idea: would it be possible to insert the invalidated names > during the optimization pass instead of invalidating all the symbols? > This would avoid the first pass over the scev database that search for > symbols. I don't understand this proposal. > > Note that your patch is not entirely correct -- in case loop is > > unrolled, its number of iterations is changed, so in this case > > scev_reset should clear its number of iterations unconditionally > > or the unroller should do that work on the unrolled loops? This seems to be the right approach.
Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: > > Another idea: would it be possible to insert the invalidated names > > during the optimization pass instead of invalidating all the symbols? > > This would avoid the first pass over the scev database that search for > > symbols. > > I don't understand this proposal. > Anyway, I see that I was wrong: even if you mark the SSA_NAMEs that are removed in an optimization, and that you invalidate only those symbols, you still have to walk the scev database to ensure that there is no other evolution that references the removed SSA_NAMEs.
Moving to 4.1 as I don't care much if this gets fixed.
I wouldn't worry too much about exceedingly deep loop nests. Even the scientific code I've looked at rarely has loop nests deeper than ten -- and if it does, optimizing the inner ten loops is probably good enough. So, I'd suggest that we add a --param here for max-loop-nest-depth, and then just not do this stuff on deeper nests, or ignore all the outer loops in that case. Is that feasible? Marked as P4; I think we should fix this, but I don't think it needs to happen for 4.1. I'll revise for 4.2.
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) > > So, I'd suggest that we add a --param here for max-loop-nest-depth, and then > just not do this stuff on deeper nests, or ignore all the outer loops in that > case. > > Is that feasible? > Mark, I think that this has already been fixed by the patch that added PARAM_SCEV_MAX_EXPR_SIZE: we give up when the expressions that we handle are longer than this size (if I remember correctly this defaults to 20). For the example in this bug, { int j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, a; a = 0; for (j0 = 0; j0 < 2; j0 ++) for (j1 = j0; j1 < 2; j1 ++) for (j2 = j1; j2 < 2; j2 ++) for (j3 = j2; j3 < 2; j3 ++) for (j4 = j3; j4 < 2; j4 ++) for (j5 = j4; j5 < 2; j5 ++) for (j6 = j5; j6 < 2; j6 ++) for (j7 = j6; j7 < 2; j7 ++) for (j8 = j7; j8 < 2; j8 ++) for (j9 = j8; j9 < 2; j9 ++) a += j0 + j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8 + j9; return a; } we would have a scev with 10 dimensions, i.e. an evolution in each dimension, thus an expression with about 10 components. Some numbers for mainline with -O2 (on amd64 with cpufreq): time ./gcc/cc1 -O2 ~/ex/pr18595_10.c real 0m0.345s user 0m0.089s sys 0m0.017s time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree PRE : 0.28 (21%) usr 0.02 (50%) sys 0.29 (20%) wall 7742 kB (59%) ggc tree loop bounds : 0.14 (10%) usr 0.00 ( 0%) sys 0.15 (10%) wall 0 kB ( 0%) ggc real 0m1.776s user 0m1.348s sys 0m0.050s time ./gcc/cc1 -O2 ~/ex/pr18595_200.c tree PRE : 1.53 (24%) usr 0.09 (60%) sys 1.63 (25%) wall 31149 kB (74%) ggc tree loop bounds : 1.21 (19%) usr 0.00 ( 0%) sys 1.21 (18%) wall 0 kB ( 0%) ggc real 0m7.077s user 0m6.363s sys 0m0.167s time ./gcc/cc1 -O2 ~/ex/pr18595_300.c tree PRE : 4.69 (28%) usr 0.21 (68%) sys 5.03 (29%) wall 69961 kB (80%) ggc tree loop bounds : 4.03 (24%) usr 0.00 ( 0%) sys 4.06 (23%) wall 0 kB ( 0%) ggc real 0m18.126s user 0m17.022s sys 0m0.333s Now, if we remove PRE, we can even compile the case with 1000 nested loops in a reasonable time: time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_300.c tree loop bounds : 0.09 ( 4%) usr 0.00 ( 0%) sys 0.24 ( 9%) wall 0 kB ( 0%) ggc real 0m3.184s user 0m2.147s sys 0m0.054s time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_1000.c tree loop bounds : 2.00 ( 8%) usr 0.00 ( 0%) sys 2.00 ( 8%) wall 0 kB ( 0%) ggc real 0m27.171s user 0m26.298s sys 0m0.169s So, I think that we can safely close this PR. Seb
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) sebastian dot pop at cri dot ensmp dot fr wrote: > > So, I think that we can safely close this PR. > The previous numbers were with mainline + patch, otherwise mainline gets an ice as described in PR24483. I only now realized this, sorry. Seb
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) Here are again the numbers for mainline with no other patch: time ./gcc/cc1 -O2 ~/ex/pr18595_10.c real 0m0.164s user 0m0.116s sys 0m0.018s time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree PRE : 0.33 ( 3%) usr 0.01 ( 3%) sys 0.34 ( 3%) wall 7742 kB ( 2%) ggc tree loop bounds : 0.77 ( 7%) usr 0.03 (10%) sys 0.80 ( 7%) wall 53463 kB (17%) ggc tree iv optimization : 7.70 (74%) usr 0.19 (66%) sys 7.97 (73%) wall 220347 kB (71%) ggc real 0m10.912s user 0m10.448s sys 0m0.317s For 200 nested loops it begins to swap even with -fno-tree-pre, so we'd need a patch for limiting this.
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) Here is a first patch that uses PARAM_SCEV_MAX_EXPR_SIZE for limiting the size of expressions that we want to handle. I will send it to gcc-patches once it bootstraps, etc. time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree loop bounds : 0.51 (17%) usr 0.03 (23%) sys 0.53 (17%) wall 32717 kB (29%) ggc tree iv optimization : 1.23 (41%) usr 0.06 (46%) sys 1.29 (40%) wall 64971 kB (58%) ggc real 0m3.995s user 0m3.015s sys 0m0.147s time ./gcc/cc1 -O2 ~/ex/pr18595_200.c tree loop bounds : 2.75 (23%) usr 0.05 (13%) sys 2.80 (22%) wall 72775 kB (22%) ggc tree iv optimization : 3.74 (31%) usr 0.20 (51%) sys 4.30 (33%) wall 210289 kB (64%) ggc real 0m13.316s user 0m12.163s sys 0m0.423s time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_300.c tree loop bounds : 0.16 ( 3%) usr 0.00 ( 0%) sys 0.15 ( 3%) wall 508 kB ( 1%) ggc tree iv optimization : 2.26 (47%) usr 0.08 (62%) sys 2.33 (46%) wall 74743 kB (84%) ggc real 0m6.287s user 0m4.849s sys 0m0.142s time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_1000.c tree loop bounds : 2.92 ( 4%) usr 0.00 ( 0%) sys 2.92 ( 4%) wall 913 kB ( 0%) ggc tree iv optimization : 37.78 (54%) usr 0.91 (65%) sys 41.15 (52%) wall 786700 kB (93%) ggc real 1m19.611s user 1m10.288s sys 0m1.403s Index: tree-scalar-evolution.c =================================================================== --- tree-scalar-evolution.c (revision 106440) +++ tree-scalar-evolution.c (working copy) @@ -1982,7 +1982,8 @@ loop_closed_phi_def (tree var) /* Analyze all the parameters of the chrec that were left under a symbolic form, with respect to LOOP. CHREC is the chrec to instantiate. CACHE is the cache of already instantiated values. FLAGS modify the way chrecs are - instantiated. */ + instantiated. SIZE_EXPR is used for computing the size of the expression to + be instantiated, and to stop if it exceeds some limit. */ /* Values for FLAGS. */ enum @@ -1995,12 +1996,17 @@ enum }; static tree -instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache) +instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache, + int size_expr) { tree res, op0, op1, op2; basic_block def_bb; struct loop *def_loop; + /* Give up if the path is longer than the MAX that we allow. */ + if (size_expr++ > PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE)) + return chrec_dont_know; + if (automatically_generated_chrec_p (chrec) || is_gimple_min_invariant (chrec)) return chrec; @@ -2068,7 +2074,7 @@ instantiate_parameters_1 (struct loop *l } else if (res != chrec_dont_know) - res = instantiate_parameters_1 (loop, res, flags, cache); + res = instantiate_parameters_1 (loop, res, flags, cache, size_expr); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); @@ -2078,12 +2084,12 @@ instantiate_parameters_1 (struct loop *l case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2094,12 +2100,12 @@ instantiate_parameters_1 (struct loop *l case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2110,12 +2116,12 @@ instantiate_parameters_1 (struct loop *l case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2126,12 +2132,12 @@ instantiate_parameters_1 (struct loop *l case MULT_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2144,7 +2150,7 @@ instantiate_parameters_1 (struct loop *l case CONVERT_EXPR: case NON_LVALUE_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; @@ -2174,17 +2180,17 @@ instantiate_parameters_1 (struct loop *l { case 3: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2), - flags, cache); + flags, cache, size_expr); if (op2 == chrec_dont_know) return chrec_dont_know; @@ -2198,12 +2204,12 @@ instantiate_parameters_1 (struct loop *l case 2: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2214,7 +2220,7 @@ instantiate_parameters_1 (struct loop *l case 1: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; if (op0 == TREE_OPERAND (chrec, 0)) @@ -2252,7 +2258,8 @@ instantiate_parameters (struct loop *loo fprintf (dump_file, ")\n"); } - res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache); + res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache, + 0); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2275,7 +2282,7 @@ static tree resolve_mixers (struct loop *loop, tree chrec) { htab_t cache = htab_create (10, hash_scev_info, eq_scev_info, del_scev_info); - tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache); + tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache, 0); htab_delete (cache); return ret; }
Fixed on the mainline.