18595 – [4.0/4.1 Regression] IV-OPTS is O(N^3)

Bug 18595 - [4.0/4.1 Regression] IV-OPTS is O(N^3)

Summary: [4.0/4.1 Regression] IV-OPTS is O(N^3)

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.0.0

Importance:	P4 minor
Target Milestone:	4.1.0
Assignee:	Zdenek Dvorak

URL:
Keywords:	compile-time-hog, memory-hog

Depends on:	24483
Blocks:	18693
	Show dependency tree / graph

Reported:	2004-11-21 14:59 UTC by Andrew Pinski
Modified:	2005-11-08 01:40 UTC (History)
CC List:	3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.0.0
Last reconfirmed:	2005-09-10 05:44:30

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2004-11-21 14:59:49 UTC

Using the program in PR 10060, to generate 1000 and 2000, I see that "IV OPT" goes from:
 tree iv optimization  :   1.79 (14%) usr   0.15 ( 5%) sys   2.60 (13%) wall
to
 tree iv optimization  :   6.92 (13%) usr   0.29 ( 5%) sys   7.30 (12%) wall

which looks like PHI iv opt does not scale O(n).

This is not as importran as PR 18594 though (which is the same testcase).

Comment 1 Steven Bosscher 2004-11-24 11:14:48 UTC

if IV opts doesn't scale, then why is the compile time ~14% in both
cases?

Comment 2 Serge Belyshev 2004-11-25 11:16:30 UTC

use this awk script to generate testcase (first arg is number of loops):
----------------------------------------------------------------------------------------
BEGIN {
	ORS=""
	print "int f ()\n{\tint "
	for (j = 0; j < ARGV [1]; j++)
		print "j" j ", "
	print "a;\n\ta = 0;\n"
	print "\tfor (j0 = 0; j0 < 2; j0 ++)\n"
	for (j = 1; j < ARGV [1]; j++)
		print "\tfor (j" j " = j" j-1 "; j" j " < 2; j" j" ++)\n"
	print "\ta += "
	for (j = 0; j < ARGV [1]-1; j++)
		print "j" j " + "
	print "j" j ";\n\treturn a;\n}\n"
}
----------------------------------------------------------------------------------------

N loops   Time, s
 5        0.05
10        0.17
15        0.38
20        1.14
25        2.81
30        4.68
35        7.52
40        13.6
45        21.8
50        25.6
55        33.9

Comment 3 Steven Bosscher 2004-11-27 01:10:32 UTC

Of course it's not very useful to claim that some algorithm
is O(N^x) when you don't say what N is...

Comment 4 Serge Belyshev 2004-11-27 01:45:27 UTC

(In reply to comment #3)
> Of course it's not very useful to claim that some algorithm
> is O(N^x) when you don't say what N is...

N is number of nested loops.

Comment 5 Zdenek Dvorak 2004-11-28 18:42:47 UTC

IVOPTs should behave quadratically on this type of tests; I do not know where
does the extra O(N) factor come from, and I would not expect the times to grow
this fast.

Comment 6 Serge Belyshev 2004-12-22 23:40:41 UTC

It seems that extra O(N) factor comes from high memory usage. For smaller N
(0..30) compiler behaves like O(2) and for larger (> 50) like O(2.5 .. 3).

gnuplot> fit [0:30] A*x**k+B "data" via A,k,B
[snip]
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

A               = 0.000465461      +/- 2.625e-05    (5.64%)
k               = 1.90612          +/- 0.01652      (0.8667%)
B               = 0.0328722        +/- 0.0007677    (2.335%)

gnuplot> fit [50:175] A*x**k+B "data" via A,k,B
[snip]
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

A               = 4.76329e-06      +/- 3.493e-07    (7.332%)
k               = 3.0033           +/- 0.01405      (0.4678%)
B               = 0.392746         +/- 0.02972      (7.567%)

For N=175 memory usage on my machine exceedes 600 MB

Comment 7 Steven Bosscher 2004-12-23 01:26:52 UTC

hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing)

Comment 8 Zdenek Dvorak 2004-12-23 01:29:52 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing) 

that might be the case, but I don't think it is likely (also just
guessing :-)  As far as I can tell all bitmaps in ivopts should be small
for this testcase.

Comment 9 Steven Bosscher 2005-01-23 14:18:34 UTC

It's definitely not the bitmaps: 
 time   seconds   seconds    calls   s/call   s/call  name 
 10.28     25.14    25.14 24356740     0.00     0.00  ggc_alloc_stat 
  5.39     38.32    13.18  3133403     0.00     0.00  instantiate_parameters_1 
  5.25     51.15    12.83 34260186     0.00     0.00  htab_find_slot_with_hash 
  4.41     61.94    10.79 40175413     0.00     0.00  build_int_cst_wide 
  4.35     72.58    10.64  1139473     0.00     0.00  chrec_convert 
  3.79     81.86     9.28     1810     0.01     0.01  htab_empty 
  3.63     90.73     8.87 22863854     0.00     0.00  build3_stat 
  3.41     99.07     8.34      800     0.01     0.01  for_each_insn_in_loop 
  2.74    105.76     6.69   415264     0.00     0.00  follow_ssa_edge 
  2.64    112.21     6.45  3085512     0.00     0.00  
find_interesting_uses_stmt 
  2.14    117.45     5.24  3197505     0.00     0.00  record_invariant 
  2.13    122.66     5.21  9695984     0.00     0.00  
analyze_scalar_evolution_1 
  1.65    126.70     4.04 11769469     0.00     0.00  comptypes 
  1.64    130.70     4.00 11710741     0.00     0.00  fold_convert_const 
  1.63    134.68     3.98 23551936     0.00     0.00  make_node_stat 
  1.44    138.19     3.51 11464308     0.00     0.00  force_fit_type 
  1.28    141.31     3.12 12009984     0.00     0.00  fold_convert 
  1.25    144.37     3.06 21935414     0.00     0.00  eq_scev_info 
  1.18    147.26     2.89  9695984     0.00     0.00  analyze_scalar_evolution 
  1.09    149.93     2.67   326865     0.00     0.00  chrec_fold_plus_1 
  1.01    152.41     2.48      200     0.01     0.94  
tree_ssa_iv_optimize_loop 
  1.01    154.88     2.47 29473273     0.00     0.00  is_gimple_min_invariant 
  0.99    157.30     2.43 26291603     0.00     0.00  flow_bb_inside_loop_p 
  0.98    159.70     2.40 22605936     0.00     0.00  flow_loop_nested_p 
  0.95    162.03     2.33  3135046     0.00     0.00  htab_delete 
  0.95    164.35     2.32   164459     0.00     0.00  htab_expand

Comment 10 Steven Bosscher 2005-01-23 14:22:14 UTC

More profile data: 
 
----------------------------------------------- 
                0.00  187.04       1/1           execute_pass_list [6] 
[7]     76.5    0.00  187.04       1         tree_ssa_iv_optimize [7] 
                2.48  184.56     200/200         tree_ssa_iv_optimize_loop [8] 
                0.00    0.00       1/201         free_loop_data [349] 
                0.00    0.00       2/30137       bitmap_obstack_free [178] 
                0.00    0.00     201/7042572     xcalloc [127] 
                0.00    0.00       3/2369        varray_init [1063] 
                0.00    0.00       2/36950       bitmap_obstack_alloc [794] 
----------------------------------------------- 
                2.48  184.56     200/200         tree_ssa_iv_optimize [7] 
[8]     76.5    2.48  184.56     200         tree_ssa_iv_optimize_loop [8] 
               30.61  134.21  309011/311017      simple_iv <cycle 2> [35] 
                6.45    9.73 3085512/3085512     find_interesting_uses_stmt 
[25] 
                0.04    1.03     200/202         scev_reset [93] 
                0.00    0.76     200/200         determine_use_iv_costs [107] 
                0.00    0.50     200/200         rewrite_uses [142] 
                0.00    0.35     200/202         loop_commit_inserts [181] 
                0.11    0.11   20100/20100       find_interesting_uses_cond 
[241] 
                0.02    0.09     200/311017      number_of_iterations_exit 
<cycle 2> [41] 
                0.06    0.04     200/201         free_loop_data [349] 
                0.00    0.09     200/200         create_new_ivs [360] 
                0.00    0.08     200/401         get_loop_body_in_dom_order 
[274] 
                0.00    0.06     200/2601        get_loop_body [111] 
                0.05    0.00     200/200         remove_unused_ivs [440] 
                0.02    0.03   19703/99904       
find_interesting_uses_outer_or_nonlin [223] 
                0.00    0.04    2004/2804        add_candidate [413] 
                0.00    0.02     200/200         find_optimal_iv_set [609] 
                0.00    0.02   20507/40407       set_iv [503] 
                0.00    0.02     800/800         add_iv_value_candidates [689] 
                0.01    0.01   80200/26291603     flow_bb_inside_loop_p [44] 
                0.01    0.00     800/30137       bitmap_obstack_free [178] 
                0.00    0.00    1006/5012        force_var_cost [590] 
                0.00    0.00     200/200         iv_ca_free [1067] 
                0.00    0.00     201/3005        add_candidate_1 [530] 
                0.00    0.00   20708/370258      zero_p [517] 
                0.00    0.00    1399/3525413     get_iv [69] 
                0.00    0.00     402/12009984     fold_convert [17] 
                0.00    0.00    1198/3329551     is_gimple_reg [71] 
                0.00    0.00    1202/40175413     build_int_cst_wide [26] 
                0.00    0.00     200/200         single_dom_exit [1344] 
                0.00    0.00     402/132285      find_edge [436] 
                0.00    0.00    1003/8376514     bitmap_set_bit [92] 
                0.00    0.00   20507/20507       contains_abnormal_ssa_name_p 
[1427] 
                0.00    0.00     201/57945       int_cst_value [804] 
                0.00    0.00    1202/22889750     build_int_cst [141] 
                0.00    0.00    1006/5034        add_cost [1463] 
                0.00    0.00    1007/22165       cst_and_fits_in_hwi [1748] 
                0.00    0.00     402/1401        loop_latch_edge [2001] 
                0.00    0.00     201/1597        loop_preheader_edge [1984] 
----------------------------------------------- 
[9]     67.8   30.81  135.08  311017+24191806 <cycle 2 as a whole> [9] 
               13.18   43.06 3133403+18788021     instantiate_parameters_1 
<cycle 2> [11] 
                5.21   44.15 9695984             analyze_scalar_evolution_1 
<cycle 2> [12] 
                0.22   20.27  703698             interpret_rhs_modify_expr 
<cycle 2> [22] 
                2.89    7.22 9695984             analyze_scalar_evolution 
<cycle 2> [32] 
                0.57    6.74  313469+81390       follow_ssa_edge_in_rhs <cycle 
2> [38] 
                6.69    0.56  415264+1352605     follow_ssa_edge <cycle 2> 
[39] 
                0.22    4.74   21906             number_of_iterations_exit 
<cycle 2> [41] 
                0.05    1.16   30495             number_of_iterations_in_loop 
<cycle 2> [86] 
                0.04    0.07   43784             instantiate_parameters <cycle 
2> [347] 
                0.05    0.01   65718             
simplify_using_outer_evolutions <cycle 2> [441] 
                0.05    0.00   30295             
compute_overall_effect_of_inner_loop <cycle 2> [459] 
----------------------------------------------- 
                             22088230             chrec_convert [10] 
                0.95    4.12  101392/1139473     instantiate_parameters_1 
<cycle 2> [11] 
                1.23    5.36  131992/1139473     follow_ssa_edge_in_rhs <cycle 
2> [38] 
                3.78   16.43  404782/1139473     interpret_rhs_modify_expr 
<cycle 2> [22] 
                4.68   20.35  501307/1139473     analyze_scalar_evolution_1 
<cycle 2> [12] 
[10]    23.3   10.64   46.26 1139473+22088230 chrec_convert [10] 
                3.01   17.81 11575896/12009984     fold_convert [17] 
                4.15   13.69 10684608/22863854     build3_stat [13] 
                2.97    1.31 11044115/40175413     build_int_cst_wide [26] 
                0.76    0.94  340478/806987      fold <cycle 7> [47] 
                1.37    0.00 23227703/23318505     chrec_type [77] 
                0.25    0.00 11044115/22889750     build_int_cst [141] 
                             22088230             chrec_convert [10] 
----------------------------------------------- 
                             18788021             instantiate_parameters_1 
<cycle 2> [11] 
                               43784             instantiate_parameters <cycle 
2> [347] 
                             3089619             simple_iv <cycle 2> [35] 
[11]    23.0   13.18   43.06 3133403+18788021 instantiate_parameters_1 <cycle 
2> [11] 
                1.28    8.71 11465180/11465180     set_instantiated_value [33] 
                2.25    7.43 5793593/22863854     build3_stat [13] 
                0.85    5.09  104374/326865      chrec_fold_plus_1 [24] 
                0.95    4.12  101392/1139473     chrec_convert [10] 
                1.51    0.67 5628417/40175413     build_int_cst_wide [26] 
                1.92    0.03 5773179/5820723     htab_find_with_hash [67] 
                1.84    0.00 21921424/29473273     is_gimple_min_invariant 
[56] 
                0.82    0.70 8874753/26291603     flow_bb_inside_loop_p [44] 
                0.90    0.28 5773179/5784085     htab_find [88] 
                1.14    0.00 5732590/6202135     bitmap_clear_bit [85] 
                0.85    0.00 5732590/5756097     find_common_loop [104] 
                0.78    0.00 5732590/8376514     bitmap_set_bit [92] 
                0.67    0.00 5732590/10054625     bitmap_bit_p [89] 
                0.13    0.00 5628417/22889750     build_int_cst [141] 
                0.05    0.06   21892/806987      fold <cycle 7> [47] 
                0.01    0.03   21892/609615      build2_stat [98] 
                0.01    0.01   82482/3352077     chrec_fold_plus [122] 
                0.00    0.00   21892/2867796     chrec_fold_minus [154] 
                             5732590             analyze_scalar_evolution 
<cycle 2> [32] 
                             18788021             instantiate_parameters_1 
<cycle 2> [11] 
----------------------------------------------- 
                             9695984             analyze_scalar_evolution 
<cycle 2> [32] 
[12]    20.2    5.21   44.15 9695984         analyze_scalar_evolution_1 <cycle 
2> [12] 
                4.68   20.35  501307/1139473     chrec_convert [10] 
                1.65    9.88  202391/326865      chrec_fold_plus_1 [24] 
                0.45    4.61 6482915/15734611     find_var_scev_info [29] 
                0.88    0.75 9534492/26291603     flow_bb_inside_loop_p [44] 
                0.17    0.55  428920/22863854     build3_stat [13] 
                0.05    0.06   21785/806987      fold <cycle 7> [47] 
                0.01    0.03  202389/3352077     chrec_fold_plus [122] 
                0.02    0.00   70699/70699       chrec_merge [642] 
                0.01    0.00   70699/1575094     loop_phi_node_p [202] 
                0.00    0.00       2/2867796     chrec_fold_minus [154] 
                              703698             interpret_rhs_modify_expr 
<cycle 2> [22] 
                               70699             follow_ssa_edge <cycle 2> 
[39] 
-----------------------------------------------

Comment 11 Steven Bosscher 2005-01-23 14:23:09 UTC

Looks to me like Seb may want to look at this bug too...

Comment 12 Steven Bosscher 2005-01-23 14:24:50 UTC

One more, because I also run out of memory pretty quickly: 
----------------------------------------------- 
                0.00    0.00       1/22863854     c_build_bind_expr [1608] 
                0.00    0.00       1/22863854     thread_across_edge [392] 
                0.00    0.00     199/22863854     
estimate_numbers_of_iterations [184] 
                0.00    0.00     200/22863854     c_finish_loop [754] 
                0.00    0.01    9801/22863854     follow_ssa_edge_in_rhs 
<cycle 2> [38] 
                0.02    0.06   50602/22863854     add_to_evolution [324] 
                0.10    0.32  247913/22863854     simple_iv <cycle 2> [35] 
                0.17    0.55  428920/22863854     analyze_scalar_evolution_1 
<cycle 2> [12] 
                2.19    7.24 5648016/22863854     chrec_fold_plus_1 [24] 
                2.25    7.43 5793593/22863854     instantiate_parameters_1 
<cycle 2> [11] 
                4.15   13.69 10684608/22863854     chrec_convert [10] 
[13]    15.6    8.87   29.30 22863854         build3_stat [13] 
                3.86   25.44 22863854/23551936     make_node_stat [15] 
-----------------------------------------------

Comment 13 Daniel Berlin 2005-01-23 15:01:25 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

I believe seb/zdenek already submitted patches for speeding up scev quite 
recently, with the goal of alleviating this problem.
I'm pretty sure they have not been applied yet.

Comment 14 Zdenek Dvorak 2005-01-23 15:06:12 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> I believe seb/zdenek already submitted patches for speeding up scev quite 
> recently, with the goal of alleviating this problem.
> I'm pretty sure they have not been applied yet.

the Sebastian's patch is not in yet; it might help a bit in this case, a
believe, although I suspect a real source of the problem is elsewhere.

Comment 15 Zdenek Dvorak 2005-01-23 16:38:32 UTC

Sebastian's patch does not help significantly :-(

Comment 16 Zdenek Dvorak 2005-01-23 17:04:23 UTC

One extra N factor seems to be coming from simple_iv
(analyze_scalar_evolution_in_loop).  The function was created before
loop closed ssa form, under assumptions of lcssa it should be possible
to get rid of this.  I am working on patch

Comment 17 Zdenek Dvorak 2005-01-23 17:14:03 UTC

Also did not help much.

Comment 18 Zdenek Dvorak 2005-01-24 01:40:17 UTC

The patch that causes ivopts to reset just the relevant parts of the scev cache
(just regtesting) instead of clearing it completely helps a bit -- the compile
time for N=100 gets about 2x better and memory consumption about 10x.

Still there remain some inefficiences within the scev analysis itself.

Comment 19 Zdenek Dvorak 2005-01-24 01:46:12 UTC

On a side note, PRE also seems to have problems with the testcase.  With the
patch mentioned above, the largest consumers of compile time are ivopts (45%)
and pre (20%).

Comment 20 Daniel Berlin 2005-01-24 01:49:53 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Sun, 24 Jan 2005, rakdver at gcc dot gnu dot org wrote:

>
> ------- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-24 01:46 -------
> On a side note, PRE also seems to have problems with the testcase.  With the
> patch mentioned above, the largest consumers of compile time are ivopts (45%)
> and pre (20%).

Uh, there was a bug filed about this, and i fixed it, last i looked.

Comment 21 Daniel Berlin 2005-01-24 01:57:51 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

>> On a side note, PRE also seems to have problems with the testcase.  With the
>> patch mentioned above, the largest consumers of compile time are ivopts (45%)
>> and pre (20%).
>
> Uh, there was a bug filed about this, and i fixed it, last i looked.

Here it is.
Bug 18673.

N=100 takes .25 seconds now.

I can't make PRE much faster than that, that's almost all hash function 
time.
--Dan

Comment 22 sebastian.pop@cri.ensmp.fr 2005-01-24 21:36:27 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> 
> Still there remain some inefficiences within the scev analysis itself.
> 

Zdenek, have you tried to revert the patch that caches the
instantiation information?  This could speed up things a little.  

On a side note, I wonder how many times we're using this instantiation
cache, in other words whether we pay a high price for the scev
analysis for some (probably) rare cases...

Comment 23 Zdenek Dvorak 2005-01-24 22:16:50 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > Still there remain some inefficiences within the scev analysis itself.
> > 
> 
> Zdenek, have you tried to revert the patch that caches the
> instantiation information?  This could speed up things a little.  
>
> On a side note, I wonder how many times we're using this instantiation
> cache, in other words whether we pay a high price for the scev
> analysis for some (probably) rare cases...

Adding the instantiation cache was compile time neutral on normal code,
so I don't think the effect on scev cost is significant.

The problem is that we end up calling the instantiate_parameters_1
function 83214+2453273 (outside + recursive) times (for N=100).
We also call analyze_scalar_evolution_1 1146317 times.  Many of these
calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
Large part of 5142869 of build_int_cst_wide calls is very likely also
due to scev analysis.  This is not going to be cheap.  Removing the
instantiation cache definitely would not decrease the number of
instantiate_parameters_1 calls (might increase it, in fact, although
I don't know how significantly).

One part of the problem is that loop optimizers need to throw away the
scev caches after each change to loops, which leads to recounting the
information too much.  Allowing to invalidate only changed parts helps,
but seems to be relatively costly on normal code -- you need to scan the
hash table for things that refer to removed or from some other reason
invalidated ssa names, which is slow.  And to make things worse, this
change of course means that most of the information is left in the hash
table, i.e. the size of the table grows and these scans get slower and
slower.  The alternative -- checking the validity of entries only when
querying the cache -- is not possible, since we reuse the released
ssa names, so we would be unable to recognize the validity of the
information.

Other part is that scev tries to be too clever.  Without need to
represent nonaffine induction variables (that we do not use anywhere)
we could use more memory efficient representation of ivs.
Without need to handle symbolic references (that we also do not use
anywhere, we could store evolutions in a fully instantiated form, and
we would not need instantiate_parameters/resolve_mixers functions at all.

Comment 24 Daniel Berlin 2005-01-24 22:25:04 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

>
> Other part is that scev tries to be too clever.  Without need to
> represent nonaffine induction variables (that we do not use anywhere)
> we could use more memory efficient representation of ivs.
> Without need to handle symbolic references (that we also do not use
> anywhere, we could store evolutions in a fully instantiated form, and
> we would not need instantiate_parameters/resolve_mixers functions atall.

Uh, symbolic references are or will be used to do data dependence when 
MEM_REF and ARRAY_REF couldn't be generated from the pointers.

Comment 25 Zdenek Dvorak 2005-01-24 23:09:31 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > Other part is that scev tries to be too clever.  Without need to
> > represent nonaffine induction variables (that we do not use anywhere)
> > we could use more memory efficient representation of ivs.
> > Without need to handle symbolic references (that we also do not use
> > anywhere, we could store evolutions in a fully instantiated form, and
> > we would not need instantiate_parameters/resolve_mixers functions atall.
> 
> Uh, symbolic references are or will be used to do data dependence when 
> MEM_REF and ARRAY_REF couldn't be generated from the pointers.

How?  If the reference is left in symbolic form, it means that you know
nothing about its value, so the only thing you can do with it is to
check its equality/inequality, in code like

while (...)
  {
    i = something_weird ();

    a[i] = ...;  (a)
    a[i+1] = ...;  (b)
    a[i] = ...;  (c)
  }

to find that (b) is independent on (a) and (c) and to find the
dependence between (a) and (c), but you do not need scev for this --
value numbering info is enough.

Comment 26 Daniel Berlin 2005-01-24 23:12:14 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Mon, 24 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:

>
> ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-01-24 23:09 -------
> Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
>
>>> Other part is that scev tries to be too clever.  Without need to
>>> represent nonaffine induction variables (that we do not use anywhere)
>>> we could use more memory efficient representation of ivs.
>>> Without need to handle symbolic references (that we also do not use
>>> anywhere, we could store evolutions in a fully instantiated form, and
>>> we would not need instantiate_parameters/resolve_mixers functions atall.
>>
>> Uh, symbolic references are or will be used to do data dependence when
>> MEM_REF and ARRAY_REF couldn't be generated from the pointers.
>
> How?  If the reference is left in symbolic form, it means that you know
> nothing about its value,

Wrong.

We could know things about it's value, such as ranges. We just don't know 
an exact number.

Comment 27 Zdenek Dvorak 2005-01-25 00:27:04 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> >>> Other part is that scev tries to be too clever.  Without need to
> >>> represent nonaffine induction variables (that we do not use anywhere)
> >>> we could use more memory efficient representation of ivs.
> >>> Without need to handle symbolic references (that we also do not use
> >>> anywhere, we could store evolutions in a fully instantiated form, and
> >>> we would not need instantiate_parameters/resolve_mixers functions atall.
> >>
> >> Uh, symbolic references are or will be used to do data dependence when
> >> MEM_REF and ARRAY_REF couldn't be generated from the pointers.
> >
> > How?  If the reference is left in symbolic form, it means that you know
> > nothing about its value,
> 
> Wrong.
> 
> We could know things about it's value, such as ranges. We just don't know 
> an exact number.

OK.  This is work for VRP, not scev.

Comment 28 Daniel Berlin 2005-01-25 00:39:05 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)


>>>>
>>>> Uh, symbolic references are or will be used to do data dependence when
>>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers.
>>>
>>> How?  If the reference is left in symbolic form, it means that you know
>>> nothing about its value,
>>
>> Wrong.
>>
>> We could know things about it's value, such as ranges. We just don't know
>> an exact number.
>
> OK.  This is work for VRP, not scev.

And once we have it, we can talk about removing the symbolic stuff from 
SCEV :)

Comment 29 Zdenek Dvorak 2005-01-25 00:49:56 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> >>>> Uh, symbolic references are or will be used to do data dependence when
> >>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers.
> >>>
> >>> How?  If the reference is left in symbolic form, it means that you know
> >>> nothing about its value,
> >>
> >> Wrong.
> >>
> >> We could know things about it's value, such as ranges. We just don't know
> >> an exact number.
> >
> > OK.  This is work for VRP, not scev.
> 
> And once we have it, we can talk about removing the symbolic stuff from 
> SCEV :)

Once we use any value range info from SCEV, we can speak about leaving
symbolic references in SCEV until we have a proper VRP :-)

Comment 30 Daniel Berlin 2005-01-25 00:50:46 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Mon, 25 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:

>
> ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-01-25 00:49 -------
> Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
>
>>>>>> Uh, symbolic references are or will be used to do data dependence when
>>>>>> MEM_REF and ARRAY_REF couldn't be generated from the pointers.
>>>>>
>>>>> How?  If the reference is left in symbolic form, it means that you know
>>>>> nothing about its value,
>>>>
>>>> Wrong.
>>>>
>>>> We could know things about it's value, such as ranges. We just don't know
>>>> an exact number.
>>>
>>> OK.  This is work for VRP, not scev.
>>
>> And once we have it, we can talk about removing the symbolic stuff from
>> SCEV :)
>
> Once we use any value range info from SCEV, we can speak about leaving
> symbolic references in SCEV until we have a proper VRP :-)
See autovec branch.

Comment 31 Daniel Berlin 2005-01-25 00:52:09 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> See autovec branch.


You could also look at recent patches posted by sebastian and i for the 
autovect branch that have been adding this support.

Comment 32 Zdenek Dvorak 2005-01-25 01:11:48 UTC

Which one? I cannot find anything relevant in changelog.

Comment 33 Daniel Berlin 2005-01-25 01:15:31 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)


> Which one? I cannot find anything relevant in changelog.
>


         * tree-data-ref.c (analyze_subscript_affine_affine): Implement a
         solution for the FIXME concerning the symbolic affine
         dependence testing; remove the FIXME.
         (can_use_analyze_subscript_affine_affine): New function.
         (analyze_siv_subscript): Use it.

and


2004-12-15  Daniel Berlin  <dberlin@dberlin.org>

         * Makefile.in (tree-chrec.o): Add cfgloop.h

         * tree-chrec.c: Add cfgloop.h, tree-flow.h.
         (evolution_function_is_invariant_p): New function.
         (evolution_function_is_affine_multivariate_p): Use
         evolution_function_is_invariant_p instead of
         evolution_function_is_constant_p.

         * tree-chrec.h: Add prototype for
         evolution_function_is_invariant_p.
         (evolution_function_is_affine_p): Use
         evolution_function_is_invariant_p.

         * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
         are equal overlap on every iteration.

This stuff is just simple symbolic differencing, and checking of 
invariantness of symbols,
But it is indeed starting to se the symbolic scev info

Comment 34 Zdenek Dvorak 2005-01-25 01:24:12 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > Which one? I cannot find anything relevant in changelog.
> >
> 
> 
>          * tree-data-ref.c (analyze_subscript_affine_affine): Implement a
>          solution for the FIXME concerning the symbolic affine
>          dependence testing; remove the FIXME.
>          (can_use_analyze_subscript_affine_affine): New function.
>          (analyze_siv_subscript): Use it.
> 
> and
> 
> 
> 2004-12-15  Daniel Berlin  <dberlin@dberlin.org>
> 
>          * Makefile.in (tree-chrec.o): Add cfgloop.h
> 
>          * tree-chrec.c: Add cfgloop.h, tree-flow.h.
>          (evolution_function_is_invariant_p): New function.
>          (evolution_function_is_affine_multivariate_p): Use
>          evolution_function_is_invariant_p instead of
>          evolution_function_is_constant_p.
> 
>          * tree-chrec.h: Add prototype for
>          evolution_function_is_invariant_p.
>          (evolution_function_is_affine_p): Use
>          evolution_function_is_invariant_p.
> 
>          * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
>          are equal overlap on every iteration.
> 
> This stuff is just simple symbolic differencing, and checking of 
> invariantness of symbols,
> But it is indeed starting to se the symbolic scev info

Ugh... OK, if you think it is a right way...  Anyway, I am seriously
considering resurrecting the simple iv analysis for purposes of passes
that are not interested in this fancy stuff.  Overhead of scev is
probably acceptable if it is only used for this type of analysis,
but for other purposes it is clearly overkill.

Comment 35 Steven Bosscher 2005-01-25 01:28:14 UTC

Do remember that this bug is about bad behavior with deeply nested loops 
and we already have other algorithms that are quadratic in the loop nest 
depth.  So I really wouldn't consider this to be a very serious problem, 
but rather something that could be improved. 
 
It is a shame that Seb has so far not commented on the behavior of his 
scev algorithm with respect to the loop nest depth.  Is it expected to 
be cubic in the loop nest depth?  Perhaps he can improve the algorithm?

Comment 36 Daniel Berlin 2005-01-25 01:42:30 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

>>
>>          * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
>>          are equal overlap on every iteration.
>>
>> This stuff is just simple symbolic differencing, and checking of
>> invariantness of symbols,
>> But it is indeed starting to se the symbolic scev info
>
> Ugh... OK, if you think it is a right way...  Anyway, I am seriously
> considering resurrecting the simple iv analysis for purposes of passes
> that are not interested in this fancy stuff.  Overhead of scev is
> probably acceptable if it is only used for this type of analysis,
> but for other purposes it is clearly overkill.

Uh, almost all high level loop optimizations are going to do very badly in 
the depth of the loop nest.

The fact that scev doesn't do so well on a 100 deep loop nest doesn't 
really concern me at all.

Comment 37 Zdenek Dvorak 2005-01-25 01:52:12 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> Do remember that this bug is about bad behavior with deeply nested loops 
> and we already have other algorithms that are quadratic in the loop nest 
> depth.  So I really wouldn't consider this to be a very serious problem, 
> but rather something that could be improved. 
>  
> It is a shame that Seb has so far not commented on the behavior of his 
> scev algorithm with respect to the loop nest depth.  Is it expected to 
> be cubic in the loop nest depth?  Perhaps he can improve the algorithm? 

the algorithm itself is not cubic with loop depth, but worst case linear
(per query) (*). This time complexity is acceptable, IMHO.

(*) I hope; scev is a mess of mutualy recursive functions --
analyze_scalar_evolution calling number_of_iterations calling
instantiate_parameters calling analyze_scalar_evolution again, with
instantiate_parameters hacked so that the infinite cycle cannot occur is
my favourite one.  Nobody can say anything sure about behavior of scev
-- it is not even well defined what analyze_scalar_evolutions will
return to you, unless you call instantiate_parameters or resolve_mixers
on the result.

Comment 38 Zdenek Dvorak 2005-01-25 01:57:48 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> >>          * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
> >>          are equal overlap on every iteration.
> >>
> >> This stuff is just simple symbolic differencing, and checking of
> >> invariantness of symbols,
> >> But it is indeed starting to se the symbolic scev info
> >
> > Ugh... OK, if you think it is a right way...  Anyway, I am seriously
> > considering resurrecting the simple iv analysis for purposes of passes
> > that are not interested in this fancy stuff.  Overhead of scev is
> > probably acceptable if it is only used for this type of analysis,
> > but for other purposes it is clearly overkill.
> 
> Uh, almost all high level loop optimizations are going to do very badly in 
> the depth of the loop nest.

Remove "almost" and "high level" from this sentence and I will agree
with you.  This really does not worry me that much.

> The fact that scev doesn't do so well on a 100 deep loop nest doesn't 
> really concern me at all.

Me neither.  Its time and memory consumption on normal testcases however
is not entirely satisfactory (not critical, but there definitely is a
place for improvement), and implementation could (should?) also be
significantly cleaned up.

Comment 39 sebastian.pop@cri.ensmp.fr 2005-01-25 10:32:23 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> Adding the instantiation cache was compile time neutral on normal code,
> so I don't think the effect on scev cost is significant.
> 

How that?  You end up querying and caching an evolution for every
instantiation of an SSA_NAME!  

I will quantify the number of queries wrt the number of times this
information is useful.  I think that with my patch, the instantiation
cache information is used zero times on a bootstrap.

> The problem is that we end up calling the instantiate_parameters_1
> function 83214+2453273 (outside + recursive) times (for N=100).
> We also call analyze_scalar_evolution_1 1146317 times.  Many of these
> calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
> Large part of 5142869 of build_int_cst_wide calls is very likely also
> due to scev analysis.  This is not going to be cheap.  Removing the
> instantiation cache definitely would not decrease the number of
> instantiate_parameters_1 calls (might increase it, in fact, although
> I don't know how significantly).
> 

This also could be a bad use of the scev analysis.

For Steven: you can have a O(N**3) algorithm very simply:

  loop O(N) times
    loop O(N) times
      algorithm in O(N)

> One part of the problem is that loop optimizers need to throw away the
> scev caches after each change to loops, which leads to recounting the
> information too much.  Allowing to invalidate only changed parts helps,
> but seems to be relatively costly on normal code -- you need to scan the
> hash table for things that refer to removed or from some other reason
> invalidated ssa names, which is slow.  

Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time
you'll ask for the evolution of this SSA_NAME you'll analyze the
evolution from scratch.

Comment 40 sebastian.pop@cri.ensmp.fr 2005-01-25 10:39:04 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> 
> How?  If the reference is left in symbolic form, it means that you know
> nothing about its value, so the only thing you can do with it is to
> check its equality/inequality, in code like
> 
> while (...)
>   {
>     i = something_weird ();
> 
>     a[i] = ...;  (a)
>     a[i+1] = ...;  (b)
>     a[i] = ...;  (c)
>   }
> 

The following is probably a more frequent case:

i = 0
x = something_weird () or a function parameter
loop
  i = i + 1 
  a[i] = ...
  ... = a[i + x]
endloop

where you end with a symbolic distance vector.

Comment 41 sebastian.pop@cri.ensmp.fr 2005-01-25 11:02:47 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> 
> (*) I hope; scev is a mess of mutualy recursive functions --
> analyze_scalar_evolution calling number_of_iterations calling
> instantiate_parameters calling analyze_scalar_evolution again, with
> instantiate_parameters hacked so that the infinite cycle cannot occur is
> my favourite one.  Nobody can say anything sure about behavior of scev
> -- it is not even well defined what analyze_scalar_evolutions will
> return to you, 

It returns to you an evolution that might contain SSA_NAMEs.

> unless you call instantiate_parameters or resolve_mixers
> on the result.
> 

And once you call instantiate_parameters on the result you're
guaranteed that what you get does contain only determined constants,
or otherwise the result is chrec_dont_know.

Comment 42 Zdenek Dvorak 2005-01-25 11:14:59 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> > Adding the instantiation cache was compile time neutral on normal code,
> > so I don't think the effect on scev cost is significant.
> > 
> 
> How that?  You end up querying and caching an evolution for every
> instantiation of an SSA_NAME!  

which is pretty cheap (constant time operation); note that we do an
equivalent lookup in the analyze_scalar_evolution call in any case, also
without causing any compile time problems.  I haven't measured any slowdown on
normal testcases.

> I will quantify the number of queries wrt the number of times this
> information is useful.  I think that with my patch, the instantiation
> cache information is used zero times on a bootstrap.

I don't think so.  Please come up with some numbers, otherwise this
discussion is pointless.

> > The problem is that we end up calling the instantiate_parameters_1
> > function 83214+2453273 (outside + recursive) times (for N=100).
> > We also call analyze_scalar_evolution_1 1146317 times.  Many of these
> > calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
> > Large part of 5142869 of build_int_cst_wide calls is very likely also
> > due to scev analysis.  This is not going to be cheap.  Removing the
> > instantiation cache definitely would not decrease the number of
> > instantiate_parameters_1 calls (might increase it, in fact, although
> > I don't know how significantly).
> > 
> 
> This also could be a bad use of the scev analysis.

partly -- see the analysis at the PR.  However one O(n) per query comes
from scev itself.

> For Steven: you can have a O(N**3) algorithm very simply:
> 
>   loop O(N) times
>     loop O(N) times
>       algorithm in O(N)
> 
> > One part of the problem is that loop optimizers need to throw away the
> > scev caches after each change to loops, which leads to recounting the
> > information too much.  Allowing to invalidate only changed parts helps,
> > but seems to be relatively costly on normal code -- you need to scan the
> > hash table for things that refer to removed or from some other reason
> > invalidated ssa names, which is slow.  
> 
> Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time
> you'll ask for the evolution of this SSA_NAME you'll analyze the
> evolution from scratch.

This does not work, of course.  When the ssa name is removed, you must
remove all references to it from the cache.  Otherwise you will ICE when
you try to process the relevant entry in (say) instantiate_parameters.
What's worse, the ssa name may get reused by ssa name management, thus
causing you not even be able to note the change, and thus possibly to
misscompilations.

Comment 43 Zdenek Dvorak 2005-01-25 11:16:08 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > How?  If the reference is left in symbolic form, it means that you know
> > nothing about its value, so the only thing you can do with it is to
> > check its equality/inequality, in code like
> > 
> > while (...)
> >   {
> >     i = something_weird ();
> > 
> >     a[i] = ...;  (a)
> >     a[i+1] = ...;  (b)
> >     a[i] = ...;  (c)
> >   }
> > 
> 
> The following is probably a more frequent case:
> 
> i = 0
> x = something_weird () or a function parameter
> loop
>   i = i + 1 
>   a[i] = ...
>   ... = a[i + x]
> endloop
> 
> where you end with a symbolic distance vector.

But here x is a loop invariant.  Nothing would change here if we were
keeping the evolutions fully instantiated.

Comment 44 Zdenek Dvorak 2005-01-25 11:29:13 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > (*) I hope; scev is a mess of mutualy recursive functions --
> > analyze_scalar_evolution calling number_of_iterations calling
> > instantiate_parameters calling analyze_scalar_evolution again, with
> > instantiate_parameters hacked so that the infinite cycle cannot occur is
> > my favourite one.  Nobody can say anything sure about behavior of scev
> > -- it is not even well defined what analyze_scalar_evolutions will
> > return to you, 
> 
> It returns to you an evolution that might contain SSA_NAMEs.

Hmm... if this is all you can tell, I start fear even more; this does
not define any semantics at all :-)

More seriously -- which of the possibilities?  If I have loops like

while (...)
  {
    while (...)
      {
        x_1 = something ();
      }
    x_2 = phi (x_1);
    x_3 = x_2 + 1;
  }

What will analyze_scalar_evolutions return for x_3? There are (at least)
three possible valid values:

x_3
x_2 + 1
x_1 + 1

In this example probably the last one will be chosen, but in more
complicated cases you cannot tell (without simulating what scev analysis
does).  It might be better to not export analyze_scalar_evolution at
all and to force it to be used through
instantiate_parameters/resolve_mixers, that have at least a well defined
semantics of return value -- I hope.  Till recently I believed that I
made the functions to behave reasonably in cases when values defined
inside loop are used outside of it (through chain of lcssa phi nodes),
but my attempt to remove the hacks that ensure sane behavior of the
results for ivopts causes misscompilations; I am investigating the cause
now.

Comment 45 sebastian.pop@cri.ensmp.fr 2005-01-25 12:03:30 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> 
> ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-01-25 11:14 -------
> Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
> 
> > rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> > > Adding the instantiation cache was compile time neutral on normal code,
> > > so I don't think the effect on scev cost is significant.
> > > 
> > 
> > How that?  You end up querying and caching an evolution for every
> > instantiation of an SSA_NAME!  
> 
> which is pretty cheap (constant time operation); note that we do an
> equivalent lookup in the analyze_scalar_evolution call in any case, also
> without causing any compile time problems.  I haven't measured any slowdown on
> normal testcases.
> 
> > I will quantify the number of queries wrt the number of times this
> > information is useful.  I think that with my patch, the instantiation
> > cache information is used zero times on a bootstrap.
> 
> I don't think so.  Please come up with some numbers, otherwise this
> discussion is pointless.
> 

during a bootstrap: 

instantiation cache queries: 1146452

instantiation cache uses: 247452 of which 143977 were scev_not_known,
the other were SSA_NAMEs.

this was counted with a grep|wc with the following patch: 

Index: tree-scalar-evolution.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -d -u -p -r2.16 tree-scalar-evolution.c
--- tree-scalar-evolution.c	18 Jan 2005 11:36:26 -0000	2.16
+++ tree-scalar-evolution.c	25 Jan 2005 12:00:57 -0000
@@ -1898,8 +1898,15 @@ get_instantiated_value (htab_t cache, tr
   pattern.var = version;
   info = htab_find (cache, &pattern);
 
+  fprintf (stdout, "IC_query \n");
+
   if (info)
-    return info->chrec;
+    {
+      fprintf (stdout, "IC_used_once \n");
+      print_generic_expr (stdout, info->chrec, 0);
+      fprintf (stdout, "\n");
+      return info->chrec;
+    }
   else
     return NULL_TREE;
 }
@@ -1915,6 +1922,8 @@ set_instantiated_value (htab_t cache, tr
   pattern.var = version;
   slot = htab_find_slot (cache, &pattern, INSERT);
 
+  fprintf (stdout, "IC_query \n");
+
   if (*slot)
     info = *slot;
   else
@@ -1990,7 +1999,8 @@ instantiate_parameters_1 (struct loop *l
 	 result again.  */
       bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec));
       res = analyze_scalar_evolution (def_loop, chrec);
-      res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache);
+      if  (res != chrec_dont_know)
+	res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache);
       bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec));
 
       /* Store the correct value to the cache.  */
@@ -2000,8 +2010,14 @@ instantiate_parameters_1 (struct loop *l
     case POLYNOMIAL_CHREC:
       op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+	return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec),
 				      allow_superloop_chrecs, cache);
+      if (op1 == chrec_dont_know)
+	return chrec_dont_know;
+
       if (CHREC_LEFT (chrec) != op0
 	  || CHREC_RIGHT (chrec) != op1)
 	chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1);
@@ -2010,8 +2026,14 @@ instantiate_parameters_1 (struct loop *l
     case PLUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+	return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
 				      allow_superloop_chrecs, cache);
+      if (op1 == chrec_dont_know)
+	return chrec_dont_know;
+
       if (TREE_OPERAND (chrec, 0) != op0
 	  || TREE_OPERAND (chrec, 1) != op1)
       	chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1);
@@ -2020,8 +2042,14 @@ instantiate_parameters_1 (struct loop *l
     case MINUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+	return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
 				      allow_superloop_chrecs, cache);
+      if (op1 == chrec_dont_know)
+        return chrec_dont_know;
+
       if (TREE_OPERAND (chrec, 0) != op0
 	  || TREE_OPERAND (chrec, 1) != op1)
         chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1);
@@ -2030,8 +2058,14 @@ instantiate_parameters_1 (struct loop *l
     case MULT_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+        return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
 				      allow_superloop_chrecs, cache);
+      if (op1 == chrec_dont_know)
+        return chrec_dont_know;
+
       if (TREE_OPERAND (chrec, 0) != op0
 	  || TREE_OPERAND (chrec, 1) != op1)
 	chrec = chrec_fold_multiply (TREE_TYPE (chrec), op0, op1);
@@ -2065,13 +2099,17 @@ instantiate_parameters_1 (struct loop *l
     case 3:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+	return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
 				      allow_superloop_chrecs, cache);
+      if (op1 == chrec_dont_know)
+	return chrec_dont_know;
+
       op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2),
 				      allow_superloop_chrecs, cache);
-      if (op0 == chrec_dont_know
-	  || op1 == chrec_dont_know
-	  || op2 == chrec_dont_know)
+      if (op2 == chrec_dont_know)
         return chrec_dont_know;
 
       if (op0 == TREE_OPERAND (chrec, 0)
@@ -2085,10 +2123,12 @@ instantiate_parameters_1 (struct loop *l
     case 2:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
 				      allow_superloop_chrecs, cache);
+      if (op0 == chrec_dont_know)
+	return chrec_dont_know;
+
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
 				      allow_superloop_chrecs, cache);
-      if (op0 == chrec_dont_know
-	  || op1 == chrec_dont_know)
+      if (op1 == chrec_dont_know)
         return chrec_dont_know;
 
       if (op0 == TREE_OPERAND (chrec, 0)

Comment 46 sebastian.pop@cri.ensmp.fr 2005-01-25 12:44:13 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> More seriously -- which of the possibilities?  If I have loops like
> 
> while (...)
>   {
>     while (...)
>       {
>         x_1 = something ();
>       }
>     x_2 = phi (x_1);
>     x_3 = x_2 + 1;
>   }
> 
> What will analyze_scalar_evolutions return for x_3? There are (at least)
> three possible valid values:
> 
> x_3

This would be the answer if analyze_scalar_evolutions would be the
identity function.  If you want, you could change analyze_scalar_evolutions
such that it behaves like that, and decide that the instantiation do
the rest of the work (I mean moving the code that is currently in
analyze_scalar_evolutions to the instantiation phase).

> x_2 + 1

If you decide to reconstruct the tree expression, there is no reason
to stop on a phi node that has a single argument.  Why would you like
to get this answer as the reconstructed tree?

> x_1 + 1
> 

IMO this would be the answer, although I didn't checked.

Comment 47 Zdenek Dvorak 2005-01-25 15:54:03 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> > More seriously -- which of the possibilities?  If I have loops like
> > 
> > while (...)
> >   {
> >     while (...)
> >       {
> >         x_1 = something ();
> >       }
> >     x_2 = phi (x_1);
> >     x_3 = x_2 + 1;
> >   }
> > 
> > What will analyze_scalar_evolutions return for x_3? There are (at least)
> > three possible valid values:
> > 
> > x_3
> 
> This would be the answer if analyze_scalar_evolutions would be the
> identity function.  If you want, you could change analyze_scalar_evolutions
> such that it behaves like that, and decide that the instantiation do
> the rest of the work (I mean moving the code that is currently in
> analyze_scalar_evolutions to the instantiation phase).
> 
> > x_2 + 1
> 
> If you decide to reconstruct the tree expression, there is no reason
> to stop on a phi node that has a single argument.  Why would you like
> to get this answer as the reconstructed tree?

because this answer preserves loop closed ssa form -- the answer x_1 + 1
copy propagates the value outsied of the loop.  In some applications
this choice could make sense.

Comment 48 sebastian.pop@cri.ensmp.fr 2005-01-25 16:26:58 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> > If you decide to reconstruct the tree expression, there is no reason
> > to stop on a phi node that has a single argument.  Why would you like
> > to get this answer as the reconstructed tree?
> 
> because this answer preserves loop closed ssa form -- the answer x_1 + 1
> copy propagates the value outsied of the loop.  In some applications
> this choice could make sense.
> 

Okay, if it makes sense, you could modify the analyzer such that it
stops reconstructing the tree once it is on the phi following a loop.
This won't change the information provided after the instantiation.

Comment 49 sebastian.pop@cri.ensmp.fr 2005-01-27 13:18:36 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

With the following patch I got some speedup for depth 100.

from: 
 tree iv optimization  :   2.62 (62%) usr   0.27 (82%) sys   2.92 (62%) wall
 tree iv optimization  :   2.33 (59%) usr   0.25 (83%) sys   2.63 (54%) wall

to:
 tree iv optimization  :   1.19 (46%) usr   0.04 (40%) sys   1.26 (45%) wall
 tree iv optimization  :   1.21 (47%) usr   0.05 (45%) sys   1.30 (46%) wall

Basically we're reseting too much information each time scev_reset is
called.  It would even be possible to reset only the part of the scev
database that contains symbols instead of erasing everything.

Do somebody know how to iterate over all the elements of the hash
setting to NULL_TREE the elements that answer true to
chrec_contains_symbols?

Index: tree-scalar-evolution.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -d -u -p -r2.16 tree-scalar-evolution.c
--- tree-scalar-evolution.c	18 Jan 2005 11:36:26 -0000	2.16
+++ tree-scalar-evolution.c	27 Jan 2005 13:12:36 -0000
@@ -2490,7 +2490,7 @@ scev_reset (void)
   for (i = 1; i < current_loops->num; i++)
     {
       loop = current_loops->parray[i];
-      if (loop)
+      if (loop && chrec_contains_symbols (loop->nb_iterations))
 	loop->nb_iterations = NULL_TREE;
     }
 }

Comment 50 Zdenek Dvorak 2005-01-27 15:00:13 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> ------- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  2005-01-27 13:18 -------
> Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
> 
> With the following patch I got some speedup for depth 100.
> 
> from: 
>  tree iv optimization  :   2.62 (62%) usr   0.27 (82%) sys   2.92 (62%) wall
>  tree iv optimization  :   2.33 (59%) usr   0.25 (83%) sys   2.63 (54%) wall
> 
> to:
>  tree iv optimization  :   1.19 (46%) usr   0.04 (40%) sys   1.26 (45%) wall
>  tree iv optimization  :   1.21 (47%) usr   0.05 (45%) sys   1.30 (46%) wall
> 
> Basically we're reseting too much information each time scev_reset is
> called.  It would even be possible to reset only the part of the scev
> database that contains symbols instead of erasing everything.
> 
> Do somebody know how to iterate over all the elements of the hash
> setting to NULL_TREE the elements that answer true to
> chrec_contains_symbols?

the patch is below (in stronger form -- only removing entries that
contain invalidated symbols), and indeed helps with this testcase.
It however causes measurable slow down on normal code (see analysis
in one of the previous mails).

Note that your patch is not entirely correct -- in case loop is
unrolled, its number of iterations is changed, so in this case
scev_reset should clear its number of iterations unconditionally
(I think something similar occurs with vectorizer, so we need to
be a bit careful).

Index: tree-loop-linear.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-loop-linear.c,v
retrieving revision 2.6
diff -c -3 -p -r2.6 tree-loop-linear.c
*** tree-loop-linear.c	18 Jan 2005 11:36:24 -0000	2.6
--- tree-loop-linear.c	24 Jan 2005 01:34:04 -0000
*************** linear_transform_loops (struct loops *lo
*** 373,379 ****
        free_data_refs (datarefs);
      }
    free_df ();
!   scev_reset ();
    rewrite_into_loop_closed_ssa ();
  #ifdef ENABLE_CHECKING
    verify_loop_closed_ssa ();
--- 373,379 ----
        free_data_refs (datarefs);
      }
    free_df ();
!   scev_reset (NULL);
    rewrite_into_loop_closed_ssa ();
  #ifdef ENABLE_CHECKING
    verify_loop_closed_ssa ();
Index: tree-scalar-evolution.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -c -3 -p -r2.16 tree-scalar-evolution.c
*** tree-scalar-evolution.c	18 Jan 2005 11:36:26 -0000	2.16
--- tree-scalar-evolution.c	24 Jan 2005 01:34:05 -0000
*************** scev_initialize (struct loops *loops)
*** 2475,2484 ****
        loops->parray[i]->nb_iterations = NULL_TREE;
  }
  
! /* Cleans up the information cached by the scalar evolutions analysis.  */
  
  void
! scev_reset (void)
  {
    unsigned i;
    struct loop *loop;
--- 2475,2539 ----
        loops->parray[i]->nb_iterations = NULL_TREE;
  }
  
! /* Returns true if EXPR references one of the ssa names in set
!    INVALIDATED_NAMES.  */
! 
! static bool
! tree_references_names (tree expr, bitmap invalidated_names)
! {
!   if (!expr)
!     return false;
! 
!   if (TREE_CODE (expr) == SSA_NAME)
!     return bitmap_bit_p (invalidated_names, SSA_NAME_VERSION (expr));
! 
!   switch (TREE_CODE_LENGTH (TREE_CODE (expr)))
!     {
!     case 4:
!       if (tree_references_names (TREE_OPERAND (expr, 3), invalidated_names))
! 	return true;
!     case 3:
!       if (tree_references_names (TREE_OPERAND (expr, 2), invalidated_names))
! 	return true;
!     case 2:
!       if (tree_references_names (TREE_OPERAND (expr, 1), invalidated_names))
! 	return true;
!     case 1:
!       if (tree_references_names (TREE_OPERAND (expr, 0), invalidated_names))
! 	return true;
!     case 0:
!       return false;
! 
!     default:
!       /* Don't worry about handling strange cases.  This function is only used
! 	 in a way in that it does not matter if we return true here.  */
!       return true;
!     }
! }
! 
! /* If the SLOT in the scev cache contains ssa name belonging to the set
!    passed in DATA, the function removes it from the cache.  Callback
!    for htab_traverse.  */
! 
! static int
! invalidate_names_reference (void **slot, void *data)
! {
!   bitmap invalidated_names = data;
!   struct scev_info_str *elt = *slot;
! 
!   if (tree_references_names (elt->var, invalidated_names)
!       || tree_references_names (elt->chrec, invalidated_names))
!     htab_clear_slot (scalar_evolution_info, slot);
! 
!   return 1;
! }
! 
! /* Cleans up the information cached by the scalar evolutions analysis.
!    If INVALIDATED_NAMES is provided, only the references to invalidated
!    ssa names are removed.  */
  
  void
! scev_reset (bitmap invalidated_names)
  {
    unsigned i;
    struct loop *loop;
*************** scev_reset (void)
*** 2486,2497 ****
    if (!scalar_evolution_info || !current_loops)
      return;
  
!   htab_empty (scalar_evolution_info);
    for (i = 1; i < current_loops->num; i++)
      {
        loop = current_loops->parray[i];
!       if (loop)
! 	loop->nb_iterations = NULL_TREE;
      }
  }
  
--- 2541,2564 ----
    if (!scalar_evolution_info || !current_loops)
      return;
  
!   if (invalidated_names)
!     htab_traverse (scalar_evolution_info, invalidate_names_reference,
! 		   invalidated_names);
!   else
!     htab_empty (scalar_evolution_info);
! 
    for (i = 1; i < current_loops->num; i++)
      {
        loop = current_loops->parray[i];
!       if (!loop
! 	  || !loop->nb_iterations)
! 	continue;
! 	  
!       if (invalidated_names
! 	  && !tree_references_names (loop->nb_iterations, invalidated_names))
! 	continue;
! 
!       loop->nb_iterations = NULL_TREE;
      }
  }
  
Index: tree-scalar-evolution.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.h,v
retrieving revision 2.3
diff -c -3 -p -r2.3 tree-scalar-evolution.h
*** tree-scalar-evolution.h	17 Nov 2004 22:06:00 -0000	2.3
--- tree-scalar-evolution.h	24 Jan 2005 01:34:05 -0000
*************** extern tree number_of_iterations_in_loop
*** 26,32 ****
  extern tree get_loop_exit_condition (struct loop *);
  
  extern void scev_initialize (struct loops *loops);
! extern void scev_reset (void);
  extern void scev_finalize (void);
  extern tree analyze_scalar_evolution (struct loop *, tree);
  extern tree instantiate_parameters (struct loop *, tree);
--- 26,32 ----
  extern tree get_loop_exit_condition (struct loop *);
  
  extern void scev_initialize (struct loops *loops);
! extern void scev_reset (bitmap);
  extern void scev_finalize (void);
  extern tree analyze_scalar_evolution (struct loop *, tree);
  extern tree instantiate_parameters (struct loop *, tree);
Index: tree-ssa-loop-ivcanon.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivcanon.c,v
retrieving revision 2.5
diff -c -3 -p -r2.5 tree-ssa-loop-ivcanon.c
*** tree-ssa-loop-ivcanon.c	1 Oct 2004 18:26:31 -0000	2.5
--- tree-ssa-loop-ivcanon.c	24 Jan 2005 01:34:05 -0000
*************** canonicalize_induction_variables (struct
*** 262,268 ****
  
    /* Clean up the information about numbers of iterations, since brute force
       evaluation could reveal new information.  */
!   scev_reset ();
  
  #if 0
    /* The necessary infrastructure is not in yet.  */
--- 262,268 ----
  
    /* Clean up the information about numbers of iterations, since brute force
       evaluation could reveal new information.  */
!   scev_reset (NULL);
  
  #if 0
    /* The necessary infrastructure is not in yet.  */
*************** tree_unroll_loops_completely (struct loo
*** 294,300 ****
  
    /* Clean up the information about numbers of iterations, since complete
       unrolling might have invalidated it.  */
!   scev_reset ();
  
  #if 0
    /* The necessary infrastructure is not in yet.  */
--- 294,300 ----
  
    /* Clean up the information about numbers of iterations, since complete
       unrolling might have invalidated it.  */
!   scev_reset (NULL);
  
  #if 0
    /* The necessary infrastructure is not in yet.  */
Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.42
diff -c -3 -p -r2.42 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c	19 Jan 2005 22:50:04 -0000	2.42
--- tree-ssa-loop-ivopts.c	24 Jan 2005 01:34:05 -0000
*************** struct ivopts_data
*** 208,213 ****
--- 208,216 ----
    /* The bitmap of indices in version_info whose value was changed.  */
    bitmap relevant;
  
+   /* The set of ssa names whose scev info might get invalidated.  */
+   bitmap invalidated_names;
+ 
    /* The maximum invariant id.  */
    unsigned max_inv_id;
  
*************** tree_ssa_iv_optimize_init (struct loops 
*** 651,656 ****
--- 654,660 ----
    data->version_info = xcalloc (data->version_info_size,
  				sizeof (struct version_info));
    data->relevant = BITMAP_XMALLOC ();
+   data->invalidated_names = BITMAP_XMALLOC ();
    data->important_candidates = BITMAP_XMALLOC ();
    data->max_inv_id = 0;
  
*************** remove_unused_ivs (struct ivopts_data *d
*** 5016,5022 ****
  	  && !info->inv_id
  	  && !info->iv->have_use_for
  	  && !info->preserve_biv)
! 	remove_statement (SSA_NAME_DEF_STMT (info->iv->ssa_name), true);
      }
  }
  
--- 5020,5029 ----
  	  && !info->inv_id
  	  && !info->iv->have_use_for
  	  && !info->preserve_biv)
! 	{
! 	  bitmap_set_bit (data->invalidated_names, j);
! 	  remove_statement (SSA_NAME_DEF_STMT (info->iv->ssa_name), true);
! 	}
      }
  }
  
*************** free_loop_data (struct ivopts_data *data
*** 5041,5046 ****
--- 5048,5054 ----
        info->inv_id = 0;
      }
    bitmap_clear (data->relevant);
+   bitmap_clear (data->invalidated_names);
    bitmap_clear (data->important_candidates);
  
    for (i = 0; i < n_iv_uses (data); i++)
*************** tree_ssa_iv_optimize_finalize (struct lo
*** 5104,5109 ****
--- 5112,5118 ----
    free_loop_data (data);
    free (data->version_info);
    BITMAP_XFREE (data->relevant);
+   BITMAP_XFREE (data->invalidated_names);
    BITMAP_XFREE (data->important_candidates);
  
    VARRAY_FREE (decl_rtl_to_reset);
*************** tree_ssa_iv_optimize_loop (struct ivopts
*** 5177,5183 ****
    /* We have changed the structure of induction variables; it might happen
       that definitions in the scev database refer to some of them that were
       eliminated.  */
!   scev_reset ();
  
  finish:
    free_loop_data (data);
--- 5186,5192 ----
    /* We have changed the structure of induction variables; it might happen
       that definitions in the scev database refer to some of them that were
       eliminated.  */
!   scev_reset (data->invalidated_names);
  
  finish:
    free_loop_data (data);
Index: tree-ssa-loop-manip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-manip.c,v
retrieving revision 2.21
diff -c -3 -p -r2.21 tree-ssa-loop-manip.c
*** tree-ssa-loop-manip.c	29 Nov 2004 17:53:47 -0000	2.21
--- tree-ssa-loop-manip.c	24 Jan 2005 01:34:05 -0000
*************** tree_duplicate_loop_to_header_edge (stru
*** 633,639 ****
    	set_phi_def_stmts (bb->rbi->original);
      }
  
!   scev_reset ();
  #ifdef ENABLE_CHECKING
    verify_loop_closed_ssa ();
  #endif
--- 633,639 ----
    	set_phi_def_stmts (bb->rbi->original);
      }
  
!   scev_reset (NULL);
  #ifdef ENABLE_CHECKING
    verify_loop_closed_ssa ();
  #endif
Index: tree-ssa-loop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop.c,v
retrieving revision 2.23
diff -c -3 -p -r2.23 tree-ssa-loop.c
*** tree-ssa-loop.c	26 Nov 2004 06:42:25 -0000	2.23
--- tree-ssa-loop.c	24 Jan 2005 01:34:05 -0000
*************** tree_ssa_loop_bounds (void)
*** 306,312 ****
      return;
  
    estimate_numbers_of_iterations (current_loops);
!   scev_reset ();
  }
  
  struct tree_opt_pass pass_record_bounds =
--- 306,312 ----
      return;
  
    estimate_numbers_of_iterations (current_loops);
!   scev_reset (NULL);
  }
  
  struct tree_opt_pass pass_record_bounds =
Index: tree-vectorizer.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-vectorizer.c,v
retrieving revision 2.62
diff -c -3 -p -r2.62 tree-vectorizer.c
*** tree-vectorizer.c	18 Jan 2005 11:36:29 -0000	2.62
--- tree-vectorizer.c	24 Jan 2005 01:34:05 -0000
*************** vect_do_peeling_for_loop_bound (loop_vec
*** 3258,3264 ****
    vect_update_ivs_after_vectorizer (loop, ratio_mult_vf_name, update_e); 
  
    /* After peeling we have to reset scalar evolution analyzer.  */
!   scev_reset ();
  
    return;
  }
--- 3258,3264 ----
    vect_update_ivs_after_vectorizer (loop, ratio_mult_vf_name, update_e); 
  
    /* After peeling we have to reset scalar evolution analyzer.  */
!   scev_reset (NULL);
  
    return;
  }
*************** vect_do_peeling_for_alignment (loop_vec_
*** 3442,3448 ****
    vect_update_inits_of_drs (loop_vinfo, niters_of_prolog_loop);
  
    /* After peeling we have to reset scalar evolution analyzer.  */
!   scev_reset ();
  
    return;
  }
--- 3442,3448 ----
    vect_update_inits_of_drs (loop_vinfo, niters_of_prolog_loop);
  
    /* After peeling we have to reset scalar evolution analyzer.  */
!   scev_reset (NULL);
  
    return;
  }

Comment 51 sebastian.pop@cri.ensmp.fr 2005-01-27 15:12:35 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> the patch is below (in stronger form -- only removing entries that
> contain invalidated symbols), and indeed helps with this testcase.

Many thanks.

> It however causes measurable slow down on normal code (see analysis
> in one of the previous mails).
> 

I see. 

Another idea: would it be possible to insert the invalidated names
during the optimization pass instead of invalidating all the symbols?
This would avoid the first pass over the scev database that search for
symbols.

> Note that your patch is not entirely correct -- in case loop is
> unrolled, its number of iterations is changed, so in this case
> scev_reset should clear its number of iterations unconditionally

or the unroller should do that work on the unrolled loops?

Comment 52 Zdenek Dvorak 2005-01-27 19:10:38 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

> Another idea: would it be possible to insert the invalidated names
> during the optimization pass instead of invalidating all the symbols?
> This would avoid the first pass over the scev database that search for
> symbols.

I don't understand this proposal.

> > Note that your patch is not entirely correct -- in case loop is
> > unrolled, its number of iterations is changed, so in this case
> > scev_reset should clear its number of iterations unconditionally
> 
> or the unroller should do that work on the unrolled loops?

This seems to be the right approach.

Comment 53 sebastian.pop@cri.ensmp.fr 2005-01-27 20:38:10 UTC

Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
> > Another idea: would it be possible to insert the invalidated names
> > during the optimization pass instead of invalidating all the symbols?
> > This would avoid the first pass over the scev database that search for
> > symbols.
> 
> I don't understand this proposal.
> 

Anyway, I see that I was wrong: even if you mark the SSA_NAMEs that
are removed in an optimization, and that you invalidate only those
symbols, you still have to walk the scev database to ensure that there
is no other evolution that references the removed SSA_NAMEs.

Comment 54 Andrew Pinski 2005-02-13 23:14:58 UTC

Moving to 4.1 as I don't care much if this gets fixed.

Comment 55 Mark Mitchell 2005-10-31 01:48:51 UTC

I wouldn't worry too much about exceedingly deep loop nests.  Even the scientific code I've looked at rarely has loop nests deeper than ten -- and if it does, optimizing the inner ten loops is probably good enough.

So, I'd suggest that we add a --param here for max-loop-nest-depth, and then just not do this stuff on deeper nests, or ignore all the outer loops in that case.

Is that feasible?

Marked as P4; I think we should fix this, but I don't think it needs to happen for 4.1.  I'll revise for 4.2.

Comment 56 sebastian.pop@cri.ensmp.fr 2005-11-03 13:24:44 UTC

Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

> 
> So, I'd suggest that we add a --param here for max-loop-nest-depth, and then
> just not do this stuff on deeper nests, or ignore all the outer loops in that
> case.
> 
> Is that feasible?
> 

Mark, I think that this has already been fixed by the patch that added
PARAM_SCEV_MAX_EXPR_SIZE: we give up when the expressions that we
handle are longer than this size (if I remember correctly this
defaults to 20).  For the example in this bug,

{	int j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, a;
	a = 0;
	for (j0 = 0; j0 < 2; j0 ++)
	for (j1 = j0; j1 < 2; j1 ++)
	for (j2 = j1; j2 < 2; j2 ++)
	for (j3 = j2; j3 < 2; j3 ++)
	for (j4 = j3; j4 < 2; j4 ++)
	for (j5 = j4; j5 < 2; j5 ++)
	for (j6 = j5; j6 < 2; j6 ++)
	for (j7 = j6; j7 < 2; j7 ++)
	for (j8 = j7; j8 < 2; j8 ++)
	for (j9 = j8; j9 < 2; j9 ++)
	a += j0 + j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8 + j9;
	return a;
}

we would have a scev with 10 dimensions, i.e. an evolution in each
dimension, thus an expression with about 10 components.

Some numbers for mainline with -O2 (on amd64 with cpufreq):

time ./gcc/cc1 -O2 ~/ex/pr18595_10.c 
real	0m0.345s
user	0m0.089s
sys	0m0.017s

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c 
 tree PRE              :   0.28 (21%) usr   0.02 (50%) sys   0.29 (20%) wall    7742 kB (59%) ggc
 tree loop bounds      :   0.14 (10%) usr   0.00 ( 0%) sys   0.15 (10%) wall       0 kB ( 0%) ggc
real	0m1.776s
user	0m1.348s
sys	0m0.050s

time ./gcc/cc1 -O2 ~/ex/pr18595_200.c 
 tree PRE              :   1.53 (24%) usr   0.09 (60%) sys   1.63 (25%) wall   31149 kB (74%) ggc
 tree loop bounds      :   1.21 (19%) usr   0.00 ( 0%) sys   1.21 (18%) wall       0 kB ( 0%) ggc
real	0m7.077s
user	0m6.363s
sys	0m0.167s

time ./gcc/cc1 -O2 ~/ex/pr18595_300.c
 tree PRE              :   4.69 (28%) usr   0.21 (68%) sys   5.03 (29%) wall   69961 kB (80%) ggc
 tree loop bounds      :   4.03 (24%) usr   0.00 ( 0%) sys   4.06 (23%) wall       0 kB ( 0%) ggc
real	0m18.126s
user	0m17.022s
sys	0m0.333s

Now, if we remove PRE, we can even compile the case with 1000 nested
loops in a reasonable time:

time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_300.c 
 tree loop bounds      :   0.09 ( 4%) usr   0.00 ( 0%) sys   0.24 ( 9%) wall       0 kB ( 0%) ggc
real	0m3.184s
user	0m2.147s
sys	0m0.054s

time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_1000.c 
 tree loop bounds      :   2.00 ( 8%) usr   0.00 ( 0%) sys   2.00 ( 8%) wall       0 kB ( 0%) ggc
real	0m27.171s
user	0m26.298s
sys	0m0.169s

So, I think that we can safely close this PR.

Seb

Comment 57 sebastian.pop@cri.ensmp.fr 2005-11-03 15:02:35 UTC

Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

sebastian dot pop at cri dot ensmp dot fr wrote:
> 
> So, I think that we can safely close this PR.
> 

The previous numbers were with mainline + patch, otherwise mainline
gets an ice as described in PR24483.  I only now realized this, sorry.

Seb

Comment 58 sebastian.pop@cri.ensmp.fr 2005-11-03 19:31:13 UTC

Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

Here are again the numbers for mainline with no other patch:

time ./gcc/cc1 -O2 ~/ex/pr18595_10.c 
real    0m0.164s
user    0m0.116s
sys     0m0.018s

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c 
 tree PRE              :   0.33 ( 3%) usr   0.01 ( 3%) sys   0.34 ( 3%) wall    7742 kB ( 2%) ggc
 tree loop bounds      :   0.77 ( 7%) usr   0.03 (10%) sys   0.80 ( 7%) wall   53463 kB (17%) ggc
 tree iv optimization  :   7.70 (74%) usr   0.19 (66%) sys   7.97 (73%) wall  220347 kB (71%) ggc
real    0m10.912s
user    0m10.448s
sys     0m0.317s

For 200 nested loops it begins to swap even with -fno-tree-pre, so
we'd need a patch for limiting this.

Comment 59 sebastian.pop@cri.ensmp.fr 2005-11-03 20:10:54 UTC

Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

Here is a first patch that uses PARAM_SCEV_MAX_EXPR_SIZE for limiting
the size of expressions that we want to handle.  I will send it to
gcc-patches once it bootstraps, etc.

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c
 tree loop bounds      :   0.51 (17%) usr   0.03 (23%) sys   0.53 (17%) wall   32717 kB (29%) ggc
 tree iv optimization  :   1.23 (41%) usr   0.06 (46%) sys   1.29 (40%) wall   64971 kB (58%) ggc
real	0m3.995s
user	0m3.015s
sys	0m0.147s

time ./gcc/cc1 -O2 ~/ex/pr18595_200.c
 tree loop bounds      :   2.75 (23%) usr   0.05 (13%) sys   2.80 (22%) wall   72775 kB (22%) ggc
 tree iv optimization  :   3.74 (31%) usr   0.20 (51%) sys   4.30 (33%) wall  210289 kB (64%) ggc
real	0m13.316s
user	0m12.163s
sys	0m0.423s

time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_300.c
 tree loop bounds      :   0.16 ( 3%) usr   0.00 ( 0%) sys   0.15 ( 3%) wall     508 kB ( 1%) ggc
 tree iv optimization  :   2.26 (47%) usr   0.08 (62%) sys   2.33 (46%) wall   74743 kB (84%) ggc
real	0m6.287s
user	0m4.849s
sys	0m0.142s

time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_1000.c
 tree loop bounds      :   2.92 ( 4%) usr   0.00 ( 0%) sys   2.92 ( 4%) wall     913 kB ( 0%) ggc
 tree iv optimization  :  37.78 (54%) usr   0.91 (65%) sys  41.15 (52%) wall  786700 kB (93%) ggc
real	1m19.611s
user	1m10.288s
sys	0m1.403s


Index: tree-scalar-evolution.c
===================================================================
--- tree-scalar-evolution.c	(revision 106440)
+++ tree-scalar-evolution.c	(working copy)
@@ -1982,7 +1982,8 @@ loop_closed_phi_def (tree var)
 /* Analyze all the parameters of the chrec that were left under a symbolic form,
    with respect to LOOP.  CHREC is the chrec to instantiate.  CACHE is the cache
    of already instantiated values.  FLAGS modify the way chrecs are
-   instantiated.  */
+   instantiated.  SIZE_EXPR is used for computing the size of the expression to
+   be instantiated, and to stop if it exceeds some limit.  */
 
 /* Values for FLAGS.  */
 enum
@@ -1995,12 +1996,17 @@ enum
 };
   
 static tree
-instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache)
+instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache,
+			  int size_expr)
 {
   tree res, op0, op1, op2;
   basic_block def_bb;
   struct loop *def_loop;
 
+  /* Give up if the path is longer than the MAX that we allow.  */
+  if (size_expr++ > PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
+    return chrec_dont_know;
+
   if (automatically_generated_chrec_p (chrec)
       || is_gimple_min_invariant (chrec))
     return chrec;
@@ -2068,7 +2074,7 @@ instantiate_parameters_1 (struct loop *l
 	}
 
       else if (res != chrec_dont_know)
-	res = instantiate_parameters_1 (loop, res, flags, cache);
+	res = instantiate_parameters_1 (loop, res, flags, cache, size_expr);
 
       bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec));
 
@@ -2078,12 +2084,12 @@ instantiate_parameters_1 (struct loop *l
 
     case POLYNOMIAL_CHREC:
       op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
 	return chrec_dont_know;
 
@@ -2094,12 +2100,12 @@ instantiate_parameters_1 (struct loop *l
 
     case PLUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
 	return chrec_dont_know;
 
@@ -2110,12 +2116,12 @@ instantiate_parameters_1 (struct loop *l
 
     case MINUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
 	return chrec_dont_know;
 
@@ -2126,12 +2132,12 @@ instantiate_parameters_1 (struct loop *l
 
     case MULT_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
 	return chrec_dont_know;
 
@@ -2144,7 +2150,7 @@ instantiate_parameters_1 (struct loop *l
     case CONVERT_EXPR:
     case NON_LVALUE_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
         return chrec_dont_know;
 
@@ -2174,17 +2180,17 @@ instantiate_parameters_1 (struct loop *l
     {
     case 3:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
 	return chrec_dont_know;
 
       op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op2 == chrec_dont_know)
         return chrec_dont_know;
 
@@ -2198,12 +2204,12 @@ instantiate_parameters_1 (struct loop *l
 
     case 2:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
 	return chrec_dont_know;
 
       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op1 == chrec_dont_know)
         return chrec_dont_know;
 
@@ -2214,7 +2220,7 @@ instantiate_parameters_1 (struct loop *l
 	    
     case 1:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-				      flags, cache);
+				      flags, cache, size_expr);
       if (op0 == chrec_dont_know)
         return chrec_dont_know;
       if (op0 == TREE_OPERAND (chrec, 0))
@@ -2252,7 +2258,8 @@ instantiate_parameters (struct loop *loo
       fprintf (dump_file, ")\n");
     }
  
-  res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache);
+  res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache,
+				  0);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2275,7 +2282,7 @@ static tree
 resolve_mixers (struct loop *loop, tree chrec)
 {
   htab_t cache = htab_create (10, hash_scev_info, eq_scev_info, del_scev_info);
-  tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache);
+  tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache, 0);
   htab_delete (cache);
   return ret;
 }

Comment 60 Andrew Pinski 2005-11-08 01:40:50 UTC

Fixed on the mainline.