Bug 28071 - [4.1 regression] A file that can not be compiled in reasonable time/space
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.0.2
Importance: P2 normal
Target Milestone: 4.2.0
Assignee: Jan Hubicka
URL:
Keywords: memory-hog
Depends on: 28075
Blocks:
Reported: 2006-06-17 09:24 UTC by Christophe Raffalli
Modified: 2023-07-28 08:40 UTC

See Also:
Host:
Target:
Build:
Known to work: 3.4.6 4.0.2 4.2.0
Known to fail: 4.1.2
Last reconfirmed: 2006-06-17 10:18:56


Attachments
a file that gcc can not compile with -O (173.59 KB, text/plain)
2006-06-17 09:27 UTC, Christophe Raffalli
bug2.c.099t.optimized (251.49 KB, text/plain)
2006-07-22 18:09 UTC, Jan Hubicka
regmovefix (1.52 KB, text/plain)
2006-07-22 19:30 UTC, Jan Hubicka
intossaspeedup (621 bytes, text/plain)
2006-07-22 20:51 UTC, Jan Hubicka
patch to resolve some of the SSA to Normal slowdowns. (6.85 KB, patch)
2006-08-25 01:37 UTC, Andrew Macleod
Patch for the remaining SSA to Normal time issues (4.24 KB, patch)
2006-08-25 01:42 UTC, Andrew Macleod
Patch for scheduler dependency lists. (19.16 KB, patch)
2007-01-10 11:42 UTC, Maxim Kuvyrkov

Description Christophe Raffalli 2006-06-17 09:24:19 UTC
The following file compiles in 30s with "gcc -c" but never finishes compiling with "gcc -O -c".
Comment 1 Christophe Raffalli 2006-06-17 09:27:05 UTC
Created attachment 11687
a file that gcc can not compile with -O

Just try gcc -c -O on this file!
(Note: there is no problem with icc.)
Comment 2 Steven Bosscher 2006-06-17 10:18:56 UTC
It actually does finish for me at -O with gcc 4.0.2.  It just takes an incredible amount of time and memory, but that doesn't surprise me so much, given the nature of this evil test case ;-)

With gcc 4.2 20060617, I can't compile the test case.  After a long time and after using up to 1.5 GB, the compiler dies with:
cc1: out of memory allocating 399751872 bytes after a total of 79527936 bytes

Comment 3 Steven Bosscher 2006-06-17 11:05:14 UTC
Caused by excessive inlining:

 inline heuristics     :  37.25 (25%) usr   0.04 ( 1%) sys  36.56 (15%) wall    2312 kB ( 0%) ggc
 integration           :  19.91 (13%) usr   1.49 (36%) sys  62.70 (26%) wall 1058857 kB (76%) ggc
Comment 4 Steven Bosscher 2006-06-17 11:05:56 UTC
Platform independent.  Honza, one for you I suppose.
Comment 5 Richard Biener 2006-06-17 18:18:08 UTC
Same with 4.1.  4.0.3 needs about 500MB of RAM at -O, while 4.1 gets killed with
cc1: out of memory allocating 1134939624 bytes after a total of 43368448 bytes
(though that first number looks "interesting")
Comment 6 Richard Biener 2006-06-17 18:44:56 UTC
Btw, we do not die during inlining, but during optimization which is confronted with one gigantic basic block, as all BBs after inlining are merged by fixupcfg ;)

Oh, and we die during RTL optimizations...  but I wonder why we are not able to free up some memory beforehand. Lower gc params help with this, and we enter greg with 250MB used and it still fails with:
cc1: out of memory allocating 1134939624 bytes after a total of 43487232 bytes

So, more something for Matz/Vladimir.
Comment 7 Christophe Raffalli 2006-06-19 08:44:12 UTC
Just for comparison: on my Intel dual core 3GHz,

icc compiles in 15s within 200MB with -O3 (including cpp)


Comment 8 Jan Hubicka 2006-07-21 21:11:57 UTC
Hmm,
the function fi contains 30000 calls, and many of the called functions contain further calls.
Since our metric allows each call to be replaced by up to 10 instructions and we allow fi to grow to twice its size, we can end up with 600000 instructions in a single basic block (in fact we end up with roughly 390000 according to the inliner metrics).  This is still linear growth and the testcase is rather extreme, so I am not sure I would declare this an inliner bug (the user has asked for it by declaring stuff inline, after all ;)

Without inlining we are not behaving much better (I am just running the compilation and it is at 900MB), so using 1GB for inlined function bodies doesn't seem that unreasonable.  I will try to play with this a bit.

One solution might be to adjust our size estimates to be less aggressive for large functions, so that the growth in the actual number of statements is not 20-fold at most but some smaller constant, but it is rather ugly.

Honza
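
The worst-case figure in comment 8 is plain multiplication, and a short sketch makes the 20-fold factor explicit. The helper name below is hypothetical and is not a GCC function; the three inputs are the numbers quoted in the comment (30000 calls, up to 10 instructions per inlined call, growth to twice the size).

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: upper bound on the number of
   instructions in a caller after inlining, using the figures quoted
   in comment 8.  */
long
inlined_size_bound (long n_calls, long insns_per_call, long growth_factor)
{
  assert (n_calls >= 0 && insns_per_call >= 0 && growth_factor >= 0);
  /* Each call may be replaced by up to insns_per_call instructions,
     and the whole function may then grow by growth_factor.  */
  return n_calls * insns_per_call * growth_factor;
}
```

With the values from the comment, inlined_size_bound (30000, 10, 2) gives the 600000 instructions mentioned there; the bound stays linear in the number of calls.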
Comment 9 Christophe Raffalli 2006-07-21 22:01:27 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Maybe a look at the assembly code generated by icc, which behaves very
well on this test case, could be useful?

Christophe

Comment 10 Jan Hubicka 2006-07-22 13:47:42 UTC
Hi,
this patch makes the -O2 case work pretty well on the tree side.  The
inliner expands the code from 8MB to 40MB of GGC memory, which seems
under control.  Aliasing peaks at 85MB, which also doesn't seem
completely unreasonable.
I will need to give it more testing.  I believe the inliner is always
GGC safe, but it is easy to be mistaken here.
The patch also speeds up the inline heuristic by pruning out the
impossible edges early, making the priority queue smaller.
Also I am quite curious how the inliner manages to produce 800MB of
garbage...

Honza

Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 115645)
--- ipa-inline.c	(working copy)
*************** update_caller_keys (fibheap_t heap, stru
*** 413,418 ****
--- 413,419 ----
  		    bitmap updated_nodes)
  {
    struct cgraph_edge *edge;
+   const char *failed_reason;
  
    if (!node->local.inlinable || node->local.disregard_inline_limits
        || node->global.inlined_to)
*************** update_caller_keys (fibheap_t heap, stru
*** 421,426 ****
--- 422,441 ----
      return;
    bitmap_set_bit (updated_nodes, node->uid);
    node->global.estimated_growth = INT_MIN;
+ 
+   if (!node->local.inlinable)
+     return;
+   /* Prune out edges we won't inline into anymore.  */
+   if (!cgraph_default_inline_p (node, &failed_reason))
+     {
+       for (edge = node->callers; edge; edge = edge->next_caller)
+ 	if (edge->aux)
+ 	  {
+ 	    fibheap_delete_node (heap, edge->aux);
+ 	    edge->aux = NULL;
+ 	  }
+       return;
+     }
  
    for (edge = node->callers; edge; edge = edge->next_caller)
      if (edge->inline_failed)
Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 115645)
--- tree-inline.c	(working copy)
*************** expand_call_inline (basic_block bb, tree
*** 2163,2172 ****
    /* Update callgraph if needed.  */
    cgraph_remove_node (cg_edge->callee);
  
-   /* Declare the 'auto' variables added with this inlined body.  */
-   record_vars (BLOCK_VARS (id->block));
    id->block = NULL_TREE;
    successfully_inlined = TRUE;
  
   egress:
    input_location = saved_location;
--- 2163,2171 ----
    /* Update callgraph if needed.  */
    cgraph_remove_node (cg_edge->callee);
  
    id->block = NULL_TREE;
    successfully_inlined = TRUE;
+   ggc_collect ();
  
   egress:
    input_location = saved_location;
*************** declare_inline_vars (tree block, tree va
*** 2556,2562 ****
  {
    tree t;
    for (t = vars; t; t = TREE_CHAIN (t))
!     DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
  
    if (block)
      BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);
--- 2555,2567 ----
  {
    tree t;
    for (t = vars; t; t = TREE_CHAIN (t))
!     {
!       DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
!       gcc_assert (!TREE_STATIC (t) && !TREE_ASM_WRITTEN (t));
!       cfun->unexpanded_var_list =
! 	tree_cons (NULL_TREE, t,
! 		   cfun->unexpanded_var_list);
!     }
  
    if (block)
      BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);
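
The pruning idea in the ipa-inline.c hunk above can be illustrated with a much simpler model. This is only a sketch under stated assumptions: the struct and function names are hypothetical, a plain flag stands in for membership in the Fibonacci heap, and the still_inlinable field stands in for the result of cgraph_default_inline_p.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the call graph structures.  */
struct edge
{
  struct edge *next_caller;   /* next edge calling the same callee */
  int queued;                 /* nonzero while the edge is in the queue */
};

struct node
{
  struct edge *callers;       /* list of incoming call edges */
  int still_inlinable;        /* stand-in for cgraph_default_inline_p */
};

/* If NODE can no longer be inlined, drop all of its incoming edges
   from the queue and return how many were dropped.  *QUEUE_LEN is
   the current queue size and is updated in place.  */
int
prune_callers (struct node *node, int *queue_len)
{
  int dropped = 0;

  if (node->still_inlinable)
    return 0;
  for (struct edge *e = node->callers; e != NULL; e = e->next_caller)
    if (e->queued)
      {
        e->queued = 0;
        --*queue_len;
        ++dropped;
      }
  return dropped;
}
```

Dropping the dead edges as soon as the callee stops qualifying keeps the queue short, so later extractions and key updates touch fewer entries.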
Comment 11 Jan Hubicka 2006-07-22 17:12:46 UTC
Hi,
this stops the inliner from producing quadratically many STMT list
nodes, so inlining is now reasonably fast.  The next offenders are alias
info, PRE, regmove, global alloc and the schedulers.

Index: tree-cfg.c
===================================================================
*** tree-cfg.c	(revision 115645)
--- tree-cfg.c	(working copy)
*************** tree_redirect_edge_and_branch_force (edg
*** 4158,4164 ****
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi, bsi_tgt;
    tree act;
    basic_block new_bb;
    edge e;
--- 4158,4165 ----
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi;
!   tree_stmt_iterator tsi_tgt;
    tree act;
    basic_block new_bb;
    edge e;
*************** tree_split_block (basic_block bb, void *
*** 4192,4204 ****
  	}
      }
  
!   bsi_tgt = bsi_start (new_bb);
!   while (!bsi_end_p (bsi))
!     {
!       act = bsi_stmt (bsi);
!       bsi_remove (&bsi, false);
!       bsi_insert_after (&bsi_tgt, act, BSI_NEW_STMT);
!     }
  
    return new_bb;
  }
--- 4193,4209 ----
  	}
      }
  
!   if (bsi_end_p (bsi))
!     return new_bb;
! 
!   /* Split the statement list - avoid re-creating new containers as this
!      brings ugly quadratic memory consumption in the inliner.  
!      (We are still quadratic since we need to update stmt BB pointers,
!      sadly) */
!   new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi);
!   for (tsi_tgt = tsi_start (new_bb->stmt_list);
!        !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt))
!     set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb);
  
    return new_bb;
  }
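
The technique in the tree-cfg.c hunk above is to cut the statement list in place rather than removing and re-inserting each statement into a fresh container. A minimal sketch of that idea on a generic doubly linked list follows; the types and function names are hypothetical and are not GCC's tree_stmt_iterator API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list types, standing in for a statement list.  */
struct stmt
{
  struct stmt *prev, *next;
  int bb;                     /* index of the owning basic block */
};

struct stmt_list
{
  struct stmt *head, *tail;
};

/* Detach the statements from AT to the end of SRC and return them as
   a new list.  Only a few pointers change; nothing is allocated.  */
struct stmt_list
split_before (struct stmt_list *src, struct stmt *at)
{
  struct stmt_list dst = { NULL, NULL };

  if (at == NULL)
    return dst;
  dst.head = at;
  dst.tail = src->tail;
  src->tail = at->prev;
  if (at->prev != NULL)
    at->prev->next = NULL;
  else
    src->head = NULL;
  at->prev = NULL;
  return dst;
}

/* Record the new owner on every moved statement.  This walk is linear
   in the number of statements moved.  */
void
set_owner (struct stmt_list *list, int bb)
{
  for (struct stmt *s = list->head; s != NULL; s = s->next)
    s->bb = bb;
}
```

The split itself is constant time. The owner update still visits every moved statement, which matches the remark in the patch that repeated splits of one huge block remain quadratic overall.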
Comment 12 Jan Hubicka 2006-07-22 18:09:17 UTC
Hi,
I am attaching the .optimized dump of this testcase.  It is quite a good
demonstration of how SRA and TER tend to increase register pressure in
code like:


;; Function add (add)

Analyzing Edge Insertions.
add (x, y)
{
  double r$min;

<bb 2>:
  r$min = x.min + y.min;
  <retval>.max = x.max + y.max;
  <retval>.min = r$min;
  return <retval>;

}

;; Function mul (mul)

Analyzing Edge Insertions.
mul (x, y)
{
  double y$min;
  double y$max;
  double x$min;
  double x$max;
  double d;
  double c;
  double b;
  double a;

<bb 2>:
  x$max = x.max;
  x$min = x.min;
  y$max = y.max;
  y$min = y.min;
  a = y$min * x$min;
  b = y$max * x$min;
  c = y$min * x$max;
  d = y$max * x$max;
  <retval>.max = max (max (a, b), max (c, d));
  <retval>.min = min (min (a, b), min (c, d));
  return <retval>;

}



;; Function fz (fz)

fz (x, y, z)
{

<bb 2>:
  tmp3 = pow (z, 3.7e+1);
  tmp7 = pow (y, 2.0e+0);
  tmp9 = pow (z, 3.6e+1);
  tmp14 = pow (y, 3.0e+0);
  tmp16 = pow (z, 3.5e+1);
...
  tmp3922 = pow (x, 3.8e+1);
  D.17848 = pow (x, 3.9e+1);
  D.17965 = pow (y, 3.9e+1);
  D.17968 = pow (z, 3.9e+1);
  return tmp3 * x * 2.04629333124046830505449179327115416526794433594e+1 * y + tmp9 * tmp7 * x * 1.63737898728226838329646852798759937286376953125e+2 + tmp16 * tmp14 * x * 3.102825991153964650948182679712772369384765625e+2 + tmp23 * tmp21 * x * -1.38580890184729059910750947892665863037109375e+3 + tmp30 * tmp28 * x * -4.39080063708386560961116629187017679214477539062e+1 + tmp37 * tmp35 * x * 1.737348223038549986085854470729827880859375e+4 + tmp44 * tmp42 * x * -1.069806869373114386689849197864532470703125e+4 + tmp51 * tmp49 * x * -3.542086638969252817332744598388671875e+4 + tmp58 * tmp56 * x * -3.091774346229622824466787278652191162109375e+4 + tmp65 * tmp63 * x * 1.5680886586212887777946889400482177734375e+5 + tmp72 * tmp70 * x * 4.19376520881160162389278411865234375e+5 + tmp79 * tmp77 * x * 2.0111082929561330820433795452117919921875e+5 + tmp86 * tmp84 * x * -4.337742627231603837572038173675537109375e+5 + tmp93 * tmp91 * x * -4.829501801337040960788726806640625e+5 + tmp100 * tmp98 * x * 5.32241994551055715419352054595947265625e+5 + tmp107 * tmp105 * x * 1.8250994926701225340366363525390625e+6 + tmp114 * tmp112 * x * 1.6382205795514374040067195892333984375e+6 + tmp121 * tmp119 * x * 1.1912621023960295133292675018310546875e+5 + tmp128 * tmp126 * x * 8.811503159726611338555812835693359375e+5 + tmp135 * tmp133 * x * 2.690164492243868880905210971832275390625e+5 + tmp142 * tmp140 * x * 2.271892026609037420712411403656005859375e+5 + tmp149 * tmp147 * x * 1.795814638975697453133761882781982421875e+5 + tmp156 * tmp154 * x * -3.94381184819339658133685588836669921875e+5 + tmp163 * tmp161 * x * 7.64450454622797551564872264862060546875e+5 + tmp170 * tmp168 * x * 6.9298171586054741055704653263092041015625e+4 + tmp177 * tmp175 * x * -3.129066099043917492963373661041259765625e+5 + tmp184 * tmp182 * x * -4.0792914801556640304625034332275390625e+5 + tmp191 * tmp189 * x * 7.3512920753349564620293676853179931640625e+4 + tmp198 * tmp196 * x * 
3.5470695311840399881475605070590972900390625e+3 + tmp205 * tmp203 * x * -8.8733450804951236932538449764251708984375e+4 + tmp212 * tmp210 * x * -1.3805889644669676272314973175525665283203125e+4 + tmp219 * tmp217 * x * -7.54301319902873729006387293338775634765625e+3 + tmp226 * tmp224 * x * 2.23731170493404579246998764574527740478515625e+3 + tmp233 * tmp231 * x * -3.9037651153389475666699581779539585113525390625e+2 + tmp240 * tmp238 * x * 4.743319333283892547115101478993892669677734375e+2 + tmp247 * tmp245 * x * -6.32641294603530113249689748045057058334350585938e+1 + tmp252 * x * -6.76527508139541300380415123072452843189239501953e+0 * z + tmp258 * x * -4.51436297228304250772623618104262277483940124512e-1 + tmp263 * x * 2.89405090268957065902100111998151987791061401367e+0 + tmp9 * tmp268 * -3.7483157190701700756108039058744907379150390625e+2 * y + tmp16 * tmp7 * tmp268 * 9.276025613194925654170219786465167999267578125e+2 + tmp23 * tmp14 * tmp268 * 1.358400470188729514120495878159999847412109375e+2 + tmp30 * tmp21 * tmp268 * -3.2681330410168111484381370246410369873046875e+3 + tmp37 * tmp28 * tmp268 * 2.77737094612259534187614917755126953125e+3 + tmp44 * tmp35 * tmp268 * 2.2773056570869275674340315163135528564453125e+3 + tmp51 * tmp42 * tmp268 * 9.2295963366692260024137794971466064453125e+4 + tmp58 * tmp49 * tmp268 * -3.049601738325569895096123218536376953125e+5 + tmp65 * tmp56 * tmp268 * -2.69300746038850047625601291656494140625e+5 + tmp72 * tmp63 * tmp268 * 3.92479526798162725754082202911376953125e+5 + tmp79 * tmp70 * tmp268 * -1.4348648827185891568660736083984375e+6 + tmp86 * tmp77 * tmp268 * 1.2925352909364881925284862518310546875e+6 + tmp93 * tmp84 * tmp268 * 3.44742843619707785546779632568359375e+6 + tmp100 * tmp91 * tmp268 * 2.2975221813043109141290187835693359375e+6 + tmp107 * tmp98 * tmp268 * -8.753704570182035677134990692138671875e+5 + tmp114 * tmp105 * tmp268 * -4.683100195028461515903472900390625e+6 + tmp121 * tmp112 * tmp268 * 
-2.4950389851368105155415832996368408203125e+5 + tmp128 * tmp119 * tmp268 * 4.864730415365164168179035186767578125e+6 + tmp135 * tmp126 * tmp268 * -4.660151695632442715577781200408935546875e+5 + tmp142 * tmp133 * tmp268 * -6.7161351688091107644140720367431640625e+5 + tmp149 * tmp140 * tmp268 * -1.4141434789546797401271760463714599609375e+5 + tmp156 * tmp147 * tmp268 * -1.5259173265962512232363224029541015625e+6 + tmp163 * tmp154 * tmp268 * -7.40285312171890516765415668487548828125e+5 + tmp170 * tmp161 * tmp268 * 1.072791414269997738301753997802734375e+6 + tmp177 * tmp168 * tmp268 * -4.951253421552001382224261760711669921875e+5 + tmp184 * tmp175 * tmp268 * -1.05241366402662693872116506099700927734375e+5 + tmp191 * tmp182 * tmp268 * 2.0352227243198428186587989330291748046875e+5 + tmp198 * tmp189 * tmp268 * 1.3298028337804946932010352611541748046875e+5 + tmp205 * tmp196 * tmp268 * -6.6668077510616494691930711269378662109375e+4 + tmp212 * tmp203 * tmp268 * -5.17525810794326171162538230419158935546875e+4 + tmp219 * tmp210 * tmp268 * -8.1499322304497427467140369117259979248046875e+3 + tmp226 * tmp217 * tmp268 * 7.7733723892777788933017291128635406494140625e+3 + tmp233 * tmp224 * tmp268 * -2.143225547523337809252552688121795654296875e+3 + tmp240 * tmp231 * tmp268 * -8.7049279990650347826885990798473358154296875e+2 + tmp247 * tmp238 * tmp268 * 3.0833041233127761415744316764175891876220703125e+2 + tmp245 * tmp268 * -2.86594246589304226802141783991828560829162597656e+1 * z + tmp252 * tmp268 * 1.15628609452422050907216544146649539470672607422e+1 + tmp3 * tmp268 * 2.530432411832947536822757683694362640380859375e+1 + tmp16 * tmp457 * 1.3205680909865186549723148345947265625e+3 * y + tmp23 * tmp7 * tmp457 * -6.072741419595380648388527333736419677734375e+3 + tmp30 * tmp14 * tmp457 * 1.4301229031810655214940197765827178955078125e+4 + tmp37 * tmp21 * tmp457 * 1.2509849814464205337571911513805389404296875e+4 + tmp44 * tmp28 * tmp457 * 2.43755655239219777286052703857421875e+4 + tmp51 
* tmp35 * tmp457 * 1.5025955822637255187146365642547607421875e+5 + tmp58 * tmp42 * tmp457 * -2.57449792538532870821654796600341796875e+5 + tmp65 * tmp49 * tmp457 * -6.18108468636372243054211139678955078125e+5 + tmp72 * tmp56 * tmp457 * -5.77129579276848933659493923187255859375e+5 + tmp79 * tmp63 * tmp457 * 8.2502991879217163659632205963134765625e+5 + tmp86 * tmp70 * tmp457 * -3.3274662617215062491595745086669921875e+6 + tmp93 * tmp77 * tmp457 * 6.39019438752098591066896915435791015625e+5 + tmp100 * tmp84 * tmp457 * -3.5095450540453977882862091064453125e+6 + tmp107 * tmp91 * tmp457 * -5.701980742367389611899852752685546875e+6 + tmp114 * tmp98 * tmp457 * 8.48527840505857206881046295166015625e+6 + tmp121 * tmp105 * tmp457 * 3.2467750119913811795413494110107421875e+6 + tmp128 * tmp112 * tmp457 * 2.1212157989888186566531658172607421875e+6 + tmp135 * tmp119 * tmp457 * 6.030377525842911563813686370849609375e+6 + tmp142 * tmp126 * tmp457 * -8.838882032796226441860198974609375e+6 + tmp149 * tmp133 * tmp457 * -2.08285087554152193479239940643310546875e+6 + tmp156 * tmp140 * tmp457 * 2.2503529974754941649734973907470703125e+6 + tmp163 * tmp147 * tmp457 * -6.995801159220845438539981842041015625e+6 + tmp170 * tmp154 * tmp457 * 6.716210355322583578526973724365234375e+6 + tmp177 * tmp161 * tmp457 * 1.19912664452435608836822211742401123046875e+5 + tmp184 * tmp168 * tmp457 * 1.17548020877087931148707866668701171875e+6 + tmp191 * tmp175 * tmp457 * -9.4537417097875251783989369869232177734375e+4 + tmp198 * tmp182 * tmp457 * 7.89964485756713547743856906890869140625e+5 + tmp205 * tmp189 * tmp457 * 1.52741514544914476573467254638671875e+5 + tmp212 * tmp196 * tmp457 * 2.791946326383915147744119167327880859375e+5 + tmp219 * tmp203 * tmp457 * -2.679505212665906219626776874065399169921875e+4 + tmp226 * tmp210 * tmp457 * -3.6525730859511895687319338321685791015625e+4 + tmp233 * tmp217 * tmp457 * 2.1418943770332829444669187068939208984375e+4 + tmp240 * tmp224 * tmp457 * 
3.0843383887098834748030640184879302978515625e+3 + tmp247 * tmp231 * tmp457 * -9.9569611795820310362614691257476806640625e+2 + tmp238 * tmp457 * 2.564511516465935301312129013240337371826171875e+2 * z + tmp245 * tmp457 * 2.70656003026684537360324611654505133628845214844e+1 + tmp9 * tmp457 * -3.46369109036699356352073664311319589614868164062e+1 + tmp23 * tmp641 * -3.32806485927452058604103513062000274658203125e+3 * y + tmp30 * tmp7 * tmp641 * 1.234968261707164128893055021762847900390625e+4 + tmp37 * tmp14 * tmp641 * -2.8753344016540040684049017727375030517578125e+3 + tmp44 * tmp21 * tmp641 * 1.0114036156335461782873608171939849853515625e+4 + tmp51 * tmp28 * tmp641 * -1.49688347647457034327089786529541015625e+5 + tmp58 * tmp35 * tmp641 * -5.67623374289566534571349620819091796875e+5 + tmp65 * tmp42 * tmp641 * -4.42819365904183243401348590850830078125e+5 + tmp72 * tmp49 * tmp641 * -2.2845012135416171513497829437255859375e+6 + tmp79 * tmp56 * tmp641 * 1.59017671147860283963382244110107421875e+6 + tmp86 * tmp63 * tmp641 * 3.172225132318005780689418315887451171875e+5 + tmp93 * tmp70 * tmp641 * 6.949452887683830223977565765380859375e+6 + tmp100 * tmp77 * tmp641 * 2.005212832816918194293975830078125e+7 + tmp107 * tmp84 * tmp641 * -1.70683697845189571380615234375e+7 + tmp114 * tmp91 * tmp641 * 2.57033030682088024914264678955078125e+7 + tmp121 * tmp98 * tmp641 * -6.240241324918039143085479736328125e+6 + tmp128 * tmp105 * tmp641 * 7.28192108448500744998455047607421875e+6 + tmp135 * tmp112 * tmp641 * -6.668553789195828139781951904296875e+6 + tmp142 * tmp119 * tmp641 * 4.279670435295154340565204620361328125e+6 + tmp149 * tmp126 * tmp641 * 3.7841761433819212019443511962890625e+7 + tmp156 * tmp133 * tmp641 * -1.1735384500224292278289794921875e+7 + tmp163 * tmp140 * tmp641 * -1.02138700657811500132083892822265625e+7 + tmp170 * tmp147 * tmp641 * 4.0497243350173835642635822296142578125e+6 + tmp177 * tmp154 * tmp641 * -9.26496405551051162183284759521484375e+6 + tmp184 * tmp161 * tmp641 
* 6.0643515210227929055690765380859375e+6 + tmp191 * tmp168 * tmp641 * 7.53476951245888951234519481658935546875e+5 + tmp198 * tmp175 * tmp641 * -4.46591788140458636917173862457275390625e+5 + tmp205 * tmp182 * tmp641 * -2.3236266487165386206470429897308349609375e+5 + tmp212 * tmp189 * tmp641 * 7.44194054349235841073095798492431640625e+5 + tmp219 * tmp196 * tmp641 * -1.075418140586107256240211427211761474609375e+4 + tmp226 * tmp203 * tmp641 * -6.514834348431314765548449940979480743408203125e+2 + tmp233 * tmp210 * tmp641 * 2.995416853417091260780580341815948486328125e+4 + tmp240 * tmp217 * tmp641 * 9.335514683708748862045467831194400787353515625e+2 + tmp247 * tmp224 * tmp641 * -3.9295324078555941014201380312442779541015625e+3 + tmp231 * tmp641 * 5.562684841171861762632033787667751312255859375e+2 * z + tmp238 * tmp641 * 4.0098599791658301683128229342401027679443359375e+1 + tmp16 * tmp641 * 1.722368240309973543844535015523433685302734375e+2 + tmp30 * tmp820 * 1.1062878068190211706678383052349090576171875e+3 * y + tmp37 * tmp7 * tmp820 * 3.251567569670028387918137013912200927734375e+4 + tmp44 * tmp14 * tmp820 * -5.1560599597941718457150273025035858154296875e+3 + tmp51 * tmp21 * tmp820 * 5.023688870652744662947952747344970703125e+4 + tmp58 * tmp28 * tmp820 * -3.133724041621834112447686493396759033203125e+4 + tmp65 * tmp35 * tmp820 * 6.0302757396407960914075374603271484375e+5 + tmp72 * tmp42 * tmp820 * -1.05701377140930178575217723846435546875e+6 + tmp79 * tmp49 * tmp820 * 1.4157320613848813809454441070556640625e+6 + tmp86 * tmp56 * tmp820 * -3.3873541540618874132633209228515625e+6 + tmp93 * tmp63 * tmp820 * 1.203755469354635290801525115966796875e+7 + tmp100 * tmp70 * tmp820 * -9.313967591197453439235687255859375e+6 + tmp107 * tmp77 * tmp820 * -2.5084943886144324205815792083740234375e+6 + tmp114 * tmp84 * tmp820 * 1.231539372972822375595569610595703125e+7 + tmp121 * tmp91 * tmp820 * 1.37443684359668679535388946533203125e+7 + tmp128 * tmp98 * tmp820 * 
-2.0392658207672379910945892333984375e+7 + tmp135 * tmp105 * tmp820 * 1.16408645810100026428699493408203125e+7 + tmp142 * tmp112 * tmp820 * 1.66728234309127293527126312255859375e+7 + tmp149 * tmp119 * tmp820 * -1.32349803357985951006412506103515625e+7 + tmp156 * tmp126 * tmp820 * 1.011935817535785399377346038818359375e+7 + tmp163 * tmp133 * tmp820 * -3.6625153269577123224735260009765625e+7 + tmp170 * tmp140 * tmp820 * -1.62270849433632194995880126953125e+6 + tmp177 * tmp147 * tmp820 * -1.41072644445291124284267425537109375e+7 + tmp184 * tmp154 * tmp820 * 1.15812868076924490742385387420654296875e+6 + tmp191 * tmp161 * tmp820 * -3.140711580294744111597537994384765625e+6 + tmp198 * tmp168 * tmp820 * -6.269764561109821312129497528076171875e+6 + tmp205 * tmp175 * tmp820 * -2.2402100653061470948159694671630859375e+6 + tmp212 * tmp182 * tmp820 * 2.16854364765677414834499359130859375e+6 + tmp219 * tmp189 * tmp820 * -6.62552598277222947217524051666259765625e+5 + tmp226 * tmp196 * tmp820 * 1.09954907817221595905721187591552734375e+5 + tmp233 * tmp203 * tmp820 * 5.11132941899898578412830829620361328125e+4 + tmp240 * tmp210 * tmp820 * -1.975986489228717982769012451171875e+4 + tmp247 * tmp217 * tmp820 * -1.47082028405166101947543211281299591064453125e+3 + tmp224 * tmp820 * 6.996020510731731292253243736922740936279296875e+2 * z + tmp231 * tmp820 * -6.08214109731458023588857031427323818206787109375e+1 + tmp23 * tmp820 * -1.593195222480677557541639544069766998291015625e+3 + tmp37 * tmp994 * -1.4824974043079730108729563653469085693359375e+4 * y + tmp44 * tmp7 * tmp994 * -2.122104544337373590678907930850982666015625e+4 + tmp51 * tmp14 * tmp994 * 2.12153040344621869735419750213623046875e+5 + tmp58 * tmp21 * tmp994 * 4.9715857901374087668955326080322265625e+5 + tmp65 * tmp28 * tmp994 * 7.24156301083912025205790996551513671875e+5 + tmp72 * tmp35 * tmp994 * 9.6109271284611173905432224273681640625e+5 + tmp79 * tmp42 * tmp994 * 3.1879956945974607951939105987548828125e+6 + tmp86 * tmp49 * 
tmp994 * 3.67518356809010542929172515869140625e+6 + tmp93 * tmp56 * tmp994 * 5.9706666052485667169094085693359375e+6 + tmp100 * tmp63 * tmp994 * -4.637464374532920308411121368408203125e+6 + tmp107 * tmp70 * tmp994 * 3.8143604915147304534912109375e+7 + tmp114 * tmp77 * tmp994 * -2.47901225821007005870342254638671875e+7 + tmp121 * tmp84 * tmp994 * 6.444626674287511408329010009765625e+7 + tmp128 * tmp91 * tmp994 * 4.502534575005705654621124267578125e+7 + tmp135 * tmp98 * tmp994 * 6.757853017015595734119415283203125e+7 + tmp142 * tmp105 * tmp994 * -1.480053742969308234751224517822265625e+7 + tmp149 * tmp112 * tmp994 * 5.290692565359492599964141845703125e+7 + tmp156 * tmp119 * tmp994 * 4.3287289755464904010295867919921875e+7 + tmp163 * tmp126 * tmp994 * 8.70799907827146053314208984375e+7 + tmp170 * tmp133 * tmp994 * -1.9175662664391241967678070068359375e+7 + tmp177 * tmp140 * tmp994 * -2.86826938508348129689693450927734375e+7 + tmp184 * tmp147 * tmp994 * -2.1875272203575193881988525390625e+7 + tmp191 * tmp154 * tmp994 * -8.71044591558253206312656402587890625e+6 + tmp198 * tmp161 * tmp994 * -8.256123433752777986228466033935546875e+6 + tmp205 * tmp168 * tmp994 * -6.41929563034610691829584538936614990234375e+4 + tmp212 * tmp175 * tmp994 * -1.089041725969471037387847900390625e+6 + tmp219 * tmp182 * tmp994 * 1.3808361243931539356708526611328125e+6 + tmp226 * tmp189 * tmp994 * 3.4075661280863615684211254119873046875e+5 + tmp233 * tmp196 * tmp994 * 1.326406940819893134175799787044525146484375e+4 + tmp240 * tmp203 * tmp994 * 3.2914888870737791876308619976043701171875e+4 + tmp247 * tmp210 * tmp994 * -1.75309104671274035354144871234893798828125e+4 + tmp217 * tmp994 * 1.739025196774369760532863438129425048828125e+3 * z + tmp224 * tmp994 * 8.98714279806095959202139056287705898284912109375e+1 + tmp30 * tmp994 * -3.3488540726757861420992412604391574859619140625e+2 + tmp44 * tmp1163 * 2.568602901142760310904122889041900634765625e+4 * y + tmp51 * tmp7 * tmp1163 * 
-6.39910687152194077498279511928558349609375e+4 + tmp58 * tmp14 * tmp1163 * 9.0083097099733888171613216400146484375e+4 + tmp65 * tmp21 * tmp1163 * -9.484217037814422510564327239990234375e+5 + tmp72 * tmp28 * tmp1163 * -3.600075980834834277629852294921875e+6 + tmp79 * tmp35 * tmp1163 * 1.3657137534186341799795627593994140625e+6 + tmp86 * tmp42 * tmp1163 * -5.672421984374326653778553009033203125e+6 + tmp93 * tmp49 * tmp1163 * 8.083055848013154231011867523193359375e+6 + tmp100 * tmp56 * tmp1163 * -2.12083704075715839862823486328125e+7 + tmp107 * tmp63 * tmp1163 * 3.64002259584418833255767822265625e+7 + tmp114 * tmp70 * tmp1163 * 2.584474493634389340877532958984375e+7 + tmp121 * tmp77 * tmp1163 * 7.82170241000406742095947265625e+7 + tmp128 * tmp84 * tmp1163 * 1.2600267222192929685115814208984375e+8 + tmp135 * tmp91 * tmp1163 * 6.67959145687679946422576904296875e+7 + tmp142 * tmp98 * tmp1163 * -8.61548507725602947175502777099609375e+6 + tmp149 * tmp105 * tmp1163 * 4.307888321512959897518157958984375e+7 + tmp156 * tmp112 * tmp1163 * 9.947022469961284101009368896484375e+7 + tmp163 * tmp119 * tmp1163 * -5.27523920739738941192626953125e+7 + tmp170 * tmp126 * tmp1163 * 5.9976018798557378351688385009765625e+7 + tmp177 * tmp133 * tmp1163 * 1.208818361157749406993389129638671875e+7 + tmp184 * tmp140 * tmp1163 * -3.9161991798560507595539093017578125e+6 + tmp191 * tmp147 * tmp1163 * -1.33758822220621886663138866424560546875e+6 + tmp198 * tmp154 * tmp1163 * 3.9807503596418728120625019073486328125e+6 + tmp205 * tmp161 * tmp1163 * 2.6472385677292346954345703125e+6 + tmp212 * tmp168 * tmp1163 * -3.7567882296312092803418636322021484375e+6 + tmp219 * tmp175 * tmp1163 * 8.8227231825006823055446147918701171875e+5 + tmp226 * tmp182 * tmp1163 * 6.4713598971529048867523670196533203125e+5 + tmp233 * tmp189 * tmp1163 * 7.9470694912795021082274615764617919921875e+4 + tmp240 * tmp196 * tmp1163 * -7.8606846475083220866508781909942626953125e+4 + tmp247 * tmp203 * tmp1163 * 
2.4377184989274406689219176769256591796875e+4 + tmp210 * tmp1163 * -4.840576578522581257857382297515869140625e+3 * z + tmp217 * tmp1163 * -6.651144870158286721562035381793975830078125e+2 + tmp37 * tmp1163 * -2.16338941384806958012632094323635101318359375e+3 + tmp51 * tmp1327 * 3.4580778344412290607579052448272705078125e+4 * y + tmp58 * tmp7 * tmp1327 * -6.67006272528452682308852672576904296875e+4 + tmp65 * tmp14 * tmp1327 * -2.813216282899986836127936840057373046875e+5 + tmp72 * tmp21 * tmp1327 * 1.600366882563983090221881866455078125e+6 + tmp79 * tmp28 * tmp1327 * -4.651572890973095782101154327392578125e+6 + tmp86 * tmp35 * tmp1327 * -4.635367576471083797514438629150390625e+6 + tmp93 * tmp42 * tmp1327 * 1.853141126211662590503692626953125e+7 + tmp100 * tmp49 * tmp1327 * -8.26175553112882305867969989776611328125e+5 + tmp107 * tmp56 * tmp1327 * -5.6065551580552704632282257080078125e+7 + tmp114 * tmp63 * tmp1327 * 4.477918015893580019474029541015625e+7 + tmp121 * tmp70 * tmp1327 * 3.033503298015733063220977783203125e+7 + tmp128 * tmp77 * tmp1327 * -4.54599329982551634311676025390625e+7 + tmp135 * tmp84 * tmp1327 * 1.7184725321057498455047607421875e+8 + tmp142 * tmp91 * tmp1327 * -4.1228013490049801766872406005859375e+7 + tmp149 * tmp98 * tmp1327 * -1.48599943568904860876500606536865234375e+6 + tmp156 * tmp105 * tmp1327 * -1.640394065018566548824310302734375e+8 + tmp163 * tmp112 * tmp1327 * -1.50231541055843651294708251953125e+8 + tmp170 * tmp119 * tmp1327 * 5.51311634158718883991241455078125e+7 + tmp177 * tmp126 * tmp1327 * -4.60967007123934924602508544921875e+7 + tmp184 * tmp133 * tmp1327 * 1.720812180032856762409210205078125e+6 + tmp191 * tmp140 * tmp1327 * 2.3425050308803081512451171875e+7 + tmp198 * tmp147 * tmp1327 * -2.7573650323011361062526702880859375e+7 + tmp205 * tmp154 * tmp1327 * 1.93565412163910232484340667724609375e+7 + tmp212 * tmp161 * tmp1327 * -4.656490794350852258503437042236328125e+6 + tmp219 * tmp168 * tmp1327 * 
-1.13067003847309318371117115020751953125e+6 + tmp226 * tmp175 * tmp1327 * -1.7996660977921369485557079315185546875e+6 + tmp233 * tmp182 * tmp1327 * 7.72823332619267632253468036651611328125e+5 + tmp240 * tmp189 * tmp1327 * -1.18684576869108741448144428431987762451171875e+3 + tmp247 * tmp196 * tmp1327 * 4.209293110268269083462655544281005859375e+4 + tmp203 * tmp1327 * 2.53757331467094409163109958171844482421875e+4 * z + tmp210 * tmp1327 * -1.27419822955720928803202696144580841064453125e+3 + tmp44 * tmp1327 * 1.416090525239873750251717865467071533203125e+4 + tmp58 * tmp1486 * 6.2245545194429301773197948932647705078125e+4 * y + tmp65 * tmp7 * tmp1486 * -2.2082292848679612507112324237823486328125e+5 + tmp72 * tmp14 * tmp1486 * 3.124014111098635839880444109439849853515625e+4 + tmp79 * tmp21 * tmp1486 * -6.25758416791016049683094024658203125e+6 + tmp86 * tmp28 * tmp1486 * 1.7897089838210845482535660266876220703125e+5 + tmp93 * tmp35 * tmp1486 * 9.44042397158017195761203765869140625e+6 + tmp100 * tmp42 * tmp1486 * 1.508565758663363754749298095703125e+7 + tmp107 * tmp49 * tmp1486 * -5.3818888675383813679218292236328125e+7 + tmp114 * tmp56 * tmp1486 * -2.06950398579259105026721954345703125e+7 + tmp121 * tmp63 * tmp1486 * -1.5811462105617272853851318359375e+8 + tmp128 * tmp70 * tmp1486 * -2.537457552539723813533782958984375e+8 + tmp135 * tmp77 * tmp1486 * -1.3200169321094192564487457275390625e+7 + tmp142 * tmp84 * tmp1486 * -1.331086251372104585170745849609375e+8 + tmp149 * tmp91 * tmp1486 * -2.7587135367326819896697998046875e+8 + tmp156 * tmp98 * tmp1486 * -5.9600140417412407696247100830078125e+7 + tmp163 * tmp105 * tmp1486 * -4.993067045644967257976531982421875e+7 + tmp170 * tmp112 * tmp1486 * -4.263504846351540088653564453125e+7 + tmp177 * tmp119 * tmp1486 * 1.62158858565548837184906005859375e+8 + tmp184 * tmp126 * tmp1486 * 5.54861279418977908790111541748046875e+6 + tmp191 * tmp133 * tmp1486 * -9.22789402370282113552093505859375e+7 + tmp198 * tmp140 * tmp1486 * 
-4.997240008979074656963348388671875e+7 + tmp205 * tmp147 * tmp1486 * 1.80041542236631549894809722900390625e+7 + tmp212 * tmp154 * tmp1486 * 1.67892701110942661762237548828125e+7 + tmp219 * tmp161 * tmp1486 * 5.817362431911484338343143463134765625e+6 + tmp226 * tmp168 * tmp1486 * 1.24827960727365291677415370941162109375e+6 + tmp233 * tmp175 * tmp1486 * 4.118617833759034983813762664794921875e+5 + tmp240 * tmp182 * tmp1486 * -1.7956024484510053298436105251312255859375e+5 + tmp247 * tmp189 * tmp1486 * 2.190258133842541719786822795867919921875e+5 + tmp196 * tmp1486 * 3.4973117752789097721688449382781982421875e+4 * z + tmp203 * tmp1486 * -4.9817151344862040787120349705219268798828125e+3 + tmp51 * tmp1486 * -2.636649145558089003316126763820648193359375e+4 + tmp65 * tmp1640 * 5.57694030614999282988719642162322998046875e+4 * y + tmp72 * tmp7 * tmp1640 * 7.506203535231878049671649932861328125e+5 + tmp79 * tmp14 * tmp1640 * -6.65316567910718731582164764404296875e+5 + tmp86 * tmp21 * tmp1640 * -5.73123900796337611973285675048828125e+6 + tmp93 * tmp28 * tmp1640 * -1.60242270754070021212100982666015625e+7 + tmp100 * tmp35 * tmp1640 * -1.23835784345802031457424163818359375e+7 + tmp107 * tmp42 * tmp1640 * -3.103722012892372906208038330078125e+7 + tmp114 * tmp49 * tmp1640 * -8.755555659181118011474609375e+6 + tmp121 * tmp56 * tmp1640 * -5.2931458524988718330860137939453125e+7 + tmp128 * tmp63 * tmp1640 * 9.716058526160360872745513916015625e+7 + tmp135 * tmp70 * tmp1640 * 2.077633122112773954868316650390625e+8 + tmp142 * tmp77 * tmp1640 * 9.805324207639189064502716064453125e+7 + tmp149 * tmp84 * tmp1640 * -8.176873865114526450634002685546875e+7 + tmp156 * tmp91 * tmp1640 * 2.043620769532851874828338623046875e+8 + tmp163 * tmp98 * tmp1640 * -3.25460571152403056621551513671875e+8 + tmp170 * tmp105 * tmp1640 * 1.98919900744407832622528076171875e+8 + tmp177 * tmp112 * tmp1640 * 1.20987061431789398193359375e+7 + tmp184 * tmp119 * tmp1640 * 1.945936166237996518611907958984375e+7 + tmp191 
* tmp126 * tmp1640 * -9.1809528345345020294189453125e+7 + tmp198 * tmp133 * tmp1640 * 8.7596486608074724674224853515625e+7 + tmp205 * tmp140 * tmp1640 * 5.667884081580150127410888671875e+7 + tmp212 * tmp147 * tmp1640 * 5.332579924994421191513538360595703125e+6 + tmp219 * tmp154 * tmp1640 * 1.375029893630347959697246551513671875e+7 + tmp226 * tmp161 * tmp1640 * 6.672319392773433588445186614990234375e+6 + tmp233 * tmp168 * tmp1640 * 2.479340224603985436260700225830078125e+6 + tmp240 * tmp175 * tmp1640 * 2.611227488589480708469636738300323486328125e+4 + tmp247 * tmp182 * tmp1640 * 1.22295237010598604683764278888702392578125e+5 + tmp189 * tmp1640 * 1.0393079596052661145222373306751251220703125e+4 * z + tmp196 * tmp1640 * -2.81537782232619383648852817714214324951171875e+3 + tmp58 * tmp1640 * 2.121729458680083553190343081951141357421875e+4 + tmp72 * tmp1789 * -1.5582822445429078652523458003997802734375e+5 * y + tmp79 * tmp7 * tmp1789 * -8.5014462077487216447480022907257080078125e+4 + tmp86 * tmp14 * tmp1789 * -2.564084819367307238280773162841796875e+6 + tmp93 * tmp21 * tmp1789 * -4.8959490236534662544727325439453125e+6 + tmp100 * tmp28 * tmp1789 * 9.09271001997027732431888580322265625e+6 + tmp107 * tmp35 * tmp1789 * -1.184030747874318063259124755859375e+7 + tmp114 * tmp42 * tmp1789 * 7.313609503410560078918933868408203125e+6 + tmp121 * tmp49 * tmp1789 * 4.3130171236253045499324798583984375e+7 + tmp128 * tmp56 * tmp1789 * 1.61642042481128990650177001953125e+8 + tmp135 * tmp63 * tmp1789 * 5.4773132097419001162052154541015625e+7 + tmp142 * tmp70 * tmp1789 * -1.1192463713859331607818603515625e+8 + tmp149 * tmp77 * tmp1789 * 1.835571215001440346240997314453125e+8 + tmp156 * tmp84 * tmp1789 * 1.8349478056333744525909423828125e+8 + tmp163 * tmp91 * tmp1789 * 3.1267025802735745906829833984375e+8 + tmp170 * tmp98 * tmp1789 * -1.8558052356652104854583740234375e+8 + tmp177 * tmp105 * tmp1789 * 2.1026138130288355052471160888671875e+7 + tmp184 * tmp112 * tmp1789 * 
-6.034970737308229506015777587890625e+7 + tmp191 * tmp119 * tmp1789 * 1.4345592741761243087239563465118408203125e+5 + tmp198 * tmp126 * tmp1789 * -5.2055970324661791324615478515625e+7 + tmp205 * tmp133 * tmp1789 * 3.4640618597813777625560760498046875e+7 + tmp212 * tmp140 * tmp1789 * 2.54905641133936941623687744140625e+7 + tmp219 * tmp147 * tmp1789 * 1.009492445635469257831573486328125e+7 + tmp226 * tmp154 * tmp1789 * 1.29918910747458450496196746826171875e+7 + tmp233 * tmp161 * tmp1789 * 1.4599689552738177590072154998779296875e+6 + tmp240 * tmp168 * tmp1789 * 3.803518739046299015171825885772705078125e+5 + tmp247 * tmp175 * tmp1789 * -4.9696515768544240927440114319324493408203125e+3 + tmp182 * tmp1789 * 1.879712141416647864389233291149139404296875e+4 * z + tmp189 * tmp1789 * 6.082043117109244121820665895938873291015625e+3 + tmp65 * tmp1789 * 8.2712639573318578186444938182830810546875e+4 + tmp79 * tmp1933 * 2.666949476320579997263848781585693359375e+5 * y + tmp86 * tmp7 * tmp1933 * 1.4226227042609662748873233795166015625e+6 + tmp93 * tmp14 * tmp1933 * 3.2924379865826624445617198944091796875e+6 + tmp100 * tmp21 * tmp1933 * -2.248118221933196298778057098388671875e+6 + tmp107 * tmp28 * tmp1933 * 3.455771416940818727016448974609375e+7 + tmp114 * tmp35 * tmp1933 * 8.233402167024926282465457916259765625e+6 + tmp121 * tmp42 * tmp1933 * -3.461033149931831657886505126953125e+7 + tmp128 * tmp49 * tmp1933 * 5.4086363262705214321613311767578125e+7 + tmp135 * tmp56 * tmp1933 * -2.547066871103002130985260009765625e+7 + tmp142 * tmp63 * tmp1933 * -1.683588524535671770572662353515625e+8 + tmp149 * tmp70 * tmp1933 * -1.95119007163369238376617431640625e+8 + tmp156 * tmp77 * tmp1933 * 3.8535509928844535350799560546875e+8 + tmp163 * tmp84 * tmp1933 * 2.56480834877651222050189971923828125e+7 + tmp170 * tmp91 * tmp1933 * 3.1291794804183743894100189208984375e+7 + tmp177 * tmp98 * tmp1933 * -9.80228578551062643527984619140625e+7 + tmp184 * tmp105 * tmp1933 * 
2.123474623355538845062255859375e+8 + tmp191 * tmp112 * tmp1933 * -1.528956090408284403383731842041015625e+7 + tmp198 * tmp119 * tmp1933 * 8.5850724279107272624969482421875e+7 + tmp205 * tmp126 * tmp1933 * -2.84909180857491232454776763916015625e+7 + tmp212 * tmp133 * tmp1933 * -4.691959027145697176456451416015625e+7 + tmp219 * tmp140 * tmp1933 * 6.0141831040819920599460601806640625e+6 + tmp226 * tmp147 * tmp1933 * -5.2106887149818730540573596954345703125e+5 + tmp233 * tmp154 * tmp1933 * 2.7952842278834707103669643402099609375e+6 + tmp240 * tmp161 * tmp1933 * -2.88638275911270291544497013092041015625e+5 + tmp247 * tmp168 * tmp1933 * 4.457745123987680417485535144805908203125e+5 + tmp175 * tmp1933 * 5.9065315189619708689860999584197998046875e+4 * z + tmp182 * tmp1933 * -1.2760543873188800716889090836048126220703125e+4 + tmp72 * tmp1933 * -5.9350806509691683459095656871795654296875e+4 + tmp86 * tmp2072 * -4.37076535879798117093741893768310546875e+5 * y + tmp93 * tmp7 * tmp2072 * 6.21225273702624253928661346435546875e+5 + tmp100 * tmp14 * tmp2072 * 1.34474278229182795621454715728759765625e+6 + tmp107 * tmp21 * tmp2072 * 8.64706747676239907741546630859375e+6 + tmp114 * tmp28 * tmp2072 * 7.12180358605618588626384735107421875e+6 + tmp121 * tmp35 * tmp2072 * 3.827497848354625701904296875e+7 + tmp128 * tmp42 * tmp2072 * -1.742001978288175165653228759765625e+8 + tmp135 * tmp49 * tmp2072 * -4.54740322609897553920745849609375e+7 + tmp142 * tmp56 * tmp2072 * 1.970099033694564402103424072265625e+8 + tmp149 * tmp63 * tmp2072 * 1.63699950393453948199748992919921875e+7 + tmp156 * tmp70 * tmp2072 * 3.551953377226607799530029296875e+8 + tmp163 * tmp77 * tmp2072 * 5.8535899101732797920703887939453125e+7 + tmp170 * tmp84 * tmp2072 * -3.691813720367423258721828460693359375e+6 + tmp177 * tmp91 * tmp2072 * 3.475685091003601253032684326171875e+7 + tmp184 * tmp98 * tmp2072 * -6.9688461877361774444580078125e+7 + tmp191 * tmp105 * tmp2072 * 3.31334713522325456142425537109375e+7 + tmp198 * 
tmp112 * tmp2072 * 1.516847511844202578067779541015625e+8 + tmp205 * tmp119 * tmp2072 * 5.761894228159494698047637939453125e+6 + tmp212 * tmp126 * tmp2072 * 1.45572282166549004614353179931640625e+6 + tmp219 * tmp133 * tmp2072 * 9.61802703536680154502391815185546875e+6 + tmp226 * tmp140 * tmp2072 * -2.3219634096620096825063228607177734375e+6 + tmp233 * tmp147 * tmp2072 * 1.73072555791297950781881809234619140625e+6 + tmp240 * tmp154 * tmp2072 * -1.18718132343210768885910511016845703125e+6 + tmp247 * tmp161 * tmp2072 * 7.8965165672725089825689792633056640625e+5 + tmp168 * tmp2072 * -3.34464620756417396478354930877685546875e+5 * z + tmp175 * tmp2072 * -7.010083325011617489508353173732757568359375e+3 + tmp79 * tmp2072 * 1.4982720565705254557542502880096435546875e+5 + tmp93 * tmp2206 * 1.09561273699354263953864574432373046875e+6 * y + tmp100 * tmp7 * tmp2206 * 5.653165521158934570848941802978515625e+5 + tmp107 * tmp14 * tmp2206 * -5.08627886824323050677776336669921875e+6 + tmp114 * tmp21 * tmp2206 * 4.619561523487622849643230438232421875e+6 + tmp121 * tmp28 * tmp2206 * -3.2498529335082940757274627685546875e+7 + tmp128 * tmp35 * tmp2206 * 3.8627622660031211562454700469970703125e+6 + tmp135 * tmp42 * tmp2206 * -9.94951568371478617191314697265625e+7 + tmp142 * tmp49 * tmp2206 * -6.498758897924329340457916259765625e+7 + tmp149 * tmp56 * tmp2206 * -5.8249932622556559741497039794921875e+7 + tmp156 * tmp63 * tmp2206 * -2.4230648372675888240337371826171875e+6 + tmp163 * tmp70 * tmp2206 * 1.03731606469178497791290283203125e+8 + tmp170 * tmp77 * tmp2206 * -2.0427615738942849636077880859375e+8 + tmp177 * tmp84 * tmp2206 * -1.50748359051868617534637451171875e+8 + tmp184 * tmp91 * tmp2206 * 9.92026080833674967288970947265625e+7 + tmp191 * tmp98 * tmp2206 * 1.30632097315468527376651763916015625e+7 + tmp198 * tmp105 * tmp2206 * 3.01572003520688079297542572021484375e+7 + tmp205 * tmp112 * tmp2206 * -5.5203932730417378246784210205078125e+7 + tmp212 * tmp119 * tmp2206 * 
-2.6667973678420163691043853759765625e+7 + tmp219 * tmp126 * tmp2206 * -3.6285128474793575704097747802734375e+7 + tmp226 * tmp133 * tmp2206 * -8.4175394045858271420001983642578125e+6 + tmp233 * tmp140 * tmp2206 * 1.42712876962076015770435333251953125e+7 + tmp240 * tmp147 * tmp2206 * 2.395607660239310280303470790386199951171875e+4 + tmp247 * tmp154 * tmp2206 * 2.898329678472016821615397930145263671875e+5 + tmp161 * tmp2206 * -1.16374480384422335191629827022552490234375e+5 * z + tmp168 * tmp2206 * -3.93206288950740781729109585285186767578125e+4 + tmp86 * tmp2206 * 1.888966023686795379035174846649169921875e+5 + tmp100 * tmp2335 * -9.80892267126337974332273006439208984375e+4 * y + tmp107 * tmp7 * tmp2335 * -4.588705895706429728306829929351806640625e+5 + tmp114 * tmp14 * tmp2335 * -1.30630061991654336452484130859375e+7 + tmp121 * tmp21 * tmp2335 * 4.0679880029777748859487473964691162109375e+4 + tmp128 * tmp28 * tmp2335 * -3.63935602368953227996826171875e+7 + tmp135 * tmp35 * tmp2335 * 5.4386689414552085101604461669921875e+7 + tmp142 * tmp42 * tmp2335 * -2.81242624278098903596401214599609375e+7 + tmp149 * tmp49 * tmp2335 * -2.38920425068769566714763641357421875e+7 + tmp156 * tmp56 * tmp2335 * 3.7146625610067002475261688232421875e+7 + tmp163 * tmp63 * tmp2335 * 5.88534597320024013519287109375e+8 + tmp170 * tmp70 * tmp2335 * -1.260135513327358663082122802734375e+8 + tmp177 * tmp77 * tmp2335 * 2.87700037787031948566436767578125e+8 + tmp184 * tmp84 * tmp2335 * 3.014437467919366061687469482421875e+7 + tmp191 * tmp91 * tmp2335 * 2.7518071017046439647674560546875e+8 + tmp198 * tmp98 * tmp2335 * -8.687773782170189917087554931640625e+7 + tmp205 * tmp105 * tmp2335 * -2.4757954760902742855250835418701171875e+6 + tmp212 * tmp112 * tmp2335 * 2.2724469779710628092288970947265625e+7 + tmp219 * tmp119 * tmp2335 * -8.4175323631999827921390533447265625e+6 + tmp226 * tmp126 * tmp2335 * -2.02737718505742065608501434326171875e+7 + tmp233 * tmp133 * tmp2335 * 
-2.585732109079533256590366363525390625e+5 + tmp240 * tmp140 * tmp2335 * 6.2184272504521645605564117431640625e+6 + tmp247 * tmp147 * tmp2335 * -9.8543751923963683657348155975341796875e+5 + tmp154 * tmp2335 * 6.721201784509257413446903228759765625e+4 * z + tmp161 * tmp2335 * -4.47173183769296811078675091266632080078125e+4 + tmp93 * tmp2335 * 5.84211610529696699813939630985260009765625e+4 + tmp107 * tmp2459 * 8.45730708690018393099308013916015625e+5 * y + tmp114 * tmp7 * tmp2459 * -1.035580324632983305491507053375244140625e+6 + tmp121 * tmp14 * tmp2459 * -3.2716145148810571990907192230224609375e+6 + tmp128 * tmp21 * tmp2459 * 1.338619878564165718853473663330078125e+7 + tmp135 * tmp28 * tmp2459 * -4.01688388825332224369049072265625e+7 + tmp142 * tmp35 * tmp2459 * 6.1332127101156137883663177490234375e+7 + tmp149 * tmp42 * tmp2459 * 2.27466766361683197319507598876953125e+7 + tmp156 * tmp49 * tmp2459 * 2.79567729638895690441131591796875e+7 + tmp163 * tmp56 * tmp2459 * -1.44725133021199524402618408203125e+8 + tmp170 * tmp63 * tmp2459 * 2.95497035975821278989315032958984375e+7 + tmp177 * tmp70 * tmp2459 * -1.553165074406856000423431396484375e+8 + tmp184 * tmp77 * tmp2459 * 1.156018422933063507080078125e+8 + tmp191 * tmp84 * tmp2459 * -2.5698091917498958110809326171875e+8 + tmp198 * tmp91 * tmp2459 * -8.3217407145425856113433837890625e+7 + tmp205 * tmp98 * tmp2459 * 9.815282649565088748931884765625e+7 + tmp212 * tmp105 * tmp2459 * -2.88599067245211564004421234130859375e+7 + tmp219 * tmp112 * tmp2459 * -1.7150838800360523164272308349609375e+7 + tmp226 * tmp119 * tmp2459 * -4.7408591025969088077545166015625e+6 + tmp233 * tmp126 * tmp2459 * -1.91743101012959368526935577392578125e+7 + tmp240 * tmp133 * tmp2459 * 2.75301951399843394756317138671875e+6 + tmp247 * tmp140 * tmp2459 * 5.57324648742317338474094867706298828125e+5 + tmp147 * tmp2459 * -4.88813523280134540982544422149658203125e+5 * z + tmp154 * tmp2459 * 5.60432359195865647052414715290069580078125e+4 + tmp100 * tmp2459 * 
-6.420611867320121382363140583038330078125e+4 + tmp114 * tmp2578 * -1.16443655400782614015042781829833984375e+6 * y + tmp121 * tmp7 * tmp2578 * 9.75869323413391411304473876953125e+6 + tmp128 * tmp14 * tmp2578 * 7.8029716233189292252063751220703125e+6 + tmp135 * tmp21 * tmp2578 * -9.539327565418183803558349609375e+6 + tmp142 * tmp28 * tmp2578 * 4.500671201433275826275348663330078125e+6 + tmp149 * tmp35 * tmp2578 * -7.41270583640496730804443359375e+7 + tmp156 * tmp42 * tmp2578 * 3.20603720115968100726604461669921875e+7 + tmp163 * tmp49 * tmp2578 * 2.0797874982817940413951873779296875e+7 + tmp170 * tmp56 * tmp2578 * 1.920662750504035055637359619140625e+8 + tmp177 * tmp63 * tmp2578 * 2.60824116936004579067230224609375e+8 + tmp184 * tmp70 * tmp2578 * -2.07162406374793946743011474609375e+8 + tmp191 * tmp77 * tmp2578 * 1.7028196463528764247894287109375e+8 + tmp198 * tmp84 * tmp2578 * 4.8950080180530630052089691162109375e+7 + tmp205 * tmp91 * tmp2578 * -1.52098231474184580147266387939453125e+7 + tmp212 * tmp98 * tmp2578 * 5.6548313956749431788921356201171875e+7 + tmp219 * tmp105 * tmp2578 * 5.919939421895049512386322021484375e+7 + tmp226 * tmp112 * tmp2578 * 4.2763350111199297010898590087890625e+7 + tmp233 * tmp119 * tmp2578 * -2.46197850238000159151852130889892578125e+5 + tmp240 * tmp126 * tmp2578 * -5.546544308642183430492877960205078125e+6 + tmp247 * tmp133 * tmp2578 * 7.42925450487637077458202838897705078125e+5 + tmp140 * tmp2578 * -3.31937665064190514385700225830078125e+5 * z + tmp147 * tmp2578 * 2.766069662381142916274257004261016845703125e+4 + tmp107 * tmp2578 * 3.2983813843254186213016510009765625e+5 + tmp121 * tmp2692 * 3.0595618833062280900776386260986328125e+6 * y + tmp128 * tmp7 * tmp2692 * -2.144393227491213940083980560302734375e+6 + tmp135 * tmp14 * tmp2692 * -1.486583230402656085789203643798828125e+7 + tmp142 * tmp21 * tmp2692 * 1.5772441564001192455179989337921142578125e+5 + tmp149 * tmp28 * tmp2692 * -2.41103545138470493257045745849609375e+7 + tmp156 * 
tmp35 * tmp2692 * 5.899479885978206060826778411865234375e+6 + tmp163 * tmp42 * tmp2692 * -2.6430979763294376432895660400390625e+7 + tmp170 * tmp49 * tmp2692 * 5.133620760493158013559877872467041015625e+5 + tmp177 * tmp56 * tmp2692 * -9.37540943622738420963287353515625e+7 + tmp184 * tmp63 * tmp2692 * 1.85119906107368655502796173095703125e+7 + tmp191 * tmp70 * tmp2692 * -1.7615100093860842287540435791015625e+7 + tmp198 * tmp77 * tmp2692 * 3.6253904847083978354930877685546875e+7 + tmp205 * tmp84 * tmp2692 * 1.295719884057255089282989501953125e+8 + tmp212 * tmp91 * tmp2692 * -2.00803835253717899322509765625e+7 + tmp219 * tmp98 * tmp2692 * 5.3526329015579812228679656982421875e+7 + tmp226 * tmp105 * tmp2692 * -5.095322002082568593323230743408203125e+6 + tmp233 * tmp112 * tmp2692 * -5.359179434494399465620517730712890625e+6 + tmp240 * tmp119 * tmp2692 * 2.7676412322572679258882999420166015625e+6 + tmp247 * tmp126 * tmp2692 * 2.8785371270983857102692127227783203125e+6 + tmp133 * tmp2692 * -4.966306447809512610547244548797607421875e+5 * z + tmp140 * tmp2692 * -4.13705430211361599504016339778900146484375e+4 + tmp114 * tmp2692 * 1.653312923425719491206109523773193359375e+5 + tmp128 * tmp2801 * 7.96481968665814376436173915863037109375e+5 * y + tmp135 * tmp7 * tmp2801 * -1.28184011431963765062391757965087890625e+6 + tmp142 * tmp14 * tmp2801 * 3.739547609614540706388652324676513671875e+5 + tmp149 * tmp21 * tmp2801 * 2.212634053146792948246002197265625e+7 + tmp156 * tmp28 * tmp2801 * -5.60538157229077885858714580535888671875e+5 + tmp163 * tmp35 * tmp2801 * -2.5704425960932739078998565673828125e+7 + tmp170 * tmp42 * tmp2801 * -4.796751842378087341785430908203125e+7 + tmp177 * tmp49 * tmp2801 * 1.1452965290544028580188751220703125e+8 + tmp184 * tmp56 * tmp2801 * -8.557430825550214946269989013671875e+7 + tmp191 * tmp63 * tmp2801 * 7.166651077396799623966217041015625e+7 + tmp198 * tmp70 * tmp2801 * -3.168597039628877304494380950927734375e+6 + tmp205 * tmp77 * tmp2801 * 
3.37717321003581769764423370361328125e+6 + tmp212 * tmp84 * tmp2801 * -2.91004992474741674959659576416015625e+7 + tmp219 * tmp91 * tmp2801 * -1.638337553455092199146747589111328125e+7 + tmp226 * tmp98 * tmp2801 * 1.24621004734718166291713714599609375e+7 + tmp233 * tmp105 * tmp2801 * 6.788069589686895720660686492919921875e+6 + tmp240 * tmp112 * tmp2801 * -6.34222029203966818749904632568359375e+6 + tmp247 * tmp119 * tmp2801 * 1.49785184677109098993241786956787109375e+6 + tmp126 * tmp2801 * -9.1405823712457739748060703277587890625e+5 * z + tmp133 * tmp2801 * 5.14689472374872420914471149444580078125e+4 + tmp121 * tmp2801 * 1.426149394206288852728903293609619140625e+5 + tmp135 * tmp2905 * 3.03609536294961930252611637115478515625e+5 * y + tmp142 * tmp7 * tmp2905 * -1.4714275790982604958117008209228515625e+6 + tmp149 * tmp14 * tmp2905 * -5.566759981338084675371646881103515625e+6 + tmp156 * tmp21 * tmp2905 * 2.3176940786889843642711639404296875e+7 + tmp163 * tmp28 * tmp2905 * 3.6069973592460402287542819976806640625e+6 + tmp170 * tmp35 * tmp2905 * 5.39247569399430532939732074737548828125e+5 + tmp177 * tmp42 * tmp2905 * -9.7824792022197246551513671875e+6 + tmp184 * tmp49 * tmp2905 * -1.945933786054898798465728759765625e+7 + tmp191 * tmp56 * tmp2905 * -1.077700385539690963923931121826171875e+7 + tmp198 * tmp63 * tmp2905 * 2.9433004530883781611919403076171875e+7 + tmp205 * tmp70 * tmp2905 * -6.730298381491740047931671142578125e+7 + tmp212 * tmp77 * tmp2905 * -3.928191222619201242923736572265625e+7 + tmp219 * tmp84 * tmp2905 * 2.6849644694305844604969024658203125e+7 + tmp226 * tmp91 * tmp2905 * 2.704867155703890323638916015625e+7 + tmp233 * tmp98 * tmp2905 * -6.78763992792464233934879302978515625e+6 + tmp240 * tmp105 * tmp2905 * 3.5959446982630081474781036376953125e+6 + tmp247 * tmp112 * tmp2905 * -4.497979745594228734262287616729736328125e+4 + tmp119 * tmp2905 * 1.21103635326783594791777431964874267578125e+5 * z + tmp126 * tmp2905 * 
3.20004822359698982836562208831310272216796875e+3 + tmp128 * tmp2905 * 1.809142044416526914574205875396728515625e+5 + tmp142 * tmp3004 * 7.62679539351840387098491191864013671875e+5 * y + tmp149 * tmp7 * tmp3004 * -4.866038028806217014789581298828125e+6 + tmp156 * tmp14 * tmp3004 * -1.63776057305473624728620052337646484375e+6 + tmp163 * tmp21 * tmp3004 * 7.11217500781370140612125396728515625e+6 + tmp170 * tmp28 * tmp3004 * -3.9978607764128334820270538330078125e+7 + tmp177 * tmp35 * tmp3004 * -2.47929526534062363207340240478515625e+7 + tmp184 * tmp42 * tmp3004 * 5.7877387789233028888702392578125e+7 + tmp191 * tmp49 * tmp3004 * 2.415442343246303498744964599609375e+7 + tmp198 * tmp56 * tmp3004 * -2.06969723439102955162525177001953125e+7 + tmp205 * tmp63 * tmp3004 * 1.78525538171035833656787872314453125e+7 + tmp212 * tmp70 * tmp3004 * 1.4009467444685436785221099853515625e+7 + tmp219 * tmp77 * tmp3004 * -2.91895625420843400061130523681640625e+7 + tmp226 * tmp84 * tmp3004 * -9.44744408039938099682331085205078125e+6 + tmp233 * tmp91 * tmp3004 * 8.043186692793383263051509857177734375e+5 + tmp240 * tmp98 * tmp3004 * 1.1014191633621859364211559295654296875e+6 + tmp247 * tmp105 * tmp3004 * 2.806176347862780094146728515625e+5 + tmp112 * tmp3004 * -2.5533889297865671687759459018707275390625e+5 * z + tmp119 * tmp3004 * 2.7430891344744231901131570339202880859375e+4 + tmp135 * tmp3004 * 1.06134195710988031351007521152496337890625e+5 + tmp149 * tmp3098 * 1.1463186221695042331703007221221923828125e+5 * y + tmp156 * tmp7 * tmp3098 * 3.34098046720228740014135837554931640625e+5 + tmp163 * tmp14 * tmp3098 * 1.4934738638681494630873203277587890625e+6 + tmp170 * tmp21 * tmp3098 * -5.099484923313385806977748870849609375e+6 + tmp177 * tmp28 * tmp3098 * -2.89505389009524248540401458740234375e+7 + tmp184 * tmp35 * tmp3098 * -1.024711136925701797008514404296875e+7 + tmp191 * tmp42 * tmp3098 * -1.889690480137197673320770263671875e+7 + tmp198 * tmp49 * tmp3098 * 
-6.779162954163896851241588592529296875e+6 + tmp205 * tmp56 * tmp3098 * -3.632387960473184287548065185546875e+7 + tmp212 * tmp63 * tmp3098 * -2.55563506650698594748973846435546875e+7 + tmp219 * tmp70 * tmp3098 * 2.120485573269045352935791015625e+7 + tmp226 * tmp77 * tmp3098 * 1.265800266697686351835727691650390625e+7 + tmp233 * tmp84 * tmp3098 * 2.7973069577973075211048126220703125e+6 + tmp240 * tmp91 * tmp3098 * -4.2064045706932060420513153076171875e+6 + tmp247 * tmp98 * tmp3098 * -2.89720523540850146673619747161865234375e+5 + tmp105 * tmp3098 * 5.91017071080561145208775997161865234375e+5 * z + tmp112 * tmp3098 * -1.9248215497334153042174875736236572265625e+4 + tmp142 * tmp3098 * -1.098764736653561121784150600433349609375e+5 + tmp156 * tmp3187 * -5.137771720746974460780620574951171875e+5 * y + tmp163 * tmp7 * tmp3187 * -1.966078069582206197082996368408203125e+6 + tmp170 * tmp14 * tmp3187 * 6.24776580437773279845714569091796875e+6 + tmp177 * tmp21 * tmp3187 * 2.416337206486933864653110504150390625e+6 + tmp184 * tmp28 * tmp3187 * -2.3176539925927552394568920135498046875e+6 + tmp191 * tmp35 * tmp3187 * -1.403261177437100745737552642822265625e+7 + tmp198 * tmp42 * tmp3187 * 1.4312325385542665608227252960205078125e+6 + tmp205 * tmp49 * tmp3187 * 3.9408041259764735586941242218017578125e+6 + tmp212 * tmp56 * tmp3187 * 2.73211694501934386789798736572265625e+7 + tmp219 * tmp63 * tmp3187 * 5.9111524814186431467533111572265625e+6 + tmp226 * tmp70 * tmp3187 * -6.770602485044692642986774444580078125e+6 + tmp233 * tmp77 * tmp3187 * -6.29087843741706199944019317626953125e+6 + tmp240 * tmp84 * tmp3187 * 2.300029976749385707080364227294921875e+6 + tmp247 * tmp91 * tmp3187 * 5.716050566138536669313907623291015625e+5 + tmp98 * tmp3187 * -1.0669314752172600128687918186187744140625e+5 * z + tmp105 * tmp3187 * -1.223132019274805134045891463756561279296875e+4 + tmp149 * tmp3187 * 7.6460948540774334105663001537322998046875e+4 + tmp163 * tmp3271 * 
4.2711688357697144965641200542449951171875e+4 * y + tmp170 * tmp7 * tmp3271 * 7.4079773240102943964302539825439453125e+5 + tmp177 * tmp14 * tmp3271 * -3.5754870884058983065187931060791015625e+6 + tmp184 * tmp21 * tmp3271 * 3.427145644821204245090484619140625e+6 + tmp191 * tmp28 * tmp3271 * -7.82998233624283969402313232421875e+6 + tmp198 * tmp35 * tmp3271 * -5.56519071248883567750453948974609375e+6 + tmp205 * tmp42 * tmp3271 * -2.451746062666709534823894500732421875e+6 + tmp212 * tmp49 * tmp3271 * 3.860770833344188518822193145751953125e+6 + tmp219 * tmp56 * tmp3271 * -3.3728278379827416501939296722412109375e+6 + tmp226 * tmp63 * tmp3271 * 2.2797554663294744677841663360595703125e+6 + tmp233 * tmp70 * tmp3271 * -3.8582958995827850885689258575439453125e+6 + tmp240 * tmp77 * tmp3271 * 5.56788241484340163879096508026123046875e+5 + tmp247 * tmp84 * tmp3271 * -4.376836840318930335342884063720703125e+5 + tmp91 * tmp3271 * 1.861945290110078640282154083251953125e+5 * z + tmp98 * tmp3271 * 3.5523566884319487144239246845245361328125e+4 + tmp156 * tmp3271 * -1.09171314424141004565171897411346435546875e+5 + tmp170 * tmp3350 * 5.44082330137627548538148403167724609375e+5 * y + tmp177 * tmp7 * tmp3350 * -1.26412620585219957865774631500244140625e+6 + tmp184 * tmp14 * tmp3350 * -8.68918582106443704105913639068603515625e+5 + tmp191 * tmp21 * tmp3350 * 1.463365755330618121661245822906494140625e+5 + tmp198 * tmp28 * tmp3350 * 3.2938731546351271681487560272216796875e+6 + tmp205 * tmp35 * tmp3350 * -1.8840158935666061006486415863037109375e+6 + tmp212 * tmp42 * tmp3350 * 5.350654957326852716505527496337890625e+6 + tmp219 * tmp49 * tmp3350 * 4.08686540659756958484649658203125e+6 + tmp226 * tmp56 * tmp3350 * 2.4612022012540991418063640594482421875e+6 + tmp233 * tmp63 * tmp3350 * -2.4538310317834909074008464813232421875e+5 + tmp240 * tmp70 * tmp3350 * 3.94916749895367189310491085052490234375e+5 + tmp247 * tmp77 * tmp3350 * 6.9935071883391030132770538330078125e+5 + tmp84 * tmp3350 * 
-2.182382886054189657443203032016754150390625e+4 * z + tmp91 * tmp3350 * 1.24793611090912527288310229778289794921875e+4 + tmp163 * tmp3350 * 9.6716059967903347569517791271209716796875e+4 + tmp177 * tmp3424 * -3.473462570327022694982588291168212890625e+4 * y + tmp184 * tmp7 * tmp3424 * -1.7710366337074176408350467681884765625e+5 + tmp191 * tmp14 * tmp3424 * 2.8114873208916489966213703155517578125e+5 + tmp198 * tmp21 * tmp3424 * 1.431031933379808324389159679412841796875e+5 + tmp205 * tmp28 * tmp3424 * 3.2918287437849775888025760650634765625e+6 + tmp212 * tmp35 * tmp3424 * -4.0018918043949562124907970428466796875e+6 + tmp219 * tmp42 * tmp3424 * 2.08536552602579002268612384796142578125e+6 + tmp226 * tmp49 * tmp3424 * 7.7187410104315378703176975250244140625e+5 + tmp233 * tmp56 * tmp3424 * 1.51516300701577565632760524749755859375e+6 + tmp240 * tmp63 * tmp3424 * -9.5935196350207013892941176891326904296875e+4 + tmp247 * tmp70 * tmp3424 * 1.8764899438338782056234776973724365234375e+5 + tmp77 * tmp3424 * -1.7650105654693322139792144298553466796875e+5 * z + tmp84 * tmp3424 * -1.12599907083165380754508078098297119140625e+4 + tmp170 * tmp3424 * 1.65243484159407744300551712512969970703125e+4 + tmp184 * tmp3493 * 3.170604361968534431071020662784576416015625e+4 * y + tmp191 * tmp7 * tmp3493 * -7.1019284896686685897293500602245330810546875e+3 + tmp198 * tmp14 * tmp3493 * -1.0667806064645524020306766033172607421875e+5 + tmp205 * tmp21 * tmp3493 * -5.3628663532777383807115256786346435546875e+4 + tmp212 * tmp28 * tmp3493 * 1.80971219243237585760653018951416015625e+6 + tmp219 * tmp35 * tmp3493 * -1.0597402581397411413490772247314453125e+6 + tmp226 * tmp42 * tmp3493 * -1.1588275993866080534644424915313720703125e+5 + tmp233 * tmp49 * tmp3493 * -4.25511186032313504256308078765869140625e+5 + tmp240 * tmp56 * tmp3493 * -2.938991744034239090979099273681640625e+5 + tmp247 * tmp63 * tmp3493 * 3.517423913705237209796905517578125e+5 + tmp70 * tmp3493 * 
-1.3036163511801595450378954410552978515625e+5 * z + tmp77 * tmp3493 * -1.15164797042504287674091756343841552734375e+4 + tmp177 * tmp3493 * -1.027326106002176529727876186370849609375e+5 + tmp191 * tmp3557 * 2.32968870028700403054244816303253173828125e+4 * y + tmp198 * tmp7 * tmp3557 * -1.7206523041620221920311450958251953125e+5 + tmp205 * tmp14 * tmp3557 * -1.20100356877218815498054027557373046875e+5 + tmp212 * tmp21 * tmp3557 * 8.7875761462733824737370014190673828125e+5 + tmp219 * tmp28 * tmp3557 * -2.122252146525706848478876054286956787109375e+4 + tmp226 * tmp35 * tmp3557 * -1.28475469950316357426345348358154296875e+5 + tmp233 * tmp42 * tmp3557 * -2.1290984832363171153701841831207275390625e+5 + tmp240 * tmp49 * tmp3557 * 1.2339747299729688165825791656970977783203125e+4 + tmp247 * tmp56 * tmp3557 * 6.92400885106657515279948711395263671875e+4 + tmp63 * tmp3557 * -2.536715383273988845758140087127685546875e+4 * z + tmp70 * tmp3557 * 1.1653339780162687020492739975452423095703125e+4 + tmp184 * tmp3557 * -1.616392726083058732911013066768646240234375e+3 + tmp198 * tmp3616 * 6.699978345966499546193517744541168212890625e+3 * y + tmp205 * tmp7 * tmp3616 * 5.187473514055114355869591236114501953125e+4 + tmp212 * tmp14 * tmp3616 * 1.65763816227832925505936145782470703125e+5 + tmp219 * tmp21 * tmp3616 * -3.25056090106885298155248165130615234375e+5 + tmp226 * tmp28 * tmp3616 * 1.8006313213867394370026886463165283203125e+5 + tmp233 * tmp35 * tmp3616 * 1.6149108969951418112032115459442138671875e+5 + tmp240 * tmp42 * tmp3616 * 9.2216657954497219179756939411163330078125e+4 + tmp247 * tmp49 * tmp3616 * -6.89965494541488005779683589935302734375e+4 + tmp56 * tmp3616 * -1.356782839007581424084492027759552001953125e+4 * z + tmp63 * tmp3616 * 3.8200083861179182349587790668010711669921875e+3 + tmp191 * tmp3616 * -2.522350773908335759188048541545867919921875e+4 + tmp205 * tmp3670 * -1.62814408780031881178729236125946044921875e+4 * y + tmp212 * tmp7 * tmp3670 * 
-6.41554116158736360375769436359405517578125e+4 + tmp219 * tmp14 * tmp3670 * 5.71622242260840721428394317626953125e+4 + tmp226 * tmp21 * tmp3670 * -9.0587334121958469040691852569580078125e+4 + tmp233 * tmp28 * tmp3670 * -7.3649611555596042308025062084197998046875e+4 + tmp240 * tmp35 * tmp3670 * 3.271239722421842088806442916393280029296875e+4 + tmp247 * tmp42 * tmp3670 * 3.37375063881458263495005667209625244140625e+4 + tmp49 * tmp3670 * 2.96347007322415311136865057051181793212890625e+3 * z + tmp56 * tmp3670 * -3.2546262208931884742924012243747711181640625e+3 + tmp198 * tmp3670 * 1.5338915752405284365522675216197967529296875e+4 + tmp212 * tmp3719 * 3.4136824779939588552224449813365936279296875e+3 * y + tmp219 * tmp7 * tmp3719 * -2.34577830300033165258355438709259033203125e+4 + tmp226 * tmp14 * tmp3719 * -5.9762243919275797452428378164768218994140625e+3 + tmp233 * tmp21 * tmp3719 * -5.814102600184004404582083225250244140625e+4 + tmp240 * tmp28 * tmp3719 * -3.0788500974681155639700591564178466796875e+4 + tmp247 * tmp35 * tmp3719 * 1.44903749359589928644709289073944091796875e+4 + tmp42 * tmp3719 * 8.998666125799138171714730560779571533203125e+3 * z + tmp49 * tmp3719 * -2.15837762855332357503357343375682830810546875e+3 + tmp205 * tmp3719 * 1.0489749206845866865478456020355224609375e+4 + tmp219 * tmp3763 * -7.9145091489850237849168479442596435546875e+3 * y + tmp226 * tmp7 * tmp3763 * -4.297619298829873514478094875812530517578125e+3 + tmp233 * tmp14 * tmp3763 * -3.6394460748627388966269791126251220703125e+3 + tmp240 * tmp21 * tmp3763 * -1.76700041940687924579833634197711944580078125e+2 + tmp247 * tmp28 * tmp3763 * -6.814583279386506546870805323123931884765625e+3 + tmp35 * tmp3763 * -3.72972720405301743085146881639957427978515625e+3 * z + tmp42 * tmp3763 * 5.423292687640742997245979495346546173095703125e+2 + tmp212 * tmp3763 * 5.6591764289461070802644826471805572509765625e+2 + tmp226 * tmp3802 * 1.75396447503114859500783495604991912841796875e+3 * y + tmp233 * tmp7 * tmp3802 
* -5.6398178776681461386033333837985992431640625e+3 + tmp240 * tmp14 * tmp3802 * 4.5933270835433504544198513031005859375e+3 + tmp247 * tmp21 * tmp3802 * 2.34904035252849325843271799385547637939453125e+3 + tmp28 * tmp3802 * -1.9049080159450359133188612759113311767578125e+3 * z + tmp35 * tmp3802 * 7.929259720218824440962634980678558349609375e+2 + tmp219 * tmp3802 * 2.3259669282514806809558649547398090362548828125e+2 + tmp233 * tmp3836 * -2.8585752222517425025216653011739253997802734375e+2 * y + tmp240 * tmp7 * tmp3836 * -6.493934622707987500689341686666011810302734375e+2 + tmp247 * tmp14 * tmp3836 * -9.9977621505944307500612922012805938720703125e+2 + tmp21 * tmp3836 * 7.98660887368331486868555657565593719482421875e+1 * z + tmp28 * tmp3836 * 8.27886182528766454424840048886835575103759765625e+1 + tmp226 * tmp3836 * -4.777593477712571257143281400203704833984375e+2 + tmp240 * tmp3865 * 2.977550553870204339546035043895244598388671875e+2 * y + tmp247 * tmp7 * tmp3865 * 5.7826802635265408980558277107775211334228515625e+1 + tmp14 * tmp3865 * -3.19466599982074441754775762092322111129760742188e+1 * z + tmp21 * tmp3865 * -1.94448401828137278357644390780478715896606445312e+1 + tmp233 * tmp3865 * 3.52655094199027852042149788758251816034317016602e+0 + tmp247 * tmp3889 * -6.5845299554512251916094101034104824066162109375e+1 * y + tmp7 * tmp3889 * -2.42424916979261766414310841355472803115844726562e+1 * z + tmp14 * tmp3889 * -1.78320103293753859929893224034458398818969726562e+1 + tmp240 * tmp3889 * 4.077843129885712869509006850421428680419921875e+1 + tmp3908 * -1.18923647099609706145884047145955264568328857422e+1 * y * z + tmp7 * tmp3908 * 1.32242112002705436424321305821649730205535888672e+1 + tmp247 * tmp3908 * 1.49522929849522157041974423918873071670532226562e+1 + tmp3922 * 2.34174854619518546527956459613051265478134155273e+0 * y + tmp3922 * 9.04514018432027167015974100650055333971977233887e-1 * z + D.17848 * -1.68295515076845758617452020189375616610050201416e-1 + tmp263 * y * 
2.44392153459739969179054241976700723171234130859e+0 + tmp3 * tmp7 * 4.93345174860685844464569527190178632736206054688e+1 + tmp9 * tmp14 * 8.24720252171355205916825070744380354881286621094e+0 + tmp16 * tmp21 * -1.9126235671034720553507213480770587921142578125e+2 + tmp23 * tmp28 * 4.316295746607764840518939308822154998779296875e+2 + tmp30 * tmp35 * 1.52199653640183896641246974468231201171875e+3 + tmp37 * tmp42 * -1.417924435742725108866579830646514892578125e+3 + tmp44 * tmp49 * 1.12212855482832528650760650634765625e+4 + tmp51 * tmp56 * -3.2432832132926720078103244304656982421875e+4 + tmp58 * tmp63 * 3.82009238930551000521518290042877197265625e+4 + tmp65 * tmp70 * -8.23258735196742345578968524932861328125e+4 + tmp72 * tmp77 * 1.14637278647861327044665813446044921875e+5 + tmp79 * tmp84 * 1.20300110761476520565338432788848876953125e+5 + tmp86 * tmp91 * -3.809750085251752170734107494354248046875e+4 + tmp93 * tmp98 * -5.87331939325498606194742023944854736328125e+4 + tmp100 * tmp105 * 1.5174465636768660624511539936065673828125e+5 + tmp107 * tmp112 * 5.7061912117433804087340831756591796875e+5 + tmp114 * tmp119 * 1.0945829384530000970698893070220947265625e+5 + tmp121 * tmp126 * 2.73575341893493314273655414581298828125e+5 + tmp128 * tmp133 * -2.663092194049791432917118072509765625e+5 + tmp135 * tmp140 * -1.7987658674338340642862021923065185546875e+5 + tmp142 * tmp147 * 9.581333776664166362024843692779541015625e+4 + tmp149 * tmp154 * -7.4135173588056395601597614586353302001953125e+3 + tmp156 * tmp161 * 1.3765767618855574983172118663787841796875e+5 + tmp163 * tmp168 * -1.29963515349221575888805091381072998046875e+5 + tmp170 * tmp175 * 1.024760051369240391068160533905029296875e+5 + tmp177 * tmp182 * 1.1774167185705955489538609981536865234375e+4 + tmp184 * tmp189 * -4.54049751975205363123677670955657958984375e+4 + tmp191 * tmp196 * 2.202691293171887446078471839427947998046875e+4 + tmp198 * tmp203 * -6.9031785646773232656414620578289031982421875e+3 + tmp205 * tmp210 * 
2.587270389266899655922316014766693115234375e+3 + tmp212 * tmp217 * 1.584923463841544389651971869170665740966796875e+3 + tmp219 * tmp224 * 3.169149462572121365155908279120922088623046875e+2 + tmp226 * tmp231 * 5.80735738826707120097125880420207977294921875e+2 + tmp233 * tmp238 * 3.1680722189976205527273123152554035186767578125e+2 + tmp240 * tmp245 * 8.85777261495995782425438846985343843698501586914e-1 + tmp247 * tmp252 * -6.73539987844872634070725325727835297584533691406e+0 + tmp258 * -2.01161588801537272175323778355959802865982055664e+0 * z + D.17965 * -2.58248045144082116753025957223144359886646270752e-1 + D.17968 * 3.27889198207912357929671998135745525360107421875e+0;
}

Perhaps a little more careful code placement in SRA, or a local register
pressure pass at the end of the SSA path, would do?

Honza
Comment 13 Jan Hubicka 2006-07-22 18:09:18 UTC
Created attachment 11919 [details]
bug2.c.099t.optimized
Comment 14 Jan Hubicka 2006-07-22 19:30:44 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
with the attached patch I can cure the quadratic behaviour in regmove, and
the time report is not so unreasonable now:

 gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr sub subl subr mul mull mulr divl ipow fi
Analyzing compilation unitPerforming intraprocedural optimizations
Assembling functions:
 max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC 126177k -> 85112k} {GC 327625k -> 39474k}
Execution times (seconds)
 garbage collection    :   0.83 ( 0%) usr   0.00 ( 0%) sys   0.82 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.16 ( 0%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall    1147 kB ( 0%) ggc
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     533 kB ( 0%) ggc
 ipa reference         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa type escape       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code   :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall       0 kB ( 0%) ggc
 life analysis         :  21.38 ( 3%) usr   0.02 ( 1%) sys  21.39 ( 3%) wall    1120 kB ( 0%) ggc
 life info update      :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.87 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    4266 kB ( 1%) ggc
 register scan         :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall     150 kB ( 0%) ggc
 rebuild jump labels   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.27 ( 0%) usr   0.06 ( 2%) sys   0.36 ( 0%) wall     471 kB ( 0%) ggc
 lexical analysis      :   0.04 ( 0%) usr   0.05 ( 2%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.12 ( 0%) usr   0.03 ( 1%) sys   0.17 ( 0%) wall    3207 kB ( 1%) ggc
 inline heuristics     :  15.14 ( 2%) usr   0.01 ( 0%) sys  15.26 ( 2%) wall    1486 kB ( 0%) ggc
 integration           :  21.35 ( 3%) usr   0.12 ( 4%) sys  21.71 ( 3%) wall   33445 kB ( 8%) ggc
 tree gimplify         :   0.18 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall    3341 kB ( 1%) ggc
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1338 kB ( 0%) ggc
 tree CFG cleanup      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      20 kB ( 0%) ggc
 tree VRP              :   0.38 ( 0%) usr   0.01 ( 0%) sys   0.42 ( 0%) wall      11 kB ( 0%) ggc
 tree copy propagation :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall     222 kB ( 0%) ggc
 tree store copy prop  :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall       4 kB ( 0%) ggc
 tree find ref. vars   :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall    8137 kB ( 2%) ggc
 tree PTA              :   1.29 ( 0%) usr   0.04 ( 1%) sys   1.36 ( 0%) wall      57 kB ( 0%) ggc
 tree alias analysis   :   1.89 ( 0%) usr   0.20 ( 7%) sys   2.10 ( 0%) wall       0 kB ( 0%) ggc
 tree PHI insertion    :   1.68 ( 0%) usr   0.01 ( 0%) sys   1.70 ( 0%) wall      18 kB ( 0%) ggc
 tree SSA rewrite      :   0.62 ( 0%) usr   0.04 ( 1%) sys   0.65 ( 0%) wall   17084 kB ( 4%) ggc
 tree SSA other        :   0.48 ( 0%) usr   0.08 ( 3%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   1.20 ( 0%) usr   0.00 ( 0%) sys   1.24 ( 0%) wall       0 kB ( 0%) ggc
 tree operand scan     :   1.48 ( 0%) usr   0.34 (11%) sys   1.93 ( 0%) wall   15634 kB ( 4%) ggc
 dominator optimization:   1.05 ( 0%) usr   0.05 ( 2%) sys   1.05 ( 0%) wall    2698 kB ( 1%) ggc
 tree SRA              :   1.05 ( 0%) usr   0.09 ( 3%) sys   1.15 ( 0%) wall   24835 kB ( 6%) ggc
 tree STORE-CCP        :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall       4 kB ( 0%) ggc
 tree CCP              :   0.51 ( 0%) usr   0.02 ( 1%) sys   0.56 ( 0%) wall     154 kB ( 0%) ggc
 tree reassociation    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE              : 296.46 (45%) usr   0.49 (16%) sys 298.81 (45%) wall   19481 kB ( 5%) ggc
 tree FRE              :   0.96 ( 0%) usr   0.05 ( 2%) sys   1.00 ( 0%) wall    7991 kB ( 2%) ggc
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       8 kB ( 0%) ggc
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :  27.19 ( 4%) usr   0.01 ( 0%) sys  27.33 ( 4%) wall      22 kB ( 0%) ggc
 tree rename SSA copies:   0.15 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   2.96 ( 0%) usr   0.09 ( 3%) sys   3.05 ( 0%) wall   24095 kB ( 6%) ggc
 jump                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   1.87 ( 0%) usr   0.00 ( 0%) sys   1.88 ( 0%) wall     118 kB ( 0%) ggc
 global CSE            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 CPROP 1               :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall    1620 kB ( 0%) ggc
 PRE                   :  21.36 ( 3%) usr   0.01 ( 0%) sys  21.41 ( 3%) wall     200 kB ( 0%) ggc
 CPROP 2               :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall     390 kB ( 0%) ggc
 bypass jumps          :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall     389 kB ( 0%) ggc
 CSE 2                 :   1.05 ( 0%) usr   0.00 ( 0%) sys   1.07 ( 0%) wall      72 kB ( 0%) ggc
 branch prediction     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 flow analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.87 ( 0%) usr   0.01 ( 0%) sys   0.88 ( 0%) wall    1745 kB ( 0%) ggc
 if-conversion         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       3 kB ( 0%) ggc
 regmove               :  21.69 ( 3%) usr   0.02 ( 1%) sys  21.78 ( 3%) wall       2 kB ( 0%) ggc
 local alloc           :   7.60 ( 1%) usr   0.00 ( 0%) sys   7.62 ( 1%) wall    1480 kB ( 0%) ggc
 global alloc          :  16.47 ( 2%) usr   0.35 (12%) sys  16.91 ( 3%) wall   16915 kB ( 4%) ggc
 reload CSE regs       : 107.52 (16%) usr   0.15 ( 5%) sys 108.55 (16%) wall    4783 kB ( 1%) ggc
 flow 2                :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     225 kB ( 0%) ggc
 peephole 2            :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :  75.09 (11%) usr   0.53 (18%) sys  76.86 (12%) wall  206227 kB (51%) ggc
 machine dep reorg     :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall      15 kB ( 0%) ggc
 reg stack             :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      37 kB ( 0%) ggc
 final                 :   0.66 ( 0%) usr   0.02 ( 1%) sys   0.74 ( 0%) wall    1156 kB ( 0%) ggc
 TOTAL                 : 659.57             2.99           668.06             407297 kB

PRE is somewhat slow, but I will leave this to Danny.

For scheduling the situation is quite clear: we have huge basic blocks
and produce a huge number of dependencies.  For reload, I am also not
really surprised, since the code produced is a regalloc nightmare and
reload manages to create very large bitmaps, which results in quadratic
behaviour.
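
To make the scheduler remark concrete, here is a minimal sketch (not GCC code) of how dependency counts grow with basic block size. It assumes the worst case, where every instruction in a block depends on every earlier one; the function names and the per-node size parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Worst case for a single basic block: every instruction depends on
   every earlier one, so n instructions yield n*(n-1)/2 dependencies.  */
size_t worst_case_deps(size_t n_insns)
{
    return n_insns * (n_insns - 1) / 2;
}

/* Memory needed for the dependency lists.  node_bytes is a parameter
   because the size of one list node depends on the host.  */
size_t dep_list_bytes(size_t n_insns, size_t node_bytes)
{
    assert(node_bytes > 0);
    return worst_case_deps(n_insns) * node_bytes;
}
```

Under this model, doubling the size of a block roughly quadruples the dependency list memory, which is why a single gigantic basic block is so much more expensive than many small ones.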

Since Danny asked for allocpools:

Alloc-pool Kind        Pools  Allocated      Peak        Leak
-------------------------------------------------------------
Value sets                18    2230608    1929200          0
Bitmap sets               18       9504       8432          0
Value set nodes           18    2032208    1768488          0
Binary tree nodes         18    1291320     783992          0
value                     48    3875872    1246744          0
et_occ pool              127     238144      48040          0
et_node pool             127     159680      36024          0
Reference tree nodes      18    1430880    1437864          0
Expression tree nodes     18     426240     428840          0
elt_list                  48    3639816     397672          0
List tree nodes           18     511488     516880          0
elt_loc_list              48   14186784     975240          0
Comparison tree nodes     18       4520       4832          0
original_copy             26         48         88          0
Constraint pool          108    4335432    1501136          0
Unary tree nodes          18         96        968          0
Variable info pool       108   12261704    4550848          0
Constraint edges         108       2112        496          0
operand entry pool        36        512        248          0
cselib_val_list           48   11627616     974144          0
-------------------------------------------------------------
Total                    994   58264584

Memory consumption is now dominated by scheduler's dependency info:

ggc-common.c:193 (ggc_calloc)                       6303224: 1.9%    5139976:12.3%    1863696: 8.8%    1073688:21.8%        530
gimplify.c:453 (create_tmp_var_raw)                 7325032: 2.2%          0: 0.0%     889240: 4.2%          0: 0.0%      93344
genrtl.c:17 (gen_rtx_fmt_ee)                        9819384: 2.9%          0: 0.0%     138900: 0.7%          0: 0.0%     829857
tree-dfa.c:186 (create_stmt_ann)                    9970168: 2.9%     763932: 1.8%       3692: 0.0%          0: 0.0%     206496
tree-ssanames.c:147 (make_ssa_name)                 9740544: 2.9%          0: 0.0%    2373936:11.2%          0: 0.0%     252385
bitmap.c:139 (bitmap_element_allocate)             18876340: 5.6%          0: 0.0%          0: 0.0%          0: 0.0%     674155
genrtl.c:32 (gen_rtx_fmt_ue)                      193579104:57.2%          0: 0.0%          0: 0.0%          0: 0.0%   16131592
Total                                             338496482         41839722         21146495          4929007         22457179

I am now looking into the -O3 compilation, which crashes in into-ssa because
of an overly large stack.

Honza
Comment 15 Jan Hubicka 2006-07-22 19:30:44 UTC
Created attachment 11920 [details]
regmovefix
Comment 16 Jan Hubicka 2006-07-22 20:51:21 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
with the attached patch, which saves roughly 10 minutes in the tree-into-ssa
pass, I can compile with -O3 -fno-tree-fre -fno-tree-pre.  This works only
without checking enabled, since we do incredibly deep dominator walks that run
out of stack space, which can be considered a bug too.
TER still manages to enforce a few thousand temporaries with overlapping
live ranges.

The out-of-ssa pass spends most of its time in calculate_live_on_exit
and calculate_live_on_entry, which look rather symmetric to the problem cured
by the attached patch, but I don't see directly how to avoid the
quadratic behaviour there.

Honza

 garbage collection    :   1.22 ( 0%) usr   0.10 ( 1%) sys   8.40 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.14 ( 0%) usr   0.03 ( 0%) sys   0.18 ( 0%) wall    1147 kB ( 0%) ggc
 callgraph optimization:   0.07 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall     533 kB ( 0%) ggc
 ipa reference         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa type escape       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   3.89 ( 1%) usr   0.01 ( 0%) sys   4.11 ( 0%) wall    1576 kB ( 1%) ggc
 trivially dead code   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall       0 kB ( 0%) ggc
 life analysis         :  51.34 ( 9%) usr   2.65 (21%) sys  73.91 ( 5%) wall    2653 kB ( 1%) ggc
 life info update      :  48.97 ( 9%) usr   0.14 ( 1%) sys  50.68 ( 4%) wall     641 kB ( 0%) ggc
 alias analysis        :   0.69 ( 0%) usr   0.00 ( 0%) sys   1.05 ( 0%) wall    4139 kB ( 1%) ggc
 register scan         :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.37 ( 0%) usr   0.06 ( 0%) sys   0.34 ( 0%) wall     471 kB ( 0%) ggc
 lexical analysis      :   0.01 ( 0%) usr   0.05 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.09 ( 0%) usr   0.02 ( 0%) sys   0.18 ( 0%) wall    3207 kB ( 1%) ggc
 inline heuristics     :  14.79 ( 3%) usr   0.02 ( 0%) sys  14.86 ( 1%) wall    1118 kB ( 0%) ggc
 integration           :  17.07 ( 3%) usr   0.22 ( 2%) sys  17.36 ( 1%) wall   79483 kB (27%) ggc
 tree gimplify         :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.17 ( 0%) wall    3341 kB ( 1%) ggc
 tree eh               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1338 kB ( 0%) ggc
 tree CFG cleanup      :   4.27 ( 1%) usr   0.00 ( 0%) sys   4.27 ( 0%) wall      20 kB ( 0%) ggc
 tree VRP              :   1.26 ( 0%) usr   0.03 ( 0%) sys   1.33 ( 0%) wall      14 kB ( 0%) ggc
 tree copy propagation :   0.85 ( 0%) usr   0.05 ( 0%) sys   0.94 ( 0%) wall     313 kB ( 0%) ggc
 tree store copy prop  :   0.27 ( 0%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall       5 kB ( 0%) ggc
 tree find ref. vars   :   0.16 ( 0%) usr   0.03 ( 0%) sys   0.18 ( 0%) wall   12044 kB ( 4%) ggc
 tree PTA              :   1.55 ( 0%) usr   0.06 ( 0%) sys   1.63 ( 0%) wall      57 kB ( 0%) ggc
 tree alias analysis   :   2.81 ( 0%) usr   0.29 ( 2%) sys   3.10 ( 0%) wall       0 kB ( 0%) ggc
 tree PHI insertion    :   0.57 ( 0%) usr   0.92 ( 7%) sys   1.52 ( 0%) wall    3137 kB ( 1%) ggc
 tree SSA rewrite      :   2.33 ( 0%) usr   0.06 ( 0%) sys   5.02 ( 0%) wall   21592 kB ( 7%) ggc
 tree SSA other        :   0.41 ( 0%) usr   0.16 ( 1%) sys   0.65 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   4.18 ( 1%) usr   0.45 ( 4%) sys   4.72 ( 0%) wall     520 kB ( 0%) ggc
 tree operand scan     :   1.79 ( 0%) usr   0.69 ( 5%) sys  39.97 ( 3%) wall   18374 kB ( 6%) ggc
 dominator optimization:   2.91 ( 1%) usr   0.05 ( 0%) sys   2.99 ( 0%) wall   11155 kB ( 4%) ggc
 tree SRA              :   4.24 ( 1%) usr   0.15 ( 1%) sys   4.51 ( 0%) wall   25568 kB ( 9%) ggc
 tree STORE-CCP        :   0.29 ( 0%) usr   0.01 ( 0%) sys   0.31 ( 0%) wall      18 kB ( 0%) ggc
 tree CCP              :   0.87 ( 0%) usr   0.01 ( 0%) sys   2.39 ( 0%) wall     154 kB ( 0%) ggc
 tree split crit edges :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.14 ( 0%) wall    9284 kB ( 3%) ggc
 tree reassociation    :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 tree code sinking     :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
 tree linearize phis   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   1.13 ( 0%) usr   0.00 ( 0%) sys   1.11 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       1 kB ( 0%) ggc
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree loop init        :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA uncprop      :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    : 228.94 (40%) usr   0.64 ( 5%) sys 337.06 (25%) wall   10323 kB ( 4%) ggc
 tree rename SSA copies:   0.49 ( 0%) usr   0.03 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   2.63 ( 0%) usr   0.09 ( 1%) sys   2.85 ( 0%) wall       0 kB ( 0%) ggc
 control dependences   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   6.10 ( 1%) usr   1.13 ( 9%) sys 192.49 (14%) wall   35008 kB (12%) ggc
 jump                  :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   0.89 ( 0%) usr   0.01 ( 0%) sys   0.89 ( 0%) wall      53 kB ( 0%) ggc
 loop analysis         :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall     930 kB ( 0%) ggc
 CPROP 1               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 CSE 2                 :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall      29 kB ( 0%) ggc
 branch prediction     :   0.55 ( 0%) usr   0.00 ( 0%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 flow analysis         :  37.33 ( 6%) usr   0.10 ( 1%) sys  53.59 ( 4%) wall       0 kB ( 0%) ggc
 combiner              :   1.02 ( 0%) usr   0.02 ( 0%) sys   1.37 ( 0%) wall    2685 kB ( 1%) ggc
 if-conversion         :   5.21 ( 1%) usr   0.00 ( 0%) sys   5.36 ( 0%) wall    1614 kB ( 1%) ggc
 regmove               :   0.72 ( 0%) usr   0.01 ( 0%) sys   0.83 ( 0%) wall       4 kB ( 0%) ggc
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 local alloc           :   1.06 ( 0%) usr   0.02 ( 0%) sys   1.46 ( 0%) wall    1045 kB ( 0%) ggc
 global alloc          :  86.33 (15%) usr   4.12 (32%) sys 452.97 (34%) wall    8488 kB ( 3%) ggc
 reload CSE regs       :  24.86 ( 4%) usr   0.07 ( 1%) sys  28.13 ( 2%) wall    3370 kB ( 1%) ggc
 load CSE after reload :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 flow 2                :   0.36 ( 0%) usr   0.01 ( 0%) sys   1.19 ( 0%) wall    5064 kB ( 2%) ggc
 if-conversion 2       :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.38 ( 0%) usr   0.05 ( 0%) sys   0.50 ( 0%) wall       1 kB ( 0%) ggc
 scheduling 2          :   2.10 ( 0%) usr   0.07 ( 1%) sys   2.40 ( 0%) wall    4347 kB ( 1%) ggc
 machine dep reorg     :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall      79 kB ( 0%) ggc
 reorder blocks        :   0.63 ( 0%) usr   0.01 ( 0%) sys   1.06 ( 0%) wall    2738 kB ( 1%) ggc
 reg stack             :   1.07 ( 0%) usr   0.02 ( 0%) sys   1.53 ( 0%) wall   11030 kB ( 4%) ggc
 final                 :   1.06 ( 0%) usr   0.04 ( 0%) sys   1.18 ( 0%) wall    2182 kB ( 1%) ggc
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 575.62            12.78          1351.48             291955 kB
Comment 17 Jan Hubicka 2006-07-22 20:51:21 UTC
Created attachment 11921 [details]
intossaspeedup
Comment 18 patchapp@dberlin.org 2006-07-24 00:05:17 UTC
Subject: Bug number PR28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01011.html
Comment 19 Jan Hubicka 2006-07-24 11:24:01 UTC
Subject: Bug 28071

Author: hubicka
Date: Mon Jul 24 11:23:21 2006
New Revision: 115712

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115712
Log:
	PR rtl-optimization/28071
	* ipa-inline.c (update_caller_keys): Remove edges that
	are no longer inline candidates.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c

Comment 20 Jan Hubicka 2006-07-24 11:28:14 UTC
Subject: Bug 28071

Author: hubicka
Date: Mon Jul 24 11:27:53 2006
New Revision: 115713

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115713
Log:
	PR rtl-optimization/28071
	* tree-cfg.c (tree_split_block): Do not allocate new stmt_list nodes.
	* tree-iterator.c (tsi_split_statement_list_before): Do not crash when
	splitting before first stmt.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-iterator.c

Comment 21 Jan Hubicka 2006-07-24 11:54:08 UTC
OK, some summary ;)

Mainline (after the first three patches) at -O now peaks at 450MB (just because of the register allocator's conflict matrix; otherwise it is about 150MB).  Still not quite icc's 12 seconds/200MB, but we are out of regression land for -O relative to 4.0.  I tested 3.0 and it bombs on the testcase; 2.95, however, compiles it quite fluently with a 200MB peak, although it needs 6 minutes.

 life analysis         :  25.92 (16%) usr   0.01 ( 0%) sys  26.18 (15%) wall    2565 kB ( 1%) ggc
 inline heuristics     :  15.15 ( 9%) usr   0.01 ( 0%) sys  15.27 ( 9%) wall    1486 kB ( 1%) ggc
 integration           :  21.37 (13%) usr   0.12 ( 5%) sys  21.66 (13%) wall   33445 kB (19%) ggc
 tree SSA to normal    :  27.73 (17%) usr   0.03 ( 1%) sys  27.93 (16%) wall      17 kB ( 0%) ggc
 local alloc           :   7.33 ( 4%) usr   0.03 ( 1%) sys   7.41 ( 4%) wall    1855 kB ( 1%) ggc
 global alloc          :  13.67 ( 8%) usr   0.73 (32%) sys  15.85 ( 9%) wall   14178 kB ( 8%) ggc
 reload CSE regs       :  30.88 (19%) usr   0.04 ( 2%) sys  31.09 (18%) wall    2393 kB ( 1%) ggc
 TOTAL                 : 164.46             2.27           169.53             173593 kB

It would be interesting to see how the dataflow branch scores here after re-merging from mainline.  Hopefully the integration and register allocation issues should be tracked there.

The inliner is still quadratic in time because of quadratic split_block and
cgraph_node.  Both can be made linear quite easily (split_block by always renumbering the smaller area of the block, and cgraph_node by producing hashtables for nodes with many edges), but I am not sure I want to do that for 4.2.
Inline heuristics might be trickier to speed up.
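
The "renumber the smaller area" idea can be sketched as follows. This is an illustration only, not the GCC implementation: it assumes the cost of a split is the number of statements that must be relabelled with their new block, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of splitting a block of n statements before index k when the
   tail (statements k..n-1) is always the part that gets relabelled.  */
size_t naive_split_cost(size_t n, size_t k)
{
    assert(k <= n);
    return n - k;
}

/* Cost when whichever side is smaller gets relabelled instead.  */
size_t smaller_side_split_cost(size_t n, size_t k)
{
    assert(k <= n);
    return k < n - k ? k : n - k;
}

/* Total cost of peeling one statement at a time off the front of a
   block of n statements, as might happen when many calls in one huge
   block are inlined one after another.  */
size_t total_peel_cost(size_t n, size_t (*cost)(size_t, size_t))
{
    size_t total = 0;
    while (n > 1) {
        total += cost(n, 1);
        n -= 1;                 /* the remaining block is one shorter */
    }
    return total;
}
```

With 1000 statements the naive strategy relabels 499500 statements in total, while the smaller-side strategy relabels 999, which is the difference between quadratic and linear behaviour in this model.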

I don't know about reload.  Oprofile might be handy ;)

-O2 exposes a problem in PRE that DannyB has a fix for.  Regmove and into-SSA can also be sped up significantly by the patches I attached; I will commit them once testing converges.

-O3 turns the testcase into quite a different one (the gigantic basic block is turned into many basic blocks by inlining the min/max functions).
There a few problems are still visible: FRE consumes an unbounded amount of memory,
and we fail to synthesize fmin/fmax operators where we ought to.

If the FRE problem is fixed, I would say this should no longer be considered a 4.2 blocker.

Honza
Comment 22 patchapp@dberlin.org 2006-07-25 18:20:36 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01083.html
Comment 23 Jan Hubicka 2006-07-26 22:52:06 UTC
Subject: Bug 28071

Author: hubicka
Date: Wed Jul 26 22:51:56 2006
New Revision: 115765

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115765
Log:
	PR rtl-optimization/28071
	* regmove.c (reg_is_remote_constant_p): Avoid quadratic behaviour.
	(reg_set_in_bb, max_reg_computed): New static variables.
	(regmove_optimize): Free the new array.
	(fixup_match_1): Update call of reg_is_remote_constant_p.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/regmove.c

Comment 24 patchapp@dberlin.org 2006-07-27 07:15:18 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01144.html
Comment 25 patchapp@dberlin.org 2006-07-27 07:20:16 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01145.html
Comment 26 patchapp@dberlin.org 2006-07-27 07:25:16 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01146.html
Comment 27 patchapp@dberlin.org 2006-07-27 08:00:21 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01147.html
Comment 28 Jan Hubicka 2006-07-27 16:02:38 UTC
Subject: Bug 28071

Author: hubicka
Date: Thu Jul 27 16:02:27 2006
New Revision: 115776

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115776
Log:
	PR rtl-optimization/28071
	* global.c (greg_obstack): New obstack.
	(allocate_bb_info): Use it.
	(free_bb_info): Likewise.
	(modify_reg_pav): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/global.c

Comment 29 Jan Hubicka 2006-07-27 16:03:32 UTC
Subject: Bug 28071

Author: hubicka
Date: Thu Jul 27 16:03:22 2006
New Revision: 115777

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115777
Log:
	PR rtl-optimization/28071
	* cselib.c (cselib_process_insn): Don't remove useless values too
	often for very large hashtables.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cselib.c
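
The log entry above describes running cleanup less often when the hash table is very large. One way to express that idea is sketched below, assuming a sweep costs time linear in the table size; the function name and both thresholds are hypothetical and are not taken from the committed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lower bound: below this many useless values a sweep is
   never considered worthwhile.  */
#define MIN_USELESS_VALUES 32

/* Return nonzero when sweeping the whole table is worthwhile.  Because
   a sweep visits every element, it is only triggered once the useless
   values make up more than a quarter of the table.  This keeps the
   total sweep cost proportional to the work that created the values.  */
int should_sweep(size_t n_useless, size_t n_elements)
{
    assert(n_useless <= n_elements);
    return n_useless > MIN_USELESS_VALUES
           && n_useless > n_elements / 4;
}
```

With a fixed threshold alone, a table holding a million values would be swept every few dozen instructions; tying the trigger to the table size avoids that.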

Comment 30 Jan Hubicka 2006-07-27 17:10:16 UTC
Subject: Bug 28071

Author: hubicka
Date: Thu Jul 27 17:10:07 2006
New Revision: 115779

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115779
Log:
	PR rtl-optimization/28071
	* hashtab.c (htab_empty): Clear out n_deleted/n_elements;
	downsize the hashtable.

Modified:
    trunk/libiberty/ChangeLog
    trunk/libiberty/hashtab.c
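
As a rough illustration of the change described in the log above (reset the counters and downsize when emptying), here is a self-contained sketch. It is not the libiberty code: the struct, the function names and the two size limits are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_SLOTS 128           /* hypothetical size to shrink to */
#define BIG_SLOTS (128 * 1024)    /* hypothetical "too big to clear" limit */

struct table {
    void **slots;
    size_t size;          /* number of slots */
    size_t n_elements;    /* live plus deleted entries */
    size_t n_deleted;
};

/* Allocate an empty table; returns 0 on success, -1 on failure.  */
int table_init(struct table *t, size_t size)
{
    t->slots = calloc(size, sizeof *t->slots);
    if (!t->slots)
        return -1;
    t->size = size;
    t->n_elements = 0;
    t->n_deleted = 0;
    return 0;
}

/* Empty the table.  A very large slot array is replaced by a small one
   instead of being cleared, so that repeatedly emptying a table that
   once grew large does not cost time proportional to its peak size.  */
void table_empty(struct table *t)
{
    assert(t != NULL && t->slots != NULL);
    if (t->size > BIG_SLOTS) {
        void **small = calloc(SMALL_SLOTS, sizeof *small);
        if (small) {
            free(t->slots);
            t->slots = small;
            t->size = SMALL_SLOTS;
        } else {
            /* Allocation failed: fall back to clearing in place.  */
            memset(t->slots, 0, t->size * sizeof *t->slots);
        }
    } else {
        memset(t->slots, 0, t->size * sizeof *t->slots);
    }
    /* Stale counters would make the table look fuller than it is.  */
    t->n_elements = 0;
    t->n_deleted = 0;
}
```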

Comment 31 patchapp@dberlin.org 2006-07-28 09:30:12 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01185.html
Comment 32 Jan Hubicka 2006-07-28 09:41:39 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
I've added this testcase to our memory regression tester (see the
gcc-regression mailing list), so hopefully the quadratic memory
consumption issues will be tracked now.  It would be nice to have a
runtime benchmark variant of the test, so that we can track both the runtime
and the compilation time.  It seems to uncover quite interesting behaviours
across the compiler.

Honza
Comment 33 Jan Hubicka 2006-07-29 13:14:51 UTC
Subject: Bug 28071

Author: hubicka
Date: Sat Jul 29 13:14:22 2006
New Revision: 115810

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115810
Log:

	PR rtl-optimization/28071
	* cfgrtl.c (rtl_delete_block): Free regsets.
	* flow.c (allocate_bb_life_data): Re-use regsets if available.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgrtl.c
    trunk/gcc/flow.c

Comment 34 patchapp@dberlin.org 2006-07-30 05:45:16 UTC
Subject: Bug number PR rtl-optimization/28071

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01221.html
Comment 35 Eric Botcazou 2006-08-11 07:17:41 UTC
Jan, I'm assigning it to you since you have already spent a fair amount of time
on it and made significant progress.  Thanks for tackling the hard stuff.
Comment 36 Zdenek Dvorak 2006-08-16 21:25:48 UTC
Subject: Bug 28071

Author: rakdver
Date: Wed Aug 16 21:25:39 2006
New Revision: 116190

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116190
Log:
	PR rtl-optimization/28071
	* basic-block.h (bb_dom_dfs_in, bb_dom_dfs_out): Declare.
	* dominance.c (bb_dom_dfs_in, bb_dom_dfs_out): New functions.
	* tree-into-ssa.c (struct dom_dfsnum): New.
	(cmp_dfsnum, find_dfsnum_interval, prune_unused_phi_nodes): New
	functions.
	(insert_phi_nodes_for): Use prune_unused_phi_nodes instead of
	compute_global_livein.
	(prepare_block_for_update, prepare_use_sites_for): Mark the uses
	in phi nodes in the correct blocks.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/basic-block.h
    trunk/gcc/dominance.c
    trunk/gcc/tree-into-ssa.c
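
The names bb_dom_dfs_in and bb_dom_dfs_out in the log above suggest depth-first entry and exit numbers on the dominator tree. As a sketch of why such numbers are useful (an illustration with hypothetical names and a fixed-size tree, not the committed code): once every block has an entry and exit number, "does A dominate B" becomes an interval containment test.

```c
#include <assert.h>

#define MAX_BLOCKS 64   /* arbitrary limit for this sketch */

static int first_child[MAX_BLOCKS], next_sibling[MAX_BLOCKS];
static int dfs_in[MAX_BLOCKS], dfs_out[MAX_BLOCKS];

/* Assign entry/exit numbers to node and its dominator-tree subtree.
   Recursive for brevity; very deep trees would need an explicit stack.  */
static void number_subtree(int node, int *counter)
{
    dfs_in[node] = (*counter)++;
    for (int c = first_child[node]; c != -1; c = next_sibling[c])
        number_subtree(c, counter);
    dfs_out[node] = (*counter)++;
}

/* idom[i] is the immediate dominator of block i, or -1 for the entry
   block.  Builds child lists in linear time, then numbers the tree.  */
void build_dom_numbers(const int *idom, int n)
{
    int root = -1, counter = 0;
    assert(n >= 0 && n <= MAX_BLOCKS);
    for (int i = 0; i < n; i++)
        first_child[i] = next_sibling[i] = -1;
    for (int i = n - 1; i >= 0; i--) {
        if (idom[i] < 0) {
            root = i;
        } else {
            next_sibling[i] = first_child[idom[i]];
            first_child[idom[i]] = i;
        }
    }
    if (root >= 0)
        number_subtree(root, &counter);
}

/* A dominates B exactly when B's interval nests inside A's.  */
int dominates(int a, int b)
{
    return dfs_in[a] <= dfs_in[b] && dfs_out[b] <= dfs_out[a];
}
```

Because the numbers can also be sorted, a set of blocks can be ordered by entry number and searched by interval, which avoids walking the whole control flow graph for each query.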

Comment 37 Jan Hubicka 2006-08-18 23:10:01 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
to summarize the current progress, the memory consumption seems to be under
control now:

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
  Overall memory allocated via mmap and sbrk decreased from 146456k to 134136k, overall -9.18%
  Peak amount of GGC memory allocated before garbage collecting run decreased from 95412k to 81628k, overall -16.89%
  Amount of produced GGC garbage decreased from 163295k to 143524k, overall -13.77%
    Overall memory needed: 146456k -> 134136k
    Peak memory use before GGC: 95412k -> 81628k
    Peak memory use after GGC: 58507k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 163295k -> 143524k
    Leak: 7142k
    Overhead: 29023k -> 25103k
    GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 430308k -> 424700k
    Peak memory use before GGC: 201177k
    Peak memory use after GGC: 196173k
    Maximum of released memory in single GGC run: 100203k -> 95156k
    Garbage: 279198k -> 271636k
    Leak: 47195k
    Overhead: 31459k -> 29952k
    GGC runs: 105

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 350424k -> 344820k
    Peak memory use before GGC: 208293k
    Peak memory use after GGC: 196536k
    Maximum of released memory in single GGC run: 101565k -> 96536k
    Garbage: 394891k -> 387353k
    Leak: 47778k
    Overhead: 49054k -> 47552k
    GGC runs: 111

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 535696k -> 536260k
    Peak memory use before GGC: 314602k
    Peak memory use after GGC: 292946k
    Maximum of released memory in single GGC run: 163430k
    Garbage: 494953k -> 486928k
    Leak: 65110k
    Overhead: 60330k -> 58798k
    GGC runs: 100

I will post a short summary of the remaining bottlenecks at each optimization
level.

Honza
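
A note on reading the percentages in the -O0 summary above: the three figures look as if they are computed relative to the new (smaller) value rather than the old one. The sketch below checks that reading against the reported numbers; the helper name is hypothetical and the formula is an assumption inferred from the figures, not taken from the memory tester itself.

```c
/* Assumed formula: decrease expressed as a percentage of the NEW value.  */
static double decrease_pct(double before, double after)
{
  return (before - after) / after * 100.0;
}
```

With this formula 146456k -> 134136k gives about 9.18, 95412k -> 81628k about 16.89, and 163295k -> 143524k about 13.78 (reported as 13.77, presumably because the tester works from unrounded byte counts).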
Comment 38 Jan Hubicka 2006-08-19 00:19:22 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

At -O0 we get time sinks:
 life analysis         :   0.75 (10%) usr   0.01 ( 3%) sys   0.78 ( 9%) wall    2714 kB ( 4%) ggc
 expand                :   1.46 (15%) usr   0.04 (11%) sys   1.66 (15%) wall   37656 kB (58%) ggc
 local alloc           :   1.40 (14%) usr   0.04 (11%) sys   1.45 (13%) wall    1293 kB ( 2%) ggc
 global alloc          :   3.55 (36%) usr   0.05 (14%) sys   3.67 (34%) wall    7509 kB (12%) ggc
 final                 :   0.96 (10%) usr   0.04 (11%) sys   1.00 ( 9%) wall    1157 kB ( 2%) ggc
 TOTAL                 :   9.95             0.35            10.77              64543 kB

Expand seems reasonable given that almost everything is a call, which has
a long representation.

Global alloc is copying a significant portion of the insn stream because of:

      /* If we aren't replacing things permanently and we changed something,
         make another copy to ensure that all the RTL is new.  Otherwise
         things can go wrong if find_reload swaps commutative operands
         and one is inside RTL that has been copied while the other is not.  */
      new_body = old_body;
      if (! replace)
        {
          new_body = copy_insn (old_body);
          if (REG_NOTES (insn))
            REG_NOTES (insn) = copy_insn_1 (REG_NOTES (insn));
        }

and a few other occurrences of copy_insn in reload1.c.  They seem to copy
quite a lot of unnecessary RTL "just to be sure".  Also, virtual register
elimination produces a lot of duplicated RTL; perhaps it can be cached?

Global alloc also spends 50% of its time clearing out
reg_has_output_reload.  I am testing a patch that fixes that.

 global alloc          :   1.51 (19%) usr   0.07 (20%) sys   1.60 (18%) wall    7509 kB (12%) ggc

Final is spending all its time in shorten branches, which is not needed
at all.

Honza
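
To illustrate the reg_has_output_reload point: clearing a whole per-register array for every instruction costs time proportional to the number of registers, even when only a handful of entries were set. A minimal sketch of one general remedy follows, a set that remembers which entries it touched so that a reset only visits those. This is an illustration under that assumption, not GCC's regset implementation; all names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical "touched" set: flag[] gives O(1) membership, and
   touched[] records which flags need clearing on reset.  */
struct touched_set {
  unsigned char *flag;
  int *touched;
  int n_touched;
};

/* Returns 0 on success, -1 on allocation failure.  SIZE must be > 0.  */
static int touched_set_init(struct touched_set *s, int size)
{
  s->flag = calloc((size_t) size, sizeof *s->flag);
  s->touched = malloc((size_t) size * sizeof *s->touched);
  s->n_touched = 0;
  if (s->flag == NULL || s->touched == NULL) {
    free(s->flag);
    free(s->touched);
    return -1;
  }
  return 0;
}

static void touched_set_add(struct touched_set *s, int i)
{
  if (!s->flag[i]) {
    s->flag[i] = 1;
    s->touched[s->n_touched++] = i;
  }
}

static int touched_set_contains(const struct touched_set *s, int i)
{
  return s->flag[i];
}

/* Cost is proportional to the number of entries set since the last
   reset, not to the size of the table.  */
static void touched_set_reset(struct touched_set *s)
{
  while (s->n_touched > 0)
    s->flag[s->touched[--s->n_touched]] = 0;
}

static void touched_set_free(struct touched_set *s)
{
  free(s->flag);
  free(s->touched);
}
```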
Comment 39 Jan Hubicka 2006-08-19 01:51:10 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

The -O1 time sinks:

 life analysis         :  25.44 (19%) usr   0.00 ( 0%) sys  25.49 (17%) wall    2565 kB ( 2%) ggc
 inline heuristics     :  14.92 (11%) usr   0.00 ( 0%) sys  14.95 (10%) wall    1486 kB ( 1%) ggc
 integration           :  20.73 (15%) usr   0.10 ( 4%) sys  22.72 (15%) wall   33445 kB (20%) ggc
 tree SSA to normal    :  27.97 (20%) usr   0.04 ( 2%) sys  28.13 (19%) wall      17 kB ( 0%) ggc
 expand                :   2.56 ( 2%) usr   0.04 ( 2%) sys   2.67 ( 2%) wall   24100 kB (14%) ggc
 local alloc           :   7.21 ( 5%) usr   0.03 ( 1%) sys   7.18 ( 5%) wall    1855 kB ( 1%) ggc
 global alloc          :  11.76 ( 9%) usr   0.99 (39%) sys  17.71 (12%) wall   11029 kB ( 6%) ggc
 reload CSE regs       :   7.91 ( 6%) usr   0.02 ( 1%) sys   7.97 ( 5%) wall    2393 kB ( 1%) ggc
 TOTAL                 : 136.62             2.56           148.01             170448 kB

tree SSA to normal spends most of its time in find_value_in_list because TER
is shuffling around singly linked lists in a quadratic way.  I quickly got
lost in the logic there.  Andrew, can you take a look, please?

integration runs into quadratic behaviour of cgraph_edge.  Implementing
a hashtable for large cgraphs is easy, I will do so.  Also the quadratic
behaviour of tree_split_block hits us here.

reload CSE regs has a hard time tracking all the stack slot memory
locations.  It is working harder than needed because a lot of memories
are believed to be aliasing even though, theoretically, almost everything is
SRAed and has no address taken, so it should have unique alias sets.

Life analysis spends most of its time in dead store removal code.  Again
lowering --param might help.  I am also testing a little patch to cut it
to 13 seconds by speeding up reg_overlap_mentioned_p.  It would be
interesting to see how the dataflow branch scores here.

inline heuristics spends most of its time checking the inline_function_growth
limit; I will need to think about it a bit.

Honza
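
The cgraph_edge remark can be illustrated as follows: finding the edge for a call statement by scanning a caller's edge list is linear per lookup, so a function with very many call sites pays quadratic time overall. A minimal sketch of a per-caller hash keyed on the call statement pointer follows. The types, the fixed bucket count and the function names are all hypothetical; this is not the cgraph.c implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 1024u   /* arbitrary size chosen for the sketch */

/* Hypothetical call-graph edge, chained within its hash bucket.  */
struct call_edge {
  const void *call_stmt;
  struct call_edge *next_in_bucket;
};

/* Hypothetical per-caller hash; must start out zero-initialized.  */
struct call_site_hash {
  struct call_edge *bucket[N_BUCKETS];
};

/* Hash the statement's address; low bits are dropped because
   allocated objects are usually aligned.  */
static size_t hash_stmt(const void *stmt)
{
  return (size_t) (((uintptr_t) stmt >> 4) % N_BUCKETS);
}

static void call_site_insert(struct call_site_hash *h, struct call_edge *e)
{
  size_t i = hash_stmt(e->call_stmt);
  e->next_in_bucket = h->bucket[i];
  h->bucket[i] = e;
}

/* Expected O(1) lookup instead of a walk over all callee edges.  */
static struct call_edge *call_site_lookup(const struct call_site_hash *h,
                                          const void *stmt)
{
  for (struct call_edge *e = h->bucket[hash_stmt(stmt)]; e != NULL;
       e = e->next_in_bucket)
    if (e->call_stmt == stmt)
      return e;
  return NULL;
}
```

The price is that every place that changes an edge's call statement or removes an edge has to keep the hash in sync.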
Comment 40 Andrew Macleod 2006-08-19 21:58:18 UTC
I'll take a look. On the new out-of-ssa branch I've already converted the coalesce list from a linear linked-list algorithm to a hash table, as well as changed the live on entry and live on exit implementations to be more efficient.  I didn't bother with TER because it's due to be removed on the new branch... eventually :-)   I'll take a peek and see how much work it is to change that.

Andrew
Comment 41 Jan Hubicka 2006-08-20 00:58:59 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Thank you for taking a look.
Live on entry/exit code shows up high in -O3 compilation time too
(something like 30% of the time without PRE/FRE, I believe).  So if it is
a self-contained change, perhaps pushing it to mainline as a PR fix would
make sense.

Honza
Comment 42 Jan Hubicka 2006-08-21 00:00:31 UTC
Subject: Bug 28071

Author: hubicka
Date: Mon Aug 21 00:00:14 2006
New Revision: 116277

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116277
Log:
	PR rtl-optimization/28071
	* reload1.c (reg_has_output_reload): Turn into regset.
	(reload_as_needed, forget_old_reloads_1, forget_marked_reloads,
	choose_reload_regs, emit_reload_insns): Update to new
	reg_has_output_reload.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reload1.c

Comment 43 Jan Hubicka 2006-08-21 01:42:49 UTC
Subject: Bug 28071

Author: hubicka
Date: Mon Aug 21 01:42:39 2006
New Revision: 116284

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116284
Log:
	PR rtl-optimization/28071
	* tree-optimize.c (tree_rest_of_compilation): Do not remove edges
	twice.
	* tree-inline.c (copy_bb): Use cgraph_set_call_stmt.
	* ipa-inline.c (cgraph_check_inline_limits): Add one_only argument.
	(cgraph_decide_inlining, cgraph_decide_inlining_of_small_function,
	cgraph_decide_inlining_incrementally): Update use of
	cgraph_check_inline_limits.
	* cgraph.c (edge_hash, edge_eq): New function.
	(cgraph_edge, cgraph_set_call_stmt, cgraph_create_edge,
	cgraph_edge_remove_caller, cgraph_node_remove_callees,
	cgraph_remove_node): Maintain call site hash.
	* cgraph.h (struct cgraph_node): Add call_site_hash.
	(cgraph_set_call_stmt): New function.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraph.c
    trunk/gcc/cgraph.h
    trunk/gcc/ipa-inline.c
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-optimize.c

Comment 44 Jan Hubicka 2006-08-21 02:59:16 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
update at -O1 a few patches later (different machine with "only" 500MB
of RAM, so some swapping occurs, but we almost fit now):
 life analysis         :  23.50 (20%) usr   0.00 ( 0%) sys  23.51 (17%) wall    2565 kB ( 2%) ggc
 inline heuristics     :   0.60 ( 1%) usr   0.00 ( 0%) sys   0.60 ( 0%) wall    1561 kB ( 1%) ggc
 integration           :   5.75 ( 5%) usr   0.04 ( 2%) sys   5.79 ( 4%) wall   33701 kB (20%) ggc
 tree SSA rewrite      :   0.51 ( 0%) usr   0.01 ( 1%) sys   0.53 ( 0%) wall   17087 kB (10%) ggc
 tree SRA              :   0.98 ( 1%) usr   0.08 ( 4%) sys   1.10 ( 1%) wall   24835 kB (15%) ggc
 tree SSA to normal    :  45.11 (39%) usr   0.02 ( 1%) sys  45.14 (33%) wall      17 kB ( 0%) ggc
 local alloc           :   5.82 ( 5%) usr   0.01 ( 1%) sys   5.85 ( 4%) wall    1855 kB ( 1%) ggc
 global alloc          :   9.83 ( 8%) usr   0.76 (39%) sys  23.49 (17%) wall   11029 kB ( 6%) ggc
 reload CSE regs       :   7.30 ( 6%) usr   0.03 ( 2%) sys  10.16 ( 7%) wall    2393 kB ( 1%) ggc
 TOTAL                 : 116.65             1.96           136.52             170783 kB
Life analysis is almost completely the code tracking dead stores after
reload (we have many stack slots).  Tree-SSA to normal is the SRA
problem discussed, integration is split_block, global alloc allocates
a very large conflict matrix, reload CSE regs has a similar problem tracking
memories.  No idea about local alloc.

Honza
Comment 45 Jan Hubicka 2006-08-21 12:56:09 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
-O2 times:
Execution times (seconds)
 life analysis         :  18.08 ( 3%) usr   0.04 ( 1%) sys  19.42 ( 3%) wall    1120 kB ( 0%) ggc
 integration           :   5.97 ( 1%) usr   0.07 ( 2%) sys   6.13 ( 1%) wall   33701 kB ( 8%) ggc
 tree PRE              : 233.01 (43%) usr   0.46 (13%) sys 241.22 (37%) wall   19480 kB ( 5%) ggc
 tree SSA to normal    :  51.26 ( 9%) usr   0.07 ( 2%) sys  52.62 ( 8%) wall      22 kB ( 0%) ggc
 expand                :   2.62 ( 0%) usr   0.07 ( 2%) sys   9.45 ( 1%) wall   24095 kB ( 6%) ggc
 PRE                   :  20.39 ( 4%) usr   0.05 ( 1%) sys  21.70 ( 3%) wall     200 kB ( 0%) ggc
 regmove               :  97.32 (18%) usr   0.17 ( 5%) sys 107.36 (16%) wall       2 kB ( 0%) ggc
 local alloc           :   6.28 ( 1%) usr   0.00 ( 0%) sys   6.29 ( 1%) wall    1480 kB ( 0%) ggc
 global alloc          :  13.12 ( 2%) usr   0.71 (21%) sys  62.79 (10%) wall   13764 kB ( 3%) ggc
 reload CSE regs       :  16.16 ( 3%) usr   0.02 ( 1%) sys  19.21 ( 3%) wall    4783 kB ( 1%) ggc
 scheduling 2          :  60.85 (11%) usr   0.57 (17%) sys  67.94 (10%) wall  206199 kB (51%) ggc
 TOTAL                 : 547.14             3.41           651.49             404467 kB

Danny has a fix for PRE scheduled for 4.2. Regmove again hits the same
predicate function since we now produce big basic blocks.  This can be
fixed rather easily, either by limiting the walk in that predicate or by
assigning INSNs consecutive indexes.  Scheduling has a problem moving
around linked lists of dependencies, and fixing it seems to require moving
away from log links, and thus it is a 4.2 issue too.

One detail that just came to mind...  All memory consumed in scheduling
is log links. Producing 206MB of them for a 24MB function is rather
dense. Can't we prune them somewhat, perhaps by accounting for
transitivity (at least in special cases)?  The instructions are all
really mostly independent, but we apparently lose track of that fact
somewhere and end up producing an almost complete tournament.

Honza
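
On the "consecutive indexes" idea: a predicate that answers an ordering question by walking the insn chain costs time proportional to the block length, which hurts once basic blocks become huge. A minimal sketch of the contrast follows. It uses a hypothetical singly linked insn chain and does not reproduce the actual regmove predicate; it only shows why a numbering pass turns the query into a constant-time comparison.

```c
#include <stddef.h>

/* Hypothetical instruction chain; NOT GCC's rtx insn representation.  */
struct insn {
  struct insn *next;
  int index;   /* position in the chain, valid after number_insns() */
};

/* Linear version: walk forward from A until B or the end is found.  */
static int precedes_by_walk(const struct insn *a, const struct insn *b)
{
  for (const struct insn *p = a->next; p != NULL; p = p->next)
    if (p == b)
      return 1;
  return 0;
}

/* One pass that assigns consecutive indexes along the chain.  */
static void number_insns(struct insn *head)
{
  int i = 0;
  for (struct insn *p = head; p != NULL; p = p->next)
    p->index = i++;
}

/* Constant-time version; only valid while the numbering is up to date
   and both insns are on the same chain.  */
static int precedes_by_index(const struct insn *a, const struct insn *b)
{
  return a->index < b->index;
}
```

The trade-off is that any pass inserting or moving instructions has to renumber, or leave gaps in the numbering, to keep the indexes meaningful.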
Comment 46 Jan Hubicka 2006-08-21 17:11:44 UTC
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
for completeness, the -O3 -fno-tree-pre -fno-tree-fre results
(tree-pre/fre needs a little over 2GB of RAM to converge):

Execution times (seconds)
 garbage collection    :   1.11 ( 1%) usr   0.07 ( 2%) sys   8.57 ( 5%) wall       0 kB ( 0%) ggc
 life analysis         :   5.47 ( 4%) usr   0.12 ( 3%) sys   5.63 ( 3%) wall    2701 kB ( 1%) ggc
 life info update      :   2.05 ( 2%) usr   0.00 ( 0%) sys   2.10 ( 1%) wall     643 kB ( 0%) ggc
 integration           :   8.36 ( 7%) usr   0.18 ( 5%) sys   8.61 ( 5%) wall   79611 kB (27%) ggc
 tree CFG cleanup      :   3.69 ( 3%) usr   0.00 ( 0%) sys   3.77 ( 2%) wall      20 kB ( 0%) ggc
 tree alias analysis   :   2.64 ( 2%) usr   0.25 ( 6%) sys   3.01 ( 2%) wall       0 kB ( 0%) ggc
 tree SSA rewrite      :   2.17 ( 2%) usr   0.02 ( 1%) sys   2.22 ( 1%) wall   21589 kB ( 7%) ggc
 tree SSA incremental  :   4.04 ( 3%) usr   0.01 ( 0%) sys   4.10 ( 2%) wall    1061 kB ( 0%) ggc
 tree operand scan     :   1.54 ( 1%) usr   0.54 (14%) sys   1.95 ( 1%) wall   18382 kB ( 6%) ggc
 dominator optimization:   2.49 ( 2%) usr   0.06 ( 2%) sys   2.61 ( 1%) wall   11262 kB ( 4%) ggc
 tree SRA              :   3.04 ( 2%) usr   0.08 ( 2%) sys   3.12 ( 2%) wall   25600 kB ( 9%) ggc
 tree SSA to normal    :  38.17 (31%) usr   0.09 ( 2%) sys  38.56 (21%) wall   11214 kB ( 4%) ggc
 dominance computation :   2.40 ( 2%) usr   0.05 ( 1%) sys   2.52 ( 1%) wall       0 kB ( 0%) ggc
 expand                :   4.22 ( 3%) usr   0.20 ( 5%) sys  11.38 ( 6%) wall   35690 kB (12%) ggc
 global alloc          :  13.43 (11%) usr   1.28 (32%) sys  54.13 (29%) wall    5873 kB ( 2%) ggc
 flow 2                :   0.37 ( 0%) usr   0.01 ( 0%) sys   0.78 ( 0%) wall    5092 kB ( 2%) ggc
 TOTAL                 : 123.25             3.98           183.52             291674 kB

Note that the testcase is very different at -O3, because min/max
functions are inlined, breaking gigantic basic blocks into a number of
small BBs, so many of the bottlenecks visible at -O2 go away.  I don't know
what happens in global alloc; tree SSA to normal is the
live_on_entry/live_on_exit issue discussed.  We also have problems with very
deep recursion levels as the dominator tree is deep.  I am thinking about
implementing iterators for walking in dom order, as the current full-blown
domtree walker is a bit unwieldy in some cases.

With FRE/PRE enabled, GGC also runs out of stack space, because some
of the temporary values in annotations leak and cause GGC to recurse
insanely.

Honza
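
The deep-recursion problem can be sidestepped by walking the tree with an explicit stack, so that depth consumes heap memory instead of call frames. A minimal sketch of a non-recursive pre-order walk follows, assuming a hypothetical first-child/next-sibling node layout; it is not GCC's domtree walker.

```c
#include <stdlib.h>

/* Hypothetical tree node in first-child/next-sibling form.  */
struct tree_node {
  struct tree_node *first_child;
  struct tree_node *next_sibling;
  int visit_order;   /* filled in by the walk */
};

/* Pre-order walk of the tree rooted at ROOT (whose next_sibling must be
   NULL) without recursion.  MAX_NODES is an upper bound on the number
   of nodes; each node is pushed at most once, so the stack cannot
   overflow.  Returns the number of nodes visited, or -1 on allocation
   failure.  */
static int walk_preorder(struct tree_node *root, int max_nodes)
{
  struct tree_node **stack = malloc((size_t) max_nodes * sizeof *stack);
  int sp = 0, order = 0;

  if (stack == NULL)
    return -1;
  stack[sp++] = root;
  while (sp > 0) {
    struct tree_node *n = stack[--sp];
    n->visit_order = order++;
    /* Push the sibling first so the child subtree is handled before it.  */
    if (n->next_sibling != NULL)
      stack[sp++] = n->next_sibling;
    if (n->first_child != NULL)
      stack[sp++] = n->first_child;
  }
  free(stack);
  return order;
}
```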
Comment 47 Andrew Macleod 2006-08-25 01:37:21 UTC
Created attachment 12135 [details]
patch to resolve some of the SSA to Normal slowdowns.

By re-implementing the live on entry/exit code, I get the following improvement at -O3:

   tree SSA to normal    :  32.08 (35%) usr   0.08 ( 1%) sys  32.92 (28%) wall
to
   tree SSA to normal    :  16.19 (22%) usr   0.08 ( 1%) sys  16.33 (13%) wall

The remaining SSA to normal time is the fault of TER at both -O3 and -O2.

I'm not so sure this is stage 3 material, but I could be convinced.  I'll attach the patch, but I'll post a full breakdown of what was implemented in a note to gcc-patches. 

It has been bootstrapped on i686-pc-linux-gnu  with no new regressions.
Comment 48 Andrew Macleod 2006-08-25 01:42:34 UTC
Created attachment 12136 [details]
Patch for the remaining SSA to Normal time issues

I've attached a patch to address the slowdowns in TER. Again, I'm not sure this is stage 3 material, but I'll send a note to gcc-patches with the full breakdown; basically, I replaced the expression linked lists with bitmaps.

This patch has been bootstrapped on i686-pc-linux-gnu with no new regressions.

at -O2 timings go from:
   tree SSA to normal    :  30.79 (19%) usr   0.06 ( 2%) sys  31.89 (19%) wall
to
   tree SSA to normal    :   1.33 ( 1%) usr   0.02 ( 1%) sys   1.86 ( 1%) wall

and at -O3:
   tree SSA to normal    :  32.08 (35%) usr   0.08 ( 1%) sys  32.92 (28%) wall
to
   tree SSA to normal    :  18.75 (24%) usr   0.06 ( 1%) sys  18.83 (23%) wall

when combined with the previous live on entry/exit patch, I get the following times at -O2 :
   tree SSA to normal    :  30.79 (19%) usr   0.06 ( 2%) sys  31.89 (19%) wall
to
   tree SSA to normal    :   1.16 ( 1%) usr   0.01 ( 0%) sys   1.17 ( 1%) wall

and at -O3:
   tree SSA to normal    :  32.08 (35%) usr   0.08 ( 1%) sys  32.92 (28%) wall
to
   tree SSA to normal    :   2.50 ( 4%) usr   0.08 ( 1%) sys   2.61 ( 2%) wall
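
To show why replacing linked lists with bitmaps pays off this much: a membership test on a list walks it element by element, while a bitmap indexed by a small integer id answers in constant time. A minimal dense-bitset sketch follows. GCC has its own bitmap implementation, which this does not reproduce; the names here are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical dense bitset indexed by a small integer id.  */
struct bitset {
  unsigned long *words;
  size_t n_words;
};

/* Returns 0 on success, -1 on allocation failure.  */
static int bitset_init(struct bitset *b, size_t n_bits)
{
  b->n_words = (n_bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
  b->words = calloc(b->n_words ? b->n_words : 1, sizeof *b->words);
  return b->words != NULL ? 0 : -1;
}

static void bitset_set(struct bitset *b, size_t i)
{
  b->words[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static void bitset_clear(struct bitset *b, size_t i)
{
  b->words[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

/* O(1) membership test, versus an O(length) list search.  */
static int bitset_test(const struct bitset *b, size_t i)
{
  return (int) ((b->words[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

static void bitset_free(struct bitset *b)
{
  free(b->words);
}
```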
Comment 49 Andrew Macleod 2006-08-25 01:56:49 UTC
links to the 2 notes on gcc-patches:

live range changes:  http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00895.html

TER changes:  http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00896.html
Comment 50 Andrew Macleod 2006-08-28 17:18:40 UTC
Subject: Bug 28071

Author: amacleod
Date: Mon Aug 28 17:18:33 2006
New Revision: 116511

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116511
Log:


revert 116257 which is the rewrite_liverange_info patch, to be replaced with
the two patches I created for bug 28071.



Modified:
    branches/out-of-ssa-the-sequel/gcc/ChangeLog
    branches/out-of-ssa-the-sequel/gcc/tree-outof-ssa.c
    branches/out-of-ssa-the-sequel/gcc/tree-ssa-live.c
    branches/out-of-ssa-the-sequel/gcc/tree-ssa-live.h

Comment 51 Andrew Macleod 2006-08-28 17:37:15 UTC
Huh. I didn't realize bugzilla scanned the entire checkin message looking for bug numbers....  This has been checked in on a branch, so you can ignore the preceding note's commentary. It's just a note to myself.
Comment 52 Jan Hubicka 2006-09-12 10:11:14 UTC
Subject: Bug 28071

Author: hubicka
Date: Tue Sep 12 10:11:04 2006
New Revision: 116886

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116886
Log:

	PR rtl-optimization/28071
	* tree-vect-transform.c (vect_create_data_ref_ptr): Kill cast.
	(vect_transform_loop): Likewise.
	* tree-vectorizer.c (new_loop_vec_info): Likewise.
	(new_loop_vec_info): Likewise.
	(destroy_loop_vec_info): Likewise.
	* tree-dfa.c (create_var_ann): Use GCC_CNEW.
	(create_stmt_ann): Likewise.
	(create_tree_ann): Rename to ...
	(create_tree_common_ann): ... this one; allocate only the common part
	of annotations.
	* tree-vn.c (set_value_handle): Use get_tree_common_ann.
	(get_value_handle): Likewise.
	* tree-ssa-pre.c (phi_translate): Delay annotation allocation for
	get_tree_common_ann.
	* tree-vectorizer.h (set_stmt_info): Take stmt annotation.
	(vinfo_for_stmt): Use stmt annotations.
	* tree-flow.h (tree_ann_common_t): New type.
	(tree_common_ann, get_tree_common_ann, create_tree_common_ann): New.
	(tree_ann, get_tree_ann, create_tree_ann): New.
	* tree-flow-inline.h (get_function_ann): Do more type checking.
	(stmt_ann): Likewise.
	(tree_ann): Rename to ...
	(tree_common_ann): ... this one; return only common_ann
	(get_tree_ann): Rename to ...
	(tree_common_ann): This one; return only common_ann.
	* tree-vect-patterns.c (vect_pattern_recog_1): Update call
	of set_stmt_info.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-dfa.c
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vectorizer.c
    trunk/gcc/tree-vectorizer.h
    trunk/gcc/tree-vn.c

Comment 53 Steven Bosscher 2006-09-23 09:44:26 UTC
Is this still a regression?
Comment 54 Richard Biener 2006-09-23 10:22:02 UTC
It's at least still a regression on the 4.1 branch, which still does

cc1: out of memory allocating 290995744 bytes after a total of 43593728 bytes

at -O1.  Otherwise we have

3.4.6: 106s
4.0.3: 108s
4.1.2:  OOM
4.2.0:  86s

and 4.2.0 uses a lot less memory than 4.0.3.  So, let's remove the 4.2 regression marker.
Comment 55 Maxim Kuvyrkov 2007-01-10 11:42:43 UTC
Created attachment 12879 [details]
Patch for scheduler dependency lists.

Hi,

This patch introduces new dependency lists to the scheduler, so that LOG_LINKS are no longer used in the schedulers.  The patch is preliminary and I will post an updated version to gcc-patches in a few days.

The structure of the change:
As before, we have backward dependencies (INSN_DEPS - replacement for LOG_LINKS) and forward dependencies (INSN_DEPEND).  These lists consist of dep_nodes.
Each dep_node has a pointer to a dep_data_node, which contains the dependency data (data field), the dep_node of the backward dep_list (back field) and the dep_node of the forward dep_list (forw field).  Thus we can easily get the forward dep_node from the backward one and vice versa.
Each dep_node also contains a pointer to the next field of the previous node in the dep_list (to the place where the pointer to it is stored), making removal from the list fast and easy.

Changes are mostly just a pattern replacement of macro names.  The patched compiler produces exactly the same output as the original (except for one small thing: removal of DEPS_LIST from rtl.def somehow results in different numbering of the registers.  The same occurs if I add an additional rtx description to rtl.def.  I don't know why this happens, but I would be glad if someone could explain.)

Minimal changes to the backends were introduced.
1. ia64 scheduler hook adjust_cost was restored to its original version (as in gcc 4.1)
2. ia64 and rs6000 backends were fixed to walk through the new dependency lists, which they do for their own heuristics. (no other backend does that).
3. rs6000 scheduler hook is_costly_dependency () was changed so that there'll be no need to do a compatibility transformation (as is done for adjust_cost, btw) for a hook that is implemented on a single target.

The patch was bootstrapped on x86_64 and ia64.  I've also built a cross compiler to powerpc-740.

Results (on x86_64):
scheduler2 is now 4s instead of 12s.
Memory consumption: 11.5M instead of 48M


Thanks,

Maxim
Comment 56 Ayal Zaks 2007-01-15 07:19:05 UTC
(In reply to comment #55)
> Created an attachment (id=12879) [edit]
> Patch for scheduler dependency lists.

Looks like a pretty good cleanup IMHO. Here are some comments.

o dep_def: representing a dependence edge including both producer and consumer is very handy, albeit somewhat redundant as we're usually traversing all cons connected to a pro or vice versa. (I.e., has its pros and cons, but mostly pros I agree - also done in ddg.h/ddg_edge.) Maybe comment why both 'kind' and 'ds' are needed, as one supersedes the other.

o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an *edge* in terms of the data-dependence graph. The prev_nextp field is a "/* Pointer to the next field of the previous node in the list.  */" except for the first node on the list, whose prev_nextp points to itself, right?

o dep_data_node_def: holding the two conjugate dependence edges together is very useful when switching directions. But perhaps most of the accesses go in one direction (e.g. iterating over cons of a pro), and having both conjugates structed together may reduce cache efficiency. So you may consider connecting each dep_node_def to its conjugate, not necessarily forcing them to be placed adjacent in memory.

o To add to the checking routines, the following can be checked: every dep_node_def is pointed-to by either its data->back xor its data->forw, right? If so, this can be used to identify if a dep_node_def is forward or backward; that all nodes on a list are forward (and share the same pro) or backward (and share the same con); and to assert the following regarding L:
+/* Add a dependency described by DEP to the list L.
+   L should be either INSN_DEPS1 or RESOLVED_DEPS1.  */

o insn_cost (insn, dep): maybe it's better to break this into insn_cost (insn) of a producer regardless of consumers, and "dep_cost (dep)".

o The comment explaining what 'resolve_dep' does can be inlined together with its code. 

+/* Detach dep_node N from the list.  */
+static void
+dep_node_detach (dep_node_t n)
+{
+  dep_node_t *prev_nextp = DEP_NODE_PREV_NEXTP (n);
+  dep_node_t next = DEP_NODE_NEXT (n);
+
+  *prev_nextp = next;
+
+  if (next != NULL)
+    DEP_NODE_PREV_NEXTP (next) = prev_nextp;
maybe complete the detachment by adding:
DEP_NODE_PREV_NEXTP (n) = NULL;
DEP_NODE_NEXT (n) = NULL;
+}


+/* Attach NEXT to the next field pointed to by PREV_NEXTP.  */
^^^^^^^^^^^N to appear after node X whose &DEP_NODE_NEXT (X) is given by 
PREV_NEXT_P
+static void
+dep_node_attach (dep_node_t n, dep_node_t *prev_nextp)


better place
+dep_node_check_p (dep_node_t n)
next to
+dep_nodes_check_p (dep_node_t n)


+/* Make a copy of FROM in TO with substitutin consumer with CON.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^substituting consumer with CON.

Ayal.
Comment 57 mkuvyrkov@ispras.ru 2007-01-15 07:52:54 UTC
Subject: Re:  [4.1 regression] A file that can not be
 compiled in reasonable time/space

Thanks!  Very useful comments.  I'm continuing to work on cleaning the 
patch (especially on writing the comments) and making code more 
transparent.  Below are my comments on yours:

zaks at il dot ibm dot com wrote:
> ------- Comment #56 from zaks at il dot ibm dot com  2007-01-15 07:19 -------
> (In reply to comment #55)
>> Created an attachment (id=12879)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12879&action=view) [edit]
>> Patch for scheduler dependency lists.
> 
> Looks like a pretty good cleanup IMHO. Here are some comments.
> 
> o dep_def: representing a dependence edge including both producer and consumer
> is very handy, albeit somewhat redundant as we're usually traversing all cons
> connected to a pro or vice versa.
This allows us to keep everything in one place - one of the things the
current deps don't provide.  I.e., currently, when changing some property of
a dep we need to find the nodes corresponding to that dep in both the backward
and forward lists and apply the change in two places instead of one.

  (I.e., has its pros and cons, but mostly pros
> I agree - also done in ddg.h/ddg_edge.) Maybe comment why both 'kind' and 'ds'
> are needed, as one supersedes the other.
There will be.  Thanks.

> 
> o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an
> *edge* in terms of the data-dependence graph. The prev_nextp field is a "/*
Right!  I struggled to figure out the correct name and didn't prevail. 
Thanks for the tip.  It'll be dep_edge.

> Pointer to the next field of the previous node in the list.  */" except for the
> first node on the list, whose prev_nextp points to itself, right?
No.  Prev_nextp field of the first node points to deps_list->first. 
This allows us not to distinguish first node from the others.  I'll fix 
the comment.

> 
> o dep_data_node_def: holding the two conjugate dependence edges together is
> very useful when switching directions. But perhaps most of the accesses go in
> one direction (e.g. iterating over cons of a pro), and having both conjugates
> structed together may reduce cache efficiency. So you may consider connecting
> each dep_node_def to its conjugate, not necessarily forcing them to be placed
> adjacent in memory.
Dep_def and both edges were placed in one structure so that they could
be allocated and freed within a single alloc/free.  As I understand it, you
propose putting two pointers inside dep_edge_def: one to the dep_def and
one to the opposite edge.  Currently we have one pointer in dep_edge_def
to the dep_data_node, which has all those pointers.  I'm probably
missing something, but I don't see how your way can improve cache
efficiency.

> 
> o To add to the checking routines, the following can be checked: every
> dep_node_def is pointed-to by either its data->back xor its data->forw, right?
> If so, this can be used to identify if a dep_node_def is forward or backward;
> that all nodes on a list are forward (and share the same pro) or backward (and
> share the same con); and to assert the following regarding L:
> +/* Add a dependency described by DEP to the list L.
> +   L should be either INSN_DEPS1 or RESOLVED_DEPS1.  */
Good idea.

> 
> o insn_cost (insn, dep): maybe it's better to break this into insn_cost (insn)
> of a producer regardless of consumers, and "dep_cost (dep)".
Agree.

> 
> o The comment explaining what 'resolve_dep' does can be inlined together with
> its code. 
Agree.

> 
> +/* Detach dep_node N from the list.  */
> +static void
> +dep_node_detach (dep_node_t n)
> +{
> +  dep_node_t *prev_nextp = DEP_NODE_PREV_NEXTP (n);
> +  dep_node_t next = DEP_NODE_NEXT (n);
> +
> +  *prev_nextp = next;
> +
> +  if (next != NULL)
> +    DEP_NODE_PREV_NEXTP (next) = prev_nextp;
> maybe complete the detachment by adding:
> DEP_NODE_PREV_NEXTP (n) = NULL;
> DEP_NODE_NEXT (n) = NULL;
Probably, you are right.

> Ayal.

Thanks,

Maxim
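
The prev_nextp idea discussed above can be sketched in isolation. Each node stores the address of whichever pointer currently points at it: the list head's first field for the first node, or the previous node's next field otherwise. Detaching therefore needs neither a search nor a special case for the head. This is a minimal sketch with simplified names, not the patch itself; the NULL-ing on detach follows the suggestion made in the review.

```c
#include <stddef.h>

/* Simplified node: NEXT is the following node, PREV_NEXTP is the
   address of the pointer that points at this node.  */
struct dep_node {
  struct dep_node *next;
  struct dep_node **prev_nextp;
};

struct deps_list {
  struct dep_node *first;
};

/* Insert N at the position designated by PREV_NEXTP, i.e. in front of
   whatever *PREV_NEXTP pointed to.  */
static void dep_node_attach(struct dep_node *n, struct dep_node **prev_nextp)
{
  struct dep_node *next = *prev_nextp;

  n->prev_nextp = prev_nextp;
  n->next = next;
  if (next != NULL)
    next->prev_nextp = &n->next;
  *prev_nextp = n;
}

/* Unlink N in O(1); works the same for the first node because its
   prev_nextp points at the list's first field.  */
static void dep_node_detach(struct dep_node *n)
{
  struct dep_node **prev_nextp = n->prev_nextp;
  struct dep_node *next = n->next;

  *prev_nextp = next;
  if (next != NULL)
    next->prev_nextp = prev_nextp;
  n->prev_nextp = NULL;
  n->next = NULL;
}
```

Attaching at &list.first prepends; attaching at &x->next inserts right after node x.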


Comment 58 Ayal Zaks 2007-01-15 15:30:55 UTC
(In reply to comment #57)
> Subject: Re:  [4.1 regression] A file that can not be
>  compiled in reasonable time/space
> Thanks!  Very useful comments.  I'm continuing to work on cleaning the 
> patch (especially on writing the comments)

Enjoy! One suggestion that may help explain the data-structure, is to provide a drawing of ddn with its dep and nodes connected.

> > o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an
> > *edge* in terms of the data-dependence graph. The prev_nextp field is a "/*
> Right!  I struggled to figure out the correct name and didn't prevail. 
> Thanks for the tip.  It'll be dep_edge.
Ah, on second thought, perhaps the important property of this struct is the fact that it's a link on a forward or backward chain; so how about dep_link?


> > Pointer to the next field of the previous node in the list.  */" except for the
> > first node on the list, whose prev_nextp points to itself, right?
> No.  Prev_nextp field of the first node points to deps_list->first. 
> This allows us not to distinguish first node from the others.  I'll fix 
> the comment.
Ah, right.

> > 
> > o dep_data_node_def: holding the two conjugate dependence edges together is
> > very useful when switching directions. But perhaps most of the accesses go in
> > one direction (e.g. iterating over cons of a pro), and having both conjugates
> > structed together may reduce cache efficiency. So you may consider connecting
> > each dep_node_def to its conjugate, not necessarily forcing them to be placed
> > adjacent in memory.
> Dep_def and both edges were placed in one structure so that they could 
> be allocated and freed within a single alloc/free.  As I understand you 
> propose putting two pointers inside dep_edge_def: one to the dep_def and 
> one to the opposite edge.  Currently we have one pointer in dep_edge_def 
> to the dep_data_node which have all that pointers.  And probably I'm 
> missing something, but I don't see how your way can improve cache 
> efficiency.
You're right. There's probably not much to gain if anything paying an extra pointer to save the fields of the conjugate dep_node. Perhaps only place dep_def between back and forw (been too much into struct-reorg, I guess :). It does seem wasteful to hold two 'data' pointers for such nearby offsets ... ;)

And another note: INSN_DEPS may be renamed INSN_BACK_DEPS to better distinguish it from INSN_DEPEND (which in turn might be called INSN_FORW_DEPS). And maybe INSN_RESOLVED_BACK_DEPS for consistency.

Ayal.
Comment 59 Jan Hubicka 2007-01-18 09:51:52 UTC
Subject: Re:  [4.1 regression] A file that can not be compiled in reasonable time/space

Hi,
just as a heads up, the early inlining change now makes the inliner fully
inline into the function at -O2 (originally we stopped because of inline
unit growth after doing just a few inlines).  This enables more optimizations
and reduces the memory usage of all other passes except for the scheduler,
whose usage increases.  So we have a peak of roughly 60MB of GGC memory
without scheduling and 360MB with scheduling, so this patch would be even
more greatly appreciated ;)

http://www.suse.de/~aj/SPEC/amd64/memory/pr28071-O2.rep

Honza
Comment 60 Jan Hubicka 2007-02-06 22:05:11 UTC
Hi,
small update on status.  At -O3 -fno-tree-fre -fno-tree-pre we now have a 1.1GB footprint, 800MB of this out of gimple.  We still explode in FRE/PRE, but the majority of the other problems have been fixed:
Execution times (seconds)
 garbage collection    :  18.23 (12%) usr   0.04 ( 1%) sys  18.46 (10%) wall       0 kB ( 0%) ggc
 callgraph construction:  10.31 ( 7%) usr   0.04 ( 1%) sys  10.36 ( 5%) wall    2296 kB ( 0%) ggc
 life analysis         :   4.08 ( 3%) usr   0.16 ( 3%) sys   4.26 ( 2%) wall    7350 kB ( 2%) ggc
 inline heuristics     :  10.46 ( 7%) usr   0.12 ( 2%) sys  10.57 ( 6%) wall    2438 kB ( 1%) ggc
 integration           :  16.48 (11%) usr   0.46 ( 9%) sys  17.00 ( 9%) wall  143049 kB (29%) ggc
 tree CFG cleanup      :   4.69 ( 3%) usr   0.00 ( 0%) sys   4.69 ( 2%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   2.32 ( 2%) usr   0.40 ( 8%) sys   2.76 ( 1%) wall    3276 kB ( 1%) ggc
 tree operand scan     :   1.42 ( 1%) usr   0.22 ( 4%) sys   1.54 ( 1%) wall   27071 kB ( 6%) ggc
 dominator optimization:   2.25 ( 2%) usr   0.00 ( 0%) sys   2.24 ( 1%) wall   14657 kB ( 3%) ggc
 tree split crit edges :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall   17558 kB ( 4%) ggc
 tree SSA to normal    :   8.06 ( 5%) usr   0.40 ( 8%) sys   8.51 ( 4%) wall   22874 kB ( 5%) ggc
 expand                :   3.83 ( 3%) usr   0.69 (14%) sys  38.08 (20%) wall   54312 kB (11%) ggc
 forward prop          :   3.20 ( 2%) usr   0.82 (16%) sys   4.22 ( 2%) wall    2470 kB ( 1%) ggc
 if-conversion         :   6.37 ( 4%) usr   0.00 ( 0%) sys   6.41 ( 3%) wall    9157 kB ( 2%) ggc
 global alloc          :  12.12 ( 8%) usr   0.94 (19%) sys  15.48 ( 8%) wall   18801 kB ( 4%) ggc
 TOTAL                 : 147.90             5.02           191.03             486834 kB

We get considerable usage in bitmaps (just those over 100MB of peak memory usage are listed):
df-problems.c:2957 (df_chain_create_bb)  208MB
df-problems.c:986 (df_rd_alloc)  207MB
df-problems.c:987 (df_rd_alloc)  110MB
tree-ssa-live.c:534 (new_tree_live_info)  110MB
tree-ssa-live.c:538 (new_tree_live_info)  110MB

At least 100MB, but probably more, is consumed by the new linked lists used by the scheduler.  Hopefully this can be tracked by moving everything to allocpools.

I will send -O2 in separate post.
Honza
Comment 61 Jan Hubicka 2007-02-06 22:14:45 UTC
Also forgot to mention, integration is slow because of the split_block quadraticness.

For -O2:
We need 531MB of RAM; GGC memory peaks at 100MB, and a large portion of the non-GGC memory is definitely the scheduler dependency lists.

Execution times (seconds)
 garbage collection    :  14.26 ( 5%) usr   0.03 ( 1%) sys  14.27 ( 5%) wall       0 kB ( 0%) ggc
 life analysis         :  73.96 (24%) usr   1.55 (46%) sys  75.52 (24%) wall    7207 kB ( 2%) ggc
 alias analysis        :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.87 ( 0%) wall    8530 kB ( 3%) ggc
 inline heuristics     :  11.64 ( 4%) usr   0.12 ( 4%) sys  11.77 ( 4%) wall    2695 kB ( 1%) ggc
 integration           :  16.71 ( 5%) usr   0.19 ( 6%) sys  16.91 ( 5%) wall   69808 kB (21%) ggc
 tree gimplify         :   0.49 ( 0%) usr   0.07 ( 2%) sys   0.58 ( 0%) wall   14977 kB ( 4%) ggc
 tree operand scan     :   1.25 ( 0%) usr   0.11 ( 3%) sys   1.29 ( 0%) wall   20889 kB ( 6%) ggc
 tree SRA              :   1.20 ( 0%) usr   0.07 ( 2%) sys   1.37 ( 0%) wall   40364 kB (12%) ggc
 tree FRE              :   1.14 ( 0%) usr   0.07 ( 2%) sys   1.21 ( 0%) wall    9230 kB ( 3%) ggc
 expand                :   3.29 ( 1%) usr   0.10 ( 3%) sys   3.39 ( 1%) wall   45828 kB (14%) ggc
 PRE                   :  21.54 ( 7%) usr   0.00 ( 0%) sys  21.54 ( 7%) wall     898 kB ( 0%) ggc
 regmove               :  93.59 (30%) usr   0.05 ( 1%) sys  93.64 (30%) wall     156 kB ( 0%) ggc
 local alloc           :   5.34 ( 2%) usr   0.00 ( 0%) sys   5.33 ( 2%) wall    2838 kB ( 1%) ggc
 global alloc          :   4.25 ( 1%) usr   0.06 ( 2%) sys   4.30 ( 1%) wall   19946 kB ( 6%) ggc
 reload CSE regs       :   4.09 ( 1%) usr   0.00 ( 0%) sys   4.11 ( 1%) wall   11354 kB ( 3%) ggc
 scheduling 2          :  16.97 ( 6%) usr   0.44 (13%) sys  17.53 ( 6%) wall   20069 kB ( 6%) ggc
 TOTAL                 : 308.36             3.39           312.58             334207 kB
total: 531915 kB

regmove has the quadratic loop issues I added a param for earlier in this audit trail, but the parameter is now apparently a bit too large, since the rest of the compiler is a lot faster.  Scheduler/out-of-SSA slowness is gone.
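
For readers unfamiliar with this kind of fix, the general idea of capping a quadratic scan with a parameter looks roughly like the sketch below. This is not the regmove code; the function and parameter names are made up, and the real param is deliberately not named here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scan backwards from index `from` (exclusive)
   looking for `key`, but give up after inspecting `max_steps` items.
   Returns the index of the match, or -1 if none was found within the
   budget.  Called once per item, this bounds the total work to
   O(n * max_steps) instead of O(n * n). */
long bounded_find_backward(const int *items, size_t from, int key,
                           size_t max_steps)
{
    assert(items != NULL || from == 0);
    size_t steps = 0;
    for (size_t i = from; i-- > 0 && steps < max_steps; steps++) {
        if (items[i] == key)
            return (long)i;
    }
    return -1;
}
```

The trade-off is the one described above: a cap that was small relative to total compile time when it was chosen can become the dominant cost once everything else gets faster, so the budget may need retuning.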

There are no overly large bitmaps, one large allocpool:
df_scan_ref pool          18   74449440   67061984          0

Looks like we are in pretty good shape on this one; IMO the only important problems are the slowness of life analysis (hopefully fixed by DFA) and the memory hungriness of the scheduler.

Honza
Comment 62 Paolo Bonzini 2007-03-26 16:50:18 UTC
dataflow branch cannot complete this at -O3 -fno-tree-pre -fno-tree-fre
Comment 63 Maxim Kuvyrkov 2007-04-16 16:04:30 UTC
Subject: Bug 28071

Author: mkuvyrkov
Date: Mon Apr 16 16:04:18 2007
New Revision: 123874

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123874
Log:
PR middle-end/28071
* sched-int.h (struct deps): Split field 'pending_lists_length' into
'pending_read_list_length' and 'pending_write_list_length'.  Update
comment.
* sched-deps.c (add_insn_mem_dependence): Change signature.  Update
to handle two length counters instead of one.  Update all uses.
(flush_pending_lists, sched_analyze_1, init_deps): Update to handle
two length counters instead of one.
* sched-rgn.c (propagate_deps): Update to handle two length counters
instead of one.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
    trunk/gcc/sched-int.h
    trunk/gcc/sched-rgn.c

Comment 64 Maxim Kuvyrkov 2007-04-16 16:07:13 UTC
(In reply to comment #63)

Scheduler memory hungriness should be fixed by the above commit.
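
As a rough illustration of what "two length counters instead of one" means, consider the sketch below. It is not the sched-deps.c code: the struct, the function names and the flush policy are assumptions made for the example, and only the idea of tracking pending reads and pending writes separately comes from the ChangeLog above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two new fields in struct deps. */
struct pending_lists {
    size_t read_length;    /* number of pending memory reads */
    size_t write_length;   /* number of pending memory writes */
};

/* Record one pending memory reference.  Returns true when the list
   that just grew is longer than `limit`, i.e. the caller may want to
   flush.  The flush policy here is an assumption for the sketch. */
bool note_pending_mem(struct pending_lists *p, bool is_read, size_t limit)
{
    assert(p != NULL);
    size_t *len = is_read ? &p->read_length : &p->write_length;
    *len += 1;
    return *len > limit;
}

/* Flushing empties both lists, so both counters go back to zero. */
void flush_pending(struct pending_lists *p)
{
    assert(p != NULL);
    p->read_length = 0;
    p->write_length = 0;
}
```

Keeping the counts apart lets a caller apply different limits or decisions to reads and writes, which a single combined length cannot express.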
Comment 65 Jan Hubicka 2007-04-17 19:16:37 UTC
I can confirm that at -O2, memory consumption dropped from 0.5GB to 0.28GB, which is indeed a good improvement. To summarize http://www.suse.de/~gcctest/memory/results/200704171438/pr28071-O2.rep

Compile time wise major offenders are:
PRE                   : 259.18 (34%) usr   0.00 ( 0%) sys 259.18 (34%) wall    1421 kB ( 1%) ggc
scheduling 2          : 366.76 (49%) usr   0.00 ( 0%) sys 366.82 (49%) wall    3062 kB ( 1%) ggc

There is a lot of non-GGC memory. Major allocpool offender is:
df_scan_ref pool          36  130400160   58647984          0

bitmaps:
tree-ssa-pre.c:549 (bitmap_set_new)        95283   14667640    8814400    8798320    9704128
reload1.c:518 (new_insn_chain)             90286    8425760    8425760    8425760        761
tree-ssa-pre.c:548 (bitmap_set_new)        95283   20190640    9860640    9826200    3268384
tree-ssa-structalias.c:879 (add_pred_grap  94816    7585280    7585280    7585280     189632

Thanks,
Honza
Comment 66 Jan Hubicka 2007-04-17 19:38:40 UTC
Subject: Re:  [4.1 regression] A file that can not be compiled in reasonable time/space

Just to add some explanation to the numbers, df_scan_ref_pool is 50MB,
the bitmaps quoted are 8MB each.  Given the nature of the testcase, I think
we are doing a satisfactory job at -O2. At -O3 there are still problems
(at -O2 the testcase has one huge BB, at -O3 we have many BBs). PRE explodes
completely and we need over 1.2GB for -O3 -fno-tree-pre -fno-tree-fre.
What is also killing us at -O3 are the bitmaps.
385MB:
df-problems.c:2951 (df_chain_create_bb)    40198  386574160  385195560  385195560     462958
200MB:
df-problems.c:984 (df_rd_alloc)            40198  385290320  208450840          0          0
110MB:
df-problems.c:985 (df_rd_alloc)            40198  201714640  110324160          0          0
tree-ssa-live.c:540 (new_tree_live_info)   31939  114031520  113098360          0      84523
tree-ssa-live.c:536 (new_tree_live_info)   31939  113096920  113092320          0      80895

Honza
Comment 67 Mark Mitchell 2007-05-14 22:25:34 UTC
Will not be fixed in 4.2.0; retargeting at 4.2.1.
Comment 68 David Fang 2007-05-14 22:49:23 UTC
Audit trail shows that this isn't a problem with 4.2.  Target -> 4.1.3?
Comment 69 Mark Mitchell 2007-10-09 19:21:06 UTC
Change target milestone to 4.2.3, as 4.2.2 has been released.
Comment 70 Eric Botcazou 2007-11-03 08:07:27 UTC
> Audit trail shows that this isn't a problem with 4.2.  Target -> 4.1.3?

Yes, this has been fixed in the 4.2 series according to comment #54.
Comment 71 GCC Commits 2023-07-28 08:40:49 UTC
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:095eb138f736d94dabf9a07a6671bd351be0e66a

commit r14-2851-g095eb138f736d94dabf9a07a6671bd351be0e66a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Jul 28 09:39:46 2023 +0100

    PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
    
    This patch is one of a series of fixes for PR rtl-optimization/110587,
    a compile-time regression with -O0, that attempts to address the underlying
    cause.  As noted previously, the pathological test case pr28071.c contains
    a large number of useless register-to-register moves that can produce
    quadratic behaviour (in LRA).  These moves are generated during RTL
    expansion in emit_group_load_1, where the middle-end attempts to simplify
    the source before calling extract_bit_field.  This is reasonable if the
    source is a complex expression (from before the tree-ssa optimizers), or
    a SUBREG, or a hard register, but it's not particularly useful to copy
    a pseudo register into a new pseudo register.  This patch eliminates that
    redundancy.
    
    The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
    777K lines, with this patch it contains 717K lines, i.e. saving about 60K
    lines (admittedly of debugging text output, but it makes the point).
    
    2023-07-28  Roger Sayle  <roger@nextmovesoftware.com>
                Richard Biener  <rguenther@suse.de>
    
    gcc/ChangeLog
            PR middle-end/28071
            PR rtl-optimization/110587
            * expr.cc (emit_group_load_1): Simplify logic for calling
            force_reg on ORIG_SRC, to avoid making a copy if the source
            is already in a pseudo register.
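
The decision described in the commit message can be modelled with a toy example. This is only a sketch of the idea, not the expr.cc code: the operand kinds, the counters and the function names are invented for illustration, and the real force_reg is not called or imitated in detail.

```c
#include <assert.h>

/* Toy classification of a source operand (hypothetical). */
enum operand_kind { OP_PSEUDO_REG, OP_HARD_REG, OP_SUBREG, OP_COMPLEX_EXPR };

struct operand { enum operand_kind kind; int regno; };

static int next_pseudo = 100;   /* next fresh pseudo register number */
static int moves_emitted = 0;   /* how many copies we have generated */

/* Copy SRC into a fresh pseudo register, counting the move. */
static struct operand copy_to_new_pseudo(struct operand src)
{
    assert(src.kind != OP_PSEUDO_REG);
    moves_emitted++;
    struct operand result = { OP_PSEUDO_REG, next_pseudo++ };
    return result;
}

/* Simplify the source before a bit-field extraction: a value already
   in a pseudo is used as-is; everything else is copied first. */
struct operand prepare_source(struct operand src)
{
    if (src.kind == OP_PSEUDO_REG)
        return src;
    return copy_to_new_pseudo(src);
}

int moves_so_far(void) { return moves_emitted; }
```

In this model a pseudo-register source produces no move at all, while hard registers, SUBREGs and complex expressions still get copied, which mirrors the distinction the commit message draws.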