The following file compiles in 30s with "gcc -c" and never compiles with "gcc -O -c"
Created attachment 11687: a file that gcc cannot compile with -O. Just try gcc -c -O on this file! (Note: no problem with icc.)
It actually does finish for me at -O with gcc 4.0.2. It just takes an incredible amount of time and memory, but that doesn't surprise me so much, given the nature of this evil test case ;-) With gcc 4.2 20060617, I can't compile the test case. After a long time and after using up to 1.5 GB, the compiler dies with: cc1: out of memory allocating 399751872 bytes after a total of 79527936 bytes
Caused by excessive inlining:
 inline heuristics : 37.25 (25%) usr 0.04 ( 1%) sys 36.56 (15%) wall 2312 kB ( 0%) ggc
 integration : 19.91 (13%) usr 1.49 (36%) sys 62.70 (26%) wall 1058857 kB (76%) ggc
Platform independent. Honza, one for you I suppose.
Same with 4.1. 4.0.3 needs about 500MB RAM at -O, while 4.1 gets killed with
cc1: out of memory allocating 1134939624 bytes after a total of 43368448 bytes
(though that first number looks "interesting")
Btw, we do not die during inlining, but during optimization, which is confronted with one gigantic basic block, as all BBs after inlining are merged by fixupcfg ;) Oh, and we die during RTL optimizations... but I wonder why we are not able to free up some memory before that (lower gc params help with this, and we enter greg with 250MB used), and it still wants
cc1: out of memory allocating 1134939624 bytes after a total of 43487232 bytes
So, more something for Matz/Vladimir.
Just for comparison: on my Intel dual-core 3GHz machine, icc compiles the file in 15s within 200MB with -O3 (including cpp).
Hmm,
the function fi contains 30000 calls, and many of the called functions contain further calls. Since our metric allows each call to be replaced by up to 10 instructions and we allow fi to double in size, we can end up with 600000 instructions in a single basic block (in fact we end up with roughly 390000 according to the inliner metrics). This is still linear growth and the testcase is rather extreme, so I am not sure I would declare this an inliner bug (the user has asked for it by declaring stuff inline, after all ;)

Without inlining we do not behave much better (I am just running the compilation and it is at 900MB), so using 1GB for inlined function bodies doesn't seem that unreasonable. I will try to play with this a bit.

One solution might be to adjust our size estimates to be less aggressive for large functions, so that the growth in the actual number of statements is not 20-fold at most but some smaller constant, but it is rather ugly.

Honza
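The worst-case figure in the comment above can be checked with a few lines of C. This is only a sketch of the simplified model described there (calls times instructions per call times allowed growth); the function name and parameters are made up for illustration and are not GCC's actual inline heuristics.

```c
#include <assert.h>

/* Upper bound on the instruction count of a caller after inlining,
   under the simplified model from the comment above (hypothetical
   helper, not GCC code): every call site may be replaced by up to
   INSNS_PER_CALL instructions, and the caller may then grow by a
   further factor of GROWTH_FACTOR.  */
static long
inline_growth_bound (long call_sites, long insns_per_call, long growth_factor)
{
  assert (call_sites >= 0 && insns_per_call >= 0 && growth_factor >= 0);
  return call_sites * insns_per_call * growth_factor;
}
```

With the numbers from the comment, inline_growth_bound (30000, 10, 2) evaluates to 600000, and the per-call expansion of 10 * 2 is where the "20-fold" growth comes from. The bound is linear in the number of call sites, which is the point being made.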
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

In reply to comment #8 from hubicka at gcc dot gnu dot org (2006-07-21 21:11):

Maybe a look at the assembly code generated by icc, which behaves very well on this test case, could be useful?

Christophe
Hi,
this patch makes the -O2 case work pretty well on the tree side. The inliner expands code from 8MB to 40MB of GGC memory, which seems under control. Aliasing peaks at 85MB, which also doesn't seem completely unreasonable.
I will need to give it more testing. I believe the inliner is always ggc safe, but it is easy to be mistaken here. The patch also speeds up the inline heuristic by pruning out the impossible edges early, making the priority queue smaller.

Also I am quite curious how the inliner manages to produce 800MB of garbage...

Honza

Index: ipa-inline.c
===================================================================
*** ipa-inline.c (revision 115645)
--- ipa-inline.c (working copy)
*************** update_caller_keys (fibheap_t heap, stru
*** 413,418 ****
--- 413,419 ----
                      bitmap updated_nodes)
  {
    struct cgraph_edge *edge;
+   const char *failed_reason;

    if (!node->local.inlinable || node->local.disregard_inline_limits
        || node->global.inlined_to)
*************** update_caller_keys (fibheap_t heap, stru
*** 421,426 ****
--- 422,441 ----
      return;
    bitmap_set_bit (updated_nodes, node->uid);
    node->global.estimated_growth = INT_MIN;
+
+   if (!node->local.inlinable)
+     return;
+   /* Prune out edges we won't inline into anymore.  */
+   if (!cgraph_default_inline_p (node, &failed_reason))
+     {
+       for (edge = node->callers; edge; edge = edge->next_caller)
+         if (edge->aux)
+           {
+             fibheap_delete_node (heap, edge->aux);
+             edge->aux = NULL;
+           }
+       return;
+     }

    for (edge = node->callers; edge; edge = edge->next_caller)
      if (edge->inline_failed)
Index: tree-inline.c
===================================================================
*** tree-inline.c (revision 115645)
--- tree-inline.c (working copy)
*************** expand_call_inline (basic_block bb, tree
*** 2163,2172 ****
    /* Update callgraph if needed.  */
    cgraph_remove_node (cg_edge->callee);

-   /* Declare the 'auto' variables added with this inlined body.  */
-   record_vars (BLOCK_VARS (id->block));
    id->block = NULL_TREE;
    successfully_inlined = TRUE;

   egress:
    input_location = saved_location;
--- 2163,2171 ----
    /* Update callgraph if needed.  */
    cgraph_remove_node (cg_edge->callee);

    id->block = NULL_TREE;
    successfully_inlined = TRUE;
+   ggc_collect ();

   egress:
    input_location = saved_location;
*************** declare_inline_vars (tree block, tree va
*** 2556,2562 ****
  {
    tree t;
    for (t = vars; t; t = TREE_CHAIN (t))
!     DECL_SEEN_IN_BIND_EXPR_P (t) = 1;

    if (block)
      BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);
--- 2555,2567 ----
  {
    tree t;
    for (t = vars; t; t = TREE_CHAIN (t))
!     {
!       DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
!       gcc_assert (!TREE_STATIC (t) && !TREE_ASM_WRITTEN (t));
!       cfun->unexpanded_var_list =
!         tree_cons (NULL_TREE, t,
!                    cfun->unexpanded_var_list);
!     }

    if (block)
      BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);
Hi,
this keeps the inliner from producing quadratically many STMT list nodes, so inlining is now reasonably fast. The next offenders are alias info, PRE, regmove, global alloc and the schedulers.

Index: tree-cfg.c
===================================================================
*** tree-cfg.c (revision 115645)
--- tree-cfg.c (working copy)
*************** tree_redirect_edge_and_branch_force (edg
*** 4158,4164 ****
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi, bsi_tgt;
    tree act;
    basic_block new_bb;
    edge e;
--- 4158,4165 ----
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi;
!   tree_stmt_iterator tsi_tgt;
    tree act;
    basic_block new_bb;
    edge e;
*************** tree_split_block (basic_block bb, void *
*** 4192,4204 ****
          }
      }

!   bsi_tgt = bsi_start (new_bb);
!   while (!bsi_end_p (bsi))
!     {
!       act = bsi_stmt (bsi);
!       bsi_remove (&bsi, false);
!       bsi_insert_after (&bsi_tgt, act, BSI_NEW_STMT);
!     }

    return new_bb;
  }
--- 4193,4209 ----
          }
      }

!   if (bsi_end_p (bsi))
!     return new_bb;
!
!   /* Split the statement list - avoid re-creating new containers as this
!      brings ugly quadratic memory consumption in the inliner.
!      (We are still quadratic since we need to update stmt BB pointers,
!      sadly)  */
!   new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi);
!   for (tsi_tgt = tsi_start (new_bb->stmt_list);
!        !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt))
!     set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb);

    return new_bb;
  }
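The idea behind the tree_split_block change can be illustrated outside of GCC. The following is a minimal sketch, assuming a plain doubly linked list: struct stmt, struct block, block_append and block_split_before are hypothetical stand-ins, not GCC's statement-list or iterator API. The list is cut in place with a constant number of pointer updates and no new list nodes, and the only per-statement work left is updating the owner pointer, which corresponds to the set_bb_for_stmt loop in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct block;

/* Hypothetical stand-in for a statement in a basic block's list.  */
struct stmt
{
  struct stmt *prev, *next;
  struct block *owner;   /* Plays the role of the stmt's BB pointer.  */
};

/* Hypothetical stand-in for a basic block owning a statement list.  */
struct block
{
  struct stmt *head, *tail;
};

/* Append S to the end of B's statement list.  */
static void
block_append (struct block *b, struct stmt *s)
{
  s->owner = b;
  s->next = NULL;
  s->prev = b->tail;
  if (b->tail)
    b->tail->next = s;
  else
    b->head = s;
  b->tail = s;
}

/* Move FIRST and every statement after it from OLD_B to the empty
   block NEW_B.  The list itself is cut with O(1) pointer updates and
   nothing is allocated; only the owner pointers of the moved
   statements need a walk.  Returns the number of statements moved.  */
static size_t
block_split_before (struct block *old_b, struct stmt *first,
                    struct block *new_b)
{
  size_t moved = 0;
  struct stmt *s;

  assert (new_b->head == NULL && new_b->tail == NULL);
  if (first == NULL)
    return 0;
  assert (first->owner == old_b);

  /* Cut the list in front of FIRST.  */
  new_b->head = first;
  new_b->tail = old_b->tail;
  old_b->tail = first->prev;
  if (first->prev)
    first->prev->next = NULL;
  else
    old_b->head = NULL;
  first->prev = NULL;

  /* Re-home the moved statements.  */
  for (s = first; s; s = s->next)
    {
      s->owner = new_b;
      moved++;
    }
  return moved;
}
```

Compared with removing each statement and re-inserting it into the new block, this version creates no garbage per split. As the comment in the patch notes, repeated splitting of one huge block still costs quadratic time overall, because each split walks the moved tail to update the owner pointers.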
Hi,
I am attaching the .optimized dump of this testcase. It is quite a good demonstration of how SRA and TER tend to increase register pressure in code like:

;; Function add (add)

Analyzing Edge Insertions.
add (x, y)
{
  double r$min;

<bb 2>:
  r$min = x.min + y.min;
  <retval>.max = x.max + y.max;
  <retval>.min = r$min;
  return <retval>;

}

;; Function mul (mul)

Analyzing Edge Insertions.
mul (x, y)
{
  double y$min;
  double y$max;
  double x$min;
  double x$max;
  double d;
  double c;
  double b;
  double a;

<bb 2>:
  x$max = x.max;
  x$min = x.min;
  y$max = y.max;
  y$min = y.min;
  a = y$min * x$min;
  b = y$max * x$min;
  c = y$min * x$max;
  d = y$max * x$max;
  <retval>.max = max (max (a, b), max (c, d));
  <retval>.min = min (min (a, b), min (c, d));
  return <retval>;

}

;; Function fz (fz)

fz (x, y, z)
{
<bb 2>:
  tmp3 = pow (z, 3.7e+1);
  tmp7 = pow (y, 2.0e+0);
  tmp9 = pow (z, 3.6e+1);
  tmp14 = pow (y, 3.0e+0);
  tmp16 = pow (z, 3.5e+1);
  ...
tmp3922 = pow (x, 3.8e+1); D.17848 = pow (x, 3.9e+1); D.17965 = pow (y, 3.9e+1); D.17968 = pow (z, 3.9e+1); return tmp3 * x * 2.04629333124046830505449179327115416526794433594e+1 * y + tmp9 * tmp7 * x * 1.63737898728226838329646852798759937286376953125e+2 + tmp16 * tmp14 * x * 3.102825991153964650948182679712772369384765625e+2 + tmp23 * tmp21 * x * -1.38580890184729059910750947892665863037109375e+3 + tmp30 * tmp28 * x * -4.39080063708386560961116629187017679214477539062e+1 + tmp37 * tmp35 * x * 1.737348223038549986085854470729827880859375e+4 + tmp44 * tmp42 * x * -1.069806869373114386689849197864532470703125e+4 + tmp51 * tmp49 * x * -3.542086638969252817332744598388671875e+4 + tmp58 * tmp56 * x * -3.091774346229622824466787278652191162109375e+4 + tmp65 * tmp63 * x * 1.5680886586212887777946889400482177734375e+5 + tmp72 * tmp70 * x * 4.19376520881160162389278411865234375e+5 + tmp79 * tmp77 * x * 2.0111082929561330820433795452117919921875e+5 + tmp86 * tmp84 * x * -4.337742627231603837572038173675537109375e+5 + tmp93 * tmp91 * x * -4.829501801337040960788726806640625e+5 + tmp100 * tmp98 * x * 5.32241994551055715419352054595947265625e+5 + tmp107 * tmp105 * x * 1.8250994926701225340366363525390625e+6 + tmp114 * tmp112 * x * 1.6382205795514374040067195892333984375e+6 + tmp121 * tmp119 * x * 1.1912621023960295133292675018310546875e+5 + tmp128 * tmp126 * x * 8.811503159726611338555812835693359375e+5 + tmp135 * tmp133 * x * 2.690164492243868880905210971832275390625e+5 + tmp142 * tmp140 * x * 2.271892026609037420712411403656005859375e+5 + tmp149 * tmp147 * x * 1.795814638975697453133761882781982421875e+5 + tmp156 * tmp154 * x * -3.94381184819339658133685588836669921875e+5 + tmp163 * tmp161 * x * 7.64450454622797551564872264862060546875e+5 + tmp170 * tmp168 * x * 6.9298171586054741055704653263092041015625e+4 + tmp177 * tmp175 * x * -3.129066099043917492963373661041259765625e+5 + tmp184 * tmp182 * x * -4.0792914801556640304625034332275390625e+5 + tmp191 * tmp189 * x * 
7.3512920753349564620293676853179931640625e+4 + tmp198 * tmp196 * x * 3.5470695311840399881475605070590972900390625e+3 + tmp205 * tmp203 * x * -8.8733450804951236932538449764251708984375e+4 + tmp212 * tmp210 * x * -1.3805889644669676272314973175525665283203125e+4 + tmp219 * tmp217 * x * -7.54301319902873729006387293338775634765625e+3 + tmp226 * tmp224 * x * 2.23731170493404579246998764574527740478515625e+3 + tmp233 * tmp231 * x * -3.9037651153389475666699581779539585113525390625e+2 + tmp240 * tmp238 * x * 4.743319333283892547115101478993892669677734375e+2 + tmp247 * tmp245 * x * -6.32641294603530113249689748045057058334350585938e+1 + tmp252 * x * -6.76527508139541300380415123072452843189239501953e+0 * z + tmp258 * x * -4.51436297228304250772623618104262277483940124512e-1 + tmp263 * x * 2.89405090268957065902100111998151987791061401367e+0 + tmp9 * tmp268 * -3.7483157190701700756108039058744907379150390625e+2 * y + tmp16 * tmp7 * tmp268 * 9.276025613194925654170219786465167999267578125e+2 + tmp23 * tmp14 * tmp268 * 1.358400470188729514120495878159999847412109375e+2 + tmp30 * tmp21 * tmp268 * -3.2681330410168111484381370246410369873046875e+3 + tmp37 * tmp28 * tmp268 * 2.77737094612259534187614917755126953125e+3 + tmp44 * tmp35 * tmp268 * 2.2773056570869275674340315163135528564453125e+3 + tmp51 * tmp42 * tmp268 * 9.2295963366692260024137794971466064453125e+4 + tmp58 * tmp49 * tmp268 * -3.049601738325569895096123218536376953125e+5 + tmp65 * tmp56 * tmp268 * -2.69300746038850047625601291656494140625e+5 + tmp72 * tmp63 * tmp268 * 3.92479526798162725754082202911376953125e+5 + tmp79 * tmp70 * tmp268 * -1.4348648827185891568660736083984375e+6 + tmp86 * tmp77 * tmp268 * 1.2925352909364881925284862518310546875e+6 + tmp93 * tmp84 * tmp268 * 3.44742843619707785546779632568359375e+6 + tmp100 * tmp91 * tmp268 * 2.2975221813043109141290187835693359375e+6 + tmp107 * tmp98 * tmp268 * -8.753704570182035677134990692138671875e+5 + tmp114 * tmp105 * tmp268 * 
-4.683100195028461515903472900390625e+6 + tmp121 * tmp112 * tmp268 * -2.4950389851368105155415832996368408203125e+5 + tmp128 * tmp119 * tmp268 * 4.864730415365164168179035186767578125e+6 + tmp135 * tmp126 * tmp268 * -4.660151695632442715577781200408935546875e+5 + tmp142 * tmp133 * tmp268 * -6.7161351688091107644140720367431640625e+5 + tmp149 * tmp140 * tmp268 * -1.4141434789546797401271760463714599609375e+5 + tmp156 * tmp147 * tmp268 * -1.5259173265962512232363224029541015625e+6 + tmp163 * tmp154 * tmp268 * -7.40285312171890516765415668487548828125e+5 + tmp170 * tmp161 * tmp268 * 1.072791414269997738301753997802734375e+6 + tmp177 * tmp168 * tmp268 * -4.951253421552001382224261760711669921875e+5 + tmp184 * tmp175 * tmp268 * -1.05241366402662693872116506099700927734375e+5 + tmp191 * tmp182 * tmp268 * 2.0352227243198428186587989330291748046875e+5 + tmp198 * tmp189 * tmp268 * 1.3298028337804946932010352611541748046875e+5 + tmp205 * tmp196 * tmp268 * -6.6668077510616494691930711269378662109375e+4 + tmp212 * tmp203 * tmp268 * -5.17525810794326171162538230419158935546875e+4 + tmp219 * tmp210 * tmp268 * -8.1499322304497427467140369117259979248046875e+3 + tmp226 * tmp217 * tmp268 * 7.7733723892777788933017291128635406494140625e+3 + tmp233 * tmp224 * tmp268 * -2.143225547523337809252552688121795654296875e+3 + tmp240 * tmp231 * tmp268 * -8.7049279990650347826885990798473358154296875e+2 + tmp247 * tmp238 * tmp268 * 3.0833041233127761415744316764175891876220703125e+2 + tmp245 * tmp268 * -2.86594246589304226802141783991828560829162597656e+1 * z + tmp252 * tmp268 * 1.15628609452422050907216544146649539470672607422e+1 + tmp3 * tmp268 * 2.530432411832947536822757683694362640380859375e+1 + tmp16 * tmp457 * 1.3205680909865186549723148345947265625e+3 * y + tmp23 * tmp7 * tmp457 * -6.072741419595380648388527333736419677734375e+3 + tmp30 * tmp14 * tmp457 * 1.4301229031810655214940197765827178955078125e+4 + tmp37 * tmp21 * tmp457 * 1.2509849814464205337571911513805389404296875e+4 + tmp44 
* tmp28 * tmp457 * 2.43755655239219777286052703857421875e+4 + tmp51 * tmp35 * tmp457 * 1.5025955822637255187146365642547607421875e+5 + tmp58 * tmp42 * tmp457 * -2.57449792538532870821654796600341796875e+5 + tmp65 * tmp49 * tmp457 * -6.18108468636372243054211139678955078125e+5 + tmp72 * tmp56 * tmp457 * -5.77129579276848933659493923187255859375e+5 + tmp79 * tmp63 * tmp457 * 8.2502991879217163659632205963134765625e+5 + tmp86 * tmp70 * tmp457 * -3.3274662617215062491595745086669921875e+6 + tmp93 * tmp77 * tmp457 * 6.39019438752098591066896915435791015625e+5 + tmp100 * tmp84 * tmp457 * -3.5095450540453977882862091064453125e+6 + tmp107 * tmp91 * tmp457 * -5.701980742367389611899852752685546875e+6 + tmp114 * tmp98 * tmp457 * 8.48527840505857206881046295166015625e+6 + tmp121 * tmp105 * tmp457 * 3.2467750119913811795413494110107421875e+6 + tmp128 * tmp112 * tmp457 * 2.1212157989888186566531658172607421875e+6 + tmp135 * tmp119 * tmp457 * 6.030377525842911563813686370849609375e+6 + tmp142 * tmp126 * tmp457 * -8.838882032796226441860198974609375e+6 + tmp149 * tmp133 * tmp457 * -2.08285087554152193479239940643310546875e+6 + tmp156 * tmp140 * tmp457 * 2.2503529974754941649734973907470703125e+6 + tmp163 * tmp147 * tmp457 * -6.995801159220845438539981842041015625e+6 + tmp170 * tmp154 * tmp457 * 6.716210355322583578526973724365234375e+6 + tmp177 * tmp161 * tmp457 * 1.19912664452435608836822211742401123046875e+5 + tmp184 * tmp168 * tmp457 * 1.17548020877087931148707866668701171875e+6 + tmp191 * tmp175 * tmp457 * -9.4537417097875251783989369869232177734375e+4 + tmp198 * tmp182 * tmp457 * 7.89964485756713547743856906890869140625e+5 + tmp205 * tmp189 * tmp457 * 1.52741514544914476573467254638671875e+5 + tmp212 * tmp196 * tmp457 * 2.791946326383915147744119167327880859375e+5 + tmp219 * tmp203 * tmp457 * -2.679505212665906219626776874065399169921875e+4 + tmp226 * tmp210 * tmp457 * -3.6525730859511895687319338321685791015625e+4 + tmp233 * tmp217 * tmp457 * 
2.1418943770332829444669187068939208984375e+4 + tmp240 * tmp224 * tmp457 * 3.0843383887098834748030640184879302978515625e+3 + tmp247 * tmp231 * tmp457 * -9.9569611795820310362614691257476806640625e+2 + tmp238 * tmp457 * 2.564511516465935301312129013240337371826171875e+2 * z + tmp245 * tmp457 * 2.70656003026684537360324611654505133628845214844e+1 + tmp9 * tmp457 * -3.46369109036699356352073664311319589614868164062e+1 + tmp23 * tmp641 * -3.32806485927452058604103513062000274658203125e+3 * y + tmp30 * tmp7 * tmp641 * 1.234968261707164128893055021762847900390625e+4 + tmp37 * tmp14 * tmp641 * -2.8753344016540040684049017727375030517578125e+3 + tmp44 * tmp21 * tmp641 * 1.0114036156335461782873608171939849853515625e+4 + tmp51 * tmp28 * tmp641 * -1.49688347647457034327089786529541015625e+5 + tmp58 * tmp35 * tmp641 * -5.67623374289566534571349620819091796875e+5 + tmp65 * tmp42 * tmp641 * -4.42819365904183243401348590850830078125e+5 + tmp72 * tmp49 * tmp641 * -2.2845012135416171513497829437255859375e+6 + tmp79 * tmp56 * tmp641 * 1.59017671147860283963382244110107421875e+6 + tmp86 * tmp63 * tmp641 * 3.172225132318005780689418315887451171875e+5 + tmp93 * tmp70 * tmp641 * 6.949452887683830223977565765380859375e+6 + tmp100 * tmp77 * tmp641 * 2.005212832816918194293975830078125e+7 + tmp107 * tmp84 * tmp641 * -1.70683697845189571380615234375e+7 + tmp114 * tmp91 * tmp641 * 2.57033030682088024914264678955078125e+7 + tmp121 * tmp98 * tmp641 * -6.240241324918039143085479736328125e+6 + tmp128 * tmp105 * tmp641 * 7.28192108448500744998455047607421875e+6 + tmp135 * tmp112 * tmp641 * -6.668553789195828139781951904296875e+6 + tmp142 * tmp119 * tmp641 * 4.279670435295154340565204620361328125e+6 + tmp149 * tmp126 * tmp641 * 3.7841761433819212019443511962890625e+7 + tmp156 * tmp133 * tmp641 * -1.1735384500224292278289794921875e+7 + tmp163 * tmp140 * tmp641 * -1.02138700657811500132083892822265625e+7 + tmp170 * tmp147 * tmp641 * 4.0497243350173835642635822296142578125e+6 + tmp177 * tmp154 * 
tmp641 * -9.26496405551051162183284759521484375e+6 + tmp184 * tmp161 * tmp641 * 6.0643515210227929055690765380859375e+6 + tmp191 * tmp168 * tmp641 * 7.53476951245888951234519481658935546875e+5 + tmp198 * tmp175 * tmp641 * -4.46591788140458636917173862457275390625e+5 + tmp205 * tmp182 * tmp641 * -2.3236266487165386206470429897308349609375e+5 + tmp212 * tmp189 * tmp641 * 7.44194054349235841073095798492431640625e+5 + tmp219 * tmp196 * tmp641 * -1.075418140586107256240211427211761474609375e+4 + tmp226 * tmp203 * tmp641 * -6.514834348431314765548449940979480743408203125e+2 + tmp233 * tmp210 * tmp641 * 2.995416853417091260780580341815948486328125e+4 + tmp240 * tmp217 * tmp641 * 9.335514683708748862045467831194400787353515625e+2 + tmp247 * tmp224 * tmp641 * -3.9295324078555941014201380312442779541015625e+3 + tmp231 * tmp641 * 5.562684841171861762632033787667751312255859375e+2 * z + tmp238 * tmp641 * 4.0098599791658301683128229342401027679443359375e+1 + tmp16 * tmp641 * 1.722368240309973543844535015523433685302734375e+2 + tmp30 * tmp820 * 1.1062878068190211706678383052349090576171875e+3 * y + tmp37 * tmp7 * tmp820 * 3.251567569670028387918137013912200927734375e+4 + tmp44 * tmp14 * tmp820 * -5.1560599597941718457150273025035858154296875e+3 + tmp51 * tmp21 * tmp820 * 5.023688870652744662947952747344970703125e+4 + tmp58 * tmp28 * tmp820 * -3.133724041621834112447686493396759033203125e+4 + tmp65 * tmp35 * tmp820 * 6.0302757396407960914075374603271484375e+5 + tmp72 * tmp42 * tmp820 * -1.05701377140930178575217723846435546875e+6 + tmp79 * tmp49 * tmp820 * 1.4157320613848813809454441070556640625e+6 + tmp86 * tmp56 * tmp820 * -3.3873541540618874132633209228515625e+6 + tmp93 * tmp63 * tmp820 * 1.203755469354635290801525115966796875e+7 + tmp100 * tmp70 * tmp820 * -9.313967591197453439235687255859375e+6 + tmp107 * tmp77 * tmp820 * -2.5084943886144324205815792083740234375e+6 + tmp114 * tmp84 * tmp820 * 1.231539372972822375595569610595703125e+7 + tmp121 * tmp91 * tmp820 * 
1.37443684359668679535388946533203125e+7 + tmp128 * tmp98 * tmp820 * -2.0392658207672379910945892333984375e+7 + tmp135 * tmp105 * tmp820 * 1.16408645810100026428699493408203125e+7 + tmp142 * tmp112 * tmp820 * 1.66728234309127293527126312255859375e+7 + tmp149 * tmp119 * tmp820 * -1.32349803357985951006412506103515625e+7 + tmp156 * tmp126 * tmp820 * 1.011935817535785399377346038818359375e+7 + tmp163 * tmp133 * tmp820 * -3.6625153269577123224735260009765625e+7 + tmp170 * tmp140 * tmp820 * -1.62270849433632194995880126953125e+6 + tmp177 * tmp147 * tmp820 * -1.41072644445291124284267425537109375e+7 + tmp184 * tmp154 * tmp820 * 1.15812868076924490742385387420654296875e+6 + tmp191 * tmp161 * tmp820 * -3.140711580294744111597537994384765625e+6 + tmp198 * tmp168 * tmp820 * -6.269764561109821312129497528076171875e+6 + tmp205 * tmp175 * tmp820 * -2.2402100653061470948159694671630859375e+6 + tmp212 * tmp182 * tmp820 * 2.16854364765677414834499359130859375e+6 + tmp219 * tmp189 * tmp820 * -6.62552598277222947217524051666259765625e+5 + tmp226 * tmp196 * tmp820 * 1.09954907817221595905721187591552734375e+5 + tmp233 * tmp203 * tmp820 * 5.11132941899898578412830829620361328125e+4 + tmp240 * tmp210 * tmp820 * -1.975986489228717982769012451171875e+4 + tmp247 * tmp217 * tmp820 * -1.47082028405166101947543211281299591064453125e+3 + tmp224 * tmp820 * 6.996020510731731292253243736922740936279296875e+2 * z + tmp231 * tmp820 * -6.08214109731458023588857031427323818206787109375e+1 + tmp23 * tmp820 * -1.593195222480677557541639544069766998291015625e+3 + tmp37 * tmp994 * -1.4824974043079730108729563653469085693359375e+4 * y + tmp44 * tmp7 * tmp994 * -2.122104544337373590678907930850982666015625e+4 + tmp51 * tmp14 * tmp994 * 2.12153040344621869735419750213623046875e+5 + tmp58 * tmp21 * tmp994 * 4.9715857901374087668955326080322265625e+5 + tmp65 * tmp28 * tmp994 * 7.24156301083912025205790996551513671875e+5 + tmp72 * tmp35 * tmp994 * 9.6109271284611173905432224273681640625e+5 + tmp79 * tmp42 * 
tmp994 * 3.1879956945974607951939105987548828125e+6 + tmp86 * tmp49 * tmp994 * 3.67518356809010542929172515869140625e+6 + tmp93 * tmp56 * tmp994 * 5.9706666052485667169094085693359375e+6 + tmp100 * tmp63 * tmp994 * -4.637464374532920308411121368408203125e+6 + tmp107 * tmp70 * tmp994 * 3.8143604915147304534912109375e+7 + tmp114 * tmp77 * tmp994 * -2.47901225821007005870342254638671875e+7 + tmp121 * tmp84 * tmp994 * 6.444626674287511408329010009765625e+7 + tmp128 * tmp91 * tmp994 * 4.502534575005705654621124267578125e+7 + tmp135 * tmp98 * tmp994 * 6.757853017015595734119415283203125e+7 + tmp142 * tmp105 * tmp994 * -1.480053742969308234751224517822265625e+7 + tmp149 * tmp112 * tmp994 * 5.290692565359492599964141845703125e+7 + tmp156 * tmp119 * tmp994 * 4.3287289755464904010295867919921875e+7 + tmp163 * tmp126 * tmp994 * 8.70799907827146053314208984375e+7 + tmp170 * tmp133 * tmp994 * -1.9175662664391241967678070068359375e+7 + tmp177 * tmp140 * tmp994 * -2.86826938508348129689693450927734375e+7 + tmp184 * tmp147 * tmp994 * -2.1875272203575193881988525390625e+7 + tmp191 * tmp154 * tmp994 * -8.71044591558253206312656402587890625e+6 + tmp198 * tmp161 * tmp994 * -8.256123433752777986228466033935546875e+6 + tmp205 * tmp168 * tmp994 * -6.41929563034610691829584538936614990234375e+4 + tmp212 * tmp175 * tmp994 * -1.089041725969471037387847900390625e+6 + tmp219 * tmp182 * tmp994 * 1.3808361243931539356708526611328125e+6 + tmp226 * tmp189 * tmp994 * 3.4075661280863615684211254119873046875e+5 + tmp233 * tmp196 * tmp994 * 1.326406940819893134175799787044525146484375e+4 + tmp240 * tmp203 * tmp994 * 3.2914888870737791876308619976043701171875e+4 + tmp247 * tmp210 * tmp994 * -1.75309104671274035354144871234893798828125e+4 + tmp217 * tmp994 * 1.739025196774369760532863438129425048828125e+3 * z + tmp224 * tmp994 * 8.98714279806095959202139056287705898284912109375e+1 + tmp30 * tmp994 * -3.3488540726757861420992412604391574859619140625e+2 + tmp44 * tmp1163 * 
2.568602901142760310904122889041900634765625e+4 * y + tmp51 * tmp7 * tmp1163 * -6.39910687152194077498279511928558349609375e+4 + tmp58 * tmp14 * tmp1163 * 9.0083097099733888171613216400146484375e+4 + tmp65 * tmp21 * tmp1163 * -9.484217037814422510564327239990234375e+5 + tmp72 * tmp28 * tmp1163 * -3.600075980834834277629852294921875e+6 + tmp79 * tmp35 * tmp1163 * 1.3657137534186341799795627593994140625e+6 + tmp86 * tmp42 * tmp1163 * -5.672421984374326653778553009033203125e+6 + tmp93 * tmp49 * tmp1163 * 8.083055848013154231011867523193359375e+6 + tmp100 * tmp56 * tmp1163 * -2.12083704075715839862823486328125e+7 + tmp107 * tmp63 * tmp1163 * 3.64002259584418833255767822265625e+7 + tmp114 * tmp70 * tmp1163 * 2.584474493634389340877532958984375e+7 + tmp121 * tmp77 * tmp1163 * 7.82170241000406742095947265625e+7 + tmp128 * tmp84 * tmp1163 * 1.2600267222192929685115814208984375e+8 + tmp135 * tmp91 * tmp1163 * 6.67959145687679946422576904296875e+7 + tmp142 * tmp98 * tmp1163 * -8.61548507725602947175502777099609375e+6 + tmp149 * tmp105 * tmp1163 * 4.307888321512959897518157958984375e+7 + tmp156 * tmp112 * tmp1163 * 9.947022469961284101009368896484375e+7 + tmp163 * tmp119 * tmp1163 * -5.27523920739738941192626953125e+7 + tmp170 * tmp126 * tmp1163 * 5.9976018798557378351688385009765625e+7 + tmp177 * tmp133 * tmp1163 * 1.208818361157749406993389129638671875e+7 + tmp184 * tmp140 * tmp1163 * -3.9161991798560507595539093017578125e+6 + tmp191 * tmp147 * tmp1163 * -1.33758822220621886663138866424560546875e+6 + tmp198 * tmp154 * tmp1163 * 3.9807503596418728120625019073486328125e+6 + tmp205 * tmp161 * tmp1163 * 2.6472385677292346954345703125e+6 + tmp212 * tmp168 * tmp1163 * -3.7567882296312092803418636322021484375e+6 + tmp219 * tmp175 * tmp1163 * 8.8227231825006823055446147918701171875e+5 + tmp226 * tmp182 * tmp1163 * 6.4713598971529048867523670196533203125e+5 + tmp233 * tmp189 * tmp1163 * 7.9470694912795021082274615764617919921875e+4 + tmp240 * tmp196 * tmp1163 * 
-7.8606846475083220866508781909942626953125e+4 + tmp247 * tmp203 * tmp1163 * 2.4377184989274406689219176769256591796875e+4 + tmp210 * tmp1163 * -4.840576578522581257857382297515869140625e+3 * z + tmp217 * tmp1163 * -6.651144870158286721562035381793975830078125e+2 + tmp37 * tmp1163 * -2.16338941384806958012632094323635101318359375e+3 + tmp51 * tmp1327 * 3.4580778344412290607579052448272705078125e+4 * y + tmp58 * tmp7 * tmp1327 * -6.67006272528452682308852672576904296875e+4 + tmp65 * tmp14 * tmp1327 * -2.813216282899986836127936840057373046875e+5 + tmp72 * tmp21 * tmp1327 * 1.600366882563983090221881866455078125e+6 + tmp79 * tmp28 * tmp1327 * -4.651572890973095782101154327392578125e+6 + tmp86 * tmp35 * tmp1327 * -4.635367576471083797514438629150390625e+6 + tmp93 * tmp42 * tmp1327 * 1.853141126211662590503692626953125e+7 + tmp100 * tmp49 * tmp1327 * -8.26175553112882305867969989776611328125e+5 + tmp107 * tmp56 * tmp1327 * -5.6065551580552704632282257080078125e+7 + tmp114 * tmp63 * tmp1327 * 4.477918015893580019474029541015625e+7 + tmp121 * tmp70 * tmp1327 * 3.033503298015733063220977783203125e+7 + tmp128 * tmp77 * tmp1327 * -4.54599329982551634311676025390625e+7 + tmp135 * tmp84 * tmp1327 * 1.7184725321057498455047607421875e+8 + tmp142 * tmp91 * tmp1327 * -4.1228013490049801766872406005859375e+7 + tmp149 * tmp98 * tmp1327 * -1.48599943568904860876500606536865234375e+6 + tmp156 * tmp105 * tmp1327 * -1.640394065018566548824310302734375e+8 + tmp163 * tmp112 * tmp1327 * -1.50231541055843651294708251953125e+8 + tmp170 * tmp119 * tmp1327 * 5.51311634158718883991241455078125e+7 + tmp177 * tmp126 * tmp1327 * -4.60967007123934924602508544921875e+7 + tmp184 * tmp133 * tmp1327 * 1.720812180032856762409210205078125e+6 + tmp191 * tmp140 * tmp1327 * 2.3425050308803081512451171875e+7 + tmp198 * tmp147 * tmp1327 * -2.7573650323011361062526702880859375e+7 + tmp205 * tmp154 * tmp1327 * 1.93565412163910232484340667724609375e+7 + tmp212 * tmp161 * tmp1327 * 
-4.656490794350852258503437042236328125e+6 + tmp219 * tmp168 * tmp1327 * -1.13067003847309318371117115020751953125e+6 + tmp226 * tmp175 * tmp1327 * -1.7996660977921369485557079315185546875e+6 + tmp233 * tmp182 * tmp1327 * 7.72823332619267632253468036651611328125e+5 + tmp240 * tmp189 * tmp1327 * -1.18684576869108741448144428431987762451171875e+3 + tmp247 * tmp196 * tmp1327 * 4.209293110268269083462655544281005859375e+4 + tmp203 * tmp1327 * 2.53757331467094409163109958171844482421875e+4 * z + tmp210 * tmp1327 * -1.27419822955720928803202696144580841064453125e+3 + tmp44 * tmp1327 * 1.416090525239873750251717865467071533203125e+4 + tmp58 * tmp1486 * 6.2245545194429301773197948932647705078125e+4 * y + tmp65 * tmp7 * tmp1486 * -2.2082292848679612507112324237823486328125e+5 + tmp72 * tmp14 * tmp1486 * 3.124014111098635839880444109439849853515625e+4 + tmp79 * tmp21 * tmp1486 * -6.25758416791016049683094024658203125e+6 + tmp86 * tmp28 * tmp1486 * 1.7897089838210845482535660266876220703125e+5 + tmp93 * tmp35 * tmp1486 * 9.44042397158017195761203765869140625e+6 + tmp100 * tmp42 * tmp1486 * 1.508565758663363754749298095703125e+7 + tmp107 * tmp49 * tmp1486 * -5.3818888675383813679218292236328125e+7 + tmp114 * tmp56 * tmp1486 * -2.06950398579259105026721954345703125e+7 + tmp121 * tmp63 * tmp1486 * -1.5811462105617272853851318359375e+8 + tmp128 * tmp70 * tmp1486 * -2.537457552539723813533782958984375e+8 + tmp135 * tmp77 * tmp1486 * -1.3200169321094192564487457275390625e+7 + tmp142 * tmp84 * tmp1486 * -1.331086251372104585170745849609375e+8 + tmp149 * tmp91 * tmp1486 * -2.7587135367326819896697998046875e+8 + tmp156 * tmp98 * tmp1486 * -5.9600140417412407696247100830078125e+7 + tmp163 * tmp105 * tmp1486 * -4.993067045644967257976531982421875e+7 + tmp170 * tmp112 * tmp1486 * -4.263504846351540088653564453125e+7 + tmp177 * tmp119 * tmp1486 * 1.62158858565548837184906005859375e+8 + tmp184 * tmp126 * tmp1486 * 5.54861279418977908790111541748046875e+6 + tmp191 * tmp133 * tmp1486 * 
-9.22789402370282113552093505859375e+7 + tmp198 * tmp140 * tmp1486 * -4.997240008979074656963348388671875e+7 + tmp205 * tmp147 * tmp1486 * 1.80041542236631549894809722900390625e+7 + tmp212 * tmp154 * tmp1486 * 1.67892701110942661762237548828125e+7 + tmp219 * tmp161 * tmp1486 * 5.817362431911484338343143463134765625e+6 + tmp226 * tmp168 * tmp1486 * 1.24827960727365291677415370941162109375e+6 + tmp233 * tmp175 * tmp1486 * 4.118617833759034983813762664794921875e+5 + tmp240 * tmp182 * tmp1486 * -1.7956024484510053298436105251312255859375e+5 + tmp247 * tmp189 * tmp1486 * 2.190258133842541719786822795867919921875e+5 + tmp196 * tmp1486 * 3.4973117752789097721688449382781982421875e+4 * z + tmp203 * tmp1486 * -4.9817151344862040787120349705219268798828125e+3 + tmp51 * tmp1486 * -2.636649145558089003316126763820648193359375e+4 + tmp65 * tmp1640 * 5.57694030614999282988719642162322998046875e+4 * y + tmp72 * tmp7 * tmp1640 * 7.506203535231878049671649932861328125e+5 + tmp79 * tmp14 * tmp1640 * -6.65316567910718731582164764404296875e+5 + tmp86 * tmp21 * tmp1640 * -5.73123900796337611973285675048828125e+6 + tmp93 * tmp28 * tmp1640 * -1.60242270754070021212100982666015625e+7 + tmp100 * tmp35 * tmp1640 * -1.23835784345802031457424163818359375e+7 + tmp107 * tmp42 * tmp1640 * -3.103722012892372906208038330078125e+7 + tmp114 * tmp49 * tmp1640 * -8.755555659181118011474609375e+6 + tmp121 * tmp56 * tmp1640 * -5.2931458524988718330860137939453125e+7 + tmp128 * tmp63 * tmp1640 * 9.716058526160360872745513916015625e+7 + tmp135 * tmp70 * tmp1640 * 2.077633122112773954868316650390625e+8 + tmp142 * tmp77 * tmp1640 * 9.805324207639189064502716064453125e+7 + tmp149 * tmp84 * tmp1640 * -8.176873865114526450634002685546875e+7 + tmp156 * tmp91 * tmp1640 * 2.043620769532851874828338623046875e+8 + tmp163 * tmp98 * tmp1640 * -3.25460571152403056621551513671875e+8 + tmp170 * tmp105 * tmp1640 * 1.98919900744407832622528076171875e+8 + tmp177 * tmp112 * tmp1640 * 1.20987061431789398193359375e+7 + tmp184 
* tmp119 * tmp1640 * 1.945936166237996518611907958984375e+7 + tmp191 * tmp126 * tmp1640 * -9.1809528345345020294189453125e+7 + tmp198 * tmp133 * tmp1640 * 8.7596486608074724674224853515625e+7 + tmp205 * tmp140 * tmp1640 * 5.667884081580150127410888671875e+7 + tmp212 * tmp147 * tmp1640 * 5.332579924994421191513538360595703125e+6 + tmp219 * tmp154 * tmp1640 * 1.375029893630347959697246551513671875e+7 + tmp226 * tmp161 * tmp1640 * 6.672319392773433588445186614990234375e+6 + tmp233 * tmp168 * tmp1640 * 2.479340224603985436260700225830078125e+6 + tmp240 * tmp175 * tmp1640 * 2.611227488589480708469636738300323486328125e+4 + tmp247 * tmp182 * tmp1640 * 1.22295237010598604683764278888702392578125e+5 + tmp189 * tmp1640 * 1.0393079596052661145222373306751251220703125e+4 * z + tmp196 * tmp1640 * -2.81537782232619383648852817714214324951171875e+3 + tmp58 * tmp1640 * 2.121729458680083553190343081951141357421875e+4 + tmp72 * tmp1789 * -1.5582822445429078652523458003997802734375e+5 * y + tmp79 * tmp7 * tmp1789 * -8.5014462077487216447480022907257080078125e+4 + tmp86 * tmp14 * tmp1789 * -2.564084819367307238280773162841796875e+6 + tmp93 * tmp21 * tmp1789 * -4.8959490236534662544727325439453125e+6 + tmp100 * tmp28 * tmp1789 * 9.09271001997027732431888580322265625e+6 + tmp107 * tmp35 * tmp1789 * -1.184030747874318063259124755859375e+7 + tmp114 * tmp42 * tmp1789 * 7.313609503410560078918933868408203125e+6 + tmp121 * tmp49 * tmp1789 * 4.3130171236253045499324798583984375e+7 + tmp128 * tmp56 * tmp1789 * 1.61642042481128990650177001953125e+8 + tmp135 * tmp63 * tmp1789 * 5.4773132097419001162052154541015625e+7 + tmp142 * tmp70 * tmp1789 * -1.1192463713859331607818603515625e+8 + tmp149 * tmp77 * tmp1789 * 1.835571215001440346240997314453125e+8 + tmp156 * tmp84 * tmp1789 * 1.8349478056333744525909423828125e+8 + tmp163 * tmp91 * tmp1789 * 3.1267025802735745906829833984375e+8 + tmp170 * tmp98 * tmp1789 * -1.8558052356652104854583740234375e+8 + tmp177 * tmp105 * tmp1789 * 
2.1026138130288355052471160888671875e+7 + tmp184 * tmp112 * tmp1789 * -6.034970737308229506015777587890625e+7 + tmp191 * tmp119 * tmp1789 * 1.4345592741761243087239563465118408203125e+5 + tmp198 * tmp126 * tmp1789 * -5.2055970324661791324615478515625e+7 + tmp205 * tmp133 * tmp1789 * 3.4640618597813777625560760498046875e+7 + tmp212 * tmp140 * tmp1789 * 2.54905641133936941623687744140625e+7 + tmp219 * tmp147 * tmp1789 * 1.009492445635469257831573486328125e+7 + tmp226 * tmp154 * tmp1789 * 1.29918910747458450496196746826171875e+7 + tmp233 * tmp161 * tmp1789 * 1.4599689552738177590072154998779296875e+6 + tmp240 * tmp168 * tmp1789 * 3.803518739046299015171825885772705078125e+5 + tmp247 * tmp175 * tmp1789 * -4.9696515768544240927440114319324493408203125e+3 + tmp182 * tmp1789 * 1.879712141416647864389233291149139404296875e+4 * z + tmp189 * tmp1789 * 6.082043117109244121820665895938873291015625e+3 + tmp65 * tmp1789 * 8.2712639573318578186444938182830810546875e+4 + tmp79 * tmp1933 * 2.666949476320579997263848781585693359375e+5 * y + tmp86 * tmp7 * tmp1933 * 1.4226227042609662748873233795166015625e+6 + tmp93 * tmp14 * tmp1933 * 3.2924379865826624445617198944091796875e+6 + tmp100 * tmp21 * tmp1933 * -2.248118221933196298778057098388671875e+6 + tmp107 * tmp28 * tmp1933 * 3.455771416940818727016448974609375e+7 + tmp114 * tmp35 * tmp1933 * 8.233402167024926282465457916259765625e+6 + tmp121 * tmp42 * tmp1933 * -3.461033149931831657886505126953125e+7 + tmp128 * tmp49 * tmp1933 * 5.4086363262705214321613311767578125e+7 + tmp135 * tmp56 * tmp1933 * -2.547066871103002130985260009765625e+7 + tmp142 * tmp63 * tmp1933 * -1.683588524535671770572662353515625e+8 + tmp149 * tmp70 * tmp1933 * -1.95119007163369238376617431640625e+8 + tmp156 * tmp77 * tmp1933 * 3.8535509928844535350799560546875e+8 + tmp163 * tmp84 * tmp1933 * 2.56480834877651222050189971923828125e+7 + tmp170 * tmp91 * tmp1933 * 3.1291794804183743894100189208984375e+7 + tmp177 * tmp98 * tmp1933 * 
-9.80228578551062643527984619140625e+7 + tmp184 * tmp105 * tmp1933 * 2.123474623355538845062255859375e+8 + tmp191 * tmp112 * tmp1933 * -1.528956090408284403383731842041015625e+7 + tmp198 * tmp119 * tmp1933 * 8.5850724279107272624969482421875e+7 + tmp205 * tmp126 * tmp1933 * -2.84909180857491232454776763916015625e+7 + tmp212 * tmp133 * tmp1933 * -4.691959027145697176456451416015625e+7 + tmp219 * tmp140 * tmp1933 * 6.0141831040819920599460601806640625e+6 + tmp226 * tmp147 * tmp1933 * -5.2106887149818730540573596954345703125e+5 + tmp233 * tmp154 * tmp1933 * 2.7952842278834707103669643402099609375e+6 + tmp240 * tmp161 * tmp1933 * -2.88638275911270291544497013092041015625e+5 + tmp247 * tmp168 * tmp1933 * 4.457745123987680417485535144805908203125e+5 + tmp175 * tmp1933 * 5.9065315189619708689860999584197998046875e+4 * z + tmp182 * tmp1933 * -1.2760543873188800716889090836048126220703125e+4 + tmp72 * tmp1933 * -5.9350806509691683459095656871795654296875e+4 + tmp86 * tmp2072 * -4.37076535879798117093741893768310546875e+5 * y + tmp93 * tmp7 * tmp2072 * 6.21225273702624253928661346435546875e+5 + tmp100 * tmp14 * tmp2072 * 1.34474278229182795621454715728759765625e+6 + tmp107 * tmp21 * tmp2072 * 8.64706747676239907741546630859375e+6 + tmp114 * tmp28 * tmp2072 * 7.12180358605618588626384735107421875e+6 + tmp121 * tmp35 * tmp2072 * 3.827497848354625701904296875e+7 + tmp128 * tmp42 * tmp2072 * -1.742001978288175165653228759765625e+8 + tmp135 * tmp49 * tmp2072 * -4.54740322609897553920745849609375e+7 + tmp142 * tmp56 * tmp2072 * 1.970099033694564402103424072265625e+8 + tmp149 * tmp63 * tmp2072 * 1.63699950393453948199748992919921875e+7 + tmp156 * tmp70 * tmp2072 * 3.551953377226607799530029296875e+8 + tmp163 * tmp77 * tmp2072 * 5.8535899101732797920703887939453125e+7 + tmp170 * tmp84 * tmp2072 * -3.691813720367423258721828460693359375e+6 + tmp177 * tmp91 * tmp2072 * 3.475685091003601253032684326171875e+7 + tmp184 * tmp98 * tmp2072 * -6.9688461877361774444580078125e+7 + tmp191 * 
tmp105 * tmp2072 * 3.31334713522325456142425537109375e+7 + tmp198 * tmp112 * tmp2072 * 1.516847511844202578067779541015625e+8 + tmp205 * tmp119 * tmp2072 * 5.761894228159494698047637939453125e+6 + tmp212 * tmp126 * tmp2072 * 1.45572282166549004614353179931640625e+6 + tmp219 * tmp133 * tmp2072 * 9.61802703536680154502391815185546875e+6 + tmp226 * tmp140 * tmp2072 * -2.3219634096620096825063228607177734375e+6 + tmp233 * tmp147 * tmp2072 * 1.73072555791297950781881809234619140625e+6 + tmp240 * tmp154 * tmp2072 * -1.18718132343210768885910511016845703125e+6 + tmp247 * tmp161 * tmp2072 * 7.8965165672725089825689792633056640625e+5 + tmp168 * tmp2072 * -3.34464620756417396478354930877685546875e+5 * z + tmp175 * tmp2072 * -7.010083325011617489508353173732757568359375e+3 + tmp79 * tmp2072 * 1.4982720565705254557542502880096435546875e+5 + tmp93 * tmp2206 * 1.09561273699354263953864574432373046875e+6 * y + tmp100 * tmp7 * tmp2206 * 5.653165521158934570848941802978515625e+5 + tmp107 * tmp14 * tmp2206 * -5.08627886824323050677776336669921875e+6 + tmp114 * tmp21 * tmp2206 * 4.619561523487622849643230438232421875e+6 + tmp121 * tmp28 * tmp2206 * -3.2498529335082940757274627685546875e+7 + tmp128 * tmp35 * tmp2206 * 3.8627622660031211562454700469970703125e+6 + tmp135 * tmp42 * tmp2206 * -9.94951568371478617191314697265625e+7 + tmp142 * tmp49 * tmp2206 * -6.498758897924329340457916259765625e+7 + tmp149 * tmp56 * tmp2206 * -5.8249932622556559741497039794921875e+7 + tmp156 * tmp63 * tmp2206 * -2.4230648372675888240337371826171875e+6 + tmp163 * tmp70 * tmp2206 * 1.03731606469178497791290283203125e+8 + tmp170 * tmp77 * tmp2206 * -2.0427615738942849636077880859375e+8 + tmp177 * tmp84 * tmp2206 * -1.50748359051868617534637451171875e+8 + tmp184 * tmp91 * tmp2206 * 9.92026080833674967288970947265625e+7 + tmp191 * tmp98 * tmp2206 * 1.30632097315468527376651763916015625e+7 + tmp198 * tmp105 * tmp2206 * 3.01572003520688079297542572021484375e+7 + tmp205 * tmp112 * tmp2206 * 
-5.5203932730417378246784210205078125e+7 + tmp212 * tmp119 * tmp2206 * -2.6667973678420163691043853759765625e+7 + tmp219 * tmp126 * tmp2206 * -3.6285128474793575704097747802734375e+7 + tmp226 * tmp133 * tmp2206 * -8.4175394045858271420001983642578125e+6 + tmp233 * tmp140 * tmp2206 * 1.42712876962076015770435333251953125e+7 + tmp240 * tmp147 * tmp2206 * 2.395607660239310280303470790386199951171875e+4 + tmp247 * tmp154 * tmp2206 * 2.898329678472016821615397930145263671875e+5 + tmp161 * tmp2206 * -1.16374480384422335191629827022552490234375e+5 * z + tmp168 * tmp2206 * -3.93206288950740781729109585285186767578125e+4 + tmp86 * tmp2206 * 1.888966023686795379035174846649169921875e+5 + tmp100 * tmp2335 * -9.80892267126337974332273006439208984375e+4 * y + tmp107 * tmp7 * tmp2335 * -4.588705895706429728306829929351806640625e+5 + tmp114 * tmp14 * tmp2335 * -1.30630061991654336452484130859375e+7 + tmp121 * tmp21 * tmp2335 * 4.0679880029777748859487473964691162109375e+4 + tmp128 * tmp28 * tmp2335 * -3.63935602368953227996826171875e+7 + tmp135 * tmp35 * tmp2335 * 5.4386689414552085101604461669921875e+7 + tmp142 * tmp42 * tmp2335 * -2.81242624278098903596401214599609375e+7 + tmp149 * tmp49 * tmp2335 * -2.38920425068769566714763641357421875e+7 + tmp156 * tmp56 * tmp2335 * 3.7146625610067002475261688232421875e+7 + tmp163 * tmp63 * tmp2335 * 5.88534597320024013519287109375e+8 + tmp170 * tmp70 * tmp2335 * -1.260135513327358663082122802734375e+8 + tmp177 * tmp77 * tmp2335 * 2.87700037787031948566436767578125e+8 + tmp184 * tmp84 * tmp2335 * 3.014437467919366061687469482421875e+7 + tmp191 * tmp91 * tmp2335 * 2.7518071017046439647674560546875e+8 + tmp198 * tmp98 * tmp2335 * -8.687773782170189917087554931640625e+7 + tmp205 * tmp105 * tmp2335 * -2.4757954760902742855250835418701171875e+6 + tmp212 * tmp112 * tmp2335 * 2.2724469779710628092288970947265625e+7 + tmp219 * tmp119 * tmp2335 * -8.4175323631999827921390533447265625e+6 + tmp226 * tmp126 * tmp2335 * 
-2.02737718505742065608501434326171875e+7 + tmp233 * tmp133 * tmp2335 * -2.585732109079533256590366363525390625e+5 + tmp240 * tmp140 * tmp2335 * 6.2184272504521645605564117431640625e+6 + tmp247 * tmp147 * tmp2335 * -9.8543751923963683657348155975341796875e+5 + tmp154 * tmp2335 * 6.721201784509257413446903228759765625e+4 * z + tmp161 * tmp2335 * -4.47173183769296811078675091266632080078125e+4 + tmp93 * tmp2335 * 5.84211610529696699813939630985260009765625e+4 + tmp107 * tmp2459 * 8.45730708690018393099308013916015625e+5 * y + tmp114 * tmp7 * tmp2459 * -1.035580324632983305491507053375244140625e+6 + tmp121 * tmp14 * tmp2459 * -3.2716145148810571990907192230224609375e+6 + tmp128 * tmp21 * tmp2459 * 1.338619878564165718853473663330078125e+7 + tmp135 * tmp28 * tmp2459 * -4.01688388825332224369049072265625e+7 + tmp142 * tmp35 * tmp2459 * 6.1332127101156137883663177490234375e+7 + tmp149 * tmp42 * tmp2459 * 2.27466766361683197319507598876953125e+7 + tmp156 * tmp49 * tmp2459 * 2.79567729638895690441131591796875e+7 + tmp163 * tmp56 * tmp2459 * -1.44725133021199524402618408203125e+8 + tmp170 * tmp63 * tmp2459 * 2.95497035975821278989315032958984375e+7 + tmp177 * tmp70 * tmp2459 * -1.553165074406856000423431396484375e+8 + tmp184 * tmp77 * tmp2459 * 1.156018422933063507080078125e+8 + tmp191 * tmp84 * tmp2459 * -2.5698091917498958110809326171875e+8 + tmp198 * tmp91 * tmp2459 * -8.3217407145425856113433837890625e+7 + tmp205 * tmp98 * tmp2459 * 9.815282649565088748931884765625e+7 + tmp212 * tmp105 * tmp2459 * -2.88599067245211564004421234130859375e+7 + tmp219 * tmp112 * tmp2459 * -1.7150838800360523164272308349609375e+7 + tmp226 * tmp119 * tmp2459 * -4.7408591025969088077545166015625e+6 + tmp233 * tmp126 * tmp2459 * -1.91743101012959368526935577392578125e+7 + tmp240 * tmp133 * tmp2459 * 2.75301951399843394756317138671875e+6 + tmp247 * tmp140 * tmp2459 * 5.57324648742317338474094867706298828125e+5 + tmp147 * tmp2459 * -4.88813523280134540982544422149658203125e+5 * z + tmp154 * 
tmp2459 * 5.60432359195865647052414715290069580078125e+4 + tmp100 * tmp2459 * -6.420611867320121382363140583038330078125e+4 + tmp114 * tmp2578 * -1.16443655400782614015042781829833984375e+6 * y + tmp121 * tmp7 * tmp2578 * 9.75869323413391411304473876953125e+6 + tmp128 * tmp14 * tmp2578 * 7.8029716233189292252063751220703125e+6 + tmp135 * tmp21 * tmp2578 * -9.539327565418183803558349609375e+6 + tmp142 * tmp28 * tmp2578 * 4.500671201433275826275348663330078125e+6 + tmp149 * tmp35 * tmp2578 * -7.41270583640496730804443359375e+7 + tmp156 * tmp42 * tmp2578 * 3.20603720115968100726604461669921875e+7 + tmp163 * tmp49 * tmp2578 * 2.0797874982817940413951873779296875e+7 + tmp170 * tmp56 * tmp2578 * 1.920662750504035055637359619140625e+8 + tmp177 * tmp63 * tmp2578 * 2.60824116936004579067230224609375e+8 + tmp184 * tmp70 * tmp2578 * -2.07162406374793946743011474609375e+8 + tmp191 * tmp77 * tmp2578 * 1.7028196463528764247894287109375e+8 + tmp198 * tmp84 * tmp2578 * 4.8950080180530630052089691162109375e+7 + tmp205 * tmp91 * tmp2578 * -1.52098231474184580147266387939453125e+7 + tmp212 * tmp98 * tmp2578 * 5.6548313956749431788921356201171875e+7 + tmp219 * tmp105 * tmp2578 * 5.919939421895049512386322021484375e+7 + tmp226 * tmp112 * tmp2578 * 4.2763350111199297010898590087890625e+7 + tmp233 * tmp119 * tmp2578 * -2.46197850238000159151852130889892578125e+5 + tmp240 * tmp126 * tmp2578 * -5.546544308642183430492877960205078125e+6 + tmp247 * tmp133 * tmp2578 * 7.42925450487637077458202838897705078125e+5 + tmp140 * tmp2578 * -3.31937665064190514385700225830078125e+5 * z + tmp147 * tmp2578 * 2.766069662381142916274257004261016845703125e+4 + tmp107 * tmp2578 * 3.2983813843254186213016510009765625e+5 + tmp121 * tmp2692 * 3.0595618833062280900776386260986328125e+6 * y + tmp128 * tmp7 * tmp2692 * -2.144393227491213940083980560302734375e+6 + tmp135 * tmp14 * tmp2692 * -1.486583230402656085789203643798828125e+7 + tmp142 * tmp21 * tmp2692 * 1.5772441564001192455179989337921142578125e+5 + 
tmp149 * tmp28 * tmp2692 * -2.41103545138470493257045745849609375e+7 + tmp156 * tmp35 * tmp2692 * 5.899479885978206060826778411865234375e+6 + tmp163 * tmp42 * tmp2692 * -2.6430979763294376432895660400390625e+7 + tmp170 * tmp49 * tmp2692 * 5.133620760493158013559877872467041015625e+5 + tmp177 * tmp56 * tmp2692 * -9.37540943622738420963287353515625e+7 + tmp184 * tmp63 * tmp2692 * 1.85119906107368655502796173095703125e+7 + tmp191 * tmp70 * tmp2692 * -1.7615100093860842287540435791015625e+7 + tmp198 * tmp77 * tmp2692 * 3.6253904847083978354930877685546875e+7 + tmp205 * tmp84 * tmp2692 * 1.295719884057255089282989501953125e+8 + tmp212 * tmp91 * tmp2692 * -2.00803835253717899322509765625e+7 + tmp219 * tmp98 * tmp2692 * 5.3526329015579812228679656982421875e+7 + tmp226 * tmp105 * tmp2692 * -5.095322002082568593323230743408203125e+6 + tmp233 * tmp112 * tmp2692 * -5.359179434494399465620517730712890625e+6 + tmp240 * tmp119 * tmp2692 * 2.7676412322572679258882999420166015625e+6 + tmp247 * tmp126 * tmp2692 * 2.8785371270983857102692127227783203125e+6 + tmp133 * tmp2692 * -4.966306447809512610547244548797607421875e+5 * z + tmp140 * tmp2692 * -4.13705430211361599504016339778900146484375e+4 + tmp114 * tmp2692 * 1.653312923425719491206109523773193359375e+5 + tmp128 * tmp2801 * 7.96481968665814376436173915863037109375e+5 * y + tmp135 * tmp7 * tmp2801 * -1.28184011431963765062391757965087890625e+6 + tmp142 * tmp14 * tmp2801 * 3.739547609614540706388652324676513671875e+5 + tmp149 * tmp21 * tmp2801 * 2.212634053146792948246002197265625e+7 + tmp156 * tmp28 * tmp2801 * -5.60538157229077885858714580535888671875e+5 + tmp163 * tmp35 * tmp2801 * -2.5704425960932739078998565673828125e+7 + tmp170 * tmp42 * tmp2801 * -4.796751842378087341785430908203125e+7 + tmp177 * tmp49 * tmp2801 * 1.1452965290544028580188751220703125e+8 + tmp184 * tmp56 * tmp2801 * -8.557430825550214946269989013671875e+7 + tmp191 * tmp63 * tmp2801 * 7.166651077396799623966217041015625e+7 + tmp198 * tmp70 * tmp2801 * 
-3.168597039628877304494380950927734375e+6 + tmp205 * tmp77 * tmp2801 * 3.37717321003581769764423370361328125e+6 + tmp212 * tmp84 * tmp2801 * -2.91004992474741674959659576416015625e+7 + tmp219 * tmp91 * tmp2801 * -1.638337553455092199146747589111328125e+7 + tmp226 * tmp98 * tmp2801 * 1.24621004734718166291713714599609375e+7 + tmp233 * tmp105 * tmp2801 * 6.788069589686895720660686492919921875e+6 + tmp240 * tmp112 * tmp2801 * -6.34222029203966818749904632568359375e+6 + tmp247 * tmp119 * tmp2801 * 1.49785184677109098993241786956787109375e+6 + tmp126 * tmp2801 * -9.1405823712457739748060703277587890625e+5 * z + tmp133 * tmp2801 * 5.14689472374872420914471149444580078125e+4 + tmp121 * tmp2801 * 1.426149394206288852728903293609619140625e+5 + tmp135 * tmp2905 * 3.03609536294961930252611637115478515625e+5 * y + tmp142 * tmp7 * tmp2905 * -1.4714275790982604958117008209228515625e+6 + tmp149 * tmp14 * tmp2905 * -5.566759981338084675371646881103515625e+6 + tmp156 * tmp21 * tmp2905 * 2.3176940786889843642711639404296875e+7 + tmp163 * tmp28 * tmp2905 * 3.6069973592460402287542819976806640625e+6 + tmp170 * tmp35 * tmp2905 * 5.39247569399430532939732074737548828125e+5 + tmp177 * tmp42 * tmp2905 * -9.7824792022197246551513671875e+6 + tmp184 * tmp49 * tmp2905 * -1.945933786054898798465728759765625e+7 + tmp191 * tmp56 * tmp2905 * -1.077700385539690963923931121826171875e+7 + tmp198 * tmp63 * tmp2905 * 2.9433004530883781611919403076171875e+7 + tmp205 * tmp70 * tmp2905 * -6.730298381491740047931671142578125e+7 + tmp212 * tmp77 * tmp2905 * -3.928191222619201242923736572265625e+7 + tmp219 * tmp84 * tmp2905 * 2.6849644694305844604969024658203125e+7 + tmp226 * tmp91 * tmp2905 * 2.704867155703890323638916015625e+7 + tmp233 * tmp98 * tmp2905 * -6.78763992792464233934879302978515625e+6 + tmp240 * tmp105 * tmp2905 * 3.5959446982630081474781036376953125e+6 + tmp247 * tmp112 * tmp2905 * -4.497979745594228734262287616729736328125e+4 + tmp119 * tmp2905 * 
1.21103635326783594791777431964874267578125e+5 * z + tmp126 * tmp2905 * 3.20004822359698982836562208831310272216796875e+3 + tmp128 * tmp2905 * 1.809142044416526914574205875396728515625e+5 + tmp142 * tmp3004 * 7.62679539351840387098491191864013671875e+5 * y + tmp149 * tmp7 * tmp3004 * -4.866038028806217014789581298828125e+6 + tmp156 * tmp14 * tmp3004 * -1.63776057305473624728620052337646484375e+6 + tmp163 * tmp21 * tmp3004 * 7.11217500781370140612125396728515625e+6 + tmp170 * tmp28 * tmp3004 * -3.9978607764128334820270538330078125e+7 + tmp177 * tmp35 * tmp3004 * -2.47929526534062363207340240478515625e+7 + tmp184 * tmp42 * tmp3004 * 5.7877387789233028888702392578125e+7 + tmp191 * tmp49 * tmp3004 * 2.415442343246303498744964599609375e+7 + tmp198 * tmp56 * tmp3004 * -2.06969723439102955162525177001953125e+7 + tmp205 * tmp63 * tmp3004 * 1.78525538171035833656787872314453125e+7 + tmp212 * tmp70 * tmp3004 * 1.4009467444685436785221099853515625e+7 + tmp219 * tmp77 * tmp3004 * -2.91895625420843400061130523681640625e+7 + tmp226 * tmp84 * tmp3004 * -9.44744408039938099682331085205078125e+6 + tmp233 * tmp91 * tmp3004 * 8.043186692793383263051509857177734375e+5 + tmp240 * tmp98 * tmp3004 * 1.1014191633621859364211559295654296875e+6 + tmp247 * tmp105 * tmp3004 * 2.806176347862780094146728515625e+5 + tmp112 * tmp3004 * -2.5533889297865671687759459018707275390625e+5 * z + tmp119 * tmp3004 * 2.7430891344744231901131570339202880859375e+4 + tmp135 * tmp3004 * 1.06134195710988031351007521152496337890625e+5 + tmp149 * tmp3098 * 1.1463186221695042331703007221221923828125e+5 * y + tmp156 * tmp7 * tmp3098 * 3.34098046720228740014135837554931640625e+5 + tmp163 * tmp14 * tmp3098 * 1.4934738638681494630873203277587890625e+6 + tmp170 * tmp21 * tmp3098 * -5.099484923313385806977748870849609375e+6 + tmp177 * tmp28 * tmp3098 * -2.89505389009524248540401458740234375e+7 + tmp184 * tmp35 * tmp3098 * -1.024711136925701797008514404296875e+7 + tmp191 * tmp42 * tmp3098 * 
-1.889690480137197673320770263671875e+7 + tmp198 * tmp49 * tmp3098 * -6.779162954163896851241588592529296875e+6 + tmp205 * tmp56 * tmp3098 * -3.632387960473184287548065185546875e+7 + tmp212 * tmp63 * tmp3098 * -2.55563506650698594748973846435546875e+7 + tmp219 * tmp70 * tmp3098 * 2.120485573269045352935791015625e+7 + tmp226 * tmp77 * tmp3098 * 1.265800266697686351835727691650390625e+7 + tmp233 * tmp84 * tmp3098 * 2.7973069577973075211048126220703125e+6 + tmp240 * tmp91 * tmp3098 * -4.2064045706932060420513153076171875e+6 + tmp247 * tmp98 * tmp3098 * -2.89720523540850146673619747161865234375e+5 + tmp105 * tmp3098 * 5.91017071080561145208775997161865234375e+5 * z + tmp112 * tmp3098 * -1.9248215497334153042174875736236572265625e+4 + tmp142 * tmp3098 * -1.098764736653561121784150600433349609375e+5 + tmp156 * tmp3187 * -5.137771720746974460780620574951171875e+5 * y + tmp163 * tmp7 * tmp3187 * -1.966078069582206197082996368408203125e+6 + tmp170 * tmp14 * tmp3187 * 6.24776580437773279845714569091796875e+6 + tmp177 * tmp21 * tmp3187 * 2.416337206486933864653110504150390625e+6 + tmp184 * tmp28 * tmp3187 * -2.3176539925927552394568920135498046875e+6 + tmp191 * tmp35 * tmp3187 * -1.403261177437100745737552642822265625e+7 + tmp198 * tmp42 * tmp3187 * 1.4312325385542665608227252960205078125e+6 + tmp205 * tmp49 * tmp3187 * 3.9408041259764735586941242218017578125e+6 + tmp212 * tmp56 * tmp3187 * 2.73211694501934386789798736572265625e+7 + tmp219 * tmp63 * tmp3187 * 5.9111524814186431467533111572265625e+6 + tmp226 * tmp70 * tmp3187 * -6.770602485044692642986774444580078125e+6 + tmp233 * tmp77 * tmp3187 * -6.29087843741706199944019317626953125e+6 + tmp240 * tmp84 * tmp3187 * 2.300029976749385707080364227294921875e+6 + tmp247 * tmp91 * tmp3187 * 5.716050566138536669313907623291015625e+5 + tmp98 * tmp3187 * -1.0669314752172600128687918186187744140625e+5 * z + tmp105 * tmp3187 * -1.223132019274805134045891463756561279296875e+4 + tmp149 * tmp3187 * 
7.6460948540774334105663001537322998046875e+4 + tmp163 * tmp3271 * 4.2711688357697144965641200542449951171875e+4 * y + tmp170 * tmp7 * tmp3271 * 7.4079773240102943964302539825439453125e+5 + tmp177 * tmp14 * tmp3271 * -3.5754870884058983065187931060791015625e+6 + tmp184 * tmp21 * tmp3271 * 3.427145644821204245090484619140625e+6 + tmp191 * tmp28 * tmp3271 * -7.82998233624283969402313232421875e+6 + tmp198 * tmp35 * tmp3271 * -5.56519071248883567750453948974609375e+6 + tmp205 * tmp42 * tmp3271 * -2.451746062666709534823894500732421875e+6 + tmp212 * tmp49 * tmp3271 * 3.860770833344188518822193145751953125e+6 + tmp219 * tmp56 * tmp3271 * -3.3728278379827416501939296722412109375e+6 + tmp226 * tmp63 * tmp3271 * 2.2797554663294744677841663360595703125e+6 + tmp233 * tmp70 * tmp3271 * -3.8582958995827850885689258575439453125e+6 + tmp240 * tmp77 * tmp3271 * 5.56788241484340163879096508026123046875e+5 + tmp247 * tmp84 * tmp3271 * -4.376836840318930335342884063720703125e+5 + tmp91 * tmp3271 * 1.861945290110078640282154083251953125e+5 * z + tmp98 * tmp3271 * 3.5523566884319487144239246845245361328125e+4 + tmp156 * tmp3271 * -1.09171314424141004565171897411346435546875e+5 + tmp170 * tmp3350 * 5.44082330137627548538148403167724609375e+5 * y + tmp177 * tmp7 * tmp3350 * -1.26412620585219957865774631500244140625e+6 + tmp184 * tmp14 * tmp3350 * -8.68918582106443704105913639068603515625e+5 + tmp191 * tmp21 * tmp3350 * 1.463365755330618121661245822906494140625e+5 + tmp198 * tmp28 * tmp3350 * 3.2938731546351271681487560272216796875e+6 + tmp205 * tmp35 * tmp3350 * -1.8840158935666061006486415863037109375e+6 + tmp212 * tmp42 * tmp3350 * 5.350654957326852716505527496337890625e+6 + tmp219 * tmp49 * tmp3350 * 4.08686540659756958484649658203125e+6 + tmp226 * tmp56 * tmp3350 * 2.4612022012540991418063640594482421875e+6 + tmp233 * tmp63 * tmp3350 * -2.4538310317834909074008464813232421875e+5 + tmp240 * tmp70 * tmp3350 * 3.94916749895367189310491085052490234375e+5 + tmp247 * tmp77 * tmp3350 * 
6.9935071883391030132770538330078125e+5 + tmp84 * tmp3350 * -2.182382886054189657443203032016754150390625e+4 * z + tmp91 * tmp3350 * 1.24793611090912527288310229778289794921875e+4 + tmp163 * tmp3350 * 9.6716059967903347569517791271209716796875e+4 + tmp177 * tmp3424 * -3.473462570327022694982588291168212890625e+4 * y + tmp184 * tmp7 * tmp3424 * -1.7710366337074176408350467681884765625e+5 + tmp191 * tmp14 * tmp3424 * 2.8114873208916489966213703155517578125e+5 + tmp198 * tmp21 * tmp3424 * 1.431031933379808324389159679412841796875e+5 + tmp205 * tmp28 * tmp3424 * 3.2918287437849775888025760650634765625e+6 + tmp212 * tmp35 * tmp3424 * -4.0018918043949562124907970428466796875e+6 + tmp219 * tmp42 * tmp3424 * 2.08536552602579002268612384796142578125e+6 + tmp226 * tmp49 * tmp3424 * 7.7187410104315378703176975250244140625e+5 + tmp233 * tmp56 * tmp3424 * 1.51516300701577565632760524749755859375e+6 + tmp240 * tmp63 * tmp3424 * -9.5935196350207013892941176891326904296875e+4 + tmp247 * tmp70 * tmp3424 * 1.8764899438338782056234776973724365234375e+5 + tmp77 * tmp3424 * -1.7650105654693322139792144298553466796875e+5 * z + tmp84 * tmp3424 * -1.12599907083165380754508078098297119140625e+4 + tmp170 * tmp3424 * 1.65243484159407744300551712512969970703125e+4 + tmp184 * tmp3493 * 3.170604361968534431071020662784576416015625e+4 * y + tmp191 * tmp7 * tmp3493 * -7.1019284896686685897293500602245330810546875e+3 + tmp198 * tmp14 * tmp3493 * -1.0667806064645524020306766033172607421875e+5 + tmp205 * tmp21 * tmp3493 * -5.3628663532777383807115256786346435546875e+4 + tmp212 * tmp28 * tmp3493 * 1.80971219243237585760653018951416015625e+6 + tmp219 * tmp35 * tmp3493 * -1.0597402581397411413490772247314453125e+6 + tmp226 * tmp42 * tmp3493 * -1.1588275993866080534644424915313720703125e+5 + tmp233 * tmp49 * tmp3493 * -4.25511186032313504256308078765869140625e+5 + tmp240 * tmp56 * tmp3493 * -2.938991744034239090979099273681640625e+5 + tmp247 * tmp63 * tmp3493 * 3.517423913705237209796905517578125e+5 + 
tmp70 * tmp3493 * -1.3036163511801595450378954410552978515625e+5 * z + tmp77 * tmp3493 * -1.15164797042504287674091756343841552734375e+4 + tmp177 * tmp3493 * -1.027326106002176529727876186370849609375e+5 + tmp191 * tmp3557 * 2.32968870028700403054244816303253173828125e+4 * y + tmp198 * tmp7 * tmp3557 * -1.7206523041620221920311450958251953125e+5 + tmp205 * tmp14 * tmp3557 * -1.20100356877218815498054027557373046875e+5 + tmp212 * tmp21 * tmp3557 * 8.7875761462733824737370014190673828125e+5 + tmp219 * tmp28 * tmp3557 * -2.122252146525706848478876054286956787109375e+4 + tmp226 * tmp35 * tmp3557 * -1.28475469950316357426345348358154296875e+5 + tmp233 * tmp42 * tmp3557 * -2.1290984832363171153701841831207275390625e+5 + tmp240 * tmp49 * tmp3557 * 1.2339747299729688165825791656970977783203125e+4 + tmp247 * tmp56 * tmp3557 * 6.92400885106657515279948711395263671875e+4 + tmp63 * tmp3557 * -2.536715383273988845758140087127685546875e+4 * z + tmp70 * tmp3557 * 1.1653339780162687020492739975452423095703125e+4 + tmp184 * tmp3557 * -1.616392726083058732911013066768646240234375e+3 + tmp198 * tmp3616 * 6.699978345966499546193517744541168212890625e+3 * y + tmp205 * tmp7 * tmp3616 * 5.187473514055114355869591236114501953125e+4 + tmp212 * tmp14 * tmp3616 * 1.65763816227832925505936145782470703125e+5 + tmp219 * tmp21 * tmp3616 * -3.25056090106885298155248165130615234375e+5 + tmp226 * tmp28 * tmp3616 * 1.8006313213867394370026886463165283203125e+5 + tmp233 * tmp35 * tmp3616 * 1.6149108969951418112032115459442138671875e+5 + tmp240 * tmp42 * tmp3616 * 9.2216657954497219179756939411163330078125e+4 + tmp247 * tmp49 * tmp3616 * -6.89965494541488005779683589935302734375e+4 + tmp56 * tmp3616 * -1.356782839007581424084492027759552001953125e+4 * z + tmp63 * tmp3616 * 3.8200083861179182349587790668010711669921875e+3 + tmp191 * tmp3616 * -2.522350773908335759188048541545867919921875e+4 + tmp205 * tmp3670 * -1.62814408780031881178729236125946044921875e+4 * y + tmp212 * tmp7 * tmp3670 * 
-6.41554116158736360375769436359405517578125e+4 + tmp219 * tmp14 * tmp3670 * 5.71622242260840721428394317626953125e+4 + tmp226 * tmp21 * tmp3670 * -9.0587334121958469040691852569580078125e+4 + tmp233 * tmp28 * tmp3670 * -7.3649611555596042308025062084197998046875e+4 + tmp240 * tmp35 * tmp3670 * 3.271239722421842088806442916393280029296875e+4 + tmp247 * tmp42 * tmp3670 * 3.37375063881458263495005667209625244140625e+4 + tmp49 * tmp3670 * 2.96347007322415311136865057051181793212890625e+3 * z + tmp56 * tmp3670 * -3.2546262208931884742924012243747711181640625e+3 + tmp198 * tmp3670 * 1.5338915752405284365522675216197967529296875e+4 + tmp212 * tmp3719 * 3.4136824779939588552224449813365936279296875e+3 * y + tmp219 * tmp7 * tmp3719 * -2.34577830300033165258355438709259033203125e+4 + tmp226 * tmp14 * tmp3719 * -5.9762243919275797452428378164768218994140625e+3 + tmp233 * tmp21 * tmp3719 * -5.814102600184004404582083225250244140625e+4 + tmp240 * tmp28 * tmp3719 * -3.0788500974681155639700591564178466796875e+4 + tmp247 * tmp35 * tmp3719 * 1.44903749359589928644709289073944091796875e+4 + tmp42 * tmp3719 * 8.998666125799138171714730560779571533203125e+3 * z + tmp49 * tmp3719 * -2.15837762855332357503357343375682830810546875e+3 + tmp205 * tmp3719 * 1.0489749206845866865478456020355224609375e+4 + tmp219 * tmp3763 * -7.9145091489850237849168479442596435546875e+3 * y + tmp226 * tmp7 * tmp3763 * -4.297619298829873514478094875812530517578125e+3 + tmp233 * tmp14 * tmp3763 * -3.6394460748627388966269791126251220703125e+3 + tmp240 * tmp21 * tmp3763 * -1.76700041940687924579833634197711944580078125e+2 + tmp247 * tmp28 * tmp3763 * -6.814583279386506546870805323123931884765625e+3 + tmp35 * tmp3763 * -3.72972720405301743085146881639957427978515625e+3 * z + tmp42 * tmp3763 * 5.423292687640742997245979495346546173095703125e+2 + tmp212 * tmp3763 * 5.6591764289461070802644826471805572509765625e+2 + tmp226 * tmp3802 * 1.75396447503114859500783495604991912841796875e+3 * y + tmp233 * tmp7 * tmp3802 
* -5.6398178776681461386033333837985992431640625e+3 + tmp240 * tmp14 * tmp3802 * 4.5933270835433504544198513031005859375e+3 + tmp247 * tmp21 * tmp3802 * 2.34904035252849325843271799385547637939453125e+3 + tmp28 * tmp3802 * -1.9049080159450359133188612759113311767578125e+3 * z + tmp35 * tmp3802 * 7.929259720218824440962634980678558349609375e+2 + tmp219 * tmp3802 * 2.3259669282514806809558649547398090362548828125e+2 + tmp233 * tmp3836 * -2.8585752222517425025216653011739253997802734375e+2 * y + tmp240 * tmp7 * tmp3836 * -6.493934622707987500689341686666011810302734375e+2 + tmp247 * tmp14 * tmp3836 * -9.9977621505944307500612922012805938720703125e+2 + tmp21 * tmp3836 * 7.98660887368331486868555657565593719482421875e+1 * z + tmp28 * tmp3836 * 8.27886182528766454424840048886835575103759765625e+1 + tmp226 * tmp3836 * -4.777593477712571257143281400203704833984375e+2 + tmp240 * tmp3865 * 2.977550553870204339546035043895244598388671875e+2 * y + tmp247 * tmp7 * tmp3865 * 5.7826802635265408980558277107775211334228515625e+1 + tmp14 * tmp3865 * -3.19466599982074441754775762092322111129760742188e+1 * z + tmp21 * tmp3865 * -1.94448401828137278357644390780478715896606445312e+1 + tmp233 * tmp3865 * 3.52655094199027852042149788758251816034317016602e+0 + tmp247 * tmp3889 * -6.5845299554512251916094101034104824066162109375e+1 * y + tmp7 * tmp3889 * -2.42424916979261766414310841355472803115844726562e+1 * z + tmp14 * tmp3889 * -1.78320103293753859929893224034458398818969726562e+1 + tmp240 * tmp3889 * 4.077843129885712869509006850421428680419921875e+1 + tmp3908 * -1.18923647099609706145884047145955264568328857422e+1 * y * z + tmp7 * tmp3908 * 1.32242112002705436424321305821649730205535888672e+1 + tmp247 * tmp3908 * 1.49522929849522157041974423918873071670532226562e+1 + tmp3922 * 2.34174854619518546527956459613051265478134155273e+0 * y + tmp3922 * 9.04514018432027167015974100650055333971977233887e-1 * z + D.17848 * -1.68295515076845758617452020189375616610050201416e-1 + tmp263 * y * 
2.44392153459739969179054241976700723171234130859e+0 + tmp3 * tmp7 * 4.93345174860685844464569527190178632736206054688e+1 + tmp9 * tmp14 * 8.24720252171355205916825070744380354881286621094e+0 + tmp16 * tmp21 * -1.9126235671034720553507213480770587921142578125e+2 + tmp23 * tmp28 * 4.316295746607764840518939308822154998779296875e+2 + tmp30 * tmp35 * 1.52199653640183896641246974468231201171875e+3 + tmp37 * tmp42 * -1.417924435742725108866579830646514892578125e+3 + tmp44 * tmp49 * 1.12212855482832528650760650634765625e+4 + tmp51 * tmp56 * -3.2432832132926720078103244304656982421875e+4 + tmp58 * tmp63 * 3.82009238930551000521518290042877197265625e+4 + tmp65 * tmp70 * -8.23258735196742345578968524932861328125e+4 + tmp72 * tmp77 * 1.14637278647861327044665813446044921875e+5 + tmp79 * tmp84 * 1.20300110761476520565338432788848876953125e+5 + tmp86 * tmp91 * -3.809750085251752170734107494354248046875e+4 + tmp93 * tmp98 * -5.87331939325498606194742023944854736328125e+4 + tmp100 * tmp105 * 1.5174465636768660624511539936065673828125e+5 + tmp107 * tmp112 * 5.7061912117433804087340831756591796875e+5 + tmp114 * tmp119 * 1.0945829384530000970698893070220947265625e+5 + tmp121 * tmp126 * 2.73575341893493314273655414581298828125e+5 + tmp128 * tmp133 * -2.663092194049791432917118072509765625e+5 + tmp135 * tmp140 * -1.7987658674338340642862021923065185546875e+5 + tmp142 * tmp147 * 9.581333776664166362024843692779541015625e+4 + tmp149 * tmp154 * -7.4135173588056395601597614586353302001953125e+3 + tmp156 * tmp161 * 1.3765767618855574983172118663787841796875e+5 + tmp163 * tmp168 * -1.29963515349221575888805091381072998046875e+5 + tmp170 * tmp175 * 1.024760051369240391068160533905029296875e+5 + tmp177 * tmp182 * 1.1774167185705955489538609981536865234375e+4 + tmp184 * tmp189 * -4.54049751975205363123677670955657958984375e+4 + tmp191 * tmp196 * 2.202691293171887446078471839427947998046875e+4 + tmp198 * tmp203 * -6.9031785646773232656414620578289031982421875e+3 + tmp205 * tmp210 * 
2.587270389266899655922316014766693115234375e+3 + tmp212 * tmp217 * 1.584923463841544389651971869170665740966796875e+3 + tmp219 * tmp224 * 3.169149462572121365155908279120922088623046875e+2 + tmp226 * tmp231 * 5.80735738826707120097125880420207977294921875e+2 + tmp233 * tmp238 * 3.1680722189976205527273123152554035186767578125e+2 + tmp240 * tmp245 * 8.85777261495995782425438846985343843698501586914e-1 + tmp247 * tmp252 * -6.73539987844872634070725325727835297584533691406e+0 + tmp258 * -2.01161588801537272175323778355959802865982055664e+0 * z + D.17965 * -2.58248045144082116753025957223144359886646270752e-1 + D.17968 * 3.27889198207912357929671998135745525360107421875e+0; } Perhaps a little more careful code placement in SRA, or a local register pressure pass at the end of the SSA path, would do? Honza
Created attachment 11919 [details] bug2.c.099t.optimized
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
with the attached patch I can cure the regmove quadratic behaviour and the time report is not so unreasonable now:

 gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr sub subl subr mul mull mulr divl ipow fi
Analyzing compilation unit
Performing intraprocedural optimizations
Assembling functions:
 max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC 126177k -> 85112k} {GC 327625k -> 39474k}

Execution times (seconds)
 garbage collection    : 0.83 ( 0%) usr 0.00 ( 0%) sys 0.82 ( 0%) wall 0 kB ( 0%) ggc
 callgraph construction: 0.16 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall 1147 kB ( 0%) ggc
 callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 533 kB ( 0%) ggc
 ipa reference         : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 ipa pure const        : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 ipa type escape       : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 trivially dead code   : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 0 kB ( 0%) ggc
 life analysis         : 21.38 ( 3%) usr 0.02 ( 1%) sys 21.39 ( 3%) wall 1120 kB ( 0%) ggc
 life info update      : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall 0 kB ( 0%) ggc
 alias analysis        : 0.87 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 0%) wall 4266 kB ( 1%) ggc
 register scan         : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 150 kB ( 0%) ggc
 rebuild jump labels   : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
 preprocessing         : 0.27 ( 0%) usr 0.06 ( 2%) sys 0.36 ( 0%) wall 471 kB ( 0%) ggc
 lexical analysis      : 0.04 ( 0%) usr 0.05 ( 2%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
 parser                : 0.12 ( 0%) usr 0.03 ( 1%) sys 0.17 ( 0%) wall 3207 kB ( 1%) ggc
 inline heuristics     : 15.14 ( 2%) usr 0.01 ( 0%) sys 15.26 ( 2%) wall 1486 kB ( 0%) ggc
 integration           : 21.35 ( 3%) usr 0.12 ( 4%) sys 21.71 ( 3%) wall 33445 kB ( 8%) ggc
 tree gimplify         : 0.18 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall 3341 kB ( 1%) ggc
 tree eh               : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1338 kB ( 0%) ggc
 tree CFG cleanup      : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 20 kB ( 0%) ggc
 tree VRP              : 0.38 ( 0%) usr 0.01 ( 0%) sys 0.42 ( 0%) wall 11 kB ( 0%) ggc
 tree copy propagation : 0.23 ( 0%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 222 kB ( 0%) ggc
 tree store copy prop  : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall 4 kB ( 0%) ggc
 tree find ref. vars   : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 8137 kB ( 2%) ggc
 tree PTA              : 1.29 ( 0%) usr 0.04 ( 1%) sys 1.36 ( 0%) wall 57 kB ( 0%) ggc
 tree alias analysis   : 1.89 ( 0%) usr 0.20 ( 7%) sys 2.10 ( 0%) wall 0 kB ( 0%) ggc
 tree PHI insertion    : 1.68 ( 0%) usr 0.01 ( 0%) sys 1.70 ( 0%) wall 18 kB ( 0%) ggc
 tree SSA rewrite      : 0.62 ( 0%) usr 0.04 ( 1%) sys 0.65 ( 0%) wall 17084 kB ( 4%) ggc
 tree SSA other        : 0.48 ( 0%) usr 0.08 ( 3%) sys 0.56 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA incremental  : 1.20 ( 0%) usr 0.00 ( 0%) sys 1.24 ( 0%) wall 0 kB ( 0%) ggc
 tree operand scan     : 1.48 ( 0%) usr 0.34 (11%) sys 1.93 ( 0%) wall 15634 kB ( 4%) ggc
 dominator optimization: 1.05 ( 0%) usr 0.05 ( 2%) sys 1.05 ( 0%) wall 2698 kB ( 1%) ggc
 tree SRA              : 1.05 ( 0%) usr 0.09 ( 3%) sys 1.15 ( 0%) wall 24835 kB ( 6%) ggc
 tree STORE-CCP        : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 4 kB ( 0%) ggc
 tree CCP              : 0.51 ( 0%) usr 0.02 ( 1%) sys 0.56 ( 0%) wall 154 kB ( 0%) ggc
 tree reassociation    : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
 tree PRE              : 296.46 (45%) usr 0.49 (16%) sys 298.81 (45%) wall 19481 kB ( 5%) ggc
 tree FRE              : 0.96 ( 0%) usr 0.05 ( 2%) sys 1.00 ( 0%) wall 7991 kB ( 2%) ggc
 tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree conservative DCE : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall 0 kB ( 0%) ggc
 tree aggressive DCE   : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
 tree DSE              : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 8 kB ( 0%) ggc
 tree SSA uncprop      : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA to normal    : 27.19 ( 4%) usr 0.01 ( 0%) sys 27.33 ( 4%) wall 22 kB ( 0%) ggc
 tree rename SSA copies: 0.15 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc
 dominance frontiers   : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 expand                : 2.96 ( 0%) usr 0.09 ( 3%) sys 3.05 ( 0%) wall 24095 kB ( 6%) ggc
 jump                  : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
 CSE                   : 1.87 ( 0%) usr 0.00 ( 0%) sys 1.88 ( 0%) wall 118 kB ( 0%) ggc
 global CSE            : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
 CPROP 1               : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 1620 kB ( 0%) ggc
 PRE                   : 21.36 ( 3%) usr 0.01 ( 0%) sys 21.41 ( 3%) wall 200 kB ( 0%) ggc
 CPROP 2               : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 390 kB ( 0%) ggc
 bypass jumps          : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall 389 kB ( 0%) ggc
 CSE 2                 : 1.05 ( 0%) usr 0.00 ( 0%) sys 1.07 ( 0%) wall 72 kB ( 0%) ggc
 branch prediction     : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1 kB ( 0%) ggc
 flow analysis         : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
 combiner              : 0.87 ( 0%) usr 0.01 ( 0%) sys 0.88 ( 0%) wall 1745 kB ( 0%) ggc
 if-conversion         : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 3 kB ( 0%) ggc
 regmove               : 21.69 ( 3%) usr 0.02 ( 1%) sys 21.78 ( 3%) wall 2 kB ( 0%) ggc
 local alloc           : 7.60 ( 1%) usr 0.00 ( 0%) sys 7.62 ( 1%) wall 1480 kB ( 0%) ggc
 global alloc          : 16.47 ( 2%) usr 0.35 (12%) sys 16.91 ( 3%) wall 16915 kB ( 4%) ggc
 reload CSE regs       : 107.52 (16%) usr 0.15 ( 5%) sys 108.55 (16%) wall 4783 kB ( 1%) ggc
 flow 2                : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 225 kB ( 0%) ggc
 peephole 2            : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 0 kB ( 0%) ggc
 rename registers      : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 0 kB ( 0%) ggc
 scheduling 2          : 75.09 (11%) usr 0.53 (18%) sys 76.86 (12%) wall 206227 kB (51%) ggc
 machine dep reorg     : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 0 kB ( 0%) ggc
 reorder blocks        : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 15 kB ( 0%) ggc
 reg stack             : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 37 kB ( 0%) ggc
 final                 : 0.66 ( 0%) usr 0.02 ( 1%) sys 0.74 ( 0%) wall 1156 kB ( 0%) ggc
 TOTAL                 : 659.57 2.99 668.06 407297 kB

PRE is somewhat slow, but I will leave this to Danny. For scheduling the situation is quite clear: we have huge basic blocks and produce a huge number of dependencies. For reload, I am also not really surprised, since the code produced is a regalloc nightmare and reload manages to create very large bitmaps that result in quadratic behaviour.

Since Danny asked for allocpools:

Alloc-pool Kind          Pools   Allocated      Peak   Leak
-------------------------------------------------------------
Value sets                  18     2230608   1929200      0
Bitmap sets                 18        9504      8432      0
Value set nodes             18     2032208   1768488      0
Binary tree nodes           18     1291320    783992      0
value                       48     3875872   1246744      0
et_occ pool                127      238144     48040      0
et_node pool               127      159680     36024      0
Reference tree nodes        18     1430880   1437864      0
Expression tree nodes       18      426240    428840      0
elt_list                    48     3639816    397672      0
List tree nodes             18      511488    516880      0
elt_loc_list                48    14186784    975240      0
Comparison tree nodes       18        4520      4832      0
original_copy               26          48        88      0
Constraint pool            108     4335432   1501136      0
Unary tree nodes            18          96       968      0
Variable info pool         108    12261704   4550848      0
Constraint edges           108        2112       496      0
operand entry pool          36         512       248      0
cselib_val_list             48    11627616    974144      0
-------------------------------------------------------------
Total                      994    58264584

Memory consumption is now dominated by the scheduler's dependency info:

ggc-common.c:193 (ggc_calloc) 6303224: 1.9% 5139976:12.3% 1863696: 8.8% 1073688:21.8% 530
gimplify.c:453 (create_tmp_var_raw) 7325032: 2.2% 0: 0.0% 889240: 4.2% 0: 0.0% 93344
genrtl.c:17 (gen_rtx_fmt_ee) 9819384: 2.9% 0: 0.0% 138900: 0.7% 0: 0.0% 829857
tree-dfa.c:186 (create_stmt_ann) 9970168: 2.9% 763932: 1.8% 3692: 0.0% 0: 0.0% 206496
tree-ssanames.c:147 (make_ssa_name) 9740544: 2.9% 0: 0.0% 2373936:11.2% 0: 0.0% 252385
bitmap.c:139 (bitmap_element_allocate) 18876340: 5.6% 0: 0.0% 0: 0.0% 0: 0.0% 674155
genrtl.c:32 (gen_rtx_fmt_ue) 193579104:57.2% 0: 0.0% 0: 0.0% 0: 0.0% 16131592
Total 338496482 41839722 21146495 4929007 22457179

I am now looking into the -O3 compilation, which crashes in into-ssa because of an overly large stack.

Honza
Created attachment 11920 [details] regmovefix
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
with the attached patch, which saves roughly 10 minutes in the tree-into-ssa pass, I can compile with -O3 -fno-tree-fre -fno-tree-pre. This works only without checking enabled, since we do incredibly deep dominator walks that run out of stack space, which can be considered a bug too. TER still manages to enforce a few thousand temporaries with overlapping live ranges. The out-of-ssa pass spends most of its time in calculate_live_on_exit and calculate_live_on_entry, which looks rather symmetric to the problem cured by the attached patch, but I don't see directly how to avoid the quadratic behaviour there.

Honza

 garbage collection    : 1.22 ( 0%) usr 0.10 ( 1%) sys 8.40 ( 1%) wall 0 kB ( 0%) ggc
 callgraph construction: 0.14 ( 0%) usr 0.03 ( 0%) sys 0.18 ( 0%) wall 1147 kB ( 0%) ggc
 callgraph optimization: 0.07 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall 533 kB ( 0%) ggc
 ipa reference         : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
 ipa pure const        : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 ipa type escape       : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
 cfg cleanup           : 3.89 ( 1%) usr 0.01 ( 0%) sys 4.11 ( 0%) wall 1576 kB ( 1%) ggc
 trivially dead code   : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall 0 kB ( 0%) ggc
 life analysis         : 51.34 ( 9%) usr 2.65 (21%) sys 73.91 ( 5%) wall 2653 kB ( 1%) ggc
 life info update      : 48.97 ( 9%) usr 0.14 ( 1%) sys 50.68 ( 4%) wall 641 kB ( 0%) ggc
 alias analysis        : 0.69 ( 0%) usr 0.00 ( 0%) sys 1.05 ( 0%) wall 4139 kB ( 1%) ggc
 register scan         : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 0 kB ( 0%) ggc
 rebuild jump labels   : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
 preprocessing         : 0.37 ( 0%) usr 0.06 ( 0%) sys 0.34 ( 0%) wall 471 kB ( 0%) ggc
 lexical analysis      : 0.01 ( 0%) usr 0.05 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
 parser                : 0.09 ( 0%) usr 0.02 ( 0%) sys 0.18 ( 0%) wall 3207 kB ( 1%) ggc
 inline heuristics     : 14.79 ( 3%) usr 0.02 ( 0%) sys 14.86 ( 1%) wall 1118 kB ( 0%) ggc
 integration           : 17.07 ( 3%) usr 0.22 ( 2%) sys 17.36 ( 1%) wall 79483 kB (27%) ggc
 tree gimplify         : 0.15 ( 0%) usr 0.01 ( 0%) sys 0.17 ( 0%) wall 3341 kB ( 1%) ggc
 tree eh               : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree CFG construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1338 kB ( 0%) ggc
 tree CFG cleanup      : 4.27 ( 1%) usr 0.00 ( 0%) sys 4.27 ( 0%) wall 20 kB ( 0%) ggc
 tree VRP              : 1.26 ( 0%) usr 0.03 ( 0%) sys 1.33 ( 0%) wall 14 kB ( 0%) ggc
 tree copy propagation : 0.85 ( 0%) usr 0.05 ( 0%) sys 0.94 ( 0%) wall 313 kB ( 0%) ggc
 tree store copy prop  : 0.27 ( 0%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 5 kB ( 0%) ggc
 tree find ref. vars   : 0.16 ( 0%) usr 0.03 ( 0%) sys 0.18 ( 0%) wall 12044 kB ( 4%) ggc
 tree PTA              : 1.55 ( 0%) usr 0.06 ( 0%) sys 1.63 ( 0%) wall 57 kB ( 0%) ggc
 tree alias analysis   : 2.81 ( 0%) usr 0.29 ( 2%) sys 3.10 ( 0%) wall 0 kB ( 0%) ggc
 tree PHI insertion    : 0.57 ( 0%) usr 0.92 ( 7%) sys 1.52 ( 0%) wall 3137 kB ( 1%) ggc
 tree SSA rewrite      : 2.33 ( 0%) usr 0.06 ( 0%) sys 5.02 ( 0%) wall 21592 kB ( 7%) ggc
 tree SSA other        : 0.41 ( 0%) usr 0.16 ( 1%) sys 0.65 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA incremental  : 4.18 ( 1%) usr 0.45 ( 4%) sys 4.72 ( 0%) wall 520 kB ( 0%) ggc
 tree operand scan     : 1.79 ( 0%) usr 0.69 ( 5%) sys 39.97 ( 3%) wall 18374 kB ( 6%) ggc
 dominator optimization: 2.91 ( 1%) usr 0.05 ( 0%) sys 2.99 ( 0%) wall 11155 kB ( 4%) ggc
 tree SRA              : 4.24 ( 1%) usr 0.15 ( 1%) sys 4.51 ( 0%) wall 25568 kB ( 9%) ggc
 tree STORE-CCP        : 0.29 ( 0%) usr 0.01 ( 0%) sys 0.31 ( 0%) wall 18 kB ( 0%) ggc
 tree CCP              : 0.87 ( 0%) usr 0.01 ( 0%) sys 2.39 ( 0%) wall 154 kB ( 0%) ggc
 tree split crit edges : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.14 ( 0%) wall 9284 kB ( 3%) ggc
 tree reassociation    : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 0 kB ( 0%) ggc
 tree code sinking     : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall 0 kB ( 0%) ggc
 tree linearize phis   : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
 tree forward propagate: 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
 tree conservative DCE : 1.13 ( 0%) usr 0.00 ( 0%) sys 1.11 ( 0%) wall 0 kB ( 0%) ggc
 tree aggressive DCE   : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 0 kB ( 0%) ggc
 tree DSE              : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 1 kB ( 0%) ggc
 PHI merge             : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 complete unrolling    : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 tree loop init        : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
 tree copy headers     : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA uncprop      : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA to normal    : 228.94 (40%) usr 0.64 ( 5%) sys 337.06 (25%) wall 10323 kB ( 4%) ggc
 tree rename SSA copies: 0.49 ( 0%) usr 0.03 ( 0%) sys 0.51 ( 0%) wall 0 kB ( 0%) ggc
 dominance frontiers   : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall 0 kB ( 0%) ggc
 dominance computation : 2.63 ( 0%) usr 0.09 ( 1%) sys 2.85 ( 0%) wall 0 kB ( 0%) ggc
 control dependences   : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
 expand                : 6.10 ( 1%) usr 1.13 ( 9%) sys 192.49 (14%) wall 35008 kB (12%) ggc
 jump                  : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
 CSE                   : 0.89 ( 0%) usr 0.01 ( 0%) sys 0.89 ( 0%) wall 53 kB ( 0%) ggc
 loop analysis         : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 930 kB ( 0%) ggc
 CPROP 1               : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 CSE 2                 : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall 29 kB ( 0%) ggc
 branch prediction     : 0.55 ( 0%) usr 0.00 ( 0%) sys 0.56 ( 0%) wall 0 kB ( 0%) ggc
 flow analysis         : 37.33 ( 6%) usr 0.10 ( 1%) sys 53.59 ( 4%) wall 0 kB ( 0%) ggc
 combiner              : 1.02 ( 0%) usr 0.02 ( 0%) sys 1.37 ( 0%) wall 2685 kB ( 1%) ggc
 if-conversion         : 5.21 ( 1%) usr 0.00 ( 0%) sys 5.36 ( 0%) wall 1614 kB ( 1%) ggc
 regmove               : 0.72 ( 0%) usr 0.01 ( 0%) sys 0.83 ( 0%) wall 4 kB ( 0%) ggc
 mode switching        : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
 local alloc           : 1.06 ( 0%) usr 0.02 ( 0%) sys 1.46 ( 0%) wall 1045 kB ( 0%) ggc
 global alloc          : 86.33 (15%) usr 4.12 (32%) sys 452.97 (34%) wall 8488 kB ( 3%) ggc
 reload CSE regs       : 24.86 ( 4%) usr 0.07 ( 1%) sys 28.13 ( 2%) wall 3370 kB ( 1%) ggc
 load CSE after reload : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
 flow 2                : 0.36 ( 0%) usr 0.01 ( 0%) sys 1.19 ( 0%) wall 5064 kB ( 2%) ggc
 if-conversion 2       : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
 peephole 2            : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
 rename registers      : 0.38 ( 0%) usr 0.05 ( 0%) sys 0.50 ( 0%) wall 1 kB ( 0%) ggc
 scheduling 2          : 2.10 ( 0%) usr 0.07 ( 1%) sys 2.40 ( 0%) wall 4347 kB ( 1%) ggc
 machine dep reorg     : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 79 kB ( 0%) ggc
 reorder blocks        : 0.63 ( 0%) usr 0.01 ( 0%) sys 1.06 ( 0%) wall 2738 kB ( 1%) ggc
 reg stack             : 1.07 ( 0%) usr 0.02 ( 0%) sys 1.53 ( 0%) wall 11030 kB ( 4%) ggc
 final                 : 1.06 ( 0%) usr 0.04 ( 0%) sys 1.18 ( 0%) wall 2182 kB ( 1%) ggc
 symout                : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
 TOTAL                 : 575.62 12.78 1351.48 291955 kB
Created attachment 11921 [details] intossaspeedup
Subject: Bug number PR28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01011.html
Subject: Bug 28071 Author: hubicka Date: Mon Jul 24 11:23:21 2006 New Revision: 115712 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115712 Log: PR rtl-optimization/28071 * ipa-inline.c (update_caller_keys): Remove edges that are no longer inline candidates. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-inline.c
Subject: Bug 28071 Author: hubicka Date: Mon Jul 24 11:27:53 2006 New Revision: 115713 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115713 Log: PR rtl-optimization/28071 * tree-cfg.c (tree_split_block): Do not allocate new stmt_list nodes. * tree-iterator.c (tsi_split_statement_list_before): Do not crash when splitting before first stmt. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-cfg.c trunk/gcc/tree-iterator.c
OK, some summary ;)

Mainline (after the first three patches) at -O now peaks at 450MB (just because of the register allocator's conflict matrix; otherwise it is about 150MB). Still not quite icc's 12 seconds/200MB, but we are out of regression land for -O relative to 4.0. I tested 3.0 and it bombs on the testcase; 2.95, however, compiles it quite smoothly with a 200MB peak, although it needs 6 minutes.

 life analysis         : 25.92 (16%) usr 0.01 ( 0%) sys 26.18 (15%) wall 2565 kB ( 1%) ggc
 inline heuristics     : 15.15 ( 9%) usr 0.01 ( 0%) sys 15.27 ( 9%) wall 1486 kB ( 1%) ggc
 integration           : 21.37 (13%) usr 0.12 ( 5%) sys 21.66 (13%) wall 33445 kB (19%) ggc
 tree SSA to normal    : 27.73 (17%) usr 0.03 ( 1%) sys 27.93 (16%) wall 17 kB ( 0%) ggc
 local alloc           : 7.33 ( 4%) usr 0.03 ( 1%) sys 7.41 ( 4%) wall 1855 kB ( 1%) ggc
 global alloc          : 13.67 ( 8%) usr 0.73 (32%) sys 15.85 ( 9%) wall 14178 kB ( 8%) ggc
 reload CSE regs       : 30.88 (19%) usr 0.04 ( 2%) sys 31.09 (18%) wall 2393 kB ( 1%) ggc
 TOTAL                 : 164.46 2.27 169.53 173593 kB

It would be interesting to see how the dataflow branch scores here after re-merging from mainline. Hopefully the integration and register allocation issues will be tracked there.

The inliner is still quadratic in time because of quadratic split_block and cgraph_node. Both can be made linear quite easily (split_block by always renumbering the smaller area of the block, and cgraph_node by producing hashtables for nodes with many edges), but I am not sure I want to do that for 4.2. Inline heuristics might be trickier to speed up. I don't know about reload. Oprofile might be handy ;)

-O2 exposes a problem in PRE that DannyB has a fix for. Regmove and into-SSA can also be significantly sped up by the patches I attached; I will commit them once testing converges.

-O3 turns the testcase into a quite different one (the gigantic basic block is turned into many basic blocks by inlining the min/max functions). A few problems are still visible there: FRE consumes an unbounded amount of memory, and we fail to synthesize fmin/fmax operators where we ought to. If the FRE problem is fixed, I would say it should no longer be considered a 4.2 blocker.

Honza
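The "renumber the smaller area" idea mentioned above for split_block can be sketched in isolation. This is only a toy model, not GCC's tree_split_block: the array stmt_block and the function split_relabel_smaller are hypothetical names, and a block is assumed to be a contiguous range of statement indexes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: stmt_block[i] holds the id of the block that owns
   statement i.  This is not GCC's statement representation.  */

/* Split the block that owns statements [lo, hi) in front of statement mid.
   Only the smaller of the two halves is relabelled with new_id; the larger
   half keeps the old id.  Returns the number of statements touched.  */
static size_t
split_relabel_smaller (int *stmt_block, size_t lo, size_t mid, size_t hi,
                       int new_id)
{
  assert (lo <= mid && mid <= hi);

  size_t left = mid - lo;
  size_t right = hi - mid;
  size_t from = left < right ? lo : mid;
  size_t to = left < right ? mid : hi;

  for (size_t i = from; i < to; i++)
    stmt_block[i] = new_id;
  return to - from;
}
```

Whenever a statement is relabelled it lands in a block at most half the size of the one it came from, so no statement is touched more than about log2(n) times in total. Peeling one statement at a time off the front of a huge block then costs one relabel per split instead of relabelling the whole remainder each time.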
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01083.html
Subject: Bug 28071 Author: hubicka Date: Wed Jul 26 22:51:56 2006 New Revision: 115765 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115765 Log: PR rtl-optimization/28071 * regmove.c (reg_is_remote_constant_p): Avoid quadratic behaviour. (reg_set_in_bb, max_reg_computed): New static variables. (regmove_optimize): Free the new array. (fixup_match_1): Update call of reg_is_remote_constant_p. Modified: trunk/gcc/ChangeLog trunk/gcc/regmove.c
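The ChangeLog entry above mentions a new per-register array (reg_set_in_bb) to avoid quadratic behaviour in a predicate. The patch itself is not quoted in this thread, so the following is only a guess at the general shape of such a cache: one linear pass records where each register is set, and later queries become O(1) instead of rescanning the insn stream. The struct insn layout and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum { NOT_SET = -1, MULTIPLE = -2 };

/* Hypothetical, heavily simplified instruction: sets one register.  */
struct insn
{
  int block;     /* index of the basic block containing the insn */
  int dest_reg;  /* register written by the insn */
};

/* One linear pass: for each register remember the single block that sets
   it, or MULTIPLE if it is set in more than one block.  Caller frees.  */
static int *
compute_set_block (const struct insn *insns, size_t n_insns, int n_regs)
{
  int *set_block = malloc ((size_t) n_regs * sizeof *set_block);
  if (!set_block)
    return NULL;
  for (int r = 0; r < n_regs; r++)
    set_block[r] = NOT_SET;

  for (size_t i = 0; i < n_insns; i++)
    {
      int r = insns[i].dest_reg;
      assert (r >= 0 && r < n_regs);
      if (set_block[r] == NOT_SET)
        set_block[r] = insns[i].block;
      else if (set_block[r] != insns[i].block)
        set_block[r] = MULTIPLE;
    }
  return set_block;
}

/* O(1) query: is REG set in exactly one block, and is that block
   different from BLOCK?  */
static int
set_only_in_other_block (const int *set_block, int reg, int block)
{
  return set_block[reg] >= 0 && set_block[reg] != block;
}
```

With n queries over n instructions this costs O(n) in total, where a predicate that walks the instruction chain on every call costs O(n^2) on one gigantic basic block.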
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01144.html
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01145.html
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01146.html
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01147.html
Subject: Bug 28071 Author: hubicka Date: Thu Jul 27 16:02:27 2006 New Revision: 115776 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115776 Log: PR rtl-optimization/28071 * global.c (greg_obstack): New obstack. (allocate_bb_info): Use it. (free_bb_info): Likewise. (modify_reg_pav): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/global.c
Subject: Bug 28071 Author: hubicka Date: Thu Jul 27 16:03:22 2006 New Revision: 115777 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115777 Log: PR rtl-optimization/28071 * cselib.c (cselib_process_insn): Don't remove useless values too often for very large hashtables. Modified: trunk/gcc/ChangeLog trunk/gcc/cselib.c
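The cselib entry above says useless values should not be removed "too often" when the hashtable is very large. The actual heuristic is not shown in this thread; a minimal sketch of the general idea, with a made-up one-quarter threshold and hypothetical names, is to defer the O(table size) sweep until enough garbage has accumulated to pay for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a value table; only counts are modelled.  */
struct value_table
{
  size_t n_values;   /* live plus useless entries currently in the table */
  size_t n_useless;  /* entries known to be useless */
  size_t n_sweeps;   /* how many full sweeps have run */
};

/* A sweep walks the whole table, so its cost is proportional to n_values.  */
static void
sweep (struct value_table *t)
{
  t->n_values -= t->n_useless;
  t->n_useless = 0;
  t->n_sweeps++;
}

/* Record one more useless value.  Sweep only when the useless entries are
   at least MIN_BATCH in number and at least a quarter of the table (the
   quarter is an arbitrary assumption made for this sketch).  */
static void
note_useless (struct value_table *t, size_t min_batch)
{
  assert (t->n_useless < t->n_values);
  t->n_useless++;
  if (t->n_useless >= min_batch && t->n_useless * 4 >= t->n_values)
    sweep (t);
}
```

Because each sweep removes a constant fraction of the table, the total sweeping work stays proportional to the number of values ever inserted, rather than to table size times the number of instructions processed.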
Subject: Bug 28071 Author: hubicka Date: Thu Jul 27 17:10:07 2006 New Revision: 115779 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115779 Log: PR rtl-optimization/28071 * hashtab.c (htab_empty): Clear out n_deleted/n_elements; downsize the hashtable. Modified: trunk/libiberty/ChangeLog trunk/libiberty/hashtab.c
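The htab_empty change above resets the counters and downsizes the table. The toy table below is not libiberty's hashtab; toy_htab, SMALL_SIZE and the helper names are invented for illustration. It only shows why shrinking on empty matters: a table that once grew huge would otherwise make every later clear cost time proportional to its peak size.

```c
#include <assert.h>
#include <stdlib.h>

enum { SMALL_SIZE = 16 };  /* arbitrary size to fall back to */

/* Toy table: slots hold borrowed pointers, NULL means empty.  */
struct toy_htab
{
  void **slots;
  size_t size;
  size_t n_elements;
  size_t n_deleted;
};

static void **
alloc_slots (size_t size)
{
  void **slots = malloc (size * sizeof *slots);
  if (slots)
    for (size_t i = 0; i < size; i++)
      slots[i] = NULL;
  return slots;
}

static int
toy_htab_init (struct toy_htab *t, size_t size)
{
  assert (size > 0);
  t->slots = alloc_slots (size);
  t->size = t->slots ? size : 0;
  t->n_elements = 0;
  t->n_deleted = 0;
  return t->slots != NULL;
}

/* Empty the table.  A large table is replaced by a small one; if that
   allocation fails, or the table is already small, clear it in place.  */
static void
toy_htab_empty (struct toy_htab *t)
{
  void **small = t->size > SMALL_SIZE ? alloc_slots (SMALL_SIZE) : NULL;
  if (small)
    {
      free (t->slots);
      t->slots = small;
      t->size = SMALL_SIZE;
    }
  else
    for (size_t i = 0; i < t->size; i++)
      t->slots[i] = NULL;
  t->n_elements = 0;
  t->n_deleted = 0;
}
```

Resetting n_elements and n_deleted keeps the load-factor bookkeeping honest, so the emptied table does not immediately decide it needs to grow again.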
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01185.html
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, I've added this testcase to our memory regression tester (see the gcc-regression mailing list), so hopefully the quadratic memory consumption issues will be tracked now. It would be nice to have a runtime benchmark variant of the test so that we can track both the runtime and the compilation time. It seems to uncover quite interesting behaviours across the compiler. Honza
Subject: Bug 28071 Author: hubicka Date: Sat Jul 29 13:14:22 2006 New Revision: 115810 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115810 Log: PR rtl-optimization/28071 * cfgrtl.c (rtl_delete_block): Free regsets. * flow.c (allocate_bb_life_data): Re-use regsets if available. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgrtl.c trunk/gcc/flow.c
Subject: Bug number PR rtl-optimization/28071 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01221.html
Jan, I'm assigning it to you since you have already spent a fair amount of time on it and made significant progress. Thanks for tackling the hard stuff.
Subject: Bug 28071 Author: rakdver Date: Wed Aug 16 21:25:39 2006 New Revision: 116190 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116190 Log: PR rtl-optimization/28071 * basic-block.h (bb_dom_dfs_in, bb_dom_dfs_out): Declare. * dominance.c (bb_dom_dfs_in, bb_dom_dfs_out): New functions. * tree-into-ssa.c (struct dom_dfsnum): New. (cmp_dfsnum, find_dfsnum_interval, prune_unused_phi_nodes): New functions. (insert_phi_nodes_for): Use prune_unused_phi_nodes instead of compute_global_livein. (prepare_block_for_update, prepare_use_sites_for): Mark the uses in phi nodes in the correct blocks. Modified: trunk/gcc/ChangeLog trunk/gcc/basic-block.h trunk/gcc/dominance.c trunk/gcc/tree-into-ssa.c
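The commit above exposes DFS entry and exit numbers of the dominator tree (bb_dom_dfs_in, bb_dom_dfs_out). How tree-into-ssa.c uses them is not shown in this thread; the sketch below only illustrates the standard property such numbers give, namely constant-time dominance queries. The child/sibling/parent arrays and both function names are assumptions of this example. The numbering is iterative because this thread reports stack overflows from deep recursive dominator walks.

```c
#include <assert.h>

/* Number a dominator tree given as first_child/next_sibling/parent arrays
   (-1 means "none").  No recursion is used, so tree depth cannot overflow
   the call stack.  */
static void
number_dom_tree (int root, const int *first_child, const int *next_sibling,
                 const int *parent, int *dfs_in, int *dfs_out)
{
  assert (root >= 0);
  int clock = 0;
  int node = root;

  dfs_in[node] = clock++;
  for (;;)
    {
      if (first_child[node] >= 0)
        {
          /* Descend to the first child.  */
          node = first_child[node];
          dfs_in[node] = clock++;
          continue;
        }
      /* Leaf: close nodes while climbing until a sibling is found.  */
      for (;;)
        {
          dfs_out[node] = clock++;
          if (node == root)
            return;
          if (next_sibling[node] >= 0)
            {
              node = next_sibling[node];
              dfs_in[node] = clock++;
              break;
            }
          node = parent[node];
        }
    }
}

/* A dominates B iff B's DFS interval is nested inside A's.  */
static int
dominates (const int *dfs_in, const int *dfs_out, int a, int b)
{
  return dfs_in[a] <= dfs_in[b] && dfs_out[b] <= dfs_out[a];
}
```

After a single linear numbering pass, any number of "does a dominate b" questions can be answered with two integer comparisons each, and sorting blocks by entry number groups every subtree into a contiguous interval.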
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
to summarize the current progress, the memory consumption seems to be under control now:

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
  Ovarall memory allocated via mmap and sbrk decreased from 146456k to 134136k, overall -9.18%
  Peak amount of GGC memory allocated before garbage collecting run decreased from 95412k to 81628k, overall -16.89%
  Amount of produced GGC garbage decreased from 163295k to 143524k, overall -13.77%
    Overall memory needed: 146456k -> 134136k
    Peak memory use before GGC: 95412k -> 81628k
    Peak memory use after GGC: 58507k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 163295k -> 143524k
    Leak: 7142k
    Overhead: 29023k -> 25103k
    GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 430308k -> 424700k
    Peak memory use before GGC: 201177k
    Peak memory use after GGC: 196173k
    Maximum of released memory in single GGC run: 100203k -> 95156k
    Garbage: 279198k -> 271636k
    Leak: 47195k
    Overhead: 31459k -> 29952k
    GGC runs: 105

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 350424k -> 344820k
    Peak memory use before GGC: 208293k
    Peak memory use after GGC: 196536k
    Maximum of released memory in single GGC run: 101565k -> 96536k
    Garbage: 394891k -> 387353k
    Leak: 47778k
    Overhead: 49054k -> 47552k
    GGC runs: 111

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 535696k -> 536260k
    Peak memory use before GGC: 314602k
    Peak memory use after GGC: 292946k
    Maximum of released memory in single GGC run: 163430k
    Garbage: 494953k -> 486928k
    Leak: 65110k
    Overhead: 60330k -> 58798k
    GGC runs: 100

I will post a short summary of the remaining bottlenecks at each optimization level.

Honza
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

At -O0 we get these time sinks:

 life analysis         : 0.75 (10%) usr 0.01 ( 3%) sys 0.78 ( 9%) wall 2714 kB ( 4%) ggc
 expand                : 1.46 (15%) usr 0.04 (11%) sys 1.66 (15%) wall 37656 kB (58%) ggc
 local alloc           : 1.40 (14%) usr 0.04 (11%) sys 1.45 (13%) wall 1293 kB ( 2%) ggc
 global alloc          : 3.55 (36%) usr 0.05 (14%) sys 3.67 (34%) wall 7509 kB (12%) ggc
 final                 : 0.96 (10%) usr 0.04 (11%) sys 1.00 ( 9%) wall 1157 kB ( 2%) ggc
 TOTAL                 : 9.95 0.35 10.77 64543 kB

Expand seems reasonable given that almost everything is a call, which has a long representation. Global alloc is copying a significant portion of the insn stream because of:

      /* If we aren't replacing things permanently and we changed something,
         make another copy to ensure that all the RTL is new.  Otherwise
         things can go wrong if find_reload swaps commutative operands
         and one is inside RTL that has been copied while the other is not.  */
      new_body = old_body;
      if (! replace)
        {
          new_body = copy_insn (old_body);
          if (REG_NOTES (insn))
            REG_NOTES (insn) = copy_insn_1 (REG_NOTES (insn));
        }

and a few other occurrences of copy_insn in reload1.c. They seem to copy quite a lot of unnecessary RTL "just to be sure". Virtual register elimination also produces a lot of duplicated RTL; perhaps it can be cached?

global alloc also spends 50% of its time clearing out reg_has_output_reload. I am testing a patch that fixes that:

 global alloc          : 1.51 (19%) usr 0.07 (20%) sys 1.60 (18%) wall 7509 kB (12%) ggc

Final is spending all its time in shorten branches, which is not needed at all.

Honza
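The cost of clearing reg_has_output_reload described above is typical of a dense per-register array that is wiped for every instruction although only a few entries are ever set. One textbook remedy, shown here purely as an illustration and not as what reload1.c ended up doing, is a sparse set whose clear operation is O(1). All names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sparse set over the integers [0, universe).  */
struct sparse_set
{
  unsigned *dense;     /* members, packed in insertion order */
  unsigned *sparse;    /* sparse[r] is the position of r in dense */
  unsigned n_members;
  unsigned universe;
};

static int
sparse_set_init (struct sparse_set *s, unsigned universe)
{
  s->dense = malloc (universe * sizeof *s->dense);
  /* calloc so that membership tests never read indeterminate values.  */
  s->sparse = calloc (universe, sizeof *s->sparse);
  s->n_members = 0;
  s->universe = universe;
  if (!s->dense || !s->sparse)
    {
      free (s->dense);
      free (s->sparse);
      s->dense = s->sparse = NULL;
      return 0;
    }
  return 1;
}

static int
sparse_set_contains (const struct sparse_set *s, unsigned reg)
{
  assert (reg < s->universe);
  unsigned i = s->sparse[reg];
  return i < s->n_members && s->dense[i] == reg;
}

static void
sparse_set_add (struct sparse_set *s, unsigned reg)
{
  if (!sparse_set_contains (s, reg))
    {
      s->sparse[reg] = s->n_members;
      s->dense[s->n_members++] = reg;
    }
}

/* Clearing does not touch the arrays at all.  */
static void
sparse_set_clear (struct sparse_set *s)
{
  s->n_members = 0;
}

static void
sparse_set_free (struct sparse_set *s)
{
  free (s->dense);
  free (s->sparse);
}
```

With hundreds of thousands of pseudo registers and a clear per instruction, the difference between touching every entry and resetting one counter is exactly the kind of quadratic-versus-linear gap this testcase exposes.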
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

The -O1 time sinks:

 life analysis         : 25.44 (19%) usr 0.00 ( 0%) sys 25.49 (17%) wall 2565 kB ( 2%) ggc
 inline heuristics     : 14.92 (11%) usr 0.00 ( 0%) sys 14.95 (10%) wall 1486 kB ( 1%) ggc
 integration           : 20.73 (15%) usr 0.10 ( 4%) sys 22.72 (15%) wall 33445 kB (20%) ggc
 tree SSA to normal    : 27.97 (20%) usr 0.04 ( 2%) sys 28.13 (19%) wall 17 kB ( 0%) ggc
 expand                : 2.56 ( 2%) usr 0.04 ( 2%) sys 2.67 ( 2%) wall 24100 kB (14%) ggc
 local alloc           : 7.21 ( 5%) usr 0.03 ( 1%) sys 7.18 ( 5%) wall 1855 kB ( 1%) ggc
 global alloc          : 11.76 ( 9%) usr 0.99 (39%) sys 17.71 (12%) wall 11029 kB ( 6%) ggc
 reload CSE regs       : 7.91 ( 6%) usr 0.02 ( 1%) sys 7.97 ( 5%) wall 2393 kB ( 1%) ggc
 TOTAL                 : 136.62 2.56 148.01 170448 kB

tree SSA to normal spends most of its time in find_value_in_list because TER is shuffling around singly linked lists in a quadratic way. I quickly got lost in the logic there. Andrew, can you take a look, please?

integration runs into the quadratic behaviour of cgraph_edge. Implementing a hashtable for large cgraphs is easy; I will do so. The quadratic behaviour of tree_split_block also hits us here.

reload CSE regs has a hard time tracking all the stack slot memory locations. It is working harder than needed because a lot of memories are believed to be aliasing, even though theoretically almost everything comes from SRA and has no address taken, so it should have unique alias sets.

Life analysis spends most of its time in the dead store removal code. Again, lowering the --param might help. I am also testing a little patch to cut it to 13 seconds by speeding up reg_overlap_mentioned_p. It would be interesting to see how the dataflow branch scores here.

inline heuristics spends most of its time checking the inline_function_growth limit; I will need to think about it a bit.

Honza
I'll take a look. On the new out-of-ssa branch I've already converted the coalesce list from a linked list type linear algorithm to a hash table, as well as changed the live on entry and live on exit implementations to be more efficient. I didn't bother with TER because its due to be removed on the new branch... eventually :-) I'll take a peek and see how much work it is to change that. Andrew
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Thank you for considering it. The live on entry/exit code shows up high in the -O3 compilation time too (something like 30% of the time without PRE/FRE, I believe). So if it is a self-contained change, perhaps pushing it to mainline as a PR fix would make sense. Honza
Subject: Bug 28071 Author: hubicka Date: Mon Aug 21 00:00:14 2006 New Revision: 116277 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116277 Log: PR rtl-optimization/28071 * reload1.c (reg_has_output_reload): Turn into regset. (reload_as_needed, forget_old_reloads_1, forget_marked_reloads, choose_reload_regs, emit_reload_insns): Update to new reg_has_output_reload. Modified: trunk/gcc/ChangeLog trunk/gcc/reload1.c
Subject: Bug 28071 Author: hubicka Date: Mon Aug 21 01:42:39 2006 New Revision: 116284 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116284 Log: PR rtl-optimization/28071 * tree-optimize.c (tree_rest_of_compilation): Do not remove edges twice. * tree-inline.c (copy_bb): Use cgraph_set_call_stmt. * ipa-inline.c (cgraph_check_inline_limits): Add one_only argument. (cgraph_decide_inlining, cgraph_decide_inlining_of_small_function, cgraph_decide_inlining_incrementally): Update use of cgraph_check_inline_limits. * cgraph.c (edge_hash, edge_eq): New function. (cgraph_edge, cgraph_set_call_stmt, cgraph_create_edge, cgraph_edge_remove_caller, cgraph_node_remove_callees, cgraph_remove_node): Maintain call site hash. * cgraph.h (struct cgraph_node): Add call_site_hash. (cgraph_set_call_stmt): New function. Modified: trunk/gcc/ChangeLog trunk/gcc/cgraph.c trunk/gcc/cgraph.h trunk/gcc/ipa-inline.c trunk/gcc/tree-inline.c trunk/gcc/tree-optimize.c
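The commit above adds a call-site hash (edge_hash, edge_eq, call_site_hash) so that finding the edge for a call statement no longer walks a caller's whole edge list. The real code uses GCC's own hashtab; the fixed-size open-addressing table below is a self-contained stand-in with hypothetical names, just to show the lookup being keyed on the call statement's address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TABLE_SIZE = 1024 };  /* power of two; must exceed the edge count */

/* Hypothetical call-graph edge, identified by its call statement.  */
struct edge
{
  const void *call_stmt;
  int callee;
};

struct call_site_hash
{
  struct edge *slots[TABLE_SIZE];  /* NULL means empty */
  size_t n_edges;
};

/* Hash a pointer by dropping low alignment bits and masking.  */
static size_t
hash_ptr (const void *p)
{
  return (size_t) (((uintptr_t) p >> 4) & (TABLE_SIZE - 1));
}

static void
insert_edge (struct call_site_hash *h, struct edge *e)
{
  assert (h->n_edges < TABLE_SIZE - 1);  /* keep at least one empty slot */
  size_t i = hash_ptr (e->call_stmt);
  while (h->slots[i])
    i = (i + 1) & (TABLE_SIZE - 1);      /* linear probing */
  h->slots[i] = e;
  h->n_edges++;
}

/* Expected O(1) replacement for a linear walk over all callee edges.  */
static struct edge *
find_edge (const struct call_site_hash *h, const void *call_stmt)
{
  size_t i = hash_ptr (call_stmt);
  while (h->slots[i])
    {
      if (h->slots[i]->call_stmt == call_stmt)
        return h->slots[i];
      i = (i + 1) & (TABLE_SIZE - 1);
    }
  return NULL;
}
```

For a function like fi with about 30000 calls, one lookup per call statement turns from roughly 30000 list steps into a handful of probes, which is where the quadratic term in integration comes from. As suggested earlier in this thread, such a table is only worth building for nodes with many edges.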
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
an update at -O1 a few patches later (different machine with "only" 500MB of RAM, so some swapping occurs, but we almost fit now):

 life analysis         : 23.50 (20%) usr 0.00 ( 0%) sys 23.51 (17%) wall 2565 kB ( 2%) ggc
 inline heuristics     : 0.60 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 0%) wall 1561 kB ( 1%) ggc
 integration           : 5.75 ( 5%) usr 0.04 ( 2%) sys 5.79 ( 4%) wall 33701 kB (20%) ggc
 tree SSA rewrite      : 0.51 ( 0%) usr 0.01 ( 1%) sys 0.53 ( 0%) wall 17087 kB (10%) ggc
 tree SRA              : 0.98 ( 1%) usr 0.08 ( 4%) sys 1.10 ( 1%) wall 24835 kB (15%) ggc
 tree SSA to normal    : 45.11 (39%) usr 0.02 ( 1%) sys 45.14 (33%) wall 17 kB ( 0%) ggc
 local alloc           : 5.82 ( 5%) usr 0.01 ( 1%) sys 5.85 ( 4%) wall 1855 kB ( 1%) ggc
 global alloc          : 9.83 ( 8%) usr 0.76 (39%) sys 23.49 (17%) wall 11029 kB ( 6%) ggc
 reload CSE regs       : 7.30 ( 6%) usr 0.03 ( 2%) sys 10.16 ( 7%) wall 2393 kB ( 1%) ggc
 TOTAL                 : 116.65 1.96 136.52 170783 kB

Life analysis is almost completely the code tracking dead stores after reload (we have many stack slots). Tree-SSA to normal is the SRA problem discussed, integration is split_block, global alloc allocates a very large conflict matrix, and reload CSE regs has a similar problem tracking memories. No idea about local alloc.

Honza
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
-O2 times:

Execution times (seconds)
 life analysis         : 18.08 ( 3%) usr 0.04 ( 1%) sys 19.42 ( 3%) wall 1120 kB ( 0%) ggc
 integration           : 5.97 ( 1%) usr 0.07 ( 2%) sys 6.13 ( 1%) wall 33701 kB ( 8%) ggc
 tree PRE              : 233.01 (43%) usr 0.46 (13%) sys 241.22 (37%) wall 19480 kB ( 5%) ggc
 tree SSA to normal    : 51.26 ( 9%) usr 0.07 ( 2%) sys 52.62 ( 8%) wall 22 kB ( 0%) ggc
 expand                : 2.62 ( 0%) usr 0.07 ( 2%) sys 9.45 ( 1%) wall 24095 kB ( 6%) ggc
 PRE                   : 20.39 ( 4%) usr 0.05 ( 1%) sys 21.70 ( 3%) wall 200 kB ( 0%) ggc
 regmove               : 97.32 (18%) usr 0.17 ( 5%) sys 107.36 (16%) wall 2 kB ( 0%) ggc
 local alloc           : 6.28 ( 1%) usr 0.00 ( 0%) sys 6.29 ( 1%) wall 1480 kB ( 0%) ggc
 global alloc          : 13.12 ( 2%) usr 0.71 (21%) sys 62.79 (10%) wall 13764 kB ( 3%) ggc
 reload CSE regs       : 16.16 ( 3%) usr 0.02 ( 1%) sys 19.21 ( 3%) wall 4783 kB ( 1%) ggc
 scheduling 2          : 60.85 (11%) usr 0.57 (17%) sys 67.94 (10%) wall 206199 kB (51%) ggc
 TOTAL                 : 547.14 3.41 651.49 404467 kB

Danny has a fix for PRE scheduled for 4.2. Regmove again hits the same predicate function, since we now produce big basic blocks. This can be fixed rather easily, either by limiting the walk in that predicate or by assigning INSNs consecutive indexes. Scheduling has a problem moving around linked lists of dependencies, and fixing it seems to require moving away from log links, so it is a 4.2 issue too.

One detail that just came to mind... All memory consumed in scheduling is log links. Producing 206MB of them for a 24MB function is rather dense. Can't we prune them somewhat, perhaps by accounting for transitivity (at least in special cases)? The instructions are really mostly independent, but we apparently lose track of that fact somewhere and end up producing an almost complete tournament.

Honza
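To illustrate the "consecutive indexes" idea for the regmove predicate: the predicate is not named in this comment, so the sketch below assumes it is an ordering query ("does insn A come before insn B in the chain?"). The struct and function names are hypothetical and are not GCC's. Walking the chain costs time proportional to the block length per query, which becomes quadratic once everything sits in one huge basic block; numbering the insns once turns each query into a single comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn in a singly linked chain. */
struct insn {
    struct insn *next;
    int index;              /* position in the chain, set by number_insns */
};

/* Ordering query by walking the chain: O(n) per call. */
int precedes_by_walk(const struct insn *a, const struct insn *b)
{
    for (const struct insn *p = a->next; p != NULL; p = p->next)
        if (p == b)
            return 1;
    return 0;
}

/* One linear pass that gives every insn a consecutive index. */
void number_insns(struct insn *first)
{
    int i = 0;
    for (struct insn *p = first; p != NULL; p = p->next)
        p->index = i++;
}

/* Same query in O(1); valid only while the numbering is up to date. */
int precedes_by_index(const struct insn *a, const struct insn *b)
{
    return a->index < b->index;
}
```

The cost of this choice is that the indexes have to be refreshed (or left with gaps) whenever insns are inserted or moved; limiting the length of the walk avoids that bookkeeping but makes the predicate conservative.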
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
for completeness, the -O3 -fno-tree-pre -fno-tree-fre results (tree-pre/fre needs a little over 2GB of RAM to converge):

Execution times (seconds)
 garbage collection    : 1.11 ( 1%) usr 0.07 ( 2%) sys 8.57 ( 5%) wall 0 kB ( 0%) ggc
 life analysis         : 5.47 ( 4%) usr 0.12 ( 3%) sys 5.63 ( 3%) wall 2701 kB ( 1%) ggc
 life info update      : 2.05 ( 2%) usr 0.00 ( 0%) sys 2.10 ( 1%) wall 643 kB ( 0%) ggc
 integration           : 8.36 ( 7%) usr 0.18 ( 5%) sys 8.61 ( 5%) wall 79611 kB (27%) ggc
 tree CFG cleanup      : 3.69 ( 3%) usr 0.00 ( 0%) sys 3.77 ( 2%) wall 20 kB ( 0%) ggc
 tree alias analysis   : 2.64 ( 2%) usr 0.25 ( 6%) sys 3.01 ( 2%) wall 0 kB ( 0%) ggc
 tree SSA rewrite      : 2.17 ( 2%) usr 0.02 ( 1%) sys 2.22 ( 1%) wall 21589 kB ( 7%) ggc
 tree SSA incremental  : 4.04 ( 3%) usr 0.01 ( 0%) sys 4.10 ( 2%) wall 1061 kB ( 0%) ggc
 tree operand scan     : 1.54 ( 1%) usr 0.54 (14%) sys 1.95 ( 1%) wall 18382 kB ( 6%) ggc
 dominator optimization: 2.49 ( 2%) usr 0.06 ( 2%) sys 2.61 ( 1%) wall 11262 kB ( 4%) ggc
 tree SRA              : 3.04 ( 2%) usr 0.08 ( 2%) sys 3.12 ( 2%) wall 25600 kB ( 9%) ggc
 tree SSA to normal    : 38.17 (31%) usr 0.09 ( 2%) sys 38.56 (21%) wall 11214 kB ( 4%) ggc
 dominance computation : 2.40 ( 2%) usr 0.05 ( 1%) sys 2.52 ( 1%) wall 0 kB ( 0%) ggc
 expand                : 4.22 ( 3%) usr 0.20 ( 5%) sys 11.38 ( 6%) wall 35690 kB (12%) ggc
 global alloc          : 13.43 (11%) usr 1.28 (32%) sys 54.13 (29%) wall 5873 kB ( 2%) ggc
 flow 2                : 0.37 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall 5092 kB ( 2%) ggc
 TOTAL                 : 123.25 3.98 183.52 291674 kB

Note that the testcase is very different at -O3, because the min/max functions are inlined, breaking the gigantic basic blocks into a number of small BBs, so many of the bottlenecks visible at -O2 go away. I don't know what happens in global alloc; tree SSA to normal is the live_on_entry/live_on_exit problem discussed. We also have problems with very deep recursion levels, as the dominator tree is deep. I am thinking about implementing iterators for walking in dom order, as the current full-blown domtree walker is a bit awkward in some cases. With FRE/PRE enabled, GGC also runs out of stack space, because some temporary values in annotations leak and cause GGC to recurse insanely.

Honza
Created attachment 12135 [details] patch to resolve some of the SSA to Normal slowdowns.

By re-implementing the live on entry/exit code, I get the following improvement at -O3:

 tree SSA to normal : 32.08 (35%) usr 0.08 ( 1%) sys 32.92 (28%) wall
to
 tree SSA to normal : 16.19 (22%) usr 0.08 ( 1%) sys 16.33 (13%) wall

The remaining SSA to normal time is the fault of TER at both -O3 and -O2. I'm not so sure this is stage 3 material, but I could be convinced. I'll attach the patch, but I'll post a full breakdown of what was implemented in a note to gcc-patches. It has been bootstrapped on i686-pc-linux-gnu with no new regressions.
Created attachment 12136 [details] Patch for the remaining SSA to Normal time issues

I've attached a patch to address the slowdowns in TER. Again, I'm not sure this is stage 3 material, but I'll send a note to gcc-patches with the full breakdown. Basically, I replaced the expression linked lists with bitmaps. This patch has been bootstrapped on i686-pc-linux-gnu with no new regressions.

At -O2, timings go from:
 tree SSA to normal : 30.79 (19%) usr 0.06 ( 2%) sys 31.89 (19%) wall
to
 tree SSA to normal : 1.33 ( 1%) usr 0.02 ( 1%) sys 1.86 ( 1%) wall

and at -O3:
 tree SSA to normal : 32.08 (35%) usr 0.08 ( 1%) sys 32.92 (28%) wall
to
 tree SSA to normal : 18.75 (24%) usr 0.06 ( 1%) sys 18.83 (23%) wall

When combined with the previous live on entry/exit patch, I get the following times at -O2:
 tree SSA to normal : 30.79 (19%) usr 0.06 ( 2%) sys 31.89 (19%) wall
to
 tree SSA to normal : 1.16 ( 1%) usr 0.01 ( 0%) sys 1.17 ( 1%) wall

and at -O3:
 tree SSA to normal : 32.08 (35%) usr 0.08 ( 1%) sys 32.92 (28%) wall
to
 tree SSA to normal : 2.50 ( 4%) usr 0.08 ( 1%) sys 2.61 ( 2%) wall
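As a rough illustration of why replacing linked lists with bitmaps can help here: this is a generic fixed-size bit set, not GCC's bitmap type and not the code from the attachment, and every name in it is hypothetical. The assumption is that the lists tracked sets of expression ids, where membership tests and bulk removal required list walks. With a bit set, a membership test is one shift and mask, and removing a whole set of ids is one AND-NOT per machine word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXPRS 256                  /* illustrative capacity */
#define WORD_BITS 64
#define NWORDS (MAX_EXPRS / WORD_BITS)

/* Set of expression ids in [0, MAX_EXPRS), one bit per id. */
struct expr_set {
    uint64_t w[NWORDS];
};

void expr_set_clear(struct expr_set *s)
{
    memset(s->w, 0, sizeof s->w);
}

void expr_set_add(struct expr_set *s, unsigned id)
{
    assert(id < MAX_EXPRS);
    s->w[id / WORD_BITS] |= (uint64_t)1 << (id % WORD_BITS);
}

/* Membership test in constant time, no list walk. */
int expr_set_has(const struct expr_set *s, unsigned id)
{
    assert(id < MAX_EXPRS);
    return (int)((s->w[id / WORD_BITS] >> (id % WORD_BITS)) & 1);
}

/* Remove every member of KILL from S, one word at a time. */
void expr_set_remove_all(struct expr_set *s, const struct expr_set *kill)
{
    for (int i = 0; i < NWORDS; i++)
        s->w[i] &= ~kill->w[i];
}
```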
links to the 2 notes on gcc-patches: live range changes: http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00895.html TER changes: http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00896.html
Subject: Bug 28071 Author: amacleod Date: Mon Aug 28 17:18:33 2006 New Revision: 116511 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116511 Log: revert 116257 which is the rewrite_liverange_info patch, so be replaced with the two patches I created for bug 28071. Modified: branches/out-of-ssa-the-sequel/gcc/ChangeLog branches/out-of-ssa-the-sequel/gcc/tree-outof-ssa.c branches/out-of-ssa-the-sequel/gcc/tree-ssa-live.c branches/out-of-ssa-the-sequel/gcc/tree-ssa-live.h
Huh. I didn't realize bugzilla scanned the entire checkin message looking for bug numbers.... This has been checked in on a branch, so you can ignore the preceding note's commentary. It's just a note to myself.
Subject: Bug 28071 Author: hubicka Date: Tue Sep 12 10:11:04 2006 New Revision: 116886 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116886 Log: PR rtl-optimization/28071 * tree-vect-transform.c (vect_create_data_ref_ptr): Kill cast. (vect_transform_loop): Likewise. * tree-vectorizer.c (new_loop_vec_info): Likewise. (new_loop_vec_info): Likewise. (destroy_loop_vec_info): Likewise. * tree-dfa.c (create_var_ann): Use GCC_CNEW. (create_stmt_ann): Likewise. (create_tree_ann): Rename to ... (create_tree_common_ann): ... this one; allocate only the common part of annotations. * tree-vn.c (set_value_handle): Use get_tree_common_ann. (get_value_handle): Likewise. * tree-ssa-pre.c (phi_translate): Delay annotation allocation for get_tree_common_ann. * tree-vectorizer.h (set_stmt_info): Take stmt annotation. (vinfo_for_stmt): Use stmt annotations. * tree-flow.h (tree_ann_common_t): New type. (tree_common_ann, get_tree_common_ann, create_tree_common_ann): New. (tree_ann, get_tree_ann, create_tree_ann): New. * tree-flow-inline.h (get_function_ann): Do more type checking. (stmt_ann): Likewise. (tree_ann): Rename to ... (tree_common_ann): ... this one; return ony common_ann (get_tree_ann): Rename to ... (tree_common_ann): This one; return only common_ann. * tree-vect-patterns.c (vect_pattern_recog_1): Update call of set_stmt_info. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-dfa.c trunk/gcc/tree-flow-inline.h trunk/gcc/tree-flow.h trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-vect-patterns.c trunk/gcc/tree-vect-transform.c trunk/gcc/tree-vectorizer.c trunk/gcc/tree-vectorizer.h trunk/gcc/tree-vn.c
Is this still a regression?
It's at least still a regression on the 4.1 branch, which still does

cc1: out of memory allocating 290995744 bytes after a total of 43593728 bytes

at -O1. Otherwise we have

3.4.6: 106s
4.0.3: 108s
4.1.2: OOM
4.2.0: 86s

and 4.2.0 uses a lot less memory than 4.0.3. So, let's remove the 4.2 regression marker.
Created attachment 12879 [details] Patch for scheduler dependency lists.

Hi,

This patch introduces new dependency lists to the scheduler, so that LOG_LINKS are no longer used in the schedulers. The patch is preliminary and I will post an updated version to gcc-patches in a few days.

The structure of the change: As before, we have backward dependencies (INSN_DEPS, the replacement for LOG_LINKS) and forward dependencies (INSN_DEPEND). These lists consist of dep_nodes. Each dep_node has a pointer to a dep_data_node, which contains the dependency data (data field), the dep_node of the backward dep_list (back field) and the dep_node of the forward dep_list (forw field). Thus we can easily get the forward dep_node from the backward one and vice versa. Each dep_node also contains a pointer to the next field of the previous node in the dep_list (that is, to the place where the pointer to it is stored), making removal from the list fast and easy.

The changes are mostly just a pattern replacement of macro names. The patched compiler produces exactly the same output as the original (except for one small thing: removal of DEPS_LIST from rtl.def somehow results in different numbering of the registers. The same occurs if I add an additional rtx description to rtl.def. I don't know why this happens, but would be glad if someone could explain.)

Minimal changes to the backends were introduced.
1. The ia64 scheduler hook adjust_cost was restored to its original version (as in gcc 4.1).
2. The ia64 and rs6000 backends were fixed to walk through the new dependency lists, which they do for their own heuristics (no other backend does that).
3. The rs6000 scheduler hook is_costly_dependency () was changed so that there is no need to do a compatibility transformation (as is being done for adjust_cost, btw) for a hook that is implemented on a single target.

The patch was bootstrapped on x86_64 and ia64. I've also built a cross compiler to powerpc-740.

Results (on x86_64): scheduler2 now takes 4s instead of 12s. Memory consumption: 11.5M instead of 48M.

Thanks,
Maxim
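For readers who have not opened the attachment, the layout described above can be pictured roughly as follows. This is a simplified sketch, not the code from the patch: all type and function names are made up for illustration, insns are represented by plain ints, and the pointer to the previous node's next field is left out to keep the focus on how one record holds both list links.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the code from the patch. */

struct dep_record;

/* One link on an insn's backward or forward dependency list. */
struct dep_link {
    struct dep_link *next;      /* next link on the same list */
    struct dep_record *record;  /* the shared record this link belongs to */
};

/* One dependency: payload plus both of its list links, so a single
   malloc/free covers everything. */
struct dep_record {
    int producer;               /* stand-in for the producing insn */
    int consumer;               /* stand-in for the consuming insn */
    struct dep_link back;       /* link on the consumer's backward list */
    struct dep_link forw;       /* link on the producer's forward list */
};

struct dep_list {
    struct dep_link *first;
};

/* Push LINK onto the front of LIST. */
void dep_list_push(struct dep_list *list, struct dep_link *link)
{
    link->next = list->first;
    list->first = link;
}

/* Allocate one record and thread it onto both lists.
   Returns NULL on allocation failure. */
struct dep_record *dep_create(int producer, int consumer,
                              struct dep_list *back_list,
                              struct dep_list *forw_list)
{
    struct dep_record *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->producer = producer;
    r->consumer = consumer;
    r->back.record = r;
    r->forw.record = r;
    dep_list_push(back_list, &r->back);
    dep_list_push(forw_list, &r->forw);
    return r;
}

/* Switch direction: from the backward link to the forward one. */
struct dep_link *dep_forw_from_back(struct dep_link *back)
{
    assert(back == &back->record->back);
    return &back->record->forw;
}

/* Switch direction: from the forward link to the backward one. */
struct dep_link *dep_back_from_forw(struct dep_link *forw)
{
    assert(forw == &forw->record->forw);
    return &forw->record->back;
}
```

Because both links live inside the record, changing a property of the dependency touches one place, and switching between the backward and forward view is two pointer steps with no search.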
(In reply to comment #55) > Created an attachment (id=12879) [edit] > Patch for scheduler dependency lists. Looks like a pretty good cleanup IMHO. Here are some comments. o dep_def: representing a dependence edge including both producer and consumer is very handy, albeit somewhat redundant as we're usually traversing all cons connected to a pro or vice versa. (I.e., has its pros and cons, but mostly pros I agree - also done in ddg.h/ddg_edge.) Maybe comment why both 'kind' and 'ds' are needed, as one supersedes the other. o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an *edge* in terms of the data-dependence graph. The prev_nextp field is a "/* Pointer to the next field of the previous node in the list. */" except for the first node on the list, whose prev_nextp points to itself, right? o dep_data_node_def: holding the two conjugate dependence edges together is very useful when switching directions. But perhaps most of the accesses go in one direction (e.g. iterating over cons of a pro), and having both conjugates structed together may reduce cache efficiency. So you may consider connecting each dep_node_def to its conjugate, not necessarily forcing them to be placed adjacent in memory. o To add to the checking routines, the following can be checked: every dep_node_def is pointed-to by either its data->back xor its data->forw, right? If so, this can be used to identify if a dep_node_def is forward or backward; that all nodes on a list are forward (and share the same pro) or backward (and share the same con); and to assert the following regarding L: +/* Add a dependency described by DEP to the list L. + L should be either INSN_DEPS1 or RESOLVED_DEPS1. */ o insn_cost (insn, dep): maybe it's better to break this into insn_cost (insn) of a producer regardless of consumers, and "dep_cost (dep)". o The comment explaining what 'resolve_dep' does can be inlined together with its code. +/* Detach dep_node N from the list. 
*/ +static void +dep_node_detach (dep_node_t n) +{ + dep_node_t *prev_nextp = DEP_NODE_PREV_NEXTP (n); + dep_node_t next = DEP_NODE_NEXT (n); + + *prev_nextp = next; + + if (next != NULL) + DEP_NODE_PREV_NEXTP (next) = prev_nextp; maybe complete the detachment by adding: DEP_NODE_PREV_NEXTP (n) = NULL; DEP_NODE_NEXT (n) = NULL; +} +/* Attach NEXT to the next field pointed to by PREV_NEXTP. */ ^^^^^^^^^^^N to appear after node X whose &DEP_NODE_NEXT (X) is given by PREV_NEXT_P +static void +dep_node_attach (dep_node_t n, dep_node_t *prev_nextp) better place +dep_node_check_p (dep_node_t n) next to +dep_nodes_check_p (dep_node_t n) +/* Make a copy of FROM in TO with substitutin consumer with CON. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^substituting consumer with CON. Ayal.
Subject: Re: [4.1 regression] A file that can not be compiled in reasonable time/space Thanks! Very useful comments. I'm continuing to work on cleaning the patch (especially on writing the comments) and making code more transparent. Below are my comments on yours: zaks at il dot ibm dot com wrote: > ------- Comment #56 from zaks at il dot ibm dot com 2007-01-15 07:19 ------- > (In reply to comment #55) >> Created an attachment (id=12879) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12879&action=view) [edit] >> Patch for scheduler dependency lists. > > Looks like a pretty good cleanup IMHO. Here are some comments. > > o dep_def: representing a dependence edge including both producer and consumer > is very handy, albeit somewhat redundant as we're usually traversing all cons > connected to a pro or vice versa. This allows us to keep all things in one place - one of the things current deps don't provide. I.e., when changing some property of the dep we need to find a corresponding to that dep nodes in both backward and forward lists and apply the change to two places instead of one. (I.e., has its pros and cons, but mostly pros > I agree - also done in ddg.h/ddg_edge.) Maybe comment why both 'kind' and 'ds' > are needed, as one supersedes the other. There will be. Thanks. > > o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an > *edge* in terms of the data-dependence graph. The prev_nextp field is a "/* Right! I struggled to figure out the correct name and didn't prevail. Thanks for the tip. It'll be dep_edge. > Pointer to the next field of the previous node in the list. */" except for the > first node on the list, whose prev_nextp points to itself, right? No. Prev_nextp field of the first node points to deps_list->first. This allows us not to distinguish first node from the others. I'll fix the comment. > > o dep_data_node_def: holding the two conjugate dependence edges together is > very useful when switching directions. 
But perhaps most of the accesses go in > one direction (e.g. iterating over cons of a pro), and having both conjugates > structed together may reduce cache efficiency. So you may consider connecting > each dep_node_def to its conjugate, not necessarily forcing them to be placed > adjacent in memory. Dep_def and both edges were placed in one structure so that they could be allocated and freed within a single alloc/free. As I understand you propose putting two pointers inside dep_edge_def: one to the dep_def and one to the opposite edge. Currently we have one pointer in dep_edge_def to the dep_data_node which have all that pointers. And probably I'm missing something, but I don't see how your way can improve cache efficiency. > > o To add to the checking routines, the following can be checked: every > dep_node_def is pointed-to by either its data->back xor its data->forw, right? > If so, this can be used to identify if a dep_node_def is forward or backward; > that all nodes on a list are forward (and share the same pro) or backward (and > share the same con); and to assert the following regarding L: > +/* Add a dependency described by DEP to the list L. > + L should be either INSN_DEPS1 or RESOLVED_DEPS1. */ Good idea. > > o insn_cost (insn, dep): maybe it's better to break this into insn_cost (insn) > of a producer regardless of consumers, and "dep_cost (dep)". Agree. > > o The comment explaining what 'resolve_dep' does can be inlined together with > its code. Agree. > > +/* Detach dep_node N from the list. */ > +static void > +dep_node_detach (dep_node_t n) > +{ > + dep_node_t *prev_nextp = DEP_NODE_PREV_NEXTP (n); > + dep_node_t next = DEP_NODE_NEXT (n); > + > + *prev_nextp = next; > + > + if (next != NULL) > + DEP_NODE_PREV_NEXTP (next) = prev_nextp; > maybe complete the detachment by adding: > DEP_NODE_PREV_NEXTP (n) = NULL; > DEP_NODE_NEXT (n) = NULL; Probably, you are right. > Ayal. Thanks, Maxim
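The prev_nextp convention discussed above can be sketched in isolation. The code below is a standalone illustration with hypothetical names, not the patch's macros: each link stores the address of whichever pointer currently points at it, which is the list head's first field for the first link and the previous link's next field otherwise. Detach therefore needs no special case for the first element, and it clears the detached link's own fields as suggested in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a standalone model of the prev_nextp idea. */
struct link {
    struct link *next;
    struct link **prev_nextp;   /* address of the pointer that points at us */
};

struct list {
    struct link *first;
};

/* Insert N at the position whose incoming pointer is *PREV_NEXTP.
   Passing &list->first inserts at the front. N must be detached. */
void link_attach(struct link *n, struct link **prev_nextp)
{
    struct link *next = *prev_nextp;

    assert(n->next == NULL && n->prev_nextp == NULL);
    n->next = next;
    n->prev_nextp = prev_nextp;
    *prev_nextp = n;
    if (next != NULL)
        next->prev_nextp = &n->next;
}

/* Remove N from whatever list it is on, in constant time. */
void link_detach(struct link *n)
{
    struct link **prev_nextp = n->prev_nextp;
    struct link *next = n->next;

    assert(prev_nextp != NULL);
    *prev_nextp = next;
    if (next != NULL)
        next->prev_nextp = prev_nextp;

    /* Complete the detachment so stale pointers cannot be followed. */
    n->prev_nextp = NULL;
    n->next = NULL;
}
```

Clearing the two fields costs two stores per removal and lets attach assert that a link is not already on a list.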
(In reply to comment #57) > Subject: Re: [4.1 regression] A file that can not be > compiled in reasonable time/space > Thanks! Very useful comments. I'm continuing to work on cleaning the > patch (especially on writing the comments) Enjoy! One suggestion that may help explain the data-structure, is to provide a drawing of ddn with its dep and nodes connected. > > o dep_node_def: this is a node in a (doubly-linked) chain, but it represents an > > *edge* in terms of the data-dependence graph. The prev_nextp field is a "/* > Right! I struggled to figure out the correct name and didn't prevail. > Thanks for the tip. It'll be dep_edge. Ah, on second thought, perhaps the important property of this struct is the fact that it's a link on a forward or backward chain; so how about dep_link? > > Pointer to the next field of the previous node in the list. */" except for the > > first node on the list, whose prev_nextp points to itself, right? > No. Prev_nextp field of the first node points to deps_list->first. > This allows us not to distinguish first node from the others. I'll fix > the comment. Ah, right. > > > > o dep_data_node_def: holding the two conjugate dependence edges together is > > very useful when switching directions. But perhaps most of the accesses go in > > one direction (e.g. iterating over cons of a pro), and having both conjugates > > structed together may reduce cache efficiency. So you may consider connecting > > each dep_node_def to its conjugate, not necessarily forcing them to be placed > > adjacent in memory. > Dep_def and both edges were placed in one structure so that they could > be allocated and freed within a single alloc/free. As I understand you > propose putting two pointers inside dep_edge_def: one to the dep_def and > one to the opposite edge. Currently we have one pointer in dep_edge_def > to the dep_data_node which have all that pointers. And probably I'm > missing something, but I don't see how your way can improve cache > efficiency. 
You're right. There's probably not much to gain if anything paying an extra pointer to save the fields of the conjugate dep_node. Perhaps only place dep_def between back and forw (been too much into struct-reorg, I guess :). It does seem wasteful to hold two 'data' pointers for such nearby offsets ... ;) And another note: INSN_DEPS may be renamed INSN_BACK_DEPS to better distinguish it from INSN_DEPEND (which in turn might be called INSN_FORW_DEPS). And maybe INSN_RESOLVED_BACK_DEPS for consistency. Ayal.
Subject: Re: [4.1 regression] A file that can not be compiled in reasonable time/space

Hi,
just as a heads up, the early inlining change now makes the inliner fully inline into the function at -O2 (originally we stopped because of inline unit growth after doing just a few inlines). This enables more optimizations and reduces the memory usage of all other passes except for the scheduler, whose usage increases. So we have a peak of roughly 60MB GGC memory without scheduling and 360MB with scheduling, so this patch would be even more greatly appreciated ;)

http://www.suse.de/~aj/SPEC/amd64/memory/pr28071-O2.rep

Honza
Hi,
small update on status. At -O3 -fno-tree-fre -fno-tree-pre we are now doing 1.1GB footprint, 800MB of this out of gimple. We still explode in FRE/PRE, but the majority of the other problems have been fixed:

Execution times (seconds)
 garbage collection    : 18.23 (12%) usr 0.04 ( 1%) sys 18.46 (10%) wall 0 kB ( 0%) ggc
 callgraph construction: 10.31 ( 7%) usr 0.04 ( 1%) sys 10.36 ( 5%) wall 2296 kB ( 0%) ggc
 life analysis         : 4.08 ( 3%) usr 0.16 ( 3%) sys 4.26 ( 2%) wall 7350 kB ( 2%) ggc
 inline heuristics     : 10.46 ( 7%) usr 0.12 ( 2%) sys 10.57 ( 6%) wall 2438 kB ( 1%) ggc
 integration           : 16.48 (11%) usr 0.46 ( 9%) sys 17.00 ( 9%) wall 143049 kB (29%) ggc
 tree CFG cleanup      : 4.69 ( 3%) usr 0.00 ( 0%) sys 4.69 ( 2%) wall 0 kB ( 0%) ggc
 tree SSA incremental  : 2.32 ( 2%) usr 0.40 ( 8%) sys 2.76 ( 1%) wall 3276 kB ( 1%) ggc
 tree operand scan     : 1.42 ( 1%) usr 0.22 ( 4%) sys 1.54 ( 1%) wall 27071 kB ( 6%) ggc
 dominator optimization: 2.25 ( 2%) usr 0.00 ( 0%) sys 2.24 ( 1%) wall 14657 kB ( 3%) ggc
 tree split crit edges : 0.39 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 17558 kB ( 4%) ggc
 tree SSA to normal    : 8.06 ( 5%) usr 0.40 ( 8%) sys 8.51 ( 4%) wall 22874 kB ( 5%) ggc
 expand                : 3.83 ( 3%) usr 0.69 (14%) sys 38.08 (20%) wall 54312 kB (11%) ggc
 forward prop          : 3.20 ( 2%) usr 0.82 (16%) sys 4.22 ( 2%) wall 2470 kB ( 1%) ggc
 if-conversion         : 6.37 ( 4%) usr 0.00 ( 0%) sys 6.41 ( 3%) wall 9157 kB ( 2%) ggc
 global alloc          : 12.12 ( 8%) usr 0.94 (19%) sys 15.48 ( 8%) wall 18801 kB ( 4%) ggc
 TOTAL                 : 147.90 5.02 191.03 486834 kB

We get considerable usage in bitmaps (only those over 100MB of peak memory usage are listed):

 df-problems.c:2957 (df_chain_create_bb)   208MB
 df-problems.c:986 (df_rd_alloc)           207MB
 df-problems.c:987 (df_rd_alloc)           110MB
 tree-ssa-live.c:534 (new_tree_live_info)  110MB
 tree-ssa-live.c:538 (new_tree_live_info)  110MB

At least 100MB, but probably more, is consumed by the new linked lists used by the scheduler. Hopefully this can be tracked by moving everything to allocpools. I will send -O2 in a separate post.

Honza
Also forgot to mention, integration is slow because of the split_block quadraticness.

For -O2: We need 531MB of RAM, GGC memory is peaking at 100MB, and a large portion of the non-GGC memory is definitely the scheduler dependency lists.

Execution times (seconds)
 garbage collection    : 14.26 ( 5%) usr 0.03 ( 1%) sys 14.27 ( 5%) wall 0 kB ( 0%) ggc
 life analysis         : 73.96 (24%) usr 1.55 (46%) sys 75.52 (24%) wall 7207 kB ( 2%) ggc
 alias analysis        : 0.92 ( 0%) usr 0.00 ( 0%) sys 0.87 ( 0%) wall 8530 kB ( 3%) ggc
 inline heuristics     : 11.64 ( 4%) usr 0.12 ( 4%) sys 11.77 ( 4%) wall 2695 kB ( 1%) ggc
 integration           : 16.71 ( 5%) usr 0.19 ( 6%) sys 16.91 ( 5%) wall 69808 kB (21%) ggc
 tree gimplify         : 0.49 ( 0%) usr 0.07 ( 2%) sys 0.58 ( 0%) wall 14977 kB ( 4%) ggc
 tree operand scan     : 1.25 ( 0%) usr 0.11 ( 3%) sys 1.29 ( 0%) wall 20889 kB ( 6%) ggc
 tree SRA              : 1.20 ( 0%) usr 0.07 ( 2%) sys 1.37 ( 0%) wall 40364 kB (12%) ggc
 tree FRE              : 1.14 ( 0%) usr 0.07 ( 2%) sys 1.21 ( 0%) wall 9230 kB ( 3%) ggc
 expand                : 3.29 ( 1%) usr 0.10 ( 3%) sys 3.39 ( 1%) wall 45828 kB (14%) ggc
 PRE                   : 21.54 ( 7%) usr 0.00 ( 0%) sys 21.54 ( 7%) wall 898 kB ( 0%) ggc
 regmove               : 93.59 (30%) usr 0.05 ( 1%) sys 93.64 (30%) wall 156 kB ( 0%) ggc
 local alloc           : 5.34 ( 2%) usr 0.00 ( 0%) sys 5.33 ( 2%) wall 2838 kB ( 1%) ggc
 global alloc          : 4.25 ( 1%) usr 0.06 ( 2%) sys 4.30 ( 1%) wall 19946 kB ( 6%) ggc
 reload CSE regs       : 4.09 ( 1%) usr 0.00 ( 0%) sys 4.11 ( 1%) wall 11354 kB ( 3%) ggc
 scheduling 2          : 16.97 ( 6%) usr 0.44 (13%) sys 17.53 ( 6%) wall 20069 kB ( 6%) ggc
 TOTAL                 : 308.36 3.39 312.58 334207 kB
 total: 531915 kB

regmove has the quadratic loop issue I added a param for earlier in this bug, but the parameter is now apparently a bit too large, since the rest of the compiler is a lot faster. Scheduler/out-of-SSA slowness is gone. There are no overly large bitmaps, and one large allocpool:

 df_scan_ref pool 18 74449440 67061984 0

Looks like we are in pretty good shape on this one; the only IMO important problems are the slowness of life analysis (hopefully fixed by DFA) and the memory hungriness of the scheduler.

Honza
dataflow branch cannot complete this at -O3 -fno-tree-pre -fno-tree-fre
Subject: Bug 28071 Author: mkuvyrkov Date: Mon Apr 16 16:04:18 2007 New Revision: 123874 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123874 Log: PR middle-end/28071 * sched-int.h (struct deps): Split field 'pending_lists_length' into 'pending_read_list_length' and 'pending_write_list_length'. Update comment. * sched-deps.c (add_insn_mem_dependence): Change signature. Update to handle two length counters instead of one. Update all uses. (flush_pending_lists, sched_analyze_1, init_deps): Update to handle two length counters instead of one. * sched-rgn.c (propagate_deps): Update to handle two length counters instead of one. Modified: trunk/gcc/ChangeLog trunk/gcc/sched-deps.c trunk/gcc/sched-int.h trunk/gcc/sched-rgn.c
(In reply to comment #63) Scheduler memory hungriness should be fixed by the above commit.
I can confirm that at -O2, memory consumption dropped from 0.5GB to 0.28GB, that is indeed a good improvement. To summarize http://www.suse.de/~gcctest/memory/results/200704171438/pr28071-O2.rep

Compile time wise, the major offenders are:

 PRE                   : 259.18 (34%) usr 0.00 ( 0%) sys 259.18 (34%) wall 1421 kB ( 1%) ggc
 scheduling 2          : 366.76 (49%) usr 0.00 ( 0%) sys 366.82 (49%) wall 3062 kB ( 1%) ggc

There is a lot of non-GGC memory. The major allocpool offender is:

 df_scan_ref pool 36 130400160 58647984 0

And bitmaps:

 tree-ssa-pre.c:549 (bitmap_set_new) 95283 14667640 8814400 8798320 9704128
 reload1.c:518 (new_insn_chain) 90286 8425760 8425760 8425760 761
 tree-ssa-pre.c:548 (bitmap_set_new) 95283 20190640 9860640 9826200 3268384
 tree-ssa-structalias.c:879 (add_pred_grap 94816 7585280 7585280 7585280 189632

Thanks,
Honza
Subject: Re: [4.1 regression] A file that can not be compiled in reasonable time/space

Just to add some explanation to the numbers: df_scan_ref_pool is 50MB, and the bitmaps quoted are 8MB each. Given the nature of the testcase, I think we are doing a satisfactory job at -O2. At -O3 there are still problems (at -O2 the testcase has one huge BB; at -O3 we have many BBs). PRE explodes completely, and we need over 1.2GB for -O3 -fno-tree-pre -fno-tree-fre. What is also killing us at -O3 are the bitmaps.

 385MB: df-problems.c:2951 (df_chain_create_bb) 40198 386574160 385195560 385195560 462958
 200MB: df-problems.c:984 (df_rd_alloc) 40198 385290320 208450840 0 0
 110MB: df-problems.c:985 (df_rd_alloc) 40198 201714640 110324160 0 0
 tree-ssa-live.c:540 (new_tree_live_info) 31939 114031520 113098360 0 84523
 tree-ssa-live.c:536 (new_tree_live_info) 31939 113096920 113092320 0 80895

Honza
Will not be fixed in 4.2.0; retargeting at 4.2.1.
Audit trail shows that this isn't a problem with 4.2. Target -> 4.1.3?
Change target milestone to 4.2.3, as 4.2.2 has been released.
> Audit trail shows that this isn't a problem with 4.2. Target -> 4.1.3? Yes, this has been fixed in the 4.2 series according to comment #54.
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>: https://gcc.gnu.org/g:095eb138f736d94dabf9a07a6671bd351be0e66a commit r14-2851-g095eb138f736d94dabf9a07a6671bd351be0e66a Author: Roger Sayle <roger@nextmovesoftware.com> Date: Fri Jul 28 09:39:46 2023 +0100 PR rtl-optimization/110587: Reduce useless moves in compile-time hog. This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy. The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains 777K lines, with this patch it contains 717K lines, i.e. saving about 60K lines (admittedly of debugging text output, but it makes the point). 2023-07-28 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.