This test is failing more often, so I thought I would bring back a PR.

On hppa2.0w-hp-hpux11.11 (dual 750 MHz, 8 GB memory):

WARNING: program timed out.
FAIL: gcc.c-torture/compile/20001226-1.c  -O2  (test for excess errors)
WARNING: program timed out.
FAIL: gcc.c-torture/compile/20001226-1.c  -O3 -fomit-frame-pointer  (test for excess errors)
WARNING: program timed out.
FAIL: gcc.c-torture/compile/20001226-1.c  -O3 -g  (test for excess errors)
WARNING: program timed out.
FAIL: gcc.c-torture/compile/20001226-1.c  -Os  (test for excess errors)

On i686-pc-linux-gnu (2 GHz, 1GB memory):

WARNING: program timed out.
FAIL: gcc.c-torture/compile/20001226-1.c (test for excess errors)
Subject: Re:  New: gcc.c-torture/compile/20001226-1.c times out

> On i686-pc-linux-gnu (2 GHz, 1GB memory):

Oops, that should have been 3.2 GHz.

Dave
Subject: Bug 28614

Author: sje
Date: Fri Dec  5 17:04:27 2008
New Revision: 142485

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=142485
Log:
        PR other/28614
        * gcc.c-torture/compile/20001226-1.c: Add dg-timeout-factor.
        * g++.dg/torture/pr31863.C: Ditto.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/torture/pr31863.C
    trunk/gcc/testsuite/gcc.c-torture/compile/20001226-1.c
Dave, I added a dg-timeout-factor to this test for HPPA so it shouldn't time out on PA boxes anymore. I hadn't noticed the x86 timeout part of the report when I first looked at it. Do you still have that problem? If so we can change the test to increase the timeout there too or, if you don't have that problem anymore we can just close this bug out since it should be fixed on PA now.
Subject: Re:  gcc.c-torture/compile/20001226-1.c times out

> ------- Comment #3 from sje at cup dot hp dot com  2008-12-05 17:16 -------
> Dave, I added a dg-timeout-factor to this test for HPPA so it shouldn't time
> out on PA boxes anymore.  I hadn't noticed the x86 timeout part of the report
> when I first looked at it.  Do you still have that problem?  If so we can
> change the test to increase the timeout there too or, if you don't have that
> problem anymore we can just close this bug out since it should be fixed on PA
> now.

I haven't done a GCC build recently on this x86 system but I don't
believe there is a timeout problem with this test on most x86 systems.

Instead of just arbitrarily increasing the timeout values, I tend to think
timeout values should be scaled based on the time for some base compilation.
It also would be nice to keep execution time records for certain benchmark
tests so that it might be possible to detect regressions in compilation
speed.

Dave
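To make the scaling suggestion concrete, here is a minimal sketch of one way a per-host timeout could be derived from a base compilation. This is not how DejaGnu or dg-timeout-factor work; the function name, its parameters, and the numbers in the usage note are all hypothetical and only illustrate the arithmetic.

```cpp
#include <algorithm>

// Hypothetical helper, not part of GCC or DejaGnu.
// default_timeout:    timeout (seconds) tuned on some reference host.
// reference_baseline: time (seconds) the base compilation took on that host.
// measured_baseline:  time (seconds) the same compilation took on this host.
double scaled_timeout_seconds(double default_timeout,
                              double reference_baseline,
                              double measured_baseline) {
  // Without a usable measurement, fall back to the unscaled default.
  if (reference_baseline <= 0.0 || measured_baseline <= 0.0)
    return default_timeout;
  double factor = measured_baseline / reference_baseline;
  // Assumption: only ever lengthen the timeout, so that a fast host
  // does not end up with a shorter timeout than the default.
  return default_timeout * std::max(1.0, factor);
}
```

With these made-up numbers, a host that needs 40 s for a base compilation that takes 10 s on the reference host would get four times the default timeout, while a faster host would keep the default.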
FRE seems to take most of the time on this testcase.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:72ae1e5635648bd3f6a5760ca46d531ad1f2c6b1

commit r13-5966-g72ae1e5635648bd3f6a5760ca46d531ad1f2c6b1
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Feb 13 14:41:24 2023 +0100

    tree-optimization/28614 - high FRE time for gcc.c-torture/compile/20001226-1.c

    I noticed that for gcc.c-torture/compile/20001226-1.c even -O1
    has around 50% of the compile-time accounted to FRE.  That's
    because we have blocks with a high incoming edge count and
    can_track_predicate_on_edge visits all of them even though it could
    stop after the second.  The function is also called repeatedly
    for the same edge.  The following fixes this and reduces the FRE
    time to 1% on the testcase.

            PR tree-optimization/28614
            * tree-ssa-sccvn.cc (can_track_predicate_on_edge): Avoid
            walking all edges in most cases.
            (vn_nary_op_insert_pieces_predicated): Avoid repeated
            calls to can_track_predicate_on_edge unless checking is
            enabled.
            (process_bb): Instead call it once here for each edge
            we register possibly multiple predicates on.
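The early exit described in the commit message above can be illustrated with a small standalone sketch. This is not the code from tree-ssa-sccvn.cc: `InEdge`, `ScanResult`, `scan_all` and `scan_early_exit` are hypothetical names, and the condition tested (at most one incoming edge that is not a back edge) is an assumption chosen for illustration. The commit message only says that the walk could stop after the second edge.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an incoming CFG edge; not GCC's `edge` type.
struct InEdge {
  bool is_back_edge;
};

// Result of a scan plus the number of edges visited, so the effect of
// the early exit can be observed.
struct ScanResult {
  bool at_most_one_forward;
  std::size_t visited;
};

// Naive version: always walks every incoming edge.
ScanResult scan_all(const std::vector<InEdge>& preds) {
  std::size_t forward = 0, visited = 0;
  for (const InEdge& e : preds) {
    ++visited;
    if (!e.is_back_edge)
      ++forward;
  }
  return {forward <= 1, visited};
}

// Early-exit version: the answer is known once a second forward edge
// is seen, so the walk stops there.
ScanResult scan_early_exit(const std::vector<InEdge>& preds) {
  std::size_t forward = 0, visited = 0;
  for (const InEdge& e : preds) {
    ++visited;
    if (!e.is_back_edge && ++forward == 2)
      return {false, visited};
  }
  return {true, visited};
}
```

For a block with 1000 incoming forward edges, `scan_all` visits all 1000 while `scan_early_exit` visits 2, and both return the same answer. The other half of the fix, calling the check once per edge rather than once per predicate, amounts to hoisting the call out of the inner loop.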
For what it's worth, some changes after 2024-09-20's commit r15-3743-g2828ec526eaf5612178b62d48bfd8443c7ecd674 appear to have regressed this for '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'; with '--enable-checking=yes,extra,rtl') -- but for '-O1' only:

    PASS: gcc.c-torture/compile/20001226-1.c   -O0  (test for excess errors)
    {+WARNING: gcc.c-torture/compile/20001226-1.c   -O1  (test for excess errors) program timed out.+}
    [-PASS:-]{+FAIL:+} gcc.c-torture/compile/20001226-1.c   -O1  (test for excess errors)
    PASS: gcc.c-torture/compile/20001226-1.c   -O2  (test for excess errors)
    PASS: gcc.c-torture/compile/20001226-1.c   -O3 -g  (test for excess errors)
    PASS: gcc.c-torture/compile/20001226-1.c   -Os  (test for excess errors)

We've got, for example:

    $ \time [...] -O0 [...]
    10.69user 0.45system 0:11.14elapsed 99%CPU (0avgtext+0avgdata 361912maxresident)k
    0inputs+18200outputs (0major+123361minor)pagefaults 0swaps

    $ \time [...] -O1 [...]
    [manually terminated after 19.75 h 100 % CPU usage]
    xgcc: fatal error: Terminated signal terminated program cc1
    compilation terminated.
    Command exited with non-zero status 1
    70873.77user 112.37system 19:43:29elapsed 99%CPU (0avgtext+0avgdata 1142652maxresident)k
    0inputs+0outputs (0major+34722283minor)pagefaults 0swaps

    $ \time [...] -O2 [...]
    41.28user 0.28system 0:41.57elapsed 99%CPU (0avgtext+0avgdata 210248maxresident)k
    0inputs+2864outputs (0major+83144minor)pagefaults 0swaps

I've not further analyzed where the time is being spent in 'cc1'.

At 2024-09-20's commit r15-3743-g2828ec526eaf5612178b62d48bfd8443c7ecd674 we had, for example:

    $ \time [...] -O0 [...]
    10.75user 0.47system 0:11.23elapsed 99%CPU (0avgtext+0avgdata 362048maxresident)k
    0inputs+18200outputs (0major+123444minor)pagefaults 0swaps

    $ \time [...] -O1 [...]
    10.27user 0.31system 0:10.58elapsed 99%CPU (0avgtext+0avgdata 194036maxresident)k
    0inputs+3760outputs (0major+71795minor)pagefaults 0swaps

    $ \time [...] -O2 [...]
    31.45user 0.29system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 224988maxresident)k
    0inputs+2912outputs (0major+77431minor)pagefaults 0swaps

..., so 10 s for '-O1'.
(In reply to myself from comment #7)
> for '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100';
> with '--enable-checking=yes,extra,rtl') -- but for '-O1' only:
>
>     PASS: gcc.c-torture/compile/20001226-1.c   -O0  (test for excess errors)
>     {+WARNING: gcc.c-torture/compile/20001226-1.c   -O1  (test for excess
> errors) program timed out.+}
>     [-PASS:-]{+FAIL:+} gcc.c-torture/compile/20001226-1.c   -O1  (test for
> excess errors)
>     PASS: gcc.c-torture/compile/20001226-1.c   -O2  (test for excess errors)
>     PASS: gcc.c-torture/compile/20001226-1.c   -O3 -g  (test for excess
> errors)
>     PASS: gcc.c-torture/compile/20001226-1.c   -Os  (test for excess errors)

>     $ \time [...] -O1 [...]
>     [manually terminated after 19.75 h 100 % CPU usage]
>     xgcc: fatal error: Terminated signal terminated program cc1
>     compilation terminated.
>     Command exited with non-zero status 1

Per 'git bisect', this is due to PR114855 commit r15-3896-g942bbb2357656019caa3f8ebd2d23b09222f039a "tree-optimization/114855 - speed up dom_oracle::register_transitives".

> I've not further analyzed where the time is being spent in 'cc1'.

> At 2024-09-20's commit r15-3743-g2828ec526eaf5612178b62d48bfd8443c7ecd674 we
> had, for example [...] 10 s for '-O1'.
On Fri, 4 Oct 2024, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28614
>
> --- Comment #8 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
> (In reply to myself from comment #7)
> > for '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100';
> > with '--enable-checking=yes,extra,rtl') -- but for '-O1' only:
> >
> >     PASS: gcc.c-torture/compile/20001226-1.c   -O0  (test for excess errors)
> >     {+WARNING: gcc.c-torture/compile/20001226-1.c   -O1  (test for excess
> > errors) program timed out.+}
> >     [-PASS:-]{+FAIL:+} gcc.c-torture/compile/20001226-1.c   -O1  (test for
> > excess errors)
> >     PASS: gcc.c-torture/compile/20001226-1.c   -O2  (test for excess errors)
> >     PASS: gcc.c-torture/compile/20001226-1.c   -O3 -g  (test for excess
> > errors)
> >     PASS: gcc.c-torture/compile/20001226-1.c   -Os  (test for excess errors)
>
> >     $ \time [...] -O1 [...]
> >     [manually terminated after 19.75 h 100 % CPU usage]
> >     xgcc: fatal error: Terminated signal terminated program cc1
> >     compilation terminated.
> >     Command exited with non-zero status 1
>
> Per 'git bisect', this is due to PR114855 commit
> r15-3896-g942bbb2357656019caa3f8ebd2d23b09222f039a "tree-optimization/114855 -
> speed up dom_oracle::register_transitives".
>
> > I've not further analyzed where the time is being spent in 'cc1'.

So does it actually spend the time in dom_oracle?  I would guess the
change leaves around more of the redundant compare-and-jumps in the
testcase so we hit a problem later?
AMD GCN result (cc1 -O1) sampled with 'perf', aborting after 10½min:

Samples: 2M of event 'cpu_core/cycles/Pu', Event count (approx.): 2374461441070, Thread: cc1, DSO: cc1
Overhead  Com  Symbol
  26,65%  cc1  [.] assign_by_spills()
  22,36%  cc1  [.] bitmap_set_bit(bitmap_head*, int)
  20,02%  cc1  [.] update_lives(int, bool)
  17,91%  cc1  [.] bitmap_clear_bit(bitmap_head*, int)
   5,73%  cc1  [.] insert_in_live_range_start_chain(int)
   1,83%  cc1  [.] find_hard_regno_for_1(int, int*, int, bool, HARD_REG_SET)
   0,84%  cc1  [.] process_bb_lives(basic_block_def*, int&, bool)
   0,75%  cc1  [.] lra_spill()

the assign_by_spills is dominated by

          if (hard_regno < 0 && reload_p)
            hard_regno = spill_for (regno, &all_spilled_pseudos, iter == 1);

namely in spill_for, the

          for (r = lra_reg_info[spill_regno].live_ranges; r != NULL; r = r->next)
            {
...
                  if (r2->regno >= lra_constraint_new_regno_start)
                    sparseset_set_bit (live_range_reload_inheritance_pseudos, r2->regno);

dominates.