The 64bit-out.go and go.test/test/cmplxdivide.go often time out on Solaris/SPARC. On unloaded machines, I find for cmplxdivide.go:

Solaris 11, Sun Fire V890, 1.35 GHz UltraSPARC-IV:
real 1:07.33
user 1:02.18
sys     0.64

Solaris 8, Sun Enterprise T5220, 1.2 GHz UltraSPARC-T2:
real 2:09.40
user 2:07.73
sys     0.63

The latter is too close to the default 5 min timeout. It's similar for 64bit-out.go:

real 1:13.68
user 1:07.82
sys     0.79

vs.

real 2:17.81
user 2:16.11
sys     1.14

Rainer
Interestingly, the time for cmplxdivide.go on SPARC appears to be primarily in the register allocator while compiling. This is true even though no -O option is used. Actually running the program after it has been compiled takes less than a second.

Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     109 kB ( 0%) ggc
 phase parsing           :   0.72 ( 1%) usr   0.04 ( 6%) sys   0.77 ( 1%) wall       8 kB ( 0%) ggc
 phase generate          : 118.51 (99%) usr   0.67 (93%) sys 119.17 (99%) wall   54226 kB (100%) ggc
 callgraph construction  :   0.09 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 0%) wall    1806 kB ( 3%) ggc
 callgraph optimization  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 cfg cleanup             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code     :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.39 ( 0%) usr   0.08 (11%) sys   0.47 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.39 ( 0%) usr   0.02 ( 3%) sys   0.41 ( 0%) wall    1261 kB ( 2%) ggc
 register information    :  52.00 (44%) usr   0.00 ( 0%) sys  52.00 (43%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.20 ( 0%) usr   0.01 ( 1%) sys   0.21 ( 0%) wall    1026 kB ( 2%) ggc
 rebuild jump labels     :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.72 ( 1%) usr   0.04 ( 6%) sys   0.77 ( 1%) wall       8 kB ( 0%) ggc
 inline heuristics       :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       4 kB ( 0%) ggc
 tree gimplify           :   0.38 ( 0%) usr   0.02 ( 3%) sys   0.41 ( 0%) wall    5832 kB (11%) ggc
 tree eh                 :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       5 kB ( 0%) ggc
 tree CFG construction   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      10 kB ( 0%) ggc
 tree find ref. vars     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     548 kB ( 1%) ggc
 tree PHI insertion      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 tree SSA rewrite        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1131 kB ( 2%) ggc
 tree SSA other          :   0.15 ( 0%) usr   0.05 ( 7%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 tree operand scan       :   0.07 ( 0%) usr   0.02 ( 3%) sys   0.17 ( 0%) wall     673 kB ( 1%) ggc
 tree STMT verifier      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 expand vars             :   0.08 ( 0%) usr   0.03 ( 4%) sys   0.10 ( 0%) wall    1535 kB ( 3%) ggc
 expand                  :   1.24 ( 1%) usr   0.04 ( 6%) sys   1.29 ( 1%) wall   12793 kB (24%) ggc
 post expand cleanups    :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       5 kB ( 0%) ggc
 integrated RA           :  50.16 (42%) usr   0.20 (28%) sys  50.35 (42%) wall   12377 kB (23%) ggc
 reload                  :   8.03 ( 7%) usr   0.17 (24%) sys   8.19 ( 7%) wall   13804 kB (25%) ggc
 thread pro- & epilogue  :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       4 kB ( 0%) ggc
 final                   :   2.48 ( 2%) usr   0.02 ( 3%) sys   2.50 ( 2%) wall       9 kB ( 0%) ggc
 rest of compilation     :   0.98 ( 1%) usr   0.00 ( 0%) sys   1.02 ( 1%) wall      31 kB ( 0%) ggc
 unaccounted todo        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   : 119.24             0.72           119.96              54344 kB

real 2m2.183s
user 2m0.976s
sys  0m1.074s
SPARC register allocator slowness filed as PR 53125.
The 64bit-out.go case appears to be similar. It is also a generated file, and it also takes a long time to compile. The register allocator is not quite as dominant, only 43% of compilation time. In any case I will revisit 64bit-out when and if cmplxdivide is fixed.
This also happens intermittently on my s390x development machine (a zEC12) with the current 5.0 development trunk.

(In reply to Ian Lance Taylor from comment #1)
> Interestingly, the time for cmpldivide.go on SPARC appears to be primarily
> in the register allocator while compiling.

To be specific:

 LRA hard reg assignment : 217.88 (95%) usr   0.29 (74%) sys 218.24 (95%) wall       0 kB ( 0%) ggc

> This is true even though no -O option is used.

Actually, on s390x it does not happen --

Observation
-----------

Compile time of the test is normally about 4 minutes, but I've seen ~3:50 as well as ~4:45. When the machine is slow for some reason (probably does not matter why), compile time may exceed 5 minutes, and the test then times out.

Explanation
-----------

The test defines a long array of structures with three complex numbers in cmplxdivide1.go:

  var tests = []Test{
      Test{complex(0, 0), complex(0, 0), complex(-nan, -nan)},
      Test{complex(0, 0), complex(0, 1), complex(0, 0)},
      ...
  }

The constants like "nan" map to exported symbols of the math package (unlike C, where this would probably be done with macros): "nan" appears in the code as "math.NaN@plt". With dynamic linkage the actual value is unknown at compile time, so the structure "tests" is initialised in the init function of the main package.

Compiling with -O0, the executable is about 1.5 MB, and more than 90% of that is code in the init function. For each line in the table, the assembler instructions to initialise it consume about 420 bytes. As far as I was told, the register allocation code has some trouble with huge basic blocks of simple code like this one, where the number of possibilities explodes.

Note: With -O3, the code compiles in less than two seconds, probably because the code in the init function is reduced drastically before the expensive register allocation pass.
(In reply to Ian Lance Taylor from comment #3)
> The 64bit-out.go case appears to be similar. It is also a generated file,
> and it also takes a long time to compile. The register allocator is not
> quite as dominant, only 43% of compilation time. In any case I will revisit
> 64bit-out when and if cmplxdivide is fixed.

Has cmplxdivide been fixed yet?
(In reply to Eric Gallager from comment #5)
> (In reply to Ian Lance Taylor from comment #3)
> > The 64bit-out.go case appears to be similar. It is also a generated file,
> > and it also takes a long time to compile. The register allocator is not
> > quite as dominant, only 43% of compilation time. In any case I will revisit
> > 64bit-out when and if cmplxdivide is fixed.
>
> Has cmplxdivide been fixed yet?

No reply; changing to SUSPENDED since this isn't really a case where closing as INVALID (due to lack of response) is applicable.
Unfortunately, the issue persists on current trunk:

32-bit-default gccgo, sparc-sun-solaris2.11 (SPARC S7-2):
real 3:07.71
user 3:05.92
sys     0.89

32-bit-default gccgo, sparc-sun-solaris2.11 (SPARC T8-1):
real 1:18.05
user 1:16.50
sys     0.38

Given that the compile time is close to the default limit (5m) even on an unloaded machine, the test is almost guaranteed to FAIL under load.

S7-2 -ftime-report output:

Time variable                       usr            sys           wall            GGC
 phase parsing            :   0.59 (  0%)   0.02 (  2%)   0.61 (  0%)   2575k (  2%)
 phase opt and generate   : 145.88 (100%)   0.79 ( 96%) 146.88 (100%)    113M ( 98%)
 phase last asm           :   0.02 (  0%)   0.01 (  1%)   0.02 (  0%)    230k (  0%)
 garbage collection       :   0.37 (  0%)   0.04 (  5%)   0.56 (  0%)       0 (  0%)
 dump files               :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)       0 (  0%)
 callgraph construction   :   0.06 (  0%)   0.00 (  0%)   0.04 (  0%)   1070k (  1%)
 callgraph optimization   :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)       0 (  0%)
 callgraph ipa passes     :   1.93 (  1%)   0.06 (  7%)   1.99 (  1%)   2573k (  2%)
 ipa dead code removal    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 (  0%)
 ipa inlining heuristics  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 (  0%)
 cfg construction         :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    8896 (  0%)
 cfg cleanup              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     176 (  0%)
 CFG verifier             :   2.58 (  2%)   0.00 (  0%)   2.60 (  2%)       0 (  0%)
 trivially dead code      :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)       0 (  0%)
 df scan insns            :   0.11 (  0%)   0.01 (  1%)   0.11 (  0%)    7392 (  0%)
 df live regs             :   0.15 (  0%)   0.00 (  0%)   0.14 (  0%)       0 (  0%)
 df reg dead/unused notes :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)   2353k (  2%)
 register information     :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)       0 (  0%)
 alias analysis           :   0.17 (  0%)   0.00 (  0%)   0.18 (  0%)   2053k (  2%)
 rebuild jump labels      :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)       0 (  0%)
 parser (global)          :   0.59 (  0%)   0.02 (  2%)   0.61 (  0%)   2574k (  2%)
 inline parameters        :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)    4768 (  0%)
 tree gimplify            :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)   7786k (  7%)
 tree eh                  :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)    8960 (  0%)
 tree CFG construction    :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     60k (  0%)
 tree CFG cleanup         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 (  0%)
 tree SSA other           :   0.00 (  0%)   0.01 (  1%)   0.00 (  0%)       0 (  0%)
 tree SSA rewrite         :   0.04 (  0%)   0.03 (  4%)   0.08 (  0%)    512k (  0%)
 tree SSA incremental     :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)    480k (  0%)
 tree operand scan        :   0.16 (  0%)   0.05 (  6%)   0.18 (  0%)   2205k (  2%)
 tree SSA verifier        :   1.09 (  1%)   0.00 (  0%)   1.12 (  1%)       0 (  0%)
 tree STMT verifier       :   1.57 (  1%)   0.00 (  0%)   1.59 (  1%)       0 (  0%)
 callgraph verifier       :   0.07 (  0%)   0.00 (  0%)   0.04 (  0%)       0 (  0%)
 dominance computation    :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)       0 (  0%)
 out of ssa               :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)    4472 (  0%)
 expand vars              :   0.03 (  0%)   0.01 (  1%)   0.05 (  0%)   2955k (  2%)
 expand                   :   0.49 (  0%)   0.01 (  1%)   0.51 (  0%)     26M ( 23%)
 post expand cleanups     :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)     12k (  0%)
 integrated RA            :  48.45 ( 33%)   0.05 (  6%)  48.48 ( 33%)     24M ( 21%)
 LRA non-specific         :   4.00 (  3%)   0.03 (  4%)   3.99 (  3%)     27M ( 24%)
 LRA virtuals elimination :   0.26 (  0%)   0.00 (  0%)   0.28 (  0%)     45k (  0%)
 LRA reload inheritance   :   1.85 (  1%)   0.01 (  1%)   1.88 (  1%)   6286k (  5%)
 LRA create live ranges   :   2.49 (  2%)   0.00 (  0%)   2.49 (  2%)   2592k (  2%)
 LRA hard reg assignment  :  74.81 ( 51%)   0.51 ( 62%)  75.34 ( 51%)       0 (  0%)
 reload                   :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)     792 (  0%)
 thread pro- & epilogue   :   0.41 (  0%)   0.00 (  0%)   0.41 (  0%)     28k (  0%)
 shorten branches         :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)    662k (  1%)
 final                    :   0.88 (  1%)   0.01 (  1%)   0.91 (  1%)   1751k (  1%)
 symout                   :   0.02 (  0%)   0.01 (  1%)   0.02 (  0%)    253k (  0%)
 initialize rtl           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    5232 (  0%)
 access analysis          :   0.08 (  0%)   0.00 (  0%)   0.09 (  0%)    481k (  0%)
 rest of compilation      :   0.36 (  0%)   0.02 (  2%)   0.37 (  0%)   4663k (  4%)
 verify RTL sharing       :   4.14 (  3%)   0.00 (  0%)   4.15 (  3%)       0 (  0%)
 TOTAL                    : 146.49          0.82        147.51           116M

Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.