Bug 52357 - 64bit-out.go and go.test/test/cmplxdivide.go time out on Solaris/SPARC
Summary: 64bit-out.go and go.test/test/cmplxdivide.go time out on Solaris/SPARC
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: go
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
URL:
Keywords:
Depends on: 53125
Blocks:
Reported: 2012-02-23 17:07 UTC by Rainer Orth
Modified: 2024-04-04 12:20 UTC
CC List: 2 users

See Also:
Host: sparc-sun-solaris2*
Target: sparc-sun-solaris2*
Build: sparc-sun-solaris2*
Known to work:
Known to fail:
Last reconfirmed: 2012-04-25 00:00:00


Description Rainer Orth 2012-02-23 17:07:57 UTC
The 64bit-out.go and go.test/test/cmplxdivide.go tests often time out on Solaris/SPARC.

On unloaded machines, I find for cmplxdivide.go:

Solaris 11, Sun Fire V890, 1.35 GHz UltraSPARC-IV:

real        1:07.33
user        1:02.18
sys            0.64

Solaris 8, Sun Enterprise T5220, 1.2 GHz UltraSPARC-T2:

real     2:09.40
user     2:07.73
sys         0.63

The latter is too close to the default 5 min timeout.

It's similar for 64bit-out.go:

real        1:13.68
user        1:07.82
sys            0.79

vs.

real     2:17.81
user     2:16.11
sys         1.14

  Rainer
Comment 1 Ian Lance Taylor 2012-04-25 17:39:35 UTC
Interestingly, the time for cmplxdivide.go on SPARC appears to be spent primarily in the register allocator while compiling.  This is true even though no -O option is used.  Actually running the program after it has been compiled takes less than a second.


Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     109 kB ( 0%) ggc
 phase parsing           :   0.72 ( 1%) usr   0.04 ( 6%) sys   0.77 ( 1%) wall       8 kB ( 0%) ggc
 phase generate          : 118.51 (99%) usr   0.67 (93%) sys 119.17 (99%) wall   54226 kB (100%) ggc
 callgraph construction  :   0.09 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 0%) wall    1806 kB ( 3%) ggc
 callgraph optimization  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 cfg cleanup             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code     :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.39 ( 0%) usr   0.08 (11%) sys   0.47 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.39 ( 0%) usr   0.02 ( 3%) sys   0.41 ( 0%) wall    1261 kB ( 2%) ggc
 register information    :  52.00 (44%) usr   0.00 ( 0%) sys  52.00 (43%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.20 ( 0%) usr   0.01 ( 1%) sys   0.21 ( 0%) wall    1026 kB ( 2%) ggc
 rebuild jump labels     :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.72 ( 1%) usr   0.04 ( 6%) sys   0.77 ( 1%) wall       8 kB ( 0%) ggc
 inline heuristics       :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       4 kB ( 0%) ggc
 tree gimplify           :   0.38 ( 0%) usr   0.02 ( 3%) sys   0.41 ( 0%) wall    5832 kB (11%) ggc
 tree eh                 :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       5 kB ( 0%) ggc
 tree CFG construction   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      10 kB ( 0%) ggc
 tree find ref. vars     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     548 kB ( 1%) ggc
 tree PHI insertion      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 tree SSA rewrite        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1131 kB ( 2%) ggc
 tree SSA other          :   0.15 ( 0%) usr   0.05 ( 7%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 tree operand scan       :   0.07 ( 0%) usr   0.02 ( 3%) sys   0.17 ( 0%) wall     673 kB ( 1%) ggc
 tree STMT verifier      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 expand vars             :   0.08 ( 0%) usr   0.03 ( 4%) sys   0.10 ( 0%) wall    1535 kB ( 3%) ggc
 expand                  :   1.24 ( 1%) usr   0.04 ( 6%) sys   1.29 ( 1%) wall   12793 kB (24%) ggc
 post expand cleanups    :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       5 kB ( 0%) ggc
 integrated RA           :  50.16 (42%) usr   0.20 (28%) sys  50.35 (42%) wall   12377 kB (23%) ggc
 reload                  :   8.03 ( 7%) usr   0.17 (24%) sys   8.19 ( 7%) wall   13804 kB (25%) ggc
 thread pro- & epilogue  :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       4 kB ( 0%) ggc
 final                   :   2.48 ( 2%) usr   0.02 ( 3%) sys   2.50 ( 2%) wall       9 kB ( 0%) ggc
 rest of compilation     :   0.98 ( 1%) usr   0.00 ( 0%) sys   1.02 ( 1%) wall      31 kB ( 0%) ggc
 unaccounted todo        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 119.24             0.72           119.96              54344 kB

real    2m2.183s
user    2m0.976s
sys     0m1.074s
Comment 2 Ian Lance Taylor 2012-04-25 22:15:19 UTC
SPARC register allocator slowness filed as PR 53125.
Comment 3 Ian Lance Taylor 2012-04-25 22:48:54 UTC
The 64bit-out.go case appears to be similar.  It is also a generated file, and it also takes a long time to compile.  The register allocator is not quite as dominant, only 43% of compilation time.  In any case I will revisit 64bit-out when and if cmplxdivide is fixed.
Comment 4 Dominik Vogt 2015-02-27 08:48:15 UTC
This also happens intermittently on my s390x development machine (a zEC12) with the current 5.0 development trunk.

(In reply to Ian Lance Taylor from comment #1)
> Interestingly, the time for cmplxdivide.go on SPARC appears to be primarily
> in the register allocator while compiling.

To be specific:

   LRA hard reg assignment : 217.88 (95%) usr   0.29 (74%) sys 218.24 (95%) wall       0 kB ( 0%) ggc


> This is true even though no -O option is used.

Actually, on s390x it does not happen

--

Observation
-----------

Compile time of the test is normally about 4 minutes, but I've seen ~3:50 as well as ~4:45.  When the machine is slow for some reason (probably does not matter why), compile time may become more than 5 minutes and therefore the test times out.

Explanation
-----------

The test defines a long array of structures with three complex numbers in cmplxdivide1.go:

  var tests = []Test{ 
    Test{complex(0, 0), complex(0, 0), complex(-nan, -nan)}, 
    Test{complex(0, 0), complex(0, 1), complex(0, 0)}, 
    ...
  }

The constants like "nan" map to exported symbols of the math package (unlike C where this would probably be done with macros): "nan" appears in the code as "math.NaN@plt".  With dynamic linkage the actual value is unknown at compile time, and the structure "tests" is initialised in the init function of the main package.  Compiling with -O0, the executable is about 1.5 MB, and more than 90% of that is code in the init function.  For each line in the table, the assembler instructions to initialise it consume about 420 bytes.

As far as I was told, the register allocation code has some trouble with huge basic blocks of simple code like in this case, when the number of possibilities explodes.

Note: With -O3, the code compiles in less than two seconds, probably because the code in the init function is reduced drastically before the expensive register allocation pass.
Comment 5 Eric Gallager 2018-10-01 18:04:21 UTC
(In reply to Ian Lance Taylor from comment #3)
> The 64bit-out.go case appears to be similar.  It is also a generated file,
> and it also takes a long time to compile.  The register allocator is not
> quite as dominant, only 43% of compilation time.  In any case I will revisit
> 64bit-out when and if cmplxdivide is fixed.

Has cmplxdivide been fixed yet?
Comment 6 Eric Gallager 2019-04-01 04:27:17 UTC
(In reply to Eric Gallager from comment #5)
> (In reply to Ian Lance Taylor from comment #3)
> > The 64bit-out.go case appears to be similar.  It is also a generated file,
> > and it also takes a long time to compile.  The register allocator is not
> > quite as dominant, only 43% of compilation time.  In any case I will revisit
> > 64bit-out when and if cmplxdivide is fixed.
> 
> Has cmplxdivide been fixed yet?

No reply; changing to SUSPENDED, since this isn't really a case where closing as INVALID (due to lack of response) is applicable.
Comment 7 Rainer Orth 2024-04-04 12:20:30 UTC
Unfortunately, the issue persists on current trunk:

32-bit-default gccgo, sparc-sun-solaris2.11 (SPARC S7-2):

real        3:07.71
user        3:05.92
sys            0.89

32-bit-default gccgo, sparc-sun-solaris2.11 (SPARC T8-1):

real        1:18.05
user        1:16.50
sys            0.38

Given that the compile time is close to the default limit (5m) even on an
unloaded machine, the test is almost guaranteed to FAIL under load.

S7-2 -ftime-report output:

Time variable                                   usr           sys          wall           GGC
 phase parsing                      :   0.59 (  0%)   0.02 (  2%)   0.61 (  0%)  2575k (  2%)
 phase opt and generate             : 145.88 (100%)   0.79 ( 96%) 146.88 (100%)   113M ( 98%)
 phase last asm                     :   0.02 (  0%)   0.01 (  1%)   0.02 (  0%)   230k (  0%)
 garbage collection                 :   0.37 (  0%)   0.04 (  5%)   0.56 (  0%)     0  (  0%)
 dump files                         :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 callgraph construction             :   0.06 (  0%)   0.00 (  0%)   0.04 (  0%)  1070k (  1%)
 callgraph optimization             :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 callgraph ipa passes               :   1.93 (  1%)   0.06 (  7%)   1.99 (  1%)  2573k (  2%)
 ipa dead code removal              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 cfg construction                   :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  8896  (  0%)
 cfg cleanup                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   176  (  0%)
 CFG verifier                       :   2.58 (  2%)   0.00 (  0%)   2.60 (  2%)     0  (  0%)
 trivially dead code                :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)     0  (  0%)
 df scan insns                      :   0.11 (  0%)   0.01 (  1%)   0.11 (  0%)  7392  (  0%)
 df live regs                       :   0.15 (  0%)   0.00 (  0%)   0.14 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)  2353k (  2%)
 register information               :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 alias analysis                     :   0.17 (  0%)   0.00 (  0%)   0.18 (  0%)  2053k (  2%)
 rebuild jump labels                :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 parser (global)                    :   0.59 (  0%)   0.02 (  2%)   0.61 (  0%)  2574k (  2%)
 inline parameters                  :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)  4768  (  0%)
 tree gimplify                      :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)  7786k (  7%)
 tree eh                            :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)  8960  (  0%)
 tree CFG construction              :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    60k (  0%)
 tree CFG cleanup                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree SSA other                     :   0.00 (  0%)   0.01 (  1%)   0.00 (  0%)     0  (  0%)
 tree SSA rewrite                   :   0.04 (  0%)   0.03 (  4%)   0.08 (  0%)   512k (  0%)
 tree SSA incremental               :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)   480k (  0%)
 tree operand scan                  :   0.16 (  0%)   0.05 (  6%)   0.18 (  0%)  2205k (  2%)
 tree SSA verifier                  :   1.09 (  1%)   0.00 (  0%)   1.12 (  1%)     0  (  0%)
 tree STMT verifier                 :   1.57 (  1%)   0.00 (  0%)   1.59 (  1%)     0  (  0%)
 callgraph verifier                 :   0.07 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 dominance computation              :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 out of ssa                         :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)  4472  (  0%)
 expand vars                        :   0.03 (  0%)   0.01 (  1%)   0.05 (  0%)  2955k (  2%)
 expand                             :   0.49 (  0%)   0.01 (  1%)   0.51 (  0%)    26M ( 23%)
 post expand cleanups               :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)    12k (  0%)
 integrated RA                      :  48.45 ( 33%)   0.05 (  6%)  48.48 ( 33%)    24M ( 21%)
 LRA non-specific                   :   4.00 (  3%)   0.03 (  4%)   3.99 (  3%)    27M ( 24%)
 LRA virtuals elimination           :   0.26 (  0%)   0.00 (  0%)   0.28 (  0%)    45k (  0%)
 LRA reload inheritance             :   1.85 (  1%)   0.01 (  1%)   1.88 (  1%)  6286k (  5%)
 LRA create live ranges             :   2.49 (  2%)   0.00 (  0%)   2.49 (  2%)  2592k (  2%)
 LRA hard reg assignment            :  74.81 ( 51%)   0.51 ( 62%)  75.34 ( 51%)     0  (  0%)
 reload                             :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)   792  (  0%)
 thread pro- & epilogue             :   0.41 (  0%)   0.00 (  0%)   0.41 (  0%)    28k (  0%)
 shorten branches                   :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)   662k (  1%)
 final                              :   0.88 (  1%)   0.01 (  1%)   0.91 (  1%)  1751k (  1%)
 symout                             :   0.02 (  0%)   0.01 (  1%)   0.02 (  0%)   253k (  0%)
 initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  5232  (  0%)
 access analysis                    :   0.08 (  0%)   0.00 (  0%)   0.09 (  0%)   481k (  0%)
 rest of compilation                :   0.36 (  0%)   0.02 (  2%)   0.37 (  0%)  4663k (  4%)
 verify RTL sharing                 :   4.14 (  3%)   0.00 (  0%)   4.15 (  3%)     0  (  0%)
 TOTAL                              : 146.49          0.82        147.51          116M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
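
To put a number on the register allocator's share in the S7-2 report above, the following sketch adds up the usr times of the IRA and LRA timers and compares them with the TOTAL line.  Treating exactly these six timers as "register allocation" is an assumption of this sketch, and the type and function names are made up for the example; the figures themselves are copied from the report.

```go
package main

import "fmt"

// timer holds one row of the -ftime-report table (usr seconds only).
type timer struct {
	name string
	usr  float64
}

// usr values copied from the S7-2 report. Grouping these six timers as
// "register allocation" is an assumption of this sketch.
var raTimers = []timer{
	{"integrated RA", 48.45},
	{"LRA non-specific", 4.00},
	{"LRA virtuals elimination", 0.26},
	{"LRA reload inheritance", 1.85},
	{"LRA create live ranges", 2.49},
	{"LRA hard reg assignment", 74.81},
}

// totalUsr is the usr column of the TOTAL line in the same report.
const totalUsr = 146.49

// raShare returns the summed usr time of the timers above and its
// fraction of the total usr time.
func raShare() (sum, share float64) {
	for _, t := range raTimers {
		sum += t.usr
	}
	return sum, sum / totalUsr
}

func main() {
	sum, share := raShare()
	fmt.Printf("register allocation: %.2f s of %.2f s usr (%.0f%%)\n",
		sum, totalUsr, 100*share)
}
```

This comes to 131.86 s of 146.49 s, roughly 90% of the usr time, so register allocation still dominates.  The compiler was built with extra checking enabled, so the absolute times are inflated.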