The following code takes much longer to compile at -O1 (and above) with -g using the current gcc trunk on x86_64-linux (in both 32-bit and 64-bit modes). It also affects 4.6, 4.7, and 4.8 (with checking disabled), but to a lesser extent (4 seconds vs. 12 seconds at -O1 with -g). This seems to be related to 58318, but 58318 manifests only when checking is enabled. This may also be related to 58478. $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/su/software/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-trunk/configure --enable-languages=c,c++,objc,obj-c++,fortran,lto --enable-checking=release --with-gmp=/home/su/software/local/gcc-trunk --with-mpfr=/home/su/software/local/gcc-trunk --with-mpc=/home/su/software/local/gcc-trunk --with-cloog=/home/su/software/local/gcc-trunk --prefix=/home/su/software/local/gcc-trunk Thread model: posix gcc version 4.9.0 20130917 (experimental) (GCC) $ $ time gcc -O0 -g small.c 0.02user 0.00system 0:00.13elapsed 27%CPU (0avgtext+0avgdata 37888maxresident)k 0inputs+56outputs (0major+6439minor)pagefaults 0swaps $ time gcc -O1 small.c 0.02user 0.02system 0:00.14elapsed 30%CPU (0avgtext+0avgdata 40144maxresident)k 0inputs+32outputs (0major+6560minor)pagefaults 0swaps $ time gcc -O1 -g small.c 7.69user 0.53system 0:12.20elapsed 67%CPU (0avgtext+0avgdata 2632224maxresident)k 0inputs+32outputs (0major+180567minor)pagefaults 0swaps $ ------------------------------------- int a, b, c, d, e, f; int main () { for (a = 0; a < 8; a++) for (b = 0; b < 8; b++) for (c = 0; c < 8; c++) for (d = 0; d < 8; d++) for (e = 0; e < 8; e++) { int t[3][2][9] = { {{f, f, f, f, f, f, f, f, f}, {f, f, f, f, f, f, f, f, f}}, {{f, f, f, f, f, f, f, f, f}, {f, f, f, f, f, f, f, f, f}}, {{f, f, f, f, f, f, f, f, f}, {f, f, f, f, f, f, f, f, f}}, }; } return 0; }
Confirmed (with checking enabled). It's VTA: with -fno-var-tracking the slowdown goes away.
With 4.8/4.7 it's indeed not that bad, thus tentatively marking as 4.9 regression...
Or rather -fvar-tracking-assignments. The slowness creeps in trivially dead code : 0.37 (13%) usr 0.00 ( 0%) sys 0.37 (10%) wall 0 kB ( 0%) ggc complete unrolling : 0.36 (13%) usr 0.53 (55%) sys 0.81 (22%) wall 211254 kB (38%) ggc expand : 0.29 (10%) usr 0.09 ( 9%) sys 0.38 (10%) wall 346562 kB (62%) ggc var-tracking itself doesn't enter the picture. main () { <bb 2>: # DEBUG D#1 => f # DEBUG t$0$0$0 => D#1 # DEBUG t$0$0$1 => D#1 # DEBUG t$0$0$2 => D#1 ... 3.3 million (!) similar lines follow ... e = 8; d = 8; c = 8; b = 8; a = 8; return 0; } seems to be support for aggregate piece debug values makes this blow up totally. Nobody is going to p t[][][] here (and I suspect gdb support isn't ready here either).
I don't see why do you think nobody would try to look at t[x][y][z] in the debugger. Anyway, I think we can do two things here. Obviously we can't give up on cunrolling it because that would be a clear -fcompare-debug failure. But: 1) in loop unrolling, analyze the debug stmts we are about to unroll, and if some of the debug stmts refer to a decl no other debug stmt in the loop refers to (though would need to take into account DECL_DEBUG_EXPR overlaps) and the expression in the debug stmt uses only SSA_NAMEs from before the loop, constants and/or debug exprs from the loop that satisfy that recursively, just emit the those debug stmts in the first iteration only and not repeat those in all the other unrolled iterations 1a) alternatively to that, write some debug stmt optimization pass, that would detect the case of useless debug stmts (saying something lives in what it is known to live at from earlier debug stmts already) 2) have some debug stmt limits (--param controlled), above which we start dropping debug stmts or resetting them just once or something similar. Because it is possible that there are multiple debug stmts per the same decl in the loop and 1) or 1a) can't do anything. Perhaps have the normal debug stmts in first iteration of the unrolled loop (or a few of them, depending on how many there are), then when we give up just reset all the debug stmts in some iteration and after that iteration and before last iteration don't emit debug stmts at all, then finally in the last iteration emit debug stmts again. Testcase for 2) is e.g. -O2 -g: int a, b, c, d, e; int main () { for (a = 0; a < 16; a++) for (b = 0; b < 16; b++) for (c = 0; c < 16; c++) for (d = 0; d < 16; d++) for (e = 0; e < 16; e++) { int f; #define F10 f = 0; f = 1; f = 2; f = 3; f = 4; f = 5; f = 6; f = 7; f = 8; f = 9; #define F100 F10 F10 F10 F10 F10 F10 F10 F10 F10 F10 F100 } return 0; } Another testcase, with no unrolling at all, that shows that it is easy to get thousands of debug stmts: int a, b, c, d, e, f; int main () { #define F1 {{f, f, f, f, f, f, f, f, f}, {f, f, f, f, f, f, f, f, f}}, #define F10 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 #define F100 F10 F10 F10 F10 F10 F10 F10 F10 F10 F10 #define F1000 F100 F100 F100 F100 F100 F100 F100 F100 F100 F100 int t[1000][2][9] = { F1000 }; return 0; } not sure where if at all to put any --param limits here though, after all, you get for this in the initializers as huge spaghetti code and only later on that is transformed into no actual code, just DEBUG stmt spaghetti. Though, supposedly this last example is already related to the PR59659 patch (which I've done for C++ only so far).
Alex, your thoughts on this?
BTW, I don't see any significant slowdown from r155360 when this regressed on compile time (for different reasons). It has been slow until r187052, with r187053 it returned roughly at previous speed, then r195015 started emitting the debug stmts and compile time went up again. But, if I compare 4.8 branch (--enable-checking=yes,rtl) from r208337 with current trunk (--enable-checking=yes,rtl), trunk is a few % faster even. So, IMHO this isn't a P1 regression as it hasn't really regressed recently (and while during development of 4.8 it has been for certain period fast again, it hasn't been "fixed" in any released compiler after 4.4.x).
GCC 4.9.0 has been released
GCC 4.9.1 has been released.
GCC 4.9.2 has been released.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
GCC 5 branch is being closed
Ugh yeah, really bad. Reconfirmed. tor:~/bld/t/gcc$ time ./cc1 -O1 -g a.c -quiet real 4m59.125s user 4m52.388s sys 0m1.384s --enable-checking=yes,types,extra
Even with LVUs if I try the #c0 testcase with outer 3 loops commented out (so only 8x8 unrolling) and with a single dimension array rather than 3 dimensional one, i.e.: int d, e, f; int main () { for (d = 0; d < 8; d++) for (e = 0; e < 8; e++) { int t[9] = { f, f, f, f, f, f, f, f, f}; } return 0; } we still don't emit any location info for t, even when it is in *.final dump: (note 735 734 807 2 pr58479-4.c:12 NOTE_INSN_BEGIN_STMT) (note 807 735 808 2 (var_location t$0 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 808 807 809 2 (var_location t$1 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 809 808 810 2 (var_location t$2 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 810 809 811 2 (var_location t$3 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 811 810 812 2 (var_location t$4 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 812 811 813 2 (var_location t$5 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 813 812 814 2 (var_location t$6 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) (note 814 813 736 2 (var_location t$7 (mem/c:SI (symbol_ref:DI ("f") [flags 0x2] <var_decl 0x7fffefd9e120 f>) [1 f+0 S4 A32])) NOTE_INSN_VAR_LOCATION) So unless that changes, not really sure if it is actually worth preserving the debug stmts at all. I guess we don't do that because there is no real instruction that is in the scope where the variable is defined and the NOTE_INSN_BEGIN_STMTs don't really count (but shouldn't they?). Alex, can you please have a look?
t and the lexical block enclosing it are regarded as unused by remove_unused_scope_block_p, so process_scope_var doesn't get a chance to output location information for t. It's there at all because of early debug info recording. remove_unused_locals has a comment just before the test for is_gimple_debug that suggests a good spot to keep the block alive, but it has a significant impact on compile memory use and debug info size, even though it's not clear that the unused variable would remain in the scope block decl list so as to get a loclist. I suppose it is not sensible to keep debug bind stmts for discarded variables past this point, but in several cases, including this, it's hard to figure out because the bind stmts refer to SRAed fragments of the actual variable.
GCC 6 branch is being closed
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.