Summary: | [4.7 Regression] More var-tracking scalability problems | ||
---|---|---|---|
Product: | gcc | Reporter: | Sam Rushing <sam-gccbug> |
Component: | debug | Assignee: | Jakub Jelinek <jakub> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aoliva, jakub |
Priority: | P3 | Keywords: | compile-time-hog |
Version: | 4.8.0 | ||
Target Milestone: | 4.8.1 | ||
Host: | | Target: | |
Build: | | Known to work: | 4.4.5, 4.7.3, 4.8.0 |
Known to fail: | 4.7.2 | Last reconfirmed: | 2013-03-03 00:00:00 |
Attachments: | repro, gcc48-pr56510.patch | ||
Since you are building trunk, you should only do timings when configured with --enable-checking=release, because by default some extra checking is enabled on trunk.

I recompiled with --enable-checking=release, but I couldn't see any difference. I've attached to the process with gdb and it appears to be stuck forever in cselib.c, often in cselib_lookup().

Can you add a few options and report back the results? You can add -ftime-report to the command line to let the compiler dump a report about how much time it spent in different parts of the compiler. With your observation about cselib, I suspect this is another issue with the scalability of var-tracking. Can you try with -g -fno-var-tracking?

That did the trick... -fno-var-tracking fixes the issue.

Indeed, var-tracking...

    variable tracking :21235.89 (99%) usr
    TOTAL             :21409.16

May be a dup of bug 54402 or bug 55092 but I'll leave it to someone else to figure that out...

Why are you marking this as 4.8 Regression, when the reporter says the same problem is there for 4.6 already?

(In reply to comment #6)
> Why are you marking this as 4.8 Regression, when the reporter says the same
> problem is there for 4.6 already?

Because I've confirmed it as a regression of GCC 4.8 from GCC 4.2, as the reporter also said. I haven't confirmed it for GCC 4.6. I've just now confirmed that GCC 4.7 has the same problem.

Just -fno-var-tracking-assignments is enough to have sensible compile time. Not unexpected either, of course...

Reduced testcase:

    struct S { unsigned long s1; void **s2[0]; };
    void **a, **b, **c, **d, **e, **f;

    static void **
    baz (long x, long y)
    {
      void **s = f;
      *f = (void **) (y << 8 | (x & 0xff));
      f += y + 1;
      return s;
    }

    void bar (void);

    void
    foo (void)
    {
      void **g = b[4];
      a = b[2];
      b = b[1];
      g[2] = e;
      void **h = ((void ***************) a)[1][1][1][1][1][1][1][1][1][1][1][1][1][66];
      void **i = ((struct S *) h)->s2[4];
      d = baz (4, 3);
      d[1] = b;
      d[2] = a;
      d[3] = bar;
      b = d;
      g[1] = i[2];
      a = g;
      ((void (*) (void)) (i[1])) ();
    }

I'd say the problem is that during expansion we turn:

    _10 = MEM[(void * * * * * * * * * * * * * * *)a.1_4 + 8B];
    _11 = MEM[(void * * * * * * * * * * * * * *)_10 + 8B];
    _12 = MEM[(void * * * * * * * * * * * * *)_11 + 8B];
    _13 = MEM[(void * * * * * * * * * * * *)_12 + 8B];
    _14 = MEM[(void * * * * * * * * * * *)_13 + 8B];
    _15 = MEM[(void * * * * * * * * * *)_14 + 8B];
    _16 = MEM[(void * * * * * * * * *)_15 + 8B];
    _17 = MEM[(void * * * * * * * *)_16 + 8B];
    _18 = MEM[(void * * * * * * *)_17 + 8B];
    _19 = MEM[(void * * * * * *)_18 + 8B];
    _20 = MEM[(void * * * * *)_19 + 8B];
    _21 = MEM[(void * * * *)_20 + 8B];
    _22 = MEM[(void * * *)_21 + 8B];
    h_23 = MEM[(void * *)_22 + 528B];
    # DEBUG h => h_23
    i_24 = MEM[(struct S *)h_23].s2[4];

into:

    (debug_insn 14 13 15 2 (var_location:DI h (mem/f:DI (plus:DI (mem/f:DI (plus:DI
        (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI
        (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI
        (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI
        (reg/f:DI 61 [ a.1 ])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * * * * * *)a.1_4 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * * * * *)_10 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * * * *)_11 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * * *)_12 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * *)_13 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * *)_14 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * *)_15 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * * *)_16 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * * *)_17 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * * *)_18 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * * *)_19 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * * *)_20 + 8B]+0 S8 A64])
        (const_int 8 [0x8])) [0 MEM[(void * * *)_21 + 8B]+0 S8 A64])
        (const_int 528 [0x210])) [0 MEM[(void * *)_22 + 528B]+0 S8 A64])) pr56510-2.i:21 -1
        (nil))

which is simply too large for any reasonable cselib handling; it would be better to split it using debug temporaries.

OT, are you sure the testcase doesn't violate aliasing just about everywhere?

(In reply to comment #8)
> OT, are you sure the testcase doesn't violate aliasing just about
> everywhere?

At least -Wstrict-aliasing=2 doesn't think so, but it's certainly a test case that shows the worst one can do with C pointers :-)

There is one pointer-to-int cast warning:

    self/compile.c:1766:8: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

That reduced testcase actually compiles in a few seconds on a fast box, so let's make it larger, then it will take a few years:

    struct S { unsigned long s1; void **s2[0]; };
    void **a, **b, **c, **d, **e, **f;

    static void **
    baz (long x, long y)
    {
      void **s = f;
      *f = (void **) (y << 8 | (x & 0xff));
      f += y + 1;
      return s;
    }

    void bar (void);

    void
    foo (void)
    {
      void **g = b[4];
      a = b[2];
      b = b[1];
      g[2] = e;
      void **h = ((void **************************) a)[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][66];
      void **i = ((struct S *) h)->s2[4];
      d = baz (4, 3);
      d[1] = b;
      d[2] = a;
      d[3] = bar;
      b = d;
      g[1] = i[2];
      a = g;
      ((void (*) (void)) (i[1])) ();
    }

Alex, I think creating debug temporaries from within expand_debug_expr sounds too complicated, and furthermore at expand_debug_expr time we don't know yet whether we'll actually return non-NULL for the whole expression or throw everything away. So, what would you think about keeping the code as is and, just after expand_debug_expr is called, looking at the second operand of VARIABLE_LOCATION: if the RTL nesting depth is deep enough (say 3-4 levels of nesting?) in something that we'd be ok to split into debug temporaries (I'd say RTX_*COMPARE/UNARY/*ARITH/TERNARY plus first operand of MEM), create a debug temporary for the subexpression and replace the operand with the debug temporary? Otherwise, I'm afraid with TER we can end up with arbitrarily deep DEBUG_INSN operands.

Why TER into debug-insns at all?

Because TERed stmts won't be expanded, so there is nothing to refer to. Furthermore, in many cases expand_debug_expr relies on seeing the inner operand.

Created attachment 29585: gcc48-pr56510.patch

Untested fix.

    Author: jakub
    Date: Tue Mar 5 22:31:50 2013
    New Revision: 196479

    URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196479
    Log:
    PR debug/56510
    * cfgexpand.c (expand_debug_parm_decl): Call copy_rtx on incoming.
    (avoid_complex_debug_insns): New function.
    (expand_debug_locations): Call it.

    * gcc.dg/pr56510.c: New test.

    Added:
        trunk/gcc/testsuite/gcc.dg/pr56510.c
    Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/cfgexpand.c
        trunk/gcc/testsuite/ChangeLog

Fixed for trunk so far.

GCC 4.8.0 is being released, adjusting target milestone.

    Author: jakub
    Date: Wed Apr 3 08:19:56 2013
    New Revision: 197391

    URL: http://gcc.gnu.org/viewcvs?rev=197391&root=gcc&view=rev
    Log:
    Backported from mainline
    2013-03-05 Jakub Jelinek <jakub@redhat.com>

    PR debug/56510
    * cfgexpand.c (expand_debug_parm_decl): Call copy_rtx on incoming.
    (avoid_complex_debug_insns): New function.
    (expand_debug_locations): Call it.

    * gcc.dg/pr56510.c: New test.

    Added:
        branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/pr56510.c
    Modified:
        branches/gcc-4_7-branch/gcc/ChangeLog
        branches/gcc-4_7-branch/gcc/cfgexpand.c
        branches/gcc-4_7-branch/gcc/testsuite/ChangeLog

Fixed for 4.7.3+ too.
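
The idea discussed in the thread, capping the nesting depth of a debug expression by binding deep subexpressions to debug temporaries, can be illustrated with a small toy model. The sketch below is not GCC code and does not reproduce avoid_complex_debug_insns; the `struct expr`, `struct splitter`, `split_deep_loads` and the other names are hypothetical, and it only models chains of memory loads rather than real RTL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree, NOT GCC's RTL: a node is a register, a debug
   temporary, or a load from (addr + offset).  All names are made up.  */
enum expr_kind { EXPR_REG, EXPR_TEMP, EXPR_LOAD };

struct expr {
  enum expr_kind kind;
  int id;              /* register or temporary number (leaves only) */
  long offset;         /* byte offset (EXPR_LOAD only) */
  struct expr *addr;   /* address operand (EXPR_LOAD only) */
};

#define MAX_NODES 256
#define MAX_TEMPS 64

struct splitter {
  struct expr nodes[MAX_NODES];        /* simple node pool, no malloc */
  int node_count;
  struct expr *temp_value[MAX_TEMPS];  /* temp_value[i]: what temporary i holds */
  int temp_count;
  int max_depth;                       /* deepest load chain allowed per expression */
};

static struct expr *
new_node (struct splitter *s, enum expr_kind kind, int id, long offset,
          struct expr *addr)
{
  assert (s->node_count < MAX_NODES);
  struct expr *e = &s->nodes[s->node_count++];
  e->kind = kind;
  e->id = id;
  e->offset = offset;
  e->addr = addr;
  return e;
}

/* Number of nested loads in E (0 for a leaf).  */
static int
expr_depth (const struct expr *e)
{
  int depth = 0;
  for (; e->kind == EXPR_LOAD; e = e->addr)
    depth++;
  return depth;
}

/* Build a chain of LOADS nested loads on top of one register.  */
static struct expr *
build_load_chain (struct splitter *s, int loads)
{
  struct expr *e = new_node (s, EXPR_REG, 61, 0, NULL);
  for (int i = 0; i < loads; i++)
    e = new_node (s, EXPR_LOAD, 0, 8, e);
  return e;
}

/* Rewrite E in place, bottom-up: whenever the address operand of a load
   already nests max_depth loads, bind that operand to a fresh temporary
   and refer to the temporary instead.  Returns the resulting depth of E.  */
static int
split_deep_loads (struct splitter *s, struct expr *e)
{
  if (e->kind != EXPR_LOAD)
    return 0;
  int inner = split_deep_loads (s, e->addr);
  if (inner >= s->max_depth)
    {
      assert (s->temp_count < MAX_TEMPS);
      int temp = s->temp_count++;
      s->temp_value[temp] = e->addr;
      e->addr = new_node (s, EXPR_TEMP, temp, 0, NULL);
      inner = 0;
    }
  return inner + 1;
}
```

With `max_depth` set to 4, a 14-load chain like the one in the reduced testcase ends up as three temporaries of four loads each plus a final expression of two loads, so no single expression handed to a later pass is deeper than the chosen limit.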
Created attachment 29568: repro

The file is output from another compiler, in CPS form (i.e., all functions are tail calls).

If I leave off '-g' it takes 49 seconds to compile (2.4GHz Core i7). If I add '-g' it takes 'forever' (> 90 mins).

Reproduced on OS X (10.8.2) x86_64, Linux Ubuntu (w/ gcc 4.6.3), and FreeBSD 7.3. I believe this bug affects gcc going back to 4.6 and earlier. I have a copy of 4.2 sitting around that does not seem to have the problem, though.

gcc was configured with "./configure --enable-languages=c,c++" and built with "make -j 16".

Command: /usr/local/bin/gcc -std=c99 -O3 -g -I./include self/compile.c -o self/compile
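
To see how compile time with -g grows with nesting depth, the reduced testcase from the thread can be generated with a configurable number of `[1]` levels. The following is a minimal sketch; `emit_testcase`, `append`, `testcase_head` and `testcase_tail` are hypothetical names, and the rule of "levels + 2" stars in the cast is only inferred from the two testcases in the thread (15 stars for 13 levels, 26 stars for 24 levels).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Text of the reduced testcase before and after the pointer chain.  */
static const char testcase_head[] =
  "struct S { unsigned long s1; void **s2[0]; };\n"
  "void **a, **b, **c, **d, **e, **f;\n"
  "static void **\n"
  "baz (long x, long y)\n"
  "{\n"
  "  void **s = f;\n"
  "  *f = (void **) (y << 8 | (x & 0xff));\n"
  "  f += y + 1;\n"
  "  return s;\n"
  "}\n"
  "void bar (void);\n"
  "void\n"
  "foo (void)\n"
  "{\n"
  "  void **g = b[4];\n"
  "  a = b[2];\n"
  "  b = b[1];\n"
  "  g[2] = e;\n"
  "  void **h = ((void ";

static const char testcase_tail[] =
  "[66];\n"
  "  void **i = ((struct S *) h)->s2[4];\n"
  "  d = baz (4, 3);\n"
  "  d[1] = b;\n"
  "  d[2] = a;\n"
  "  d[3] = bar;\n"
  "  b = d;\n"
  "  g[1] = i[2];\n"
  "  a = g;\n"
  "  ((void (*) (void)) (i[1])) ();\n"
  "}\n";

/* Append TEXT to BUF (capacity SIZE, current length *LEN).
   Returns 0 on success, -1 if it does not fit.  */
static int
append (char *buf, size_t size, size_t *len, const char *text)
{
  size_t n = strlen (text);
  if (n >= size - *len)
    return -1;
  memcpy (buf + *len, text, n + 1);
  *len += n;
  return 0;
}

/* Write the testcase with LEVELS "[1]" dereferences into BUF.
   Returns 0 on success, -1 on bad arguments or a too-small buffer.  */
int
emit_testcase (char *buf, size_t size, int levels)
{
  size_t len = 0;
  assert (buf != NULL);
  if (size == 0 || levels < 1)
    return -1;
  buf[0] = '\0';
  if (append (buf, size, &len, testcase_head) != 0)
    return -1;
  for (int i = 0; i < levels + 2; i++)
    if (append (buf, size, &len, "*") != 0)
      return -1;
  if (append (buf, size, &len, ") a)") != 0)
    return -1;
  for (int i = 0; i < levels; i++)
    if (append (buf, size, &len, "[1]") != 0)
      return -1;
  return append (buf, size, &len, testcase_tail);
}
```

A caller would fill a buffer, write it to a .c file, and compile it with the options from the report (-O3 -g), once per nesting level of interest; 13 levels corresponds to the first reduced testcase and 24 to the enlarged one.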