User account creation filtered due to spam.

Bug 56510

Summary: [4.7 Regression] More var-tracking scalability problems
Product: gcc Reporter: Sam Rushing <sam-gccbug>
Component: debugAssignee: Jakub Jelinek <jakub>
Status: RESOLVED FIXED    
Severity: normal CC: aoliva, jakub
Priority: P3 Keywords: compile-time-hog
Version: 4.8.0   
Target Milestone: 4.8.1   
Host: Target:
Build: Known to work: 4.4.5, 4.7.3, 4.8.0
Known to fail: 4.7.2 Last reconfirmed: 2013-03-03 00:00:00
Attachments: repro
gcc48-pr56510.patch

Description Sam Rushing 2013-03-02 23:56:58 UTC
Created attachment 29568 [details]
repro

The file is output from another compiler, in CPS form (i.e., all funs are tail calls).
If I leave off '-g' it takes 49 seconds to compile (2.4GHz core i7)
If I add '-g' it takes 'forever' (> 90 mins).
Reproduced on OS X (10.8.2) x86_64, Linux Ubuntu (w/gcc4.6.3), FreeBSD 7.3.
I believe this bug affects gcc going back to 4.6 and earlier.  I have a copy of 4.2 sitting around that does not seem to have the problem, though.

gcc was configured: "./configure --enable-languages=c,c++"
built with "make -j 16"

command: /usr/local/bin/gcc -std=c99 -O3 -g -I./include self/compile.c -o self/compile
Comment 1 Andrew Pinski 2013-03-03 00:10:49 UTC
Since you are building the trunk you should do timings only when configure with --enable-checking=release as by default on the trunk, some extra checking is enabled.
Comment 2 Sam Rushing 2013-03-03 00:48:51 UTC
I recompiled with --enable-checking=release, but I couldn't see any difference.

I've attached to the process with gdb and it appears to be stuck forever in cselib.c, often in cselib_lookup().
Comment 3 Steven Bosscher 2013-03-03 00:57:04 UTC
Can you add a few options and report back the reports?

You can add -ftime-report to the command line to let the compiler dump
a report about how much time it spent in different parts of the compiler.

With your observation about cselib, I suspect this is another issue with
the scalability of var-tracking. Can you try with -g -fno-var-tracking?
Comment 4 Sam Rushing 2013-03-03 01:19:30 UTC
That did the trick... -fno-var-tracking fixes the issue.
Comment 5 Steven Bosscher 2013-03-03 11:49:13 UTC
Indeed, var-tracking...

variable tracking     :21235.89 (99%) usr
TOTAL                 :21409.16

May be a dup of bug 54402 or bug 55092 but I'll leave it to someone
else to figure that out...
Comment 6 Jakub Jelinek 2013-03-03 12:08:19 UTC
Why are you marking this as 4.8 Regression, when the reporter says the same problem is there for 4.6 already?
Comment 7 Steven Bosscher 2013-03-03 12:16:18 UTC
(In reply to comment #6)
> Why are you marking this as 4.8 Regression, when the reporter says the same
> problem is there for 4.6 already?

Because I've confirmed it as a regression of GCC 4.8 from GCC 4.2, also
as the reported said. I haven't confirmed it for GCC 4.6.  I've just now
confirmed that GCC 4.7 has the same problem.

Just -fno-var-tracking-assignments is enough to have sensible compile
time. Not unexpected either, of course...
Comment 8 Jakub Jelinek 2013-03-03 12:40:09 UTC
Reduced testcase:
struct S { unsigned long s1; void **s2[0]; };
void **a, **b, **c, **d, **e, **f;

static void **
baz (long x, long y)
{
  void **s = f;
  *f = (void **) (y << 8 | (x & 0xff));
  f += y + 1;
  return s;
}

void bar (void);
void
foo (void)
{
  void **g = b[4];
  a = b[2];
  b = b[1];
  g[2] = e;
  void **h = ((void ***************) a)[1][1][1][1][1][1][1][1][1][1][1][1][1][66];
  void **i = ((struct S *) h)->s2[4];
  d = baz (4, 3);
  d[1] = b;
  d[2] = a;
  d[3] = bar;
  b = d;
  g[1] = i[2];
  a = g;
  ((void (*) (void)) (i[1])) ();
}

I'd say the problem is that during expansion we turn:
  _10 = MEM[(void * * * * * * * * * * * * * * *)a.1_4 + 8B];
  _11 = MEM[(void * * * * * * * * * * * * * *)_10 + 8B];
  _12 = MEM[(void * * * * * * * * * * * * *)_11 + 8B];
  _13 = MEM[(void * * * * * * * * * * * *)_12 + 8B];
  _14 = MEM[(void * * * * * * * * * * *)_13 + 8B];
  _15 = MEM[(void * * * * * * * * * *)_14 + 8B];
  _16 = MEM[(void * * * * * * * * *)_15 + 8B];
  _17 = MEM[(void * * * * * * * *)_16 + 8B];
  _18 = MEM[(void * * * * * * *)_17 + 8B];
  _19 = MEM[(void * * * * * *)_18 + 8B];
  _20 = MEM[(void * * * * *)_19 + 8B];
  _21 = MEM[(void * * * *)_20 + 8B];
  _22 = MEM[(void * * *)_21 + 8B];
  h_23 = MEM[(void * *)_22 + 528B];
  # DEBUG h => h_23
  i_24 = MEM[(struct S *)h_23].s2[4];
into:
(debug_insn 14 13 15 2 (var_location:DI h (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI
(plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/f:DI (plus:DI (mem/
f:DI (plus:DI (reg/f:DI 61 [ a.1 ])
                                                                                                                    (const_int 8 [0x8])) [0 MEM[(vo
id * * * * * * * * * * * * * * *)a.1_4 + 8B]+0 S8 A64])
                                                                                                            (const_int 8 [0x8])) [0 MEM[(void * * *
 * * * * * * * * * * *)_10 + 8B]+0 S8 A64])
                                                                                                    (const_int 8 [0x8])) [0 MEM[(void * * * * * * *
 * * * * * *)_11 + 8B]+0 S8 A64])
                                                                                            (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * *
 *)_12 + 8B]+0 S8 A64])
                                                                                    (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * * *)_13 + 8
B]+0 S8 A64])
                                                                            (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * * *)_14 + 8B]+0 S8 A6
4])
                                                                    (const_int 8 [0x8])) [0 MEM[(void * * * * * * * * *)_15 + 8B]+0 S8 A64])
                                                            (const_int 8 [0x8])) [0 MEM[(void * * * * * * * *)_16 + 8B]+0 S8 A64])
                                                    (const_int 8 [0x8])) [0 MEM[(void * * * * * * *)_17 + 8B]+0 S8 A64])
                                            (const_int 8 [0x8])) [0 MEM[(void * * * * * *)_18 + 8B]+0 S8 A64])
                                    (const_int 8 [0x8])) [0 MEM[(void * * * * *)_19 + 8B]+0 S8 A64])
                            (const_int 8 [0x8])) [0 MEM[(void * * * *)_20 + 8B]+0 S8 A64])
                    (const_int 8 [0x8])) [0 MEM[(void * * *)_21 + 8B]+0 S8 A64])
            (const_int 528 [0x210])) [0 MEM[(void * *)_22 + 528B]+0 S8 A64])) pr56510-2.i:21 -1
     (nil))
which is simply too large for any reasonable cselib handling, it would better be split using debug temporaries.

OT, are you sure the testcase doesn't violate aliasing just about everywhere?
Comment 9 Steven Bosscher 2013-03-03 12:50:32 UTC
(In reply to comment #8)
> OT, are you sure the testcase doesn't violate aliasing just about
> everywhere?

At least -Wstrict-aliasing=2 doesn't think so, but it's certainly a
test case that shows the worst one can do with C pointers :-)

There is one pointer-to-int cast warning:
self/compile.c:1766:8: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Comment 10 Jakub Jelinek 2013-03-04 09:21:38 UTC
That reduced testcase actually compiles in a few seconds on a fast box, so let's make it larger, then it will take a few years:
struct S { unsigned long s1; void **s2[0]; };
void **a, **b, **c, **d, **e, **f;

static void **
baz (long x, long y)
{
  void **s = f;
  *f = (void **) (y << 8 | (x & 0xff));
  f += y + 1;
  return s;
}

void bar (void);
void
foo (void)
{
  void **g = b[4];
  a = b[2];
  b = b[1];
  g[2] = e;
  void **h = ((void **************************) a)[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][66];
  void **i = ((struct S *) h)->s2[4];
  d = baz (4, 3);
  d[1] = b;
  d[2] = a;
  d[3] = bar;
  b = d;
  g[1] = i[2];
  a = g;
  ((void (*) (void)) (i[1])) ();
}

Alex, I think creating debug temporaries from within expand_debug_expr sounds too complicated and furthermore at expand_debug_expr time we don't know yet whether we'll actually return non-NULL for the whole expression or throw everything away.  So, what would you think about just keeping the code as is and just after expand_debug_expr is called, we look at the VARIABLE_LOCATION second operand and if the RTL nesting depth is deep enough (say 3-4 levels of nesting?) in something that we'd be ok to split into debug temporaries (I'd say RTX_*COMPARE/UNARY/*ARITH/TERNARY plus first operand of MEM), create debug temporary for the subexpression and replace the operand with the debug temporary.  Otherwise, I'm afraid with TER we can end up with arbitrarily deep DEBUG_INSN operands.
Comment 11 Richard Biener 2013-03-04 10:52:35 UTC
Why TER into debug-insns at all?
Comment 12 Jakub Jelinek 2013-03-04 10:56:14 UTC
Because TERed stmts won't be expanded, so there is nothing to refer to.
Furthermore, in many cases expand_debug_expr relies on seeing the inner operand.
Comment 13 Jakub Jelinek 2013-03-05 13:50:28 UTC
Created attachment 29585 [details]
gcc48-pr56510.patch

Untested fix.
Comment 14 Jakub Jelinek 2013-03-05 22:31:56 UTC
Author: jakub
Date: Tue Mar  5 22:31:50 2013
New Revision: 196479

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196479
Log:
	PR debug/56510
	* cfgexpand.c (expand_debug_parm_decl): Call copy_rtx on incoming.
	(avoid_complex_debug_insns): New function.
	(expand_debug_locations): Call it.

	* gcc.dg/pr56510.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr56510.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/testsuite/ChangeLog
Comment 15 Jakub Jelinek 2013-03-05 22:35:30 UTC
Fixed for trunk so far.
Comment 16 Jakub Jelinek 2013-03-22 14:42:05 UTC
GCC 4.8.0 is being released, adjusting target milestone.
Comment 17 Jakub Jelinek 2013-04-03 08:25:49 UTC
Author: jakub
Date: Wed Apr  3 08:19:56 2013
New Revision: 197391

URL: http://gcc.gnu.org/viewcvs?rev=197391&root=gcc&view=rev
Log:
	Backported from mainline
	2013-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR debug/56510
	* cfgexpand.c (expand_debug_parm_decl): Call copy_rtx on incoming.
	(avoid_complex_debug_insns): New function.
	(expand_debug_locations): Call it.

	* gcc.dg/pr56510.c: New test.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/pr56510.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/cfgexpand.c
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
Comment 18 Jakub Jelinek 2013-04-03 08:27:47 UTC
Fixed for 4.7.3+ too.