Bug 58479 - [5/6/7/8 Regression] slow var-tracking on x86_64-linux at -O1 (and above) with -g, but checking disabled
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.9.0
Importance: P2 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2013-09-19 19:47 UTC by Zhendong Su
Modified: 2016-08-03 11:32 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-09-19 00:00:00


Description Zhendong Su 2013-09-19 19:47:13 UTC
The following code takes much longer to compile at -O1 (and above) with -g using the current gcc trunk on x86_64-linux (in both 32-bit and 64-bit modes). 

It also affects 4.6, 4.7, and 4.8 (with checking disabled), but to a lesser extent (4 seconds vs. 12 seconds at -O1 with -g). 

This seems to be related to 58318, but 58318 manifests only when checking is enabled. 

This may also be related to 58478. 


$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/su/software/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --enable-languages=c,c++,objc,obj-c++,fortran,lto --enable-checking=release --with-gmp=/home/su/software/local/gcc-trunk --with-mpfr=/home/su/software/local/gcc-trunk --with-mpc=/home/su/software/local/gcc-trunk --with-cloog=/home/su/software/local/gcc-trunk --prefix=/home/su/software/local/gcc-trunk
Thread model: posix
gcc version 4.9.0 20130917 (experimental) (GCC) 
$
$ time gcc -O0 -g small.c
0.02user 0.00system 0:00.13elapsed 27%CPU (0avgtext+0avgdata 37888maxresident)k
0inputs+56outputs (0major+6439minor)pagefaults 0swaps
$ time gcc -O1 small.c
0.02user 0.02system 0:00.14elapsed 30%CPU (0avgtext+0avgdata 40144maxresident)k
0inputs+32outputs (0major+6560minor)pagefaults 0swaps
$ time gcc -O1 -g small.c
7.69user 0.53system 0:12.20elapsed 67%CPU (0avgtext+0avgdata 2632224maxresident)k
0inputs+32outputs (0major+180567minor)pagefaults 0swaps
$


-------------------------------------


int a, b, c, d, e, f; 

int main ()
{
  for (a = 0; a < 8; a++)
    for (b = 0; b < 8; b++)
      for (c = 0; c < 8; c++)
	for (d = 0; d < 8; d++)
	  for (e = 0; e < 8; e++)
	    {
	      int t[3][2][9] = {
		{{f, f, f, f, f, f, f, f, f},
		 {f, f, f, f, f, f, f, f, f}},
		{{f, f, f, f, f, f, f, f, f},
		 {f, f, f, f, f, f, f, f, f}},
		{{f, f, f, f, f, f, f, f, f},
		 {f, f, f, f, f, f, f, f, f}},
	      };
	    }

  return 0;
}
Comment 1 Marek Polacek 2013-09-19 20:03:52 UTC
Confirmed (with checking enabled).  It's VTA: with -fno-var-tracking the slowdown goes away.
Comment 2 Marek Polacek 2013-09-19 20:06:29 UTC
With 4.8/4.7 it's indeed not that bad, thus tentatively marking as 4.9 regression...
Comment 3 Richard Biener 2014-01-20 09:56:34 UTC
Or rather -fvar-tracking-assignments.  The slowness creeps in here:

 trivially dead code     :   0.37 (13%) usr   0.00 ( 0%) sys   0.37 (10%) wall       0 kB ( 0%) ggc
 complete unrolling      :   0.36 (13%) usr   0.53 (55%) sys   0.81 (22%) wall  211254 kB (38%) ggc
 expand                  :   0.29 (10%) usr   0.09 ( 9%) sys   0.38 (10%) wall  346562 kB (62%) ggc

var-tracking itself doesn't enter the picture.

main ()
{
  <bb 2>:
  # DEBUG D#1 => f
  # DEBUG t$0$0$0 => D#1
  # DEBUG t$0$0$1 => D#1
  # DEBUG t$0$0$2 => D#1
... 3.3 million (!) similar lines follow ...
  e = 8;
  d = 8;
  c = 8;
  b = 8;
  a = 8;
  return 0;

}

It seems that support for aggregate piece debug values makes this blow up
completely.  Nobody is going to "p t[][][]" here (and I suspect gdb support
isn't ready for this either).
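
As a rough sanity check on the size of that dump, the following sketch counts how many per-element debug binds complete unrolling of the original testcase would produce at minimum, assuming one bind per array element per unrolled iteration. The helper names are hypothetical and this is plain arithmetic, not anything GCC computes. The result (about 1.77 million) is the same order of magnitude as the roughly 3.3 million lines reported above; the sketch does not try to explain the remaining factor, which depends on how the passes actually emit binds.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of innermost-body executions for a nest of
   `depth` loops that each run `trip` times.  */
unsigned long long
nest_iterations (unsigned depth, unsigned long long trip)
{
  unsigned long long n = 1;
  for (unsigned i = 0; i < depth; i++)
    {
      /* Guard against overflow of the running product.  */
      assert (trip == 0 || n <= ULLONG_MAX / trip);
      n *= trip;
    }
  return n;
}

/* Hypothetical helper: scalar elements in an int t[d0][d1][d2] array.  */
unsigned long long
array_elements (unsigned d0, unsigned d1, unsigned d2)
{
  return (unsigned long long) d0 * d1 * d2;
}

/* Assumed lower bound for the testcase in the description: five nested
   loops of 8 iterations each, and one bind per element of t[3][2][9]
   in every unrolled iteration.  */
unsigned long long
min_debug_binds (void)
{
  return nest_iterations (5, 8) * array_elements (3, 2, 9);
}
```

Here min_debug_binds () evaluates to 32768 * 54 = 1769472.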
Comment 4 Jakub Jelinek 2014-01-20 10:48:16 UTC
I don't see why you think nobody would try to look at t[x][y][z] in the debugger.
Anyway, I think we can do two things here.  Obviously we can't give up on cunrolling it, because that would be a clear -fcompare-debug failure.  But:
1) in loop unrolling, analyze the debug stmts we are about to unroll.  If some of the debug stmts refer to a decl that no other debug stmt in the loop refers to (though this would need to take DECL_DEBUG_EXPR overlaps into account), and the expression in the debug stmt uses only SSA_NAMEs from before the loop, constants and/or debug exprs from the loop that satisfy the same condition recursively, emit those debug stmts in the first iteration only and don't repeat them in the other unrolled iterations
1a) alternatively, write a debug stmt optimization pass that detects useless debug stmts (those saying something lives where it is already known to live from earlier debug stmts)

2) have some debug stmt limits (--param controlled), above which we start dropping debug stmts, or resetting them just once, or something similar.  This is needed because it is possible that there are multiple debug stmts for the same decl in the loop, and then 1) or 1a) can't do anything.  Perhaps keep the normal debug stmts in the first iteration of the unrolled loop (or the first few, depending on how many there are); then, when we give up, reset all the debug stmts in one iteration, emit no debug stmts at all after that iteration and before the last one, and finally emit debug stmts again in the last iteration.
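
The idea in 1a) can be illustrated with a toy model. This is only a sketch under stated assumptions, not GCC code: struct debug_bind, MAX_VARS and drop_redundant_binds are hypothetical names, a bind is reduced to a (variable, value id) pair, and the sequence is assumed to be straight-line with nothing in between that could change what a value id means.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC's internal representation: a debug bind says
   "variable `var` currently has value `val`".  */
struct debug_bind
{
  int var;   /* index of the bound variable, 0 <= var < nvars */
  int val;   /* opaque id of the bound value; must be >= 0 */
};

#define MAX_VARS 64   /* arbitrary limit for this sketch */

/* Compact `binds` in place, dropping every bind that repeats the value
   its variable is already known to have from an earlier bind in the
   same sequence.  Returns the number of binds kept.  */
size_t
drop_redundant_binds (struct debug_bind *binds, size_t n, int nvars)
{
  int known[MAX_VARS];
  size_t kept = 0;

  assert (nvars > 0 && nvars <= MAX_VARS);
  for (int v = 0; v < nvars; v++)
    known[v] = -1;   /* -1: no value known yet */

  for (size_t i = 0; i < n; i++)
    {
      int var = binds[i].var;
      assert (var >= 0 && var < nvars && binds[i].val >= 0);
      if (known[var] == binds[i].val)
        continue;    /* says nothing new: drop it */
      known[var] = binds[i].val;
      binds[kept++] = binds[i];
    }
  return kept;
}
```

In the original testcase every unrolled iteration rebinds each t$i$j$k to the same value, so in this model only the binds from the first iteration would survive.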

Testcase for 2) is e.g. -O2 -g:
int a, b, c, d, e;
int
main ()
{
  for (a = 0; a < 16; a++)
  for (b = 0; b < 16; b++)
  for (c = 0; c < 16; c++)
  for (d = 0; d < 16; d++)
  for (e = 0; e < 16; e++)
    {
      int f;
#define F10 f = 0; f = 1; f = 2; f = 3; f = 4; f = 5; f = 6; f = 7; f = 8; f = 9;
#define F100 F10 F10 F10 F10 F10 F10 F10 F10 F10 F10
      F100
    }
  return 0;
}
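
For a loop like this one, where the same decl is rebound many times per iteration, the give-up policy described in 2) could be modeled roughly as below. This is a hypothetical sketch only: the enum, the function name and the keep_first parameter are invented for illustration and do not correspond to any existing GCC --param.

```c
#include <assert.h>

/* What to do with the debug stmts of one unrolled iteration.  */
enum debug_action
{
  EMIT_BINDS,    /* keep the normal debug stmts */
  RESET_BINDS,   /* emit a single reset for each variable */
  SKIP_BINDS     /* emit no debug stmts at all */
};

/* Hypothetical policy for iteration `iter` out of `niters` unrolled
   iterations: the first `keep_first` iterations and the last iteration
   keep their debug stmts, the first iteration after the kept prefix
   resets them, and everything in between emits nothing.  */
enum debug_action
debug_action_for_iteration (unsigned long iter, unsigned long niters,
                            unsigned long keep_first)
{
  assert (niters > 0 && iter < niters);
  if (iter < keep_first || iter == niters - 1)
    return EMIT_BINDS;
  if (iter == keep_first)
    return RESET_BINDS;
  return SKIP_BINDS;
}
```

With niters = 10 and keep_first = 2, iterations 0 and 1 emit, iteration 2 resets, iterations 3 to 8 are skipped and iteration 9 emits again.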

Another testcase, with no unrolling at all, that shows that it is easy to get thousands of debug stmts:
int a, b, c, d, e, f;

int
main ()
{
#define F1 {{f, f, f, f, f, f, f, f, f}, {f, f, f, f, f, f, f, f, f}},
#define F10 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1
#define F100 F10 F10 F10 F10 F10 F10 F10 F10 F10 F10
#define F1000 F100 F100 F100 F100 F100 F100 F100 F100 F100 F100
  int t[1000][2][9] = {
    F1000
  };
  return 0;
}

I'm not sure where, if at all, to put any --param limits here though.  After all, the initializers first produce huge spaghetti code, and only later on is that transformed into no actual code, just DEBUG stmt spaghetti.  Though, supposedly this last example is already related to the PR59659 patch (which I've done for C++ only so far).
Comment 5 Jakub Jelinek 2014-03-03 14:01:25 UTC
Alex, your thoughts on this?
Comment 6 Jakub Jelinek 2014-03-12 15:28:43 UTC
BTW, I don't see any significant slowdown from r155360 when this regressed on compile time (for different reasons).
It has been slow until r187052; with r187053 it returned to roughly the previous speed, then r195015 started emitting the debug stmts and compile time went up again.

But if I compare the 4.8 branch (--enable-checking=yes,rtl) from r208337 with current trunk (--enable-checking=yes,rtl), trunk is even a few % faster.

So, IMHO this isn't a P1 regression, as it hasn't really regressed recently (and while it was fast again for a certain period during development of 4.8, it hasn't been "fixed" in any released compiler after 4.4.x).
Comment 7 Jakub Jelinek 2014-04-22 11:36:59 UTC
GCC 4.9.0 has been released.
Comment 8 Jakub Jelinek 2014-07-16 13:30:38 UTC
GCC 4.9.1 has been released.
Comment 9 Jakub Jelinek 2014-10-30 10:40:35 UTC
GCC 4.9.2 has been released.
Comment 10 Jakub Jelinek 2015-06-26 19:57:57 UTC
GCC 4.9.3 has been released.
Comment 11 Richard Biener 2016-08-03 11:02:40 UTC
GCC 4.9 branch is being closed