GCC Bugzilla has been upgraded from version 4.4.9 to 5.0rc3. If you see any problem, please report it to bug 64968.
Bug 10944 - alloc_page in ggc-page.c is slow
Summary: alloc_page in ggc-page.c is slow
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: other (show other bugs)
Version: 3.4.0
: P2 normal
Target Milestone: 4.0.0
Assignee: Zack Weinberg
URL:
Keywords: compile-time-hog, patch
Depends on:
Blocks: 8361
  Show dependency treegraph
 
Reported: 2003-05-22 19:23 IST by Andrew Pinski
Modified: 2005-02-09 02:18 IST (History)
3 users (show)

See Also:
Host: powerpc-apple-darwin6.6
Target: powerpc-apple-darwin6.6
Build: powerpc-apple-darwin6.6
Known to work:
Known to fail:
Last reconfirmed: 2003-11-30 08:38:20


Attachments
Proof-of-concept patch (1.48 KB, patch)
2003-10-12 20:30 IST, Steven Bosscher
Details | Diff
Proof-of-concept patch (2) (1.65 KB, patch)
2003-10-13 14:30 IST, Steven Bosscher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2003-05-22 19:23:32 IST
alloc_page is slow because it transverses a linked list that is sometimes huge.
It takes about 5% out of a 10s sample (around 20s into compiling) when compiling code from bug 
8361 <http://gcc.gnu.org/PR8361>. 

One way of fixing this is to have an array for that list that can grow, this will most of the problems 
of transversing an linked list.
Comment 1 Zack Weinberg 2003-05-23 21:52:36 IST
The code in alloc_page is remarkably stupid, but the linked list traversal
should only occur when trying to allocate an object larger than a single
page, which is believed to happen almost never.  Can you please find out
whether it's really spending time in that list, or if some other area is
the cause of the problem?

I'll take responsibility for this bug; I've been meaning to get us per-order
freelists for a long time.
Comment 2 Andrew Pinski 2003-05-23 22:29:39 IST
Subject: Re: [Bug other/10944] alloc_page in ggc-page.c is slow

Yes it is spending 90% of the time in alloc_page in that list traversal.
I used Shikari (part of the CHUD tools) on Mac OS X for profiling the 
complier
10000 samples ever 1ms (10s in total).

I can find out the backtrace if you want it.

Thanks,
Andrew Pinski


On Friday, May 23, 2003, at 17:52 US/Eastern, zack@gcc.gnu.org wrote:
>
> The code in alloc_page is remarkably stupid, but the linked list 
> traversal
> should only occur when trying to allocate an object larger than a 
> single
> page, which is believed to happen almost never.  Can you please find 
> out
> whether it's really spending time in that list, or if some other area 
> is
> the cause of the problem?
>
> I'll take responsibility for this bug; I've been meaning to get us 
> per-order
> freelists for a long time.

Comment 3 Andrew Pinski 2003-05-23 22:40:18 IST
The backtraces are spread all over so that it is not just one place.
Comment 4 Andrew Pinski 2003-05-23 23:19:05 IST
Here is a sample output of shikari based on 75168 samples each one 1ms apart 
samples		function		program
	1680		alloc_page	cc1plus

based on asm:
Samples	asm				file:line
2	  	lwz  r0,4(r3)		ggc-page.c:709
1375	  	cmpw cr7,r0,r28	ggc-page.c:709
	 	beq cr7,+224		ggc-page.c:709
6	  	mr r2,r3			ggc-page.c:708
6	  	lwz r3,0(r3)		ggc-page.c:708
266	  	cmpwi cr6,r3,0		ggc-page.c:708
	  	bne cr6,-24		ggc-page.c:708

samples	line		code
		707	  /* Check the list of free pages for one we can use.  */
278		708	  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
1377		709	    if (p->bytes == entry_size)
		710	      break;
Comment 5 Steven Bosscher 2003-10-12 20:30:44 IST
Created attachment 4924 [details]
Proof-of-concept patch
Comment 6 Steven Bosscher 2003-10-12 20:34:54 IST
Discussed this with Zack, and the patch "is essentially what [he] had in mind."

Zack suggested there are still a few things that need to be done, however:
1) two-level free page caching:
   - after one collection, put all freepages on the per-order lists;
   - after the next collection, put all thepages that are still on
     the per-order lists on the global list;
   - and after a third collection release the pages still on the global
     list tothe OS.
   This probably takes some fiddling with the page_entry struct.  Also
   we'll have to see how this works in relation with page groups

2) scavenge pages from other per-order lists rather than go back to the
   OS for more.
   (Personally I think this won't be necessary if (1) is implemented properly)


Comment 7 Steven Bosscher 2003-10-13 14:30:57 IST
Created attachment 4928 [details]
Proof-of-concept patch (2)
Comment 8 Steven Bosscher 2003-10-13 14:37:11 IST
The new patch implements suggestion 1 from last nights list.

Andrew, you identified the loop over free_pages as a time consumer, but now that
it is gone, I get virtually no measurable time improvements (10ths of seconds on
>1min total compilation time) for the test case for PR8361.  Which (not
surprisingly) once again shows that GC is slow because with each call
ggc_collect we need to do more and more marking...

Are you absolutely, positively sure that this loop is a performance hog? 
Perhaps you can try the patch, it bootstrappes (c,objc,c++) and showed no
regressions on i686.  So far I haven't found convincing evidence that it
improves performance significantly except for the most pathological (artificial)
cases.

Sigh, back to waiting for a new GC strategy...

Comment 9 Andrew Pinski 2003-11-30 08:38:19 IST
I am going to push this (only because I have not have time to test the patch) and there is 
already a new GC strategy (for 3.4) but does not work with everything.
Comment 10 Steven Bosscher 2003-12-20 14:10:47 IST
I have experimented with my patch on three different targets (i686,x86_64, and 
hpux) and have never seen any improvements with it even though it kills the 
supposed hot spot in the code.  I'm marking this as WAITING and I'll close it 
soonish, unless someone can test this on the reporter's platform (ppc-darwin) 
and show that it does help on that target. 
 
Comment 11 Andrew Pinski 2003-12-23 01:10:12 IST
Steven can you wait like three weeks before closing because I go on vaction tommorrow for about 
two weeks and I cannot test the patch untill after I get back?
Comment 12 Dara Hazeghi 2004-01-17 22:59:07 IST
Andrew, did you get a chance to test the patch for this?
Comment 13 Andrew Pinski 2004-02-06 01:44:38 IST
I do not have time to test this, and it now this patch does not matter as Honza's work with the 
compiler's memory and compile time stuff and Steven did not get any benifit from it.