This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Something is broken in repack


On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> Add memory fragmentation to that and you have a clogged system.
> 
> Solution: 
> 
> 	pack.deltacachesize=1
> 	pack.windowmemory=16M
> 
> Limiting the window memory to 16MB will automatically shrink the window 
> size when big objects are encountered, therefore keeping much fewer of 
> those objects at the same time in memory, which in turn means they will 
> be processed much more quickly.  And somehow that must help with memory 
> fragmentation as well.

OK scrap that.

When I returned to the computer this morning, the repack was 
completed... with a 1.3GB pack instead.

So... The gcc repo apparently really needs a large window to efficiently 
compress those large objects.

But when those large objects are already well deltified and you repack 
again with a large window, somehow the memory allocator is way more 
involved, probably even more so when there are several threads in 
parallel amplifying the issue, and things probably get to a point of no 
return with regard to memory fragmentation after a while.

So... my conclusion is that the glibc allocator has fragmentation issues 
with this workload, given the notable difference with the Google 
allocator, which itself might not be completely immune to fragmentation 
issues of its own.  And because the gcc repo requires a large window of 
big objects to get good compression, you're better off not using 4 
threads to repack it with -a -f.  The fact that the size of the source 
pack has such an influence is probably only because the increased usage 
of the delta base object cache is playing a role in the global memory 
allocation pattern, allowing for the bad fragmentation issue to occur.

If you could run one last test with the mallinfo patch I posted, without 
the pack.windowmemory setting, and add the reported values along with 
those from top, then we could formally conclude that memory 
fragmentation is the problem.

So I don't think Git itself is actually bad.  The gcc repo most 
certainly constitutes a nasty use case for memory allocators, but I don't 
think there is much we can do about it besides possibly implementing our 
own memory allocator with active defragmentation where possible (read 
memcpy) at some point to give glibc's allocator some chance to breathe a 
bit more.

In the meantime you might have to use only one thread and lots of 
memory to repack the gcc repo, or find the perfect memory allocator to 
be used with Git.  After all, packing the whole gcc history to around 
230MB is quite a stunt but it requires sufficient resources to 
achieve it. Fortunately, like Linus said, such a wholesale repack is not 
something that most users have to do anyway.
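
For illustration, the single-threaded full repack described above could 
be run roughly as follows.  This is only a sketch: the --window and 
--depth values are placeholders standing in for "a large window" and are 
not taken from this message, so adjust them to the memory available.

```shell
# Limit the delta search to one thread, as suggested above.
git config pack.threads 1

# Full repack that recomputes all deltas (-a -f).
# The window/depth numbers below are assumed placeholders, not
# values recommended anywhere in this thread.
git repack -a -f --window=100 --depth=100
```

Leaving pack.windowmemory unset keeps the window at its full size for 
big objects, which is what the gcc repo seems to need for good 
compression, at the cost of higher memory use.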


Nicolas

