This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Something is broken in repack
On Wed, 12 Dec 2007, Nicolas Pitre wrote:
> Add memory fragmentation to that and you have a clogged system.
>
> Solution:
>
> pack.deltacachesize=1
> pack.windowmemory=16M
>
> Limiting the window memory to 16MB will automatically shrink the window
> size when big objects are encountered, therefore keeping much fewer of
> those objects at the same time in memory, which in turn means they will
> be processed much more quickly. And somehow that must help with memory
> fragmentation as well.
OK scrap that.
When I returned to the computer this morning, the repack was
completed... with a 1.3GB pack instead.
So... The gcc repo apparently really needs a large window to efficiently
compress those large objects.
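
To illustrate why a window memory cap hurts here, below is a minimal sketch
of the shrinking effect described in the quoted message. It is not Git's
actual code: the function name effective_window() and its parameters are
hypothetical, and it assumes the window always keeps at least one candidate
even when that object alone exceeds the cap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from Git): given the sizes of the most
 * recently seen objects (newest last), return how many of them can stay
 * in a delta window that is limited both by an entry count and by a
 * total byte budget.
 */
size_t effective_window(const size_t *sizes, size_t n,
                        size_t max_entries, size_t max_bytes)
{
    size_t used = 0;
    size_t count = 0;

    assert(sizes != NULL || n == 0);

    /* Walk backwards from the newest object towards older ones. */
    while (count < n && count < max_entries) {
        size_t sz = sizes[n - 1 - count];

        /* Assumption: one candidate is always kept, even if oversized. */
        if (count > 0 && used + sz > max_bytes)
            break;
        used += sz;
        count++;
    }
    return count;
}
```

With a 16MB cap and a nominal window of 10, four 1MB objects all stay in
the window, but a run of 8MB objects leaves only two delta candidates and a
run of 20MB objects leaves just one, so far fewer delta bases get tried for
exactly the objects that need them most.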
But when those large objects are already well deltified and you repack
again with a large window, somehow the memory allocator is way more
involved, probably even more so when there are several threads in
parallel amplifying the issue, and things probably get to a point of no
return with regard to memory fragmentation after a while.
So... my conclusion is that the glibc allocator has fragmentation issues
with this work load, given the notable difference with the Google
allocator, which itself might not be completely immune to fragmentation
issues of its own. And because the gcc repo requires a large window of
big objects to get good compression, you're better off not using 4
threads to repack it with -a -f. The fact that the size of the source
pack has such an influence is probably only because the increased usage
of the delta base object cache is playing a role in the global memory
allocation pattern, allowing for the bad fragmentation issue to occur.
If you could run one last test with the mallinfo patch I posted, without
the pack.windowmemory setting, and adding the reported values along with
those from top, then we could formally conclude that memory
fragmentation is the issue.
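
For readers who want to see what such a measurement looks like, here is a
rough sketch. It is not the mallinfo patch referred to above. It assumes
glibc, where mallinfo() is declared in <malloc.h>; the helper names
free_fraction() and heap_free_fraction() are made up for this example.
Note that mallinfo() reports int fields that can wrap on very large heaps,
that it covers only the main arena, and that newer glibc releases
deprecate it in favour of mallinfo2().

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: struct mallinfo, mallinfo() */
#include <stddef.h>

/*
 * Fraction of the heap that is free but still held by the allocator.
 * A high value while top shows a large resident size hints at
 * fragmentation: the memory is free, yet it cannot be handed back.
 */
double free_fraction(size_t in_use, size_t free_bytes)
{
    size_t total = in_use + free_bytes;

    return total ? (double)free_bytes / (double)total : 0.0;
}

/* Same ratio, computed from what glibc reports for the main arena. */
double heap_free_fraction(void)
{
    struct mallinfo mi = mallinfo();

    /* uordblks: bytes in use; fordblks: bytes in free chunks.
     * The casts undo int wrap-around for heaps up to 4GB. */
    size_t in_use = (size_t)(unsigned int)mi.uordblks;
    size_t free_bytes = (size_t)(unsigned int)mi.fordblks;

    assert(in_use + free_bytes >= in_use);
    return free_fraction(in_use, free_bytes);
}
```

Calling heap_free_fraction() at a few points during a long-running job and
comparing it with the resident size shown by top gives a crude picture:
if the in-use total stays roughly flat while the resident size and the free
fraction keep growing, fragmentation is the likely suspect.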
So I don't think Git itself is actually bad. The gcc repo most
certainly constitutes a nasty use case for memory allocators, but I don't
think there is much we can do about it besides possibly implementing our
own memory allocator with active defragmentation where possible (read
memcpy) at some point to give glibc's allocator some chance to breathe a
bit more.
In the meantime you might have to use only one thread and lots of
memory to repack the gcc repo, or find the perfect memory allocator to
be used with Git. After all, packing the whole gcc history to around
230MB is quite a stunt but it requires sufficient resources to
achieve it. Fortunately, like Linus said, such a wholesale repack is not
something that most users have to do anyway.
Nicolas