problem when mapping malloc to GC_malloc.

Thu Jun 18 19:01:00 GMT 2009

----- Original Message ----- 
From: "abhishek desai" <abhi00@gmail.com>
To: "Hans Boehm" <Hans.Boehm@hp.com>
Cc: <java@gcc.gnu.org>
Sent: Thursday, June 18, 2009 11:18 AM
Subject: Re: problem when mapping malloc to GC_malloc.

On Thu, Jun 18, 2009 at 10:09 PM, Hans Boehm<Hans.Boehm@hp.com> wrote:
>
>
> On Thu, 18 Jun 2009, abhishek desai wrote:
>
>> Hi,
>>
>> My JNI code includes redefinitions of the malloc, free and realloc
>> functions (shown below). These functions call GC_malloc, GC_free and
>> GC_realloc respectively. This is done so that any calls to malloc
>> get allocated through the garbage collector. However, this is failing
>> with a segfault. Any clues why this does not work?
>> I am using this code along with the libgcj library linked dynamically
>> with my application.
>>
>> void *malloc(size_t size)
>> {
>>     return GC_malloc(size);
>> }
>>
>> void *realloc(void *ptr, size_t size)
>> {
>>     return GC_realloc(ptr, size);
>> }
>>
>> void free(void *ptr)
>> {
>>     GC_free(ptr);
>> }
>>
>>
>> regards
>> abhishek
>>
> The collector itself supports a REDIRECT_MALLOC option that might get you
> closer. In general, this is very hard.
>
> There are other functions (calloc, memalign, etc.) that you would also
> have to replace, so that their clients don't end up using the original
> malloc with GC_free. This is the easy part.
>

I will say it is not so easy...

the primary issue here is DLLs / Shared Objects, which may well continue 
using the original functions even though one has managed to replace them in 
the statically-linked portion of the app...

as such, a complete replacement would likely require eliminating any use of 
shared objects as well...

to make matters worse, C runtimes are typically themselves DLLs or SOs, so 
effectively, one would likely have to replace the entire C runtime with a 
customized, statically-linked version...

granted, it may not always be that bad (after all, some OS's only use static 
linking, and others may well prefer the versions of symbols from the main 
binary rather than depended-on libs, ...), but IMO it is neither safe nor 
portable...
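
even the "easy part" quoted above (covering calloc, memalign, and friends) has a few small traps. as a rough sketch only, here is a calloc-style wrapper layered over a replacement allocator. the alloc_fn hook and the name calloc_via are made up for illustration (they are not part of the collector); the hook just stands in for GC_malloc or whatever allocator is being swapped in:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook type: stands in for GC_malloc or any other
 * replacement allocator. Not part of the collector's API. */
typedef void *(*alloc_fn)(size_t);

/* calloc-style wrapper over an arbitrary allocator.
 * Returns NULL if nmemb * size would overflow, or if the
 * underlying allocator fails. */
void *calloc_via(alloc_fn alloc, size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                      /* multiplication would overflow */

    size_t total = nmemb * size;
    void *p = alloc(total ? total : 1);   /* avoid a zero-byte request */
    if (p != NULL)
        memset(p, 0, total);              /* calloc must hand back zeroed memory */
    return p;
}
```

the point is only that every such entry point needs its own wrapper with the right semantics (overflow check, zeroing, alignment for memalign, and so on), and all of them have to route to the same allocator.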

> The hard part is that if you replace malloc, low level parts of the system
> will also end up using GC_malloc, and sometimes squirrel away pointers to
> the results in places the GC doesn't really know about. Recent versions of
> the GC (7.1+) contain some hacks to try to handle this on Linux. But the
> multithreaded versions still are sometimes not 100% robust. Gcj's version
> is unlikely to work in this mode, except possibly in single-threaded mode.
>

agreed...

granted, most code does not do such things, but there is no real guarantee 
that none of the code does so.
for example, the well-known XOR-pointer hacks, or representing pointers in a 
serialized form or in misaligned memory, all hide pointers from the 
collector...
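
to make the XOR-pointer case concrete, here is a minimal sketch of the link word used by an XOR-linked list. the struct and function names are made up for illustration. a collector that scans memory for words that look like addresses would only see the XOR of the two neighbours, which is not the address of either one:

```c
#include <stddef.h>
#include <stdint.h>

/* Node of an XOR-linked list: one field holds prev XOR next. */
struct xnode {
    int value;
    uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
};

/* Combine two neighbour addresses into the stored link word. */
uintptr_t xor_link(struct xnode *prev, struct xnode *next)
{
    return (uintptr_t)prev ^ (uintptr_t)next;
}

/* Recover the other neighbour, given the one we arrived from. */
struct xnode *xor_step(const struct xnode *cur, struct xnode *from)
{
    return (struct xnode *)(cur->link ^ (uintptr_t)from);
}
```

if the only reference to a node lives inside such a link word, the collector has no way to tell that the node is still reachable, and may reclaim it while it is still in use.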

> A real fix here would probably require some new hooks in glibc and the
> startup and libpthread code.
>

agreed...
then it would only be guaranteed to work on Linux, but this is still better 
than nothing I guess...

> Hans
>

replying to "abhishek desai":

>> if you replace malloc, low level parts of the system
>> will also end up using GC_malloc
> This is what I am trying to achieve but this is turning out to be a tough 
> task.
>

the question is: does it actually make sense to do so?...

just because you may want to do something does not mean it is a good idea.

> Some more background here. I am using gcc version 3.4.6 and the
> libgcj associated with it. I know it's ancient but the project demands
> it. Can't help it. I am compiling for a mipsel architecture using
> uclibc. There are a lot of calls to malloc through the libgcj library
> or possibly in the other libraries used by libgcj. The calls to malloc
> happen even before the main function begins.
>

this is to be expected.
now where is the REAL problem?...

> uclibc's implementation of memory management (malloc, realloc) relies on
> mmap and sbrk to get memory, which the gc also uses for the same
> purpose. So is it not possible to use the gc instead of the uclibc memory
> manager? Which areas could be causing the problem? Even if low level
> parts of the system make calls to GC_malloc, how does it affect the
> overall behavior?
>

theoretically, a GC "could" be used as the main malloc; however, doing so is 
not always a good idea:
technically, it violates conformance with the C standard (malloc is defined 
to behave in a certain way, GC_malloc does not necessarily behave in 
this way, and the difference can break existing code);
actually doing so is a big, ugly job (apart from OS 
developers, this task is not entirely practical);
one ends up weighing down the GC with lots of additional work, 
thus reducing its overall performance;
...

so, my personal suggestion is to just let malloc be...

one can then stop worrying about trying to replace malloc, and instead 
simply use the GC in addition to malloc (being careful not to mix them up, 
however...).

one may find that each has unique advantages:
GC is good for letting objects be garbage collected;
malloc is good for quickly grabbing big chunks of memory and then freeing 
them when done (for example, for temporary buffers and similar).

as I see it, there is no real problem in letting both sit around and each do 
their respective tasks...
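
one possible way to avoid mixing them up is to give each allocator its own handle type, so the compiler complains when a collected pointer is handed to free. this is only a sketch under assumptions: heap_ptr, gc_ptr and the wrapper names are invented here, and collector_alloc_stub is a stand-in (so the sketch builds without libgc) where a real program would call GC_malloc:

```c
#include <stddef.h>
#include <stdlib.h>

/* Two distinct handle types: the compiler rejects passing one
 * where the other is expected. */
typedef struct { void *p; } heap_ptr;   /* owned by malloc/free */
typedef struct { void *p; } gc_ptr;     /* owned by the collector */

/* Stand-in so the sketch builds without libgc; a real program
 * would call GC_malloc here instead. */
static void *collector_alloc_stub(size_t n)
{
    return calloc(1, n);
}

/* Temporary buffers and the like: explicit alloc, explicit free. */
heap_ptr heap_alloc(size_t n)
{
    heap_ptr h = { malloc(n) };
    return h;
}

void heap_free(heap_ptr h)
{
    free(h.p);
}

/* Collected objects: allocated here, never freed by hand. */
gc_ptr gc_alloc(size_t n)
{
    gc_ptr g = { collector_alloc_stub(n) };
    return g;
}

/* Deliberately no gc_free(): calling heap_free() on a gc_ptr
 * is a compile-time type error. */
```

the wrappers cost nothing at run time; they only keep the two worlds apart at the type level.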

> --Abhishek