Bug 67618

Summary: malloc+memset optimization breaks code
Product: gcc Reporter: Daniel Gutson <daniel.gutson>
Component: tree-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED INVALID    
Severity: normal    
Priority: P3    
Version: 5.2.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed:

Description Daniel Gutson 2015-09-17 19:32:08 UTC
compiling the following code with -O2

#include <stdlib.h>
#include <string.h>

char* function(size_t x)
{
    void* ret = malloc(x);
    if (x > 12)
        memset(ret, 0, x);

    return (char*)ret;
}

int main(void)
{
    return 0;
}

causes gcc to wrongly replace function's content by a call to calloc.
This comes from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57742

Additionally, in the case for RTEMS for example, if the function itself (caller) is calloc, causes a recursive call. (Maybe this could be addressed in two separate issues).
Comment 1 Marc Glisse 2015-09-17 19:43:27 UTC
Why do you call it wrong? It is always legal to replace malloc with calloc, and if we do it, the memset becomes useless. It may not be optimal from a performance POV, but I don't see where it breaks anything.

As for the infinite loop, that's a know problem not specific to this function and there are already several PRs/emails about it. When you compile the C library itself, you are supposed to pass some options to gcc so it knows not to do that.
Comment 2 Daniel Gutson 2015-09-17 19:47:58 UTC
(In reply to Marc Glisse from comment #1)
> Why do you call it wrong? It is always legal to replace malloc with calloc,

Have you seen the 'if' condition? The optimization ignores it completely.

> and if we do it, the memset becomes useless. It may not be optimal from a
> performance POV, but I don't see where it breaks anything.
> 
> As for the infinite loop, that's a know problem not specific to this
> function and there are already several PRs/emails about it. When you compile
> the C library itself, you are supposed to pass some options to gcc so it
> knows not to do that.
Comment 3 Andreas Schwab 2015-09-17 19:49:26 UTC
Trying to read the (uninitialized) contents of the allocated memory for x <= 12 would be undefined behaviour, so calling calloc instead does not change the defined behaviour.
Comment 4 Daniel Gutson 2015-09-17 19:57:32 UTC
(In reply to Andreas Schwab from comment #3)
> Trying to read the (uninitialized) contents of the allocated memory for x <=
> 12 would be undefined behaviour, so calling calloc instead does not change
> the defined behaviour.

Why do you assert that the program will read the memory?
The optimization ignores the 'if' statement (just comment that line and see), so if malloc() returns NULL, memset is called unconditionally and will try to write to address NULL. The "x > 12" was just an example that this is arbitrary. Replace it with something else. The 'if' shall not be ignored.
Comment 5 Andrew Pinski 2015-09-17 20:08:45 UTC
(In reply to Daniel Gutson from comment #4)
> (In reply to Andreas Schwab from comment #3)
> > Trying to read the (uninitialized) contents of the allocated memory for x <=
> > 12 would be undefined behaviour, so calling calloc instead does not change
> > the defined behaviour.
> 
> Why do you assert that the program will read the memory?

It does not.  It just optimizes the code.

> The optimization ignores the 'if' statement (just comment that line and
> see), so if malloc() returns NULL, memset is called unconditionally and will
> try to write to address NULL. The "x > 12" was just an example that this is
> arbitrary. Replace it with something else. The 'if' shall not be ignored.

yes that just undefined behavior when invoking memset with a NULL value, so replacing it is still fine.

Also calloc is sometimes faster than a malloc/memset due to knowing if the kernel will zero out the memory, etc.
Comment 6 Daniel Gutson 2015-09-17 20:11:11 UTC
(In reply to Andrew Pinski from comment #5)
> (In reply to Daniel Gutson from comment #4)
> > (In reply to Andreas Schwab from comment #3)
> > > Trying to read the (uninitialized) contents of the allocated memory for x <=
> > > 12 would be undefined behaviour, so calling calloc instead does not change
> > > the defined behaviour.
> > 
> > Why do you assert that the program will read the memory?
> 
> It does not.  It just optimizes the code.

I meant: how do you know what the program will do next? Maybe it will write memory. See below please.

> 
> > The optimization ignores the 'if' statement (just comment that line and
> > see), so if malloc() returns NULL, memset is called unconditionally and will
> > try to write to address NULL. The "x > 12" was just an example that this is
> > arbitrary. Replace it with something else. The 'if' shall not be ignored.
> 
> yes that just undefined behavior when invoking memset with a NULL value, so
> replacing it is still fine.

That's why the 'if (ptr != NULL)' should not be ignored, which currently is. The 'if' prevents the UB.

> 
> Also calloc is sometimes faster than a malloc/memset due to knowing if the
> kernel will zero out the memory, etc.

This is not under discussion.
Comment 7 Marc Glisse 2015-09-17 20:13:54 UTC
(In reply to Daniel Gutson from comment #4)
> The optimization ignores the 'if' statement (just comment that line and
> see), so if malloc() returns NULL, memset is called unconditionally and will
> try to write to address NULL.

Ah, you are saying that calling memset on a null pointer is not undefined behavior, that it generates an observable trap, and that the transformation should be protected by -fdelete-null-pointer-checks?
Comment 8 Marc Glisse 2015-09-17 20:23:14 UTC
(bugzilla bug that reset the component...)

(In reply to Daniel Gutson from comment #6)
> That's why the 'if (ptr != NULL)' should not be ignored, which currently is.
> The 'if' prevents the UB.

Uh, if you consider it UB, I don't understand the problem. At runtime, either malloc succeeded and the transformation is fine, or x<=12 and the transformation is fine, or the call to memset is undefined behavior so anything is fine (including the transformation). Unless you explicitly want to catch the trap, I don't understand what you are saying. Could you detail step by step where a well-defined behavior in the original program is turned into a different behavior in the optimization?
Comment 9 Daniel Gutson 2015-09-17 20:27:59 UTC
(In reply to Marc Glisse from comment #8)
> (bugzilla bug that reset the component...)
> 
> (In reply to Daniel Gutson from comment #6)
> > That's why the 'if (ptr != NULL)' should not be ignored, which currently is.
> > The 'if' prevents the UB.
> 
> Uh, if you consider it UB, I don't understand the problem. At runtime,
> either malloc succeeded and the transformation is fine, or x<=12 and the
> transformation is fine, or the call to memset is undefined behavior so
> anything is fine (including the transformation). Unless you explicitly want
> to catch the trap, I don't understand what you are saying. Could you detail
> step by step where a well-defined behavior in the original program is turned
> into a different behavior in the optimization?

See this example: ('function' is same as above)

void caller(void)
{
    void* ptr = function(1);
    *(char*)ptr = 1;
}

In this case, calloc was called instead of (only) malloc because the 'if' was ignored, resulting in a suboptimized code (since calloc is usually slower than  malloc alone).
The resulting steps are:
    calloc(1)
    *ptr = 1;

whereas I just wanted
    malloc(1)
    *ptr = 1;

IMHO, the optimization should take the 'if' into account and only apply if it is the usual 'if (ptr != NULL)' pattern.
(Additionally, it should check that the context caller function is not 'calloc' itself).
Comment 10 Andrew Pinski 2015-09-17 20:51:59 UTC
(In reply to Daniel Gutson from comment #9)
> (In reply to Marc Glisse from comment #8)
> > (bugzilla bug that reset the component...)
> > 
> > (In reply to Daniel Gutson from comment #6)
> > > That's why the 'if (ptr != NULL)' should not be ignored, which currently is.
> > > The 'if' prevents the UB.
> > 
> > Uh, if you consider it UB, I don't understand the problem. At runtime,
> > either malloc succeeded and the transformation is fine, or x<=12 and the
> > transformation is fine, or the call to memset is undefined behavior so
> > anything is fine (including the transformation). Unless you explicitly want
> > to catch the trap, I don't understand what you are saying. Could you detail
> > step by step where a well-defined behavior in the original program is turned
> > into a different behavior in the optimization?
> 
> See this example: ('function' is same as above)
> 
> void caller(void)
> {
>     void* ptr = function(1);
>     *(char*)ptr = 1;
> }

Maybe file another bug which does the opposite transformation for the cases where memcpy happens after the calloc.  There is not enough information to know if the value is going to be <=15 most of the time or not so we just guess.

Anyways there is no breaking of code.  If you don't want this transformation inside a function which is called calloc, then you need to use -fno-builtin-malloc to disable finding of the malloc call.
Comment 11 Daniel Gutson 2015-09-17 22:35:04 UTC
(In reply to Andrew Pinski from comment #10)
> (In reply to Daniel Gutson from comment #9)
> > (In reply to Marc Glisse from comment #8)
> > > (bugzilla bug that reset the component...)
> > > 
> > > (In reply to Daniel Gutson from comment #6)
> > > > That's why the 'if (ptr != NULL)' should not be ignored, which currently is.
> > > > The 'if' prevents the UB.
> > > 
> > > Uh, if you consider it UB, I don't understand the problem. At runtime,
> > > either malloc succeeded and the transformation is fine, or x<=12 and the
> > > transformation is fine, or the call to memset is undefined behavior so
> > > anything is fine (including the transformation). Unless you explicitly want
> > > to catch the trap, I don't understand what you are saying. Could you detail
> > > step by step where a well-defined behavior in the original program is turned
> > > into a different behavior in the optimization?
> > 
> > See this example: ('function' is same as above)
> > 
> > void caller(void)
> > {
> >     void* ptr = function(1);
> >     *(char*)ptr = 1;
> > }
> 
> Maybe file another bug which does the opposite transformation for the cases
> where memcpy happens after the calloc.  There is not enough information to
> know if the value is going to be <=15 most of the time or not so we just
> guess.

Can't we use this one?

> 
> Anyways there is no breaking of code.

OK, my bad.

>  If you don't want this transformation
> inside a function which is called calloc, then you need to use
> -fno-builtin-malloc to disable finding of the malloc call.

Shouldn't be -fno-builtin-calloc the flag to prevent this optimization? I don't want to disable malloc's builtin flavor everywhere else.
Comment 12 Marc Glisse 2015-09-18 05:44:47 UTC
(In reply to Daniel Gutson from comment #11)
> > > void caller(void)
> > > {
> > >     void* ptr = function(1);
> > >     *(char*)ptr = 1;
> > > }
> > 
> > Maybe file another bug which does the opposite transformation for the cases
> > where memcpy happens after the calloc.  There is not enough information to
> > know if the value is going to be <=15 most of the time or not so we just
> > guess.
> 
> Can't we use this one?

In cases where we have malloc, memset(0,,...) and then another write that overwrites the whole content, DSE (or some other pass) will already remove the memset(0) long before the transformation into calloc, which then will not happen. If you try to compile your example above, "caller" calls malloc, not calloc.
Comment 13 roc@ocallahan.org 2016-06-03 06:29:10 UTC
Here's one case where this optimization really is invalid: when you're compiling an implementation of calloc() :-(.
Comment 14 roc@ocallahan.org 2016-06-03 06:31:08 UTC
Argh, I see this was already mentioned, sorry for the noise.