Bug 19831 - Missing DSE/malloc/free optimization
Summary: Missing DSE/malloc/free optimization
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 enhancement
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: missed-optimization
: 53082 (view as bug list)
Depends on:
Blocks: 88118
  Show dependency treegraph
 
Reported: 2005-02-08 21:35 UTC by Andrew Pinski
Modified: 2024-04-08 21:49 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-06-26 15:12:33


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2005-02-08 21:35:14 UTC
The following function really should be compiled to an empty function (DSE should first remove the 
store and then free of a malloc with no change inbetween and we should remove both calls).
void *malloc(__SIZE_TYPE__);
void free(void*);
int f(void)
{
  char *i = malloc(1);
  *i = 1;
  free (i);
}

This is something which is useful when we want to much higer level optimizations.
Comment 1 Daniel Berlin 2005-02-09 03:42:10 UTC
Subject: Re:  New: Missing DSE/malloc/free
	optimization

> void *malloc(__SIZE_TYPE__);
> void free(void*);
> int f(void)
> {
>   char *i = malloc(1);
>   *i = 1;
>   free (i);
> }
> 
> This is something which is useful when we want to much higer level optimizations.

This would require an attribute free, which is on the tree-profiling
branch

Otherwise, you wouldn't know that free is really "free"

Comment 2 Richard Biener 2009-06-26 15:12:33 UTC
I have a patch for the DSE part.

The malloc part shouldn't be too difficult.

The free part is more interesting.  Basically the pointer use in free (p)
should mark the malloc necessary but really nothing is there to mark the
free necessary if it is (the free is necessary iff the malloc is necessary but
nothing provides this link to provide easy integration with the DCE worklist
implementation).

Maybe a two-stage implementation would work where we replace an unnecessary
malloc () with a NULL assignment (so the free becomes a no-op and is
eventually folded away after CCP).  A little bit hackish - and dependent
on free (NULL) (if it happens to persist) to not fault.
Comment 3 Richard Biener 2009-06-26 15:23:28 UTC
Hm, it's not _that_ simple.  A fab hack is easier I guess.
Comment 4 Richard Biener 2009-06-26 15:39:52 UTC
And we want to optimize

void *malloc(__SIZE_TYPE__);
void free(void*);
void abort(void);
int f(void)
{
  char *i = malloc(1);
  if (i == (void *)0)
    abort ();
  *i = 1;
  free (i);
}

the same (removing the check and the abort() call).

Which asks for doing the task all at once.  Somewhere.
Comment 5 Richard Biener 2009-07-01 12:27:47 UTC
Subject: Bug 19831

Author: rguenth
Date: Wed Jul  1 12:27:33 2009
New Revision: 149140

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149140
Log:
2009-07-01  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/19831
	* tree-ssa-dce.c (propagate_necessity): Calls to functions
	that only act as barriers do not make any previous stores
	necessary.
	* tree-ssa-structalias.c (handle_lhs_call): Delay making
	HEAP variables global, do not add a constraint from nonlocal.
	(find_func_aliases): Handle escapes through return statements.
	(compute_points_to_sets): Make escaped HEAP variables global.

	* gcc.dg/tree-ssa/20041122-1.c: Enable TBAA, scan FRE dump,
	make allocated memory escape.  Un-XFAIL.
	* gcc.dg/vect/pr21591.c: Make allocated memory escape.
	* gcc.dg/vect/pr31699.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dce-7.c: New testcase.

	libmudflap/
	* testsuite/libmudflap.c/fail11-frag.c: Make allocated memory
	escape.
	* testsuite/libmudflap.c/fail12-frag.c: Likewise.
	* testsuite/libmudflap.c/fail16-frag.c: Likewise.
	* testsuite/libmudflap.c/fail31-frag.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/20041122-1.c
    trunk/gcc/testsuite/gcc.dg/vect/pr21591.c
    trunk/gcc/testsuite/gcc.dg/vect/pr31699.c
    trunk/gcc/tree-ssa-dce.c
    trunk/gcc/tree-ssa-structalias.c
    trunk/libmudflap/ChangeLog
    trunk/libmudflap/testsuite/libmudflap.c/fail11-frag.c
    trunk/libmudflap/testsuite/libmudflap.c/fail12-frag.c
    trunk/libmudflap/testsuite/libmudflap.c/fail16-frag.c
    trunk/libmudflap/testsuite/libmudflap.c/fail31-frag.c

Comment 6 Richard Biener 2009-07-01 12:28:57 UTC
I am not working on the other piece.
Comment 7 Richard Biener 2011-09-07 14:11:29 UTC
I'll give it another try somewhen.  Don't mark the free as necessary, nor
its operand.  If a malloc ends up not being marked as necessary
and all uses of the lhs are free calls remove the pairs, otherwise do nothing
in eliminate_unnecessary_stmts.
Comment 8 Richard Biener 2011-09-07 15:04:17 UTC
Hm, and for invalid code like

int main()
{
  int *p = __builtin_malloc (sizeof (int) * 4);
  *p++ = 4;
  *p++ = 8;
  __builtin_free (p);
  return 0;
}

we shouldn't ICE either ... but here the idea wouldn't work (malloc is
not marked necessary but free doesn't match up with it either).
Comment 9 Richard Biener 2011-09-08 09:21:47 UTC
Author: rguenth
Date: Thu Sep  8 09:21:39 2011
New Revision: 178683

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178683
Log:
2011-09-08  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/19831
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Do not mark
	allocation functions as necessary.

	* gcc.dg/tree-ssa/ssa-dce-8.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-8.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-dce.c
Comment 10 Richard Biener 2011-09-08 13:00:32 UTC
Author: rguenth
Date: Thu Sep  8 13:00:23 2011
New Revision: 178687

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178687
Log:
2011-09-08  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/19831
	* tree-ssa-dce.c (mark_all_reaching_defs_necessary_1): Also
	skip builtins with vdefs that do not really store something.
	(propagate_necessity): For calls to free that we can associate
	with an allocation function do not mark the freed pointer
	definition necessary.
	(eliminate_unnecessary_stmts): Remove a call to free if
	the associated call to an allocation function is not necessary.

	* gcc.dg/tree-ssa/pr19831-1.c: New testcase.
	* gcc.dg/tree-ssa/pr19831-2.c: Likewise.
	* gcc.dg/tree-ssa/pr19831-3.c: Likewise.
	* gcc.dg/errno-1.c: Adjust.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19831-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19831-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19831-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/errno-1.c
    trunk/gcc/tree-ssa-dce.c
Comment 11 Richard Biener 2011-09-08 13:05:26 UTC
More cases handled now.  What we still miss is conditional free, see
gcc.dg/tree-ssa/pr19831-3.c for XFAILed testcases.
Comment 12 xiaoyuanbo 2012-02-22 12:41:56 UTC
but you need some
Comment 13 Andrew Pinski 2012-04-23 06:51:34 UTC
*** Bug 53082 has been marked as a duplicate of this bug. ***
Comment 14 Marc Glisse 2013-06-01 13:57:17 UTC
Discussing PR 53082 here since it was closed as a dup. It asks for 2 things:

1) Replacing a=malloc(n);b=malloc(m); ... free(a);free(b); with a single malloc-free pair: a=malloc(n+m);b=a+n; ... free(a);.

2) Replacing some malloc calls with alloca.

I am wondering about the circumstances where those things are safe to do. The second one can only be safe if the size is a constant or VRP tells us it is at most a small constant. And then for both, nothing too weird must happen in between, but I am not sure what bad things are possible exactly. The "easy" case would be with the 2 malloc calls consecutive (what can't we put in between?), the 2 free calls consecutive, everything in a single basic block (this sounds way too restrictive), the argument of free an SSA_NAME defined with the malloc call, nothing that may throw (or any other way we can avoid coming back to execute free) in "...". Then I have a hard time finding bad cases (except that on error, malloc returns 0, and we put 0+n in b, maybe it requires a -fmalloc-cannot-fail). Are there other places in the compiler with a similar set of checks?

And then we would also need to extend everything to new/delete :-(
(unless maybe we install an LTO version of libstdc++ and, with -fwhole-program (taken as a promise not to LD_PRELOAD another new/delete), gcc accepts to inline new/delete and sees malloc and free, but that seems a long shot...)
Comment 15 Marc Glisse 2013-10-10 20:49:39 UTC
See the discussion that starts at:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130923/188747.html

(it continues in the following weeks) for optimization 2 of comment #14 in clang. The key idea is that if, in the function where malloc is called, we can check that on all paths (including EH) to the exit of the function free is called on the return value of malloc, then we can replace malloc and all the corresponding free with a stack allocation (whether we want to, depending on the size, is a different issue). Of course there are some details like: if we test whether malloc returned 0, that's one branch where we don't need free, etc.

Also, the optimization can make a crash happen much earlier:
p=malloc(n);fun(p);free(p);
where fun calls free, would normally fail in the second free (double-free) and will instead fail in fun (free of a pointer to the stack). I think that's fine, but people may be surprised if there are a lot of side-effects (I/O for instance) between the 2 free.

The hardest part here is deciding whether we always reach a free (this is how we prove that there is no hidden call to realloc/free). There could be an infinite loop in between. Any function we call might itself have an infinite loop or call exit. And worst of all, there might be a call to longjmp. That seems to make it impossible to perform the optimization when there is a function call, in all but the most contrived cases (a call to setjmp between malloc and the function call, or in C++ if the EH landing pad for the call contains a non-inlined call to a non-trivial destructor).

So to make this possible, the first thing to do seems to forbid longjmp: -fno-longjmp.

But even then it is still hard to know if regular functions always return (infinite loops, exit). There are *LOOPING_CONST_OR_PURE* flags in gcc (and a hardcoded list of builtins in call_may_noreturn_p), we could generalize them to a may_not_return / always_returns flag (and associated attribute), that gcc could recursively infer on many functions (a similar no_longjmp flag could also be added to avoid the need for -fno-longjmp). I think this is the most promising direction.

Assuming that we never do the things that would break the optimization (call realloc or free in a function that won't return) seems too complicated to explain properly, so I wouldn't even propose a -fI-promise-it-is-ok.
Comment 16 Marc Glisse 2013-10-16 15:12:48 UTC
Some more transformations for the list:

p = malloc (n);
memcpy (p, q, m);
free (q);

==>

p = realloc (q, n);

it isn't equivalent, in particular it could be slower if m is much smaller than what q points to, but I think it should generally be safe and profitable. Doing it without the memcpy is more questionable.

---

And in case we could actually see where q was coming from above:

q = malloc (n);
...
p = realloc (q, n); // --> just p = q;

Maybe I shouldn't list more without implementing at least one first...

---

A detail I had forgotten for the heap to stack (malloc+free -> alloca) transformation is that the stack deallocation (which we clearly want, especially if the allocation is in a loop) can't be inserted exactly where free was. Otherwise, for:

p = malloc (m);
q = malloc (n);
free (p);
free (q); // deallocation not in reverse order!

bad things could happen. Besides, there could already be VLAs in the function, and even without doing stack deallocation on free there are already strange interactions if we call alloca from inside the scope of a VLA.
Comment 17 Marc Glisse 2013-10-25 21:32:50 UTC
void f (double * __restrict a) {
  int * __restrict p = (int*) __builtin_malloc (sizeof (int));
  *p = 42;
  __builtin_free (p);
  ++*a; // Breaks the optimization!
}

Normally, gcc now manages to remove unused malloc+write+free sequences. However, it seems that completely unrelated operations, even after free, can prevent this optimization (it is the detection that *p=42 is dead that fails). That limits the effect in real-world code...

(funny that if I remove the line with free(p), gcc does manage to optimize)
Comment 18 rguenther@suse.de 2013-10-29 12:14:08 UTC
On Fri, 25 Oct 2013, glisse at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19831
> 
> --- Comment #17 from Marc Glisse <glisse at gcc dot gnu.org> ---
> void f (double * __restrict a) {
>   int * __restrict p = (int*) __builtin_malloc (sizeof (int));
>   *p = 42;
>   __builtin_free (p);
>   ++*a; // Breaks the optimization!
> }
> 
> Normally, gcc now manages to remove unused malloc+write+free sequences.
> However, it seems that completely unrelated operations, even after free, can
> prevent this optimization (it is the detection that *p=42 is dead that fails).
> That limits the effect in real-world code...
> 
> (funny that if I remove the line with free(p), gcc does manage to optimize)

That's just DCE doing more than it is designed to do in some very
rare corner cases.

Look why DSE doesn't do its job instead.
Comment 19 Marc Glisse 2013-10-29 13:19:09 UTC
Author: glisse
Date: Tue Oct 29 13:19:08 2013
New Revision: 204160

URL: http://gcc.gnu.org/viewcvs?rev=204160&root=gcc&view=rev
Log:
2013-10-29  Marc Glisse  <marc.glisse@inria.fr>

	PR tree-optimization/19831
gcc/
	* tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE.

gcc/testsuite/
	* gcc.dg/tree-ssa/alias-25.c: New file.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-25.c   (with props)
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-alias.c

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-25.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/alias-25.c
            ('svn:keywords' added)
Comment 20 Richard Biener 2021-03-25 14:07:45 UTC
The original cases are all fixed but what remains is us failing to elide

void f ()
{
  void *p = __builtin_malloc (1);
  if (!p)
    __builtin_abort ();
  __builtin_free (p);
}

if that's even desirable.  Note that we mark the abort() call as necessary
since it has side-effects which is contrary to this expectation.  If you
think of, say,

extern int alloc_fails;
extern int alloc_success;
void f ()
{
  void *p = __builtin_malloc (1);
  if (!p)
    alloc_fails++;
  else
    alloc_success++;
  __builtin_free (p);
}

then the question is really whether we may assume that malloc does not
return NULL.  We can then avoid marking 'p' necessary on NULL checks
on malloc returned pointers and upon DCEing the malloc/free pair
replace the def of 'p' with a non-zero constant.

Related would be to play similar things with realloc - replace

 void *p = realloc (q, n);
 free (p);

with

 free (q);

and

  void *p = realloc (q, n);
  if (p == q)
    not_copied++;
 free (p);

with

  void *p = q;
  if (p == q)
    not_copied++;
  free(q);

so here we'd not only have to deal with != NULL but other pointer equality
checks.
Comment 21 koss.dallas 2021-06-21 17:50:16 UTC Comment hidden (spam)