57723 – Missed optimization: recursion around empty function

Bug 57723 - Missed optimization: recursion around empty function

Summary: Missed optimization: recursion around empty function

Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.9.0

Importance:	P3 minor
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2013-06-26 10:43 UTC by petschy
Modified:	2021-08-16 23:47 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
test case source (207 bytes, text/plain) 2013-06-26 10:44 UTC, petschy	Details
gcc amd64 disassembly (2.36 KB, text/plain) 2013-06-26 10:45 UTC, petschy	Details
clang amd64 disassembly (412 bytes, text/plain) 2013-06-26 10:45 UTC, petschy	Details
fixed test case (correct recursive traversal) (214 bytes, text/plain) 2013-06-26 11:13 UTC, petschy	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description petschy 2013-06-26 10:43:58 UTC

Background: freeing nodes of a tree allocated with custom allocators. One of the allocators can't free individual pointers, so free() is NOP in that case (the whole pool will be freed at once when the allocator is destroyed). With this allocator, the whole recursive traversal can be eliminated in theory.

Examining the disasm of the generated code revealed that gcc unfolds the recursion many levels, just to do the unneeded node traversal; the actual call to the empty free() fn is eliminated.

In the test case, loop() does a simple linear traversal of the linked nodes. The pointers are not volatile, and are only read, so there should not be any side effects. Why can't the compiler optimize away the whole loop?

Clang does a somewhat better job, the recursion is optimized away, and one function is completely reduced to NOP (free_all2()), but the others still have the node traversal loop.

Tried with gcc 4.6, 4.7.3, 4.9.0 with the same results.

g++-4.9.0 -v:
Using built-in specs.
COLLECT_GCC=g++-4.9.0
COLLECT_LTO_WRAPPER=/home/usr-local/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --enable-languages=c,c++ --program-suffix=-4.9.0
Thread model: posix
gcc version 4.9.0 20130626 (experimental) (GCC) 
commit 944f42fc29289812416f34d7b0c497ee79065396

command line:
g++-4.9.0 -std=c++11 -O3 -Wall 20130626-free_node.cpp

Regards, Peter

Comment 1 petschy 2013-06-26 10:44:45 UTC

Created attachment 30365 [details]
test case source

Comment 2 petschy 2013-06-26 10:45:22 UTC

Created attachment 30366 [details]
gcc amd64 disassembly

Comment 3 petschy 2013-06-26 10:45:47 UTC

Created attachment 30367 [details]
clang amd64 disassembly

Comment 4 petschy 2013-06-26 11:12:25 UTC

Ooops, the test case won't perform the freeing completely, since the recursive call is not inside the 'down' traversal loop, so only the first node on the given level would be recursively freed, but this doesn't affect the missed optimization.

Comment 5 petschy 2013-06-26 11:13:10 UTC

Created attachment 30368 [details]
fixed test case (correct recursive traversal)

Comment 6 Michael Matz 2013-06-26 13:44:34 UTC

The loop in loop() isn't removed because it's potentially infinite, and GCC
doesn't remove infinite loops by default.  Add -funsafe-loop-optimizations
to do that (loop() will then become an empty function).

The recursion isn't removed because all calls to non-const non-pure functions
are deemed necessary.  dead code removal could be made to handle this with
some trickery.  We'd need to not mark recursive calls as inherently necessary
at first, but only later if we mark anything in the function except the return
statement necessary.

Comment 7 petschy 2013-06-27 08:52:02 UTC

Is it a plausible assumption that if a function is not marked as 'noreturn' and the loop doesn't change the program's state then the loop could be optimized away?

Cases:
- the loop terminates, but the state is not changed, NOP
- the loop does not terminate (in this case a cycle of the Node's), but the function should return (no noreturn attr), so this is probably a bug in the prg

I can't think of any cases right now for the second point where that would be the desired behaviour of the program, instead of a bug. Please comment on this.

Comment 8 Michael Matz 2013-06-27 12:38:33 UTC

(In reply to petschy from comment #7)
> Is it a plausible assumption that if a function is not marked as 'noreturn'
> and the loop doesn't change the program's state then the loop could be
> optimized away?

It's not unplausible, but still wrong, see below.  GCC optimizes empty
loops only away if the functions containing them are marked pure or const (the
argument being that an infinite loop is in itself a side-effect observable from
outside).  This could be extended to also do that for functions without the
noreturn attribute.

But it would be a relatively large change in semantics.  That is because
noreturn attributes are optional; functions that don't return aren't
required to be so marked, so in practice a missing noreturn attribute
doesn't mean anything.  So, when GCC starts to remove infinite loops in functions missing it (in practice most functions) this might change behaviour
(which you already can have with the unsafe-loop-optimization flag).

The problem lies with multi-threaded programs.  A function containing
an infinite loop could be stopped from a different thread.  While I'll
happily admit that this is a quite unlikely behaviour, you can construct
a valid program that will break with infinite-loop removal.

So it seems that if any other attribute should trigger this in addition
to const/pure (which have other effects of course), it would have to
be a new one, with positive meaning, ala "this-function-does-return".

I'm not aware of anyone sufficiently motivated to work on this.  Perhaps
you are, if so, look for finite_loop_p() in tree-ssa-loop-niter.c for
the place to lookup the new attribute, and builtin-attrs.def for the place
to define a new attribute.

> Cases:
> - the loop terminates, but the state is not changed, NOP
> - the loop does not terminate (in this case a cycle of the Node's), but the
> function should return (no noreturn attr), so this is probably a bug in the
> prg
> 
> I can't think of any cases right now for the second point where that would
> be the desired behaviour of the program, instead of a bug. Please comment on
> this.

See the multi-threading example.

Comment 9 Mikael Pettersson 2013-06-27 13:33:50 UTC

(In reply to Michael Matz from comment #8)
> (the
> argument being that an infinite loop is in itself a side-effect observable
> from
> outside).

Exactly.

> A function containing
> an infinite loop could be stopped from a different thread.

We have production code which does that, except the stopping (and redirection to another context) is done from a separate monitor process via ptrace().

GCC "optimizing" away infinite loops is just plain wrong, at least for ordinary systems programming languages.

Comment 10 petschy 2013-06-27 20:30:52 UTC

Thanks for the explanation. The multithreaded argument is sound, but then, on second thought, even in single threaded programs the state might be altered by a signal handler, or another process if the memory is shared, so the optimization might break the program. The bottom line is that the compiler doesn't have enough information, so it must be conservative, hence the loop stays in.

Adding a new fn attribute probably wouldn't be enough, since in general there might be more than one potentially infinite loop inside the fn, with different semantics, so optimizing all of them away still could be improper. Hence the attribute should apply to a given loop only ('finite'), but implementing it is probably too much trouble for this rare case, and the compiler still needs to eliminate the recursion, too, which might be more complex, eg multiple functions calling each other in the general case.

For my specific case, I can solve the problem by providing a trait for the allocator which says 'free() is NOP, so don't bother', then the top level function can decide what to do, traverse & free or do nothing.

Mikael: could you please provide some info on the ptrace() wizardy you mentioned, if it's not confidental? I got curious.

Based on the discussion so far, do you think that clang is overly smart in this case, producing potentially broken code? free_all2() was compiled into a single ret, and the other two functions lack the recursion, only have the node traversal of the current level, which seems to be an error to me, because if there is an infinite loop on one of the levels, the program's behaviour will be different when compiled with optimizations. If I set n_->down to null before the recursive call, it generated the expected code, which is interesting, since then the loop is more likely on the 'finite side'.

Thanks, Peter