Background: freeing nodes of a tree allocated with custom allocators. One of the allocators can't free individual pointers, so free() is NOP in that case (the whole pool will be freed at once when the allocator is destroyed). With this allocator, the whole recursive traversal can be eliminated in theory. Examining the disasm of the generated code revealed that gcc unfolds the recursion many levels, just to do the unneeded node traversal; the actual call to the empty free() fn is eliminated. In the test case, loop() does a simple linear traversal of the linked nodes. The pointers are not volatile, and are only read, so there should not be any side effects. Why can't the compiler optimize away the whole loop? Clang does a somewhat better job, the recursion is optimized away, and one function is completely reduced to NOP (free_all2()), but the others still have the node traversal loop. Tried with gcc 4.6, 4.7.3, 4.9.0 with the same results. g++-4.9.0 -v: Using built-in specs. COLLECT_GCC=g++-4.9.0 COLLECT_LTO_WRAPPER=/home/usr-local/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ./configure --enable-languages=c,c++ --program-suffix=-4.9.0 Thread model: posix gcc version 4.9.0 20130626 (experimental) (GCC) commit 944f42fc29289812416f34d7b0c497ee79065396 command line: g++-4.9.0 -std=c++11 -O3 -Wall 20130626-free_node.cpp Regards, Peter
Created attachment 30365 [details] test case source
Created attachment 30366 [details] gcc amd64 disassembly
Created attachment 30367 [details] clang amd64 disassembly
Ooops, the test case won't perform the freeing completely, since the recursive call is not inside the 'down' traversal loop, so only the first node on the given level would be recursively freed, but this doesn't affect the missed optimization.
Created attachment 30368 [details] fixed test case (correct recursive traversal)
The loop in loop() isn't removed because it's potentially infinite, and GCC doesn't remove infinite loops by default. Add -funsafe-loop-optimizations to do that (loop() will then become an empty function). The recursion isn't removed because all calls to non-const non-pure functions are deemed necessary. dead code removal could be made to handle this with some trickery. We'd need to not mark recursive calls as inherently necessary at first, but only later if we mark anything in the function except the return statement necessary.
Is it a plausible assumption that if a function is not marked as 'noreturn' and the loop doesn't change the program's state then the loop could be optimized away? Cases: - the loop terminates, but the state is not changed, NOP - the loop does not terminate (in this case a cycle of the Node's), but the function should return (no noreturn attr), so this is probably a bug in the prg I can't think of any cases right now for the second point where that would be the desired behaviour of the program, instead of a bug. Please comment on this.
(In reply to petschy from comment #7) > Is it a plausible assumption that if a function is not marked as 'noreturn' > and the loop doesn't change the program's state then the loop could be > optimized away? It's not unplausible, but still wrong, see below. GCC optimizes empty loops only away if the functions containing them are marked pure or const (the argument being that an infinite loop is in itself a side-effect observable from outside). This could be extended to also do that for functions without the noreturn attribute. But it would be a relatively large change in semantics. That is because noreturn attributes are optional; functions that don't return aren't required to be so marked, so in practice a missing noreturn attribute doesn't mean anything. So, when GCC starts to remove infinite loops in functions missing it (in practice most functions) this might change behaviour (which you already can have with the unsafe-loop-optimization flag). The problem lies with multi-threaded programs. A function containing an infinite loop could be stopped from a different thread. While I'll happily admit that this is a quite unlikely behaviour, you can construct a valid program that will break with infinite-loop removal. So it seems that if any other attribute should trigger this in addition to const/pure (which have other effects of course), it would have to be a new one, with positive meaning, ala "this-function-does-return". I'm not aware of anyone sufficiently motivated to work on this. Perhaps you are, if so, look for finite_loop_p() in tree-ssa-loop-niter.c for the place to lookup the new attribute, and builtin-attrs.def for the place to define a new attribute. > Cases: > - the loop terminates, but the state is not changed, NOP > - the loop does not terminate (in this case a cycle of the Node's), but the > function should return (no noreturn attr), so this is probably a bug in the > prg > > I can't think of any cases right now for the second point where that would > be the desired behaviour of the program, instead of a bug. Please comment on > this. See the multi-threading example.
(In reply to Michael Matz from comment #8) > (the > argument being that an infinite loop is in itself a side-effect observable > from > outside). Exactly. > A function containing > an infinite loop could be stopped from a different thread. We have production code which does that, except the stopping (and redirection to another context) is done from a separate monitor process via ptrace(). GCC "optimizing" away infinite loops is just plain wrong, at least for ordinary systems programming languages.
Thanks for the explanation. The multithreaded argument is sound, but then, on second thought, even in single threaded programs the state might be altered by a signal handler, or another process if the memory is shared, so the optimization might break the program. The bottom line is that the compiler doesn't have enough information, so it must be conservative, hence the loop stays in. Adding a new fn attribute probably wouldn't be enough, since in general there might be more than one potentially infinite loop inside the fn, with different semantics, so optimizing all of them away still could be improper. Hence the attribute should apply to a given loop only ('finite'), but implementing it is probably too much trouble for this rare case, and the compiler still needs to eliminate the recursion, too, which might be more complex, eg multiple functions calling each other in the general case. For my specific case, I can solve the problem by providing a trait for the allocator which says 'free() is NOP, so don't bother', then the top level function can decide what to do, traverse & free or do nothing. Mikael: could you please provide some info on the ptrace() wizardy you mentioned, if it's not confidental? I got curious. Based on the discussion so far, do you think that clang is overly smart in this case, producing potentially broken code? free_all2() was compiled into a single ret, and the other two functions lack the recursion, only have the node traversal of the current level, which seems to be an error to me, because if there is an infinite loop on one of the levels, the program's behaviour will be different when compiled with optimizations. If I set n_->down to null before the recursive call, it generated the expected code, which is interesting, since then the loop is more likely on the 'finite side'. Thanks, Peter