[Bug c++/98401] coroutines: Temporaries passed to co_await sometimes cause an extraneous call to destructor at incorrect address

Tue Dec 28 04:04:07 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98401

--- Comment #8 from Milo Brandt <brandt.milo2 at gmail dot com> ---
I was trying to fix this and, from my work, I have a precise diagnosis of the
bug and a hack that *mostly* fixes things, but leaves a more subtle bug
unfixed. Debugging this is getting over my head, but the descriptions below
should suffice to describe fairly precisely how the bug operates and what
behavior needs to be changed to patch it - perhaps someone more capable with
this codebase will find this helpful. Apologies for the length of this comment,
but the issue is very particular.

The bugs reported here arise from the following behavior of the code that
morphs functions to coroutines: when a temporary is created by aggregate
initialization and has its lifetime extended across a suspension point, the
members of the temporary are constructed out of place, then effectively
memcpy'd into the slots for the members of the constructed aggregate. After the
suspend point, both the temporary itself and the members that were constructed
out of place are destroyed.

This is incorrect behavior. The expected behavior is that the temporary gets
its lifetime extended, but that its members are initialized in place. Only the
temporary itself should be cleaned up. The current behavior can lead to serious
problems in generated code - for instance, if the aggregate initialized struct
had a unique_ptr as a member, the current behavior will delete that pointer
twice, almost certainly causing correct code to crash at runtime when compiled
with gcc.

As a particular example, if we had "struct Pair { X x; Y y; };" and wished to
write a line of code such as "Pair{}, co_await /*awaitable*/;", the existing
code allocates a slot in the coroutine frame for values of type Pair, X, and Y.
It initializes the slots for X and Y via constructor calls, then copies the
memory from these slots to the members of the Pair slot, then call destructors
for each of the three promoted temporaries.

This behavior is implemented in gcc/cp/coroutines.cc and principally involves
the functions maybe_promote_temps and flatten_await_stmt, which are responsible
for finding temporaries in expressions involving co_await and promoting them to
values in the coroutine's frame. At present, to flatten a sub-expression such
as Pair{} appearing in a co_await statement, it will detect three expressions
of interest: one representing the Pair itself, then two as sub-nodes of the
CONSTRUCT node for that Pair, corresponding to the two members. These are
identified by find_interesting_subtree, which calls tmp_target_expr_p, which
selects TARGET_EXPRs that are artificial (from the compiler) and which are not
associated to a named variable. All three TARGET_EXPRs meet these criteria, and
hence get promoted, even though only the outermost one actually *should* be
promoted.

This can largely be fixed by adding the following lines to tmp_target_expr_p: 
+  if (TREE_CODE (TREE_OPERAND (t, 1)) == AGGR_INIT_EXPR)
+    return false;
Adding these lines doesn't harm any existing tests and resolves the problem
reported here - they ensure that a temporary that is aggregate initialized, but
has its lifetime extended over a suspension point will have its members
constructed in place and not have their destructors called spuriously. This is
kind of a hack, but it does fix the problem reported in this thread and doesn't
seem to create any new problems.

However, this patch fails to fix a more subtle bug that is intimately tied to
the one it fixes: if there is a suspension point *during* aggregate
initialization, the initializers for the aggregate will be executed in the
wrong order even though the standard guarantees their execution order will be
from first to last. For instance, code such as "Pair{.x{}, .y=(co_await
std::suspend_always{}, 1)};" acts incorrectly by first calling the awaiter's
await_ready and await_suspend functions, then, upon resumption, constructing
the .x member via X::X(), *then* calling the awaiter's await_resume,
constructing .y, and calling Pair::~Pair(). I believe that the only correct
behavior for this line is to construct .x via X::X(), then call await_ready and
await_suspend, then, upon resumption, calling await_resume immediately, then
constructing the .y member and calling Pair::~Pair() to clean up (the only
difference being that X::X() must be called before await_ready). If the
coroutine is destroyed instead, we should destroy the .x member via X::~X()
(or, similarly, we must call X::~X() if the call to Y::Y(int) throws an
exception after resumption). An example that prints out the ordering of these
functions is here: https://godbolt.org/z/dz1818naf (though note that none of
gcc, clang nor MSVC correctly handle this code).

I believe that the correct way to patch this problem is as follows:
1. The function flatten_await_stmt must look specifically for CONSTRUCTOR nodes
in the tree and, if any initializer other than the first contains a co_await
statement, this CONSTRUCTOR node must somehow be rewritten into a series of
constructions of each individual element (e.g. replacing the above construction
by something similar to "new (&p->x) X{}, new (&p->y) Y{co_await
std::suspend_always{}, 1}, ..." where p is a Pair* pointing to the allocation
of the Pair value in the coroutine's frame - except with extra complexity to
handle exceptions during construction).
2. Whenever a transformation as in (1) occurs, the destroyer must be amended to
delete the members of a partially constructed aggregate type. The destroyer
should not destroy the aggregate itself until it has finished construction.
3. Any AGGR_INIT_EXPRs appearing as the initializer for a TARGET_EXPR in a
CONSTRUCTOR (which I believe is the only place they *can* appear in a coroutine
body? - if they can appear elsewhere, those situations should also be examined
for similar bugs) should not be promoted except through the special procedure
from (1).
I believe this strategy would fix the bad behavior identified here, but it
looks quite difficult to actually implement - at least without a strong
knowledge of gcc internals that I simply do not have.

I've attached two testcases that I would propose as minimal examples of the
incorrect behavior; the first deals merely with the problem of extending the
lifetime of aggregates. The second deals with aggregate construction that is
interrupted by a suspension point. The two line patch proposed above would fix
the first case, but not the second. The outline for a more proper patch would
fix both at once (and make the two line patch redundant).