Bug 101786 - P1143R2 constinit implementation is incomplete (joining with thread_local)
Summary: P1143R2 constinit implementation is incomplete (joining with thread_local)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 10.2.0
: P3 minor
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-08-05 03:47 UTC by Ilya Kurdyukov
Modified: 2021-08-11 20:02 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-05 00:00:00


Attachments
gcc12-pr101786.patch (834 bytes, patch)
2021-08-05 14:19 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ilya Kurdyukov 2021-08-05 03:47:41 UTC
The paper says:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1143r2.html

> constinit can also be useful to compilers for non-initializing declarations
> of thread_local variables:
> 
> extern thread_local constinit x;
> int f() { return x; }
> 
> Without constinit, runtime code must be executed to perform a check of a
> guard variable and conditionally initialize x each time it is used. (Other
> techniques exist, but this approach is common.) If the variable is known to
> have constant initialization, this can be avoided.

Let's fix the missing type for x and try:

extern thread_local constinit int x;
int f() { return x; }

In case of compilation, GCC does not remove the TLS wrapper function as it should according to this paper:

_ZTW1x:
        push    rbp
        mov     rbp, rsp
        mov     eax, OFFSET FLAT:_ZTH1x
        test    rax, rax
        je      .L2
        call    _ZTH1x
.L2:
        mov     rdx, QWORD PTR fs:0
        mov     rax, QWORD PTR x@gottpoff[rip]
        add     rax, rdx
        pop     rbp
        ret
_Z1fv:
        push    rbp
        mov     rbp, rsp
        call    _ZTW1x
        mov     eax, DWORD PTR [rax]
        pop     rbp
        ret

The code it should produce should look like this: 

_Z1fv:
        push    rbp
        mov     rbp, rsp
        mov     rax, QWORD PTR x@gottpoff[rip]
        mov     eax, DWORD PTR fs:[rax]
        pop     rbp
        ret

What I can get now is only by replacing "thread_local constinit" with "__thread".

Clang implements this feature.
Comment 1 Andrew Pinski 2021-08-05 03:52:23 UTC
_ZTH1x must have been declared as weak.
Comment 2 Andrew Pinski 2021-08-05 03:54:20 UTC
Confirmed.  We most likely could get rid of the check in this case.  It is a missed optimization only really.
Comment 3 Jakub Jelinek 2021-08-05 14:19:20 UTC
Created attachment 51266 [details]
gcc12-pr101786.patch

Untested fix.
Comment 4 Jakub Jelinek 2021-08-05 14:24:28 UTC
IMHO clang++ implements it incorrectly, if you compile the testcase I've added in the patch with clang, then it will properly use _ZTH for the 4th variable (as it has non-trivial dtor, that dtor needs to be registered and that counts as dynamic initialization for the purposes of TLS wrappers), but for the case of incomplete type it doesn't, even when it can't know whether the definition of the var will have trivial or non-trivial dtor.
Comment 5 GCC Commits 2021-08-11 20:01:34 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ee8f9ff00d79998274c967ad0c23692be9dd3ada

commit r12-2861-gee8f9ff00d79998274c967ad0c23692be9dd3ada
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Aug 11 22:00:29 2021 +0200

    c++: Optimize constinit thread_local vars [PR101786]
    
    The paper that introduced constinit mentioned in rationale that constinit
    can be used on externs as well and that it can be used to avoid the
    thread_local initialization wrappers, because the standard requires that
    if constinit is present on any declaration, it is also present on the
    initialization declaration, even if it is in some other TU etc.
    
    There is a small problem though, we use the tls wrappers not just if
    the thread_local variable needs dynamic initialization, but also when
    it has static initialization, but non-trivial destructor, as the
    "dynamic initialization" in that case needs to register the destructor.
    
    So, the following patch optimizes constinit thread_local vars only
    if we can prove they will not have non-trivial destructors.  That includes
    the case where we have incomplete type where we don't know and need to
    conservatively assume the type will have non-trivial destructor at the
    initializing declaration side.
    
    2021-08-11  Jakub Jelinek  <jakub@redhat.com>
    
            PR c++/101786
            * decl2.c (var_defined_without_dynamic_init): Return true for
            DECL_DECLARED_CONSTINIT_P with complete type and trivial destructor.
    
            * g++.dg/cpp2a/constinit16.C: New test.
Comment 6 Jakub Jelinek 2021-08-11 20:02:17 UTC
Fixed for 12+.