Bug 114088 - Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslen
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: unknown
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-24 15:59 UTC by Thiago Macieira
Modified: 2024-04-08 21:06 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Thiago Macieira 2024-02-24 15:59:30 UTC
Actually, GCC doesn't have __builtin_wcslen, but Clang does. Providing these extra two builtins would allow implementing __builtin_wcslen too. The names are not part of the C standard, but follow the current naming construction rules for it, similar to how "mbrtowc" and "wcslen" parallel.

My specific need is actually to implement char16_t string containers in C++. I'm particularly interested in QString/QStringView, but this applies to std::basic_string{_view} too.

For example:

std::string_view f1() { return "Hello"; }
std::wstring_view fw() { return L"Hello"; }
std::u16string_view f16() { return u"Hello"; }
std::u32string_view f32() { return U"Hello"; }

With GCC and libstdc++, the first function produces optimal code:
        movl    $5, %eax
        leaq    .LC0(%rip), %rdx
        ret

For the wchar_t case, GCC emits an out-of-line call to wcslen:
        pushq   %rbx
        leaq    .LC2(%rip), %rbx
        movq    %rbx, %rdi
        call    wcslen@PLT
        movq    %rbx, %rdx
        popq    %rbx
        ret

The next two, because of the absence of a C library function, emit a loop:
        xorl    %eax, %eax
        leaq    .LC1(%rip), %rcx
.L4:
        incq    %rax
        cmpw    $0, (%rcx,%rax,2)
        jne     .L4
        movq    %rcx, %rdx
        ret

Clang, meanwhile, emits optimal code for all four, and so did the pre-Clang Intel compiler. See https://gcc.godbolt.org/z/qvj7qnYbz. MSVC emits optimal code for the char and wchar_t versions, but loops for the other two.

Clang gives up when the string gets longer, though. See https://gcc.godbolt.org/z/54j3zr6e6. That indicates that it gave up on guessing the loop run and would do better if the intrinsic were present.
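To illustrate why the length of a literal is knowable at compile time, here is a minimal sketch. The name c16slen_sketch is hypothetical; it is not an existing builtin or library function. It only shows that the loop can be evaluated by the compiler when constant evaluation is forced, and says nothing about what the optimizer does when it is not forced.

```cpp
#include <cstddef>

// Hypothetical helper, not an existing builtin or library function.
// A loop in a constexpr function requires C++14 or later.
constexpr std::size_t c16slen_sketch(const char16_t *s) noexcept
{
    std::size_t n = 0;
    while (s[n] != u'\0')   // stop at the terminating NUL code unit
        ++n;
    return n;
}

// static_assert forces constant evaluation, so the compiler computes the
// length of the literal itself instead of emitting a run-time loop.
static_assert(c16slen_sketch(u"Hello") == 5,
              "length of the literal is computed at compile time");
```

Outside a context that requires a constant expression, the same call is left to the optimizer, which is the case the assembly listings above show.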
Comment 1 Jonathan Wakely 2024-02-24 18:28:52 UTC
GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen would generally just be a call to wcslen, which doesn't give you much. I assume what you want is to recognize wcslen and replace it with inline assembly code.

Similarly, if libc doesn't provide c16slen then a __builtin_c16slen isn't going to do much.

I think what you want is better code for finding char16_t(0) or char32_t(0), not a new built-in.
Comment 2 Xi Ruoyao 2024-02-25 05:15:04 UTC
(In reply to Jonathan Wakely from comment #1)
> GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen would generally just be a call to wcslen, which doesn't give you much.

But __builtin_strlen *does* get optimized when the input is a string literal.  Not sure about wcslen though.
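A small check of that claim, assuming GCC or Clang (__builtin_strlen is a compiler extension, not standard C++). The variable name hello_len is made up for the example:

```cpp
#include <cstddef>

// __builtin_strlen is a GCC/Clang extension. With a string literal argument
// it is usable in a constant expression, so no call to strlen is emitted.
constexpr std::size_t hello_len = __builtin_strlen("Hello");

static_assert(hello_len == 5, "folded to a constant at compile time");
```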
Comment 3 Thiago Macieira 2024-02-25 06:22:10 UTC
> But __builtin_strlen *does* get optimized when the input is a string literal.  Not sure about wcslen though.

It appears not to, in the test above. std::char_traits<wchar_t>::length() calls wcslen() whereas the char specialisation uses __builtin_strlen() explicitly. But if the intrinsics are enabled, the two would be the same, wouldn't they?

Anyway, in the absence of a library function to call, inserting the loop is fine; it's what is there already.

Though it would be nice to be able to provide such a function. I wrote it for Qt (it's called qustrlen). I would try with __builtin_constant_p first to see if the string is a literal.
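A rough sketch of that idea, assuming GCC or Clang. All three function names are hypothetical, and ustrlen_runtime is only a placeholder for a tuned routine such as qustrlen, whose real implementation is not shown here. Whether __builtin_constant_p returns true depends on the optimization level and on the optimizer folding the loop, so it is not guaranteed even for literals; the result is the same either way.

```cpp
#include <cstddef>

// Hypothetical constexpr loop the optimizer may be able to fold.
constexpr std::size_t ustrlen_constexpr(const char16_t *s) noexcept
{
    std::size_t n = 0;
    while (s[n] != u'\0')
        ++n;
    return n;
}

// Placeholder for a tuned out-of-line routine (for example a SIMD one).
// The body here is just the plain loop so the sketch stays self-contained.
inline std::size_t ustrlen_runtime(const char16_t *s) noexcept
{
    return ustrlen_constexpr(s);
}

// Hypothetical wrapper: use the folded value when the compiler can prove
// one, otherwise fall back to the run-time routine.
inline std::size_t ustrlen_sketch(const char16_t *s) noexcept
{
    // __builtin_constant_p is a GCC/Clang extension; its argument is not
    // evaluated for side effects.
    if (__builtin_constant_p(ustrlen_constexpr(s)))
        return ustrlen_constexpr(s);   // expected to become an immediate
    return ustrlen_runtime(s);
}
```

Both branches return the same length, so the check only affects the generated code, never the result.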
Comment 4 Jonathan Wakely 2024-02-25 13:27:42 UTC
(In reply to Xi Ruoyao from comment #2)
> But __builtin_strlen *does* get optimized when the input is a string
> literal.

But so does strlen, because GCC knows about it. That's my point.