Actually, GCC doesn't have __builtin_wcslen, but Clang does. Providing these extra two builtins would allow implementing __builtin_wcslen too.

The names are not part of the C standard, but follow the current naming construction rules for it, similar to how "mbrtowc" and "wcslen" parallel.

My specific need is actually to implement char16_t string containers in C++. I'm particularly interested in QString/QStringView, but this applies to std::basic_string{_view} too. For example:

    std::string_view f1() { return "Hello"; }
    std::wstring_view fw() { return L"Hello"; }
    std::u16string_view f16() { return u"Hello"; }
    std::u32string_view f32() { return U"Hello"; }

With GCC and libstdc++, the first function produces optimal code:

    movl $5, %eax
    leaq .LC0(%rip), %rdx
    ret

For the wchar_t case, GCC emits an out-of-line call to wcslen:

    pushq %rbx
    leaq .LC2(%rip), %rbx
    movq %rbx, %rdi
    call wcslen@PLT
    movq %rbx, %rdx
    popq %rbx
    ret

The next two, because of the absence of a C library function, emit a loop:

    xorl %eax, %eax
    leaq .LC1(%rip), %rcx
    .L4:
    incq %rax
    cmpw $0, (%rcx,%rax,2)
    jne .L4
    movq %rcx, %rdx
    ret

Clang, meanwhile, emits optimal code for all four and so did the pre-Clang Intel compiler. See https://gcc.godbolt.org/z/qvj7qnYbz. MSVC emits optimal code for the char and wchar_t versions, but loops for the other two.

Clang gives up when the string gets longer, though. See https://gcc.godbolt.org/z/54j3zr6e6. That indicates that it gave up on guessing the loop run and would do better if the intrinsic were present.
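For reference, a minimal sketch of the computation the emitted loop performs for the char16_t case. The name sketch_c16slen is hypothetical and is used only for illustration; it is not an existing GCC builtin or a C library function. Making it constexpr (C++14 or later) shows that the length of a literal can be determined entirely at compile time, which is the result the optimal code above encodes as an immediate.

```cpp
#include <cstddef>

// Hypothetical helper, named for illustration only: counts char16_t code
// units up to (not including) the terminating null.
constexpr std::size_t sketch_c16slen(const char16_t *s) noexcept
{
    std::size_t n = 0;
    while (s[n] != u'\0')
        ++n;
    return n;
}

// Forced constant evaluation: for a literal, the length is a compile-time
// constant.
static_assert(sketch_c16slen(u"Hello") == 5,
              "length of a literal is known at compile time");
```

Whether a given compiler folds the same loop when it is not in a constant-evaluated context is up to its optimiser, which is the gap described above.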
GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen would generally just be a call to wcslen, which doesn't give you much. I assume what you want is to recognize wcslen and replace it with inline assembly code. Similarly, if libc doesn't provide c16slen then a __builtin_c16slen isn't going to do much. I think what you want is better code for finding char16_t(0) or char32_t(0), not a new built-in.
(In reply to Jonathan Wakely from comment #1)
> GCC built-ins like __builtin_strlen just wrap a libc function.
> __builtin_wcslen would generally just be a call to wcslen, which doesn't
> give you much.

But __builtin_strlen *does* get optimized when the input is a string literal. Not sure about wcslen though.
> But __builtin_strlen *does* get optimized when the input is a string
> literal. Not sure about wcslen though.

It appears not to, in the test above. std::char_traits<wchar_t>::length() calls wcslen() whereas the char specialisation uses __builtin_strlen() explicitly. But if the intrinsics are enabled, the two would be the same, wouldn't they?

Anyway, in the absence of a library function to call, inserting the loop is fine; it's what is there already. Though it would be nice to be able to provide such a function. I wrote one for Qt (it's called qustrlen). I would try with __builtin_constant_p first to see if the string is a literal.
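A rough sketch of the __builtin_constant_p idea, under stated assumptions: u16len and runtime_u16len are hypothetical names, and runtime_u16len merely stands in for an out-of-line routine in the role qustrlen plays in Qt (it is not Qt's implementation). __builtin_constant_p is a GCC/Clang extension, so the check is guarded; on other compilers the function always takes the out-of-line path.

```cpp
#include <cstddef>

// Hypothetical stand-in for an out-of-line (possibly vectorised) routine.
// A plain loop is used here to keep the sketch self-contained.
inline std::size_t runtime_u16len(const char16_t *s) noexcept
{
    std::size_t n = 0;
    while (s[n] != u'\0')
        ++n;
    return n;
}

// Hypothetical wrapper: prefer a simple inline loop when the compiler can
// see that the first code unit is a constant (likely a string literal).
inline std::size_t u16len(const char16_t *s) noexcept
{
#if defined(__GNUC__)
    if (__builtin_constant_p(*s)) {
        std::size_t n = 0;
        while (s[n] != u'\0')
            ++n;
        return n; // the optimiser may fold this loop; it is not guaranteed
    }
#endif
    return runtime_u16len(s);
}
```

Both paths return the same value, so the check only influences which code the compiler gets to optimise. It does not by itself guarantee that the loop is folded to a constant.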
(In reply to Xi Ruoyao from comment #2)
> But __builtin_strlen *does* get optimized when the input is a string
> literal.

But so does strlen, because GCC knows about it. That's my point.
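To illustrate the point about strlen, a small sketch, assuming GCC or Clang for the guarded line. The static_assert shows that __builtin_strlen on a literal is usable as a constant expression. hello_length is a hypothetical function name; with optimisation enabled, the compiler is free to treat the plain std::strlen call on a literal the same way, though the code does not depend on that.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__GNUC__)
// GCC and Clang evaluate __builtin_strlen on a literal at compile time.
static_assert(__builtin_strlen("Hello") == 5, "folded at compile time");
#endif

// Hypothetical example function: an ordinary strlen call on a literal,
// which the compiler may replace with the constant 5 when optimising.
inline std::size_t hello_length() noexcept
{
    return std::strlen("Hello");
}
```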