Given the following code: #include <string_view> size_t findFirstE_slow(std::string_view sv) { return sv.find_first_of("eE"); } size_t findFirstE_fast(std::string_view sv) { auto it{sv.begin()}; for (; it != sv.end() && *it != 'e' && *it != 'E'; ++it) ; return it == sv.end() ? std::string_view::npos : size_t(it - sv.begin()); } x86-64 gcc (trunk) -std=c++20 -O3 generates the following assembly: .LC0: .string "eE" findFirstE_slow(std::basic_string_view<char, std::char_traits<char> >): push r12 push rbp push rbx test rdi, rdi je .L4 mov rbx, rdi mov rbp, rsi xor r12d, r12d jmp .L3 .L8: add r12, 1 cmp rbx, r12 je .L4 .L3: movsx esi, BYTE PTR [rbp+0+r12] mov edx, 2 mov edi, OFFSET FLAT:.LC0 call memchr test rax, rax je .L8 mov rax, r12 pop rbx pop rbp pop r12 ret .L4: mov r12, -1 pop rbx pop rbp mov rax, r12 pop r12 ret findFirstE_fast(std::basic_string_view<char, std::char_traits<char> >): add rdi, rsi cmp rdi, rsi je .L13 mov rax, rsi jmp .L12 .L15: add rax, 1 cmp rdi, rax je .L13 .L12: movzx edx, BYTE PTR [rax] and edx, -33 cmp dl, 69 jne .L15 sub rax, rsi ret .L13: mov rax, -1 ret Even though both findFirstE_slow() and findFirstE_fast() are logically equivalent, findFirstE_slow() calls memchr() for every character in the input string. findFirstE_fast() does what we would reasonably expect, comparing every character in the input string with 'e' and 'E'. In practice, when the set of characters to be found is small, findFirstE_slow() runs noticeably slower than findFirstE_fast(). This seems to be a missed optimization in char_traits<char>::find(), which affects several find-related methods in string_view. Relevant StackOverflow question with quick-bench output, Compiler Explorer output, and more discussion: https://stackoverflow.com/questions/70433152/missed-optimization-with-string-viewfind-first-of
Really the issue is: _8 = __builtin_memchr ("eE", _7, 2); if (_8 != 0B) Should just be expanded to _7 == 'e' || _7 == 'E' . That is: int f(char a) { return __builtin_memchr ("e", a, 1) != 0; } int f1(char a) { return a == 'e'; } int g(char a) { return __builtin_memchr ("eE", a, 2) != 0; } int g1(char a) { return a == 'e' || a == 'E'; } f and f1 should produce the same assembly and g and g1 should produce the same (similar) assembly.
Created attachment 53154 [details] A patch Like this?
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:c6cf555a88f8ffb8ec85c2b94fe656ab217260ea commit r13-1699-gc6cf555a88f8ffb8ec85c2b94fe656ab217260ea Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Jun 17 07:33:06 2022 -0700 Simplify memchr with small constant strings When memchr is applied on a constant string of no more than the bytes of a word, simplify memchr by checking each byte in the constant string. int f (int a) { return __builtin_memchr ("AE", a, 2) != 0; } is simplified to int f (int a) { return ((char) a == 'A' || (char) a == 'E') != 0; } gcc/ PR tree-optimization/103798 * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h". (simplify_builtin_call): Inline memchr with constant strings of no more than the bytes of a word. * tree-ssa-strlen.cc (use_in_zero_equality): Make it global. * tree-ssa-strlen.h (use_in_zero_equality): New. gcc/testsuite/ PR tree-optimization/103798 * c-c++-common/pr103798-1.c: New test. * c-c++-common/pr103798-2.c: Likewise. * c-c++-common/pr103798-3.c: Likewise. * c-c++-common/pr103798-4.c: Likewise. * c-c++-common/pr103798-5.c: Likewise. * c-c++-common/pr103798-6.c: Likewise. * c-c++-common/pr103798-7.c: Likewise. * c-c++-common/pr103798-8.c: Likewise. * c-c++-common/pr103798-9.c: Likewise. * c-c++-common/pr103798-10.c: Likewise.
I just started looking through the automated builds logs and noticed that three configurations (`.../configure --enable-werror-always --enable-languages=all --disable-gcov --disable-shared --disable-threads --target=rl78-elf --without-headers`, and with --target=avr-elf as well as with --target=gcc-pru-elf) break during `make V=1 all-gcc`: [...] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -W cast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o tree-ssa-forwprop.o -MT tree-ssa-forwprop.o -MMD -MP -MF ./.deps/tree-ssa-forwprop.TPo ../../gcc/gcc/tree-ssa-forwprop.cc ../../gcc/gcc/tree-ssa-forwprop.cc: In function 'bool simplify_builtin_call(gimple_stmt_iterator*, tree)': ../../gcc/gcc/tree-ssa-forwprop.cc:1258:42: error: array subscript 1 is outside array bounds of 'tree_node* [1]' [-Werror=array-bounds] 1258 | op[i - 1] = fold_convert_loc (loc, boolean_type_node, | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ 1259 | fold_build2_loc (loc, | ~~~~~~~~~~~~~~~~~~~~~ 1260 | BIT_IOR_EXPR, | ~~~~~~~~~~~~~ 1261 | boolean_type_node, | ~~~~~~~~~~~~~~~~~~ 1262 | op[i - 1], | ~~~~~~~~~~ 1263 | op[i])); | ~~~~~~~ In file included from ../../gcc/gcc/system.h:707, from ../../gcc/gcc/tree-ssa-forwprop.cc:21: ../../gcc/gcc/../include/libiberty.h:733:36: note: at offset 8 into object of size [0, 8] allocated by '__builtin_alloca' 733 | # define alloca(x) __builtin_alloca(x) | ~~~~~~~~~~~~~~~~^~~ ../../gcc/gcc/../include/libiberty.h:365:40: note: in expansion of macro 'alloca' 365 | #define XALLOCAVEC(T, N) ((T *) alloca (sizeof (T) * (N))) | ^~~~~~ ../../gcc/gcc/tree-ssa-forwprop.cc:1250:22: note: in expansion of macro 'XALLOCAVEC' 1250 | tree *op = XALLOCAVEC (tree, isize); | ^~~~~~~~~~ cc1plus: all warnings being treated as errors make[1]: *** [Makefile:1146: tree-ssa-forwprop.o] Error 1 This also happens for --target=avr-elf and --target=gcc-pru-elf . Thank, Jan-Benedict