Consider the following code: #include <regex> int main() { std::regex regex("x{2,}"); } If I compile and run it at Ubuntu 20.04 with clang++-10 -fsanitize=undefined -O2 -g a.cpp && valgrind ./a.out I get the following error: ==2367== Memcheck, a memory error detector ==2367== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==2367== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==2367== Command: ./a.out ==2367== ==2367== Conditional jump or move depends on uninitialised value(s) ==2367== at 0x45AC3C: std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone() (regex_automaton.tcc:208) ==2367== by 0x4341EA: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier() (regex_compiler.tcc:253) ==2367== by 0x432F67: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() (regex_compiler.tcc:143) ==2367== by 0x432B9A: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (regex_compiler.tcc:123) ==2367== by 0x427E00: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() (regex_compiler.tcc:99) ==2367== by 0x42747E: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) (regex_compiler.tcc:84) ==2367== by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const char *> (regex_compiler.h:183) ==2367== by 0x427149: std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763) ==2367== by 0x427025: basic_regex<const char *> (regex.h:507) ==2367== by 0x427025: basic_regex (regex.h:440) ==2367== by 0x427025: main (a.cpp:3) ==2367== ==2367== Conditional jump or move depends on uninitialised value(s) ==2367== at 0x45AC3C: std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone() (regex_automaton.tcc:208) ==2367== by 0x434218: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier() (regex_compiler.tcc:257) ==2367== by 0x432F67: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() (regex_compiler.tcc:143) ==2367== by 0x432B9A: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (regex_compiler.tcc:123) ==2367== by 0x427E00: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() (regex_compiler.tcc:99) ==2367== by 0x42747E: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) (regex_compiler.tcc:84) ==2367== by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const char *> (regex_compiler.h:183) ==2367== by 0x427149: std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763) ==2367== by 0x427025: basic_regex<const char *> (regex.h:507) ==2367== by 0x427025: basic_regex (regex.h:440) ==2367== by 0x427025: main (a.cpp:3) ==2367== ==2367== ==2367== HEAP SUMMARY: ==2367== in use at exit: 0 bytes in 0 blocks ==2367== total heap usage: 20 allocs, 20 frees, 76,776 bytes allocated ==2367== ==2367== All heap blocks were freed -- no leaks are possible ==2367== ==2367== Use --track-origins=yes to see where uninitialised values come from ==2367== For lists of detected and suppressed errors, rerun with: -s ==2367== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0) Any of the following actions remove the error: replacing clang++ with g++, disabling -fsanitize=undefined, disabling -O2, switching to -stdlib=libc++. Versions are: clang version 10.0.0-4ubuntu1 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin valgrind-3.15.0 libstdc++-10-dev/focal-updates,focal-security,now 10.2.0-5ubuntu1~20.04 amd64 [installed,automatic] A friend of mine suggested that it's probably caused by use-after-move of `__dup` in regex_automaton.tcc:206 (commit e45c41988bfd655b1df7cff8fcf111dc6fb732e3 at GitHub mirror) and vaguely suggested that maybe clang++ starts to implement some kind of destructive moves: auto __id = _M_nfa._M_insert_state(std::move(__dup)); __m[__u] = __id; if (__dup._M_has_alt())
(In reply to Egor Suvorov from comment #0) > suggested that maybe clang++ starts to implement some kind of destructive > moves: That would not be permitted by the C++ standard. I can't reproduce this with clang 10.0.1 and valgrind 3.16.1
I don't think the use-after-move is the problem, because only the members of the _State_base class are accessed after the move, and they are unaffected by the move. The move constructor does this: _State(_State&& __rhs) : _State_base(__rhs) { if (__rhs._M_opcode() == _S_opcode_match) new (this->_M_matcher_storage._M_addr()) _MatcherT(std::move(__rhs._M_get_matcher())); } So the _State_base::_M_opcode, _State_base::_M_alt and _State_base::_M_next members of __rhs are not modified, and can be safely accessed after the move.
I can't reproduce this error with any version of clang or gcc. No error is shown by ubsan, asan or valgrind. Please reopen if you can still reproduce this, and provide more details about versions of the tools.
Note that the issue requires both UBSanitizer and Valgrind enabled simultaneously. Running with just one of them does not result in an error. The versions are ones installed in Ubuntu 20.04.3 Focal Fossa (amd64) by running `sudo apt install clang-10 valgrind` (you can get VM image from https://www.osboxes.org/ubuntu-server/) https://launchpad.net/ubuntu/focal/amd64/clang-10/1:10.0.0-4ubuntu1 https://launchpad.net/ubuntu/focal/amd64/valgrind/1:3.15.0-1ubuntu9.1 Unfortunately, I was unable to get more verbose output from either clang or Valgrind. So it may be Ubuntu-specific issue. Actually, the problem is not reproducible with clang-12, so it may be already be fixed.
Yes, I used both ubsan and valgrind, and couldn't reproduce it with any version of GCC or clang I tried, including with the headers from GCC 10.2 I might try to reproduce it on Ubuntu, but if it's specific to Ubuntu then it should be reported to them, not here. Upstream GCC doesn't reproduce the problem.
I can reproduce it in a container using: FROM ubuntu:20.04 RUN apt-get update RUN echo 'tzdata tzdata/Areas select Europe' | debconf-set-selections RUN echo 'tzdata tzdata/Zones/Europe select London' | debconf-set-selections RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata RUN apt-get -y install build-essential clang-10 valgrind COPY test.cc /tmp/ RUN clang++-10 -fsanitize=undefined -O1 -g /tmp/test.cc -o /tmp/a.out && valgrind /tmp/a.out But it's not caused by the move (it still happens if I copy the state object instead of moving it). Initializing the _State_base::_M_match_storage buffer doesn't help. Nothing I changed in the code helped. I think it's a ubsan bug in clang 10 and clang 11.