Bug 98677 - std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
Summary: std::regex constructor triggers valgrind under clang++ with undefined sanitiz...
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 10.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: std::regex
  Show dependency treegraph
 
Reported: 2021-01-14 12:04 UTC by Egor Suvorov
Modified: 2021-10-12 20:01 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Egor Suvorov 2021-01-14 12:04:06 UTC
Consider the following code:

#include <regex>
int main() {
    std::regex regex("x{2,}");
}

If I compile and run it at Ubuntu 20.04 with

clang++-10 -fsanitize=undefined -O2 -g a.cpp && valgrind ./a.out 

I get the following error:

==2367== Memcheck, a memory error detector
==2367== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2367== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2367== Command: ./a.out
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C: std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone() (regex_automaton.tcc:208)
==2367==    by 0x4341EA: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier() (regex_compiler.tcc:253)
==2367==    by 0x432F67: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() (regex_compiler.tcc:143)
==2367==    by 0x432B9A: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (regex_compiler.tcc:123)
==2367==    by 0x427E00: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() (regex_compiler.tcc:99)
==2367==    by 0x42747E: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C: std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone() (regex_automaton.tcc:208)
==2367==    by 0x434218: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier() (regex_compiler.tcc:257)
==2367==    by 0x432F67: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() (regex_compiler.tcc:143)
==2367==    by 0x432B9A: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (regex_compiler.tcc:123)
==2367==    by 0x427E00: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() (regex_compiler.tcc:99)
==2367==    by 0x42747E: std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== 
==2367== HEAP SUMMARY:
==2367==     in use at exit: 0 bytes in 0 blocks
==2367==   total heap usage: 20 allocs, 20 frees, 76,776 bytes allocated
==2367== 
==2367== All heap blocks were freed -- no leaks are possible
==2367== 
==2367== Use --track-origins=yes to see where uninitialised values come from
==2367== For lists of detected and suppressed errors, rerun with: -s
==2367== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

Any of the following actions remove the error: replacing clang++ with g++, disabling -fsanitize=undefined, disabling -O2, switching to -stdlib=libc++.

Versions are:

clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

valgrind-3.15.0

libstdc++-10-dev/focal-updates,focal-security,now 10.2.0-5ubuntu1~20.04 amd64 [installed,automatic]

A friend of mine suggested that it's probably caused by use-after-move of `__dup` in regex_automaton.tcc:206 (commit e45c41988bfd655b1df7cff8fcf111dc6fb732e3 at GitHub mirror) and vaguely suggested that maybe clang++ starts to implement some kind of destructive moves:

          auto __id = _M_nfa._M_insert_state(std::move(__dup));
          __m[__u] = __id;
          if (__dup._M_has_alt())
Comment 1 Jonathan Wakely 2021-01-14 14:17:41 UTC
(In reply to Egor Suvorov from comment #0)
> suggested that maybe clang++ starts to implement some kind of destructive
> moves:

That would not be permitted by the C++ standard.

I can't reproduce this with clang 10.0.1 and valgrind 3.16.1
Comment 2 Jonathan Wakely 2021-01-14 14:21:29 UTC
I don't think the use-after-move is the problem, because only the members of the _State_base class are accessed after the move, and they are unaffected by the move.

The move constructor does this:

      _State(_State&& __rhs) : _State_base(__rhs)
      {
	if (__rhs._M_opcode() == _S_opcode_match)
	  new (this->_M_matcher_storage._M_addr())
	    _MatcherT(std::move(__rhs._M_get_matcher()));
      }

So the _State_base::_M_opcode, _State_base::_M_alt and _State_base::_M_next members of __rhs are not modified, and can be safely accessed after the move.
Comment 3 Jonathan Wakely 2021-10-09 00:46:46 UTC
I can't reproduce this error with any version of clang or gcc. No error is shown by ubsan, asan or valgrind.

Please reopen if you can still reproduce this, and provide more details about versions of the tools.
Comment 4 Egor Suvorov 2021-10-09 12:46:30 UTC
Note that the issue requires both UBSanitizer and Valgrind enabled simultaneously. Running with just one of them does not result in an error.

The versions are ones installed in Ubuntu 20.04.3 Focal Fossa (amd64) by running `sudo apt install clang-10 valgrind` (you can get VM image from https://www.osboxes.org/ubuntu-server/)

https://launchpad.net/ubuntu/focal/amd64/clang-10/1:10.0.0-4ubuntu1
https://launchpad.net/ubuntu/focal/amd64/valgrind/1:3.15.0-1ubuntu9.1

Unfortunately, I was unable to get more verbose output from either clang or Valgrind. So it may be Ubuntu-specific issue.

Actually, the problem is not reproducible with clang-12, so it may be already be fixed.
Comment 5 Jonathan Wakely 2021-10-09 14:46:38 UTC
Yes, I used both ubsan and valgrind, and couldn't reproduce it with any version of GCC or clang I tried, including with the headers from GCC 10.2

I might try to reproduce it on Ubuntu, but if it's specific to Ubuntu then it should be reported to them, not here. Upstream GCC doesn't reproduce the problem.
Comment 6 Jonathan Wakely 2021-10-12 20:01:55 UTC
I can reproduce it in a container using:

FROM ubuntu:20.04
RUN apt-get update
RUN echo 'tzdata tzdata/Areas select Europe' | debconf-set-selections
RUN echo 'tzdata tzdata/Zones/Europe select London' | debconf-set-selections
RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
RUN apt-get -y install build-essential clang-10 valgrind
COPY test.cc /tmp/
RUN clang++-10 -fsanitize=undefined -O1 -g /tmp/test.cc -o /tmp/a.out && valgrind /tmp/a.out

But it's not caused by the move (it still happens if I copy the state object instead of moving it). Initializing the _State_base::_M_match_storage buffer doesn't help. Nothing I changed in the code helped.

I think it's a ubsan bug in clang 10 and clang 11.