Hi, Tested on GCC 4.8.1 ---------- #include <regex> using namespace std; int main (int argc, char *argv[]) { regex r(argv[1]); return 0; } ---------- # g++ c11RE.cpp -o c11RE -std=c++11 -Wall # ./c11RE '.*' # ./c11RE '(|' Segmentation fault (core dumped) # ./c11RE '((x|' *** Error in `./c11RE': malloc(): memory corruption: 0x00007fffa0cb8670 *** Expected (regex_error): # ./c11RE '(x}' terminate called after throwing an instance of 'std::regex_error' what(): regex_error Aborted (core dumped) ------------ (gdb) r '(|' The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/cx/c11RE '(|' Program received signal SIGSEGV, Segmentation fault. 0x00000000004022cc in std::__detail::_StateSeq::_StateSeq(std::__detail::_StateSeq const&) () (gdb) bt #0 0x00000000004022cc in std::__detail::_StateSeq::_StateSeq(std::__detail::_StateSeq const&) () #1 0x0000000000404a05 in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_M_disjunction() () #2 0x0000000000407901 in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_M_atom() () #3 0x00000000004069cb in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_M_term() () #4 0x000000000040567e in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_M_alternative() () #5 0x00000000004049c8 in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_M_disjunction() () #6 0x0000000000403ef2 in std::__detail::_Compiler<char const*, std::regex_traits<char> >::_Compiler(char const* const&, char const* const&, std::regex_traits<char>&, unsigned int) () #7 0x0000000000403297 in std::shared_ptr<std::__detail::_Automaton> std::__detail::__compile<char const*, std::regex_traits<char> >(char const* const&, char const* const&, std::regex_traits<char>&, unsigned int) () #8 0x0000000000402abb in std::basic_regex<char, std::regex_traits<char> >::basic_regex(char const*, unsigned int) () #9 0x0000000000401767 in main () (gdb) x/i $rip => 0x4022cc <_ZNSt8__detail9_StateSeqC2ERKS0_+16>: mov (%rax),%rdx (gdb) x/x $rax 0xffffffffffffffe8: Cannot access memory at address 0xffffffffffffffe8 (gdb) x/x $rdx 0xffffffffffffffe8: Cannot access memory at address 0xffffffffffffffe8 ------------ BR, Maksymilian http://cxsecurity.com/
*sigh* <regex> is not implemented prior to GCC 4.9.0, I thought the whole world was aware of that by now.
Sorry for mistake. Could you check this again ? cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -v Using built-in specs. COLLECT_GCC=/home/cx/gcc49/bin/g++ COLLECT_LTO_WRAPPER=/home/cx/gcc49/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /home/cx/gcc49/source/gcc-4.9.0/configure --disable-multilib --prefix=/home/cx/gcc49 Thread model: posix gcc version 4.9.0 (GCC) cx@cx:~/REstd11/kozak5$ cat c11re.c #include <iostream> #include <string> #include <regex> using namespace std; int main (int argc, char *argv[]) { if (std::regex_match ("GNUj", std::regex(argv[1]) )) std::cout << "ELO\n"; return 0; } cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -o c11re c11re.c -std=c++11 cx@cx:~/REstd11/kozak5$ ./c11re '((x|' terminate called after throwing an instance of 'std::regex_error' what(): regex_error Przerwane (core dumped) cx@cx:~/REstd11/kozak5$ ./c11re '((.*)()?*{100})' Naruszenie ochrony pamięci (core dumped) cx@cx:~/REstd11/kozak5$ (gdb) r '((.*)()?*{100})' Starting program: /home/cx/REstd11/kozak5/./c11re '((.*)()?*{100})' Program received signal SIGSEGV, Segmentation fault. 0x0000000000402f15 in std::_Bit_reference::operator bool() const () (gdb) x/i $rip => 0x402f15 <_ZNKSt14_Bit_referencecvbEv+15>: mov (%rax),%rdx (gdb) i r rax 0x200000000063a128 2305843009220223272 rbx 0xffffffffffffffff -1 rcx 0x200000000063a128 2305843009220223272 rdx 0x8000000000000000 -9223372036854775808 rsi 0x200000000063a128 2305843009220223272 rdi 0x7fffffffd350 140737488343888 rbp 0x7fffffffd310 0x7fffffffd310 rsp 0x7fffffffd310 0x7fffffffd310 r8 0x2 2 r9 0x20 32 r10 0x3 3 r11 0x7ffff75b5798 140737343346584 r12 0x402880 4204672 r13 0x7fffffffe260 140737488347744 r14 0x0 0 r15 0x0 0 => 0x402f15 <_ZNKSt14_Bit_referencecvbEv+15>: rip 0x402f15 0x402f15 <std::_Bit_reference::operator bool() const+15> ... #0 0x0000000000402f15 in std::_Bit_reference::operator bool() const () #1 0x000000000040a1bc in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #2 0x000000000040a275 in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #3 0x000000000040a493 in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #4 0x000000000040a28f in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #5 0x000000000040a3a5 in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #6 0x000000000040a3a5 in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits---Type <return> to continue, or q <return> to quit--- <char>, false>::_M_dfs<true>(long) () #7 0x000000000040a3a5 in void std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_dfs<true>(long) () #8 0x0000000000407ee0 in bool std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_main<true>() () #9 0x0000000000406172 in std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, false>::_M_match() () #10 0x0000000000404cf5 in bool std::__detail::__regex_algo_impl<char const*, std::allocator<std::sub_match<char const*> >, char, std::regex_traits<char>, (std::__detail::_RegexExecutorPolicy)0, true>(char const*, char const*, std::match_results<char const*, std::allocator<std::sub_match<char const*> > >&, std::basic_regex<char, std::regex_traits<char> > const&, std::regex_constants::match_flag_type) () #11 0x000000000040449e in bool std::regex_match<char const*, std::allocator<std::sub_match<char const*> >, char, std::regex_traits<c---Type <return> to continue, or q <return> to quit--- har> >(char const*, char const*, std::match_results<char const*, std::allocator<std::sub_match<char const*> > >&, std::basic_regex<char, std::regex_traits<char> > const&, std::regex_constants::match_flag_type) () #12 0x000000000040405c in bool std::regex_match<char const*, char, std::regex_traits<char> >(char const*, char const*, std::basic_regex<char, std::regex_traits<char> > const&, std::regex_constants::match_flag_type) () #13 0x0000000000403d4c in bool std::regex_match<char, std::regex_traits<char> >(char const*, std::basic_regex<char, std::regex_traits<char> > const&, std::regex_constants::match_flag_type) () #14 0x0000000000402a5f in main ()
(In reply to Maksymilian A from comment #2) > cx@cx:~/REstd11/kozak5$ ./c11re '((x|' > terminate called after throwing an instance of 'std::regex_error' > what(): regex_error > Przerwane (core dumped) I think this is by design. > cx@cx:~/REstd11/kozak5$ ./c11re '((.*)()?*{100})' > Naruszenie ochrony pamięci (core dumped) That's a bug. (It would be helpful if you didn't put C11 in the subject, this has nothing to do with C)
That segfault is already fixed on trunk, although possibly just latent
Thanks for feedback. I'm going verify this on trunk
@Jonathan: true but check this case cx@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ -v Using built-in specs. COLLECT_GCC=/home/cx/gccTRUNK/bin/g++ COLLECT_LTO_WRAPPER=/home/cx/gccTRUNK/libexec/gcc/x86_64-unknown-linux-gnu/4.10.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../trunk/configure --prefix=/home/cx/gccTRUNK/ --disable-multilib Thread model: posix gcc version 4.10.0 20140625 (experimental) (GCC) cx@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ c11re.c -o c11re -std=c++11 cx@cx:~/REtrunk/kozak5$ ./c11re '(.*{100}{100}{100})' Naruszenie ochrony pamięci (core dumped) Program received signal SIGSEGV, Segmentation fault. 0x000000000041014e in std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_State_info<std::integral_constant<bool, true>, std::vector<std::sub_match<char const*>, std::allocator<std::sub_match<char const*> > > >::_M_visited(long) const () BR, Maksymilian http://cxsecurity.com/
"(.*{100}{100}{100})" seems to be a stack overflow. It's because regex executor uses recursion. It could be fixed (not segfault but memory exhaustion) by using a std::stack and simulate recursion; IMH, however, directly throwing regex_error::error_space is the right thing here to do.
(In reply to Tim Shen from comment #7) > "(.*{100}{100}{100})" seems to be a stack overflow. It's because regex > executor uses recursion. It could be fixed (not segfault but memory > exhaustion) by using a std::stack and simulate recursion; IMH, however, > directly throwing regex_error::error_space is the right thing here to do. Yeap it's stack overflow. Why regex_error::error_space? Not better regex_error::error_stack?
(In reply to Maksymilian Arciemowicz from comment #8) > (In reply to Tim Shen from comment #7) > > "(.*{100}{100}{100})" seems to be a stack overflow. It's because regex > > executor uses recursion. It could be fixed (not segfault but memory > > exhaustion) by using a std::stack and simulate recursion; IMH, however, > > directly throwing regex_error::error_space is the right thing here to do. > > Yeap it's stack overflow. Why regex_error::error_space? Not better > regex_error::error_stack? Sorry for not clarify that: I prefer throwing error_space when constructing (complaining about too many states) instead of throwing error_stack when matching. To solve the latter problem, as I said, we can use a std::stack or something to avoid a stack overflow.
There is also one other alternative like this http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/regex/regcomp.c.diff?r1=1.29&r2=1.30&f=h
Author: timshen Date: Tue Jul 1 03:05:45 2014 New Revision: 212185 URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev Log: PR libstdc++/61061 PR libstdc++/61582 * include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add a NFA state limit. If it's exceeded, regex_constants::error_space will be throwed. * include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use map (which is sparse) instead of vector. This reduce n times clones' cost from O(n^2) to O(n). * include/std/regex: Add map dependency. * testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New testcase. Added: trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/regex_automaton.h trunk/libstdc++-v3/include/bits/regex_automaton.tcc trunk/libstdc++-v3/include/std/regex
Ups. Check this (.*{100}{300}) gcc version 4.10.0 20140701 (experimental) (GCC) -------- Starting program: /home/cx/REtrunk/kozak5/t3 '(.*{100}{300})' Program received signal SIGSEGV, Segmentation fault. 0x000000000040c22a in std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_M_dfs(std::__detail::_Executor<char const*, std::allocator<std::sub_match<char const*> >, std::regex_traits<char>, true>::_Match_mode, long) () --------
@Tim: do you need help?
(In reply to Maksymilian Arciemowicz from comment #13) > @Tim: do you need help? This is what I'm going to do: https://gcc.gnu.org/ml/libstdc++/2014-07/msg00008.html Please send to libstdc++ ml if you have any ideas.
*** Bug 66456 has been marked as a duplicate of this bug. ***
*** Bug 67212 has been marked as a duplicate of this bug. ***
*** Bug 68688 has been marked as a duplicate of this bug. ***
*** Bug 70411 has been marked as a duplicate of this bug. ***
*** Bug 70459 has been marked as a duplicate of this bug. ***
Any status update on this. GCC7 is looming... Thanks.
(In reply to Pádraig Brady from comment #20) > Any status update on this. GCC7 is looming... > Thanks. Unfortunately I haven't get a chance to work on this. I plan to put up a one-line tweak on the internal state limit to make the library throwing an exception, instead of crash. That's probably a strict improvement.
FWIW, there is an excellent overview of regular expression engine pitfalls and methods here: https://swtch.com/~rsc/regexp/regexp1.html https://swtch.com/~rsc/regexp/regexp2.html https://swtch.com/~rsc/regexp/regexp3.html Those are about 10 years old, but not outdated. The TL;DR is "use NFAs and DFAs, not back-tracking".
(In reply to M Welinder from comment #22) > FWIW, there is an excellent overview of regular expression engine pitfalls > and methods here: > > https://swtch.com/~rsc/regexp/regexp1.html > https://swtch.com/~rsc/regexp/regexp2.html > https://swtch.com/~rsc/regexp/regexp3.html Yes, there have been links to the first one in libstdc++ headers since 2013.
(In reply to Maksymilian Arciemowicz from comment #12) > Ups. Check this (.*{100}{300}) This one still results in a stack overflow on trunk, with an 8MB stack. That is: std::regex_match("a", std::regex("(.*{100}{300})")); I have a proof-of-concept patch replacing the recursion in _Executor. The example above runs successfully with a 16k stack limit.