std::regex crashes when matching long lines.
Here is an example:
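#include <iostream>
#include <regex>
#include <string>
int main () {
  std::string s (100'000, '*');                  // a single 100,000-character line
  std::regex r ("^(.*?)$");
  std::smatch m; std::regex_search (s, m, r);    // stack overflow here with libstdc++
  std::cout << s .substr (0, 10) << std::endl;
  std::cout << m .str (1) .substr (0, 10) << std::endl;
}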
It turns out that std::regex_search implements the .* operator recursively, which in this example results in a stack overflow.
*** Bug 86163 has been marked as a duplicate of this bug. ***
*** Bug 86165 has been marked as a duplicate of this bug. ***
BTW, this is unrelated to using grouping in the regex: searching for something as simple as "A.*B" also crashes for inputs longer than ~27 KiB on Linux amd64 with g++ 8.2.0. This makes std::regex simply unusable.
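For the specific "A.*B" case, assuming A and B are plain literal strings without line terminators, one possible workaround is to avoid std::regex altogether. The sketch below uses a hypothetical helper, `contains_a_then_b`, that only reports whether a match exists (not its extent). It mirrors the ECMAScript rule that '.' does not match '\n' or '\r', and it uses only std::string_view::find, so nothing recurses.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of any library): true if literal `a` is
// followed later on the same line by literal `b`, i.e. whether
// std::regex_search would find "A.*B" for literal A and B, given that
// ECMAScript '.' does not match '\n' or '\r'.
bool contains_a_then_b(std::string_view text, std::string_view a, std::string_view b) {
    std::size_t line_start = 0;
    for (;;) {
        // The current line ends at the next '\r' or '\n', or at end of text.
        std::size_t line_end = text.find_first_of("\r\n", line_start);
        if (line_end == std::string_view::npos) line_end = text.size();
        std::string_view line = text.substr(line_start, line_end - line_start);
        // The earliest `a` leaves the most room for `b`, so checking it suffices.
        std::size_t pa = line.find(a);
        if (pa != std::string_view::npos &&
            line.find(b, pa + a.size()) != std::string_view::npos)
            return true;
        if (line_end == text.size()) return false;
        line_start = line_end + 1;  // skip the line terminator
    }
}
```

Because it is a plain linear scan, input length is limited only by memory, not by the stack size.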
(In reply to Vadim Zeitlin from comment #3)
> This makes std::regex simply unusable.
Yes, because there are no uses with inputs below 27KiB.
I obviously meant that it makes it unusable in my use case, where I can't guarantee that the input is bounded by this (smallish) size.
I think I am hitting this issue somewhat earlier on an ARM system with a more limited stack size.
I was able to reproduce it on desktop x86_64 Linux with, e.g.:
$ ulimit -s 256   # 256 KiB stack, which is the default on the ARM system
$ g++ test.cpp -o regex_test
$ ./regex_test
Segmentation fault (core dumped)
It seems that the issue is the backtracking required by the NFA, which enters a deep recursion when _M_main_dispatch calls _M_dfs (regex_executor.tcc).
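To make the recursion concrete, here is a toy model. It is not libstdc++'s actual executor, only a hypothetical `ToyMatcher` that matches "^.*$" by exploring the NFA depth-first, making one recursive call per character consumed by ".*". Its `max_depth` counter shows the call depth growing linearly with the input length, which is what eventually exhausts the stack.

```cpp
#include <cstddef>
#include <string_view>

// Toy model (hypothetical, NOT libstdc++'s _M_dfs): a recursive backtracking
// matcher for "^.*$". Each character consumed by ".*" adds one stack frame.
struct ToyMatcher {
    std::string_view input;
    std::size_t max_depth = 0;  // deepest recursion reached so far

    // We are inside ".*" at input position `pos`, `depth` frames deep.
    bool dfs_star(std::size_t pos, std::size_t depth) {
        if (depth > max_depth) max_depth = depth;
        // Greedy branch first: consume one character that '.' accepts
        // (anything but '\n' or '\r', as in ECMAScript) and recurse.
        if (pos < input.size() && input[pos] != '\n' && input[pos] != '\r' &&
            dfs_star(pos + 1, depth + 1))
            return true;
        // Backtrack: leave the loop and try to match "$" here.
        return pos == input.size();
    }

    bool match() { return dfs_star(0, 1); }
};
```

For a 1,000-character input of '*', `max_depth` ends up at 1,001: one frame per character plus the initial call. With 100,000 characters the same pattern needs 100,001 frames.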
Moving the DFS state from the call stack to the heap and using an iterative DFS might fix this, but converting the NFA to a DFA may be a better choice, as it removes the need to backtrack while iterating over the string.
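The non-backtracking idea can be sketched by simulating the NFA on a set of states, advancing every live state by one character at a time. This is the idea behind Thompson's algorithm; a DFA built by subset construction precomputes these same state sets. The `State` layout, `kAny`, `nfa_search` and `make_a_dot_star_b` below are hypothetical toy names, and the sketch only answers "is there a match?" without tracking capture groups. Note that ECMAScript backreferences cannot be expressed as a DFA at all, so such an approach could only cover a subset of std::regex.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

constexpr int kAny = 256;  // label meaning ECMAScript '.'

// Toy NFA state: at most one labelled edge plus any number of epsilon edges.
struct State {
    int label = -1;         // character to consume, kAny, or -1 for none
    int next = -1;          // target of the labelled edge
    std::vector<int> eps;   // epsilon edges
};

// Add `s` and all states reachable through epsilon edges (iterative worklist).
void closure(const std::vector<State>& nfa, int s,
             std::vector<char>& in_set, std::vector<int>& set) {
    std::vector<int> work{s};
    while (!work.empty()) {
        int t = work.back();
        work.pop_back();
        if (in_set[t]) continue;
        in_set[t] = 1;
        set.push_back(t);
        for (int e : nfa[t].eps) work.push_back(e);
    }
}

// Unanchored search: true if some substring of `text` leads `start` to `accept`.
bool nfa_search(const std::vector<State>& nfa, int start, int accept,
                std::string_view text) {
    std::vector<int> cur, nxt;
    std::vector<char> in_cur(nfa.size()), in_nxt(nfa.size());
    closure(nfa, start, in_cur, cur);
    for (std::size_t i = 0;; ++i) {
        if (in_cur[accept]) return true;
        if (i == text.size()) return false;
        unsigned char c = text[i];
        nxt.clear();
        std::fill(in_nxt.begin(), in_nxt.end(), 0);
        for (int s : cur) {
            const State& st = nfa[s];
            bool ok = st.label == kAny ? (c != '\n' && c != '\r') : st.label == c;
            if (st.label != -1 && ok) closure(nfa, st.next, in_nxt, nxt);
        }
        closure(nfa, start, in_nxt, nxt);  // a match may also start at i + 1
        cur.swap(nxt);
        in_cur.swap(in_nxt);
    }
}

// Hand-built toy NFA for "A.*B"; state 4 is the accept state.
std::vector<State> make_a_dot_star_b() {
    std::vector<State> nfa(5);
    nfa[0].label = 'A'; nfa[0].next = 1;
    nfa[1].eps = {2, 3};                    // loop body or exit
    nfa[2].label = kAny; nfa[2].next = 1;   // '.' then back to the loop
    nfa[3].label = 'B'; nfa[3].next = 4;
    return nfa;
}
```

Work per character is bounded by the number of NFA states, and neither memory nor stack depth grows with the input length.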
I started working on a patch to replace the recursion with iteration, but didn't get it working yet.
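For comparison, the iterative-DFS approach can be sketched as follows. This is a hypothetical toy, not the patch mentioned above and not libstdc++ code. It keeps backtracking semantics but stores the pending (state, position) alternatives in a heap-allocated std::vector, so the search depth is bounded by available memory rather than by `ulimit -s`. It does not address the exponential running time that backtracking can have on pathological patterns, and it assumes the toy NFA has no epsilon-only cycles.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

constexpr int kAny = 256;  // label meaning ECMAScript '.'

// Toy NFA state: at most one labelled edge plus any number of epsilon edges.
struct State {
    int label = -1;        // character, kAny, or -1 for "no labelled edge"
    int next = -1;
    std::vector<int> eps;
};

// Anchored backtracking match of the whole `text` (like "^...$") using an
// explicit stack of (state, position) pairs instead of recursion.
bool backtrack_match(const std::vector<State>& nfa, int start, int accept,
                     std::string_view text) {
    std::vector<std::pair<int, std::size_t>> stack{{start, 0}};
    while (!stack.empty()) {
        auto [s, pos] = stack.back();  // copy, then pop
        stack.pop_back();
        if (s == accept && pos == text.size()) return true;
        const State& st = nfa[s];
        // Push epsilon alternatives in reverse so the first one is tried
        // first, the same priority order a recursive DFS would use.
        for (auto it = st.eps.rbegin(); it != st.eps.rend(); ++it)
            stack.push_back({*it, pos});
        if (st.label != -1 && pos < text.size()) {
            unsigned char c = text[pos];
            bool ok = st.label == kAny ? (c != '\n' && c != '\r') : st.label == c;
            if (ok) stack.push_back({st.next, pos + 1});
        }
    }
    return false;
}
```

With a toy NFA for "^.*$" (state 0 with epsilon edges {1, 2}, state 1 consuming '.' back to 0, state 2 accepting), a 100,000-character line is matched without deep recursion. The explicit stack still grows with the input, but on the heap.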