Bug 86164 - std::regex crashes when matching long lines
Summary: std::regex crashes when matching long lines
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 7.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 86163 86165 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-06-15 10:48 UTC by Holger Seelig
Modified: 2019-03-29 20:24 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-01-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Holger Seelig 2018-06-15 10:48:15 UTC
std::regex crashes when matching long lines.

Here is an example:


#include <regex>
#include <iostream>
 
int main()
{
    std::string s (100'000, '*');
    std::smatch m;
    std::regex r ("^(.*?)$");

    std::regex_search (s, m, r);

    std::cout << s .substr (0, 10) << std::endl;
    std::cout << m  .str (1) .substr (0, 10) << std::endl;
}


It turns out that std::regex_search operator .* is implemented recursively which result in this example in a stack overflow.
Comment 1 ktkachov 2018-06-15 11:12:42 UTC
*** Bug 86163 has been marked as a duplicate of this bug. ***
Comment 2 ktkachov 2018-06-15 11:13:53 UTC
*** Bug 86165 has been marked as a duplicate of this bug. ***
Comment 3 Vadim Zeitlin 2018-11-13 23:46:30 UTC
BTW, this is unrelated to using grouping in the regex, searching for something as simple as "A.*B" also crashes for input longer than ~27KiB on Linux amd64 with g++ 8.2.0. This makes std::regex simply unusable.
Comment 4 Jonathan Wakely 2018-11-14 00:08:18 UTC
(In reply to Vadim Zeitlin from comment #3)
> This makes std::regex simply unusable.

Yes, because there are no uses with inputs below 27KiB.
Comment 5 Vadim Zeitlin 2018-11-14 00:12:49 UTC
I obviously meant that it makes it unusable in my use case when I can't guarantee that the input is bounded by this (smallish) size.
Comment 6 Nils Gladitz 2018-12-05 07:33:02 UTC
I think I am hitting this issue somewhat earlier on an ARM system with a more limited stack size.

Was able to reproduce it on Desktop x86_64 Linux with e.g.:

#include <regex>
  
int main()
{
    std::regex_match(
        std::string(2000, 'a'),
        std::regex(".*")
    );
}

$ ulimit -s 256 # 256kb stack; which is what have by default on the ARM system
$ g++ test.cpp -o regex_test
$ ./regex_test
Segmentation fault (core dumped)
Comment 7 Giuliano Belinassi 2019-03-29 12:48:29 UTC
It seems that the issue is the backtracking required by the NFA, as it enters in a deep recursion when calling _M_dfs in _M_main_dispatch (regex_executor.tcc).

Maybe moving the DFS stack from the recursion stack to the heap and use an iterative DFS could fix this, but converting the NFA to DFA may be a better choice, as it removes the backtracking requirement when iterating with the string.
Comment 8 Jonathan Wakely 2019-03-29 20:24:25 UTC
I started working on a patch to replace the recursion with iteration, but didn't get it working yet.