cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -v Using built-in specs. COLLECT_GCC=/home/cx/gcc49/bin/g++ COLLECT_LTO_WRAPPER=/home/cx/gcc49/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /home/cx/gcc49/source/gcc-4.9.0/configure --disable-multilib --prefix=/home/cx/gcc49 Thread model: posix gcc version 4.9.0 (GCC) cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -o regr regr.cpp -std=c++11cx@cx:~/REstd11/kozak5$ cat ./regr.cpp #include <regex> using namespace std; int main (int argc, char *argv[]) { string input; regex r(argv[1]); return 0; } Memory Resource Exhaustion cx@cx:~/REstd11/kozak5$ ./regr '((.*)$1{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100})' expected: error_complexity CPU Resource Exhaustion cx@cx:~/REstd11/kozak5$ ./regr '((.*)((.*){10}(.*{2444444444,1})' # ps -aux cx 13836 99.9 5.8 666828 442248 pts/3 R+ cze23 577:46 ./regr ((.*)((.*){10}(.*{2444444444,1}) expected: error_space BR, Maksymilian Arciemowicz
Tim, how hard would it be to hardcode limits somewhere for these cases?
(In reply to Jonathan Wakely from comment #1) > Tim, how hard would it be to hardcode limits somewhere for these cases? It's easy. 6 lines. I'll propose a patch soon.
Author: timshen Date: Tue Jul 1 03:05:45 2014 New Revision: 212185 URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev Log: PR libstdc++/61061 PR libstdc++/61582 * include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add a NFA state limit. If it's exceeded, regex_constants::error_space will be throwed. * include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use map (which is sparse) instead of vector. This reduce n times clones' cost from O(n^2) to O(n). * include/std/regex: Add map dependency. * testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New testcase. Added: trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/regex_automaton.h trunk/libstdc++-v3/include/bits/regex_automaton.tcc trunk/libstdc++-v3/include/std/regex This is copied by hand, because PR 61601 is mistyped to 61601. Now it's fixed in changelog.
cpu exhaustion not eliminated PoC: (.*{100}{200}findme)
> cpu exhaustion not eliminated > > PoC: (.*{100}{200}findme) I tried to run: #include <regex> using namespace std; int main (int argc, char *argv[]) { string input; regex r(argv[1]); return 0; } Against '(.*{100}{200}findme)' and there's no cpu exhaustion. Do you have any other testcases?
> Do you have any other testcases? for trunk? maybe you have to use ::regex_match
(In reply to Maksymilian Arciemowicz from comment #6) > > Do you have any other testcases? > > for trunk? maybe you have to use ::regex_match std::regex_match("findme", std::regex("(.*{100}{200}findme)")); there's no memory problem, it just takes exponentially long time to run (which is expected when using backtracking). To avoid it, you can use Thompson NFA: #define _GLIBCXX_REGEX_USE_THOMPSON_NFA #include <regex> int main (int argc, char *argv[]) { std::regex_match("findme", std::regex("(.*{100}{200}findme)", std::regex_constants::extended)); return 0; } Notice that for now Thompson NFA doesn't support ECMAScript.
> there's no memory problem, it just takes exponentially long time to run > (which is expected when using backtracking). call it cpu resource exhaustion (CWE-400) > > To avoid it, you can use Thompson NFA: > > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA > #include <regex> > > int main (int argc, char *argv[]) > { > std::regex_match("findme", std::regex("(.*{100}{200}findme)", > std::regex_constants::extended)); > > return 0; > > } > > Notice that for now Thompson NFA doesn't support ECMAScript. yeap. try (.*{300}{100}) for _GLIBCXX_REGEX_USE_THOMPSON_NFA. occurs stack exhaustion like in #61582
(In reply to Tim Shen from comment #7) > (In reply to Maksymilian Arciemowicz from comment #6) > > > Do you have any other testcases? > > > > for trunk? maybe you have to use ::regex_match > > std::regex_match("findme", std::regex("(.*{100}{200}findme)")); > > there's no memory problem, it just takes exponentially long time to run > (which is expected when using backtracking). > > To avoid it, you can use Thompson NFA: > > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA > #include <regex> > > int main (int argc, char *argv[]) > { > std::regex_match("findme", std::regex("(.*{100}{200}findme)", > std::regex_constants::extended)); > > return 0; > > } > > Notice that for now Thompson NFA doesn't support ECMAScript. Are you still working on this?
(In reply to Eric Gallager from comment #9) > (In reply to Tim Shen from comment #7) > > (In reply to Maksymilian Arciemowicz from comment #6) > > > > Do you have any other testcases? > > > > > > for trunk? maybe you have to use ::regex_match > > > > std::regex_match("findme", std::regex("(.*{100}{200}findme)")); > > > > there's no memory problem, it just takes exponentially long time to run > > (which is expected when using backtracking). > > > > To avoid it, you can use Thompson NFA: > > > > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA > > #include <regex> > > > > int main (int argc, char *argv[]) > > { > > std::regex_match("findme", std::regex("(.*{100}{200}findme)", > > std::regex_constants::extended)); > > > > return 0; > > > > } > > > > Notice that for now Thompson NFA doesn't support ECMAScript. > > Are you still working on this? No, I'm not actively working on this.
(In reply to Tim Shen from comment #10) > (In reply to Eric Gallager from comment #9) > > (In reply to Tim Shen from comment #7) > > > (In reply to Maksymilian Arciemowicz from comment #6) > > > > > Do you have any other testcases? > > > > > > > > for trunk? maybe you have to use ::regex_match > > > > > > std::regex_match("findme", std::regex("(.*{100}{200}findme)")); > > > > > > there's no memory problem, it just takes exponentially long time to run > > > (which is expected when using backtracking). > > > > > > To avoid it, you can use Thompson NFA: > > > > > > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA > > > #include <regex> > > > > > > int main (int argc, char *argv[]) > > > { > > > std::regex_match("findme", std::regex("(.*{100}{200}findme)", > > > std::regex_constants::extended)); > > > > > > return 0; > > > > > > } > > > > > > Notice that for now Thompson NFA doesn't support ECMAScript. > > > > Are you still working on this? > > No, I'm not actively working on this. moving from assignee to cc list, then (and redoing a few other lost ccs while I'm at it)