Bug 61601 - C++11 regex resource exhaustion
Summary: C++11 regex resource exhaustion
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: std::regex
  Show dependency treegraph
 
Reported: 2014-06-24 19:44 UTC by Maksymilian Arciemowicz
Modified: 2024-02-22 14:31 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-06-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Maksymilian Arciemowicz 2014-06-24 19:44:36 UTC
cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/cx/gcc49/bin/g++
COLLECT_LTO_WRAPPER=/home/cx/gcc49/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/cx/gcc49/source/gcc-4.9.0/configure --disable-multilib --prefix=/home/cx/gcc49
Thread model: posix
gcc version 4.9.0 (GCC) 
cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -o regr regr.cpp -std=c++11cx@cx:~/REstd11/kozak5$ cat ./regr.cpp
#include <regex>

using namespace std;

int main (int argc, char *argv[])
{	
      string input;
      regex r(argv[1]);
      
      return 0;

}


Memory Resource Exhaustion
cx@cx:~/REstd11/kozak5$ ./regr '((.*)$1{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100})'

expected: error_complexity

CPU Resource Exhaustion
cx@cx:~/REstd11/kozak5$ ./regr '((.*)((.*){10}(.*{2444444444,1})'

# ps -aux
cx       13836 99.9  5.8 666828 442248 pts/3   R+   cze23 577:46 ./regr ((.*)((.*){10}(.*{2444444444,1})

expected: error_space

BR,
Maksymilian Arciemowicz
Comment 1 Jonathan Wakely 2014-06-25 18:16:40 UTC
Tim, how hard would it be to hardcode limits somewhere for these cases?
Comment 2 Tim Shen 2014-06-26 06:36:52 UTC
(In reply to Jonathan Wakely from comment #1)
> Tim, how hard would it be to hardcode limits somewhere for these cases?

It's easy. 6 lines. I'll propose a patch soon.
Comment 3 Tim Shen 2014-07-01 03:19:49 UTC
Author: timshen
Date: Tue Jul  1 03:05:45 2014
New Revision: 212185

URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev
Log:
	PR libstdc++/61061
	PR libstdc++/61582
	* include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add
	a NFA state limit. If it's exceeded, regex_constants::error_space
	will be throwed.
	* include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use
	map (which is sparse) instead of vector. This reduce n times clones'
	cost from O(n^2) to O(n).
	* include/std/regex: Add map dependency.
	* testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New
	testcase.


Added:
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/regex_automaton.h
    trunk/libstdc++-v3/include/bits/regex_automaton.tcc
    trunk/libstdc++-v3/include/std/regex


This is copied by hand, because PR 61601 is mistyped to 61601. Now it's fixed in changelog.
Comment 4 Maksymilian Arciemowicz 2014-07-01 19:28:17 UTC
cpu exhaustion not eliminated

PoC: (.*{100}{200}findme)
Comment 5 Tim Shen 2015-02-01 08:31:33 UTC
> cpu exhaustion not eliminated
>
> PoC: (.*{100}{200}findme)

I tried to run:

#include <regex>

using namespace std;

int main (int argc, char *argv[])
{
  string input;
  regex r(argv[1]);

  return 0;

}

Against '(.*{100}{200}findme)' and there's no cpu exhaustion.

Do you have any other testcases?
Comment 6 Maksymilian Arciemowicz 2015-02-01 20:35:42 UTC
> Do you have any other testcases?

for trunk? maybe you have to use ::regex_match
Comment 7 Tim Shen 2015-02-02 00:37:20 UTC
(In reply to Maksymilian Arciemowicz from comment #6)
> > Do you have any other testcases?
> 
> for trunk? maybe you have to use ::regex_match

std::regex_match("findme", std::regex("(.*{100}{200}findme)"));

there's no memory problem, it just takes exponentially long time to run (which is expected when using backtracking).

To avoid it, you can use Thompson NFA:

#define _GLIBCXX_REGEX_USE_THOMPSON_NFA
#include <regex>

int main (int argc, char *argv[])
{
  std::regex_match("findme", std::regex("(.*{100}{200}findme)", std::regex_constants::extended));

  return 0;

}

Notice that for now Thompson NFA doesn't support ECMAScript.
Comment 8 Maksymilian Arciemowicz 2015-02-02 18:00:57 UTC
> there's no memory problem, it just takes exponentially long time to run
> (which is expected when using backtracking).

call it cpu resource exhaustion (CWE-400)

> 
> To avoid it, you can use Thompson NFA:
> 
> #define _GLIBCXX_REGEX_USE_THOMPSON_NFA
> #include <regex>
> 
> int main (int argc, char *argv[])
> {
>   std::regex_match("findme", std::regex("(.*{100}{200}findme)",
> std::regex_constants::extended));
> 
>   return 0;
> 
> }
> 
> Notice that for now Thompson NFA doesn't support ECMAScript.

yeap.

try (.*{300}{100}) for _GLIBCXX_REGEX_USE_THOMPSON_NFA. occurs stack exhaustion like in #61582
Comment 9 Eric Gallager 2019-10-03 04:15:30 UTC
(In reply to Tim Shen from comment #7)
> (In reply to Maksymilian Arciemowicz from comment #6)
> > > Do you have any other testcases?
> > 
> > for trunk? maybe you have to use ::regex_match
> 
> std::regex_match("findme", std::regex("(.*{100}{200}findme)"));
> 
> there's no memory problem, it just takes exponentially long time to run
> (which is expected when using backtracking).
> 
> To avoid it, you can use Thompson NFA:
> 
> #define _GLIBCXX_REGEX_USE_THOMPSON_NFA
> #include <regex>
> 
> int main (int argc, char *argv[])
> {
>   std::regex_match("findme", std::regex("(.*{100}{200}findme)",
> std::regex_constants::extended));
> 
>   return 0;
> 
> }
> 
> Notice that for now Thompson NFA doesn't support ECMAScript.

Are you still working on this?
Comment 10 Tim Shen 2019-10-25 21:47:35 UTC
(In reply to Eric Gallager from comment #9)
> (In reply to Tim Shen from comment #7)
> > (In reply to Maksymilian Arciemowicz from comment #6)
> > > > Do you have any other testcases?
> > > 
> > > for trunk? maybe you have to use ::regex_match
> > 
> > std::regex_match("findme", std::regex("(.*{100}{200}findme)"));
> > 
> > there's no memory problem, it just takes exponentially long time to run
> > (which is expected when using backtracking).
> > 
> > To avoid it, you can use Thompson NFA:
> > 
> > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA
> > #include <regex>
> > 
> > int main (int argc, char *argv[])
> > {
> >   std::regex_match("findme", std::regex("(.*{100}{200}findme)",
> > std::regex_constants::extended));
> > 
> >   return 0;
> > 
> > }
> > 
> > Notice that for now Thompson NFA doesn't support ECMAScript.
> 
> Are you still working on this?

No, I'm not actively working on this.
Comment 11 Eric Gallager 2021-05-04 16:35:57 UTC
(In reply to Tim Shen from comment #10)
> (In reply to Eric Gallager from comment #9)
> > (In reply to Tim Shen from comment #7)
> > > (In reply to Maksymilian Arciemowicz from comment #6)
> > > > > Do you have any other testcases?
> > > > 
> > > > for trunk? maybe you have to use ::regex_match
> > > 
> > > std::regex_match("findme", std::regex("(.*{100}{200}findme)"));
> > > 
> > > there's no memory problem, it just takes exponentially long time to run
> > > (which is expected when using backtracking).
> > > 
> > > To avoid it, you can use Thompson NFA:
> > > 
> > > #define _GLIBCXX_REGEX_USE_THOMPSON_NFA
> > > #include <regex>
> > > 
> > > int main (int argc, char *argv[])
> > > {
> > >   std::regex_match("findme", std::regex("(.*{100}{200}findme)",
> > > std::regex_constants::extended));
> > > 
> > >   return 0;
> > > 
> > > }
> > > 
> > > Notice that for now Thompson NFA doesn't support ECMAScript.
> > 
> > Are you still working on this?
> 
> No, I'm not actively working on this.

moving from assignee to cc list, then (and redoing a few other lost ccs while I'm at it)