User account creation filtered due to spam.

Bug 61601 - C++11 regex resource exhaustion
Summary: C++11 regex resource exhaustion
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Tim Shen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-24 19:44 UTC by Maksymilian Arciemowicz
Modified: 2015-03-09 06:53 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-06-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Maksymilian Arciemowicz 2014-06-24 19:44:36 UTC
cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/cx/gcc49/bin/g++
COLLECT_LTO_WRAPPER=/home/cx/gcc49/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/cx/gcc49/source/gcc-4.9.0/configure --disable-multilib --prefix=/home/cx/gcc49
Thread model: posix
gcc version 4.9.0 (GCC) 
cx@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -o regr regr.cpp -std=c++11cx@cx:~/REstd11/kozak5$ cat ./regr.cpp
#include <regex>

using namespace std;

int main (int argc, char *argv[])
{	
      string input;
      regex r(argv[1]);
      
      return 0;

}


Memory Resource Exhaustion
cx@cx:~/REstd11/kozak5$ ./regr '((.*)$1{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100})'

expected: error_complexity

CPU Resource Exhaustion
cx@cx:~/REstd11/kozak5$ ./regr '((.*)((.*){10}(.*{2444444444,1})'

# ps -aux
cx       13836 99.9  5.8 666828 442248 pts/3   R+   cze23 577:46 ./regr ((.*)((.*){10}(.*{2444444444,1})

expected: error_space

BR,
Maksymilian Arciemowicz
Comment 1 Jonathan Wakely 2014-06-25 18:16:40 UTC
Tim, how hard would it be to hardcode limits somewhere for these cases?
Comment 2 Tim Shen 2014-06-26 06:36:52 UTC
(In reply to Jonathan Wakely from comment #1)
> Tim, how hard would it be to hardcode limits somewhere for these cases?

It's easy. 6 lines. I'll propose a patch soon.
Comment 3 Tim Shen 2014-07-01 03:19:49 UTC
Author: timshen
Date: Tue Jul  1 03:05:45 2014
New Revision: 212185

URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev
Log:
	PR libstdc++/61061
	PR libstdc++/61582
	* include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add
	a NFA state limit. If it's exceeded, regex_constants::error_space
	will be throwed.
	* include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use
	map (which is sparse) instead of vector. This reduce n times clones'
	cost from O(n^2) to O(n).
	* include/std/regex: Add map dependency.
	* testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New
	testcase.


Added:
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/regex_automaton.h
    trunk/libstdc++-v3/include/bits/regex_automaton.tcc
    trunk/libstdc++-v3/include/std/regex


This is copied by hand, because PR 61601 is mistyped to 61601. Now it's fixed in changelog.
Comment 4 Maksymilian Arciemowicz 2014-07-01 19:28:17 UTC
cpu exhaustion not eliminated

PoC: (.*{100}{200}findme)
Comment 5 Tim Shen 2015-02-01 08:31:33 UTC
> cpu exhaustion not eliminated
>
> PoC: (.*{100}{200}findme)

I tried to run:

#include <regex>

using namespace std;

int main (int argc, char *argv[])
{
  string input;
  regex r(argv[1]);

  return 0;

}

Against '(.*{100}{200}findme)' and there's no cpu exhaustion.

Do you have any other testcases?
Comment 6 Maksymilian Arciemowicz 2015-02-01 20:35:42 UTC
> Do you have any other testcases?

for trunk? maybe you have to use ::regex_match
Comment 7 Tim Shen 2015-02-02 00:37:20 UTC
(In reply to Maksymilian Arciemowicz from comment #6)
> > Do you have any other testcases?
> 
> for trunk? maybe you have to use ::regex_match

std::regex_match("findme", std::regex("(.*{100}{200}findme)"));

there's no memory problem, it just takes exponentially long time to run (which is expected when using backtracking).

To avoid it, you can use Thompson NFA:

#define _GLIBCXX_REGEX_USE_THOMPSON_NFA
#include <regex>

int main (int argc, char *argv[])
{
  std::regex_match("findme", std::regex("(.*{100}{200}findme)", std::regex_constants::extended));

  return 0;

}

Notice that for now Thompson NFA doesn't support ECMAScript.
Comment 8 Maksymilian Arciemowicz 2015-02-02 18:00:57 UTC
> there's no memory problem, it just takes exponentially long time to run
> (which is expected when using backtracking).

call it cpu resource exhaustion (CWE-400)

> 
> To avoid it, you can use Thompson NFA:
> 
> #define _GLIBCXX_REGEX_USE_THOMPSON_NFA
> #include <regex>
> 
> int main (int argc, char *argv[])
> {
>   std::regex_match("findme", std::regex("(.*{100}{200}findme)",
> std::regex_constants::extended));
> 
>   return 0;
> 
> }
> 
> Notice that for now Thompson NFA doesn't support ECMAScript.

yeap.

try (.*{300}{100}) for _GLIBCXX_REGEX_USE_THOMPSON_NFA. occurs stack exhaustion like in #61582