This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch] Whole regex refactoring and current status

As metioned in this[1] email, I here propose a refactored version of
regex, and show the status:

1) Automaton module(regex_automaton.h, regex_automaton.tcc), now
consists of an NFA implementation; In the future, the NFA should be
able to be optimized to a DFA.

2) Regex compiler module(regex_compiler.h, regex_compiler.tcc), which
take a string and then emit a NFA. Now it provides a regex token
scanner and parser(NFA generator). It's not fully implemented, and
will be main task in the near future. It should includes various regex
flavors(ECMAScript, basic, extended, awk, grep and egrep), but they
are not supported. Since the ECMAScript style is the default one, I'll
start my work here.

3) Executor(or matcher) module, which is constructed with a NFA and a
string, then match and group it. We have a DFS and BFS(Thompson
matcher) approach now; the BFS approach can effectively prevent the
matching from being exponential, but cannot handle back-references.
The current strategy is checking whether there're back-referneces in
the pattern; If there are, a DFS matcher will be generated, else the
BFS one. Matching options are not supported now. It shouldn't be a
great deal since the executors are there.

This patch eliminate _SpecializedCursor and _SpecializedResults.
Instead, I use _BiIter and match_results, the actual data structure,
directly. It should enabled the easy and obvious approch to implement

'make bootstrap && make -k check' tested under x86_64.



Tim Shen

Attachment: changelog
Description: Binary data

Attachment: regex.patch
Description: Binary data

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]