This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch] Reimplment regex matcher using DFS


This is the most exciting patch from me so far! XD

Here I temporarily shadow the Thompson NFA matcher[1](original
_Grep_matcher), and use the Depth-First Search(DFS, or backtracking)
approach instead.

Yes, DFS is *exponentially slow* :( However we need it, because when
encountering the feature "back-reference", like "\1" in "([A-Z])\1*",
DFS is the best choice[1].

Thompson NFA matcher, which essentially is a memoized Breadth-first
search, probably can not hold large scale back-references : there may
be a huge space consumption for saving different matching statuses. It
will come back, but now let's keep this in mind : "Premature
optimization is the root of all evil".

At last, two bug reports(libstdc++/53622 and libstdc++/57173) said
that there're regex grouping problems. It's relatively simple fix it
in the DFS approach, and I added them to the testsuite. Shall I write
PR in the ChangeLog? What does PR stand for?

By the way, probably I shouldn't add too much comment in the code(what
if it's changed for reasons?). At the final section of my GSoC
participation, I'll spend all time adding documents and comments.

Waiting for reviewing. Thank you all!


[1] http://swtch.com/~rsc/regexp/regexp1.html

-- 
Tim Shen

Attachment: dfs-matcher.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]