w.r.t. [Bug libstdc++/85472] Regex match bug

Hans Åberg haberg-1@telia.com
Fri May 4 07:17:00 GMT 2018

> On 4 May 2018, at 00:27, Tim Shen <timshen@google.com> wrote:
>>> The NFA/DFA-based algorithm wouldn't work, as they are guaranteed
>>> polynomial. On the other hand, regex with back-references is proved to
>>> be NP-complete.
>> That is for static algorithms. Here, partial computation is fed back and
> modifying the NFA, branching into multiple ones.
> I'm not yet convinced that there is efficient way to implement the NFA
> modification approach, not efficient enough to fix PR 85472 without
> regressing the normal cases.

Future will tell. Flex start conditions may be similar, though static: When seeing (a+), it chooses the longest match, so one does not get all, then the action must be written, and though it can choose the start condition dynamically, one cannot reuse the matched string dynamically in a new regular expression.

> Even in the DFS case, if I want to walk back the path to generate captures
> once hit a "final state" or "back-reference state", it still looks
> expensive to me. The reverse matching takes O(s.size()) amount of time, and
> I only want to do it once.

I do not backtrack. The idea is that with a final state after (a+), all matches "a", "aa", are found as they appear at string length 1, 2, ... Then the regular expression "a"^k NFA and add its first states to the current state position set.

More information about the Libstdc++ mailing list