This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libstdc++/85472] Regex match bug


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472

--- Comment #12 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #11)
> > The problem is that #4 has an earlier capture, making it impossible to see
> > that it is left undefined later.
> 
> I wouldn't say it's impossible. libc++ implements it correctly at a cost.
>
> Specifically, see
> https://github.com/llvm-mirror/libcxx/blob/
> e54a22f809bc741760e39ac78444f57f06117652/include/regex#L5566. 

I meant the way I do it now.

> > So it would be better to capture it as an
> > empty string here:
> > 
> > In the example I gave, if the transitions are marked (ignoring #2)
> >   1: {a ↦ {1#3, 2#3, 3#3$4}};
> >   2: {b ↦ {2#4, 3#4}};
> >   3: {c ↦ {3#5, 2#5, 1#5, ∅#2}.
> > Then in the "ac" part, 'a' in state 1 goes to 3 and gets a #3 match 'a' plus
> > an empty which I have written $4. That is the idea, though.
> 
> I'm feeling that this is going to a quadratic space complexity here. Does
> "(((...((a_0)|(a_1)|(a_2)|...(a_n))...)))" if the regex expands to n
> surrounding parenthesis, and n characters in the middle? Would this regex
> match taking up to n^2 time, given a n-sized input string?

It is all linear, in one traverse, just pick down the action form the
transitions instead of the states. Both are computed when checking the forward
transition for a character: I just record them.

> If you think it's time and space efficiently doable, maybe you can develop a
> prototype first.

Yes. I will have to think about it, and report back here.

> > I have though not implemented the counted repetitions A{m, n}, which I think
> > usually are implemented as loops, so I do not know how that fits into the
> > picture.
> 
> It might also be helpful to think about back-references like "(a+)\1".

What's this?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]