RFC: basic regex implementation

Stephen M. Webb stephen.webb@bregmasoft.ca
Thu Jun 10 12:53:00 GMT 2010


On 09/06/10 17:59, Jonathan Wakely wrote:
> On 9 June 2010 21:10, Stephen M. Webb wrote:
> > This is a first pass for a C++0x regex implementation.  Very basic
> > functionality only (BREs only, no backrefs, no character classes), not
> > production code yet, still not caught up to n3090.  Looking for feedback
> > before I put more effort in.
>
> I like the debugging stuff guarded by SMW_NDEBUG, it would be a shame
> to lose that.  Maybe it could be polished so it could be kept in the
> final version, at least when _GLIBCXX_DEBUG is defined.

Yes, I think a big chunk of this implementation predates _GLIBCXX_DEBUG so I 
wasn't tracking it.  I will add this to the to-do list.

> AFAICT there is no fix needed for threading issues w.r.t
> __unmatched_sub, initialisation of local statics is reentrant with GCC
> (as required by C++0x). That's why I used a local static not a global
> static.

Ok.  I am not completely up to speed on whole swathes of C++0x, and certainly 
this was an issue with C++03.
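
For reference, the guarantee being discussed can be sketched roughly as
below. This is only an illustration, not the patch's code: unmatched_t,
unmatched_sub() and count_constructions() are hypothetical names standing
in for whatever backs __unmatched_sub. It assumes a C++0x compiler with
<thread> support.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the object behind __unmatched_sub.
struct unmatched_t {
    static std::atomic<int> constructions;
    unmatched_t() { ++constructions; }   // count how often we are built
};
std::atomic<int> unmatched_t::constructions{0};

// Function-local static: C++0x requires the initialisation to happen
// exactly once, even if several threads make the first call concurrently.
const unmatched_t& unmatched_sub() {
    static const unmatched_t instance;
    return instance;
}

// Call the accessor from several threads, then report the total number
// of constructions observed so far.
int count_constructions(int nthreads) {
    std::vector<std::thread> pool;
    for (int i = 0; i < nthreads; ++i)
        pool.emplace_back([] { (void)unmatched_sub(); });
    for (auto& t : pool)
        t.join();
    return unmatched_t::constructions.load();
}

void demo() {
    // However many threads race on the first call, one object is built.
    assert(count_constructions(8) == 1);
}
```

A namespace-scope static would instead be subject to the usual
initialisation-order concerns across translation units, which is
presumably why the local static was preferred.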

> The std:: qualifiers could be removed, the code is already in namespace
> std.

Did we not go through an exercise a few years ago to explicitly qualify names 
with std:: to avoid a whole class of problems?
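
The class of problems I have in mind is, I believe, unintended
argument-dependent lookup. A small sketch, using a made-up namespace lib
in place of std and made-up names helper, call_unqualified and
call_qualified (none of these are from the patch):

```cpp
#include <cassert>

namespace lib {   // hypothetical stand-in for namespace std
    // The library's own helper.
    template <typename T>
    int helper(const T&) { return 1; }

    // Unqualified call: ADL also searches the namespaces of T.
    template <typename T>
    int call_unqualified(const T& x) { return helper(x); }

    // Qualified call: only lib::helper is considered.
    template <typename T>
    int call_qualified(const T& x) { return lib::helper(x); }
}

namespace user {
    struct widget {};
    // Unrelated user function that happens to share the name.
    int helper(const widget&) { return 2; }
}

void demo() {
    // The non-template user::helper is found by ADL and wins overload
    // resolution over the library template.
    assert(lib::call_unqualified(user::widget{}) == 2);
    // Qualification suppresses ADL, so the library helper is used.
    assert(lib::call_qualified(user::widget{}) == 1);
}
```

Being inside the namespace already does not prevent this, since ADL adds
candidates from the argument's namespace regardless of where the call
appears.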

> A comment on the C++0x regex spec, rather than your implementation:
> It's my understanding that users are not supposed to instantiate
> sub_match objects, or at least shouldn't need to, as doing so results
> in an uninitialised "matched" member. I intend to file an NB comment
> suggesting a deleted default constructor.  There could be a private
> constructor which is used internally by the library.  If you have any
> comments on that point I'd be glad to hear them.

The problem with making sub_match constructors private is that the 
implementation of the match engine(s) and token_iterator engine(s) could get 
pretty hairy, or else sub_match would have to have so many friends it should 
have its own page on Facebook.  A sub_match is effectively a POD, and if a 
user creates a POD without initializing it he or she can keep both halves.

I think a safer design might have been to make sub_match.matched a member 
function and provide non-trivial constructors.  Then again, keeping it 
POD-like simplifies implementing the rest of the regex library.
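
Roughly what I mean by the safer design, as a sketch only.  The class
checked_sub_match and its members are hypothetical and are not the
interface in the draft; the point is just that matched becomes a member
function over private state that every constructor initialises.

```cpp
#include <cassert>
#include <string>

// Hypothetical alternative to sub_match, NOT the draft's interface.
template <typename Iter>
class checked_sub_match {
public:
    // Default construction gives a well-defined "no match" state.
    checked_sub_match() : first_(), second_(), matched_(false) {}

    // Constructor a match engine would use for a successful submatch.
    checked_sub_match(Iter first, Iter second)
        : first_(first), second_(second), matched_(true) {}

    bool matched() const { return matched_; }

    // Empty string when nothing matched, otherwise the matched range.
    std::string str() const {
        return matched_ ? std::string(first_, second_) : std::string();
    }

private:
    Iter first_, second_;
    bool matched_;
};

void demo() {
    typedef std::string::const_iterator iter;

    checked_sub_match<iter> none;          // user-created, still safe
    assert(!none.matched());
    assert(none.str().empty());

    const std::string text = "abc";
    checked_sub_match<iter> hit(text.begin() + 1, text.end());
    assert(hit.matched());
    assert(hit.str() == "bc");
}
```

The cost is that the type is no longer an aggregate-like pair plus a
flag, so any engine code that currently pokes at the members directly
would have to go through constructors instead.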

It would be interesting to hear what the Committee has to say on the matter.

-- 
Stephen M. Webb
stephen.webb@bregmasoft.ca


