[patch] regex_traits implementation

Tim Shen timshen91@gmail.com
Tue Jul 2 19:10:00 GMT 2013

On Wed, Jul 3, 2013 at 1:11 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> Defining struct _RegexMask in the typedef declaration makes it public,
> I think I would prefer if the struct definition was a private member
> of regex_traits and only the char_class_type typedef was public.
> Either that or just rename the type to char_class_type and do away
> with the typedef.

I've make _RegexMask private; However, _RegexMask must
appear(privatly) before typedef _RegexMask char_class_type.
According to http://gcc.gnu.org/codingconventions.html, "Semantic
constraints may require a different declaration order, but seek to
minimize the potential confusion."

> I'm not sure it's necessarily required for bitmask types, but I would
> expect to be able to default-construct a char_class_type, is that left
> out intentionally?

It's not intentional and I added it now. I don't use the default
constructer in my code, because I just want to pay for what I get.
However, adding this make it more int-like.

> I think the _RegexMask constructor should be explicit.

I intentionally make _RegexMask act like an unsigned number. Since
lookup_collatename is required to return a value compares equal to 0,
I don't know if adding operater==(ctype_base __n) a good idea. It's a
little bit elegant to make this implicit conver

> Its parameters should be named __base and __extended, to conform to the libstdc++
> coding guidelines.


> It looks like _RegexMask could be a literal type, with a constexpr
> constructor and constexpr operators.  I think we might as well make
> use of those features if we can.


> The operator functions take const parameters by value, is that a
> conscious decision to prefer that to either non-const parameters or
> pass by reference?

It's intended.

First, since sizeof(_RegexMask) is relatively small, a call-by-value
call may save more time than pointer-based reference;

Second, making it `const` enables more potential compiler optimizations.

Am I naive somewhere about these?

> The definition of operator== considers all bits of _M_extened to be
> part of the object's value representation, so:
>     char_class_type{{},4} != char_class_type{{},0}
> even though those two values have the same flags set and behave
> identically when passed to regex_traits::isctype. It would be better
> to define operator== to use _M_extended&3 so only the bits we care
> about contribute to the value.

Yes you are absolutely right!

> In theory we shouldn't need a special case for the "blank" character
> class, I got that added to std::ctype_base via
> http://cplusplus.github.io/LWG/lwg-defects.html#2019, but we don't
> support it yet in libstdc++ (I forgot about it.)

We don't have std::isblank yet; I add a comment for this.

> Your isctype implementation tests __f._M_extended ==
> _RegexMask::_S_word, shouldn't that use & not ==, since __f could have
> both the _S_word and _S_blank bits set?   Similarly for the _S_blank
> case.

Sorry for carelessness, fixed.

> The _S_word case tests against ctype_base::alpha, but it should be
> alnum not alpha.  I'd suggest getting rid of that test entirely
> though:  The "w" class is represented by _RegexMask{0, _S_word},
> whereas in the uncommitted patch I posted to
> http://gcc.gnu.org/ml/libstdc++/2010-10/msg00049.html I represented it
> by the equivalent of _RegexMask{ctype_base::alnum,
> _S_char_class_under}, which means you don't need to check the alnum
> case because the earlier __fctyp.is(__f._M_base, __c) test will
> already have handled it. That means instead of a _S_word bit you only
> need _S_under instead.

Great idea about "inherit from alnum"! Fixed.

Tim Shen

More information about the Libstdc++ mailing list