This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] regex_traits implementation
- From: Tim Shen <timshen91 at gmail dot com>
- To: Jonathan Wakely <jwakely dot gcc at gmail dot com>
- Cc: "libstdc++" <libstdc++ at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Stephen Webb <stephen dot webb at canonical dot com>
- Date: Wed, 3 Jul 2013 03:10:41 +0800
- Subject: Re: [patch] regex_traits implementation
- References: <CAPrifDmvwyXmOS4Wp1mo+ORnEk0o5=_dXbLimv9soahjcSuwAw at mail dot gmail dot com> <51CE1755 dot 1040401 at oracle dot com> <51CE1ED5 dot 6020104 at bregmasoft dot ca> <294c5ce0-0569-463f-b716-95abf2d2f848 at email dot android dot com> <CAGBC11ktNNzXhMLPuoxwyQ0o2QqEtq0S8Qi8YN3FMSsh5r_rZA at mail dot gmail dot com> <CAGBC11mJEnp70OdDdKr9BduhxXCrs9RUoN0TOXKhg7VSKnvQMQ at mail dot gmail dot com> <CAH6eHdShjBcZ+yF8O3+SOh222G5BQ5huMyE7-i0OhCTmbBhoeQ at mail dot gmail dot com>
On Wed, Jul 3, 2013 at 1:11 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> Defining struct _RegexMask in the typedef declaration makes it public,
> I think I would prefer if the struct definition was a private member
> of regex_traits and only the char_class_type typedef was public.
> Either that or just rename the type to char_class_type and do away
> with the typedef.
I've make _RegexMask private; However, _RegexMask must
appear(privatly) before typedef _RegexMask char_class_type.
According to http://gcc.gnu.org/codingconventions.html, "Semantic
constraints may require a different declaration order, but seek to
minimize the potential confusion."
> I'm not sure it's necessarily required for bitmask types, but I would
> expect to be able to default-construct a char_class_type, is that left
> out intentionally?
It's not intentional and I added it now. I don't use the default
constructer in my code, because I just want to pay for what I get.
However, adding this make it more int-like.
> I think the _RegexMask constructor should be explicit.
I intentionally make _RegexMask act like an unsigned number. Since
lookup_collatename is required to return a value compares equal to 0,
I don't know if adding operater==(ctype_base __n) a good idea. It's a
little bit elegant to make this implicit conver
> Its parameters should be named __base and __extended, to conform to the libstdc++
> coding guidelines.
Fixed.
> It looks like _RegexMask could be a literal type, with a constexpr
> constructor and constexpr operators. I think we might as well make
> use of those features if we can.
Fixed.
> The operator functions take const parameters by value, is that a
> conscious decision to prefer that to either non-const parameters or
> pass by reference?
It's intended.
First, since sizeof(_RegexMask) is relatively small, a call-by-value
call may save more time than pointer-based reference;
Second, making it `const` enables more potential compiler optimizations.
Am I naive somewhere about these?
> The definition of operator== considers all bits of _M_extened to be
> part of the object's value representation, so:
>
> char_class_type{{},4} != char_class_type{{},0}
>
> even though those two values have the same flags set and behave
> identically when passed to regex_traits::isctype. It would be better
> to define operator== to use _M_extended&3 so only the bits we care
> about contribute to the value.
Yes you are absolutely right!
> In theory we shouldn't need a special case for the "blank" character
> class, I got that added to std::ctype_base via
> http://cplusplus.github.io/LWG/lwg-defects.html#2019, but we don't
> support it yet in libstdc++ (I forgot about it.)
We don't have std::isblank yet; I add a comment for this.
> Your isctype implementation tests __f._M_extended ==
> _RegexMask::_S_word, shouldn't that use & not ==, since __f could have
> both the _S_word and _S_blank bits set? Similarly for the _S_blank
> case.
Sorry for carelessness, fixed.
> The _S_word case tests against ctype_base::alpha, but it should be
> alnum not alpha. I'd suggest getting rid of that test entirely
> though: The "w" class is represented by _RegexMask{0, _S_word},
> whereas in the uncommitted patch I posted to
> http://gcc.gnu.org/ml/libstdc++/2010-10/msg00049.html I represented it
> by the equivalent of _RegexMask{ctype_base::alnum,
> _S_char_class_under}, which means you don't need to check the alnum
> case because the earlier __fctyp.is(__f._M_base, __c) test will
> already have handled it. That means instead of a _S_word bit you only
> need _S_under instead.
Great idea about "inherit from alnum"! Fixed.
--
Tim Shen