[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

Thu May 17 22:36:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-17
     Ever confirmed|0                           |1

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Wanying Luo from comment #0)
> gcc version 4.9.2 (GCC) 

The earliest currently supported release is GCC 6.4, but this doesn't appear to
have been fixed since then.

> In libstdc++-v3/include/bits/locale_classes.tcc, do_transform() is defined
> as follows:
> 
>     do_transform(const _CharT* __lo, const _CharT* __hi) const
>     {
> ...
>               size_t __res = _M_transform(__c, __p, __len);
> ...
>               __ret.append(__c, __res);
> 
> 
> When _M_transform() calls strxfrm() and gets -1 when converting 0x80 under
> the UTF-8 locale on Solaris SPARC, it simply assigns -1 to __res of type
> size_t which creates a very large number. This causes __ret.append(__c,
> __res) to crash.

Well, the value returned is already a size_t, so it's already a very large
number (not -1). We do check for values larger than expected, but we don't
check for errors.
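
To illustrate why a size check alone is not enough, here is a minimal sketch.
It is not taken from libstdc++: the names error_result, grown_length and
looks_too_large are hypothetical, and the "grow to res + 1 and retry" step is
only an assumed shape for such a check.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the error return described in comment #0:
// "-1" seen as a size_t is the largest representable value.
constexpr std::size_t error_result = static_cast<std::size_t>(-1);

// Assumed shape of a "grow the buffer and retry" step (not libstdc++ code).
constexpr std::size_t grown_length(std::size_t res)
{
    return res + 1;  // unsigned arithmetic wraps: SIZE_MAX + 1 == 0
}

// A pure size comparison treats the error value as "result too large",
// so it cannot tell a failure apart from a genuinely long result.
inline bool looks_too_large(std::size_t res, std::size_t len)
{
    assert(len > 0);  // the caller is expected to pass a real buffer length
    return res >= len;
}
```

With these definitions, looks_too_large(error_result, len) is true for any
len, and grown_length(error_result) wraps around to 0, so a retry sized that
way could not succeed either.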

> I think it would be nice if the code checks errno and
> issues a better error message than the one above.

Yes, we need to check errno for errors from strxfrm.
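
As a rough sketch of what such a check could look like in user code (this is
not the eventual libstdc++ change, and checked_strxfrm is a hypothetical
helper), errno can be cleared before each call and inspected afterwards. The
sketch assumes the platform sets errno on failure, and additionally treats a
return of (size_t)-1 as an error, as reported for Solaris in comment #0.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: transform a NUL-terminated string with strxfrm and
// throw instead of trusting an error return as a length.
inline std::string checked_strxfrm(const char* src)
{
    assert(src != nullptr);

    // With n == 0 the destination may be null; the call only reports
    // how many characters the transformed string needs.
    errno = 0;
    const std::size_t needed = std::strxfrm(nullptr, src, 0);
    if (errno != 0 || needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("strxfrm failed while sizing the buffer");

    std::string out(needed + 1, '\0');  // +1 for the terminating NUL

    errno = 0;
    const std::size_t written = std::strxfrm(&out[0], src, out.size());
    if (errno != 0 || written >= out.size())
        throw std::runtime_error("strxfrm failed while transforming");

    out.resize(written);  // drop the terminator and any unused space
    return out;
}
```

A caller would then get an exception with a readable message for input such
as the 0x80 byte from the report, rather than a crash in append().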