51772 – --enable-clocale=generic makes unsafe assumptions about ctype_base::mask


Summary: --enable-clocale=generic makes unsafe assumptions about ctype_base::mask

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libstdc++
Version:	4.7.0

Importance:	P3 minor
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	51018

Reported:	2012-01-06 00:08 UTC by Jonathan Wakely
Modified:	2022-08-21 13:57 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-01-06 00:00:00


Description Jonathan Wakely 2012-01-06 00:08:19 UTC

config/locale/generic/ctype_members.cc initializes _M_bit like so:

    for (size_t __i = 0; __i <= 15; ++__i)
      {
        _M_bit[__i] = static_cast<mask>(1 << __i);
        _M_wmask[__i] = _M_convert_to_wmask(_M_bit[__i]);
      }

This assumes that ctype_base::mask is at least 16 bits.

Each element of _M_bit has a single bit set, and is compared to the defined mask constants in _M_convert_to_wmask:

  ctype<wchar_t>::__wmask_type
  ctype<wchar_t>::_M_convert_to_wmask(const mask __m) const throw()
  {
    __wmask_type __ret;
    switch (__m)
      {
      // ...
      case xdigit:
        __ret = wctype("xdigit");
        break;
      case alnum:
        __ret = wctype("alnum");
        break;
      case graph:
        __ret = wctype("graph");
        break;
      default:
        __ret = __wmask_type();
      }
    return __ret;
  };

If any of the mask constants has more than one bit set, it will never be matched.

For example, on NetBSD xdigit can never be matched for ctype<wchar_t>, because config/os/bsd/netbsd/ctype_base.h defines the following:

    typedef unsigned char       mask;

    static const mask xdigit    = _N | _X;

As a result, this valid program aborts on NetBSD:

#include <locale>
#include <assert.h>

class gnu_ctype: public std::ctype<wchar_t> { };

int main()
{
  gnu_ctype gctype;

  assert(gctype.is(std::ctype_base::xdigit, L'a'));
}

Comment 1 Paolo Carlini 2012-01-06 13:20:33 UTC

Maybe I'm misunderstanding the tone of your PR, and certainly Benjamin knows better than me because he invented this stuff, but I don't think the blame should be on the generic configuration per se; in other words, it isn't supposed to be fully generic and cover all possible situations. I think the general idea was to provide something covering a good range of cases and then make it easy to add new ones covering the special needs of the various targets. I think that, for now at least, should be done for netbsd too.

Comment 2 Jonathan Wakely 2012-01-06 14:00:52 UTC

Sorry for the tone. It might be that this is only really broken on netbsd, and maybe using the ieee model there would fix the test failure. I need to investigate further.

Comment 3 Jonathan Wakely 2012-01-06 18:10:04 UTC

I see the actual problem now. It's not as bad as I initially thought (it was late at night when I started debugging this!).

The code in generic/ctype_members.cc assumes that each ctype_base::mask constant is either a single bit or the bitwise OR of other ctype_base::mask constants. That works because, e.g., if alnum is upper|lower|digit, then do_is(alnum, L'c') will match on lower.

But netbsd defines ctype_base::xdigit as _N|_X, where _N is ctype_base::digit but _X corresponds to [A-Fa-f] and is not used in any other ctype_base::mask constant, so the wide characters [A-Fa-f] cannot be matched by xdigit.

newlib similarly uses _X|_N for xdigit, but it has --enable-clocale=newlib, so it doesn't use the generic code.

bionic also uses _X|_N, but I don't think it supports wchar_t anyway.

I believe vxworks and qnx would fail to match is(ctype::space, L' ') and is(ctype::print, L' ') for similar reasons, if they support wchar_t.

Comment 4 Jonathan Wakely 2014-10-17 09:09:09 UTC

In config/locale/newlib/ctype_members.cc:

      default:
	// Different from the generic version, xdigit and print in
	// newlib are defined as bitwise-OR result of bitmasks:
	//   xdigit = _X | _N;
	//   print  = _P | _U | _L | _N | _B;
	// in which _X and _B don't correspond to any ctype mask.
	// In order to get the wmask correctly converted when __m is
	// equal to _X or _B, the two cases are specifically handled
	// here.
	if (__m & xdigit)
	  __ret = wctype("xdigit");
	else if (__m & print)
	  __ret = wctype("print");
	else
	  __ret = __wmask_type();
      }

This applies not only to the masks defined in config/os/newlib/ctype_base.h, but also to those for netbsd, openbsd and bionic.

Something similar is also needed for qnx, because config/os/qnx/qnx6.1/ctype_base.h uses several bitmasks that do not correspond to any ctype, e.g. _XA, _SP, _XS, _XB.