This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[PATCH] Enable input using variable-width encodings


Hi,

This patch allows basic_filebuf::underflow to handle stateless
variable-width encodings (such as UTF-8) correctly.

codecvt_base::partial is handled by keeping a permanent buffer
of unconverted bytes (_M_ext_buf). Bytes that are not converted
during one call to underflow are converted during the next one.
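
To illustrate the idea outside of basic_filebuf, here is a minimal
sketch of carrying unconverted bytes from one read to the next. It is
not the code from the patch: convert_chunk and the pending string are
hypothetical names, and std::codecvt_utf8 (deprecated since C++17)
stands in for whatever codecvt facet the file buffer's locale provides.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper, not libstdc++ code. Converts the bytes left over
// from the previous call followed by `chunk`, and stores whatever could
// not be converted yet (an incomplete multibyte sequence) in `pending`.
std::wstring convert_chunk(const std::codecvt_utf8<wchar_t>& cvt,
                           std::mbstate_t& state,
                           std::string& pending,
                           const std::string& chunk)
{
  const std::string ext = pending + chunk;
  pending.clear();
  if (ext.empty())
    return std::wstring();

  // One wide character per external byte is always enough room.
  std::wstring out(ext.size(), L'\0');
  const char* const from = ext.data();
  const char* const from_end = from + ext.size();
  const char* from_next = from;
  wchar_t* const to = &out[0];
  wchar_t* to_next = to;

  const std::codecvt_base::result r =
    cvt.in(state, from, from_end, from_next, to, to + out.size(), to_next);
  if (r == std::codecvt_base::error)
    throw std::runtime_error("invalid byte sequence");

  // On codecvt_base::partial, [from_next, from_end) is the unconverted
  // tail; keep it for the next call.
  assert(from_next <= from_end);
  pending.assign(from_next, from_end);
  out.resize(to_next - to);
  return out;
}
```

Feeding the two-byte UTF-8 sequence for U+00E9 split across two calls
should yield no character for the lead byte in the first call and the
complete character in the second.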

The external buffer is delimited by three pointers: _M_ext_buf,
_M_ext_next and _M_ext_end. The range [_M_ext_buf, _M_ext_next)
holds the bytes that were converted into [eback(), egptr()), and
the range [_M_ext_next, _M_ext_end) contains bytes that have not
yet been converted. The position in the external buffer that
matches gptr() is thus

  _M_ext_buf +
  codecvt::length(state, _M_ext_buf, _M_ext_next,
                  gptr() - eback())

Being able to map the current read position back to a byte in the
external buffer should make it easy to implement seekoff.
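
As a rough sketch of that mapping, the function below asks
codecvt::length how many external bytes account for a given number of
internal characters. external_offset is a hypothetical name, the
string argument plays the role of [_M_ext_buf, _M_ext_next), and
std::codecvt_utf8 (deprecated since C++17) is assumed as the facet; a
real seekoff would also need the file position of the start of the
buffer.

```cpp
#include <codecvt>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper, not libstdc++ code. Returns the number of bytes
// at the start of `ext` that convert to the first `consumed` internal
// characters, i.e. the external offset matching an internal read
// position `consumed` characters into the get area.
std::size_t external_offset(const std::string& ext, std::size_t consumed)
{
  std::codecvt_utf8<wchar_t> cvt;
  // The encoding is stateless, so a fresh initial state is sufficient.
  std::mbstate_t state = std::mbstate_t();
  const int n = cvt.length(state, ext.data(), ext.data() + ext.size(),
                           consumed);
  return static_cast<std::size_t>(n);
}
```

For the four bytes 'a', 0xC3, 0xA9, 'b' (three characters), an internal
position of 2 corresponds to external offset 3, since the second
character occupies two bytes.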

Tested on i686-pc-linux-gnu.

Regards,
Petur

2003-09-04  Petur Runolfsson  <peturr02@ru.is>

	PR libstdc++/9028
	* include/bits/fstream.tcc
	(basic_filebuf::_M_destroy_internal_buffer): Destroy _M_ext_buf.
	(basic_filebuf::basic_filebuf): Initialize _M_ext_buf,
	_M_ext_buf_size, _M_ext_next and _M_ext_end.
	(basic_filebuf::underflow): Handle variable-width stateless
	encodings (codecvt::encoding() == 0), including UTF-8.
	* include/std/std_fstream.h (basic_filebuf):
	Declare _M_ext_buf, _M_ext_buf_size, _M_ext_next, _M_ext_end.
	* testsuite/27_io/basic_filebuf/underflow/wchar_t/1.cc: New test.
	* testsuite/27_io/basic_filebuf/underflow/wchar_t/2.cc: New test.
	* testsuite/27_io/basic_filebuf/underflow/wchar_t/3.cc: New test.
	* testsuite/27_io/basic_filebuf/underflow/wchar_t/4.cc: New test.
	* testsuite/27_io/basic_filebuf/underflow/wchar_t/5.cc: New test.
	* testsuite/27_io/objects/wchar_t/12.cc: New test.
	* testsuite/27_io/objects/wchar_t/13.cc: New test.

Attachment: utf8.diff
