This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/18678] std::time_put<wchar_t> is broken with UTF-8 locales


------- Additional Comments From rleigh at debian dot org  2004-11-26 12:13 -------
Yes, I'm using GCC 3.4.3 (with glibc-2.3.2.ds1-18).  With respect to the
comparisons, I've now added wcsftime() to the test, and its output /does/ match
that of std::time_put<wchar_t>:

$ LANG=ru_RU.UTF-8 LC_ALL=ru_RU.UTF-8 ./date3
asctime:                Fri Nov 26 11:32:47 2004
strftime:               Птн 26 Ноя 2004 11:32:47
wcsftime:               B= 26 >O 2004 11:32:47
std::time_put<char>:    Птн 26 Ноя 2004 11:32:47
std::time_put<wchar_t>: B= 26 >O 2004 11:32:47
Viewed as hexadecimal (aligned for comparison):
"Narrow" UTF-8:
     П     т     н        2  6     Н     о     я
==> d0 9f d1 82 d0 bd 20 32 36 20 d0 9d d0 be d1 8f
       2  0  0  4     1  1  :  5  1  :  0  4  \n
==> 20 32 30 30 34 20 31 31 3a 35 31 3a 30 34 0a

"Wide" unknown:
      B  =              2  6       >  O
==> 1f 42 3d          20 32 36 20 1d 3e 4f
       2  0  0  4     1  1  :  5  1  :  0  4  \n
==> 20 32 30 30 34 20 31 31 3a 35 31 3a 30 34 0a

However, I am using UTF-8-capable terminals (GNOME-terminal, and the Linux
console with the LatArCyrHeb font in UTF-8 mode).  In both cases the output
appears as above: the "narrow" output to cout is displayed correctly (I verified
that it is valid UTF-8), but the "wide" output to wcout is not valid UTF-8.

I expected valid UTF-8 in both cases, since this is what the locale codeset
specifies.  I'm not sure what encoding wchar_t would be using, but I assumed I
would get readable output (maybe I am wrong about that?).  It looks like the
"wide" output is in a different encoding, but for some reason this has not
affected the 7-bit ASCII range (I would have expected something like padding
with \0 if it were outputting UCS-4).


Regards,
Roger


#include <iostream>
#include <locale>
#include <string>
#include <ctime>
#include <cwchar>
#include <time.h>   // localtime_r (POSIX)

int main()
{
  // Set up locale stuff...
  std::locale::global(std::locale(""));
  std::cout.imbue(std::locale());
  std::wcout.imbue(std::locale());

  // Get current time
  time_t simpletime = time(0);

  // Break down time.
  std::tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  mktime(&brokentime);

  std::cout << "asctime:                " << asctime(&brokentime);

  // Print with strftime(3)
  char buffer[40];
  std::strftime(&buffer[0], 40, "%c", &brokentime);

  std::cout << "strftime:               " << &buffer[0] << '\n';

  wchar_t wbuffer[40];
  std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
  std::wcout << "wcsftime:               " << &wbuffer[0] << '\n';

  // Try again, but use proper locale facets...
  const std::time_put<char>& tp =
    std::use_facet<std::time_put<char> >(std::cout.getloc());

  std::string pattern("std::time_put<char>:    %c\n");
  // Pass the pattern as a pointer range; dereferencing end() is undefined.
  tp.put(std::cout, std::cout, std::cout.fill(),
         &brokentime, pattern.data(), pattern.data() + pattern.size());

  // And again, but using wchar_t...
  const std::time_put<wchar_t>& wtp =
    std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());

  std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
  // Same pointer-range form as above, to avoid dereferencing end().
  wtp.put(std::wcout, std::wcout, std::wcout.fill(),
          &brokentime, wpattern.data(), wpattern.data() + wpattern.size());


  return 0;
}



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18678

