This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Define std::to_chars and std::from_chars for C++17 (P0067R5, partial)


On 07/04/17 13:29 +0100, Jonathan Wakely wrote:
+	      constexpr char __abc[] = "abcdefghijklmnopqrstuvwxyz";
+	      unsigned char __lc = std::tolower(__c);
+	      constexpr bool __consecutive = __consecutive_chars(__abc, 26);
+	      if _GLIBCXX17_CONSTEXPR (__consecutive)
+		{
+		  // Characters 'a'..'z' are consecutive
+		  if (std::isalpha(__c) && (__lc - 'a') < __b)
+		    __c = __lc - 'a' + 10;
<snip>

Is that alright? I have my eyes set on the following line in particular:

+	      unsigned char __lc = std::tolower(__c);

Correct me if I'm wrong but that's locale sensitive. As it turns out in
a Turkish locale the lowercase for <I> is <ı> "U+0131 LATIN SMALL LETTER
DOTLESS I".

I installed a Turkish locale on an old system of mine (Linux, UTF-8) and
sketched out the lines to quickly test the logic. As best as I can tell
the following happens when the global locale is set to Turkish:

 - __lc is 'I' unchanged because <ı> U+0131 would encode to two UTF-8
   codepoints aka can't fit in a char
 - the test succeeds
 - the computed value that ends up being written to __c is decimal -14

Keep in mind I didn't run the actual code from the patch, so I may have
got something wrong. Still, the Turkish i/İ and ı/I pairs are just one
example in one locale. Are std::tolower and std::isalpha the right tools
for this job? (These are the only two locale sensitive parts I
noticed in the code.)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]