This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
Description of numpunct<>::do_grouping() (22.2.3.1.2, p3): Returns: A basic_string<char> vec used as a vector of integer values, in which each element vec[i] represents the number of digits in the group at position i, starting with position 0 as the rightmost group. If vec.size() <= i, the number is the same as group (i-1); if (i<0 || vec[i]<=0 || vec[i]==CHAR_MAX), the size of the digit group is unlimited. But num_put and num_get implementations don't interpret condition (vec[i]==CHAR_MAX) as condition of group with unlimited size, but interpret value CHAR_MAX as regular size of digits group. According to the code, condition (static_cast<signed char>(vec[i]) > 0) is used as an indicator of regular group size. This means range 1..127 for vec[i] values. But on architectures with char ~ signed char(e.g. x86), CHAR_MAX is 127, so the range shouldn't include value 127. In tests, string(1, CHAR_MAX) is used as grouping string. Like string() and string(1, -1), this should disable grouping at all. But on architectures with char ~ signed char tests failed. On architectures with char ~ unsigned char or when CHAR_MAX replaced with -1, all became ok.
Created an attachment (id=17289) [edit] test for num_get<>::get() method When grouping is disabled, thousands separator should not be read by get() methods.
Created an attachment (id=17290) [edit] test for money_put<>::put method When grouping is disabled, money_put<>::put() method for digits should write these digits as is, without thousands separator.
Ok, thanks.
(In reply to comment #0) I'm not sure I understand your rationale or I agree that this is a bug. IIUC, string(1, CHAR_MAX) indicates that groups may be of arbitrary length, which includes "123,456" This behavior is the same regardless of whether char is a signed or unsigned type. As a data point WRT existing practice: all implementations I've tried (Apache stdcxx on HP-UX/IPF, HP aCC 6.16, Sun C++ 5.9 with both libCstd and STLport, and IBM XLC++ 9.1) behave the same as libstdc++: they extract 123456 from the stream and set eofbit.
Hi. Maybe the testcases should be amended, but surely when char is signed, just checking that vec[i] > 0 as an indication of regular group size instead of vec[i] > 0 && vec[i] != CHAR_MAX cannot be right... Please, Andrey, double check the testcases. Thanks in advance!
Actually, libstdc++ stores 123456, which is indeed fine, and sets failbit | eofbit, failbit exactly because of the issue discussed here.
(In reply to comment #4) > > I'm not sure I understand your rationale or I agree that this is a bug. IIUC, > string(1, CHAR_MAX) indicates that groups may be of arbitrary length, which > includes "123,456" This behavior is the same regardless of whether char is > a signed or unsigned type. "Arbitrary length" is not quite correct here - "123,456" violates grouping, given with string(1, CHAR_MAX). Standard use term "unlimited length", which means(as I understand) that all other digits should incorporate in only one group - only "123456" is correct. (In reply to comment #6) >Actually, libstdc++ stores 123456, which is indeed fine, and sets failbit | >eofbit, failbit exactly because of the issue discussed here. The thing is that, according to the standard, CHAR_MAX should be treats similar as -1. But implementation treats string(1, -1) as no grouping at all, and stops read, when has encountered symbol ','. So only "123" is accumulated. This behaviour seems correct for me (though standard treats only string() as no grouping at all, 22.2.2.1.2, p8). So with string(1, CHAR_MAX) only "123" should be accumulated, not "123,456". In other words, test is passed when CHAR_MAX is replaced with -1. Inside grouping string, CHAR_MAX means same as -1(according to the standard). So test should be passed with original text. The same is about the second test.
(In reply to comment #7) Standard use term "unlimited length", which > means(as I understand) that all other digits should incorporate in only one > group - only "123456" is correct. I don't read that anywhere in the Standard. > The thing is that, according to the standard, CHAR_MAX should be treats similar > as -1. But implementation treats string(1, -1) as no grouping at all, and stops > read, when has encountered symbol ','. So only "123" is accumulated. > This behaviour seems correct for me (though standard treats only string() as no > grouping at all, 22.2.2.1.2, p8). > So with string(1, CHAR_MAX) only "123" should be accumulated, not "123,456". No, no, it is not correct, it is a bug in the current implementation. See where bool discard is defined, in Stage 2. (1, -1) (or (1, CHAR_MAX)) are not special, are normal groupings, that must be enforced.
In other terms, as usual about those matters, Martin is right ;) Only he has got a wrong data point about libstdc++, he believes we are setting eofbit. Thanks anyway for pointing our attention to this CHAR_MAX issue, I'm going to fix it together with the bool discard thing (and its consequences).
(In reply to comment #8) > (In reply to comment #7) > Standard use term "unlimited length", which > > means(as I understand) that all other digits should incorporate in only one > > group - only "123456" is correct. > > I don't read that anywhere in the Standard. Could you clarify this a bit? Do you mean that when reading "123,456" with string(1, -1), failbit shouldn't be set?
(In reply to comment #10) > Do you mean that when reading "123,456" with string(1, -1), failbit shouldn't > be set? Right, as Martin said.
Let's consider the following situation (seems lifelike to me). Suppose one needs a representation of numbers in which only the last 3 digits are separated from all other digits (grouped), like "1234,567" or "1234567,890". Other separators shouldn't appear. Grouping string "\003" doesn't fit for this purpose as it separates all 3-digits groups: "1,234,567". Before this PR, I thought that "\003\177" is sufficient for this purpose. But, as I understand now, the representations like "12,34,567" are acceptable in this case, which is not what I would like to have. Could you suggest which grouping string should be used to do such number representation? Or is this unachievable? I investigated locale support in POSIX, such number representation is achieved there with "\003\177". It seems strange for me that similar mechanism is not work with C++ locales.
(In reply to comment #12) > Could you suggest which grouping string should be used to do such number > representation? Or is this unachievable? Actually \003\176 seems perfectly practical to me. In particular considering the actual size of real numbers. > I investigated locale support in POSIX, such number representation is achieved > there with "\003\177". It seems strange for me that similar mechanism is not > work with C++ locales. POSIX seems a good point, probably you should have mentioned it much earlier. Let's hear Martin again, then. Certainly, however, I'm concerned about having a behavior different from all the other implementations mentioned by Martin.
If I understand correctly, in order to implement the POSIX behavior for C++, assuming we want it, the Standard should be clarified to explain that values <= 0 or CHAR_MAX are different in that do no admit repetitions, thus saying explicitly that such group is effectively the last, arbitrarily long, one. I must say, that is what we would do when formatting anyway.
(In reply to comment #14) > If I understand correctly, in order to implement the POSIX behavior for C++, > assuming we want it, the Standard should be clarified to explain that values <= > 0 or CHAR_MAX are different in that do no admit repetitions, thus saying > explicitly that such group is effectively the last, arbitrarily long, one. Yes, I meen exactly this. Also, current implementation follows this strategy - accroding to the tests, and according to it's source code. So it is strange for me, that the standard for num_get<>::do_get() say, that only string() is indicator of no grouping at all. While string(1,-1) and string(1,CHAR_MAX) also force number representation to have only one group.
(In reply to comment #15) > (In reply to comment #14) > > If I understand correctly, in order to implement the POSIX behavior for C++, > > assuming we want it, the Standard should be clarified to explain that values <= > > 0 or CHAR_MAX are different in that do no admit repetitions, thus saying > > explicitly that such group is effectively the last, arbitrarily long, one. > > Yes, I meen exactly this. > Also, current implementation follows this strategy - according to the tests, > and according to it's source code. Yes, I know that ;) And I also know that implementing the exact behaviour you want would be trivial and would bring consistency between signed and unsigned char platforms. Probably we want to implement it anyway in v3, but that doesn't mean by itself that it's obviously the only meaningful interpretation of the standard if so many other implementations differ, as pointed out by Martin. Thus let's be clear about that with Martin and probably at the same time also take actions to open a DR for C++0x.
Created an attachment (id=17300) [edit] Unified money_get and num_get test case and results. Attached is a unified test case with test results on popular platforms. The assertions reflect the behavior described in comment #4 which may actually turn out to be incorrect in the end.
I was too hasty -- the attached test case is buggy: it's missing a seek to the beginning of the stream after the first extraction, making the results for the num_get part meaningless. (In reply to comment #7) > "Arbitrary length" is not quite correct here - "123,456" violates grouping, > given with string(1, CHAR_MAX). Standard use term "unlimited length", which > means(as I understand) that all other digits should incorporate in only one > group - only "123456" is correct. That seems like a reasonable interpretation but others appear to be possible as well. Looks like this needs to be clarified. (In reply to comment #12) > Let's consider the following situation (seems lifelike to me). Suppose one > needs a representation of numbers in which only the last 3 digits are separated > from all other digits (grouped), like "1234,567" or "1234567,890". Other > separators shouldn't appear. It seems that "\003\000" should do that, and unless I'm mistaken, does with libstc++ (but not other implementations). (In reply to comment #13) > POSIX seems a good point, probably you should have mentioned it much earlier. > Let's hear Martin again, then. Certainly, however, I'm concerned about having a > behavior different from all the other implementations mentioned by Martin. I agree. It would be good to reconcile any accidental differences between C++ and POSIX.
Thanks Martin. Then, as requested by submitter, for now I'm just going to tweak a bit the behaviour of libstdc++ to bring consistency between unsigned and signed char platforms, in other terms, value <= 0 and value == CHAR_MAX will lead to the same behavior everywhere.
Created an attachment (id=17301) [edit] Corrected unified demo. Attached a corrected unified demo with assertions removed and with output on popular implementations for reference.
Subject: Bug 39168 Author: paolo Date: Sun Feb 15 16:47:57 2009 New Revision: 144190 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144190 Log: 2009-02-15 Paolo Carlini <paolo.carlini@oracle.com> PR libstdc++/39168 * src/locale_facets.cc (__verify_grouping(const char*, size_t, const string&)): Also check that the value != CHAR_MAX. * include/bits/locale_facets.tcc (__numpunct_cache<>:: _M_cache(const locale&), __add_grouping(_CharT*, _CharT, const char*, size_t, const _CharT*, const _CharT*)): Likewise. * include/bits/locale_facets_nonio.tcc (__moneypunct_cache<>:: _M_cache(const locale&)): Likewise. * testsuite/22_locale/money_put/put/wchar_t/39168.cc: New. * testsuite/22_locale/money_put/put/char/39168.cc: Likewise. * testsuite/22_locale/money_get/get/wchar_t/39168.cc: Likewise. * testsuite/22_locale/money_get/get/char/39168.cc: Likewise. * testsuite/22_locale/num_get/get/wchar_t/39168.cc: Likewise. * testsuite/22_locale/num_get/get/char/39168.cc: Likewise. Added: trunk/libstdc++-v3/testsuite/22_locale/money_get/get/char/39168.cc trunk/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/39168.cc trunk/libstdc++-v3/testsuite/22_locale/money_put/put/char/39168.cc trunk/libstdc++-v3/testsuite/22_locale/money_put/put/wchar_t/39168.cc trunk/libstdc++-v3/testsuite/22_locale/num_get/get/char/39168.cc trunk/libstdc++-v3/testsuite/22_locale/num_get/get/wchar_t/39168.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/locale_facets.tcc trunk/libstdc++-v3/include/bits/locale_facets_nonio.tcc trunk/libstdc++-v3/src/locale_facets.cc
Fixed for 4.4.0.