First Last Prev Next    No search results available      Search page      Enter new bug
Bug#: 39168
Product:  
Component:  
Status: RESOLVED
Resolution: FIXED
Assigned To: Paolo Carlini <paolo.carlini@oracle.com>
Host:
Reported against  
Priority:  
Severity:  
Target Milestone:  
 
 
Target:
Reporter: Andrey Tsyvarev <tsyvarev@ispras.ru>
Add CC:
CC:
Remove selected CCs
Build:
URL:
Summary:
Keywords:
Known to work:
Known to fail:

Attachment Description Type Created Size Actions
test1.cpp test for num_get<>::get() method text/plain 2009-02-12 15:30 288 bytes Edit
test2.cpp test for money_put<>::put method text/plain 2009-02-12 15:33 296 bytes Edit
z.cpp Unified money_get and num_get test case and results. text/plain 2009-02-14 21:21 677 bytes Edit
z.cpp Corrected unified demo. text/plain 2009-02-14 21:58 802 bytes Edit
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 39168 depends on: Show dependency tree
Show dependency graph
Bug 39168 blocks:

Additional Comments:






View Bug Activity   |   Format For Printing   |   Clone This Bug


Description:   Last confirmed: 2009-02-12 15:58 Opened: 2009-02-12 15:26
Description of numpunct<>::do_grouping() (22.2.3.1.2, p3): 

Returns: A basic_string<char> vec used as a vector of integer values, in which
each element vec[i] represents the number of digits in the group at position i,
starting with position 0 as the rightmost group. If vec.size() <= i, the number
is the same as group (i-1); if (i<0 || vec[i]<=0 || vec[i]==CHAR_MAX), the size
of the digit group is unlimited. 

But num_put and num_get implementations don't interpret condition
(vec[i]==CHAR_MAX) as condition of group with unlimited size, but interpret
value CHAR_MAX as regular size of digits group.

According to the code, condition (static_cast<signed char>(vec[i]) > 0) is used
as an indicator of regular group size. This means range 1..127 for vec[i]
values.

But on architectures with char ~ signed char(e.g. x86), CHAR_MAX is 127, so the
range shouldn't include value 127.

In tests, string(1, CHAR_MAX) is used as grouping string. Like string() and
string(1, -1), this should disable grouping at all.
But on architectures with char ~ signed char tests failed. On architectures
with char ~ unsigned char or when CHAR_MAX replaced with -1, all became ok.

------- Comment #1 From Andrey Tsyvarev 2009-02-12 15:30 -------
Created an attachment (id=17289) [edit]
test for num_get<>::get() method

When grouping is disabled, thousands separator should not be read by get()
methods.

------- Comment #2 From Andrey Tsyvarev 2009-02-12 15:33 -------
Created an attachment (id=17290) [edit]
test for money_put<>::put method

When grouping is disabled, money_put<>::put() method for digits should write
these digits as is, without thousands separator.

------- Comment #3 From Paolo Carlini 2009-02-12 15:58 -------
Ok, thanks.

------- Comment #4 From Martin Sebor 2009-02-12 16:49 -------
(In reply to comment #0)

I'm not sure I understand your rationale or I agree that this is a bug. IIUC,
string(1, CHAR_MAX) indicates that groups may be of arbitrary length, which
includes "123,456" This behavior is the same regardless of whether char is
a signed or unsigned type.

As a data point WRT existing practice: all implementations I've tried (Apache
stdcxx on HP-UX/IPF, HP aCC 6.16, Sun C++ 5.9 with both libCstd and STLport,
and IBM XLC++ 9.1) behave the same as libstdc++: they extract 123456 from the
stream and set eofbit.

------- Comment #5 From Paolo Carlini 2009-02-12 17:05 -------
Hi. Maybe the testcases should be amended, but surely when char is signed, just
checking that vec[i] > 0 as an indication of regular group size instead of
vec[i] > 0 && vec[i] != CHAR_MAX cannot be right...

Please, Andrey, double check the testcases. Thanks in advance!

------- Comment #6 From Paolo Carlini 2009-02-12 17:53 -------
Actually, libstdc++ stores 123456, which is indeed fine, and sets failbit |
eofbit, failbit exactly because of the issue discussed here.

------- Comment #7 From Andrey Tsyvarev 2009-02-13 11:21 -------
(In reply to comment #4)
> 
> I'm not sure I understand your rationale or I agree that this is a bug. IIUC,
> string(1, CHAR_MAX) indicates that groups may be of arbitrary length, which
> includes "123,456" This behavior is the same regardless of whether char is
> a signed or unsigned type.

"Arbitrary length" is not quite correct here - "123,456" violates grouping,
given with string(1, CHAR_MAX). Standard use term "unlimited length", which
means(as I understand) that all other digits should incorporate in only one
group - only "123456" is correct.

(In reply to comment #6)
>Actually, libstdc++ stores 123456, which is indeed fine, and sets failbit |
>eofbit, failbit exactly because of the issue discussed here.

The thing is that, according to the standard, CHAR_MAX should be treats similar
as -1. But implementation treats string(1, -1) as no grouping at all, and stops
read, when has encountered symbol ','. So only "123" is accumulated.
This behaviour seems correct for me (though standard treats only string() as no
grouping at all, 22.2.2.1.2, p8).
So with string(1, CHAR_MAX) only "123" should be accumulated, not "123,456".

In other words, test is passed when CHAR_MAX is replaced with -1. Inside
grouping string, CHAR_MAX means same as -1(according to the standard). So test
should be passed with original text.

The same is about the second test.

------- Comment #8 From Paolo Carlini 2009-02-13 11:31 -------
(In reply to comment #7)
                                  Standard use term "unlimited length", which
> means(as I understand) that all other digits should incorporate in only one
> group - only "123456" is correct.

I don't read that anywhere in the Standard.

> The thing is that, according to the standard, CHAR_MAX should be treats similar
> as -1. But implementation treats string(1, -1) as no grouping at all, and stops
> read, when has encountered symbol ','. So only "123" is accumulated.
> This behaviour seems correct for me (though standard treats only string() as no
> grouping at all, 22.2.2.1.2, p8).
> So with string(1, CHAR_MAX) only "123" should be accumulated, not "123,456".

No, no, it is not correct, it is a bug in the current implementation. See where
bool discard is defined, in Stage 2. (1, -1) (or (1, CHAR_MAX)) are not
special, are normal groupings, that must be enforced.

------- Comment #9 From Paolo Carlini 2009-02-13 11:39 -------
In other terms, as usual about those matters, Martin is right ;) Only he has
got a wrong data point about libstdc++, he believes we are setting eofbit.

Thanks anyway for pointing our attention to this CHAR_MAX issue, I'm going to
fix it together with the bool discard thing (and its consequences).

------- Comment #10 From Andrey Tsyvarev 2009-02-13 13:41 -------
(In reply to comment #8)
> (In reply to comment #7)
>                                   Standard use term "unlimited length", which
> > means(as I understand) that all other digits should incorporate in only one
> > group - only "123456" is correct.
> 
> I don't read that anywhere in the Standard.

Could you clarify this a bit?
Do you mean that when reading "123,456" with string(1, -1), failbit shouldn't
be set?

------- Comment #11 From Paolo Carlini 2009-02-13 13:43 -------
(In reply to comment #10)
> Do you mean that when reading "123,456" with string(1, -1), failbit shouldn't
> be set?

Right, as Martin said. 

------- Comment #12 From Andrey Tsyvarev 2009-02-13 15:04 -------
Let's consider the following situation (seems lifelike to me). Suppose one
needs a representation of numbers in which only the last 3 digits are separated
from all other digits (grouped), like "1234,567" or "1234567,890". Other
separators shouldn't appear.

Grouping string "\003" doesn't fit for this purpose as it separates all
3-digits groups: "1,234,567".
Before this PR, I thought that "\003\177" is sufficient for this purpose. But,
as I understand now, the representations like "12,34,567" are acceptable in
this case, which is not what I would like to have.

Could you suggest which grouping string should be used to do such number
representation? Or is this unachievable?

I investigated locale support in POSIX, such number representation is achieved
there with "\003\177". It seems strange for me that similar mechanism is not
work with C++ locales.

------- Comment #13 From Paolo Carlini 2009-02-13 15:11 -------
(In reply to comment #12)
> Could you suggest which grouping string should be used to do such number
> representation? Or is this unachievable?

Actually \003\176 seems perfectly practical to me. In particular considering
the actual size of real numbers.

> I investigated locale support in POSIX, such number representation is achieved
> there with "\003\177". It seems strange for me that similar mechanism is not
> work with C++ locales.

POSIX seems a good point, probably you should have mentioned it much earlier.
Let's hear Martin again, then. Certainly, however, I'm concerned about having a
behavior different from all the other implementations mentioned by Martin.

------- Comment #14 From Paolo Carlini 2009-02-13 15:21 -------
If I understand correctly, in order to implement the POSIX behavior for C++,
assuming we want it, the Standard should be clarified to explain that values <=
0 or CHAR_MAX are different in that do no admit repetitions, thus saying
explicitly that such group is effectively the last, arbitrarily long, one.

I must say, that is what we would do when formatting anyway.

------- Comment #15 From Andrey Tsyvarev 2009-02-13 15:38 -------
(In reply to comment #14)
> If I understand correctly, in order to implement the POSIX behavior for C++,
> assuming we want it, the Standard should be clarified to explain that values <=
> 0 or CHAR_MAX are different in that do no admit repetitions, thus saying
> explicitly that such group is effectively the last, arbitrarily long, one.

Yes, I meen exactly this.
Also, current implementation follows this strategy - accroding to the tests,
and according to it's source code.

So it is strange for me, that the standard for num_get<>::do_get() say, that
only string() is indicator of no grouping at all.
While string(1,-1) and string(1,CHAR_MAX) also force number representation to
have only one group.

------- Comment #16 From Paolo Carlini 2009-02-13 15:49 -------
(In reply to comment #15)
> (In reply to comment #14)
> > If I understand correctly, in order to implement the POSIX behavior for C++,
> > assuming we want it, the Standard should be clarified to explain that values <=
> > 0 or CHAR_MAX are different in that do no admit repetitions, thus saying
> > explicitly that such group is effectively the last, arbitrarily long, one.
> 
> Yes, I meen exactly this.
> Also, current implementation follows this strategy - according to the tests,
> and according to it's source code.

Yes, I know that ;) And I also know that implementing the exact behaviour you
want would be trivial and would bring consistency between signed and unsigned
char platforms. Probably we want to implement it anyway in v3, but that doesn't
mean by itself that it's obviously the only meaningful interpretation of the
standard if so many other implementations differ, as pointed out by Martin. 

Thus let's be clear about that with Martin and probably at the same time also
take actions to open a DR for C++0x.

------- Comment #17 From Martin Sebor 2009-02-14 21:21 -------
Created an attachment (id=17300) [edit]
Unified money_get and num_get test case and results.

Attached is a unified test case with test results on popular platforms. The
assertions reflect the behavior described in comment #4 which may actually
turn out to be incorrect in the end.

------- Comment #18 From Martin Sebor 2009-02-14 21:41 -------
I was too hasty -- the attached test case is buggy: it's missing a seek to
the beginning of the stream after the first extraction, making the results
for the num_get part meaningless.

(In reply to comment #7)
> "Arbitrary length" is not quite correct here - "123,456" violates grouping,
> given with string(1, CHAR_MAX). Standard use term "unlimited length", which
> means(as I understand) that all other digits should incorporate in only one
> group - only "123456" is correct.

That seems like a reasonable interpretation but others appear to be possible
as well. Looks like this needs to be clarified.

(In reply to comment #12)
> Let's consider the following situation (seems lifelike to me). Suppose one
> needs a representation of numbers in which only the last 3 digits are separated
> from all other digits (grouped), like "1234,567" or "1234567,890". Other
> separators shouldn't appear.

It seems that "\003\000" should do that, and unless I'm mistaken, does with
libstc++ (but not other implementations).

(In reply to comment #13)
> POSIX seems a good point, probably you should have mentioned it much earlier.
> Let's hear Martin again, then. Certainly, however, I'm concerned about having a
> behavior different from all the other implementations mentioned by Martin.

I agree. It would be good to reconcile any accidental differences between
C++ and POSIX.

------- Comment #19 From Paolo Carlini 2009-02-14 21:54 -------
Thanks Martin. Then, as requested by submitter, for now I'm just going to tweak
a bit the behaviour of libstdc++ to bring consistency between unsigned and
signed char platforms, in other terms, value <= 0 and value == CHAR_MAX will
lead to the same behavior everywhere.

------- Comment #20 From Martin Sebor 2009-02-14 21:58 -------
Created an attachment (id=17301) [edit]
Corrected unified demo.

Attached a corrected unified demo with assertions removed and with output on
popular implementations for reference.

------- Comment #21 From paolo@gcc.gnu.org 2009-02-15 16:48 -------
Subject: Bug 39168

Author: paolo
Date: Sun Feb 15 16:47:57 2009
New Revision: 144190

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144190
Log:
2009-02-15  Paolo Carlini  <paolo.carlini@oracle.com>

        PR libstdc++/39168
        * src/locale_facets.cc (__verify_grouping(const char*, size_t,
        const string&)): Also check that the value != CHAR_MAX.
        * include/bits/locale_facets.tcc (__numpunct_cache<>::
        _M_cache(const locale&), __add_grouping(_CharT*, _CharT,
        const char*, size_t, const _CharT*, const _CharT*)): Likewise.
        * include/bits/locale_facets_nonio.tcc (__moneypunct_cache<>::
        _M_cache(const locale&)): Likewise.
        * testsuite/22_locale/money_put/put/wchar_t/39168.cc: New.
        * testsuite/22_locale/money_put/put/char/39168.cc: Likewise.
        * testsuite/22_locale/money_get/get/wchar_t/39168.cc: Likewise.
        * testsuite/22_locale/money_get/get/char/39168.cc: Likewise.
        * testsuite/22_locale/num_get/get/wchar_t/39168.cc: Likewise.
        * testsuite/22_locale/num_get/get/char/39168.cc: Likewise.


Added:
    trunk/libstdc++-v3/testsuite/22_locale/money_get/get/char/39168.cc
    trunk/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/39168.cc
    trunk/libstdc++-v3/testsuite/22_locale/money_put/put/char/39168.cc
    trunk/libstdc++-v3/testsuite/22_locale/money_put/put/wchar_t/39168.cc
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/char/39168.cc
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/wchar_t/39168.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/locale_facets.tcc
    trunk/libstdc++-v3/include/bits/locale_facets_nonio.tcc
    trunk/libstdc++-v3/src/locale_facets.cc

------- Comment #22 From Paolo Carlini 2009-02-15 16:49 -------
Fixed for 4.4.0.

First Last Prev Next    No search results available      Search page      Enter new bug