Bug 37958

Summary: num_get<>::do_get(bool) sets eofbit flag incorrectly when boolalpha == true
Product: gcc Reporter: Andrey Tsyvarev <tsyvarev>
Component: libstdc++Assignee: Paolo Carlini <paolo.carlini>
Status: RESOLVED FIXED    
Severity: normal CC: ciaran.mccreesh, gcc-bugs
Priority: P3    
Version: 4.4.0   
Target Milestone: 4.4.0   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2008-10-30 16:51:34
Attachments: Test, modelling example from standard exactly
Test, whether there is an unnecessary comparision (in == end)
Analogue of basic_istringstream with (in == end) catching

Description Andrey Tsyvarev 2008-10-30 14:59:11 UTC
There are some examples in the description of member function of the num_get<>
facet

iter_type do_get(iter_type in, iter_type end, ios_base& str, ios_base::iostate& err, bool& val) const

for the case when (str.flags() & ios_base::boolalpha) != 0, that clarify the
algorithm that the function implements (22.2.2.1.2, p16):

For targets true: "1" and false: "0", the input sequence "1" yields val == true and err == str.goodbit. For empty targets (""), any input sequence yields err ==
str.failbit.

But in both cases implementation also sets flag str.eofbit in err.

It seems that implementation always set str.eofbit when after call of the function (in == end).

But standard states(22.2.2.1.2, p16) that this flag should be set only when:
"if, when seeking another character to match, it is found that (in == end)" (on success)
or "if the reason for the failure was that (in == end)" (on fail)

This conditions are not the same as simply (in == end).

Short test reproduce this difference for targets "true" and "false" and input sequence "true" (similar to the first example):

test.cpp:

#include <iostream>
#include <sstream>
using namespace std;

int main()
{
    istringstream is("true");
    bool result;
    is >> boolalpha >> result;
    
    if(is.rdstate() & ios_base::eofbit)
        cout << "eofbit was set." << endl;
    else
        cout << "eofbit wasn't set." << endl;
    return 0;
}

andrew@Ubuntu:/mnt/hgfs/shared/temp/test$ g++ test.cpp && ./a.out
eofbit was set.
andrew@Ubuntu:/mnt/hgfs/shared/temp/test$ g++ --version
g++ (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The detailed bug description can be found at: 

http://linuxtesting.org/results/report?num=S0735
Comment 1 Paolo Carlini 2008-10-30 16:07:00 UTC
(In reply to comment #0)
> It seems that implementation always set str.eofbit when after call of the
> function (in == end).
> 
> But standard states(22.2.2.1.2, p16) that this flag should be set only when:
> "if, when seeking another character to match, it is found that (in == end)" (on
> success)
> or "if the reason for the failure was that (in == end)" (on fail)
> 
> This conditions are not the same as simply (in == end).

Frankly, I don't get it. In my reading of the standard, either when val is set or val is not set, when in == end at the end then we have eofbit. Maybe you should simply attach a testcase where the behavior is incorrect, the provided one is fine (and consistent with the general behavior for numeric parsing).
Comment 2 Paolo Carlini 2008-10-30 16:15:12 UTC
Maybe now I see, what's behind your report: you believe that, when val is set, eofbit should be set only when seeking another character to match. But that would essentially boil down to *never* set it, in that case, because val is set only when a target sequence is uniquely matched, thus we don't look beyond the last matched character.

If anything, I think this issue qualifies for a DR, I don't see compelling reasons to change the current behavior of v3, which makes sense and its consistent with the other numeric parsings.
Comment 3 Paolo Carlini 2008-10-30 16:51:34 UTC
Ok, now I see, weird.
Comment 4 Andrey Tsyvarev 2008-10-31 10:34:52 UTC
Created attachment 16595 [details]
Test, modelling example from standard exactly

("For targets true: "1" and false: "0", the input sequence "1" yields val == true and err == str.goodbit")

Output the same:

andrew@Ubuntu:/mnt/hgfs/shared/temp/test$ g++ test.cpp && ./a.out
eofbit was set.
Comment 5 Paolo Carlini 2008-10-31 10:36:31 UTC
Ok, thanks. I'm on it, patch forthcoming.
Comment 6 Andrey Tsyvarev 2008-10-31 10:48:26 UTC
(In reply to comment #2)
> Maybe now I see, what's behind your report: you believe that, when val is set,
> eofbit should be set only when seeking another character to match. But that
> would essentially boil down to *never* set it, in that case, because val is set
> only when a target sequence is uniquely matched, thus we don't look beyond the
> last matched character.

I suppose there is a situations when eofbit is set:

Assume, sequence is "fals". Because "fals" is prefix of "false", do_get() should try to obtained character after 's', but it found, that (in==end). So it should set err to (failbit | eofbit).

Here is an example of a situation when this difference between standard and current implementation may be significant.

Consider the stream, in which operation (in == end) will block process until a character is available. So, comparision (in == end) always returns false, until stream is forced to close.

For example, assume the stream is connected to terminal. So if the stream is depleted but the program performs the comparison (in == end), current process will be blocked until the user inputs another character.

For the prompt "Are you sure to perform this operation?" it is sufficient for the user to input {'y', 'e', 's'} and program will interpret this as affirmative answer.

But for current implementation user should input {'y', 'e', 's', <enter>}, because after reading "yes" do_get() performs the comparison (in == end) to verify whether eofbit should be set in err. But this call will block program until the user inputs another character, which does not affect the interpretation of the user's answer, though.
Comment 7 paolo@gcc.gnu.org 2008-10-31 16:49:09 UTC
Subject: Bug 37958

Author: paolo
Date: Fri Oct 31 16:47:48 2008
New Revision: 141498

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141498
Log:
2008-10-31  Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/37958
	* include/bits/locale_facets.tcc (num_get<>::do_get(iter_type,
	iter_type, ios_base&, ios_base::iostate&, bool&): Fix.
	* testsuite/22_locale/num_get/get/char/37958.cc: New.
	* testsuite/22_locale/num_get/get/wchar_t/37958.cc: Likewise.

Added:
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/char/37958.cc
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/wchar_t/37958.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/locale_facets.tcc

Comment 8 Paolo Carlini 2008-10-31 16:50:08 UTC
Fixed for 4.4.0.
Comment 9 Andrey Tsyvarev 2008-11-01 12:21:02 UTC
Patch seems doesn't resolve problem entirely.
The thing is that, besides of constraints on setting eofbit flag, the standard claims, that comparision (in == end) should be perfomed only when it is needed for identify a match(22.2.2.1.2, p15): 

"Successive characters in the range [in,end) (see 23.1.1) are obtained and matched against corresponding positions in the target sequences ONLY as necessary to identify a unique match. The input iterator in is compared
to end ONLY when necessary to obtain a character."

If also take into account constraints on setting eofbit flag:

"if, when seeking another character to match, it is found that (in == end)" and
"if the reason for the failure was that (in == end).",

one may conclude, that

eofbit flag should be set IF and ONLY IF comparision (in == end) was performed and it returned true.


Declaration

bool __testeof = __beg == __end;

in the patched code means, that implementation always compares (in == end) at start, even when input and target sequences are empty (""). In that case, err will be set to failbit, which conforms to the example in the standard:

"For empty targets (""), any input sequence yields err == str.failbit"

But input sequence should not be read at all in this case, so comparision (in == end) should not be performed!

There are also other cases, when implementation performs unnecessary comparision (in == end).
Comment 10 Paolo Carlini 2008-11-01 12:36:04 UTC
(In reply to comment #9)
> Declaration
> 
> bool __testeof = __beg == __end;
> 
> in the patched code means, that implementation always compares (in == end) at
> start, even when input and target sequences are empty (""). In that case, err
> will be set to failbit, which conforms to the example in the standard:
> 
> "For empty targets (""), any input sequence yields err == str.failbit"
> 
> But input sequence should not be read at all in this case, so comparision (in
> == end) should not be performed!

Ok, I'll special case this, but please let's stop the nitpicking here.
Comment 11 Andrey Tsyvarev 2008-11-01 15:52:31 UTC
It is not nitpicking. Case with empty sequences is only demonstration, that code behaviour is not confirm to the standard.

Really, it seems that in every succesfull matching it will be unnecessary comparision (in == end). For example, for target sequences "false" and "true" and input sequence "false" eofbit flag won't be set in err, but when __beg points to 'e', (++__beg == __end) in the last if-operator inside for-cycle will return true.


Comment 12 Andrey Tsyvarev 2008-11-01 16:01:59 UTC
Created attachment 16607 [details]
Test, whether there is an unnecessary comparision (in == end)

By hooking underflow() method of stringbuf, one can verify, whether do_get() implementation performs unnesessary call of (in == end).

First test in main() verify situation, when all sequences are empty.
Second and third - falsename is "false", truename is "true", input sequence is "false" or "true" correspondingly.
Comment 13 paolo@gcc.gnu.org 2008-11-01 16:19:08 UTC
Subject: Bug 37958

Author: paolo
Date: Sat Nov  1 16:17:42 2008
New Revision: 141517

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141517
Log:
2008-11-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/37958 (cont)
	* include/bits/locale_facets.tcc (num_get<>::do_get(iter_type,
	iter_type, ios_base&, ios_base::iostate&, bool&): Fix again.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/locale_facets.tcc

Comment 14 Paolo Carlini 2008-11-01 16:22:31 UTC
Ok, ok. In the future, please don't lump issues together (i.e., do not reopen PRs if the first commit fixes the original issue, file a separate one) and always provide specific testcases.
Comment 15 Andrey Tsyvarev 2008-11-01 18:43:07 UTC
I belived that the easier describe situation is the better.
Because setting flag is simpler to observe, than the fact of comparision (in == end), PR was reported about flag but comparision.

But the second patch is also not correct...

If run test in the second attachment, in the second testcase(falsename - "false", "truename" - true, input - "false") eofbit flag won't be set(it is right). But in the third testcase(input is "true") implementation set eofbit flag.
Problem is that __lim cannot be used as stopper for the cycle, because target sequences have different lenght.

If move last if in for-cycle

	      if ((!__testt && __n >= __lc->_M_falsename_size)
		  || (!__testf && __n >= __lc->_M_truename_size))
		break;

to the beginning of the cycle, all seems to be correct... except case of identical target sequences.

Later I'll try to write more presize test, and may try to write patch.
Comment 16 Paolo Carlini 2008-11-01 18:48:14 UTC
(In reply to comment #15)
> But the second patch is also not correct...

I don't see why.
 
> If run test in the second attachment, in the second testcase(falsename -
> "false", "truename" - true, input - "false") eofbit flag won't be set(it is
> right). But in the third testcase(input is "true") implementation set eofbit
> flag.

Your testcase passes, three PASS in the output.

> Problem is that __lim cannot be used as stopper for the cycle, because target
> sequences have different lenght.

Indeed, the other tests at the end of the for trigger in that case (different lengths).

> Later I'll try to write more presize test, and may try to write patch.

Yes, we need a testcase.
Comment 17 Paolo Carlini 2008-11-01 19:05:21 UTC
Ok, I see that for true in input eofbit should not be set. I'll fix this, but your testcase is wrong.
Comment 18 paolo@gcc.gnu.org 2008-11-01 22:10:53 UTC
Subject: Bug 37958

Author: paolo
Date: Sat Nov  1 22:09:43 2008
New Revision: 141523

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141523
Log:
2008-11-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/37958 (cont again)
	* include/bits/locale_facets.tcc (num_get<>::do_get(iter_type,
	iter_type, ios_base&, ios_base::iostate&, bool&): Fix again.
	* testsuite/22_locale/num_get/get/char/37958.cc: Extend.
	* testsuite/22_locale/num_get/get/wchar_t/37958.cc: Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/locale_facets.tcc
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/char/37958.cc
    trunk/libstdc++-v3/testsuite/22_locale/num_get/get/wchar_t/37958.cc

Comment 19 Andrey Tsyvarev 2008-11-06 07:16:39 UTC
Created attachment 16627 [details]
Analogue of basic_istringstream with (in == end) catching

Using basic_istringstream1 class instead basic_istringstream one can count symbols, which implementation is really read. These symbols are those, for which (in == end) or/and (*in) was performed (++in operation does not affect on this counter).

Test testsuite/22_locale/num_get/get/char/37958.cc may be extend as:

- istringstream iss0, iss1, iss2, iss3;
+ basic_istringstream1<char> iss0, iss1, iss2, iss3;
...
  iss0.str("true");
  iss0.setf(ios_base::boolalpha);
  err = ios_base::goodbit;
  end = ng0.get(iss0.rdbuf(), 0, iss0, err, b0);
  VERIFY( err == ios_base::goodbit );
  VERIFY( b0 == true );
+ VERIFY( iss0.rdbuf()->chars_read() == 4);//after character 'e' (in == end) should not be performed

  iss0.str("false");
  iss0.clear();
  err = ios_base::goodbit;
  end = ng0.get(iss0.rdbuf(), 0, iss0, err, b0);
  VERIFY( err == ios_base::goodbit );
  VERIFY( b0 == false );
+ VERIFY( iss0.rdbuf()->chars_read() == 5);
...
  iss2.str("1");
  iss2.setf(ios_base::boolalpha);
  err = ios_base::goodbit;
  end = ng2.get(iss2.rdbuf(), 0, iss2, err, b2);
  VERIFY( err == ios_base::goodbit );
  VERIFY( b2 == true );
+ VERIFY( iss2.rdbuf()->chars_read() == 1);
...
  iss3.str("");
  iss3.clear();
  b3 = true;
  err = ios_base::goodbit;
  end = ng3.get(iss3.rdbuf(), 0, iss3, err, b3);
  VERIFY( err == ios_base::failbit );
  VERIFY( b3 == false );
+ VERIFY( iss3.rdbuf()->chars_read() == 0);//stream should not be readed at all
Comment 20 Paolo Carlini 2008-11-06 11:52:50 UTC
(In reply to comment #19)
> Created an attachment (id=16627) [edit]

Thanks, as always the project is looking for capable contributors, but you feel like contributing something non-trivial you have first to file a Copyright Assignment:

  http://gcc.gnu.org/contribute.html

(if/when you request the forms make sure to CC an appropriate maintainer, like me or Benjamin Kosnik for the library), and then send the patches in the appropriate from to the appropriate mailing list. Thanks again.