Bug 45280 - Stream parsing of digit-then-e (with no exponent) now fails
Summary: Stream parsing of digit-then-e (with no exponent) now fails
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.4.3
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2010-08-13 20:09 UTC by Lucian Smith
Modified: 2010-08-13 21:41 UTC



Description Lucian Smith 2010-08-13 20:09:29 UTC
In past versions of the C++ standard library, if I streamed a string like "59e" into a double, it would set the double to 59 and leave the get pointer positioned after the 'e'.  This meant I had to check whether the last char read was an 'e' and, if so, back up, but that was OK.

Something changed (and I wish I could tell you when, but I don't know--probably in my upgrade from Ubuntu 9.04 to 10.04) so now when I do the same thing, the stream parser sets the 'fail' bit and sets the double to 0.  Here's a test program:


#include <sstream>
#include <iostream>
using namespace std;

int main(int argc, char** argv)
{
  stringstream withx, withe;
  withx << "59x";
  withe << "59e";
  double num;
  char cc;
  withx >> num;
  cc = withx.get();
  cout << "From the string '59x', the streamed number is " << num << ", and the next character is '" << cc << "'" << endl;
  withe >> num;
  cc = withe.get();
  cout << "From the string '59e', the streamed number is " << num << ", and the next character is '" << cc << "'" << endl;
  cout << "tellg: " << withe.tellg() << endl;
  cout << "tellp: " << withe.tellp() << endl;
  cout << "eof? " << withe.eof() << endl;
  cout << "fail?" << withe.fail() << endl;
  cout << "bad?" << withe.bad() << endl;
}

which outputs:


From the string '59x', the streamed number is 59, and the next character is 'x'
From the string '59e', the streamed number is 0, and the next character is '&#65533;'
tellg: -1
tellp: -1
eof? 0
fail?1
bad?0

Now, ideally, parsing '59e' should behave exactly the same as parsing '59x'--it would notice that there's no exponent, decide the 'e' wasn't part of the number, back up, export '59', and the next result of 'get' would return 'e'.  But barring that, going back to the old behavior of not failing and returning 59, even if the get position is post-e, would be great.

I apologize for not testing this in more recent builds--I did check the changelog and the list of bug fixes, and didn't find this in there, though I certainly could have missed it.
Comment 1 Paolo Carlini 2010-08-13 20:15:23 UTC
Yes, this is intended. We even have testcases about that.
Comment 2 Lucian Smith 2010-08-13 20:22:55 UTC
Is the reasoning explained somewhere?
Comment 3 Paolo Carlini 2010-08-13 20:23:16 UTC
By the way, if you read 22.2.3.1 in C++98, it's clear that 'e' is *not* just any other character: after 'e', a sign is optional but at least a digit is compulsory.
Comment 4 Lucian Smith 2010-08-13 20:34:38 UTC
Yes, exactly!  Which is why the 'e' should not be parsed at all unless there is an optional sign and a compulsory digit following it.  The 'e' in general is not compulsory.  '59' is a valid double.

The context is that I am parsing (among other things) chemical reactions.  It is perfectly valid to have something like:

2EtOH -> EtOHX
Comment 5 Lucian Smith 2010-08-13 20:56:35 UTC
Followup:  This still fails even if you're trying to pipe it into an integer and not a double.  Integers, as per 22.2.3.1 in C++98, do not have an optional 'e' after them.  (Though of course you could *cast* a floating point value to an integer.)  Given that, I hope you won't mind if I reopen the bug.
Comment 6 Paolo Carlini 2010-08-13 21:00:26 UTC
You are of course wrong. Parsing something like "59e" as an integer type of course succeeds and gives "59". Really, we have *tons* of testcases about that in the testsuite. We know what we are doing ;)
Comment 7 Lucian Smith 2010-08-13 21:14:26 UTC
You're right!  Sorry; I apparently jumped to a conclusion while testing (but I did test!)

I still disagree that an 'e' with no digit following can be reasonably construed as part of an improperly-formatted float, and think it should instead be considered not part of the float at all, since properly-formatted floats have both the e and the digit.  I think you are erring on the side of assuming an error when a valid interpretation exists.
Comment 8 Paolo Carlini 2010-08-13 21:20:53 UTC
I'm not erring. We changed this behavior on purpose, after having also checked that *2* other, completely independent, implementations agree (i.e., Dinkumware and Roguewave).
Comment 9 Lucian Smith 2010-08-13 21:40:15 UTC
Fair enough!  I still disagree, but I guess my task now is to get Dinkumware and Roguewave to change their implementations, and come back.  I don't suppose you'd be swayed by Microsoft?  I didn't think so ;-)
Comment 10 Lucian Smith 2010-08-13 21:41:20 UTC
Whoops, duh, Dinkumware is MS.  Never mind, it was a dumb joke anyway.