Two possible lines of attack for fixing the libstdc++/2071 test case

Loren James Rittle rittle@latour.rsch.comm.mot.com
Wed May 9 23:59:00 GMT 2001


By my reading, this program is portable according to the ISO C
standard, even when stdin is interactive or a pipe:

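#include <stdio.h>

/* With "1234" on stdin, this prints "1 2 6". */
int main (void)
{
  char buf;
  char buf2;
  char buf3;

  fread (&buf, 1, 1, stdin);           /* read the first byte */
  ungetc ((unsigned char) buf, stdin); /* push it back */
  getc (stdin);                        /* re-read the pushed-back byte */
  fread (&buf2, 1, 1, stdin);          /* read the second byte */
  ungetc ('6', stdin);                 /* push back a different character */
  fread (&buf3, 1, 1, stdin);          /* must return the pushed-back '6' */

  printf ("%c %c %c\n", buf, buf2, buf3);
  return 0;
}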

I think I can make the argument that this program is portable since
the standard clearly explains not only when ungetc() may fail but also
when it must succeed and exactly what happens to the state of the
stream when ungetc() is used.  The wording around fseek() leaves much
more room for doubt, and our own experience shows that fseek() may
fail at the whim of the C library implementation.
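To make those ungetc() guarantees concrete, here is a minimal C sketch.
The helper names reread_after_ungetc and ungetc_clears_eof are
hypothetical, not part of any library.  It exercises two behaviors that
ISO C specifies: one character of pushback always succeeds, and a
successful ungetc() clears the end-of-file indicator.  Only standard
<stdio.h> calls are used.

```c
#include <stdio.h>

/* Hypothetical helper: read one byte, push it back with ungetc(), and
   read it again.  Returns 1 if the re-read byte matches, 0 on EOF or
   if ungetc() fails.  Only the single pushback that ISO C guarantees
   is used. */
static int reread_after_ungetc(FILE *fp)
{
    int c = getc(fp);
    if (c == EOF)
        return 0;
    if (ungetc(c, fp) == EOF)   /* ungetc() returns EOF only on failure */
        return 0;
    return getc(fp) == c;       /* the pushed-back byte comes back first */
}

/* Hypothetical helper: drain fp to end-of-file, then check that a
   successful ungetc() clears the end-of-file indicator and that the
   pushed-back character is the next one read. */
static int ungetc_clears_eof(FILE *fp)
{
    while (getc(fp) != EOF)
        ;                       /* consume everything that is left */
    if (!feof(fp))
        return 0;               /* stopped on a read error, not EOF */
    if (ungetc('x', fp) == EOF)
        return 0;
    return !feof(fp) && getc(fp) == 'x';
}
```

Either helper can be pointed at a stream from tmpfile() to check a given
C library; pointing it at stdin fed by a pipe or a terminal is closer to
the libstdc++/2071 scenario.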

It is also portable in existing practice.  These are the results on
all tested platforms (same host list as used in the analysis report):

; a.out 
1234<CR>
1 2 6

; echo 1234|a.out 
1 2 6

I also consulted the various system man pages.  Solaris guarantees at
least 4 consecutive ungetc() calls (presumably so that a wide
character can always be pushed back); BSD and OSF1 are limited only by
available memory.  I assume that glibc has no limit either.
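Rather than relying only on the man pages, a rough probe of the
pushback depth on a given host might look like the minimal sketch
below.  probe_pushback_depth is a hypothetical helper; it assumes only
that ungetc() returns EOF when the implementation has no more pushback
room, which is what ISO C specifies for a failed call.

```c
#include <stdio.h>

/* Hypothetical probe: count how many consecutive ungetc() calls fp
   accepts, stopping at max.  ISO C guarantees only one; anything more
   is implementation behavior.  The pushed-back characters are then
   read again, so the stream ends up where it started. */
static int probe_pushback_depth(FILE *fp, int max)
{
    int n = 0;
    while (n < max && ungetc('a' + n % 26, fp) != EOF)
        n++;
    for (int i = 0; i < n; i++)
        (void) getc(fp);        /* consume what was pushed back */
    return n;
}
```

Any result above 1 is implementation behavior that portable code cannot
depend on.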

It is straightforward to replace an fseek(), which is only attempting
to undo the last fread(), with a series of ungetc() calls.  I have no
idea which is less efficient on most platforms: fseek() [which may
translate to multiple OS calls with my C library!] or (possibly
multiple) ungetc() [which is implemented entirely in user space].
[Aside: multiple ungetc() calls are known to be lightning fast on some
platforms, especially when the character matches the one fetched in
its place and/or only one pushback spot is used.]  But I think "making
it right" supersedes "making it fast" at this point.  That said, I
would agree that this workaround should only be made for platforms
that absolutely require it.  glibc, OSF1 and other hosts that do not
mind an fseek() on any type of stream should not have to suffer.
However, if fseek() always translates to one or more OS calls on those
platforms, then someone might care about this issue beyond just making
it work.
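The fseek()-to-ungetc() replacement described above could be sketched
roughly as follows.  This is not the patch itself: unread_tail is a
hypothetical helper, and it assumes the caller still has the bytes that
fread() returned.  It pushes the bytes back in reverse order so that
they are re-read in their original order, and it reports how many
pushbacks the C library accepted, since ISO C only guarantees one.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: give back the last n of the len bytes that a
   previous fread() stored in data, using ungetc() instead of
   fseek(fp, -(long) n, SEEK_CUR).  Bytes are pushed back last-first
   so that they are read again in their original order.  Returns the
   number of bytes actually pushed back; ISO C only guarantees that
   one ungetc() succeeds, so callers must check the result. */
static size_t unread_tail(FILE *fp, const unsigned char *data,
                          size_t len, size_t n)
{
    size_t pushed = 0;

    if (n > len)
        n = len;
    while (pushed < n) {
        if (ungetc(data[len - 1 - pushed], fp) == EOF)
            break;              /* implementation ran out of pushback */
        pushed++;
    }
    return pushed;
}
```

A caller would do got = fread(buf, 1, sizeof buf, fp) followed by
unread_tail(fp, buf, got, k), and fall back to fseek() (or report an
error) if fewer than k bytes were pushed back.  Because the tail is
pushed back first, a partial result still leaves the stream in a
consistent state.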

Unfortunately, the abstraction layer used in libstdc++-v3 doesn't
provide an entry point that maps to ungetc().  However, this can be
worked around with an implementation-visible entry point (similar to
the completely non-standard __basic_file<_CharT>::sys_open()).

OK, that is the line of attack I have in basic patch form.  I am now
working out how to ensure that only the hosts that need it get it,
and only when they really need it.  Given that an interactive test
must be run to spot the problem on a host, I don't know how autoconf
could be used here.  Thus, I am leaning towards requiring ports that
need the fix to define a macro in our implementation namespace within
their os.h file.  The question of when the fix is needed is trickier.
Since ISO C only guarantees that one pushback spot is available, one
can argue that underflow() should employ this fix only when the
"buffer size" is one.
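One way the per-port macro and the buffer-size condition could combine
is sketched below.  Both the macro HYPOTHETICAL_UNGETC_UNDERFLOW and the
function undo_read are invented for illustration and are not
libstdc++-v3 code; the sketch only shows the decision: ungetc() is used
when the port opts in and exactly one byte has to be given back, and
the existing fseek() approach is kept otherwise.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical port switch (name invented for illustration): a port
   whose C library rejects fseek() on pipes or terminals would define
   it to 1 in its os.h. */
#ifndef HYPOTHETICAL_UNGETC_UNDERFLOW
#define HYPOTHETICAL_UNGETC_UNDERFLOW 0
#endif

/* Hypothetical helper: undo the last n bytes read from fp into data.
   The ungetc() path is taken only when the port asks for it and n is
   one, matching the single pushback spot ISO C guarantees; otherwise
   fseek() is used as before.  Returns 0 on success, -1 on failure. */
static int undo_read(FILE *fp, const unsigned char *data, size_t n)
{
    if (n == 0)
        return 0;
    if (HYPOTHETICAL_UNGETC_UNDERFLOW && n == 1)
        return ungetc(data[0], fp) == EOF ? -1 : 0;
    return fseek(fp, -(long) n, SEEK_CUR) == 0 ? 0 : -1;
}
```

Because the macro defaults to 0, hosts such as glibc or OSF1 would keep
the fseek() path unless their os.h says otherwise.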

Here is another observation, i.e. a second line of attack.  While
more aggressive and perhaps not entirely legal, it sounds nifty at
first thought and might solve many other perceived IO performance
problems when C++ IO sits on top of stdio.  Unless explicitly released
from the contract, C++ IO is supposed to be synchronized with C stdio
on a per-character basis.  However, if we assume that this guarantee
holds only when one IO style or the other, but not both, is used
between any two sequence points, I think we could do much better with
some amount of internal library re-architecture.  That, however, is
well outside the scope of a simple bug fix...

Patch to follow; comments on the technique being applied are welcome.

Regards,
Loren
