This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: Incorrect implementation of codecvt<char, char, mbstate_t> in codecvt.cc


Oops - the first reply went away before I had added my comments...


Kristian Spangsege wrote:


On 1/12/08, Bo Persson <bop at gmb dot dk> wrote:
Kristian Spangsege wrote:

    The implementation of codecvt<char, char, mbstate_t> is not in
    agreement with DR19 (TC). It may be due to a misinterpretation
    of the text there, and if so, the problem could also be
    present in the implementation of codecvt<wchar_t, char,
    mbstate_t> - I haven't checked that.



    '__from_next' is set equal to '__from'. According to DR19 it
    must instead be set equal to '__from_end' (the next argument
    after '__from'.)



From DR19:

        If returns noconv, internT and externT are the same type
        and the converted sequence is identical to the input
        sequence [from,from_next). to_next is set equal to to, the
        value of state is unchanged, and there are no changes to
        the values in [to, to_limit).



    The way I interpret this is as follows: If the return value is
    'noconv' then it indicates that the initial section
    [from,from_next) of the input can be used directly as output
    since the converted sequence would be identical had it been
    computed.


Yes, but as there has been no conversion at all,

It is true that no conversion has been done, but it is reasonable to say that your 'noconv' reply concerns the entire input, and as such the entire input has been accounted for, from the point of view of the facet user. In a sense the 'noconv' reply lets you process input without producing output. So...

what should the "end
of consumed input sequence" be?

It should be 'from_end', since you wish to express that it is the entire input that requires no conversion. This line of thought will also allow you to restrict your 'noconv' reply to a prefix of the presented input. DR19 directly suggests such usage by saying:
... and the
converted sequence is identical to the input sequence
[from,from_next).

But the default char-char conversion is degenerate in that it never converts anything. This is signalled by the always_noconv() function returning true.


You shouldn't be surprised that do_in() and do_out() return noconv. They always do!
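
For anyone who wants to check this on their own implementation, here is a minimal sketch. It assumes the classic locale's codecvt<char, char, mbstate_t> facet; the NoconvProbe struct and the probe_char_codecvt() helper are hypothetical names made up for this illustration, not part of any library. It records the value of from_next - from, which is the number under discussion in this thread, without assuming what a given library sets it to.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::ptrdiff_t
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::codecvt, std::use_facet, std::locale

// Hypothetical helper type for this sketch: records what out() reported.
struct NoconvProbe {
    bool always_noconv;                // result of always_noconv()
    std::codecvt_base::result result;  // return value of out()
    std::ptrdiff_t from_consumed;      // from_next - from (the disputed value)
    std::ptrdiff_t to_produced;        // to_next - to
};

// Calls out() on the classic locale's codecvt<char, char, mbstate_t>
// and reports what came back. Nothing is written anywhere else.
inline NoconvProbe probe_char_codecvt(const char* from, const char* from_end)
{
    assert(from <= from_end);

    typedef std::codecvt<char, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(std::locale::classic());

    char buffer[64];
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = from;
    char* to_next = buffer;

    NoconvProbe p;
    p.always_noconv = cvt.always_noconv();
    p.result = cvt.out(state, from, from_end, from_next,
                       buffer, buffer + sizeof buffer, to_next);
    p.from_consumed = from_next - from;
    p.to_produced = to_next - buffer;
    return p;
}
```

Calling probe_char_codecvt(text, text + 5) on a five-character string should report always_noconv as true and a result of noconv; from_consumed is the value to look at when comparing against the DR19 wording.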


Back to your comments:


Setting __from_next = __from indicates
that no characters were used.

Yes, indeed, but that is in conflict with my interpretation of DR19. Setting '__from_next = __from' is a non-statement saying that the _zero_ initial characters of input need no conversion.

But the return value is 'noconv', meaning that there is no conversion. Otherwise the return value would have been 'partial' (or worse - a failure).



I should mention that I'm not trying to persuade you that the DR19 directions are technically better than the decisions made in the GCC implementation. However, I think it is important that GCC adhere to the DR TMs. Also, the comment in the code above indicates that in this case a conscious decision was made to adhere to DR19.

Had DR19 added the following line, the GCC implementation would have
been both good (in my opinion) and conformant:
If 'do_out' returns 'noconv' and '__from_next == __from' upon
return it means that the 'no conversion' result concerns the
entire input string.
Alas, this is not what DR19 says :-)

I don't think there is a conflict. Honestly.




I think your code will have to special case 'noconv' anyway, since it will have to output the from-sequence (all of it). Other cases will have to output the to-sequence (possibly in a loop, handling partial conversions).

I don't know exactly what you mean here. Could I ask you to elaborate a bit on those points?

The 'noconv' case differs in several ways. One is that the to-sequence is empty. After calling the 'conversion', you have to output the from-sequence instead. After doing that, it is not very hard to realize that the entire from-sequence is processed.


An alternative way is to call the always_noconv() function first and, if it returns true, just skip the calls to in() and out().
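
To make the two approaches concrete, here is a rough sketch of a caller that handles both. It is only an illustration under stated assumptions: write_converted() is a hypothetical helper, the std::string sink stands in for whatever the real output is, and the 16-byte buffer size is arbitrary. The function takes the always_noconv() shortcut when it can; otherwise it loops over out() and treats 'noconv' as "output the from-sequence, all of it", without looking at from_next.

```cpp
#include <cassert>   // assert
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::codecvt, std::use_facet, std::locale
#include <string>    // std::string

// Hypothetical helper: converts [from, from_end) with cvt and appends the
// external characters to sink. Returns false on a conversion error.
inline bool write_converted(const std::codecvt<char, char, std::mbstate_t>& cvt,
                            const char* from, const char* from_end,
                            std::string& sink)
{
    typedef std::codecvt_base base;
    assert(from <= from_end);

    // Shortcut: the facet never converts, so skip out() and copy the input.
    if (cvt.always_noconv()) {
        sink.append(from, from_end);
        return true;
    }

    std::mbstate_t state = std::mbstate_t();
    char buffer[16];
    char* const to = buffer;

    while (from != from_end) {
        const char* from_next = from;
        char* to_next = to;
        const base::result r = cvt.out(state, from, from_end, from_next,
                                       to, to + sizeof buffer, to_next);
        if (r == base::noconv) {
            // The to-sequence is empty: output the from-sequence instead,
            // all of it, whatever from_next was set to.
            sink.append(from, from_end);
            return true;
        }
        if (r == base::error)
            return false;

        // ok or partial: output what was produced, then continue.
        sink.append(to, to_next);
        if (from_next == from && to_next == to)
            return false;  // no progress at all; give up rather than spin
        from = from_next;
    }
    return true;
}
```

With the classic locale's facet the shortcut branch is the one taken, so the input is copied through unchanged. The loop only matters for a derived facet that overrides do_always_noconv() and do_out().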


Bo Persson


