This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Incorrect implementation of codecvt<char, char, mbstate_t> in codecvt.cc
- From: "Kristian Spangsege" <kristian dot spangsege at gmail dot com>
- To: libstdc++ at gcc dot gnu dot org, bop at gmb dot dk
- Date: Sat, 12 Jan 2008 18:22:58 +0100
- Subject: Re: Incorrect implementation of codecvt<char, char, mbstate_t> in codecvt.cc
- References: <19c077830801120131r5d4742abv6c51ce07890c195b@mail.gmail.com> <19c077830801120814y1980d73ev8b90cf926c7e008e@mail.gmail.com>
On 1/12/08, Bo Persson <bop at gmb dot dk> wrote:
> Kristian Spangsege wrote:
>
> The implementation of codecvt<char, char, mbstate_t> is not in
> agreement with DR19 (TC). It may be due to a misinterpretation of
> the text there, and if so, the problem could also be present in the
> implementation of codecvt<wchar_t, char, mbstate_t> - I haven't
> checked that.
>
>
> From the latest SVN version of "codecvt.cc":
>
> codecvt_base::result
> codecvt<char, char, mbstate_t>::
> do_out(state_type&, const intern_type* __from,
>        const intern_type*, const intern_type*& __from_next,
>        extern_type* __to, extern_type*,
>        extern_type*& __to_next) const
> {
>   // _GLIBCXX_RESOLVE_LIB_DEFECTS
>   // According to the resolution of DR19, "If returns noconv [...]
>   // there are no changes to the values in [to, to_limit)."
>   __from_next = __from;
>   __to_next = __to;
>   return noconv;
> }
>
>
>
> '__from_next' is set equal to '__from'. According to DR19 it must
> instead be set equal to '__from_end' (the next argument after
> '__from'.)
>
>
>
> From DR19:
>
> If returns noconv, internT and externT are the same type and the
> converted sequence is identical to the input sequence
> [from,from_next). to_next is set equal to to, the value of state is
> unchanged, and there are no changes to the values in [to,
> to_limit).
>
>
>
> The way I interpret this is as follows: If the return value is
> 'noconv' then it indicates that the initial section [from,from_next)
> of the input can be used directly as output since the converted
> sequence would be identical had it been computed.
>
>
> Yes, but as there has been no conversion at all,
It is true that no conversion has been done, but it is reasonable to
say that your 'noconv' reply concerns the entire input, and as such
the entire input has been accounted for, from the point of view of the
facet user. In a sense the 'noconv' reply lets you process input
without producing output. So...
> what should the "end
> of consumed input sequence" be?
It should be 'from_end', since you wish to express that it is the
entire input that requires no conversion. This line of thought will
also allow you to restrict your 'noconv' reply to a prefix of the
presented input. DR19 directly suggests such usage by saying:
> ... and the
> converted sequence is identical to the input sequence
> [from,from_next).
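To make the prefix case concrete, here is a minimal sketch under this reading of DR19. The facet 'crlf_codecvt' and the helper 'write_all' are made-up names for illustration and are not part of libstdc++; the facet answers 'noconv' for a leading run of input that needs no change and converts only the newline that follows it.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Hypothetical facet, not part of libstdc++: turns '\n' into "\r\n" and
// answers 'noconv' for any leading run of characters that needs no change.
struct crlf_codecvt : char_codecvt
{
protected:
  result
  do_out(state_type&, const intern_type* from, const intern_type* from_end,
         const intern_type*& from_next, extern_type* to,
         extern_type* to_end, extern_type*& to_next) const override
  {
    from_next = from;
    to_next = to;
    if (from != from_end && *from == '\n')
      {
        if (to_end - to < 2)
          return partial;               // no room for "\r\n"
        *to_next++ = '\r';
        *to_next++ = '\n';
        ++from_next;
        return from_next == from_end ? ok : partial;
      }
    // [from, from_next) is the prefix that can be used directly as output.
    while (from_next != from_end && *from_next != '\n')
      ++from_next;
    return noconv;
  }
};

// Hypothetical caller that takes 'noconv' to cover [from, from_next).
std::string
write_all(const char_codecvt& cvt, const std::string& in)
{
  std::mbstate_t state = std::mbstate_t();
  std::string out;
  char buf[8];
  const char* from = in.data();
  const char* const from_end = from + in.size();
  while (from != from_end)
    {
      const char* from_next = from;
      char* to_next = buf;
      const std::codecvt_base::result r =
        cvt.out(state, from, from_end, from_next, buf, buf + sizeof buf,
                to_next);
      assert(from <= from_next && from_next <= from_end);
      if (r == std::codecvt_base::error)
        throw std::runtime_error("conversion failed");
      if (r == std::codecvt_base::noconv)
        out.append(from, from_next);    // input prefix doubles as output
      else
        out.append(buf, to_next);       // ok or partial: converted output
      if (from_next == from)
        throw std::runtime_error("no input consumed");
      from = from_next;
    }
  return out;
}
```

Calling write_all(crlf_codecvt(), text) copies the unchanged runs straight from the input and takes only the line endings from the conversion buffer.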
Back to your comments:
> Setting __from_next = __from indicates
> that no characters were used.
Yes, indeed, but that is in conflict with my interpretation of DR19.
Setting '__from_next = __from' is a non-statement saying that the
_zero_ initial characters of input need no conversion.
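The effect on a caller can be sketched as follows. This is only an illustration: 'stalled_noconv' is a hypothetical facet that mirrors the quoted do_out, and 'consumes_input' is a hypothetical helper with a call limit so that the demonstration terminates instead of looping forever.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Hypothetical facet mirroring the quoted do_out: from_next = from.
struct stalled_noconv : char_codecvt
{
protected:
  result
  do_out(state_type&, const intern_type* from, const intern_type*,
         const intern_type*& from_next, extern_type* to, extern_type*,
         extern_type*& to_next) const override
  {
    from_next = from;
    to_next = to;
    return noconv;
  }
};

// Hypothetical helper: runs a loop that treats everything before from_next
// as handled, and reports whether the input was used up within max_calls.
bool
consumes_input(const char_codecvt& cvt, const char* from,
               const char* from_end, std::size_t max_calls)
{
  std::mbstate_t state = std::mbstate_t();
  char buf[8];
  for (std::size_t calls = 0; from != from_end && calls < max_calls; ++calls)
    {
      const char* from_next = from;
      char* to_next = buf;
      cvt.out(state, from, from_end, from_next, buf, buf + sizeof buf,
              to_next);
      assert(from <= from_next && from_next <= from_end);
      from = from_next;   // zero characters accounted for means no progress
    }
  return from == from_end;
}
```

With this facet and non-empty input, consumes_input returns false whatever limit is chosen, because the input position never moves.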
I should mention that I'm not trying to persuade you that the DR19
directions are technically better than the decisions made in the GCC
implementation. However, I think it is important that GCC adhere to the
DRs. Also, the comment in the code above indicates that in this
case a conscious decision was made to adhere to DR19.
Had DR19 added the following line, the GCC implementation would have
been both good (in my opinion) and conformant:
> If 'do_out' returns 'noconv' and '__from_next == __from' upon return it means that the 'no conversion' result concerns the entire input string.
Alas, this is not what DR19 says :-)
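For comparison, a caller written against that hypothetical extra rule might look like the sketch below. The name 'out_with_extra_rule' is made up, and the special case in the middle is exactly the part that DR19 does not grant.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Hypothetical caller: assumes the extra rule, which is NOT in DR19.
std::string
out_with_extra_rule(const char_codecvt& cvt, const std::string& in)
{
  std::mbstate_t state = std::mbstate_t();
  std::string out;
  char buf[8];
  const char* from = in.data();
  const char* const from_end = from + in.size();
  while (from != from_end)
    {
      const char* from_next = from;
      char* to_next = buf;
      const std::codecvt_base::result r =
        cvt.out(state, from, from_end, from_next, buf, buf + sizeof buf,
                to_next);
      assert(from <= from_next && from_next <= from_end);
      if (r == std::codecvt_base::error)
        throw std::runtime_error("conversion failed");
      if (r == std::codecvt_base::noconv)
        {
          if (from_next == from)
            from_next = from_end;       // extra rule: entire input
          out.append(from, from_next);
        }
      else
        out.append(buf, to_next);       // ok or partial: converted output
      if (from_next == from)
        throw std::runtime_error("no input consumed");
      from = from_next;
    }
  return out;
}
```

Such a caller terminates whether the facet leaves from_next at from or moves it to from_end, but it can no longer tell an empty 'noconv' prefix from a reply about the whole input.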
>
>
> The problem with the implementation above is that if I implement my
> application to conform with DR19, then it will enter an infinite
> loop because the input position '__from_next' is not advanced.
>
>
>
> I think your code will have to special case 'noconv' anyway, since it
> will have to output the from-sequence (all of it). Other cases will
> have to output the to-sequence (possibly in a loop, handling partial
> conversions).
I don't know exactly what you mean here. Could I ask you to elaborate
a bit on those points?
Regards,
Kristian