This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] RFC: On-demand locations within string-literals
- From: David Malcolm <dmalcolm at redhat dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Wed, 20 Jul 2016 15:38:23 -0400
- Subject: Re: [PATCH] RFC: On-demand locations within string-literals
- References: <1468014566-40305-1-git-send-email-dmalcolm@redhat.com>
On Fri, 2016-07-08 at 17:49 -0400, David Malcolm wrote:
[...]
> Also, this patch currently makes the assumption (in charset.c)
> that there's a 1:1 correspondence between bytes in the source
> character set and bytes in the execution character set. This can
> be the case if both are, say, UTF-8, but might not hold in
> general.
>
> The source char set is UTF-8 or UTF-EBCDIC, and safe-ctype.c has:
>
> # if HOST_CHARSET == HOST_CHARSET_EBCDIC
> #error "FIXME: write tables for EBCDIC"
>
> so presumably we don't actually have any hosts that support EBCDIC
> (do we?); as far as I can tell, we only currently support UTF-8
> as the source char set.
>
> Similarly, do we support any targets for which the execution
> character set is *not* UTF-8?
I brought this up in this thread on the gcc mailing list:
"gcc/libcpp: non-UTF-8 source or execution encodings?"
https://gcc.gnu.org/ml/gcc/2016-07/msg00091.html
and in particular:
https://gcc.gnu.org/ml/gcc/2016-07/msg00106.html
It's possible to select the execution character set at the command
line for C-family frontends using:
-fexec-charset=
-fwide-exec-charset=
e.g. "-fexec-charset=IBM1047" will give one of the variants of EBCDIC.
Given that the internal interface already has a failure mode, I'm
thinking that a reasonable restriction is to only support locations
within string literals for the case where source character set ==
execution character set, and hence we have "convert_no_conversion" as
the converter. Does that sound sane? (I can write test coverage for
this).
[...]