This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] RFC: On-demand locations within string-literals

From: David Malcolm <dmalcolm at redhat dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Wed, 20 Jul 2016 15:38:23 -0400
Subject: Re: [PATCH] RFC: On-demand locations within string-literals
Authentication-results: sourceware.org; auth=none
References: <1468014566-40305-1-git-send-email-dmalcolm@redhat.com>

On Fri, 2016-07-08 at 17:49 -0400, David Malcolm wrote:
[...]

> Also, this patch currently makes the assumption (in charset.c)
> that there's a 1:1 correspondence between bytes in the source
> character set and bytes in the execution character set.  This can
> be the case if both are, say, UTF-8, but might not hold in
> general.
> 
> The source char set is UTF-8 or UTF-EBCDIC, and safe-ctype.c has:
> 
> # if HOST_CHARSET == HOST_CHARSET_EBCDIC
>   #error "FIXME: write tables for EBCDIC"
> 
> so presumably we don't actually have any hosts that supports EBCDIC
> (do we?); as far as I can tell, we only currently support UTF-8
> as the source char set.
> 
> Similarly, do we support any targets for which the execution
> character set is *not* UTF-8?

I brought this up in this thread on the gcc mailing list:
"gcc/libcpp: non-UTF-8 source or execution encodings?"
  https://gcc.gnu.org/ml/gcc/2016-07/msg00091.html
and in particular:
  https://gcc.gnu.org/ml/gcc/2016-07/msg00106.html
it's possible to select the execution char set using at the command
-line for C-family frontends using:
  -fexec-charset=
  -fwide-exec-charset=
e.g. "-fexec-charset=IBM1047" will give one of the variants of EBCDIC.

Given that the internal interface already has a failure mode, I'm
thinking that a reasonable restriction is to only support locations
within string literals for the case where source character set ==
execution character set, and hence we have "convert_no_conversion" as
the converter.  Does that sound sane?  (I can write test coverage for
this).

[...]

Follow-Ups:
- Re: [PATCH] RFC: On-demand locations within string-literals
  - From: Jeff Law

References:
- [PATCH] RFC: On-demand locations within string-literals
  - From: David Malcolm

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]