This is the mail archive of the
mailing list for the GCC project.
Re: gcc/libcpp: non-UTF-8 source or execution encodings?
On Tue, 2016-07-19 at 12:24 -0400, David Edelsohn wrote:
> On Tue, Jul 19, 2016 at 12:05 PM, David Malcolm <firstname.lastname@example.org> wrote:
> > libcpp/charset.c has a helpful introductory comment describing character
> > sets, including the source and execution character sets.
> > libcpp appears to attempt to support both UTF-8 and UTF-EBCDIC for the
> > source character set, via:
> > #if HOST_CHARSET == HOST_CHARSET_ASCII
> > #define SOURCE_CHARSET "UTF-8"
> > #define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0x7e
> > #elif HOST_CHARSET == HOST_CHARSET_EBCDIC
> > #define SOURCE_CHARSET "UTF-EBCDIC"
> > #define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0xFF
> > #else
> > #error "Unrecognized basic host character set"
> > #endif
> > though libiberty's safe-ctype.c has:
> > # if HOST_CHARSET == HOST_CHARSET_EBCDIC
> > #error "FIXME: write tables for EBCDIC"
> > so presumably we only effectively support UTF-8 as the source char set.
> > Do we support any hosts for which the source character set is *not*
> > UTF-8?
> > Similarly, do we support any targets for which the execution character
> > set is *not* UTF-8?
> > This relates to the locations-within-string-literals patch I posted here:
> > https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00441.html
> > ("[PATCH] RFC: On-demand locations within string-literals"); that patch
> > currently has an assumption that the source encoding == execution
> > encoding, and I'd appreciate knowing a configuration for which this
> > isn't the case so I can test accordingly.
> I believe that the GCC z/TPF configuration uses EBCDIC. There is also
> the on-again off-again i370 port.
> Thanks, David
Thanks. Looks like the triple for the former is "s390x-ibm-tpf"; I'm
experimenting with that as the target.
Is there any accessible hardware for these? I don't see them in the
gcc compile farm.
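
As an illustration of the "source encoding != execution encoding" case
discussed above, here is a minimal sketch that can be tried on an ordinary
ASCII host without EBCDIC hardware. It is not taken from libcpp or from the
patch; hex_dump is a hypothetical helper introduced only for this example.
It assumes GCC's -fexec-charset option and a charset name (for instance
IBM1047) that the host's iconv recognizes. The helper deliberately builds
its output from numeric ASCII codes rather than character literals, so its
own output is not re-encoded when the execution charset is changed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GCC): write the n bytes at s into
   out as lowercase hex pairs separated by single spaces, followed by a
   terminating zero byte.  out must have room for at least 3*n + 1 bytes.

   Output characters are produced from numeric ASCII codes (0x30 = '0',
   0x61 = 'a', 0x20 = space) instead of character literals, so the
   result stays readable on an ASCII terminal even if literals in this
   file are translated to another execution charset.  */
void hex_dump(const char *s, size_t n, char *out)
{
    size_t i, pos = 0;

    for (i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        unsigned hi = b >> 4;
        unsigned lo = b & 0x0f;

        if (i > 0)
            out[pos++] = 0x20;  /* ASCII space between byte pairs */
        out[pos++] = (char)(hi < 10 ? 0x30 + hi : 0x61 + (hi - 10));
        out[pos++] = (char)(lo < 10 ? 0x30 + lo : 0x61 + (lo - 10));
    }
    out[pos] = 0;
}
```

Called as hex_dump("AB", 2, buf) and printed with puts(buf), a default build
gives "41 42". Built with -fexec-charset=IBM1047, the same call should give
"c1 c2" instead, since the literal in the source is still ASCII/UTF-8 but the
bytes stored in the object file are EBCDIC. That is the kind of mismatch a
byte-offset-based mapping from string contents back to source locations has
to account for.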