This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



questions about new multibyte character support in EGCS/GCC2


In July multibyte character support was added to EGCS, and these
changes recently got folded into GCC2.  E.g. strings can now contain
Shift-JIS text, which was formerly troublesome because Shift-JIS uses
'\' bytes as part of the encoding of some Japanese characters.
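
To make the Shift-JIS problem concrete, here is a minimal sketch (not
the EGCS code; both function names are made up for illustration) of a
byte-wise search for the closing quote of a string literal, next to a
variant that skips the trail byte of each two-byte Shift-JIS
character.  It assumes the usual Shift-JIS lead-byte ranges 0x81-0x9F
and 0xE0-0xFC.

```c
#include <stddef.h>

/* Assumed Shift-JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC. */
int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: s points just past the opening quote.
   Treats every '\' byte as the start of an escape sequence.
   Returns the index of the closing '"', or -1 if none is found. */
long find_close_naive(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\\') { i++; continue; }  /* skip escaped byte */
        if (s[i] == '"') return (long)i;
    }
    return -1;
}

/* Hypothetical helper: same search, but a Shift-JIS lead byte
   consumes the following trail byte, so a 0x5C trail byte is not
   mistaken for a backslash. */
long find_close_sjis(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (is_sjis_lead(s[i])) { i++; continue; }  /* skip trail byte */
        if (s[i] == '\\') { i++; continue; }        /* skip escaped byte */
        if (s[i] == '"') return (long)i;
    }
    return -1;
}
```

For the three bytes 0x83 0x5C '"' (one Shift-JIS character whose trail
byte is 0x5C, followed by the closing quote), the naive search takes
0x5C for a backslash, skips the quote, and returns -1; the
Shift-JIS-aware search returns 2.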

I'm looking into adding draft-C9x support to the C preprocessor and
lexer.  Among other things, draft C9x specifies the relationship
between multibyte chars and \u escapes.  I have some questions about
the EGCS/GCC2 multibyte support, though.

* As far as I can tell, the multibyte functionality isn't documented;
  is this intentional?  Is it documented somewhere outside the EGCS
  distribution?

* The cccp.c startup code currently looks like this:

    literal_codeset = getenv ("LANG");

  but the usual convention in other programs is to consult LC_ALL
  first, then LC_CTYPE, and LANG last of all.  Why are LC_ALL and
  LC_CTYPE being ignored here?

* mbchar.c supports the quasi-LC_CTYPE locales "C-SJIS", "C-EUCJP",
  and "C-JIS".  Apparently you are supposed to set LANG to one of
  these values if you want to use this functionality; with an
  ordinary value for LANG (e.g. "ja" in Solaris) you get the system
  locale's interpretation instead.  Are the "C-*" quasi-locales meant
  for cross-compiling or something like that?  Is anyone actually
  using this undocumented functionality?

  It seems awkward to usurp LANG for something that is not strictly
  locale-related.  If this functionality is needed, perhaps it should
  be a compiler option instead?  Another possibility might be to use a
  different environment variable (e.g. CROSS_LANG) but allow it to use
  the same values as LANG.  If the functionality is not needed, it might
  be simpler to rename local_mblen to mblen, which would bypass the need
  for separately maintained multibyte functions; one could simply use
  the system functions.

* It appears to me that the multibyte lexing code could be sped up quite
  a bit by using the draft C9x multibyte functions, if available.  Any
  thoughts before I start hacking in this direction?
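
For reference, the LC_ALL, LC_CTYPE, LANG lookup order raised in the
second question above could be sketched roughly as follows.  The
function name ctype_locale_name is hypothetical and is not something
in cccp.c; treating an empty variable as unset and falling back to
"C" are assumptions taken from the usual POSIX convention.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of cccp.c: return the name of the
   locale that should govern character classification, consulting
   LC_ALL, then LC_CTYPE, then LANG.  An unset or empty variable is
   skipped (assumed POSIX convention); "C" is the fallback. */
const char *ctype_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";
}
```

The startup code could then assign the result of such a helper to
literal_codeset instead of calling getenv ("LANG") directly.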

