[PATCH] PR18785: Support non-native execution charsets
Joseph S. Myers
joseph@codesourcery.com
Wed Dec 22 17:02:00 GMT 2004
On Wed, 22 Dec 2004, Roger Sayle wrote:
> To support "cross-environments", I propose the following changes below
> to resolve PR18785 and generally improve handling of execution character
> sets. The first is that the middle-end needs to keep track of "charset
> cross-compilation" where the execution charset is different from the
> internal source charset. This is represented below by the new global
> variable "nonnative_charset_p". When this flag is true, the contents
> of TREE_STRING etc... can't be assumed to be NUL-terminated C strings
> that can be passed to the current run-time, i.e. "strcmp". Checking
> this flag, which should be rare in practice, allows the middle-end to
> avoid optimizations that transform or precompute libc builtins at
> compile-time.
String constants are always NUL-terminated, and DR#278 / C99 TC2 have
disallowed encodings in which NUL is the first byte of a multibyte
character. (But this is target NUL, which might be wider than host NUL, on
C4x; however, we don't really allow for target strings wider than host
strings in such optimizations at present, and Stage 3 is no time to
start doing so.) String functions such as strcmp operate on strings
independent of locale, with characters treated as unsigned char; see
DR#274 / C99 TC2. (Whereas strcoll indeed can't be optimized even in the
C locale; see DR#235.)
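To illustrate the strcmp point, here is a minimal sketch of a byte-wise
comparison of the kind a compile-time fold could rely on. The name
fold_strcmp is hypothetical and this is not GCC's code; it only assumes
that both strings are NUL-terminated sequences of host-sized bytes.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): compare two NUL-terminated
   byte strings as strcmp is specified to, with each character
   interpreted as unsigned char and no reference to the locale.
   Returns -1, 0 or 1.  */
static int
fold_strcmp (const char *a, const char *b)
{
  assert (a != 0 && b != 0);

  const unsigned char *p = (const unsigned char *) a;
  const unsigned char *q = (const unsigned char *) b;

  /* Advance while the bytes match and the end has not been reached.  */
  while (*p != 0 && *p == *q)
    {
      p++;
      q++;
    }

  /* Order by the first differing byte, compared as unsigned values.  */
  return (*p > *q) - (*p < *q);
}
```

With this sketch, fold_strcmp ("\x80", "\x7f") is positive because 0x80
is compared as 128 rather than as a negative char, whatever character
those bytes stand for in the execution character set.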
So in general the optimizations based on extracting strings are valid, and
only a few need to take account of the execution character set. (The
standard doesn't seem to make any allowance for runtime variation of the
values of the basic execution character set, although the extended
character set may vary with the runtime locale, so I think isdigit can
indeed be optimized at compile time based on -fexec-charset.)
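As a rough sketch of what folding isdigit against the execution character
set could look like: the struct, the table entries and fold_isdigit are
all hypothetical names, and the digit code points (0x30 for ASCII, 0xF0
for an EBCDIC code page such as IBM1047) are assumptions made for the
example. It relies only on the C requirement that the ten decimal digits
have consecutive values in the basic execution character set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of an execution character set: only the
   code point of the digit '0' is needed for this example.  */
struct exec_charset
{
  const char *name;
  unsigned char digit_zero;
};

/* Assumed code points, for illustration only.  */
static const struct exec_charset exec_ascii = { "ASCII", 0x30 };
static const struct exec_charset exec_ebcdic = { "IBM1047", 0xF0 };

/* Decide at compile time whether byte C is a decimal digit in the
   execution character set CS.  The unsigned subtraction makes bytes
   below digit_zero wrap to a large value, so one comparison covers
   both ends of the range.  */
static bool
fold_isdigit (const struct exec_charset *cs, unsigned char c)
{
  assert (cs != 0);
  return (unsigned int) (c - cs->digit_zero) < 10u;
}
```

The same byte can give different answers: 0xF7 is a digit under the
EBCDIC entry but not under the ASCII one, which is why this fold has to
consult -fexec-charset rather than the host's isdigit.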
I think a conservative patch should only disable ctype/wctype/*printf
optimizations for non-native character sets (but not the format checking
diagnostics, though they do have the same issue, and the format strings can
be mixtures of multibyte characters and sequences of bytes interpreted as
individual bytes, in complicated ways, rather than simple multibyte
strings). The <string.h> optimizations are safe.
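The conservative policy could be expressed as a single predicate, roughly
as sketched here. The enum and may_fold_builtin are hypothetical and do
not correspond to anything in the tree; nonnative_charset_p is the flag
from the quoted proposal, passed as a parameter to keep the example
self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical grouping of builtins by how they depend on the
   execution character set; not GCC's actual classification.  */
enum builtin_class
{
  BC_STRING,   /* <string.h>: strlen, strcmp, memcpy, ...  */
  BC_CTYPE,    /* <ctype.h>: isdigit, toupper, ...  */
  BC_WCTYPE,   /* <wctype.h>: iswdigit, ...  */
  BC_PRINTF    /* *printf format-string transformations  */
};

/* Return true if builtins of class BC may be transformed or
   precomputed at compile time.  */
static bool
may_fold_builtin (enum builtin_class bc, bool nonnative_charset_p)
{
  switch (bc)
    {
    case BC_STRING:
      /* Byte-wise semantics: safe whatever the execution charset.  */
      return true;
    case BC_CTYPE:
    case BC_WCTYPE:
    case BC_PRINTF:
      /* Character values matter: only fold for a native charset.  */
      return !nonnative_charset_p;
    }
  assert (!"unknown builtin class");
  return false;
}
```

The format checking diagnostics are deliberately left out of this
predicate, since the suggestion above is not to disable them.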
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)