This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Handle wide-chars in native_encode_string
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 4 Sep 2017 14:52:25 +0000
- Subject: Re: [PATCH] Handle wide-chars in native_encode_string
- References: <alpine.LSU.2.20.1709041617300.14191@zhemvz.fhfr.qr>
On Mon, 4 Sep 2017, Richard Biener wrote:
> always have a consistent "character" size and how the individual
> "characters" are encoded. The patch assumes that the array element
> type of the STRING_CST can be used to get access to individual
> characters by means of the element type size and those elements
> are stored in host byte order. Which means the patch simply handles
It's actually target byte order, i.e. the STRING_CST stores the same
sequence of target bytes as would appear on the target system (modulo
certain strings such as in asm statements and attributes, for which
translation to the execution character set is disabled because those
strings are only processed in the compiler on the host, not on the target
- but you should never encounter such strings in the optimizers etc.).
This is documented in generic.texi (complete with a warning about how it's
not well-defined what the encoding is if target bytes are not the same as
host bytes).
I suspect that, generically in the compiler, the use of C++ might make it
easier than it would have been some time ago to build some abstractions
around target strings that work for all of narrow strings, wide strings,
char16_t strings etc. Such abstractions would extract individual elements
(or individual characters, which might be multibyte characters in the
narrow string case). They would be useful for e.g. wide string format
checking and, more generally, for making optimizations for narrow strings
also work for wide strings. (Such abstractions wouldn't solve the question
of what the format is if host and target bytes differ, but their use would
reduce the number of places needing changing to establish a definition of
the format in that case, if someone were to do a port to a system with
bytes bigger than 8 bits.)
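
To make the idea concrete, here is a rough sketch of what such an
abstraction might look like. Everything in it is hypothetical: the class
name target_string_view, its constructor arguments and the choice of
std::uint32_t for elements are illustrative assumptions, not anything
that exists in GCC. It also assumes 8-bit bytes on both host and target,
so it does not address the host/target byte-size question at all. It only
shows how one element of a narrow, char16_t or wide string could be
assembled from target-order bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view over the raw target bytes of a string constant.
// None of these names exist in GCC; this is only an illustration.
// Assumes 8-bit bytes on both host and target.
class target_string_view
{
public:
  target_string_view (const unsigned char *bytes, std::size_t nbytes,
                      std::size_t elt_size, bool target_big_endian)
    : m_bytes (bytes), m_nbytes (nbytes), m_elt_size (elt_size),
      m_big_endian (target_big_endian)
  {
    // Sizes 1, 2 and 4 cover char, char16_t and a 4-byte wchar_t/char32_t.
    assert (elt_size == 1 || elt_size == 2 || elt_size == 4);
    assert (nbytes % elt_size == 0);
  }

  // Number of elements (code units), not bytes.
  std::size_t size () const { return m_nbytes / m_elt_size; }

  // Assemble element I from target-order bytes into a host integer.
  std::uint32_t operator[] (std::size_t i) const
  {
    assert (i < size ());
    const unsigned char *p = m_bytes + i * m_elt_size;
    std::uint32_t value = 0;
    for (std::size_t k = 0; k < m_elt_size; ++k)
      {
        // Always consume bytes from most to least significant:
        // that is first-to-last on big-endian, last-to-first otherwise.
        std::size_t idx = m_big_endian ? k : m_elt_size - 1 - k;
        value = (value << 8) | p[idx];
      }
    return value;
  }

private:
  const unsigned char *m_bytes;
  std::size_t m_nbytes;
  std::size_t m_elt_size;
  bool m_big_endian;
};
```

With a view like this, a consumer such as a format checker could iterate
over elements without caring whether the string is narrow or wide; only
the constructor arguments would differ.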
However, as I understand the place you're patching, it doesn't have any
use for such an abstraction; it just needs to copy a sequence of bytes
from one place to another. (And even with host bytes different from
target bytes, clearly it would make sense to define the internal
interfaces to make the encodings consistent so this function still only
needs to copy bytes from one place to another and still doesn't need such
abstractions.)
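
As an illustration of why no element-level abstraction is needed there,
here is a minimal sketch of a pure byte copy with an offset and a length
limit. The function copy_target_bytes and its signature are made up for
this example and are not the actual native_encode_string interface; the
sketch assumes the source buffer already holds the string in target byte
order, so the element size never enters into it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not GCC's native_encode_string.
// Copy up to LEN bytes of a TOTAL-byte target string, starting at byte
// offset OFF, into OUT.  Returns the number of bytes written, or 0 if
// OFF lies at or beyond the end of the string.
std::size_t
copy_target_bytes (const unsigned char *src, std::size_t total,
                   unsigned char *out, std::size_t len, std::size_t off)
{
  assert (src != nullptr && out != nullptr);
  if (off >= total)
    return 0;
  // Never read past the end of the source or write past LEN bytes.
  std::size_t n = std::min (len, total - off);
  // The bytes are already in target order, so a plain copy suffices
  // whether the elements are 1, 2 or 4 bytes wide.
  std::memcpy (out, src + off, n);
  return n;
}
```

Note that the offset and length are in bytes, so a caller may ask for a
range that starts or ends in the middle of a wide character, and the copy
is still well defined.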
--
Joseph S. Myers
joseph@codesourcery.com