[PATCH] Handle wide-chars in native_encode_string
Richard Biener
rguenther@suse.de
Tue Sep 5 08:25:00 GMT 2017
On Mon, 4 Sep 2017, Joseph Myers wrote:
> On Mon, 4 Sep 2017, Richard Biener wrote:
>
> > always have a consistent "character" size and how the individual
> > "characters" are encoded. The patch assumes that the array element
> > type of the STRING_CST can be used to get access to individual
> > characters by means of the element type size and those elements
> > are stored in host byteorder. Which means the patch simply handles
>
> It's actually target byte order, i.e. the STRING_CST stores the same
> sequence of target bytes as would appear on the target system (modulo
> certain strings such as in asm statements and attributes, for which
> translation to the execution character set is disabled because those
> strings are only processed in the compiler on the host, not on the target
> - but you should never encounter such strings in the optimizers etc.).
> This is documented in generic.texi (complete with a warning about how it's
> not well-defined what the encoding is if target bytes are not the same as
> host bytes).
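To make "the STRING_CST stores the same sequence of target bytes" concrete for a wide string, here is a minimal sketch of what that byte image looks like. The helper `target_encode_wide` is hypothetical and is not a GCC function. It assumes 8-bit bytes on both host and target and code units of at most 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): lay out NELTS code units of
   ELT_SIZE bytes each into OUT in the byte order of a target.  Assumes
   8-bit bytes on host and target.  OUT must have room for
   NELTS * ELT_SIZE bytes.  */
static void
target_encode_wide (const uint32_t *units, size_t nelts, size_t elt_size,
                    int target_big_endian, unsigned char *out)
{
  assert (elt_size >= 1 && elt_size <= 4);
  for (size_t i = 0; i < nelts; i++)
    for (size_t b = 0; b < elt_size; b++)
      {
        /* B counts bytes from the least significant end of the unit.  */
        unsigned char byte = (unsigned char) (units[i] >> (8 * b));
        /* Big-endian targets put the most significant byte first.  */
        size_t pos = target_big_endian ? elt_size - 1 - b : b;
        out[i * elt_size + pos] = byte;
      }
}
```

For example, L"AB" with a 4-byte wchar_t becomes `41 00 00 00 42 00 00 00 00 00 00 00` on a little-endian target and `00 00 00 41 00 00 00 42 00 00 00 00` on a big-endian one. The compiler only has to keep these bytes; it never needs to reinterpret them in host order.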
Ah, thanks.
> I suspect that, generically in the compiler, the use of C++ might make it
> easier than it would have been some time ago to build some abstractions
> around target strings that work for all of narrow strings, wide strings,
> char16_t strings etc. (for extracting individual elements - or individual
> characters which might be multibyte characters in the narrow string case,
> etc.) - as would be useful for e.g. wide string format checking and more
> generally for making e.g. optimizations for narrow strings also work for
> wide strings. (Such abstractions wouldn't solve the question of what the
> format is if host and target bytes differ, but their use would reduce the
> number of places needing changing to establish a definition of the format
> in that case if someone were to do a port to a system with bytes bigger
> than 8 bits.)
>
> However, as I understand the place you're patching, it doesn't have any
> use for such an abstraction; it just needs to copy a sequence of bytes
> from one place to another. (And even with host bytes different from
> target bytes, clearly it would make sense to define the internal
> interfaces to make the encodings consistent so this function still only
> needs to copy bytes from one place to another and still doesn't need such
> abstractions.)
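The element-extraction abstraction Joseph describes could look roughly like the accessor below. This is only a sketch: `target_string_elt` is a hypothetical name, not an existing GCC interface. It assumes 8-bit bytes on host and target, and it deliberately ignores the harder case he mentions, decoding multibyte characters in narrow strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor (not part of GCC): return element I of a target
   string whose NBYTES bytes are stored in target order, as for a
   STRING_CST.  ELT_SIZE is the element size in bytes (1 for char, 2 for
   char16_t, ...).  Assumes 8-bit bytes on host and target.  */
static uint64_t
target_string_elt (const unsigned char *bytes, size_t nbytes,
                   size_t elt_size, int target_big_endian, size_t i)
{
  assert (elt_size >= 1 && elt_size <= 8);
  assert ((i + 1) * elt_size <= nbytes);
  const unsigned char *p = bytes + i * elt_size;
  uint64_t val = 0;
  /* Accumulate from the most significant byte down.  */
  for (size_t b = 0; b < elt_size; b++)
    {
      /* Big-endian: MSB is the first byte; little-endian: the last.  */
      size_t pos = target_big_endian ? b : elt_size - 1 - b;
      val = (val << 8) | p[pos];
    }
  return val;
}
```

An optimization written against such an accessor would work unchanged for narrow, wide and char16_t strings, because only `elt_size` and the target byte order vary.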
Right. Given that they are in target representation, the patch becomes much
simpler and we can handle all STRING_CSTs except for the case where
BITS_PER_UNIT != CHAR_BIT (as you say). I suppose we can easily
declare we'll never support a host with CHAR_BIT != 8, and we currently
don't have any port with BITS_PER_UNIT != 8 (we used to have c4x). I'm not
sure what constraints we have on CHAR_TYPE_SIZE vs. BITS_PER_UNIT,
or for which port it would make sense for them to differ, nor what that
would mean for native encoding (should the BITS_PER_UNIT != CHAR_BIT
test be CHAR_TYPE_SIZE != CHAR_BIT instead?). BITS_PER_UNIT is
also documented only in rtl.texi rather than in tm.texi.
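Because the bytes are already in target order, native encoding a string reduces to a bounded copy. The sketch below models that idea outside GCC. `model_encode_string` and its signature are hypothetical and are not GCC's native_encode_string. It assumes 8-bit bytes, and it assumes that array bytes beyond the string's own length read as zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of native-encoding a string constant.
   STR holds STR_LEN bytes that are already in target order; the array
   type occupies TOTAL_BYTES bytes, and any bytes past STR_LEN read as
   zero (an assumption of this model).  Copies at most LEN bytes starting
   at byte offset OFF into PTR and returns the number of bytes written,
   or 0 if OFF is out of range.  There is no per-element byte swapping:
   the element size never enters the computation.  */
static int
model_encode_string (const unsigned char *str, int str_len,
                     int total_bytes, unsigned char *ptr, int len, int off)
{
  assert (str_len >= 0 && str_len <= total_bytes && len >= 0);
  if (off < 0 || off >= total_bytes)
    return 0;

  /* Bytes to produce: the rest of the object, capped at LEN.  */
  int n = total_bytes - off < len ? total_bytes - off : len;

  /* Leading part taken verbatim from the string's bytes.  */
  int from_str = off < str_len ? str_len - off : 0;
  if (from_str > n)
    from_str = n;
  if (from_str > 0)
    memcpy (ptr, str + off, (size_t) from_str);

  /* Trailing part is implicit zero padding.  */
  memset (ptr + from_str, 0, (size_t) (n - from_str));
  return n;
}
```

For instance, with the 8 bytes of a two-element little-endian 4-byte wide string inside a 12-byte array, a call with `off = 4` and `len = 2` yields the bytes `42 00`, the same as it would for a narrow string.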
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
Richard.
2017-09-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/82084
* fold-const.c (can_native_encode_string_p): Handle wide characters.
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c (revision 251661)
+++ gcc/fold-const.c (working copy)
@@ -7489,10 +7489,11 @@ can_native_encode_string_p (const_tree e
 {
   tree type = TREE_TYPE (expr);
 
-  if (TREE_CODE (type) != ARRAY_TYPE
+  /* Wide-char strings are encoded in target byte-order so native
+     encoding them is trivial. */
+  if (BITS_PER_UNIT != CHAR_BIT
+      || TREE_CODE (type) != ARRAY_TYPE
       || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
-      || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
-          != BITS_PER_UNIT)
       || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
     return false;
   return true;
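The effect of the patch on can_native_encode_string_p can be modelled outside GCC as two predicates over the properties it inspects. Everything in this sketch is a hypothetical stand-in: `struct string_cst_model` replaces the tree queries, and `MODEL_BITS_PER_UNIT` replaces the target's BITS_PER_UNIT and is assumed to be 8.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the properties of a STRING_CST's type that
   can_native_encode_string_p inspects.  */
struct string_cst_model
{
  bool is_array;        /* TREE_CODE (type) == ARRAY_TYPE */
  bool elt_is_integer;  /* TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE */
  int elt_bits;         /* mode bitsize of the element type */
  bool size_fits_shwi;  /* tree_fits_shwi_p (TYPE_SIZE_UNIT (type)) */
};

/* Stand-in for the target's BITS_PER_UNIT; assumed to be 8.  */
#define MODEL_BITS_PER_UNIT 8

/* Before the patch: only arrays of unit-sized integer elements.  */
static bool
old_can_encode (const struct string_cst_model *m)
{
  assert (m);
  return (m->is_array && m->elt_is_integer
          && m->elt_bits == MODEL_BITS_PER_UNIT && m->size_fits_shwi);
}

/* After the patch: any integer element size, provided a target unit
   is as wide as a host byte.  */
static bool
new_can_encode (const struct string_cst_model *m)
{
  assert (m);
  return (MODEL_BITS_PER_UNIT == CHAR_BIT
          && m->is_array && m->elt_is_integer && m->size_fits_shwi);
}
```

Under these assumptions, an array of 32-bit wchar_t elements is rejected by `old_can_encode` but accepted by `new_can_encode`. Narrow strings are accepted by both. The element size drops out of the check entirely, and only the unit-width guard remains.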