[PATCH] Handle wide-chars in native_encode_string

Richard Biener rguenther@suse.de
Tue Sep 5 08:25:00 GMT 2017


On Mon, 4 Sep 2017, Joseph Myers wrote:

> On Mon, 4 Sep 2017, Richard Biener wrote:
> 
> > always have a consistent "character" size and how the individual
> > "characters" are encoded.  The patch assumes that the array element
> > type of the STRING_CST can be used to get access to individual
> > characters by means of the element type size and those elements
> > are stored in host byteorder.  Which means the patch simply handles
> 
> It's actually target byte order, i.e. the STRING_CST stores the same 
> sequence of target bytes as would appear on the target system (modulo 
> certain strings such as in asm statements and attributes, for which 
> translation to the execution character set is disabled because those 
> strings are only processed in the compiler on the host, not on the target 
> - but you should never encounter such strings in the optimizers etc.).  
> This is documented in generic.texi (complete with a warning about how it's 
> not well-defined what the encoding is if target bytes are not the same as 
> host bytes).

Ah thanks.

> I suspect that, generically in the compiler, the use of C++ might make it 
> easier than it would have been some time ago to build some abstractions 
> around target strings that work for all of narrow strings, wide strings, 
> char16_t strings etc. (for extracting individual elements - or individual 
> characters which might be multibyte characters in the narrow string case, 
> etc.) - as would be useful for e.g. wide string format checking and more 
> generally for making e.g. optimizations for narrow strings also work for 
> wide strings.  (Such abstractions wouldn't solve the question of what the 
> format is if host and target bytes differ, but their use would reduce the 
> number of places needing changing to establish a definition of the format 
> in that case if someone were to do a port to a system with bytes bigger 
> than 8 bits.)
> 
> However, as I understand the place you're patching, it doesn't have any 
> use for such an abstraction; it just needs to copy a sequence of bytes 
> from one place to another.  (And even with host bytes different from 
> target bytes, clearly it would make sense to define the internal 
> interfaces to make the encodings consistent so this function still only 
> needs to copy bytes from one place to another and still doesn't need such 
> abstractions.)

Right.  Given they are in target representation the patch becomes much
simpler and we can handle all STRING_CSTs, except for the case where
BITS_PER_UNIT != CHAR_BIT (as you say).  I suppose we can easily
declare we'll never support a CHAR_BIT != 8 host and we currently
don't have any BITS_PER_UNIT != 8 port (we had c4x).  I'm not
sure what constraints we have on CHAR_TYPE_SIZE vs. BITS_PER_UNIT,
for what port it would make sense to have differing values,
or what it means for native encoding (should the BITS_PER_UNIT != CHAR_BIT
test be CHAR_TYPE_SIZE != CHAR_BIT instead?).  BITS_PER_UNIT is
also only documented in rtl.texi rather than in tm.texi.
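
To illustrate why the element width stops mattering once the bytes are
already in target order, here is a minimal standalone sketch.  It is not
the GCC code: struct target_string and encode_target_string are
hypothetical names made up for this example, and it assumes host and
target bytes are both 8 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string constant: a buffer whose bytes are
   already laid out as they would appear on the target, plus its total
   size in bytes.  The element width (1, 2 or 4 bytes) is deliberately
   not recorded because the copy below never needs it.  */
struct target_string
{
  const unsigned char *bytes;
  size_t size_in_bytes;
};

/* Copy at most LEN bytes of STR, starting at byte offset OFF, into PTR.
   Return the number of bytes written, or 0 if OFF is past the end.  */
static size_t
encode_target_string (const struct target_string *str,
                      unsigned char *ptr, size_t len, size_t off)
{
  assert (str != NULL && ptr != NULL);

  if (off >= str->size_in_bytes)
    return 0;

  size_t avail = str->size_in_bytes - off;
  size_t n = len < avail ? len : avail;

  /* No per-character decoding or byte swapping: the source bytes are
     already the target's memory image.  */
  memcpy (ptr, str->bytes + off, n);
  return n;
}
```

For a 16-bit little-endian "ab" with terminator, the buffer would hold
the six bytes 61 00 62 00 00 00, and asking for four bytes at offset 2
simply yields 62 00 00 00, exactly as it would for a narrow string of
the same byte length.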

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-09-05  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82084
	* fold-const.c (can_native_encode_string_p): Handle wide characters.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 251661)
+++ gcc/fold-const.c	(working copy)
@@ -7489,10 +7489,11 @@ can_native_encode_string_p (const_tree e
 {
   tree type = TREE_TYPE (expr);
 
-  if (TREE_CODE (type) != ARRAY_TYPE
+  /* Wide-char strings are encoded in target byte-order so native
+     encoding them is trivial.  */
+  if (BITS_PER_UNIT != CHAR_BIT
+      || TREE_CODE (type) != ARRAY_TYPE
       || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
-      || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
-	  != BITS_PER_UNIT)
       || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
     return false;
   return true;


