This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

[RFC] Adjust output for strings in tree-pretty-print.c

From: FX <fxcoudert at gmail dot com>
To: "GCC Development" <gcc at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 19 May 2008 14:59:16 +0100
Subject: [RFC] Adjust output for strings in tree-pretty-print.c

Hi all,

The Fortran front-end now handles wide character strings
(UCS-4/UTF-32); for these, string literals are emitted as strings
whose type is an array of unsigned 32-bit integers. The issue is
that pretty_print_string(), in tree-pretty-print.c, assumes strings are
composed of chars and are NUL-terminated. This fails, for example, if you
look at the tree dump for the following Fortran source file:

  subroutine foo
    call test(4_"I'm here!")
  end subroutine foo

you currently get:

  foo ()
  {
    test (&"I"[1]{lb: 1 sz: 4}, 9);

On my little-endian compiler, "I'm here!" is stored in UTF-32 as
"I\0\0\0'\0\0\0m\0\0\0 \0\0\0h\0\0\0e\0\0\0r\0\0\0e\0\0\0!\0\0\0". So
tree-pretty-print.c stops at the first '\0', and we get "I". To make
this work better, since STRING_CSTs have an attached length
(TREE_STRING_LENGTH), I suggest using that length to output the full
string, instead of stopping at the first NUL character.

With the attached patch, the tree dump for the same Fortran source file looks like this:

  test (&"I\0\0\0\'\0\0\0m\0\0\0 \0\0\0h\0\0\0e\0\0\0r\0\0\0e\0\0\0!\0\0\0"[1]{lb: 1 sz: 4}, 9);

and the tree dump for the following C testcase:

  unsigned char *foo(void) { return "look\0here"; }

which was like this:

  return (unsigned char *) "look";

is now like this:

  return (unsigned char *) "look\0here\0";


Notice the added final '\0' in the C case; I don't know if it's bad to
have it there, but I don't see a way to not output it and still have
the correct output for Fortran (whose strings are not NUL-terminated).

Any comments? Is it OK to commit as is? It bootstraps and regtests
fine on x86_64-linux, with C and Fortran enabled, except for
gcc.dg/tree-ssa/builtin-{v,}{f,}printf-1.c, which need their
scan-tree-dump patterns adjusted accordingly. If there is no
objection, I'll adjust those patterns, then build and regtest C++, objc
and objc++ as well before going ahead.

Thanks,
FX

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/

Attachment: wide_char_part6_gcc.diff
Description: Binary data

Follow-Ups:
- Re: [RFC] Adjust output for strings in tree-pretty-print.c
  - From: Manfred Hollstein
- Re: [RFC] Adjust output for strings in tree-pretty-print.c
  - From: Jakub Jelinek
- Re: [RFC] Adjust output for strings in tree-pretty-print.c
  - From: Paolo Bonzini
