This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



proposed Opengroup action for c99 command (XCU ERN 76)


I recently proposed to the Open Group an action that would modify the
POSIX specification for the c99 command that is often implemented
using GCC.  I thought the action would not affect GCC's conformance,
but Joseph S. Myers raised the issue of UCNs and multibyte characters
and I'd like to double-check that GCC is OK.  If the action does
affect GCC I'd like to modify the action before it's too late.

Here's the problem.  Currently, POSIX places almost no requirements on
how c99 transforms the physical source file into C source-language
characters.  For example, c99 is free to treat CR as LF, ignore
trailing white space, convert tabs to spaces, or even (perversely)
require that input files all start with line numbers that are
otherwise ignored.  This lack of specification was not intended, and
I'm trying to help nail down the intent of what c99 is allowed to do.

I proposed to insert the following paragraph after XCU page 213 line
8366 (i.e., at the end of the INPUT FILES section of the c99 spec
<http://www.opengroup.org/onlinepubs/009695399/utilities/c99.html>):

   It is implementation-defined whether trailing white-space characters
   in each C-language source line are ignored.  Otherwise, the
   multibyte characters of each source line are mapped on a one-to-one
   basis to the C source character set.
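
As a rough illustration of the first sentence only (this is not part of
the proposed wording), the one latitude left to an implementation could
be sketched as below.  The name significant_length is hypothetical, and
treating only space and horizontal tab as "white-space" is a simplifying
assumption; everything before the returned length would then be mapped
one-to-one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of `line` (without its newline) once
   trailing white space is ignored.  Assumption: only ' ' and '\t'
   count as white space here. */
size_t significant_length(const char *line)
{
    assert(line != NULL);
    size_t n = strlen(line);
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t'))
        n--;
    return n;
}
```

An implementation that does not ignore trailing white space would simply
use strlen(line) instead.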

In response, Joseph S. Myers pointed out that this action would require
c99 to use interpretation B of section 5.2.1 (page 20) of the C99 Rationale
<http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf>.
The Rationale says C preprocessors can be implemented in three ways:

  A.  Convert everything to UCNs in basic source characters as soon
      as possible, that is, in translation phase 1.  (This is what
      C++ requires, apparently.)

  B.  Use native encodings where possible, UCNs otherwise.

  C.  Convert everything to wide characters as soon as possible
      using an internal encoding that encompasses the entire source
      character set and all UCNs.
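
To make the difference between (A) and (B) concrete, here is a minimal
sketch of what a model-A phase 1 might do to one source line.  It is
only an illustration under stated assumptions: the function name to_ucn
is hypothetical, the physical source is assumed to be well-formed UTF-8,
and the sketch ignores C99's restrictions on which UCNs may appear.  It
is not a description of how GCC works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model-A sketch: copy ASCII bytes unchanged and rewrite
   every other UTF-8 character as \uXXXX or \UXXXXXXXX.
   Returns 0 on success, -1 on malformed input or a too-small buffer. */
int to_ucn(const char *src, char *dst, size_t dstsz)
{
    assert(src != NULL && dst != NULL);
    const unsigned char *p = (const unsigned char *)src;
    size_t used = 0;

    if (dstsz == 0)
        return -1;

    while (*p != '\0') {
        unsigned long cp;
        int extra;                       /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;
        p++;

        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80)     /* also catches a truncated line */
                return -1;
            cp = (cp << 6) | (*p & 0x3F);
        }

        char buf[11];                    /* "\U" + 8 hex digits + NUL */
        int n;
        if (cp < 0x80)
            n = snprintf(buf, sizeof buf, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            n = snprintf(buf, sizeof buf, "\\u%04lX", cp);
        else
            n = snprintf(buf, sizeof buf, "\\U%08lX", cp);

        if (used + (size_t)n + 1 > dstsz)
            return -1;
        memcpy(dst + used, buf, (size_t)n);
        used += (size_t)n;
    }
    dst[used] = '\0';
    return 0;
}
```

Under model (B), by contrast, a line whose characters are all
representable natively would pass through phase 1 unchanged, and UCNs
would be used only for characters that are not.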

The C99 standardizers chose (B), but said implementations could also
use (A) or (C) because the C99 standard gives almost unlimited freedom
in translation phase 1 for compilers to do whatever transformations
they like.

However, the proposed action for the c99 command would close this
escape hatch, forcing interpretation (B) for c99 implementations.

So my question is: Is it a burden on GCC to require interpretation (B)?

My understanding is that GCC already uses (B), and that the answer is
"no, it's no problem", but if I'm wrong please let me know.

For more details, please see Shell and Utilities Enhancement Request
Number 76 (XCU ERN 76), which you can find in
<http://www.opengroup.org/austin/aardvark/latest/xcubug2.txt>.
Also please see the followup email discussion at
<http://www.opengroup.org/austin/mailarchives/ag/>
(look for messages whose subject lines contain "XCU ERN 76").

Thanks.

