This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: concatenation of string literals


On Sat, Apr 19, 2003 at 07:41:44AM +0100, Joseph S. Myers wrote:
> On Fri, 18 Apr 2003, Matt Kraai wrote:
> 
> > - K+R C compilers did not have a void pointer, and used char * as the
> > - pointer to anything.  The macro PTR is defined as either void * or
> > - char * depending on whether you have a standards compliant compiler or
> > - a K+R one.  Thus
> > - 
> > -   free ((void *) h->value.expansion);
> > - 
> > - should be written
> > - 
> > -   free ((PTR) h->value.expansion);
> 
> I suppose uses of both PTR and such casts as these should be removed from
> the code.
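For illustration, here is roughly what the quoted free() call reduces to once PTR and the cast are dropped. This is a sketch, not code from GCC. The struct macro_def type and the set_expansion/clear_expansion helpers are hypothetical, loosely modeled on the h->value.expansion example. In ISO C, malloc returns void * and free accepts void *, and any object pointer converts to and from void * implicitly, so no cast is needed in either direction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure behind h->value.expansion;
   the layout is an assumption made for this example.  */
struct macro_def
{
  union
  {
    unsigned char *expansion;
  } value;
};

/* malloc returns void *, which converts to unsigned char * implicitly
   in ISO C, so no cast is written.  Returns nonzero on success.  */
int
set_expansion (struct macro_def *h, size_t len)
{
  assert (h != NULL);
  h->value.expansion = malloc (len);
  return h->value.expansion != NULL;
}

/* free takes void *; the unsigned char * argument converts to it
   implicitly, so neither (PTR) nor (void *) is needed.  */
void
clear_expansion (struct macro_def *h)
{
  assert (h != NULL);
  free (h->value.expansion);
  h->value.expansion = NULL;
}
```

For example, set_expansion (&h, 16) followed by clear_expansion (&h) leaves h.value.expansion null.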
> 
> > - Variable-argument functions are best described by example:-
> 
> Likewise, these can be converted to simply using ISO C <stdarg.h>.
> 
> (Perhaps there should be a list of obsolete coding practices to look out
> for in ISO C conversion?)
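The following sketch shows what such a conversion might look like. It is a variadic function written directly against ISO C <stdarg.h>, with an ordinary prototype ending in "..." instead of the VPARAMS machinery from ansidecl.h. The function sum_ints and its count-first calling convention are hypothetical. Variadic arguments still undergo the default promotions, so a char or short argument must be fetched with va_arg (ap, int).

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function in plain ISO C: the prototype ends in
   "...", and the arguments are walked with va_start/va_arg/va_end.
   The first argument gives the number of int arguments that follow.  */
long
sum_ints (int count, ...)
{
  va_list ap;
  long total = 0;
  int i;

  assert (count >= 0);
  va_start (ap, count);
  for (i = 0; i < count; i++)
    total += va_arg (ap, int);  /* char/short arguments arrive as int */
  va_end (ap);
  return total;
}
```

With this definition, sum_ints (3, 1, 2, 3) returns 6.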

In line with this and Neil Booth's suggestion, the following patch
moves the K+R compatibility information into a separate section
and notes that it is no longer required.

> > - Trigraphs
> > - ---------
> > - 
> > - You weren't going to use them anyway, but trigraphs were not defined
> > - in K+R C, and some otherwise ISO C compliant compilers do not accept
> > - them.
> 
> The comment that trigraphs must not be used is still relevant.

D'oh.  Readded.
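A related hazard is worth illustrating here. When a compiler does process trigraphs (for instance GCC with -trigraphs, or in a strict ISO mode such as -std=c89), an innocent-looking string such as "What??!" changes meaning, because ??! is the trigraph for '|'. Escaping one question mark avoids the problem on every compiler. The function warn_text is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* If the compiler processes trigraphs (e.g. GCC with -trigraphs, or
   -std=c89), "What??!" would become "What|".  The escape sequence \?
   yields a plain '?', so "What?\?!" is the seven characters
   W h a t ? ? ! whether or not trigraphs are processed.  */
const char *
warn_text (void)
{
  const char *s = "What?\?!";
  assert (strlen (s) == 7);
  return s;
}
```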

Here's take two.  OK to commit?

Matt
-- 
Matt Kraai <kraai at alumni dot cmu dot edu>
Debian GNU/Linux Peon

       * README.Portability: Move to a new section and obsolete
       K+R portability issues.

Index: gcc/README.Portability
===================================================================
RCS file: /cvs/gcc/gcc/gcc/README.Portability,v
retrieving revision 1.9
diff -c -3 -p -r1.9 README.Portability
*** gcc/README.Portability	2 Jul 2002 00:15:42 -0000	1.9
--- gcc/README.Portability	19 Apr 2003 10:44:35 -0000
***************
*** 1,4 ****
! Copyright (C) 2000 Free Software Foundation, Inc.
  
  This file is intended to contain a few notes about writing C code
  within GCC so that it compiles without error on the full range of
--- 1,4 ----
! Copyright (C) 2000, 2003 Free Software Foundation, Inc.
  
  This file is intended to contain a few notes about writing C code
  within GCC so that it compiles without error on the full range of
*************** probably what most people code to natura
*** 15,31 ****
  constructs introduced after that is not a good idea.
  
  The first section of this file deals strictly with portability issues,
! the second with common coding pitfalls.
  
  
  			Portability Issues
  			==================
  
  Unary +
  -------
  
  K+R C compilers and preprocessors have no notion of unary '+'.  Thus
! the following code snippet contains 2 portability problems.
  
  int x = +2;  /* int x = 2;  */
  #if +1       /* #if 1  */
--- 15,212 ----
  constructs introduced after that is not a good idea.
  
  The first section of this file deals strictly with portability issues,
! the second with common coding pitfalls, and the third with obsolete
! K+R portability issues.
  
  
  			Portability Issues
  			==================
  
+ String literals
+ ---------------
+ 
+ Some SGI compilers choke on the parentheses in:-
+ 
+ const char string[] = ("A string");
+ 
+ This is unfortunate since this is what the GNU gettext macro N_
+ produces.  You need to find a different way to code it.
+ 
+ Some compilers like MSVC++ have fairly low limits on the maximum
+ length of a string literal; 509 is the lowest we've come across.  You
+ may need to break up a long printf statement into many smaller ones.
+ 
+ 
+ Empty macro arguments
+ ---------------------
+ 
+ ISO C (6.8.3 in the 1990 standard) specifies the following:
+ 
+ If (before argument substitution) any argument consists of no
+ preprocessing tokens, the behavior is undefined.
+ 
+ This was relaxed by ISO C99, but some older compilers emit an error,
+ so code like
+ 
+ #define foo(x, y) x y
+ foo (bar, )
+ 
+ needs to be coded in some other way.
+ 
+ 
+ free and realloc
+ ----------------
+ 
+ Some implementations crash upon attempts to free or realloc the null
+ pointer.  Thus if mem might be null, you need to write
+ 
+   if (mem)
+     free (mem);
+ 
+ 
+ Trigraphs
+ ---------
+ 
+ You weren't going to use them anyway, but some otherwise ISO C
+ compliant compilers do not accept trigraphs.
+ 
+ 
+ Suffixes on Integer Constants
+ -----------------------------
+ 
+ You should never use an 'l' suffix on integer constants ('L' is fine),
+ since it can easily be confused with the number '1'.
+ 
+ 
+ 			Common Coding Pitfalls
+ 			======================
+ 
+ errno
+ -----
+ 
+ errno might be a macro, so include <errno.h> rather than declaring it.
+ 
+ 
+ Implicit int
+ ------------
+ 
+ In C, the 'int' keyword can often be omitted from type declarations.
+ For instance, you can write
+ 
+   unsigned variable;
+ 
+ as shorthand for
+ 
+   unsigned int variable;
+ 
+ There are several places where this can cause trouble.  First, suppose
+ 'variable' is a long; then you might think
+ 
+   (unsigned) variable
+ 
+ would convert it to unsigned long.  It does not.  It converts to
+ unsigned int.  This mostly causes problems on 64-bit platforms, where
+ long and int are not the same size.
+ 
+ Second, if you write a function definition with no return type at
+ all:
+ 
+   operate (int a, int b)
+   {
+     ...
+   }
+ 
+ that function is expected to return int, *not* void.  GCC will warn
+ about this.
+ 
+ Implicit function declarations always have return type int.  So if you
+ correct the above definition to
+ 
+   void
+   operate (int a, int b)
+   ...
+ 
+ but operate() is called above its definition, you will get an error
+ about a "type mismatch with previous implicit declaration".  The cure
+ is to prototype all functions at the top of the file, or in an
+ appropriate header.
+ 
+ Char vs unsigned char vs int
+ ----------------------------
+ 
+ In C, unqualified 'char' may be either signed or unsigned; it is the
+ implementation's choice.  When you are processing 7-bit ASCII, it does
+ not matter.  But when your program must handle arbitrary binary data,
+ or fully 8-bit character sets, you have a problem.  The most obvious
+ issue is if you have a look-up table indexed by characters.
+ 
+ For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
+ WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
+ true.  But if you read '\341' from a file and store it in a plain
+ char, isalpha(c) may look up character 225, or it may look up
+ character -31.  And the ctype table has no entry at offset -31, so
+ your program will crash.  (If you're lucky.)
+ 
+ It is wise to use unsigned char everywhere you possibly can.  This
+ avoids all these problems.  Unfortunately, the routines in <string.h>
+ take plain char arguments, so you have to remember to cast them back
+ and forth - or avoid the use of strxxx() functions, which is probably
+ a good idea anyway.
+ 
+ Another common mistake is to use either char or unsigned char to
+ receive the result of getc() or related stdio functions.  They may
+ return EOF, which is outside the range of values representable by
+ char.  If you use char, some legal character value may be confused
+ with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
+ The correct choice is int.
+ 
+ A more subtle version of the same mistake might look like this:
+ 
+   unsigned char pushback[NPUSHBACK];
+   int pbidx;
+   #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
+   #define get(c) (pbidx ? pushback[--pbidx] : getchar())
+   ...
+   unget(EOF);
+ 
+ which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
+ WITH UMLAUT.
+ 
+ 
+ Other common pitfalls
+ ---------------------
+ 
+ o Expecting 'plain' char to be either sign extended or zero extended
+ 
+ o Shifting an item by a negative amount or by greater than or equal to
+   the number of bits in a type (expecting shifts by 32 to be sensible
+   has caused quite a number of bugs at least in the early days).
+ 
+ o Expecting negative ints shifted right to be sign extended.
+ 
+ o Modifying the same object twice between two sequence points.
+ 
+ o Host vs. target floating point representation, including emitting NaNs
+   and Infinities in a form that the assembler handles.
+ 
+ o qsort being an unstable sort function (unstable in the sense that
+   multiple items that sort the same may be sorted in different orders
+   by different qsort functions).
+ 
+ o Passing incorrect types to fprintf and friends.
+ 
+ o Adding a declaration for a function defined in another file to a .c
+   file instead of to a .h file.
+ 
+ 
+ 			K+R Portability Issues
+ 			======================
+ 
  Unary +
  -------
  
  K+R C compilers and preprocessors have no notion of unary '+'.  Thus
! the following code snippet contained 2 portability problems.
  
  int x = +2;  /* int x = 2;  */
  #if +1       /* #if 1  */
*************** a K+R one.  Thus
*** 42,103 ****
  
    free ((void *) h->value.expansion);
  
! should be written
  
    free ((PTR) h->value.expansion);
  
  Further, an initial investigation indicates that pointers to functions
! returning void are okay.  Thus the example given by "Calling functions
! through pointers to functions" below appears not to cause a problem.
  
  
  String literals
  ---------------
  
- Some SGI compilers choke on the parentheses in:-
- 
- const char string[] = ("A string");
- 
- This is unfortunate since this is what the GNU gettext macro N_
- produces.  You need to find a different way to code it.
- 
  K+R C did not allow concatenation of string literals like
  
    "This is a " "single string literal".
  
- Moreover, some compilers like MSVC++ have fairly low limits on the
- maximum length of a string literal; 509 is the lowest we've come
- across.  You may need to break up a long printf statement into many
- smaller ones.
- 
- 
- Empty macro arguments
- ---------------------
- 
- ISO C (6.8.3 in the 1990 standard) specifies the following:
- 
- If (before argument substitution) any argument consists of no
- preprocessing tokens, the behavior is undefined.
- 
- This was relaxed by ISO C99, but some older compilers emit an error,
- so code like
- 
- #define foo(x, y) x y
- foo (bar, )
- 
- needs to be coded in some other way.
- 
  
  signed keyword
  --------------
  
  The signed keyword did not exist in K+R compilers; it was introduced
! in ISO C89, so you cannot use it.  In both K+R and standard C,
  unqualified char and bitfields may be signed or unsigned.  There is no
  way to portably declare signed chars or signed bitfields.
  
  All other arithmetic types are signed unless you use the 'unsigned'
! qualifier.  For instance, it is safe to write
  
    short paramc;
  
--- 223,256 ----
  
    free ((void *) h->value.expansion);
  
! should have been written
  
    free ((PTR) h->value.expansion);
  
  Further, an initial investigation indicates that pointers to functions
! returning void were okay.  Thus the example given by "Calling
! functions through pointers to functions" below appeared not to cause a
! problem.
  
  
  String literals
  ---------------
  
  K+R C did not allow concatenation of string literals like
  
    "This is a " "single string literal".
  
  
  signed keyword
  --------------
  
  The signed keyword did not exist in K+R compilers; it was introduced
! in ISO C89, so you could not use it.  In both K+R and standard C,
  unqualified char and bitfields may be signed or unsigned.  There is no
  way to portably declare signed chars or signed bitfields.
  
  All other arithmetic types are signed unless you use the 'unsigned'
! qualifier.  For instance, it was safe to write
  
    short paramc;
  
*************** instead of
*** 106,112 ****
    signed short paramc;
  
  If you have an algorithm that depends on signed char or signed
! bitfields, you must find another way to write it before it can be
  integrated into GCC.
  
  
--- 259,265 ----
    signed short paramc;
  
  If you have an algorithm that depends on signed char or signed
! bitfields, you had to find another way to write it before it could be
  integrated into GCC.
  
  
*************** Function prototypes
*** 114,123 ****
  -------------------
  
  You need to provide a function prototype for every function before you
! use it, and functions must be defined K+R style.  The function
! prototype should use the PARAMS macro, which takes a single argument.
! Therefore the parameter list must be enclosed in parentheses.  For
! example,
  
  int myfunc PARAMS ((double, int *));
  
--- 267,276 ----
  -------------------
  
  You need to provide a function prototype for every function before you
! use it, and functions had to be defined K+R style.  The function
! prototype should have used the PARAMS macro, which takes a single
! argument.  Therefore the parameter list had to be enclosed in
! parentheses.  For example,
  
  int myfunc PARAMS ((double, int *));
  
*************** myfunc (var1, var2)
*** 129,135 ****
    ...
  }
  
! This implies that if the function takes no arguments, it should be
  declared and defined as follows:
  
  int myfunc PARAMS ((void));
--- 282,288 ----
    ...
  }
  
! This implies that if the function takes no arguments, it had to be
  declared and defined as follows:
  
  int myfunc PARAMS ((void));
*************** myfunc ()
*** 140,146 ****
    ...
  }
  
! You also need to use PARAMS when referring to function protypes in
  other circumstances, for example see "Calling functions through
  pointers to functions" below.
  
--- 293,299 ----
    ...
  }
  
! You also had to use PARAMS when referring to function prototypes in
  other circumstances, for example see "Calling functions through
  pointers to functions" below.
  
*************** cpp_ice VPARAMS ((cpp_reader *pfile, con
*** 161,167 ****
  
  See ansidecl.h for the definitions of the above macros and more.
  
! One aspect of using K+R style function declarations, is you cannot
  have arguments whose types are char, short, or float, since without
  prototypes (ie, K+R rules), these types are promoted to int, int, and
  double respectively.
--- 314,320 ----
  
  See ansidecl.h for the definitions of the above macros and more.
  
! One aspect of using K+R style function declarations is that you could not
  have arguments whose types are char, short, or float, since without
  prototypes (ie, K+R rules), these types are promoted to int, int, and
  double respectively.
*************** example
*** 176,182 ****
  typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
        *p->handler (pfile, p->arg);
  
! needs to become
  
        (*p->handler) (pfile, p->arg);
  
--- 329,335 ----
  typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
        *p->handler (pfile, p->arg);
  
! had to become
  
        (*p->handler) (pfile, p->arg);
  
*************** compilers x should not have spaces aroun
*** 202,217 ****
  Passing structures by value
  ---------------------------
  
! Avoid passing structures by value, either to or from functions.  It
! seems some K+R compilers handle this differently or not at all.
  
  
  Enums
  -----
  
! In K+R C, you have to cast enum types to use them as integers, and
! some compilers in particular give lots of warnings for using an enum
! as an array index.
  
  
  Bitfields
--- 355,371 ----
  Passing structures by value
  ---------------------------
  
! You had to avoid passing structures by value, either to or from
! functions.  It seems some K+R compilers handled this differently or
! not at all.
  
  
  Enums
  -----
  
! In K+R C, you had to cast enum types to use them as integers, and some
! compilers in particular gave lots of warnings for using an enum as an
! array index.
  
  
  Bitfields
*************** were defined (i.e. unsigned char, unsign
*** 222,241 ****
  Using plain int/short/long was not allowed).
  
  
- free and realloc
- ----------------
- 
- Some implementations crash upon attempts to free or realloc the null
- pointer.  Thus if mem might be null, you need to write
- 
-   if (mem)
-     free (mem);
- 
- 
  Reserved Keywords
  -----------------
  
! K+R C has "entry" as a reserved keyword, so you should not use it for
  your variable names.
  
  
--- 376,385 ----
  Using plain int/short/long was not allowed).
  
  
  Reserved Keywords
  -----------------
  
! K+R C had "entry" as a reserved keyword, so you could not use it for
  your variable names.
  
  
*************** int is done as an unsigned comparison in
*** 248,391 ****
  promotes to unsigned) while it is signed in ISO (since all of the
  values in unsigned char fit in an int, it promotes to int).
  
- Trigraphs
- ---------
- 
- You weren't going to use them anyway, but trigraphs were not defined
- in K+R C, and some otherwise ISO C compliant compilers do not accept
- them.
- 
  
  Suffixes on Integer Constants
  -----------------------------
  
! K+R C did not accept a 'u' suffix on integer constants.  If you want
! to declare a constant to be be unsigned, you must use an explicit
  cast.
- 
- You should never use a 'l' suffix on integer constants ('L' is fine),
- since it can easily be confused with the number '1'.
- 
- 
- 			Common Coding Pitfalls
- 			======================
- 
- errno
- -----
- 
- errno might be declared as a macro.
- 
- 
- Implicit int
- ------------
- 
- In C, the 'int' keyword can often be omitted from type declarations.
- For instance, you can write
- 
-   unsigned variable;
- 
- as shorthand for
- 
-   unsigned int variable;
- 
- There are several places where this can cause trouble.  First, suppose
- 'variable' is a long; then you might think
- 
-   (unsigned) variable
- 
- would convert it to unsigned long.  It does not.  It converts to
- unsigned int.  This mostly causes problems on 64-bit platforms, where
- long and int are not the same size.
- 
- Second, if you write a function definition with no return type at
- all:
- 
-   operate (a, b)
-        int a, b;
-   {
-     ...
-   }
- 
- that function is expected to return int, *not* void.  GCC will warn
- about this.  K+R C has no problem with 'void' as a return type, so you
- need not worry about that.
- 
- Implicit function declarations always have return type int.  So if you
- correct the above definition to
- 
-   void
-   operate (a, b)
-        int a, b;
-   ...
- 
- but operate() is called above its definition, you will get an error
- about a "type mismatch with previous implicit declaration".  The cure
- is to prototype all functions at the top of the file, or in an
- appropriate header.
- 
- Char vs unsigned char vs int
- ----------------------------
- 
- In C, unqualified 'char' may be either signed or unsigned; it is the
- implementation's choice.  When you are processing 7-bit ASCII, it does
- not matter.  But when your program must handle arbitrary binary data,
- or fully 8-bit character sets, you have a problem.  The most obvious
- issue is if you have a look-up table indexed by characters.
- 
- For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
- WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
- true.  But if you read '\341' from a file and store it in a plain
- char, isalpha(c) may look up character 225, or it may look up
- character -31.  And the ctype table has no entry at offset -31, so
- your program will crash.  (If you're lucky.)
- 
- It is wise to use unsigned char everywhere you possibly can.  This
- avoids all these problems.  Unfortunately, the routines in <string.h>
- take plain char arguments, so you have to remember to cast them back
- and forth - or avoid the use of strxxx() functions, which is probably
- a good idea anyway.
- 
- Another common mistake is to use either char or unsigned char to
- receive the result of getc() or related stdio functions.  They may
- return EOF, which is outside the range of values representable by
- char.  If you use char, some legal character value may be confused
- with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
- The correct choice is int.
- 
- A more subtle version of the same mistake might look like this:
- 
-   unsigned char pushback[NPUSHBACK];
-   int pbidx;
-   #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
-   #define get(c) (pbidx ? pushback[--pbidx] : getchar())
-   ...
-   unget(EOF);
- 
- which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
- WITH UMLAUT.
- 
- 
- Other common pitfalls
- ---------------------
- 
- o Expecting 'plain' char to be either sign or unsigned extending
- 
- o Shifting an item by a negative amount or by greater than or equal to
-   the number of bits in a type (expecting shifts by 32 to be sensible
-   has caused quite a number of bugs at least in the early days).
- 
- o Expecting ints shifted right to be sign extended.
- 
- o Modifying the same value twice within one sequence point.
- 
- o Host vs. target floating point representation, including emitting NaNs
-   and Infinities in a form that the assembler handles.
- 
- o qsort being an unstable sort function (unstable in the sense that
-   multiple items that sort the same may be sorted in different orders
-   by different qsort functions).
- 
- o Passing incorrect types to fprintf and friends.
- 
- o Adding a function declaration for a module declared in another file to
-   a .c file instead of to a .h file.
--- 392,401 ----
  promotes to unsigned) while it is signed in ISO (since all of the
  values in unsigned char fit in an int, it promotes to int).
  
  
  Suffixes on Integer Constants
  -----------------------------
  
! K+R C did not accept a 'u' suffix on integer constants.  If you wanted
! to declare a constant to be unsigned, you had to use an explicit
  cast.
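Here is a sketch of one way to fix the pushback example from the "Char vs unsigned char vs int" section above. The buffer below holds int rather than unsigned char, so a pushed-back EOF stays EOF instead of becoming '\377'. The names NPUSHBACK, get_c, unget_c and set_source are hypothetical, and a string stands in for stdin so the behaviour is easy to check. Bytes are converted through unsigned char on the way in, so a real '\377' reads as 255 and can never be mistaken for EOF.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4   /* hypothetical capacity */

/* Pushback buffer holds int, not unsigned char, so that EOF (a
   negative value) is not folded into '\377'.  */
static int pushback[NPUSHBACK];
static int pbidx;

/* Hypothetical input source: a string standing in for stdin.  */
static const char *src;

static int
next_char (void)
{
  if (*src == '\0')
    return EOF;
  /* Convert through unsigned char so bytes >= 0x80 stay positive
     and can never compare equal to EOF.  */
  return (unsigned char) *src++;
}

void
set_source (const char *s)
{
  src = s;
  pbidx = 0;
}

void
unget_c (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

int
get_c (void)
{
  return pbidx ? pushback[--pbidx] : next_char ();
}
```

For instance, after set_source ("\377"), get_c () returns 255 and the next call returns EOF. A subsequent unget_c (EOF) followed by get_c () returns EOF again.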

