This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



README.Portability


No-one raised an objection, and I guess this comes under
documentation, so I've committed the patch below.

Thanks to Michael and Zack for their comments, which I've
incorporated.  I wasn't quite sure what you meant in a couple of
instances Michael (I don't know K+R C), so I've surrounded those by
double stars for you or someone else to expand on.

Please feel free to add new cases, and update and correct mistakes in
existing ones.

Neil.

Index: README.Portability
===================================================================
RCS file: README.Portability
diff -N README.Portability
--- /dev/null	Tue May  5 13:32:27 1998
+++ README.Portability	Fri Jul 14 19:53:53 2000
@@ -0,0 +1,369 @@
+Copyright (C) 2000 Free Software Foundation, Inc.
+
+This file is intended to contain a few notes about writing C code
+within GCC so that it compiles without error on the full range of
+compilers GCC needs to be able to compile on.
+
+The problem is that many ISO-standard constructs are not accepted by
+either old or buggy compilers, and we keep getting bitten by them.
+Until now this knowledge has been sparsely spread around, so I
+thought I'd collect it in one useful place.  Please add and correct
+any problems as you come across them.
+
+I'm going to start from a base of the ISO C89 standard, since that is
+probably what most people code to naturally.  Obviously using
+constructs introduced after that is not a good idea.
+
+The first section of this file deals strictly with portability issues,
+the second with common coding pitfalls.
+
+
+			Portability Issues
+			==================
+
+Unary +
+-------
+
+K+R C compilers and preprocessors have no notion of unary '+'.  Thus
+the following code snippet contains 2 portability problems.
+
+int x = +2;  /* int x = 2;  */
+#if +1       /* #if 1  */
+#endif
+
+
+Pointers to void
+----------------
+
+K+R C compilers did not have a void pointer, and used char * as the
+pointer to anything.  The macro PTR is defined as either void * or
+char * depending on whether you have a standards compliant compiler or
+a K+R one.  Thus
+
+  free ((void *) h->value.expansion);
+
+should be written
+
+  free ((PTR) h->value.expansion);
+
+
+String literals
+---------------
+
+K+R C did not allow concatenation of string literals like
+
+  "This is a " "single string literal".
+
+Moreover, some compilers like MSVC++ have fairly low limits on the
+maximum length of a string literal; 509 is the lowest we've come
+across.  You may need to break up a long printf statement into many
+smaller ones.
+
+
+Empty macro arguments
+---------------------
+
+ISO C (6.8.3 in the 1990 standard) specifies the following:
+
+If (before argument substitution) any argument consists of no
+preprocessing tokens, the behavior is undefined.
+
+This was relaxed by ISO C99, but some older compilers emit an error,
+so code like
+
+#define foo(x, y) x y
+foo (bar, )
+
+needs to be coded in some other way.
+
+
+signed keyword
+--------------
+
+The signed keyword did not exist in K+R compilers; it was introduced in
+ISO C89, so you cannot use it.  In both K+R and standard C,
+unqualified char and bitfields may be signed or unsigned.  There is no
+way to portably declare signed chars or signed bitfields.
+
+All other arithmetic types are signed unless you use the 'unsigned'
+qualifier.  For instance, it is safe to write
+
+  short paramc;
+
+instead of
+
+  signed short paramc;
+
+If you have an algorithm that depends on signed char or signed
+bitfields, you must find another way to write it before it can be
+integrated into GCC.
+
+
+Function prototypes
+-------------------
+
+You need to provide a function prototype for every function before you
+use it, and functions must be defined K+R style.  The function
+prototype should use the PARAMS macro, which takes a single argument.
+Therefore the parameter list must be enclosed in parentheses.  For
+example,
+
+int myfunc PARAMS ((double, int *));
+
+int
+myfunc (var1, var2)
+	double var1;
+	int *var2;
+{
+  ...
+}
+
+You also need to use PARAMS when referring to function prototypes in
+other circumstances, for example see "Calling functions through
+pointers to functions" below.
+
+Variable-argument functions are best described by example:-
+
+void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...));
+
+void
+cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...))
+{  
+#ifndef ANSI_PROTOTYPES
+  cpp_reader *pfile;
+  const char *msgid;
+#endif
+  va_list ap;
+  
+  VA_START (ap, msgid);
+  
+#ifndef ANSI_PROTOTYPES
+  pfile = va_arg (ap, cpp_reader *);
+  msgid = va_arg (ap, const char *);
+#endif
+
+  ...
+  va_end (ap);
+}
+
+For the curious, here are the definitions of the above macros.  See
+ansidecl.h for these and more.
+
+#define PARAMS(paramlist)  paramlist  /* ISO C.  */
+#define VPARAMS(args)   args
+
+#define PARAMS(paramlist)  ()         /* K+R C.  */
+#define VPARAMS(args)   (va_alist) va_dcl
+
+
+Calling functions through pointers to functions
+-----------------------------------------------
+
+K+R C compilers require brackets around the dereferenced pointer
+variable.  For example
+
+typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
+      p->handler (pfile, p->arg);
+
+needs to become
+
+      (p->handler) (pfile, p->arg);
+
+
+Macros
+------
+
+The rules under K+R C and ISO C for achieving stringification and
+token pasting are quite different.  Therefore some macros have been
+defined which will get it right depending upon the compiler.
+
+  CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d)
+
+will paste the tokens passed as arguments.  You must not leave any
+space around the commas.  Also,
+
+  STRINGX(x)
+
+will stringify an argument; to get the same result on K+R and ISO
+compilers x should not have spaces around it.
+
+
+Enums
+-----
+
+In K+R C, you have to cast enum types to use them as integers, and
+some compilers in particular give lots of warnings for using an enum
+as an array index.
+
+Bitfields
+---------
+
+See also "signed keyword" above.  In K+R C only unsigned int bitfields
+were defined (i.e. unsigned char, unsigned short, unsigned long.
+Using plain int/short/long was not allowed).
+
+
+free and realloc
+----------------
+
+Some implementations crash upon attempts to free or realloc the null
+pointer.  Thus if mem might be null, you need to write
+
+  if (mem)
+    free (mem);
+
+
+Reserved Keywords
+-----------------
+
+K+R C has "entry" as a reserved keyword, so you should not use it for
+your variable names.
+
+
+Type promotions
+---------------
+
+K+R used unsigned-preserving rules for arithmetic expressions, while
+ISO uses value-preserving.  This means an unsigned char compared to an
+int is done as an unsigned comparison in K+R (since unsigned char
+promotes to unsigned) while it is signed in ISO (since all of the
+values in unsigned char fit in an int, it promotes to int).
+
+** Not having any argument whose type is a short type (char, short,
+float of any flavor) and subject to promotion. **
+
+Trigraphs
+---------
+
+You weren't going to use them anyway, but trigraphs were not defined
+in K+R C, and some otherwise ISO C compliant compilers do not accept
+them.
+
+
+Suffixes on Integer Constants
+-----------------------------
+
+**Using a 'u' suffix on integer constants.**
+
+
+errno
+-----
+
+errno might be declared as a macro.
+
+
+			Common Coding Pitfalls
+			======================
+Implicit int
+------------
+
+In C, the 'int' keyword can often be omitted from type declarations.
+For instance, you can write
+
+  unsigned variable;
+
+as shorthand for
+
+  unsigned int variable;
+
+There are several places where this can cause trouble.  First, suppose
+'variable' is a long; then you might think
+
+  (unsigned) variable
+
+would convert it to unsigned long.  It does not.  It converts to
+unsigned int.  This mostly causes problems on 64-bit platforms, where
+long and int are not the same size.
+
+Second, if you write a function definition with no return type at
+all:
+
+  operate(a, b)
+      int a, b;
+  {
+    ...
+  }
+
+that function is expected to return int, *not* void.  GCC will warn
+about this.  K+R C has no problem with 'void' as a return type, so you
+need not worry about that.
+
+Implicit function declarations always have return type int.  So if you
+correct the above definition to
+
+  void
+  operate(a, b)
+      int a, b;
+  ...
+
+but operate() is called above its definition, you will get an error
+about a "type mismatch with previous implicit declaration".  The cure
+is to prototype all functions at the top of the file, or in an
+appropriate header.
+
+Char vs unsigned char vs int
+----------------------------
+
+In C, unqualified 'char' may be either signed or unsigned; it is the
+implementation's choice.  When you are processing 7-bit ASCII, it does
+not matter.  But when your program must handle arbitrary binary data,
+or fully 8-bit character sets, you have a problem.  The most obvious
+issue is if you have a look-up table indexed by characters.
+
+For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
+WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
+true.  But if you read '\341' from a file and store it in a plain
+char, isalpha(c) may look up character 225, or it may look up
+character -31.  And the ctype table has no entry at offset -31, so
+your program will crash.  (If you're lucky.)
+
+It is wise to use unsigned char everywhere you possibly can.  This
+avoids all these problems.  Unfortunately, the routines in <string.h>
+take plain char arguments, so you have to remember to cast them back
+and forth - or avoid the use of strxxx() functions, which is probably
+a good idea anyway.
+
+Another common mistake is to use either char or unsigned char to
+receive the result of getc() or related stdio functions.  They may
+return EOF, which is outside the range of values representable by
+char.  If you use char, some legal character value may be confused
+with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
+The correct choice is int.
+
+A more subtle version of the same mistake might look like this:
+
+  unsigned char pushback[NPUSHBACK];
+  int pbidx;
+  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
+  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
+  ...
+  unget(EOF);
+
+which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
+WITH UMLAUT.
+
+
+Other common pitfalls
+---------------------
+
+o Expecting 'plain' char to be either sign- or zero-extending.
+
+o Shifting an item by a negative amount or by greater than or equal to
+  the number of bits in a type (expecting shifts by 32 to be sensible
+  has caused quite a number of bugs at least in the early days).
+
+o Expecting ints shifted right to be sign extended.
+
+o Modifying the same value twice between two sequence points.
+
+o Host vs. target floating point representation, including emitting NaNs
+  and Infinities in a form that the assembler handles.
+
+o qsort being an unstable sort function (unstable in the sense that
+  multiple items that sort the same may be sorted in different orders
+  by different qsort functions).
+
+o Passing incorrect types to fprintf and friends.
+
+o Adding a function declaration for a module declared in another file to
+  a .c file instead of to a .h file.
