This is the mail archive of the mailing list for the GCC project.


libcpp how-to question: Tokenizing and spaces & tabs - or special Fortran needs


Currently, gfortran reads source files directly. If preprocessing is enabled, it calls libcpp directly but writes the preprocessed output into a temporary file, which is then read. In order to bring processing closer to the common code, to show macro expansion in error messages, and the like, I'd like to use libcpp for reading all files, both preprocessed and to-be-preprocessed ones.

The problem is that whitespace seems to get lost in libcpp. In Fortran, spaces play a role:
* In free format, only to warn if tabs appear (invalid per the ISO standard, -Wtabs) and to print an error if the line is too long (max. 132 characters according to the standard; unlimited with -ffree-line-length-none).
* In fixed format, the whitespace is crucial. Columns 1 to 6 have a special meaning, and the total line length is limited to 72 characters; excess characters are ignored (treated as a comment). That dates back to the time of punch cards, when the eight excess characters were used, e.g., to enumerate the punch cards. There are still Fortran programs out there which assume that everything beyond 72 characters is ignored. Others assume that 80 (= full punch card) or 132 characters (the free-form limit) are permitted. (gfortran permits any value >= 72, including unlimited.)
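To make the line-length rules above concrete, here is a minimal sketch in C. The helper names are hypothetical and this is not gfortran's code; it also assumes that every character, including a tab, occupies exactly one column, which a real implementation may not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a gfortran function.  Return how many
   characters of LINE are significant when at most MAX_COLS columns
   are read; MAX_COLS == 0 stands for "unlimited".  Assumption: each
   character, tabs included, counts as one column.  */
static size_t
significant_length (const char *line, size_t max_cols)
{
  size_t len = strcspn (line, "\n");  /* Do not count the newline.  */

  if (max_cols != 0 && len > max_cols)
    return max_cols;  /* The excess is ignored like a comment.  */
  return len;
}

/* Hypothetical helper: true if LINE contains a tab character, the
   condition a -Wtabs style warning would need to detect.  */
static bool
line_has_tab (const char *line)
{
  return strchr (line, '\t') != NULL;
}
```

With max_cols set to 72, anything from column 73 onwards would be dropped; both helpers need the raw line, which is exactly what gets lost once the input has been tokenized.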


Now back to libcpp: As first step, I tried to use
  token = cpp_get_token (cpp_in);
  cpp_token_as_text (cpp_in, token);
for converting the input. I can recover line breaks, and whether there was preceding whitespace, with the flags BOL and PREV_WHITE; line breaks can also be recovered by defining a callback. At the beginning of the line, I can still recover the number of spaces from the source location (SOURCE_COLUMN), but not whether the indentation was done with " " or with a tab. For mid-line whitespace, I could use: source column of the current token minus that of the previous token minus the length of the previous token when spelled as text. However, that's not really elegant.
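The mid-line arithmetic can be sketched as follows. This is only an illustration: "struct tok_info" and "gap_before" are made-up names standing in for the column and spelling one would obtain from libcpp, and no libcpp function is called.

```c
#include <string.h>

/* Hypothetical stand-in for what would be taken from a libcpp token:
   its 1-based source column and its spelling as text.  */
struct tok_info
{
  unsigned int column;
  const char *text;
};

/* Number of columns between the end of PREV and the start of CUR,
   both assumed to be on the same line:
   column (CUR) - column (PREV) - length of PREV's spelling.  */
static unsigned int
gap_before (const struct tok_info *prev, const struct tok_info *cur)
{
  unsigned int prev_end = prev->column + (unsigned int) strlen (prev->text);

  /* Guard against inconsistent columns, e.g. after macro expansion.  */
  return cur->column > prev_end ? cur->column - prev_end : 0;
}
```

For "x  =  1" with "x" in column 1 and "=" in column 4, this gives a gap of 2. It only yields a width, though; like SOURCE_COLUMN at the beginning of the line, it cannot tell spaces from tabs.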

[A bit related, adding a special Fortran mode makes sense; currently, preprocessing can only use the traditional mode as things like
   print *, 'That''s a string which &
     ! Here's a comment line inbetween
     &is continued in the next line'
are not properly handled, which results in complaints about unterminated strings, either because of the & continuation line or because of the ' in the comment. However, as some other compilers support features such as "##" concatenation, users wish to go beyond traditional mode.]

As the Fortran standard doesn't define how the preprocessing works,* we do have quite some leeway. However, for -fpreprocessed, the whitespace really has to be passed through as is. (-fpreprocessed is the default in gfortran, unless a special file extension (.F, .F90, .fpp) or "-cpp" is used.)

Do you have a suggestion how to best implement this white-space preserving with libcpp? It can (and presumably should) be a special flag/function for Fortran.


* To be precise: Part 3 of the Fortran standardization series (ISO/IEC 1539-3:1998) defines conditional compilation ("coco"), but that never caught on [an external tool "coco" exists to use it]. I think coco is supposed to get retired. On the other hand, all Fortran compilers can optionally but automatically run the code through the C preprocessor; some simply use "cpp", others use fpp, and some support newer features.
