This is the mail archive of the
mailing list for the GCC project.
libcpp how-to question: Tokenizing and spaces & tabs - or special Fortran needs
- From: Tobias Burnus <burnus at net-b dot de>
- To: Dodji Seketeli <dodji at redhat dot com>, Tom Tromey <tromey at redhat dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Manuel López-Ibáñez <lopezibanez at gmail dot com>, gfortran <fortran at gcc dot gnu dot org>, gcc <gcc at gcc dot gnu dot org>
- Date: Sat, 29 Nov 2014 16:10:35 +0100
- Subject: libcpp how-to question: Tokenizing and spaces & tabs - or special Fortran needs
Currently, gfortran reads source files directly. If preprocessing is
enabled, it calls libcpp directly but writes the preprocessed output
into a temporary file, which is then read back. To bring the processing
closer to the common code, to show macro expansions in error messages,
and the like, I'd like to use libcpp for reading the files, both the
already preprocessed ones and those still to be preprocessed.
The problem is that whitespace seems to get lost in libcpp. In Fortran,
spaces play a role:
* In free format only to warn if tabs appear (invalid per ISO standard,
-Wtabs) and to print an error if the line is too long (max 132
characters according to the standard; with -ffree-line-length-none =
* For fixed format, the whitespace is crucial. The columns 1 to 6 have a
special meaning, but also the total length is limited to 72 characters;
excess characters are ignored (comment). That dates back to the time of
punch cards and the eight excess characters were e.g. used to enumerate
the punch cards. There are still Fortran programs out there which assume
that everything beyond 72 characters is ignored. Others assume that 80
(= full punch card) or 132 characters are permitted (free-form limit).
(gfortran permits any value >=72, including unlimited.)
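To make the two sets of rules concrete, here is a minimal sketch in C. The
helper names are hypothetical and are not taken from gfortran or libcpp; the
limits (72 and 132) are the standard values mentioned above, and a limit of 0
is an assumption of this sketch meaning "unlimited". A tab is counted as a
single column, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of LINE without its trailing line break.  */
static size_t
line_length (const char *line)
{
  assert (line != NULL);
  return strcspn (line, "\r\n");
}

/* Hypothetical helper for fixed form: number of significant characters.
   Everything beyond column LIMIT is treated like a comment and ignored.
   LIMIT == 0 means "unlimited" (an assumption of this sketch).  */
size_t
fixed_form_significant_length (const char *line, size_t limit)
{
  size_t len = line_length (line);
  if (limit != 0 && len > limit)
    len = limit;
  return len;
}

/* Hypothetical helper for free form: true if the line exceeds LIMIT
   characters, which would be an error rather than a silent cut-off.  */
bool
free_form_line_too_long (const char *line, size_t limit)
{
  return limit != 0 && line_length (line) > limit;
}

/* Hypothetical helper: true if the line contains a tab, the condition
   a -Wtabs style warning would look for.  */
bool
line_has_tab (const char *line)
{
  assert (line != NULL);
  return strchr (line, '\t') != NULL;
}
```

All three checks need the raw characters of the line, including the kind and
amount of whitespace, which is why losing whitespace in the tokenizer matters.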
Now back to libcpp: As a first step, I tried to use
  token = cpp_get_token (cpp_in);
  cpp_token_as_text (cpp_in, token);
for converting the input. I can recover line breaks, and whether there
was preceding whitespace, with the flags BOL and PREV_WHITE; for line
breaks also by defining a callback. At the beginning of a line, I can
still recover the number of spaces from the source location
(SOURCE_COLUMN), but not whether the indentation was done with " " or
with a tab. For mid-line whitespace, I could use: source column of the
current token minus source column of the previous token minus the length
of the previous token when spelled as text. However, that's not really
elegant.
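The mid-line arithmetic can be sketched as follows. This is only an
illustration under stated assumptions: "struct tok_pos" and "gap_between" are
hypothetical names, not libcpp types or functions. The caller is assumed to
have filled in the start column from the token's source location and the
length from its spelling as text, and both tokens are assumed to sit on the
same physical line.

```c
#include <assert.h>

/* Hypothetical stand-in for the two values needed per token; in real
   code they would be derived from the token's source location and from
   its spelling, respectively.  */
struct tok_pos
{
  unsigned int column;   /* 1-based start column of the token.  */
  unsigned int length;   /* Length of the token when spelled as text.  */
};

/* Number of columns between the end of PREV and the start of CUR.
   Both tokens must be on the same line, with PREV before CUR.  */
unsigned int
gap_between (struct tok_pos prev, struct tok_pos cur)
{
  unsigned int prev_end = prev.column + prev.length;
  assert (cur.column >= prev_end);
  return cur.column - prev_end;
}
```

For the text "a  +b", with "a" at column 1, "+" at column 4 and "b" at
column 5, this gives a gap of 2 before "+" and 0 before "b". The result is
only a column count: it still cannot tell spaces from a tab, which is the
limitation described above.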
[Somewhat related: adding a special Fortran mode makes sense. Currently,
preprocessing can only use the traditional mode, as things like
  print *, 'That''s a string which &
  ! Here's a comment line inbetween
  &is continued in the next line'
are not handled properly otherwise: the preprocessor complains about
unterminated strings, either because of the & continuation line or
because of the ' in the comment. However, as some other compilers support
features such as "##" concatenation, users wish to go beyond traditional
mode.]
As the Fortran standard doesn't define how preprocessing works,* we
have quite some leeway. However, for -fpreprocessed, the whitespace
really has to be passed through as is. (-fpreprocess which is the default in
gfortran, unless the special file extension (.F, .F90, .fpp) or "-cpp"
Do you have a suggestion for how best to implement this whitespace
preservation with libcpp? It can (and presumably should) be a special
flag/function for Fortran.
* To be precise: Part 3 of the Fortran standardization series (ISO/IEC
1539-3:1998) defines conditional compilation ("coco"), but that never
caught on [an external tool "coco" exists to use it]. I think coco is
supposed to be retired. On the other hand, all Fortran compilers can
optionally, but automatically, run the code through the C preprocessor;
some simply use "cpp", others netlib.org's fpp, and some support newer
features.