I was recently investigating the feasability of adding support for ".xor."
as a synonym of ".neqv." as a GNU extension to the gfortran front-end [to
match the equivalent functionality provided by several commercial fortran
compilers], when I stumbled across match.c:gfc_match_strings.
In the world computer science and compiler design, lexical analysis and
keyword recognition has evolved to a fine-art. Modern techniques using
finite state machines or perfect hash functions now achieve remarkable
efficiency, as per-character processing is often a performance bottle-next
in non-optimizing compilation. The GNU project even has a flex and gperf
tools to simplify the process. gfortran's current gfc_match_strings
implementation possesses none of these virtuous qualities. :-)
Perhaps my bioinformatics/chemoinformatics pattern matching background has
made me sensitive to such inefficiencies [2], but the proposed patch below
and the following follow-up, provide far more efficient implementations,
based upon the theory/analysis described in [1].
This first installment proposes as alternative implementation for
primary.c's match_logical_constant to avoid use of gfc_match_strings and
thereby avoids run times that have order dependent upon the number of
pattern strings being matched. This "simple example" should help
introduce the reviewers to the more complex example presented in the
following patch.
The following patch has been tested on x86_64-unknown-linux-gnu, with a
full "make bootstrap", including gfortran, and tested with a top-level
"make -k check" with no new regressions.
Thoughts?