This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH 00/17] RFC: New source-location representation; Language Server Protocol


We currently capture some source location information in the
frontends, but there are many kinds of source entity for which we *don't*
retain the location information after the initial parse.

For example, in the C/C++ frontends:

* we don't capture the locations of the individual parameters
  within e.g. an extern function declaration, so we can't underline
  the pertinent param when there's a mismatching type in a call to
  that function decl.

* we don't capture the locations of attributes of a function,
  so we can't underline these if they're wrong (e.g. a "noreturn" on a
  function that does in fact return).

* we don't retain the locations of things like close parens and
  semicolons after parsing, so we can't offer fix-it hints for
  adding new attributes or, say, the C++11 "override" feature.

* we can't at present implement many kinds of useful "cousins" of a
  compiler on top of the GCC codebase (e.g. code refactoring tools,
  code reformatting tools, IDE support daemons, etc), since much of the
  useful location information is discarded at parse time.

This patch kit implements:

(a) a new, optional, representation of this location information,
    enabled by a command-line flag

(b) improvements to various diagnostics to use this location information
    if it's present, falling back to the status-quo (less accurate)
    source locations otherwise

(c) a gcc-based implementation of Microsoft's Language Server Protocol,
      https://github.com/Microsoft/language-server-protocol
    allowing IDEs to connect to a gcc-based LSP server, making RPC
    calls to query it for things like "where is this struct declared?".
    This last part is very much just a proof-of-concept.


================================
(a) The new location information
================================

Our existing "tree" type represents a node within an abstract syntax tree,
but this is sometimes too abstract - sometimes we want the locations
of the clauses and tokens that were abstracted away by the frontends.

In theory we could generate the full parse tree ("concrete syntax tree"),
showing every production followed to parse the input, but it is likely
to be unwieldy: large and difficult to navigate.

(aside: I found myself re-reading the Dragon book to refresh my mind
on exactly what an AST vs a CST is; I also found this blog post to be
very useful:
  http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees )

So the patch kit implements a middle-ground: an additional tree of parse
information, much more concrete than our "tree" type, but not quite the
full parse tree.

My working title for this system is "BLT" (and hence "class blt_node").
I could claim that this is an acronym for "bonus location tree" (but
it's actually a reference to a sandwich) - it needs a name, and that
name needs to not clash with anything else in the source tree.
"Parse Tree" would be "PT" which clashes with "points-to", and
"Concrete Syntax Tree" would be "CST" which clashes with our abbreviation
for "constant".  ("BLT" popped into my mind somewhere between "AST"
and "CST"; ideas for better names welcome).

blt_nodes form a tree-like structure; a blt_node has a "kind",
identifying the terminal or nonterminal it corresponds to
(e.g. BLT_TRANSLATION_UNIT or BLT_DECLARATION_SPECIFIERS).
This is just an enum, but allows for language-specific traversals,
without introducing significant language-specific features in
the shared "gcc" dir (it's just an enum of IDs).
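
To make the shape of this concrete, here is a minimal sketch of a
tree of nodes tagged with an enum "kind".  It is not the blt_node
class from the patch kit: the names blt_kind_sketch, range_sketch and
blt_node_sketch are made up for illustration, as are the
BLT_EXTERNAL_DECLARATION and BLT_PARAMETER_DECLARATION kinds, and the
source range is simplified to a single line with a column span.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// BLT_TRANSLATION_UNIT and BLT_DECLARATION_SPECIFIERS are named in the
// text above; the other two kinds are hypothetical, for illustration.
enum blt_kind_sketch {
  BLT_TRANSLATION_UNIT,
  BLT_EXTERNAL_DECLARATION,   // hypothetical
  BLT_DECLARATION_SPECIFIERS,
  BLT_PARAMETER_DECLARATION   // hypothetical
};

// Simplified stand-in for a source range on a single line.
struct range_sketch {
  int line;
  int start_col;
  int finish_col;
};

class blt_node_sketch {
public:
  blt_node_sketch(blt_kind_sketch kind, range_sketch range)
    : m_kind(kind), m_range(range)
  {
    assert(range.start_col <= range.finish_col);
  }

  // Append a new child node and return a non-owning pointer to it.
  blt_node_sketch *add_child(blt_kind_sketch kind, range_sketch range)
  {
    m_children.push_back(
      std::unique_ptr<blt_node_sketch>(new blt_node_sketch(kind, range)));
    return m_children.back().get();
  }

  blt_kind_sketch get_kind() const { return m_kind; }
  const range_sketch &get_range() const { return m_range; }

  // Depth-first, pre-order search for the first node of the given kind,
  // starting with this node itself; returns nullptr if there is none.
  const blt_node_sketch *find_first(blt_kind_sketch kind) const
  {
    if (m_kind == kind)
      return this;
    for (const auto &child : m_children)
      if (const blt_node_sketch *found = child->find_first(kind))
        return found;
    return nullptr;
  }

private:
  blt_kind_sketch m_kind;
  range_sketch m_range;
  std::vector<std::unique_ptr<blt_node_sketch>> m_children;
};
```

A language-specific traversal then only needs to know which kinds to
look for, e.g. calling find_first (BLT_DECLARATION_SPECIFIERS) on the
node for a declaration, while the shared code sees nothing more than
enum values.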

There is a partial mapping between "tree" and blt_node: a blt_node
can reference a tree, and a tree can reference a blt_node, though
the mapping is typically very sparse (most nodes on either side have
no counterpart).  This allows us
to go from e.g. a function_decl in the "tree" world and navigate to
pertinent parts of the syntax that was used to declare it.
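
As a rough sketch of how a diagnostic might consume such a sparse
mapping, the following uses a side table keyed on the declaration and
falls back to a coarser range when there is no entry.  Everything
here is an assumption made for illustration: fake_tree, fake_blt_node,
sparse_tree_to_blt_map and range_for_diagnostic are hypothetical
names, and the patch kit may well store the link between the two
worlds differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for GCC's "tree" and for a blt_node.
struct fake_tree { std::string name; };
struct fake_blt_node { int line; int start_col; int finish_col; };

// Sparse side table: only some trees have an associated node.
class sparse_tree_to_blt_map {
public:
  void record(const fake_tree *t, const fake_blt_node *n)
  {
    assert(t != nullptr && n != nullptr);
    m_map[t] = n;
  }

  // Returns nullptr when nothing was recorded for T.
  const fake_blt_node *lookup(const fake_tree *t) const
  {
    auto it = m_map.find(t);
    return it == m_map.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<const fake_tree *, const fake_blt_node *> m_map;
};

struct column_range { int start_col; int finish_col; bool precise; };

// Use the recorded node's columns if there is one; otherwise return
// the caller's fallback range, marked as imprecise.
column_range range_for_diagnostic(const sparse_tree_to_blt_map &map,
                                  const fake_tree *t,
                                  column_range fallback)
{
  if (const fake_blt_node *node = map.lookup(t)) {
    column_range precise = {node->start_col, node->finish_col, true};
    return precise;
  }
  fallback.precise = false;
  return fallback;
}
```

The point of the fallback path is that callers behave the same way
whether or not any extra location data was captured; they just get a
less precise range.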

All of this is enabled by a new "-fblt" command-line option; in the
absence of -fblt, almost all of it becomes close to no-ops, and the
relevant diagnostics fall back to using less accurate location
information.

So it's a kind of optional, "on-the-side" record of how we parsed
the source, with a sparse relationship to our tree type.

The patch kit implements it for the C and C++ frontends.

An example of a BLT dump for a C file can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2017-07-24/fdump-blt.html
It shows the tree structure using indentation (and colorization);
source locations are printed, and, for each node where the
location is different from the parent, the pertinent source range
is printed and underlined inline.
(BTW, does the colorization of dumps look useful for other
dump formats?  similarly for the ASCII art for printing hierarchies)


=====================
(b) Examples of usage
=====================

Patches 6-10 in the kit update various diagnostics to use
the improved location information where available:

* C and C++: highlighting the pertinent parameter of a function
  decl when there's a mismatched type in a call

* C and C++: highlighting the return type in the function defn
  when complaining about mismatch in the body (e.g. highlighting
  the "void" when we see a "return EXPR;" within a void function).

* C++: add a fix-it hint to -Wsuggest-override
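
For reference, here is a small self-contained example of the kind of
code that -Wsuggest-override is aimed at.  The class names are made
up, and the comments describe where a fix-it hint would plausibly
apply rather than quoting actual compiler output.

```cpp
#include <cassert>
#include <string>

struct shape {
  virtual ~shape() {}
  virtual std::string name() const { return "shape"; }
};

struct circle : shape {
  // Overrides shape::name but is not marked "override"; this is the
  // case -Wsuggest-override warns about.  Suggesting the fix needs the
  // location just after the "const" that ends the declarator, which is
  // the sort of token position discussed above.
  std::string name() const { return "circle"; }
};

struct square : shape {
  // Already marked, so there is nothing to suggest here.
  std::string name() const override { return "square"; }
};

// Dispatches virtually, so both derived classes behave the same way at
// run time; the difference is purely in what the compiler can check.
std::string describe(const shape *s)
{
  assert(s != nullptr);
  return s->name();
}
```

Both derived classes compile and behave identically; the missing
keyword only matters to the compiler's checking, which is why a
mechanical fix-it hint is a good fit.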

I have plenty of ideas for other uses of this infrastructure
(but which aren't implemented yet), e.g.:

* C++: highlight the "const" token (or suggest a fix-it hint)
  when you have a missing "const" on the *definition* of a member
  function that was declared as "const" (I make this mistake
  all the time).

* C++: add a fix-it hint to -Wsuggest-final-methods

* highlight bogus attributes

* add fix-it hints suggesting missing attributes

...etc, plus those "cousins of a compiler" ideas mentioned above.

Any other ideas?


============================
(c) Language Server Protocol
============================

The later parts of the patch kit implement a proof-of-concept
LSP server, making use of the extended location information,
exposing it to IDEs.

LSP is an RPC protocol layered on top of JSON-RPC (and hence JSON
and HTTP):
  https://github.com/Microsoft/language-server-protocol
so the patch kit implements a set of classes to support
this (including a barebones HTTP server running inside cc1), and
a toy IDE written in PyGTK to test it.
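
To give a feel for the messages involved, here is a sketch of building
a "textDocument/definition" request as described in the LSP
specification: a JSON-RPC 2.0 body, preceded by a Content-Length
header and a blank line.  The function names are hypothetical and
this is not code from the patch kit, whose transport may wrap
messages differently; the JSON is assembled by hand and assumes the
URI needs no escaping.

```cpp
#include <cassert>
#include <string>

// Build the JSON body of a "textDocument/definition" request.
// LINE and CHARACTER are zero-based, as in the LSP specification.
// Assumes URI contains no characters that need JSON escaping.
std::string make_definition_request(int id, const std::string &uri,
                                    int line, int character)
{
  assert(line >= 0 && character >= 0);
  return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id)
    + ",\"method\":\"textDocument/definition\",\"params\":{"
    + "\"textDocument\":{\"uri\":\"" + uri + "\"},"
    + "\"position\":{\"line\":" + std::to_string(line)
    + ",\"character\":" + std::to_string(character) + "}}}";
}

// Frame a body with the Content-Length header used by the LSP base
// protocol: the header, a blank line, then exactly that many bytes.
std::string frame_message(const std::string &body)
{
  return "Content-Length: " + std::to_string(body.size())
    + "\r\n\r\n" + body;
}
```

A client would send frame_message (make_definition_request (1, uri,
line, character)) and expect a response carrying the same "id" along
with the location of the definition, which is where the captured
source ranges come in.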


=======
Caveats
=======

* There are plenty of FIXMEs and TODOs in the patch kit.

* I've entirely ignored tentative parsing in the C++ frontend for now.

* I haven't attempted to optimize it at all yet (so no performance
  measurements yet).

* How much of the syntax tree ought to be captured?  I've focussed on the
  stuff outside of function bodies, since we don't currently capture that
  well, but to do "proper" IDE support we'd want to capture things more
  deeply.  (I experimented with using it to fix some of our missing
  location information for things like uses of constants and variables
  as arguments at callsites, but it quickly turned into a much more
  invasive patch).

* The LSP implementation is just a proof-of-concept, to further
  motivate capturing the extra data.  Turning it into a "proper" LSP
  server implementation would be a *lot* more work, and I'm unlikely to
  actually do that (but maybe someone on the list wants to take this on?)

I've successfully bootstrapped&regrtested the combination of the patches
on x86_64-pc-linux-gnu; takes -fself-test from 39458 passes to 41574;
adds 30 PASS results to gcc.sum; adds 182 PASS results to g++.sum.

Thoughts?
Dave

David Malcolm (17):
  Add param-type-mismatch.c/C testcases as a baseline
  diagnostics: support prefixes within diagnostic_show_locus
  Core of BLT implementation
  C frontend: capture BLT information
  C++ frontend: capture BLT information
  C: use BLT to highlight parameter of callee decl for mismatching types
  C++: use BLT to highlight parameter of callee decl for mismatching
    types
  C: highlight return types when complaining about mismatches
  C++: highlight return types when complaining about mismatches
  C++: provide fix-it hints in -Wsuggest-override
  Add JSON implementation
  Add server.h and server.c
  Add http-server.h and http-server.c
  Add implementation of JSON-RPC
  Language Server Protocol: add lsp::server abstract base class
  Language Server Protocol: proof-of-concept GCC implementation
  Language Server Protocol: work-in-progess on testsuite

 gcc/Makefile.in                                    |    7 +
 gcc/blt.c                                          |  768 ++++++++
 gcc/blt.def                                        |   87 +
 gcc/blt.h                                          |  147 ++
 gcc/c-family/c-opts.c                              |    2 +-
 gcc/c-family/c.opt                                 |    8 +
 gcc/c/c-decl.c                                     |   13 +-
 gcc/c/c-parser.c                                   |  241 ++-
 gcc/c/c-tree.h                                     |    6 +-
 gcc/c/c-typeck.c                                   |  120 +-
 gcc/common.opt                                     |    4 +
 gcc/cp/call.c                                      |   79 +-
 gcc/cp/class.c                                     |   23 +-
 gcc/cp/cp-tree.h                                   |    7 +
 gcc/cp/decl.c                                      |   32 +-
 gcc/cp/parser.c                                    |  369 +++-
 gcc/cp/parser.h                                    |    7 +
 gcc/cp/pt.c                                        |    8 +
 gcc/cp/typeck.c                                    |   70 +-
 gcc/diagnostic-show-locus.c                        |   94 +-
 gcc/diagnostic.c                                   |    5 +-
 gcc/http-server.c                                  |  358 ++++
 gcc/http-server.h                                  |  101 ++
 gcc/json-rpc.c                                     |  486 +++++
 gcc/json-rpc.h                                     |   94 +
 gcc/json.c                                         | 1914 ++++++++++++++++++++
 gcc/json.h                                         |  214 +++
 gcc/lsp-main.c                                     |  168 ++
 gcc/lsp-main.h                                     |   25 +
 gcc/lsp.c                                          |  291 +++
 gcc/lsp.h                                          |  210 +++
 gcc/selftest-run-tests.c                           |    5 +
 gcc/selftest.h                                     |    5 +
 gcc/server.c                                       |  152 ++
 gcc/server.h                                       |   46 +
 gcc/testsuite/g++.dg/bad-return-type.C             |  135 ++
 .../g++.dg/diagnostic/param-type-mismatch.C        |  159 ++
 gcc/testsuite/g++.dg/warn/Wsuggest-override.C      |   12 +-
 gcc/testsuite/gcc.dg/bad-return-type.c             |   67 +
 gcc/testsuite/gcc.dg/lsp/lsp.py                    |  125 ++
 gcc/testsuite/gcc.dg/lsp/test.c                    |   12 +
 gcc/testsuite/gcc.dg/lsp/test.py                   |   28 +
 gcc/testsuite/gcc.dg/lsp/toy-ide.py                |  111 ++
 gcc/testsuite/gcc.dg/param-type-mismatch.c         |   60 +
 .../plugin/diagnostic_plugin_test_show_locus.c     |    1 +
 gcc/toplev.c                                       |    4 +
 46 files changed, 6772 insertions(+), 108 deletions(-)
 create mode 100644 gcc/blt.c
 create mode 100644 gcc/blt.def
 create mode 100644 gcc/blt.h
 create mode 100644 gcc/http-server.c
 create mode 100644 gcc/http-server.h
 create mode 100644 gcc/json-rpc.c
 create mode 100644 gcc/json-rpc.h
 create mode 100644 gcc/json.c
 create mode 100644 gcc/json.h
 create mode 100644 gcc/lsp-main.c
 create mode 100644 gcc/lsp-main.h
 create mode 100644 gcc/lsp.c
 create mode 100644 gcc/lsp.h
 create mode 100644 gcc/server.c
 create mode 100644 gcc/server.h
 create mode 100644 gcc/testsuite/g++.dg/bad-return-type.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/param-type-mismatch.C
 create mode 100644 gcc/testsuite/gcc.dg/bad-return-type.c
 create mode 100644 gcc/testsuite/gcc.dg/lsp/lsp.py
 create mode 100644 gcc/testsuite/gcc.dg/lsp/test.c
 create mode 100644 gcc/testsuite/gcc.dg/lsp/test.py
 create mode 100644 gcc/testsuite/gcc.dg/lsp/toy-ide.py
 create mode 100644 gcc/testsuite/gcc.dg/param-type-mismatch.c

-- 
1.8.5.3

