This is the mail archive of the
mailing list for the GCC project.
[PATCH 00/17] RFC: New source-location representation; Language Server Protocol
- From: David Malcolm <dmalcolm at redhat dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: David Malcolm <dmalcolm at redhat dot com>
- Date: Mon, 24 Jul 2017 16:04:57 -0400
- Subject: [PATCH 00/17] RFC: New source-location representation; Language Server Protocol
We currently capture some source location information in the
frontends, but there are many kinds of source entity for which we *don't*
retain the location information after the initial parse.
For example, in the C/C++ frontends:
* we don't capture the locations of the individual parameters
within e.g. an extern function declaration, so we can't underline
the pertinent param when there's a mismatching type in a call to
that function decl.
* we don't capture the locations of attributes of a function,
so we can't underline these if they're wrong (e.g. a "noreturn" on a
function that does in fact return).
* we don't retain the locations of things like close parens and
semicolons after parsing, so we can't offer fix-it hints for
adding new attributes, or, say, the C++11 "override" feature.
* we can't at present implement many kinds of useful "cousins" of a
compiler on top of the GCC codebase (e.g. code refactoring tools,
code reformatting tools, IDE support daemons, etc), since much of the
useful location information is discarded at parse time.
This patch kit implements:
(a) a new, optional, representation of this location information,
enabled by a command-line flag
(b) improvements to various diagnostics to use this location information
if it's present, falling back to the status-quo (less accurate)
source locations otherwise
(c) a gcc-based implementation of Microsoft's Language Server Protocol,
allowing IDEs to connect to a gcc-based LSP server, making RPC
calls to query it for things like "where is this struct declared?".
This last part is very much just a proof-of-concept.
(a) The new location information
Our existing "tree" type represents a node within an abstract syntax tree,
but this is sometimes too abstract - sometimes we want the locations
of the clauses and tokens that were abstracted away by the frontends.
In theory we could generate the full parse tree ("concrete syntax tree"),
showing every production followed to parse the input, but it is likely
to be unwieldy: large and difficult to navigate.
(aside: I found myself re-reading the Dragon book to refresh my mind
on exactly what an AST vs a CST is; I also found this blog post to be
So the patch kit implements a middle-ground: an additional tree of parse
information, much more concrete than our "tree" type, but not quite the
full parse tree.
My working title for this system is "BLT" (and hence "class blt_node").
I could claim that this is an acronym for "bonus location tree" (but
it's actually a reference to a sandwich) - it needs a name, and that
name needs to not clash with anything else in the source tree.
"Parse Tree" would be "PT" which clashes with "points-to", and
"Concrete Syntax Tree" would be "CST" which clashes with our abbreviation
for "constant". ("BLT" popped into my mind somewhere between "AST"
and "CST"; ideas for better names welcome).
blt_nodes form a tree-like structure; a blt_node has a "kind",
identifying the terminal or nonterminal it corresponds to
(e.g. BLT_TRANSLATION_UNIT or BLT_DECLARATION_SPECIFIERS).
This is just an enum, but allows for language-specific traversals,
without introducing significant language-specific features in
the shared "gcc" dir (it's just an enum of IDs).
There is a partial mapping between "tree" and blt_node: a blt_node
can reference a tree, and a tree can reference a blt_node, though
typically the mapping is very sparse; most don't. This allows us
to go from e.g. a function_decl in the "tree" world and navigate to
pertinent parts of the syntax that was used to declare it.
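To make the shape of this concrete, here is a minimal standalone sketch of
such a side tree. It is not the code from blt.h: the names sketch_node,
ast_decl, node_kind, source_range, associate and param_syntax are all
hypothetical stand-ins (ast_decl plays the role of a "tree" decl, and the
integer ranges play the role of real source locations). It only illustrates
the kind enum, the parent/child structure, and the sparse two-way link.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical subset of node kinds (the patch kit keeps its list in blt.def).
enum class node_kind {
  TRANSLATION_UNIT,
  EXTERNAL_DECLARATION,
  DECLARATION_SPECIFIERS,
  PARAMETER_LIST,
  PARAMETER_DECLARATION
};

// Stand-in for a source range; plain integer offsets for illustration.
struct source_range { int start; int finish; };

struct sketch_node;

// Stand-in for a "tree" decl: only the link to the syntax node matters here.
struct ast_decl {
  const char *name;
  sketch_node *syntax;  // null when no extra location data was captured
};

struct sketch_node {
  node_kind kind;
  source_range range;
  ast_decl *decl = nullptr;       // sparse: most nodes leave this null
  sketch_node *parent = nullptr;
  std::vector<std::unique_ptr<sketch_node>> children;

  sketch_node(node_kind k, source_range r) : kind(k), range(r) {}

  // Append a child node and return a non-owning pointer to it.
  sketch_node *add_child(node_kind k, source_range r) {
    children.push_back(std::make_unique<sketch_node>(k, r));
    sketch_node *child = children.back().get();
    child->parent = this;
    return child;
  }

  // Return the first direct child of the given kind, or null.
  sketch_node *first_child_of_kind(node_kind k) const {
    for (const auto &c : children)
      if (c->kind == k)
        return c.get();
    return nullptr;
  }
};

// Link a decl and its syntax node in both directions.
void associate(ast_decl &decl, sketch_node &node) {
  decl.syntax = &node;
  node.decl = &decl;
}

// Navigate from a function decl to the syntax node of its idx-th parameter.
// Returns null if nothing was captured or the parameter does not exist.
const sketch_node *param_syntax(const ast_decl &fn, std::size_t idx) {
  if (!fn.syntax)
    return nullptr;
  const sketch_node *params
    = fn.syntax->first_child_of_kind(node_kind::PARAMETER_LIST);
  if (!params)
    return nullptr;
  std::size_t seen = 0;
  for (const auto &c : params->children)
    if (c->kind == node_kind::PARAMETER_DECLARATION && seen++ == idx)
      return c.get();
  return nullptr;
}
```

A diagnostic that wants to underline the second parameter of a callee would
call something like param_syntax (fn, 1) and fall back to the decl's own
location when it gets null back.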
All of this is enabled by a new "-fblt" command-line option; in the
absence of -fblt, almost all of it becomes close to no-ops, and the
relevant diagnostics fall back to using less accurate location
information.
So it's a kind of optional, "on-the-side" record of how we parsed
the source, with a sparse relationship to our tree type.
The patch kit implements it for the C and C++ frontends.
An example of a BLT dump for a C file can be seen here:
It shows the tree structure using indentation (and colorization);
source locations are printed, and, for each node where the
location is different from the parent, the pertinent source range
is printed and underlined inline.
(BTW, does the colorization of dumps look useful for other
dump formats? similarly for the ASCII art for printing hierarchies)
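As a rough illustration of that dump style (without the colorization or the
inline underlining), a recursive printer along the following lines would do.
This is a sketch only; dump_node and dump are hypothetical names, and the
integer columns stand in for real source ranges.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type for the sketch: a kind name, a column range,
// and child nodes.
struct dump_node {
  std::string kind;
  int start;
  int finish;
  std::vector<dump_node> children;
};

// Print one node per line, indented two spaces per level of depth.
// The range is printed only when it differs from the parent's range.
void dump(const dump_node &n, std::ostream &out, int depth = 0,
          const dump_node *parent = nullptr) {
  out << std::string(2 * depth, ' ') << n.kind;
  if (!parent || parent->start != n.start || parent->finish != n.finish)
    out << " [" << n.start << "-" << n.finish << "]";
  out << '\n';
  for (const dump_node &child : n.children)
    dump(child, out, depth + 1, &n);
}
```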
(b) Examples of usage
Patches 6-10 in the kit update various diagnostics to use
the improved location information where available:
* C and C++: highlighting the pertinent parameter of a function
decl when there's a mismatched type in a call
* C and C++: highlighting the return type in the function defn
when complaining about mismatch in the body (e.g. highlighting
the "void" when we see a "return EXPR;" within a void function).
* C++: add a fix-it hint to -Wsuggest-override
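The -Wsuggest-override case shows why token locations matter: once the
location of the close paren is retained, the hint is just an insertion
immediately after it. The sketch below is standalone and hypothetical
(insertion_hint, suggest_override and apply_hint are made-up names, not the
diagnostics API), and it assumes a declarator with no trailing cv-qualifiers
after the close paren.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fix-it hint: insert TEXT at byte OFFSET within a line.
struct insertion_hint {
  std::size_t offset;
  std::string text;
};

// Given the 0-based offset of the ')' that closes the parameter list,
// suggest inserting " override" immediately after it.
insertion_hint suggest_override(std::size_t close_paren_offset) {
  return {close_paren_offset + 1, " override"};
}

// Apply a hint to a copy of the source line, as an IDE or patch
// generator might.
std::string apply_hint(const std::string &line, const insertion_hint &hint) {
  assert(hint.offset <= line.size());
  std::string result = line;
  result.insert(hint.offset, hint.text);
  return result;
}
```

Applied to a line such as "  void draw ();", the hint yields
"  void draw () override;".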
I have plenty of ideas for other uses of this infrastructure
(but which aren't implemented yet), e.g.:
* C++: highlight the "const" token (or suggest a fix-it hint)
when you have a missing "const" on the *definition* of a member
function that was declared as "const" (I make this mistake
all the time).
* C++: add a fix-it hint to -Wsuggest-final-methods
* highlight bogus attributes
* add fix-it hints suggesting missing attributes
...etc, plus those "cousins of a compiler" ideas mentioned above.
Any other ideas?
(c) Language Server Protocol
The later parts of the patch kit implement a proof-of-concept
LSP server, making use of the extended location information,
exposing it to IDEs.
LSP is an RPC protocol layered on top of JSON-RPC (and hence JSON
so the patch kit implements a set of classes to support
this (including a barebones HTTP server running inside cc1), and
a toy IDE written in PyGTK to test it.
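For readers who haven't looked at LSP: a "where is this declared?" query is
a JSON-RPC 2.0 request with method "textDocument/definition", and the LSP
base protocol frames each message with a Content-Length header followed by
a blank line. The sketch below builds such a message by hand. The function
names are hypothetical, it is not taken from the patch kit's json.c or
lsp.c, it says nothing about how the in-cc1 HTTP server transports the
payload, and it assumes the URI needs no JSON escaping.

```cpp
#include <cassert>
#include <string>

// Build the JSON body of a textDocument/definition request.
// LINE and CHARACTER are zero-based, as in the LSP specification.
// Assumption: URI contains no characters that need JSON escaping.
std::string make_definition_request(int id, const std::string &uri,
                                    int line, int character) {
  return std::string("{\"jsonrpc\":\"2.0\",\"id\":") + std::to_string(id)
    + ",\"method\":\"textDocument/definition\""
    + ",\"params\":{\"textDocument\":{\"uri\":\"" + uri + "\"}"
    + ",\"position\":{\"line\":" + std::to_string(line)
    + ",\"character\":" + std::to_string(character) + "}}}";
}

// Frame a JSON body as an LSP base-protocol message: a Content-Length
// header giving the body size in bytes, a blank line, then the body.
std::string frame_message(const std::string &json_body) {
  return "Content-Length: " + std::to_string(json_body.size())
    + "\r\n\r\n" + json_body;
}
```

The server's reply carries the same "id" and, on success, a "result" holding
the location (or locations) of the definition.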
* There are plenty of FIXMEs and TODOs in the patch kit.
* I've entirely ignored tentative parsing in the C++ frontend for now.
* I haven't attempted to optimize it at all yet (so no performance
* How much of the syntax tree ought to be captured? I've focussed on the
stuff outside of function bodies, since we don't currently capture that
well, but to do "proper" IDE support we'd want to capture things more
deeply. (I experimented with using it to fix some of our missing
location information for things like uses of constants and variables
as arguments at callsites, but it quickly turned into a much more
* The LSP implementation is just a proof-of-concept, to further
motivate capturing the extra data. Turning it into a "proper" LSP
server implementation would be a *lot* more work, and I'm unlikely to
actually do that (but maybe someone on the list wants to take this on?)
I've successfully bootstrapped&regrtested the combination of the patches
on x86_64-pc-linux-gnu; takes -fself-test from 39458 passes to 41574;
adds 30 PASS results to gcc.sum; adds 182 PASS results to g++.sum.
David Malcolm (17):
Add param-type-mismatch.c/C testcases as a baseline
diagnostics: support prefixes within diagnostic_show_locus
Core of BLT implementation
C frontend: capture BLT information
C++ frontend: capture BLT information
C: use BLT to highlight parameter of callee decl for mismatching types
C++: use BLT to highlight parameter of callee decl for mismatching types
C: highlight return types when complaining about mismatches
C++: highlight return types when complaining about mismatches
C++: provide fix-it hints in -Wsuggest-override
Add JSON implementation
Add server.h and server.c
Add http-server.h and http-server.c
Add implementation of JSON-RPC
Language Server Protocol: add lsp::server abstract base class
Language Server Protocol: proof-of-concept GCC implementation
Language Server Protocol: work-in-progess on testsuite
gcc/Makefile.in                                |    7 +
gcc/blt.c                                      |  768 ++++++++
gcc/blt.def                                    |   87 +
gcc/blt.h                                      |  147 ++
gcc/c-family/c-opts.c                          |    2 +-
gcc/c-family/c.opt                             |    8 +
gcc/c/c-decl.c                                 |   13 +-
gcc/c/c-parser.c                               |  241 ++-
gcc/c/c-tree.h                                 |    6 +-
gcc/c/c-typeck.c                               |  120 +-
gcc/common.opt                                 |    4 +
gcc/cp/call.c                                  |   79 +-
gcc/cp/class.c                                 |   23 +-
gcc/cp/cp-tree.h                               |    7 +
gcc/cp/decl.c                                  |   32 +-
gcc/cp/parser.c                                |  369 +++-
gcc/cp/parser.h                                |    7 +
gcc/cp/pt.c                                    |    8 +
gcc/cp/typeck.c                                |   70 +-
gcc/diagnostic-show-locus.c                    |   94 +-
gcc/diagnostic.c                               |    5 +-
gcc/http-server.c                              |  358 ++++
gcc/http-server.h                              |  101 ++
gcc/json-rpc.c                                 |  486 +++++
gcc/json-rpc.h                                 |   94 +
gcc/json.c                                     | 1914 ++++++++++++++++++++
gcc/json.h                                     |  214 +++
gcc/lsp-main.c                                 |  168 ++
gcc/lsp-main.h                                 |   25 +
gcc/lsp.c                                      |  291 +++
gcc/lsp.h                                      |  210 +++
gcc/selftest-run-tests.c                       |    5 +
gcc/selftest.h                                 |    5 +
gcc/server.c                                   |  152 ++
gcc/server.h                                   |   46 +
gcc/testsuite/g++.dg/bad-return-type.C         |  135 ++
.../g++.dg/diagnostic/param-type-mismatch.C    |  159 ++
gcc/testsuite/g++.dg/warn/Wsuggest-override.C  |   12 +-
gcc/testsuite/gcc.dg/bad-return-type.c         |   67 +
gcc/testsuite/gcc.dg/lsp/lsp.py                |  125 ++
gcc/testsuite/gcc.dg/lsp/test.c                |   12 +
gcc/testsuite/gcc.dg/lsp/test.py               |   28 +
gcc/testsuite/gcc.dg/lsp/toy-ide.py            |  111 ++
gcc/testsuite/gcc.dg/param-type-mismatch.c     |   60 +
.../plugin/diagnostic_plugin_test_show_locus.c |    1 +
gcc/toplev.c                                   |    4 +
46 files changed, 6772 insertions(+), 108 deletions(-)
create mode 100644 gcc/blt.c
create mode 100644 gcc/blt.def
create mode 100644 gcc/blt.h
create mode 100644 gcc/http-server.c
create mode 100644 gcc/http-server.h
create mode 100644 gcc/json-rpc.c
create mode 100644 gcc/json-rpc.h
create mode 100644 gcc/json.c
create mode 100644 gcc/json.h
create mode 100644 gcc/lsp-main.c
create mode 100644 gcc/lsp-main.h
create mode 100644 gcc/lsp.c
create mode 100644 gcc/lsp.h
create mode 100644 gcc/server.c
create mode 100644 gcc/server.h
create mode 100644 gcc/testsuite/g++.dg/bad-return-type.C
create mode 100644 gcc/testsuite/g++.dg/diagnostic/param-type-mismatch.C
create mode 100644 gcc/testsuite/gcc.dg/bad-return-type.c
create mode 100644 gcc/testsuite/gcc.dg/lsp/lsp.py
create mode 100644 gcc/testsuite/gcc.dg/lsp/test.c
create mode 100644 gcc/testsuite/gcc.dg/lsp/test.py
create mode 100644 gcc/testsuite/gcc.dg/lsp/toy-ide.py
create mode 100644 gcc/testsuite/gcc.dg/param-type-mismatch.c