Each point below is a mini-project in itself. Please ask on gcc@gcc.gnu.org for more information or if you want to help implement some of these projects.
A) Printing the input expression instead of re-constructing it. This will fix the well-known limitation of the pretty-printer (see PR3544[123], PR35742, PR49152, etc.). This requires:
For each preprocessed token, we would need to keep two locations: one for the preprocessed location and another for the original location. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
- For non-preprocessed expr we need at least two locations per expr (beg/end). This requires:
- Changes to the build_* functions to handle multiple locations.
Track the end of tokens:
X + some_long\
_ident??/
ifier
We need to track the locations of X and r somehow.
Changes in the parser to pass down the correct locations to the build_* functions.
A location(s) -> source strings interface and machinery. Ideally, this should be more or less independent of CPP, so CPP (through the diagnostics machinery) calls into this when needed and not the other way around. This can be implemented in several ways:
Keeping the CPP buffers in memory and having the line-maps point directly into the buffers' contents. This is easy and fast but potentially memory consuming. Care must be taken to handle charsets, tabs, etc. Factoring out anything useful from libcpp would help to implement this.
Re-open the file and fseek. This is not trivial, since we need to do it fast but still perform all the character conversions that libcpp did when it opened the file the first time. This is approximately what Clang (LLVM) does, and it seems they can do it very fast by keeping a cache of the buffers they have reopened. I think that, thanks to our line-maps implementation, we can do the seeking considerably more efficiently in terms of computation time. However, opening files is deeply embedded in CPP, so that would need to be factored out so we can avoid any unnecessary CPP work when reopening but still do it *properly* and *efficiently*. [A basic implementation is available in GCC 4.8 with -fdiagnostics-show-caret]
- Changes in the diagnostics machinery to extract locations from expr and print a string from a source file instead of re-constructing things.
- Handle locations during folding or avoid aggressive folding in the front-ends.
- Handle locations during optimisation or update middle-end diagnostics to not rely on perfect location information. This probably means not using %qE, no column info, and similar limitations. Some trade-off must be investigated.
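As an illustration of the "re-open the file" option for the location(s) -> source strings interface, here is a minimal sketch in C. It is not GCC code: struct src_range, read_source_line and extract_range are hypothetical names, the range is assumed to lie on a single line with 1-based byte columns, and no charset conversion, trigraph or line-continuation handling is attempted (a real implementation would need all of them).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical range type (not GCC's location_t): the 1-based line and
   the 1-based first and last byte columns of an expression.  */
struct src_range
{
  int line;
  int begin_col;
  int end_col;
};

/* Copy line LINE (1-based) of file PATH into BUF without its line
   terminator.  Lines longer than SIZE - 1 bytes are truncated.
   Returns 0 on success, -1 on failure.  */
int
read_source_line (const char *path, int line, char *buf, size_t size)
{
  FILE *f;
  int current = 1;
  int found = -1;

  if (line < 1 || size < 2)
    return -1;
  f = fopen (path, "r");
  if (f == NULL)
    return -1;
  while (fgets (buf, (int) size, f) != NULL)
    {
      size_t len = strlen (buf);
      if (current == line)
        {
          buf[strcspn (buf, "\r\n")] = '\0';
          found = 0;
          break;
        }
      /* Only count a line once its terminating newline has been read,
         so over-long lines read in several chunks are counted once.  */
      if (len > 0 && buf[len - 1] == '\n')
        current++;
    }
  fclose (f);
  return found;
}

/* Copy the source text covered by R into OUT.
   Returns 0 on success, -1 if the range does not fit the line or OUT.  */
int
extract_range (const char *path, const struct src_range *r,
               char *out, size_t size)
{
  char line[1024];
  size_t len, n;

  if (read_source_line (path, r->line, line, sizeof line) != 0)
    return -1;
  len = strlen (line);
  if (r->begin_col < 1 || r->end_col < r->begin_col
      || (size_t) r->end_col > len)
    return -1;
  n = (size_t) (r->end_col - r->begin_col + 1);
  if (n >= size)
    return -1;
  memcpy (out, line + r->begin_col - 1, n);
  out[n] = '\0';
  return 0;
}
```

With a range of line 3, columns 3 to 4 on the foo example used under (F), extract_range would yield the text *i exactly as the user wrote it, which is what the diagnostic could print instead of a re-constructed expression.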
B) Printing accurate column information. This requires:
#) Preprocessed/original locations in a single location_t. Similar to (A.0) above. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
- #) Changes in the parser to pass down the correct locations to diagnostics machinery. Similar to (A.2) above.
- B.1) Changes in the testsuite to enable testing column numbers. [FIXED?]
C) Consistent diagnostics. This requires:
C.1) Make CPP use the diagnostics machinery. This will fix part of PR7263 and other similar bugs where there is a mismatch between the diagnostics machinery and CPP's own diagnostics machinery. [FIXED in GCC 4.5, PR7263 fixed in GCC 4.8]
#) Preprocessed/original locations in a single location_t. This will avoid different behaviour when a token comes from a macro expansion. Similar to (A.0) above. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
D) Printing Ranges. This requires:
- #) Printing accurate column information. Similar to (B) above.
#) A location(s) -> source strings interface and machinery. Similar to (A.3) above.
- #) Track beg/end locations. Similar to (A.1.b) above.
- #) Changes in the parser to pass down ranges. Similar to (A.2) above.
- D.1) Changes in the testsuite to enable testing ranges.
- D.2) Changes in the diagnostics machinery to handle ranges.
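To make the output side of (D.2) concrete, here is a small sketch in C of how a range could be rendered under a source line, in the style of the Clang output shown elsewhere on this page. The function name format_range_underline and its interface are assumptions made for illustration, not an existing GCC API; columns are assumed to be 1-based and to lie on a single line.

```c
#include <stddef.h>

/* Write into OUT an underline covering columns BEGIN..END (1-based,
   inclusive) with a caret at column CARET, padded on the left with
   spaces.  Returns 0 on success, -1 if the columns are inconsistent
   or OUT (of SIZE bytes) is too small.  */
int
format_range_underline (int begin, int caret, int end,
                        char *out, size_t size)
{
  int col;

  if (begin < 1 || caret < begin || end < caret || (size_t) end >= size)
    return -1;
  for (col = 1; col <= end; col++)
    {
      if (col < begin)
        out[col - 1] = ' ';      /* padding before the range */
      else if (col == caret)
        out[col - 1] = '^';      /* the primary location */
      else
        out[col - 1] = '~';      /* the rest of the range */
    }
  out[end] = '\0';
  return 0;
}
```

For example, begin 3, caret 5 and end 8 would give two spaces followed by ~~^~~~.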
E) Caret diagnostics. This requires: [FIXED in GCC 4.8 with -fdiagnostics-show-caret]
- #) Printing accurate column information. Similar to (B) above.
#) A location(s) -> source strings interface and machinery. Similar to (A.3) above.
- E.1) Changes in the diagnostics machinery to print the source line and a caret.
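One detail of (E.1) that is easy to get wrong is tabs: if the caret line is padded only with spaces, the caret no longer lines up under a source line that contains tabs. The sketch below shows one possible way to handle it, by copying tabs from the source line into the padding. The function format_caret_line is a hypothetical helper, not the implementation behind -fdiagnostics-show-caret, and it assumes a 1-based byte column in which a tab counts as one column.

```c
#include <stddef.h>
#include <string.h>

/* Build in OUT the line to print under source line LINE so that a
   caret appears under byte column COL (1-based).  Tabs in LINE are
   copied to OUT so the caret stays aligned whatever the terminal's
   tab width is.  COL may be one past the end of LINE (for example, a
   missing token at end of line).  Returns 0 on success, -1 on error.  */
int
format_caret_line (const char *line, int col, char *out, size_t size)
{
  size_t len = strlen (line);
  size_t i;

  if (col < 1 || (size_t) col > len + 1 || (size_t) col >= size)
    return -1;
  for (i = 0; i + 1 < (size_t) col; i++)
    out[i] = line[i] == '\t' ? '\t' : ' ';
  out[col - 1] = '^';
  out[col] = '\0';
  return 0;
}
```

Printing LINE followed by OUT on the next line gives the two-line source-and-caret display.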
F) Precision in Wording
void foo(int i)
{
*i = 0;
}
$ gcc-4.8 -fsyntax-only t.c
precisionword2.c: In function ‘foo’:
precisionword2.c:3:3: error: invalid type argument of unary ‘*’ (have ‘int’)
*i = 0;
^
$ clang -fsyntax-only t.c
precisionword2.c:3:3: error: indirection requires pointer operand ('int' invalid)
*i = 0;
^~
$ gcc-4.8 t.c
t.c: In function ‘foo’:
t.c:5:1: error: expected ‘;’ before ‘}’ token
}
^
$ clang t.c
precisionword3.c:4:8: error: expected ';' after expression
bar()
^
;
int main()
{
void *p = 0;
p += 1;
p++;
}
$ gcc-4.8 -Wpedantic
precisionword4.c: In function ‘main’:
precisionword4.c:4:5: warning: pointer of type ‘void *’ used in arithmetic [-Wpedantic]
p += 1;
^
precisionword4.c:5:4: warning: wrong type argument to increment [-Wpedantic]
p++;
^
$ clang -Wpedantic
precisionword4.c:4:5: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]
p += 1;
~ ^
precisionword4.c:5:4: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]
p++;
~^
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25801
G) Typedef Preservation and Selective Unwrapping: Clang (LLVM) expressive diagnostics.
$ gcc-4.2 -fsyntax-only t.c
t.c:15: error: invalid operands to binary / (have 'float __vector__' and 'const int *')
$ clang -fsyntax-only t.c
t.c:15:11: error: can't convert between vector values of different size ('__m128' and 'int const *')
$ g++-4.2 -fsyntax-only t.cpp
t.cpp:12: error: no match for 'operator=' in 'str = vec'
$ clang -fsyntax-only t.cpp
t.cpp:12:7: error: incompatible type assigning 'vector<Real>', expected 'std::string' (aka 'class std::basic_string<char>')
H) Fix-it Hints
$ clang t.cpp
t.cpp:9:3: error: template specialization requires 'template<>'
struct iterator_traits<file_iterator> {
^
template<>
I) Automatic Macro Expansion [FIXED in GCC 4.8 with -fdiagnostics-show-caret -ftrack-macro-expansion]
OPEN BUGS: http://gcc.gnu.org/PR52998
J) Spell-checker
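One common way to build such a spell-checker is to compare the misspelled identifier against the names in scope using an edit distance and suggest the closest one. The sketch below, in C, uses the classic Levenshtein distance; it is only an illustration of the idea. The names edit_distance and suggest_name, the MAX_NAME limit and the max_distance cut-off are all assumptions, not an existing GCC interface.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64   /* assumed upper bound on candidate length */

static int
min3 (int a, int b, int c)
{
  int m = a < b ? a : b;
  return m < c ? m : c;
}

/* Levenshtein distance between A and B: the minimum number of
   single-character insertions, deletions and substitutions needed to
   turn A into B.  Returns INT_MAX if B is longer than MAX_NAME.  */
int
edit_distance (const char *a, const char *b)
{
  size_t la = strlen (a), lb = strlen (b), i, j;
  int row[MAX_NAME + 1];

  if (lb > MAX_NAME)
    return INT_MAX;
  for (j = 0; j <= lb; j++)
    row[j] = (int) j;
  for (i = 1; i <= la; i++)
    {
      int diag = row[0];           /* value for (i-1, j-1) */
      row[0] = (int) i;
      for (j = 1; j <= lb; j++)
        {
          int above = row[j];      /* value for (i-1, j) */
          int cost = a[i - 1] == b[j - 1] ? 0 : 1;
          row[j] = min3 (above + 1, row[j - 1] + 1, diag + cost);
          diag = above;
        }
    }
  return row[lb];
}

/* Return the candidate closest to TYPO, or NULL if none is within
   MAX_DISTANCE edits.  Ties go to the earliest candidate.  */
const char *
suggest_name (const char *typo, const char *const *candidates,
              size_t n, int max_distance)
{
  const char *best = NULL;
  int best_distance = max_distance;
  size_t i;

  for (i = 0; i < n; i++)
    {
      int d = edit_distance (typo, candidates[i]);
      if (d <= best_distance && (best == NULL || d < best_distance))
        {
          best_distance = d;
          best = candidates[i];
        }
    }
  return best;
}
```

The cut-off matters in practice: without it, every unknown identifier would get a suggestion, however unrelated.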
K) Precise locations within strings (PR52952)
L) Template type diffing: http://clang.llvm.org/diagnostics.html
M) Color:
Clang (LLVM) has color diagnostics, users want color diagnostics in GCC, however, it is not clear what the consensus for color diagnostics is. Latest discussion: http://gcc.gnu.org/ml/gcc/2012-04/msg00570.html
Notes:
Diagnostics messages should follow GNU standards for error messages, so if we want to change the output, we have to modify the standard first (there has been some discussion in the gcc mailing list already).
C++ diagnostics survey: http://people.redhat.com/bkoz/diagnostics/diagnostics.html (VERY OUTDATED! It will be extremely useful if someone cloned/updated it)
More examples of Clang (LLVM) superior diagnostics over GCC.
More bad diagnostics from GCC: Some of them were reported but closed as INVALID. Only a complete patch showing an unequivocally superior alternative may convince GCC maintainers. Or report it to Clang, fix it there and try to convince GCC maintainers of the superior alternative by comparison.
Linus criticism. Points to take home: fix -Wshadow (FIXED in GCC 4.8), add -Wstrict-prototypes (and other useful warnings) to -Wall and fix unsigned warnings.
Diagnostics for inline assembler. Clang does it already thanks to their integrated assembler. People seem to like it: http://www.reddit.com/r/programming/comments/bnhxb/clang_now_with_inline_assembly_diagnostics/. When using a recent GNU assembler, the diagnostics are slightly better. However, they are unlikely to reach the level of Clang without a closer integration. There is some consensus that the assembler should NOT be integrated into GCC. Yet, one could imagine passing a bit more information to GNU as when dealing with inline-asm. This could be done by some annotations in the input to gas, or using a library interface to gas. If you are interested in this, please help!
It is unclear what the goal of GCC diagnostics is. Currently, this is not possible with GCC: http://blogs.gnome.org/jessevdk/2011/09/10/gedit-clang-plugin-progress/
- Every FE implements its own diagnostics. Unfortunately, the most popular (C/C++) is probably the least advanced in terms of general features (caret, spell-corrector) and it is not easy to reuse to support other languages; hence, there is little incentive for others (Fortran / Ada) to switch to it.
More amazing Clang diagnostics: http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf source
Open diagnostics bugs: help us fix them!