Improving GCC Diagnostics
Each point below is a mini-project in itself. Please ask on gcc@gcc.gnu.org for more information if you want to help implement any of these projects.
A) Printing the input expression instead of re-constructing it. This will fix the well-known limitations of the pretty-printer (see PR35441, PR35742, PR49152, etc.). This requires:
For each preprocessed token, we would need to keep two locations: one for the preprocessed location and another for the original location. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
- For non-preprocessed expr we need at least two locations per expr (beg/end). This requires:
- Changes on the build_* functions to handle multiple locations.
Track the end of tokens:
    X + some_long\
    _ident??/
    ifier
We need to track the locations of X (where the expression begins) and r (where its last token ends) somehow.
Changes in the parser to pass down the correct locations to the build_* functions.
A location(s) -> source strings interface and machinery. Ideally, this should be more or less independent of CPP, so CPP (through the diagnostics machinery) calls into this when needed and not the other way around. This can be implemented in several ways:
Keeping the CPP buffers in memory and having the line-maps point directly into the buffers' contents. This is easy and fast but potentially memory-consuming. Care must be taken to handle charsets, tabs, etc. Factoring out anything useful from libcpp would help to implement this.
Re-open the file and fseek. This is not trivial since we need to do it fast but still do all the character conversions that we did when libcpp opened it the first time. This is approximately what Clang (LLVM) does, and it seems they can do it very fast by keeping a cache of the buffers they have reopened. I think that, thanks to our line-maps implementation, we can do the seeking considerably more efficiently in terms of computation time. However, opening files is deeply embedded into CPP, so that would need to be factored out so we can avoid any unnecessary CPP stuff when reopening but still do it *properly* and *efficiently*. [A basic implementation is available in GCC 4.8 with -fdiagnostics-show-caret]
- Changes in the diagnostics machinery to extract locations from expr and print a string from a source file instead of re-constructing things.
Handle locations during folding or avoid aggressive folding in the front-ends. See PR32643, PR60090, https://gcc.gnu.org/ml/gcc/2013-11/msg00253.html and this quote from https://gcc.gnu.org/ml/gcc-patches/2008-10/msg01061.html
- At present, there are still various "optimizations" done in the C front end while the trees for expressions are built up, and some cases where some folding still happens; such optimizations can be incrementally moved out of the front end into fold (or other optimizers) where they belong, and such folding during parsing incrementally disabled, moving towards trees that more closely correspond to the original source. In addition, the front end does not logically need all the transformations done by fold. In principle, fold should act on GIMPLE (so avoiding any present dependence on which subexpressions were combined into larger expressions in the source program) with the only folding done before gimplification being more limited folding required for initializers. With such a change, c_fully_fold would only need to do the more limited folding. If the C gimplification handled C_MAYBE_CONST_EXPR, some calls to c_fully_fold could be eliminated altogether; only those where the information about constancy is required, in particular for static initializers, would need to remain. However, c_fully_fold could also be thought of as a logical lowering step, converting front-end-specific structures (which presently are GENERIC plus the odd extra tree code) to GENERIC, with potentially further transformations needed in future, and increases in the amount of lowering involved (making datastructures correspond more closely to the source for longer in the front end) might require this lowering step everywhere even when folding isn't needed.
- Handle locations during optimisation or update middle-end diagnostics to not rely on perfect location information. This probably means not using %qE, no column info, and similar limitations. Some trade-off must be investigated.
Add locations for operands of expressions. Constants and uses of variables do not have a location of their own. (PR43486)
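The core idea of (A), printing what the user wrote rather than a re-constructed expression, can be sketched in a few lines of C. This is only an illustration and not GCC code: src_range and copy_source_text are hypothetical names, the locations are assumed to be plain byte offsets into an in-memory copy of the source, and no charset conversion, trigraph or line-continuation handling is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical begin/end pair for an expression: byte offsets into the
   source buffer, with END pointing one past the last character.  */
struct src_range
{
  size_t begin;
  size_t end;
};

/* Copy the text covered by RANGE from BUF (of length BUF_LEN) into OUT
   (of size OUT_SIZE), always NUL-terminating OUT.  Returns the number of
   characters copied, which is smaller than the range when OUT is too
   small to hold all of it.  */
size_t
copy_source_text (const char *buf, size_t buf_len, struct src_range range,
                  char *out, size_t out_size)
{
  assert (out_size > 0);
  assert (range.begin <= range.end && range.end <= buf_len);

  size_t len = range.end - range.begin;
  if (len > out_size - 1)
    len = out_size - 1;
  memcpy (out, buf + range.begin, len);
  out[len] = '\0';
  return len;
}
```

With such a pair attached to each expression, a diagnostic could quote the operand exactly as written, macros and spacing included, instead of pretty-printing the tree.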
B) Printing accurate column information. This requires:
#) Preprocessed/original locations in a single location_t. Similar to (A.0) above. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
- #) Changes in the parser to pass down the correct locations to diagnostics machinery. Similar to (A.2) above.
B.1) Changes in the testsuite to enable testing column numbers. [FIXED: Use /* { dg-warning "5:something wrong" } */ ]
#) Open bugs: PR49973: Column numbers count special characters as multiple columns
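One possible way to avoid counting a multi-byte character as several columns, the problem named in PR49973, is to count characters rather than bytes. The sketch below assumes the line is valid UTF-8 and that every character occupies one column; tabs and double-width characters are deliberately ignored. utf8_column is a hypothetical helper, not an existing GCC function.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 1-based column of the byte at BYTE_OFFSET in LINE, counting
   each UTF-8 encoded character as a single column.  Continuation bytes
   (of the form 10xxxxxx) are skipped, so a multi-byte character advances
   the column only once.  */
unsigned int
utf8_column (const char *line, size_t byte_offset)
{
  unsigned int column = 1;
  for (size_t i = 0; i < byte_offset && line[i] != '\0'; i++)
    if (((unsigned char) line[i] & 0xC0) != 0x80)
      column++;
  return column;
}
```

For a line whose first four characters take five bytes, a byte-based count would report column 6 for the next character, while this helper reports column 5.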
C) Consistent diagnostics. This requires:
C.1) Make CPP use the diagnostics machinery. This will fix part of PR7263 and other similar bugs where there is a mismatch between the diagnostics machinery and CPP's own diagnostics machinery. [FIXED in GCC 4.5, PR7263 fixed in GCC 4.8]
#) Preprocessed/original locations in a single location_t. This will avoid different behaviour when a token comes from a macro expansion. Similar to (A.0) above. [FIXED in GCC 4.7 with -ftrack-macro-expansion]
D) Printing Ranges. This requires: [ FIXED in GCC 6 ]
- #) Printing accurate column information. Similar to (B) above.
#) A location(s) -> source strings interface and machinery. Similar to (A.3) above.
- #) Track beg/end locations. Similar to (A.1.b) above.
- #) Changes in the parser to pass down ranges. Similar to (A.2) above.
- D.1) Changes in the testsuite to enable testing ranges.
- D.2) Changes in the diagnostics machinery to handle ranges.
E) Caret diagnostics. This requires: [FIXED in GCC 4.8 with -fdiagnostics-show-caret]
- #) Printing accurate column information. Similar to (B) above.
#) A location(s) -> source strings interface and machinery. Similar to (A.3) above.
- E.1) Changes in the diagnostics machinery to print the source line and a caret.
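Step (E.1) can be illustrated with a small formatter that takes a source line that has already been retrieved, for instance through the location -> source string interface of (A.3), and places a caret under a given column. format_caret is a hypothetical name, and the column is assumed to be a 1-based byte column; this is a sketch and not how GCC implements -fdiagnostics-show-caret.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write LINE followed by a second line carrying a caret under the 1-based
   COLUMN into OUT (of size OUT_SIZE).  Tabs in LINE are repeated in the
   caret line so the caret stays aligned whatever the tab width is.
   Returns 0 on success and -1 if COLUMN is out of range or OUT is too
   small.  */
int
format_caret (const char *line, unsigned int column,
              char *out, size_t out_size)
{
  size_t len = strlen (line);
  if (column == 0 || column > len + 1)
    return -1;
  /* LINE, newline, COLUMN - 1 padding characters, caret, newline, NUL.  */
  if (len + 1 + (column - 1) + 1 + 1 + 1 > out_size)
    return -1;

  char *p = out;
  memcpy (p, line, len);
  p += len;
  *p++ = '\n';
  for (unsigned int i = 0; i + 1 < column; i++)
    *p++ = (line[i] == '\t') ? '\t' : ' ';
  *p++ = '^';
  *p++ = '\n';
  *p = '\0';
  return 0;
}
```

Copying tabs into the caret line is one simple design choice for keeping the caret aligned without knowing the terminal's tab width.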
F) Precision in Wording
    void foo(int i)
    {
      *i = 0;
    }

    $ gcc-4.8 -fsyntax-only t.c
    precisionword2.c: In function ‘foo’:
    precisionword2.c:3:3: error: invalid type argument of unary ‘*’ (have ‘int’)
       *i = 0;
       ^
    $ clang -fsyntax-only t.c
    precisionword2.c:3:3: error: indirection requires pointer operand ('int' invalid)
      *i = 0;
      ^~
    $ gcc-4.8 t.c
    t.c: In function ‘foo’:
    t.c:5:1: error: expected ‘;’ before ‘}’ token
     }
     ^
    $ clang t.c
    precisionword3.c:4:8: error: expected ';' after expression
      bar()
           ^
           ;
    int main()
    {
      void *p = 0;
      p += 1;
      p++;
    }

    $ gcc-4.8 -Wpedantic
    precisionword4.c: In function ‘main’:
    precisionword4.c:4:5: warning: pointer of type ‘void *’ used in arithmetic [-Wpedantic]
       p += 1;
         ^
    precisionword4.c:5:4: warning: wrong type argument to increment [-Wpedantic]
       p++;
        ^
    $ clang -Wpedantic
    precisionword4.c:4:5: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]
      p += 1;
      ~ ^
    precisionword4.c:5:4: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]
      p++;
      ~^
G) Typedef Preservation and Selective Unwrapping: Clang (LLVM) expressive diagnostics.
    $ gcc-4.2 -fsyntax-only t.c
    t.c:15: error: invalid operands to binary / (have 'float __vector__' and 'const int *')
    $ clang -fsyntax-only t.c
    t.c:15:11: error: can't convert between vector values of different size ('__m128' and 'int const *')

    $ g++-4.2 -fsyntax-only t.cpp
    t.cpp:12: error: no match for 'operator=' in 'str = vec'
    $ clang -fsyntax-only t.cpp
    t.cpp:12:7: error: incompatible type assigning 'vector<Real>', expected 'std::string' (aka 'class std::basic_string<char>')
H) Fix-it Hints [ Support added in GCC 6 but more hints need to be added] (PR62314)
    $ clang t.cpp
    t.cpp:9:3: error: template specialization requires 'template<>'
      struct iterator_traits<file_iterator> {
      ^
      template<>
I) Automatic Macro Unwinder/Expansion [FIXED in GCC 4.8 with -fdiagnostics-show-caret -ftrack-macro-expansion]
J) Spell-checker [ FIXED in GCC 6 ]
K) Precise locations within strings [FIXED in GCC 6 in most usual cases] (open issues are detailed in PR52952)
L) Template type diffing: http://clang.llvm.org/diagnostics.html [FIXED in GCC 8]
M) Color [FIXED in GCC 4.9 for C family, in GCC 6 for Fortran]
N) Diagnostics for inline assembler:
Clang does it already thanks to their integrated assembler. People seem to like it: http://www.reddit.com/r/programming/comments/bnhxb/clang_now_with_inline_assembly_diagnostics/. When using a recent GNU assembler, the diagnostics are slightly better. However, GCC is unlikely to reach the level of Clang without closer integration. There is some consensus that the assembler should NOT be integrated into GCC. Yet, one could imagine passing a bit more information to GNU as when dealing with inline asm. This could be done by some annotations in the input to gas, or by using a library interface to gas. If you are interested in this, please help!
Bugs: PR57950
O) Multiple locations:
So we can have something like
    /home/manuel/test2/src/gcc/testsuite/g++.dg/warn/Wbraces1.C:3:17,3:22: warning: missing braces around initializer for ‘int [2]’ [-Wmissing-braces]
     int a[2][2] = { 0, 1 , 2, 3 }; // { dg-warning "" }
                     ^    ^
                     {    }
The diagnostics machinery supports this and it is used by Fortran; however, the C/C++ front ends do not use this feature yet.
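As a rough illustration of what printing several locations on one source line involves, the following sketch builds a marker line with a caret under each of a set of columns. format_multi_caret is a hypothetical helper written for this page; it does not reflect the interface of the existing diagnostics machinery, and it assumes 1-based columns on a line without tabs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill OUT (of size OUT_SIZE) with a marker line that has a caret under
   each of the N 1-based COLUMNS and spaces elsewhere.  The columns may be
   given in any order.  Returns 0 on success and -1 if a column is 0 or
   OUT is too small.  */
int
format_multi_caret (const unsigned int *columns, size_t n,
                    char *out, size_t out_size)
{
  unsigned int max = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (columns[i] == 0)
        return -1;
      if (columns[i] > max)
        max = columns[i];
    }
  if ((size_t) max + 1 > out_size)
    return -1;

  memset (out, ' ', max);
  for (size_t i = 0; i < n; i++)
    out[columns[i] - 1] = '^';
  out[max] = '\0';
  return 0;
}
```

Called with columns 17 and 22, it yields a 22-character marker line with carets in exactly those two positions, matching the two locations in a message like the -Wmissing-braces one.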
Notes:
Diagnostics messages should follow GNU standards for error messages, so if we want to change the output, we have to modify the standard first (there has been some discussion in the gcc mailing list already).
More examples of Clang (LLVM) superior diagnostics over GCC.
More bad diagnostics from GCC: Some of them were reported but closed as INVALID. Only a complete patch showing an unequivocally superior alternative may convince GCC maintainers. Or report it to Clang, fix it there and try to convince GCC maintainers of the superior alternative by comparison.
Linus's criticism. Points to take home: fix -Wshadow (FIXED in GCC 4.8), add -Wstrict-prototypes (and other useful warnings) to -Wall, and fix unsigned warnings.
More amazing Clang diagnostics: http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf source
Open diagnostics bugs. Help us to fix them!