Improving GCC Diagnostics

Each point below is a mini-project in itself. Please ask for more information if you want to help implement any of these projects.

A) Printing the input expression instead of re-constructing it. This will fix the well-known limitations of the pretty-printer (see PR35441, PR35742, PR49152, etc.). This requires:

  1. For each preprocessed token, we would need to keep two locations: one for the preprocessed location and another for the original location. [FIXED in GCC 4.7 with -ftrack-macro-expansion]

  2. For non-preprocessed expr we need at least two locations per expr (beg/end). This requires:
    1. Changes to the build_* functions to handle multiple locations.
    2. Track the end of tokens:

                 X + some_long\
      We need to track the locations of X and r somehow.
  3. Changes in the parser to pass down the correct locations to the build_* functions.

  4. A location(s) -> source strings interface and machinery. Ideally, this should be more or less independent of CPP, so CPP (through the diagnostics machinery) calls into this when needed and not the other way around. This can be implemented in several ways:

    1. Keeping the CPP buffers in memory and having the line-maps point directly into the buffers' contents. This is easy and fast but potentially memory consuming. Care must be taken to handle charsets, tabs, etc. Factoring out anything useful from libcpp would help to implement this.

    2. Re-open the file and fseek. This is not trivial since we need to do it fast but still perform all the character conversions that libcpp did when it opened the file the first time. This is approximately what Clang (LLVM) does, and it seems to do it very fast by keeping a cache of the buffers it has reopened. I think that, thanks to our line-maps implementation, we can do the seeking considerably more efficiently in terms of computation time. However, opening files is deeply embedded in CPP, so that would need to be factored out so we can avoid any unnecessary CPP work when reopening but still do it *properly* and *efficiently*. [A basic implementation is available in GCC 4.8 with -fdiagnostics-show-caret]

  5. Changes in the diagnostics machinery to extract locations from expr and print a string from a source file instead of re-constructing things.
  6. Handle locations during folding or avoid aggressive folding in the front-ends. See PR32643, PR60090, and this quote from

    • At present, there are still various "optimizations" done in the C front end while the trees for expressions are built up, and some cases where some folding still happens; such optimizations can be incrementally moved out of the front end into fold (or other optimizers) where they belong, and such folding during parsing incrementally disabled, moving towards trees that more closely correspond to the original source. In addition, the front end does not logically need all the transformations done by fold. In principle, fold should act on GIMPLE (so avoiding any present dependence on which subexpressions were combined into larger expressions in the source program) with the only folding done before gimplification being more limited folding required for initializers. With such a change, c_fully_fold would only need to do the more limited folding. If the C gimplification handled C_MAYBE_CONST_EXPR, some calls to c_fully_fold could be eliminated altogether; only those where the information about constancy is required, in particular for static initializers, would need to remain. However, c_fully_fold could also be thought of as a logical lowering step, converting front-end-specific structures (which presently are GENERIC plus the odd extra tree code) to GENERIC, with potentially further transformations needed in future, and increases in the amount of lowering involved (making datastructures correspond more closely to the source for longer in the front end) might require this lowering step everywhere even when folding isn't needed.
  7. Handle locations during optimisation or update middle-end diagnostics to not rely on perfect location information. This probably means not using %qE, no column info, and similar limitations. Some trade-off must be investigated.
  8. Add locations for operands of expressions. Constants and uses of variables do not have a location of their own. (PR43486)
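
As an illustration of items 2 and 4 above, the following is a minimal sketch of a begin/end range per expression together with a location -> source string lookup over a buffer kept in memory (roughly option 4.1). Every name here (src_point, src_range, src_buffer, src_range_text) is hypothetical and does not correspond to existing GCC or libcpp interfaces. Columns are treated as plain byte offsets, so charsets and tabs are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only; not GCC's location_t. */
typedef struct { unsigned line; unsigned column; } src_point;  /* both 1-based */
typedef struct { src_point begin; src_point end; } src_range;  /* end is inclusive */

typedef struct {
  const char *text;    /* whole file contents, kept in memory */
  size_t length;       /* number of bytes in text */
  size_t *line_start;  /* byte offset of the first character of each line */
  size_t line_count;
} src_buffer;

/* Record where every line starts. Returns 0 on success, -1 if out of memory. */
int src_buffer_init(src_buffer *buf, const char *text)
{
  assert(buf != NULL && text != NULL);
  size_t len = strlen(text);
  size_t lines = 1;
  for (size_t i = 0; i < len; i++)
    if (text[i] == '\n')
      lines++;

  size_t *starts = malloc(lines * sizeof *starts);
  if (starts == NULL)
    return -1;

  size_t n = 0;
  starts[n++] = 0;
  for (size_t i = 0; i < len; i++)
    if (text[i] == '\n')
      starts[n++] = i + 1;

  buf->text = text;
  buf->length = len;
  buf->line_start = starts;
  buf->line_count = lines;
  return 0;
}

void src_buffer_free(src_buffer *buf)
{
  free(buf->line_start);
  buf->line_start = NULL;
}

/* Map a line/column pair to a byte offset, or (size_t)-1 if it is invalid. */
static size_t point_to_offset(const src_buffer *buf, src_point p)
{
  if (p.line == 0 || p.column == 0 || p.line > buf->line_count)
    return (size_t)-1;
  size_t start = buf->line_start[p.line - 1];
  size_t end = p.line < buf->line_count ? buf->line_start[p.line] : buf->length;
  size_t off = start + (p.column - 1);
  return off < end ? off : (size_t)-1;
}

/* Copy the source text covered by range r into out, NUL-terminated.
   Returns the number of bytes copied, or -1 on a bad range or a small buffer. */
int src_range_text(const src_buffer *buf, src_range r, char *out, size_t out_size)
{
  size_t b = point_to_offset(buf, r.begin);
  size_t e = point_to_offset(buf, r.end);
  if (b == (size_t)-1 || e == (size_t)-1 || e < b)
    return -1;
  size_t n = e - b + 1;
  if (n + 1 > out_size)
    return -1;
  memcpy(out, buf->text + b, n);
  out[n] = '\0';
  return (int)n;
}
```

With such an interface, a diagnostic could print the text between the begin and end points of an expression verbatim instead of re-constructing it from the tree. The range may span several lines, which is the case the end-of-token tracking in item 2 is about.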

B) Printing accurate column information. This requires:

C) Consistent diagnostics. This requires:

D) Printing Ranges. This requires: [ FIXED in GCC 6 ]

E) Caret diagnostics. This requires: [FIXED in GCC 4.8 with -fdiagnostics-show-caret]

F) Precision in Wording

void foo(int i)
{
  *i = 0;
}

$ gcc-4.8 -fsyntax-only t.c
precisionword2.c: In function ‘foo’:
precisionword2.c:3:3: error: invalid type argument of unary ‘*’ (have ‘int’)
   *i = 0;
$ clang -fsyntax-only t.c
precisionword2.c:3:3: error: indirection requires pointer operand ('int' invalid)
  *i = 0;

$ gcc-4.8 t.c
t.c: In function ‘foo’:
t.c:5:1: error: expected ‘;’ before ‘}’ token
$ clang t.c
precisionword3.c:4:8: error: expected ';' after expression

int main()
{
  void *p = 0;
  p += 1;
  p++;
}

$ gcc-4.8 -Wpedantic
precisionword4.c: In function ‘main’:
precisionword4.c:4:5: warning: pointer of type ‘void *’ used in arithmetic [-Wpedantic]
   p += 1;
precisionword4.c:5:4: warning: wrong type argument to increment [-Wpedantic]
$ clang -Wpedantic
precisionword4.c:4:5: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]
  p += 1;
  ~ ^
precisionword4.c:5:4: warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]

G) Typedef Preservation and Selective Unwrapping: Clang (LLVM) expressive diagnostics.

  $ gcc-4.2 -fsyntax-only t.c
  t.c:15: error: invalid operands to binary / (have 'float __vector__' and 'const int *')
  $ clang -fsyntax-only t.c
  t.c:15:11: error: can't convert between vector values of different size ('__m128' and 'int const *')

  $ g++-4.2 -fsyntax-only t.cpp
  t.cpp:12: error: no match for 'operator=' in 'str = vec'
  $ clang -fsyntax-only t.cpp
  t.cpp:12:7: error: incompatible type assigning 'vector<Real>', expected 'std::string' (aka 'class std::basic_string<char>')

H) Fix-it Hints [ Support added in GCC 6 but more hints need to be added] (PR62314)

 $ clang t.cpp
  t.cpp:9:3: error: template specialization requires 'template<>'
    struct iterator_traits<file_iterator> {

I) Automatic Macro Unwinder/Expansion [FIXED in GCC 4.8 with -fdiagnostics-show-caret -ftrack-macro-expansion]

J) Spell-checker [ FIXED in GCC 6 ]

K) Precise locations within strings [FIXED in GCC 6 in most usual cases] (open issues are detailed in PR52952)

L) Template type diffing: [FIXED in GCC 8]

M) Color [FIXED in GCC 4.9 for C family, in GCC 6 for Fortran]

N) Diagnostics for inline assembler:

O) Multiple locations:

This would allow output such as:

/home/manuel/test2/src/gcc/testsuite/g++.dg/warn/Wbraces1.C:3:17,3:22: warning: missing braces around initializer for ‘int [2]’ [-Wmissing-braces]
int a[2][2] = { 0, 1 , 2, 3 }; // { dg-warning "" }
                ^    ^ 
                {    }

The diagnostics machinery supports this and Fortran already uses it; however, the C/C++ front ends do not use this feature yet.
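
A minimal sketch of how the marker line in the example above could be produced from several columns on the same source line. The function render_carets is a hypothetical helper written for this page, not part of the GCC diagnostics machinery, and it assumes one byte per column (no tabs or multi-byte characters).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a line with '^' under each 1-based column.
   Returns the length of the line written, or -1 on a zero column or if
   out is too small. */
int render_carets(const unsigned *columns, size_t count, char *out, size_t out_size)
{
  assert(out != NULL);
  unsigned max = 0;
  for (size_t i = 0; i < count; i++) {
    if (columns[i] == 0)
      return -1;
    if (columns[i] > max)
      max = columns[i];
  }
  if ((size_t)max + 1 > out_size)
    return -1;

  memset(out, ' ', max);            /* blank line up to the last caret */
  for (size_t i = 0; i < count; i++)
    out[columns[i] - 1] = '^';      /* one caret per location */
  out[max] = '\0';
  return (int)max;
}
```

Called with columns 17 and 22, as in the Wbraces1.C example, this places one caret under the first initializer and one under the comma where the closing brace is missing.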


None: Better_Diagnostics (last edited 2021-02-21 22:24:28 by ManuelLópezIbáñez)