I've started looking at using the Eclipse CDT for my C development. One thing the CDT does is parse the diagnostic messages printed by gcc. To determine whether a message is an error or a warning, the CDT checks whether the message starts with "warning:". If it does, it is considered a warning; otherwise, an error. However, since I'm using a Swedish locale, my warning messages start with "varning:" rather than "warning:", so the CDT heuristic fails. I would like gcc to mark warnings and errors in a locale-independent way that frontends such as the CDT can parse easily. I did read http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Language-Independent-Options.html and AFAICT there isn't currently any switch that makes gcc classify its diagnostic messages this way.
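A minimal sketch of the kind of prefix heuristic described above makes the failure concrete. The function name `classify_by_prefix` is hypothetical and this is not the CDT's actual code; the Swedish message text is illustrative rather than copied from GCC's translation catalogue.

```python
def classify_by_prefix(message: str) -> str:
    """Prefix heuristic: 'warning' if the text starts with "warning:", else 'error'.

    Hypothetical reconstruction for illustration; not the CDT's actual code.
    """
    return "warning" if message.startswith("warning:") else "error"


if __name__ == "__main__":
    english = "warning: unused variable 'x'"
    # Illustrative Swedish wording, not taken from GCC's sv.po catalogue.
    swedish = "varning: oanvänd variabel 'x'"
    print(classify_by_prefix(english))  # warning
    print(classify_by_prefix(swedish))  # error, i.e. misclassified
```

Any heuristic of this shape needs to know every translation of "warning", which is the problem the rest of this report is about.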
I don't think we should do this, because the warnings are meant for people, not for IDEs. Maybe the IDEs should use the translated messages instead, i.e. use the .pot file from gcc to do the parsing :).
The IDE wants to present gcc's messages to people, so it's not as if the IDE wants to understand the messages themselves (except for whether they are warnings or errors). Rereading what I wrote, I can see how it sounded as if the CDT tries to do more than that, but it doesn't. It just wants to classify errors vs. warnings, so that it can highlight errors more prominently than warnings. Unfortunately, no sane way of doing that seems to exist currently :-(.
What about changing the IDEs so they understand the natural language warning/error/note classification instead?
That's what they are doing currently, but it works only for English :-(.
Confirmed.
There's an Eclipse PR for this, fwiw: https://bugs.eclipse.org/bugs/show_bug.cgi?id=108720 There you can see further motivation; in particular, the continuation messages that gcc sometimes prints are confusing to an IDE. Parsing the translated message seems like a very difficult approach: consider, for instance, messages that have printf substitutions in them. And in the case described in the above PR, nothing distinguishes some of the continuation messages.
An XML output mode would solve this (and potentially a number of other similar issues, like having to set -fmessage-length=0 for most analyzers).
Is this really a bug in GCC? Eclipse should run GCC with a locale it can understand. Then, if it wants to support other languages, it has to support them also in the parser. Or we go for the XML output? That would be kind of interesting...
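If an IDE does take the route of forcing a locale it understands, one way to do it is to launch gcc with LC_ALL=C. This is a sketch assuming GNU gettext semantics and a POSIX-style environment; the helper name `c_locale_env` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def c_locale_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) that selects the "C" locale.

    LC_ALL takes precedence over LANG and the other LC_* variables, and GNU
    gettext ignores LANGUAGE when the locale is "C", so gcc should fall back
    to its untranslated (English) diagnostics.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    env.pop("LANGUAGE", None)  # redundant under "C", but harmless
    return env


if __name__ == "__main__":
    # Hypothetical usage (requires gcc on PATH):
    #   subprocess.run(["gcc", "-c", "foo.c"], env=c_locale_env(),
    #                  capture_output=True, text=True)
    print(c_locale_env({"LANG": "sv_SE.UTF-8"}))
```

The next comment explains why this is not a complete answer: the user loses messages in their own language, and the IDE is still tied to GCC's English wording.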
Some kind of machine-readable output is necessary for use by an IDE. Eclipse can't translate the messages after they have been emitted by GCC. So, it should run GCC in the user's locale. However, then it would like to differentiate between warnings and errors. There's no way to do that without knowing all the ways that GCC might translate the words "warning" and "error" (running in a locale Eclipse thinks it understands is not a good option because GCC may change its choices of translation...). It would be friendlier if GCC provided this information. XML, or anything machine-readable, would be fine. So, yes, this is a GCC bug.
Created attachment 18329
XML diagnostics

A prototype of XML mode for diagnostics. The output looks like:

<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>
<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
(cerca de la inicialización de 'g2.f.x')
</diagnostic>
<diagnostic class="warning" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:13:8">
exceso de elementos en el inicializador de matriz
</diagnostic>
<diagnostic class="warning" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:13:8">
(cerca de la inicialización de 'h1.x')
</diagnostic>

I am not planning to work on this further. This patch shows that it can be done, but I don't know if there is any interest in this.
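Consuming output like the prototype's is straightforward with a standard XML parser, and the classification comes from the `class` attribute rather than from the (here Spanish) message text. The sketch below assumes the prototype's element and attribute names shown above; because the prototype prints sibling elements with no common root, it wraps them in a synthetic `<diagnostics>` element first. `parse_prototype_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_prototype_xml(text: str) -> List[Tuple[str, str, str]]:
    """Return (class, location, message) for every <diagnostic> element.

    The prototype prints sibling elements with no common root, so the text is
    wrapped in a synthetic <diagnostics> root before parsing.
    """
    root = ET.fromstring("<diagnostics>" + text + "</diagnostics>")
    results = []
    for diag in root.iter("diagnostic"):
        message = " ".join((diag.text or "").split())  # collapse whitespace
        results.append((diag.get("class", ""), diag.get("location", ""), message))
    return results


SAMPLE = """
<diagnostic class="error" location="array-2.c:10:8">
  inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>
<diagnostic class="warning" location="array-2.c:13:8">
  exceso de elementos en el inicializador de matriz
</diagnostic>
"""

if __name__ == "__main__":
    for kind, location, message in parse_prototype_xml(SAMPLE):
        # The class attribute, not the translated text, decides the highlighting.
        print(kind, location, message)
```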
Subject: Re: (Natural) language independent error / warning classification

On Sat, 8 Aug 2009, manu at gcc dot gnu dot org wrote:

> I am not planning to work on this further. This patch shows that it can be
> done, but I don't know if there is any interest in this.

The principle makes sense (obviously the prototype patch would need further work for actual inclusion, e.g. escaping of < and > signs for the XML output), but I think in practice it's only useful if driven by cooperation from IDE people who will help establish what the XML should look like and commit to making an IDE use the XML output in future by default when using a GCC version that supports it.

I imagine that the XML should have some way of marking continuation messages as such, should include the option (as from -fdiagnostics-show-option) in some structured way, and probably should give locations and inclusion context in an XML structured way as well rather than as plain text - but discussion would be needed with IDE people on what information GCC can give and how an IDE could use it.
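The escaping point matters in practice because C++ diagnostics routinely contain `<`, `>` and `&` (template arguments, operators). A minimal sketch using Python's standard `xml.sax.saxutils` helpers, assuming the prototype's element layout; `diagnostic_to_xml` is a hypothetical name, and GCC itself would need equivalent escaping in its own pretty-printer.

```python
from xml.sax.saxutils import escape, quoteattr


def diagnostic_to_xml(kind: str, location: str, message: str) -> str:
    """Render one diagnostic as a <diagnostic> element.

    escape() replaces &, < and > with entities; quoteattr() does the same for
    attribute values (plus quotes) and returns the value already quoted.
    """
    return "<diagnostic class=%s location=%s>%s</diagnostic>" % (
        quoteattr(kind), quoteattr(location), escape(message))


if __name__ == "__main__":
    # prints: <diagnostic class="warning" location="t.cc:3:5">comparison 'a &lt; b &amp;&amp; c &gt; d'</diagnostic>
    print(diagnostic_to_xml("warning", "t.cc:3:5", "comparison 'a < b && c > d'"))
```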
(In reply to comment #11)
> XML output), but I think in practice it's only useful if driven by
> cooperation from IDE people who will help establish what the XML should
> look like and commit to making an IDE use the XML output in future by
> default when using a GCC version that supports it.

Completely agreed.

> I imagine that the XML should have some way of marking continuation
> messages as such, should include the option (as from
> -fdiagnostics-show-option) in some structured way, and probably should
> give locations and inclusion context in an XML structured way as well
> rather than as plain text - but discussion would be needed with IDE people
> on what information GCC can give and how an IDE could use it.

Yes, it will require some custom pretty-printing functions to handle XML entities. I think it will also need to address some weaknesses of the current diagnostics/pretty-printing machinery: first/last diagnostic callbacks, handling "\n" in messages correctly, and being able to construct a single diagnostic entity from several errors/warnings/notes (I think this is what you mean by continuation messages). None of this would be difficult.

In addition, GCC could expose much more information from its internal representation through XML when printing %T, %E, %D and such. But it is not worth it if no popular IDE makes use of it.

I just realized that Clang already has HTML output: http://clang.llvm.org/doxygen/HTMLRewrite_8cpp-source.html They advance really fast! So anyone interested in this should also check what Clang does and which IDEs use Clang's HTML output, to avoid unnecessary incompatibilities.
*** Bug 55336 has been marked as a duplicate of this bug. ***
As described in duplicate Bug 55336, I would extend the XML for compound messages, e.g. a warning/error that points to two source positions, such as a member's initialisation and the member's declaration position. In fact it is only one problem, but gcc displays more than one message.
I'm speaking as one of Code::Blocks' developers: if you implement this, we'll certainly use it, because we get many complaints similar to the one Eclipse's developers have. (After one such complaint I found this bug, by the way.)

Some suggestions:

Don't pack the line/column info into the file name, if possible. So the proposed diagnostic would change from this:

<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>

to this, which will be easier to parse:

<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c" line="10" column="8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>

Also, if possible, group the notes/instances info with the error/warning messages. This would allow us to show the information in a better way.
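To illustrate why separate attributes help: with a combined `location` value, a consumer has to split from the right, because the path itself can contain colons (for example a Windows drive letter). A sketch, with `split_location` as a hypothetical helper that assumes the `path:line:column` layout of the prototype:

```python
from typing import NamedTuple


class Location(NamedTuple):
    path: str
    line: int
    column: int


def split_location(location: str) -> Location:
    """Split 'path:line:column' into its parts, splitting from the right.

    Splitting from the right keeps colons inside the path (e.g. 'C:\\src\\a.c')
    intact; a location without a column would raise ValueError here.
    """
    path, line, column = location.rsplit(":", 2)
    return Location(path, int(line), int(column))


if __name__ == "__main__":
    print(split_location("/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8"))
    print(split_location(r"C:\src\array-2.c:10:8"))
```

With separate `line` and `column` attributes none of this guesswork is needed, and a missing column is simply a missing attribute.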
(In reply to comment #15)
> I'm speaking as one of Code::Blocks' developers:
> If you implement this we'll for sure use it, because we have many complaints
> similar to the one Eclipse's developers have.

If you have some developer power to spare, it may be worthwhile to try to tackle this yourself. Otherwise I am afraid this will never be implemented in GCC. The patch in comment #10 is very rough and outdated, but the idea is simple: copy diagnostic.c to diagnostic-xml.c and start modifying the output functions to emit XML.

> Don't pack the line/column info with the file name, if possible.

This will be trivial to do.

> Also, if it is possible group the notes/instances info with the error/warning
> messages. This way it will allows us to show the information in a better way.

This will be more complicated, because the diagnostics machinery does not have a concept of multiple messages belonging to the same diagnostic. But I think this is eventually the way to go (and I think Gabriel, the diagnostics maintainer, thinks the same), by way of defining an internal representation for diagnostics output (http://gcc.gnu.org/ml/gcc/2012-04/msg00567.html). Dumping this internal representation in XML format would then be easy. But designing and implementing this IR does not seem trivial, so you may want to start with the simple stuff.
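Until the compiler groups related messages itself, a frontend can approximate grouping on its side. The sketch below assumes the diagnostics have already been classified into untranslated kinds (as a machine-readable format would provide) and that every note immediately follows the error or warning it belongs to; both are assumptions, and `group_notes` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Group:
    kind: str
    message: str
    notes: List[str] = field(default_factory=list)


def group_notes(diags: List[Tuple[str, str]]) -> List[Group]:
    """Attach each 'note' to the most recent non-note diagnostic.

    Heuristic: assumes notes are emitted right after their parent. A note
    with no preceding parent becomes a group of its own.
    """
    groups: List[Group] = []
    for kind, message in diags:
        if kind == "note" and groups:
            groups[-1].notes.append(message)
        else:
            groups.append(Group(kind, message))
    return groups


if __name__ == "__main__":
    for g in group_notes([("error", "redefinition of 'f'"),
                          ("note", "previous definition of 'f' was here"),
                          ("warning", "unused variable 'x'")]):
        print(g)
```

This is exactly the kind of guess the comment above says GCC's own internal representation would make unnecessary.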
(In reply to comment #16)
>
> If you have some developer power to spare, it may be worthwhile to try to
> tackle this yourself. Otherwise I am afraid this will never be implemented in
> GCC. The patch in comment #10 is very rough and outdated, but the idea is
> simple. Copy diagnostic.c to diagnostic-xml.c and start modifying the output
> functions to emit XML.

I can probably spare some time, but not enough to be bothered with all the copyright assignments and other bureaucracy-related things.
(In reply to comment #17)
> (In reply to comment #16)
> >
> > If you have some developer power to spare, it may be worthwhile to try to
> > tackle this yourself. Otherwise I am afraid this will never be implemented in
> > GCC. The patch in comment #10 is very rough and outdated, but the idea is
> > simple. Copy diagnostic.c to diagnostic-xml.c and start modifying the output
> > functions to emit XML.
> Probably, I can spare some time, but I don't have time to spare and be bothered
> with all the copyright assignments and other bureaucracy related things.

Unfortunately, and from my own experience, I completely understand your position. Worse, the amount of bureaucracy required seems to be random. I would suggest giving it a token try, by sending an email to both gcc@gcc.gnu.org and assignments@gcc.org, perhaps trying to opt for a copyright disclaimer instead. See http://gcc.gnu.org/contribute.html#legal

If you find the process too troublesome, well, nobody can say you didn't try. But it would be good to announce your decision to give up on both mailing lists, as it is difficult to estimate how many people don't contribute to gcc just because of the bureaucracy (although, sadly, I believe the number to be larger than the number of current contributors).
Parsing textual gcc diagnostics is non-trivial. FWIW, as noted on the gcc list, I had a go at creating an interchange format for static analysis results (which includes compiler diagnostics). The aim was to run lots of static analyzers on lots of code, and capture the results in a consistent format in a browseable database, hence the need for an interchange format. [1] I created a format I call "Firehose": https://github.com/fedora-static-analysis/firehose as a set of Python classes that can be roundtripped through XML and JSON. It currently provides parsers for the output of gcc, clang-analyzer, cppcheck, and findbugs, and my gcc-python-plugin has a branch that can emit firehose reports directly. It can store more than just location+message: clang-analyzer can emit a series of messages describing a trace of events leading to a bug, and firehose can capture that (by reading the plist file). We don't provide that yet from gcc, but it might be worth thinking about. The gcc diagnostic parser is here: https://github.com/fedora-static-analysis/firehose/blob/master/firehose/parsers/gcc.py with test cases: https://github.com/fedora-static-analysis/firehose/blob/master/tests/parsers/test_gcc_parser.py
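For comparison, a minimal line-oriented parser for the classic `file:line:column: kind: message` layout is sketched below. It is not the firehose parser linked above; the regular expression is an assumption about the common case, and because it matches the English kind words it has exactly the locale problem this PR is about.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "foo.c:10:8: warning: some message [-Wfoo]"; the column is
# optional. Only untranslated (English) kind words are recognised.
DIAG_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?:(?P<column>\d+):)?\s+"
    r"(?P<kind>fatal error|error|warning|note):\s+(?P<message>.*)$"
)


class Diagnostic(NamedTuple):
    path: str
    line: int
    column: Optional[int]
    kind: str
    message: str


def parse_line(text: str) -> Optional[Diagnostic]:
    """Parse one line of gcc output, or return None if it is not a diagnostic."""
    m = DIAG_RE.match(text)
    if m is None:
        return None  # e.g. "In file included from ...", caret lines, or a translated kind
    column = int(m["column"]) if m["column"] else None
    return Diagnostic(m["path"], int(m["line"]), column, m["kind"], m["message"])


if __name__ == "__main__":
    print(parse_line("foo.c:10:8: warning: unused variable 'x' [-Wunused-variable]"))
```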
(In reply to Teodor Petrov from comment #15)
> I'm speaking as one of Code::Blocks' developers:
> If you implement this we'll for sure use it, because we have many complaints
> similar to the one Eclipse's developers have.
>
> (After one such complaint I've found this bug, by the way).
>
> Some suggestions:
> Don't pack the line/column info with the file name, if possible.
> So the proposed diagnostic from this:
> <diagnostic class="error"
> location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
> inicialización de un miembro de matriz flexible en un contexto anidado
> </diagnostic>
>
> will turn in to this, which will be easier to parse:
> <diagnostic class="error"
> location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c" line="10"
> column="8">
> inicialización de un miembro de matriz flexible en un contexto anidado
> </diagnostic>

Indeed.

> Also, if it is possible group the notes/instances info with the
> error/warning messages. This way it will allows us to show the information
> in a better way.

FWIW, in the "firehose" gcc parser, I captured the warning's switch, so e.g.
"num_get_float.cpp:535:29: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]"
has id="strict-aliasing" as one of the captured attributes in the XML.
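Capturing the switch, as described above, amounts to peeling a trailing `[-W...]` tag (printed under -fdiagnostics-show-option) off the message. A sketch; the regular expression and the `split_option` name are assumptions, and options that are not `-W` flags (e.g. `-fpermissive`) are deliberately left in the message.

```python
import re
from typing import Optional, Tuple

# Trailing tag such as " [-Wstrict-aliasing]", " [-Wformat=]" or
# " [-Werror=unused-variable]"; the "error=" prefix is dropped from the id.
OPTION_RE = re.compile(r"\s*\[-W(?:error=)?(?P<option>[A-Za-z0-9_=+-]+)\]$")


def split_option(message: str) -> Tuple[str, Optional[str]]:
    """Return (message without the trailing [-W...] tag, option id or None)."""
    m = OPTION_RE.search(message)
    if m is None:
        return message, None
    return message[: m.start()], m["option"]


if __name__ == "__main__":
    print(split_option("dereferencing type-punned pointer will break "
                       "strict-aliasing rules [-Wstrict-aliasing]"))
```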
One other issue is that column numbering is rather a mess right now. From my rich-location patch:

  /* Both gcc and emacs number source *lines* starting at 1, but
     they have differing conventions for *columns*.

     GCC uses a 1-based convention for source columns,
     whereas Emacs's M-x column-number-mode uses a 0-based convention.

     For example, an error in the initial, left-hand
     column of source line 3 is reported by GCC as:

        some-file.c:3:1: error: ...etc...

     On navigating to the location of that error in Emacs
     (e.g. via "next-error"),
     the locus is reported in the Mode Line
     (assuming M-x column-number-mode) as:

       some-file.c   10%   (3, 0)

     i.e. "3:1:" in GCC corresponds to "(3, 0)" in Emacs.  */

Our "column numbers" are also simply a byte count, I believe, so a tab character is currently treated by us as simply an increment of 1.

I guess this is a separate issue though.
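The two problems described above (1-based vs 0-based columns, and byte counts vs display columns) can both be handled on the consumer side. The sketch below assumes, as the comment does, that GCC's column is a 1-based byte offset; it further assumes UTF-8 source and a tab stop of 8, and ignores double-width characters. The function names are hypothetical.

```python
def display_column(line: bytes, byte_column: int, tabstop: int = 8) -> int:
    """Convert a 1-based byte column into a 1-based display column.

    Assumes UTF-8 source, tabs advancing to the next multiple of `tabstop`,
    and one cell per other character (wide CJK characters are not handled).
    """
    prefix = line[: byte_column - 1].decode("utf-8", errors="replace")
    return len(prefix.expandtabs(tabstop)) + 1


def emacs_column(display_col: int) -> int:
    """Emacs' column-number-mode counts columns from 0; GCC counts from 1."""
    return display_col - 1


if __name__ == "__main__":
    src = b"\tint x;"
    col = display_column(src, 2)        # byte column 2 is the 'i' after the tab
    print(col, emacs_column(col))       # 9 8
```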
(In reply to David Malcolm from comment #21)
> Our "column numbers" are also simply a byte-count, I believe, so a tab
> character is treated by us as simply an increment of 1 right now.
>
> I guess this is a separate issue though.

There was a discussion on the mailing list not so long ago about this precise issue, and I think there were quite good ideas on how to fix it. I cannot find the link, but if you do, would you mind adding it to https://gcc.gnu.org/wiki/Better_Diagnostics under B)? I added PR49973, which is related to this.
(In reply to David Malcolm from comment #21)
> One other issue is that column numbering is rather a mess right now. From
> my rich-location patch:
>
> /* Both gcc and emacs number source *lines* starting at 1, but
> they have differing conventions for *columns*.
>
> GCC uses a 1-based convention for source columns,
> whereas Emacs's M-x column-number-mode uses a 0-based convention.

FWIW, GCC is right and Emacs wrong according to https://www.gnu.org/prep/standards/html_node/Errors.html

Emacs could simply do the transformation itself.
Candidate patch for JSON output: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01038.html
Author: dmalcolm
Date: Thu Nov 15 14:32:41 2018
New Revision: 266186

URL: https://gcc.gnu.org/viewcvs?rev=266186&root=gcc&view=rev
Log:
Machine-readable diagnostic output (PR other/19165)

This patch implements a -fdiagnostics-format=json option which
converts the diagnostics to be output to stderr in a JSON format;
see the documentation in invoke.texi.

Logically-related diagnostics are nested at the JSON level, using
the auto_diagnostic_group mechanism.

gcc/ChangeLog:
	PR other/19165
	* Makefile.in (OBJS): Move json.o to...
	(OBJS-libcommon): ...here and add diagnostic-format-json.o.
	* common.opt (fdiagnostics-format=): New option.
	(diagnostics_output_format): New enum.
	* diagnostic-format-json.cc: New file.
	* diagnostic.c (default_diagnostic_final_cb): New function, taken
	from start of diagnostic_finish.
	(diagnostic_initialize): Initialize final_cb to
	default_diagnostic_final_cb.
	(diagnostic_finish): Move "being treated as errors" messages to
	default_diagnostic_final_cb.  Call any final_cb.
	(default_diagnostic_finalizer): Add diagnostic_t param.
	(diagnostic_report_diagnostic): Pass "orig_diag_kind" to
	diagnostic_finalizer callback.
	* diagnostic.h (enum diagnostics_output_format): New enum.
	(diagnostic_finalizer_fn): Reimplement, adding diagnostic_t param.
	(struct diagnostic_context): Add "final_cb".
	(default_diagnostic_finalizer): Add diagnostic_t param.
	(diagnostic_output_format_init): New decl.
	* doc/invoke.texi (-fdiagnostics-format): New option.
	* dwarf2out.c (gen_producer_string): Ignore
	OPT_fdiagnostics_format_.
	* gcc.c (driver_handle_option): Handle OPT_fdiagnostics_format_.
	* lto-wrapper.c (append_diag_options): Ignore it.
	* opts.c (common_handle_option): Handle it.

gcc/c-family/ChangeLog:
	PR other/19165
	* c-opts.c (c_diagnostic_finalizer): Add diagnostic_t param.

gcc/fortran/ChangeLog:
	PR other/19165
	* error.c (gfc_diagnostic_finalizer): Add diagnostic_t param.
gcc/jit/ChangeLog:
	PR other/19165
	* dummy-frontend.c (jit_begin_diagnostic): Add diagnostic_t param.

gcc/testsuite/ChangeLog:
	PR other/19165
	* c-c++-common/diagnostic-format-json-1.c: New test.
	* c-c++-common/diagnostic-format-json-2.c: New test.
	* c-c++-common/diagnostic-format-json-3.c: New test.
	* c-c++-common/diagnostic-format-json-4.c: New test.
	* c-c++-common/diagnostic-format-json-5.c: New test.
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(custom_diagnostic_finalizer): Add diagnostic_t param.
	* gcc.dg/plugin/location_overflow_plugin.c
	(verify_unpacked_ranges): Likewise.
	(verify_no_columns): Likewise.
	* gfortran.dg/diagnostic-format-json-1.F90: New test.
	* gfortran.dg/diagnostic-format-json-2.F90: New test.
	* gfortran.dg/diagnostic-format-json-3.F90: New test.

Added:
    trunk/gcc/diagnostic-format-json.cc
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
    trunk/gcc/testsuite/gfortran.dg/diagnostic-format-json-1.F90
    trunk/gcc/testsuite/gfortran.dg/diagnostic-format-json-2.F90
    trunk/gcc/testsuite/gfortran.dg/diagnostic-format-json-3.F90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-opts.c
    trunk/gcc/common.opt
    trunk/gcc/diagnostic.c
    trunk/gcc/diagnostic.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/dwarf2out.c
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/error.c
    trunk/gcc/gcc.c
    trunk/gcc/jit/ChangeLog
    trunk/gcc/jit/dummy-frontend.c
    trunk/gcc/lto-wrapper.c
    trunk/gcc/opts.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
    trunk/gcc/testsuite/gcc.dg/plugin/location_overflow_plugin.c
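With this option, a frontend can classify diagnostics from the `kind` field instead of matching translated words, and logically related notes arrive nested under their parent. The sketch below walks such output. The field names (`kind`, `message`, `locations`, `caret`, `file`, `line`, `column`, `children`) reflect my understanding of the format documented in invoke.texi and should be checked against the GCC version in use; the sample is hand-written, not captured from gcc, and `walk` is a hypothetical name.

```python
import json
from typing import Iterator, List, Tuple


def walk(diags: List[dict], depth: int = 0) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (depth, kind, location, message) for each diagnostic and its children.

    Assumed fields: "kind", "message", "locations" (each with a "caret" holding
    "file"/"line"/"column") and "children" for nested, related diagnostics.
    """
    for d in diags:
        loc = ""
        locations = d.get("locations") or []
        if locations and "caret" in locations[0]:
            caret = locations[0]["caret"]
            loc = "%s:%s:%s" % (caret.get("file", "?"), caret.get("line", "?"),
                                caret.get("column", "?"))
        yield depth, d["kind"], loc, d["message"]
        yield from walk(d.get("children") or [], depth + 1)


# Hand-written sample in the assumed shape; not captured from a real gcc run.
SAMPLE = """[
  {"kind": "error", "message": "redefinition of 'f'",
   "locations": [{"caret": {"file": "t.c", "line": 4, "column": 6}}],
   "children": [
     {"kind": "note", "message": "previous definition of 'f' was here",
      "locations": [{"caret": {"file": "t.c", "line": 1, "column": 6}}],
      "children": []}]}
]"""

if __name__ == "__main__":
    # e.g. feed it the stderr of: gcc -fdiagnostics-format=json -c t.c
    for depth, kind, loc, message in walk(json.loads(SAMPLE)):
        print("  " * depth + "%s: %s: %s" % (loc, kind, message))
```

The `message` text may still be localized; only the structure and the `kind` values are what the frontend relies on.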
Implemented for gcc 9 via r266186.
*** Bug 96032 has been marked as a duplicate of this bug. ***