
Bug 19165 - (Natural) language independent error / warning classification
Summary: (Natural) language independent error / warning classification
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: other
Version: 3.4.3
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Duplicates: 55336
Depends on:
Blocks:
 
Reported: 2004-12-27 14:22 UTC by Johan Walles
Modified: 2016-05-24 16:56 UTC
CC List: 7 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-01-21 02:51:52


Attachments
XML diagnostics (4.05 KB, patch)
2009-08-08 16:05 UTC, Manuel López-Ibáñez

Description Johan Walles 2004-12-27 14:22:26 UTC
I've started looking at using the Eclipse CDT for my C development.  One thing
the CDT does is parse the diagnostic messages printed by gcc.

To determine whether a message is an error or a warning, the CDT checks whether
the message starts with "warning:".  If it does, it is considered a warning,
otherwise an error.
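To make the failure mode concrete, here is a minimal C sketch of the prefix check described above. It is not CDT's actual code (CDT is written in Java); the `diag_kind` enum and the `classify_by_prefix` function are hypothetical names, and the input is assumed to be the message text that follows GCC's `file:line:col: ` prefix.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification result; not an Eclipse CDT type. */
typedef enum { DIAG_ERROR, DIAG_WARNING } diag_kind;

/* Sketch of the heuristic: a message is a warning only if it starts with
   the English keyword "warning:"; anything else, including a translated
   keyword such as the Swedish "varning:", is treated as an error. */
diag_kind classify_by_prefix(const char *msg)
{
    static const char keyword[] = "warning:";
    assert(msg != NULL);
    if (strncmp(msg, keyword, sizeof keyword - 1) == 0)
        return DIAG_WARNING;
    return DIAG_ERROR;
}
```

With this check, `classify_by_prefix("warning: ...")` yields `DIAG_WARNING`, but the same message under a Swedish locale, `"varning: ..."`, falls through to `DIAG_ERROR`.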

However, since I'm using a Swedish locale, my warning messages start with
"varning:" rather than "warning:", making the CDT heuristic fail.

I would like gcc to mark warnings and errors in a non-locale dependent way,
easily parseable by frontends such as the CDT.

I did read
http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Language-Independent-Options.html
and AFAICT there isn't currently any switch for making gcc classify its
diagnostic messages this way.
Comment 1 Andrew Pinski 2004-12-27 14:29:07 UTC
I don't think we should do this, because the warnings are for people, not for IDEs.  Maybe the IDEs 
should use the translated messages instead, i.e. use the .pot file from gcc to do the parsing :).
Comment 2 Johan Walles 2004-12-27 17:51:30 UTC
The IDE wants to present gcc's messages to people, so it's not as if the IDE
wants to understand the messages themselves (except whether they are warnings or
errors).  Now that I read what I wrote again I can see how it sounded as if the
CDT tries to do more than that, but it doesn't.  It just wants to do
errors-vs-warnings classification.

The reason it wants to be able to tell errors from warnings is that it wants to
highlight the errors more than the warnings.  Unfortunately, no sane way of
doing that seems to exist currently :-(.
Comment 3 Andrew Pinski 2005-03-29 01:46:31 UTC
What about changing the IDEs so they understand the natural language warning/error/note 
classification instead?
Comment 4 Johan Walles 2005-03-29 08:04:29 UTC
That's what they are doing currently, but it works only for English :-(.
Comment 5 Andrew Pinski 2005-07-15 21:20:47 UTC
Confirmed.
Comment 6 Tom Tromey 2005-10-23 06:47:43 UTC
There's an Eclipse PR for this, fwiw:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=108720

If you look there you can see further motivation -- in particular,
the continuation messages that gcc sometimes prints are basically
confusing to an IDE.  

Parsing the translated message seems like a very difficult approach.
Consider messages that have printf substitutions in them for instance.
Or in the case in the above PR, there is nothing distinguishing about
some of the continuation messages.
Comment 7 Timothee Besset 2006-05-15 18:42:11 UTC
An XML output mode would solve this (and potentially a number of other similar issues, such as having to set -fmessage-length=0 for most analyzers).
Comment 8 Manuel López-Ibáñez 2007-01-22 08:58:55 UTC
Is this really a bug in GCC? Eclipse should run GCC with a locale it can understand. Then, if it wants to support other languages, it has to support them also in the parser.

Or we go for the XML output? That would be kind of interesting...
Comment 9 Tom Tromey 2007-01-22 16:12:01 UTC
Some kind of machine-readable output is necessary for use by an IDE.
Eclipse can't translate the messages after they have been emitted by GCC.
So, it should run GCC in the user's locale.
However, then it would like to differentiate between warnings and errors.
There's no way to do that without knowing all the ways that GCC might
translate the words "warning" and "error" (running in a locale Eclipse
thinks it understands is not a good option because GCC may change its
choices of translation...).  It would be friendlier if
GCC provided this information.  XML, or anything machine-readable, would be fine.
So, yes, this is a GCC bug.
Comment 10 Manuel López-Ibáñez 2009-08-08 16:05:56 UTC
Created attachment 18329
XML diagnostics

A prototype of XML mode for diagnostics. The output looks like:

<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>

<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
(cerca de la inicialización de 'g2.f.x')
</diagnostic>

<diagnostic class="warning" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:13:8">
exceso de elementos en el inicializador de matriz
</diagnostic>

<diagnostic class="warning" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:13:8">
(cerca de la inicialización de 'h1.x')
</diagnostic>

I am not planning to work on this further. This patch shows that it can be done, but I don't know if there is any interest in this.
Comment 11 joseph@codesourcery.com 2009-08-08 16:33:18 UTC
Subject: Re: (Natural) language independent error / warning classification

On Sat, 8 Aug 2009, manu at gcc dot gnu dot org wrote:

> I am not planning to work on this further. This patch shows that it can be
> done, but I don't know if there is any interest on this.

The principle makes sense (obviously the prototype patch would need 
further work for actual inclusion, e.g. escaping of < and > signs for the 
XML output), but I think in practice it's only useful if driven by 
cooperation from IDE people who will help establish what the XML should 
look like and commit to making an IDE use the XML output in future by 
default when using a GCC version that supports it.

I imagine that the XML should have some way of marking continuation 
messages as such, should include the option (as from 
-fdiagnostics-show-option) in some structured way, and probably should 
give locations and inclusion context in an XML structured way as well 
rather than as plain text - but discussion would be needed with IDE people 
on what information GCC can give and how an IDE could use it.
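The escaping of < and > signs mentioned above is straightforward to sketch. The hypothetical `xml_escape` helper below is not a GCC function; it assumes the message is a NUL-terminated string already in the output encoding, and it escapes all five predefined XML entities so the same helper also works for attribute values such as the file name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of TEXT with &, <, >, " and ' replaced by
   XML entity references, or NULL if allocation fails. Caller frees. */
char *xml_escape(const char *text)
{
    assert(text != NULL);

    /* First pass: compute the length of the escaped string. */
    size_t len = 0;
    for (const char *p = text; *p; p++) {
        switch (*p) {
        case '&':  len += 5; break;   /* &amp;  */
        case '<':  len += 4; break;   /* &lt;   */
        case '>':  len += 4; break;   /* &gt;   */
        case '"':  len += 6; break;   /* &quot; */
        case '\'': len += 6; break;   /* &apos; */
        default:   len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting entity references. */
    char *q = out;
    for (const char *p = text; *p; p++) {
        const char *ent = NULL;
        switch (*p) {
        case '&':  ent = "&amp;";  break;
        case '<':  ent = "&lt;";   break;
        case '>':  ent = "&gt;";   break;
        case '"':  ent = "&quot;"; break;
        case '\'': ent = "&apos;"; break;
        default:   break;
        }
        if (ent != NULL) {
            size_t n = strlen(ent);
            memcpy(q, ent, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

For example, `xml_escape("a < b && 'c'")` returns `a &lt; b &amp;&amp; &apos;c&apos;`. The two-pass design allocates exactly once, at the cost of scanning the message twice.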

Comment 12 Manuel López-Ibáñez 2009-08-08 17:05:38 UTC
(In reply to comment #11)
> XML output), but I think in practice it's only useful if driven by 
> cooperation from IDE people who will help establish what the XML should 
> look like and commit to making an IDE use the XML output in future by 
> default when using a GCC version that supports it.

Completely agreed. 

> I imagine that the XML should have some way of marking continuation 
> messages as such, should include the option (as from 
> -fdiagnostics-show-option) in some structured way, and probably should 
> give locations and inclusion context in an XML structured way as well 
> rather than as plain text - but discussion would be needed with IDE people 
> on what information GCC can give and how an IDE could use it.

Yes, it will require some custom pretty-printing functions to handle XML entities. I think it will also need to address some weaknesses of the current diagnostics/pretty-printing machinery: first/last diagnostic callbacks, handle "\n" in messages correctly, being able to construct a single diagnostic entity from various error/warning/notes (I think this is what you mean by continuation messages).

Nothing of this would be difficult.

In addition, GCC could expose much more info from the internal representation through XML, when printing %T, %E, %D and such.

But it is not worth it if no popular IDE is making use of it.

I just realized that Clang already has HTML output.

http://clang.llvm.org/doxygen/HTMLRewrite_8cpp-source.html

They advance really fast! So anyone interested in this should also check what Clang does and which IDEs use Clang's HTML output, to avoid unnecessary incompatibilities.
Comment 13 Manuel López-Ibáñez 2012-11-15 10:39:35 UTC
*** Bug 55336 has been marked as a duplicate of this bug. ***
Comment 14 Clemens 2012-11-15 15:32:11 UTC
As described in duplicate Bug 55336, I would extend the XML to cover compound messages, e.g. a warning/error that points to two source positions, such as the member initialization and the member position. It is really only one problem, but gcc reports it as several messages.
Comment 15 Teodor Petrov 2013-01-30 12:13:52 UTC
I'm speaking as one of Code::Blocks' developers:
If you implement this, we'll definitely use it, because we have many complaints similar to the one Eclipse's developers have.

(I found this bug after one such complaint, by the way.)

Some suggestions:
Don't pack the line/column info with the file name, if possible.
So the proposed diagnostic would change from this:
<diagnostic class="error"
location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>

to this, which would be easier to parse:
<diagnostic class="error" location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c" line="10" column="8">
inicialización de un miembro de matriz flexible en un contexto anidado
</diagnostic>
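The benefit of separate attributes is easy to see from the consumer side: with the packed form, a parser must split `path:line:col` itself, and because paths can contain colons (for example Windows drive letters) it has to scan from the right. The hypothetical `split_location` below is a C sketch of that parsing, which separate `line` and `column` attributes would make unnecessary.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "PATH:LINE:COL" by scanning from the right,
   so colons inside PATH (e.g. a drive letter) stay part of the path.
   On success copies PATH into PATH_OUT and returns 0; returns -1 if LOC
   does not end in ":LINE:COL" or PATH does not fit in PATH_SIZE bytes. */
int split_location(const char *loc, char *path_out, size_t path_size,
                   long *line, long *col)
{
    assert(loc && path_out && line && col && path_size > 0);

    /* Last colon: separates LINE from COL. */
    const char *c2 = strrchr(loc, ':');
    if (c2 == NULL || c2 == loc)
        return -1;

    /* Previous colon: separates PATH from LINE. */
    const char *c1 = c2 - 1;
    while (c1 > loc && *c1 != ':')
        c1--;
    if (*c1 != ':')
        return -1;

    /* Both fields must start with a digit and consist only of digits. */
    if (!isdigit((unsigned char)c1[1]) || !isdigit((unsigned char)c2[1]))
        return -1;
    char *end;
    *line = strtol(c1 + 1, &end, 10);
    if (end != c2)
        return -1;
    *col = strtol(c2 + 1, &end, 10);
    if (*end != '\0')
        return -1;

    size_t n = (size_t)(c1 - loc);
    if (n >= path_size)
        return -1;
    memcpy(path_out, loc, n);
    path_out[n] = '\0';
    return 0;
}
```

For the prototype's `location` value it yields the path `/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c`, line 10, and column 8.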

Also, if possible, group the notes/instantiation info with the error/warning messages. That would allow us to show the information in a better way.
Comment 16 Manuel López-Ibáñez 2013-01-30 18:18:35 UTC
(In reply to comment #15)
> I'm speaking as one of Code::Blocks' developers:
> If you implement this we'll for sure use it, because we have many complaints
> similar to the one Eclipse's developers have. 

If you have some developer power to spare, it may be worthwhile to try to tackle this yourself. Otherwise I am afraid this will never be implemented in GCC. The patch in comment #10 is very rough and outdated, but the idea is simple. Copy diagnostic.c to diagnostic-xml.c and start modifying the output functions to emit XML.

> Don't pack the line/column info with the file name, if possible.

This will be trivial to do.

> Also, if it is possible group the notes/instances info with the error/warning
> messages. This way it will allows us to show the information in a better way.

This will be more complicated, because the diagnostics machinery does not have a concept of multiple messages belonging to the same diagnostic. But I think this is eventually the way to go (and I think Gabriel, the diagnostics maintainer, thinks the same) by way of defining an internal representation for diagnostics output. (http://gcc.gnu.org/ml/gcc/2012-04/msg00567.html) Dumping this internal representation in XML format would then be easy.

But designing and implementing this IR does not seem trivial, so you may want to start with the simple stuff.
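As an illustration only, such an internal representation could start as a primary message plus a list of attached notes, dumped as one XML element. In the C sketch below, `struct diag_group`, `struct diag_note`, `format_diag_group`, and the `<message>`/`<note>` element names are all hypothetical, not part of GCC or any agreed schema, and the message text is assumed to be XML-escaped already.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical note attached to a primary diagnostic. */
struct diag_note {
    const char *file;
    long line, column;
    const char *message;            /* assumed already XML-escaped */
};

/* Hypothetical grouped diagnostic: one error/warning plus its notes. */
struct diag_group {
    const char *kind;               /* "error" or "warning" */
    const char *file;
    long line, column;
    const char *message;            /* assumed already XML-escaped */
    const struct diag_note *notes;
    size_t n_notes;
};

/* Append formatted text at offset *USED in BUF without writing past SIZE
   bytes; *USED advances by the length vsnprintf reports as needed. */
static void append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    size_t off = *used < size ? *used : size;
    va_list ap;
    va_start(ap, fmt);
    int r = vsnprintf(buf + off, size - off, fmt, ap);
    va_end(ap);
    if (r > 0)
        *used += (size_t)r;
}

/* Render D as a single <diagnostic> element with nested <note> children.
   Like snprintf, returns the length the full output needs, so a result
   >= SIZE means BUF was too small and the text was truncated. */
size_t format_diag_group(char *buf, size_t size, const struct diag_group *d)
{
    assert(buf != NULL && size > 0 && d != NULL);
    assert(strcmp(d->kind, "error") == 0 || strcmp(d->kind, "warning") == 0);
    size_t used = 0;
    buf[0] = '\0';
    append(buf, size, &used,
           "<diagnostic class=\"%s\" file=\"%s\" line=\"%ld\" column=\"%ld\">\n",
           d->kind, d->file, d->line, d->column);
    append(buf, size, &used, "  <message>%s</message>\n", d->message);
    for (size_t i = 0; i < d->n_notes; i++) {
        const struct diag_note *n = &d->notes[i];
        append(buf, size, &used,
               "  <note file=\"%s\" line=\"%ld\" column=\"%ld\">%s</note>\n",
               n->file, n->line, n->column, n->message);
    }
    append(buf, size, &used, "</diagnostic>\n");
    return used;
}
```

Nesting the notes inside their parent element is what would let an IDE present them together, as requested in comments 14 and 15; it also keeps separate `line` and `column` attributes instead of a packed location.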
Comment 17 Teodor Petrov 2013-01-30 20:34:00 UTC
(In reply to comment #16)
> 
> If you have some developer power to spare, it may be worthwhile to try to
> tackle this yourself. Otherwise I am afraid this will never be implemented in
> GCC. The patch in comment #10 is very rough and outdated, but the idea is
> simple. Copy diagnostic.c to diagnostic-xml.c and start modifying the output
> functions to emit XML.
I can probably spare some time, but I don't have the time to also deal with copyright assignments and other bureaucracy-related things.
Comment 18 Manuel López-Ibáñez 2013-03-27 00:59:43 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > 
> > If you have some developer power to spare, it may be worthwhile to try to
> > tackle this yourself. Otherwise I am afraid this will never be implemented in
> > GCC. The patch in comment #10 is very rough and outdated, but the idea is
> > simple. Copy diagnostic.c to diagnostic-xml.c and start modifying the output
> > functions to emit XML.
> Probably, I can spare some time, but I don't have time to spare and be bothered
> with all the copyright assignments and other bureaucracy related things.

Unfortunately, and from my own experience, I completely understand your position. Worse, the amount of bureaucracy required seems to be random. I would suggest giving it a token try by sending an email to both gcc@gcc.gnu.org and assignments@gcc.org, perhaps opting for a copyright disclaimer instead. See http://gcc.gnu.org/contribute.html#legal
If you find the process too troublesome, well, nobody can say you didn't try.  But it would be good to announce your decision to give up on both mailing lists, as it is difficult to estimate how many people don't contribute to gcc just because of the bureaucracy (sadly, I believe that number is larger than the number of current contributors).
Comment 19 David Malcolm 2015-11-05 11:29:34 UTC
Parsing textual gcc diagnostics is non-trivial.

FWIW, as noted on the gcc list, I had a go at creating an interchange format for static analysis results (which includes compiler diagnostics).  The aim was to run lots of static analyzers on lots of code, and capture the results in a consistent format in a browseable database, hence the need for an interchange format. [1] I created a format I call "Firehose":
  https://github.com/fedora-static-analysis/firehose
as a set of Python classes that can be roundtripped through XML and JSON.  It currently provides parsers for the output of gcc, clang-analyzer, cppcheck, and findbugs, and my gcc-python-plugin has a branch that can emit firehose reports directly.

It can store more than just location+message: clang-analyzer can emit a series of messages describing a trace of events leading to a bug, and firehose can capture that (by reading the plist file).  We don't provide that yet from gcc, but it might be worth thinking about.

The gcc diagnostic parser is here:
https://github.com/fedora-static-analysis/firehose/blob/master/firehose/parsers/gcc.py

with test cases:
https://github.com/fedora-static-analysis/firehose/blob/master/tests/parsers/test_gcc_parser.py
Comment 20 David Malcolm 2015-11-05 11:37:06 UTC
(In reply to Teodor Petrov from comment #15)
> I'm speaking as one of Code::Blocks' developers:
> If you implement this we'll for sure use it, because we have many complaints
> similar to the one Eclipse's developers have. 
> 
> (After one such complaint I've found this bug, by the way).
> 
> Some suggestions: 
> Don't pack the line/column info with the file name, if possible.
> So the proposed diagnostic from this:
> <diagnostic class="error"
> location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c:10:8">
> inicialización de un miembro de matriz flexible en un contexto anidado
> </diagnostic>
> 
> will turn in to this, which will be easier to parse:
> <diagnostic class="error"
> location="/home/manuel/src/test/gcc/testsuite/gcc.dg/array-2.c" line="10"
> column="8">
> inicialización de un miembro de matriz flexible en un contexto anidado
> </diagnostic>

Indeed.

> Also, if it is possible group the notes/instances info with the
> error/warning messages. This way it will allows us to show the information
> in a better way.

FWIW, in the "firehose" gcc parser, I captured the warning's switch so e.g.
"num_get_float.cpp:535:29: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]"

has id="strict-aliasing" as one of the captured attributes in the XML.
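Capturing the switch amounts to reading the bracketed option that -fdiagnostics-show-option appends to the message. The hypothetical `extract_warning_option` below is a C sketch of that step, not firehose's code (firehose is written in Python); it only recognizes a `[-W...]` suffix at the very end of the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if MSG ends in "[-WNAME]", copy NAME into OUT and
   return 1; otherwise (no suffix, non -W option, or OUT too small)
   return 0. */
int extract_warning_option(const char *msg, char *out, size_t out_size)
{
    assert(msg && out && out_size > 0);
    size_t len = strlen(msg);
    if (len < 4 || msg[len - 1] != ']')
        return 0;

    /* The option is in the last bracketed group of the message. */
    const char *open = strrchr(msg, '[');
    if (open == NULL || strncmp(open, "[-W", 3) != 0)
        return 0;

    const char *name = open + 3;                    /* skip "[-W" */
    size_t n = (size_t)(msg + len - 1 - name);      /* up to the ']' */
    if (n == 0 || n >= out_size)
        return 0;
    memcpy(out, name, n);
    out[n] = '\0';
    return 1;
}
```

Applied to the num_get_float.cpp line above, it stores `strict-aliasing` in `out`.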
Comment 21 David Malcolm 2015-11-05 11:40:21 UTC
One other issue is that column numbering is rather a mess right now.  From my rich-location patch:

/* Both gcc and emacs number source *lines* starting at 1, but
   they have differing conventions for *columns*.

   GCC uses a 1-based convention for source columns,
   whereas Emacs's M-x column-number-mode uses a 0-based convention.

   For example, an error in the initial, left-hand
   column of source line 3 is reported by GCC as:

      some-file.c:3:1: error: ...etc...

   On navigating to the location of that error in Emacs
   (e.g. via "next-error"),
   the locus is reported in the Mode Line
   (assuming M-x column-number-mode) as:

     some-file.c   10%   (3, 0)

   i.e. "3:1:" in GCC corresponds to "(3, 0)" in Emacs.  */

Our "column numbers" are also simply a byte-count, I believe, so a tab character is treated by us as simply an increment of 1 right now.

I guess this is a separate issue though.
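Because the column is a byte count, a consumer that wants the column as displayed has to re-expand tabs itself. A C sketch, assuming tab stops every `tab_width` columns and counting every other byte as one column (so multibyte UTF-8 characters are not handled); `display_column` and `to_emacs_column` are hypothetical names. The second helper is just the 1-based to 0-based shift described in the quoted comment.

```c
#include <assert.h>
#include <stddef.h>

/* Convert GCC's 1-based byte column BYTE_COL on LINE into a 1-based
   display column in which each tab advances to the next multiple of
   TAB_WIDTH. Every non-tab byte counts as one column. */
long display_column(const char *line, long byte_col, int tab_width)
{
    assert(line != NULL && byte_col >= 1 && tab_width > 0);
    long disp = 0;                        /* 0-based display column so far */
    for (long i = 0; i < byte_col - 1 && line[i] != '\0'; i++) {
        if (line[i] == '\t')
            disp += tab_width - disp % tab_width;
        else
            disp += 1;
    }
    return disp + 1;                      /* back to the 1-based convention */
}

/* Emacs's column-number-mode counts columns from 0, GCC from 1. */
long to_emacs_column(long gcc_column)
{
    return gcc_column - 1;
}
```

For example, on the line `"\tint x;"` GCC would report the `i` at byte column 2, while `display_column` returns 9 with 8-column tab stops.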
Comment 22 Manuel López-Ibáñez 2015-11-05 11:57:04 UTC
(In reply to David Malcolm from comment #21)
> Our "column numbers" are also simply a byte-count, I believe, so a tab
> character is treated by us as simply an increment of 1 right now.
> 
> I guess this is a separate issue though.

There was a discussion on the mailing list not so long ago about this precise issue, and I think there were quite good ideas on how to fix it. I cannot find the link, but if you do, would you mind adding it to https://gcc.gnu.org/wiki/Better_Diagnostics under B)?

I added PR49973, which is related to this.
Comment 23 Manuel López-Ibáñez 2015-12-09 20:13:05 UTC
(In reply to David Malcolm from comment #21)
> One other issue is that column numbering is rather a mess right now.  From
> my rich-location patch:
> 
> /* Both gcc and emacs number source *lines* starting at 1, but
>    they have differing conventions for *columns*.
> 
>    GCC uses a 1-based convention for source columns,
>    whereas Emacs's M-x column-number-mode uses a 0-based convention.

FWIW, GCC is right and Emacs wrong according to https://www.gnu.org/prep/standards/html_node/Errors.html

Emacs could simply do the transformation itself.