Bug 87488

Summary: hyperlink filenames in diagnostics
Product: gcc Reporter: Manuel López-Ibáñez <manu>
Component: cAssignee: David Malcolm <dmalcolm>
Status: RESOLVED FIXED    
Severity: enhancement CC: burnus, ebotcazou, egmont, jakub
Priority: P3 Keywords: diagnostic
Version: 9.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2019-10-09 00:00:00

Description Manuel López-Ibáñez 2018-10-02 16:32:22 UTC
Modern terminals are starting to implement the interpretation of hyperlinks encoded in escape sequences (https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda). The idea is to produce a escape sequence similar to the one generated for colors that encodes a hyperlink that, in the simplest case, describes the path to a file.

Support for hyperlinks has been added to ls:

https://github.com/coreutils/coreutils/commit/799bac0d06cfabe9491498727308df8d1aca6d98

and to systemd:

https://github.com/systemd/systemd/commit/23b27b39d2a3a002ad827a2e8a9872a51495d797

It would be interesting if GCC hyperlinked filenames in GCC diagnostics such that given:

myfile.c:1:2: error: some error

it is possible to click in myfile.c and automatically open "file://path/to/myfile.c"

I suggest  -fdiagnostics-hyperlink=[always|never|auto] with a default of never for now until support becomes more widespread.
Comment 1 Manuel López-Ibáñez 2018-10-02 16:32:58 UTC
David may be interested in this.
Comment 2 Manuel López-Ibáñez 2018-10-02 16:36:15 UTC
Actually, according to https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#detecting-availability-of-the-feature

"almost all terminal emulators ignore the OSC sequences they don't know about, so emitting OSC 8 will result in the anchor text not becoming a hyperlink, and the target URI getting lost."

That is, the default could be "auto" since terminals that do not recognize the sequence will simply discard it.
Comment 3 Jonathan Wakely 2018-10-02 16:52:09 UTC
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#file-uris-and-the-hostname says there should be a hostname part, so not just file:///path but file://hostname/path

If I compile on a remote machine I don't want URLs that include the remote hostname (which might not even be reachable from my local machine via that name) or even necessarily the path relative to the remote machine. I might want to be able to click on a filename in the output of the remote compilation but have it open the corresponding file locally. So it might be useful to have one or more -fdiagnostic-xxx options that allow me to set the hostname explicitly, and/or to map from one path to another. So when I click on myfile.c on the remote host I can get a URL to a local copy of that file (which I would edit and push to the remote machine before recompiling).
Comment 4 David Malcolm 2018-10-02 16:53:18 UTC
Interesting.  Is there a way to encode the line number in the URL?  (and the column number?)  Or does it just give the file?
Comment 5 Jonathan Wakely 2018-10-02 17:32:55 UTC
Vim already Does The Right Thing for a filename like file.c:20:23
Comment 6 Manuel López-Ibáñez 2018-10-02 19:51:20 UTC
I think the utility of this is similar to colors, that is when the output is a terminal, not when it is emacs or vim or any other non-tty.

In remote hosts, I think we should do whatever GNU coreutils (so far, ls) is doing, probably no hostname at all. Most GNU/Linux distributions will use xdg-open to open the file:// url and that seems to work without a hostname.
Comment 7 David Malcolm 2019-01-08 15:15:34 UTC
I'm unconvinced that doing it for filenames is a good idea (based on the objections in comment #3), but I think that there could be other good uses tagging URLs into the output.

For example, a static analysis plugin that emits e.g.:

  error: NULL pointer 'p' passed as argument 1 to 'foo' [CWE-690]

might want to tag the "CWE-690" in the message/metadata with the URL https://cwe.mitre.org/data/definitions/690.html

Similarly, any time we mention a GCC option, perhaps we could provide a URL to the documentation of that option on the GCC website.  (Sadly, most of our options don't have good URLs)
Comment 8 David Malcolm 2019-10-09 20:24:13 UTC
I'm working on an implementation of this, providing URLs for option documentation for the [-Woption-name] suffixes to diagnostics.
Comment 9 David Malcolm 2019-10-10 16:58:01 UTC
Author: dmalcolm
Date: Thu Oct 10 16:57:30 2019
New Revision: 276841

URL: https://gcc.gnu.org/viewcvs?rev=276841&root=gcc&view=rev
Log:
pretty-print: support URL escape sequences (PR 87488)

https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
describes an emerging standard for embedding URLs in escape sequences
for marking up text output.  This is supported e.g. by recent releases
of GNOME Terminal.

This patch adds support to our pretty-printing framework for emitting
URLs.

A followup patch uses this to add URLs to the pertinent documentation
for the output of -fdiagnostics-show-option.

gcc/ChangeLog:
	PR 87488
	* common.opt (fdiagnostics-urls=): New option.
	(diagnostic-url.h): Add SourceInclude.
	(diagnostic_url_rule): New enum.
	* diagnostic-color.c: Include "diagnostic-url.h".
	(diagnostic_urls_enabled_p): New function.
	* diagnostic-url.h: New file.
	* diagnostic.c: Include "diagnostic-url.h".
	(diagnostic_urls_init): New function.
	* diagnostic.h (diagnostic_urls_init): New decl.
	* doc/invoke.texi (Diagnostic Message Formatting Options): Add
	-fdiagnostics-urls to the list.
	(-fdiagnostics-urls): New option.
	* gcc.c (driver_handle_option): Handle OPT_fdiagnostics_urls_.
	(driver::global_initializations): Call diagnostic_urls_init.
	* opts-global.c (init_options_once): Likewise.
	* opts.c (common_handle_option): Handle OPT_fdiagnostics_urls_.
	* pretty-print.c (pretty_printer::pretty_printer): Initialize
	show_urls.
	(pp_begin_url): New function.
	(pp_end_url): New function.
	(selftest::test_urls): New selftest.
	(selftest::pretty_print_c_tests): Call it.
	* pretty-print.h (pretty_printer::show_urls): New field.
	(pp_begin_url): New decl.
	(pp_end_url): New decl.

gcc/testsuite/ChangeLog:
	PR 87488
	* lib/prune.exp (TEST_ALWAYS_FLAGS): Add -fdiagnostics-urls=never.


Added:
    trunk/gcc/diagnostic-url.h
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/diagnostic-color.c
    trunk/gcc/diagnostic.c
    trunk/gcc/diagnostic.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/gcc.c
    trunk/gcc/opts-global.c
    trunk/gcc/opts.c
    trunk/gcc/pretty-print.c
    trunk/gcc/pretty-print.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/prune.exp
Comment 10 David Malcolm 2019-10-10 17:04:17 UTC
Author: dmalcolm
Date: Thu Oct 10 17:03:46 2019
New Revision: 276843

URL: https://gcc.gnu.org/viewcvs?rev=276843&root=gcc&view=rev
Log:
Documentation hyperlinks for [-Wname-of-option] (PR 87488)

This patch uses the support for pretty-printing escaped URLs added in
the previous patch (PR 87488) so that in a sufficiently capable terminal
the [-Wname-of-option] emitted at the end of each diagnostic becomes a
hyperlink to the documentation for that option on the GCC website.

I've tested it with Fedora 30's GNOME Terminal (3.32.2 with VTE 0.56.3);
the option text is printed with a dotted underline, hovering shows the
URL, and on right-clicking it offers menu items to visit/copy the URL.

gcc/ChangeLog:
	PR 87488
	* Makefile.in (CFLAGS-opts.o): Pass in DOCUMENTATION_ROOT_URL via
	-D.
	* configure.ac (--with-documentation-root-url): New option.
	* configure: Regenerate.
	* diagnostic-format-json.cc (json_end_diagnostic): If there is an
	option URL, add it as a new string field of the diagnostic option.
	* diagnostic.c (diagnostic_initialize): Initialize get_option_url.
	(print_option_information): If get_option_url is non-NULL, call
	it, and if the result is non-NULL, potentially emit an escape
	sequence to markup the option text with the resulting URL.
	* diagnostic.h (diagnostic_context::get_option_url): New callback.
	* doc/invoke.texi (-fdiagnostics-format=): Add "option_url" to
	example of JSON output.
	* opts-diagnostic.h (get_option_url): New decl.
	* opts.c (get_option_url): New function.
	* toplev.c (general_init): Initialize the get_option_url callback.

gcc/testsuite/ChangeLog:
	PR 87488
	* c-c++-common/diagnostic-format-json-2.c: Expect an "option_url"
	field.
	* c-c++-common/diagnostic-format-json-3.c: Likewise.
	* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
	* gfortran.dg/diagnostic-format-json-3.F90: Likewise.
	* jit.dg/test-error-array-bounds.c (create_code): Ensure that
	error messages don't contain escaped URLs.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/configure
    trunk/gcc/configure.ac
    trunk/gcc/diagnostic-format-json.cc
    trunk/gcc/diagnostic.c
    trunk/gcc/diagnostic.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/opts-diagnostic.h
    trunk/gcc/opts.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
    trunk/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
    trunk/gcc/testsuite/gfortran.dg/diagnostic-format-json-2.F90
    trunk/gcc/testsuite/gfortran.dg/diagnostic-format-json-3.F90
    trunk/gcc/testsuite/jit.dg/test-error-array-bounds.c
    trunk/gcc/toplev.c
Comment 11 David Malcolm 2019-10-10 17:12:41 UTC
r276841 implements URL support within pretty_printer.

r276843 uses this to implement hyperlinks to our documentation for the [-Wname-of-option] text in our diagnostics.

I'm still not convinced that we should do this for filenames (based on the objections in comment #3), so I'm going to mark this bug as resolved.
Comment 12 Egmont Koblinger 2019-10-10 21:11:49 UTC
(In reply to Jonathan Wakely from comment #3) 

> [the spec] says there should be a hostname part, so not just
> file:///path but file://hostname/path

Yes. The exact reason for this design is that it can be detected when the filename belongs to a remote host and not mistakenly open its local counterpart.

E.g. gnome-terminal just tells you it doesn't know how to open that file. iTerm2, as far as I know, offers you to fetch it via scp (which only works if the host is accessible under its name in a single hop).

> If I compile on a remote machine I don't want URLs that include the remote
> hostname (which might not even be reachable from my local machine via that
> name) or even necessarily the path relative to the remote machine.

Do you have any particular concern other than "I don't want..."?

You access a remote host from your local one (possibly via multiple hops) and you're free to issue commands like "hostname" remotely. It's most likely printed in your prompt. I don't see how the remote host automatically telling it when you run gcc would be any sort of data leakage or privacy concern -- if that's what you are worried of.

> I might
> want to be able to click on a filename in the output of the remote
> compilation but have it open the corresponding file locally.

Fair expectation from the user's point of view. iTerm2's scp feature does this in simple setups.

> So it might be
> useful to have one or more -fdiagnostic-xxx options that allow me to set the
> hostname explicitly,

Or gcc could simply prefer $HOSTNAME if defined.

> and/or to map from one path to another. So when I click
> on myfile.c on the remote host I can get a URL to a local copy of that file
> (which I would edit and push to the remote machine before recompiling).

It's unclear to me how that local copy would be created -- or do you only consider setups where the entire source tree has a local copy available? And what about files that are outside of your project, e.g. system headers? If gcc wants to point to /usr/include/stdio.h of the remote host, how would it map to a local one?

That being said, I'm not against this approach at all. Maybe you could define how your remote files map to local ones, and ones that don't have a 1:1 local counterpart (with guaranteed same content) would remain unlinkified by gcc. Or maybe you could rewrite all filenames so that they use sshfs that lets you access the entire remote filesystem. Seems perhaps a bit cumbersome to implement, and also potentially cumbersome for users to set up, and your use case might IMHO be a quite special one, so I'm not sure if it's all worth it, but it might be.

I assume however that the majority of developers compile locally. (Do you gcc devs have any stats on this?) For them linkifying the files would just work, and might be a great convenience feature. IMO a convenience feature not working for all of gcc's users, only for the majority of them, should not be a reason per se for rejecting it.

And let me emphasize again: Those who compile remotely wouldn't get a sneaky faulty behavior where a different local counterpart is opened and they waste precious time pondering why the hell that's relevant at all (assuming that the remote hostname differs from the local one, a pretty reasonable assumption I hope). The worst they would get is a hyperlink that fails to open. GCC could even employ other simple low hanging fruit tricks, e.g. to enable/disable linkifying filenames by default based on the absence/presence of some $SSH_* env var or so, or the build system could do such magic.


(In reply to David Malcolm from comment #4)

> Is there a way to encode the line number in the URL?

Unfortunately no, not yet. See the comments under the gist (especially the ones from Aug 2019) for the reasons and where we are stuck at the moment.
Comment 13 Tobias Burnus 2019-12-05 17:52:07 UTC
While this works with xterm (ignored) and gnome-terminal (shows hyperlink),  but using KDE Konsole, the warnings now have:

   warning: control reaches end of non-void function [\-Wreturn-type\]

That is: There are spurious \ in the output. Looking at the output of Coreutil's 'ls --hyperlink', they use  \a  instead of  \033\

I don't know whether there is a downside, but it seems to work fine.

The quoted reference (second link below or comment above pp_begin_url) states:
   >  OSC 8 ; params ; URI ST
at the beginning and
   > OSC 8 ; ; ST
but not stating what ST exactly is.

Cross references:

* Coreutils: https://bugzilla.gnome.org/show_bug.cgi?id=779734 – they use '\a' from the beginning w/o discussion
* https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda - this web page suggests \033\
* KDE Konsole - no intention to implement it, cf.
  https://bugs.kde.org/show_bug.cgi?id=379294


Assuming there is no (known) downside, I suggest to apply the patch:

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index c57a3dbd887..032baad0146 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -2052,7 +2052,7 @@ void
 pp_begin_url (pretty_printer *pp, const char *url)
 {
   if (pp->show_urls)
-    pp_printf (pp, "\33]8;;%s\33\\", url);
+    pp_printf (pp, "\33]8;;%s\a", url);
 }
 
 /* If URL-printing is enabled, write a "close URL" escape sequence to PP.  */
@@ -2061,7 +2061,7 @@ void
 pp_end_url (pretty_printer *pp)
 {
   if (pp->show_urls)
-    pp_string (pp, "\33]8;;\33\\");
+    pp_string (pp, "\33]8;;\a");
 }
 
 #if CHECKING_P
@@ -2369,7 +2369,7 @@ test_urls ()
     pp_begin_url (&pp, "http://example.com");
     pp_string (&pp, "This is a link");
     pp_end_url (&pp);
-    ASSERT_STREQ ("\33]8;;http://example.com\33\\This is a link\33]8;;\33\\",
+    ASSERT_STREQ ("\33]8;;http://example.com\aThis is a link\33]8;;\a",
 		  pp_formatted_text (&pp));
   }
 }
Comment 14 David Malcolm 2019-12-05 18:59:09 UTC
Thanks.

systemd also uses '\a':
  https://github.com/systemd/systemd/blob/master/src/shared/pretty-print.c#L53
    n = strjoin("\x1B]8;;", url, "\a", text, "\x1B]8;;\a");

I didn't see your update to this BZ after our discussion on IRC, and independently came up with the same patch as your one in comment #13, so I think that's the right approach.

Shall I go ahead and bootstrap/test/commit the patch, or do you want to?
Comment 15 David Malcolm 2019-12-05 19:00:48 UTC
I guess the other thing to test it on is on older gnome terminals that predate the support - but I don't have one handy.
Comment 16 Egmont Koblinger 2019-12-05 19:05:42 UTC
(In reply to Tobias Burnus from comment #13)

> but using KDE Konsole, the warnings now have:
> 
>    warning: control reaches end of non-void function [\-Wreturn-type\]
> 
> That is: There are spurious \ in the output.

This has recently been fixed and is going to be released in konsole 19.12. See https://bugs.kde.org/show_bug.cgi?id=231405 for details.

I don't know when gcc 10 planned to be released, but if later than that then very few users will hit this bug (and as far as I understand, they can still disable this formatting in gcc, correct?).

> The quoted reference [...] but not stating what ST exactly is.

Please read that reference again carefully, it _does_ state what ST is. ECMA-48 also states what it is, and states it as the only terminator for an OSC (section 8.3.89).

Contrary, BEL is a nonstandard extension.

That's why the OSC 8 spec also uses ST.

> Assuming there is no (known) downside

There is no practical downside I'm aware of, except that you would switch from the official standard solution to an unofficial nonstandard one.

Should you do it in order to workaround a terminal emulator which failed to fix this bug for 9.5 years after it was reported? Or should you comply with the standards and blame them for this mistake? Your call.
Comment 17 Jakub Jelinek 2019-12-07 08:24:58 UTC
Author: jakub
Date: Sat Dec  7 08:24:14 2019
New Revision: 279073

URL: https://gcc.gnu.org/viewcvs?rev=279073&root=gcc&view=rev
Log:
	PR c/87488
	* pretty-print.c (pp_begin_url, pp_end_url, test_urls): Use BEL
	instead of ST sequence to terminate OSC 8 strings.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/pretty-print.c
Comment 18 Jakub Jelinek 2019-12-07 08:57:16 UTC
I came up independently with the same patch (sorry, wasn't aware a patch is discussed in this PR, I've searched just the mailing list and nothing has been posted), thus I've added you to attribution.

In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00473.html
I'm mentioning the behavior of various terminals, ~ last year and ~ 4 years ago.
What still worries me is the 4 years old gnome-terminal that emits garbage to the screen for both sequences.
And, given that new GCC is quite often used with older setups, whether running on some remote host over ssh etc., especially if this is enabled by default, it is better to use something that works with more terminals.  Perhaps in a few years we can change it back to ST from BEL.
Comment 19 David Malcolm 2019-12-07 16:23:13 UTC
(In reply to Jakub Jelinek from comment #18)
> I came up independently with the same patch (sorry, wasn't aware a patch is
> discussed in this PR, I've searched just the mailing list and nothing has
> been posted), thus I've added you to attribution.

Yes, it looks like the three of us each managed to write the same patch.  Bother.  Thanks for testing it and committing it.


> In https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00473.html
> I'm mentioning the behavior of various terminals, ~ last year and ~ 4 years
> ago.
> What still worries me is the 4 years old gnome-terminal that emits garbage
> to the screen for both sequences.
> And, given that new GCC is quite often used with older setups, whether
> running on some remote host over ssh etc., especially if this is enabled by
> default, it is better to use something that works with more terminals. 
> Perhaps in a few years we can change it back to ST from BEL.

I'm concerned about the behavior of this with the terminals shipped in existing distributions.  Do we need a configure-time flag to allow distributions to configure the default for URLification?  (e.g. perhaps turn it off by default for gcc 10 on older RHELs?  I don't have a RHEL box handy here)
Comment 20 Egmont Koblinger 2019-12-07 18:43:36 UTC
GNOME Terminal's behavior mainly depends on VTE. Hyperlinks are supported since VTE 0.50 and a corresponding GNOME Terminal 3.26.

VTE 0.48.2 and 0.46.3, and newer versions within these stable series silently ignore OSC 8.

VTE 0.48.[01], 0.46.[012] and older stable series emit garbage.

As far as I know, an up-to-date RHEL7 contains VTE 0.52 (see https://bugzilla.redhat.com/show_bug.cgi?id=1569801). I cannot verify this, nor can comment about older RHELs.

Older distros are also more likely to ship an older Terminator, Xfce4-Terminal, Guake, LXterminal etc. which might use the GTK2-based VTE 0.28 (which also emits garbage), rather than the as-of-then-current GTK3-based VTE 0.4x-0.5x. You'd need to manually check the exact terminal packages and their dependencies to get the whole picture.

I can only guess that distros would probably appreciate such a compile-time kill switch.
Comment 21 Eric Botcazou 2019-12-08 22:08:10 UTC
*** Bug 92858 has been marked as a duplicate of this bug. ***
Comment 22 Jakub Jelinek 2019-12-10 14:14:30 UTC
Indeed, RHEL5 (admittedly in extended lifecycle support only now) and RHEL6 use old VTE though, dunno about the other long term support distros.  Dunno what are the probabilities users will use new GCC on those (but, it is enough if they use such old distros on the desktop and just ssh to some other host running new gcc).
Comment 23 GCC Commits 2020-02-15 07:38:18 UTC
The master branch has been updated by Bernd Edlinger <edlinger@gcc.gnu.org>:

https://gcc.gnu.org/g:458c8d6459c4005fc9886b6e25d168a6535ac415

commit r10-6643-g458c8d6459c4005fc9886b6e25d168a6535ac415
Author: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date:   Wed Jan 29 15:31:10 2020 +0100

    PR 87488: Add --with-diagnostics-urls configuration option
    
    2020-02-15  David Malcolm  <dmalcolm@redhat.com>
    	    Bernd Edlinger  <bernd.edlinger@hotmail.de>
    
    	PR 87488
    	PR other/93168
    	* config.in (DIAGNOSTICS_URLS_DEFAULT): New define.
    	* configure.ac (--with-diagnostics-urls): New configuration
    	option, based on --with-diagnostics-color.
    	(DIAGNOSTICS_URLS_DEFAULT): New define.
    	* config.h: Regenerate.
    	* configure: Regenerate.
    	* diagnostic.c (diagnostic_urls_init): Handle -1 for
    	DIAGNOSTICS_URLS_DEFAULT from configure-time
    	--with-diagnostics-urls=auto-if-env by querying for a GCC_URLS
    	and TERM_URLS environment variable.
    	* diagnostic-url.h (diagnostic_url_format): New enum type.
    	(diagnostic_urls_enabled_p): rename to...
    	(determine_url_format): ... this, and change return type.
    	* diagnostic-color.c (parse_env_vars_for_urls): New helper function.
    	(auto_enable_urls): Disable URLs on xfce4-terminal, gnome-terminal,
    	the linux console, and mingw.
    	(diagnostic_urls_enabled_p): rename to...
    	(determine_url_format): ... this, and adjust.
    	* pretty-print.h (pretty_printer::show_urls): rename to...
    	(pretty_printer::url_format): ... this, and change to enum.
    	* pretty-print.c (pretty_printer::pretty_printer,
    	pp_begin_url, pp_end_url, test_urls): Adjust.
    	* doc/install.texi (--with-diagnostics-urls): Document the new
    	configuration option.
    	(--with-diagnostics-color): Document the existing interaction
    	with GCC_COLORS better.
    	* doc/invoke.texi (-fdiagnostics-urls): Add GCC_URLS and TERM_URLS
    	vindex reference.  Update description of defaults based on the above.
    	(-fdiagnostics-color): Update description of how -fdiagnostics-color
    	interacts with GCC_COLORS.
Comment 24 WHR 2020-02-21 06:31:25 UTC
Looks like commit r10-6643-g458c8d6459c4005fc9886b6e25d168a6535ac415 is a well fix, and my terminal is no longer nosiy running GCC 10.

I have a suggestion, consider check for a few more 'TERM' types that would certainly broken (emitting garbage and/or unwanted bell) with URL escape sequences:
vt100
sun-color
xfce

'TERM=sun-color' is the type of Solaris local console, and it breaks on URL sequences.
Some terminals with 'TERM=vt100' break however I don't have a real VT-100 to test.
'TERM=xfce' would be a very uncommon configuration, but it is supported by ncurses.