Bug 63731 - Fallback to netgo does not work
Summary: Fallback to netgo does not work
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: go
Version: 5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
Reported: 2014-11-04 08:08 UTC by Yohei Ueda
Modified: 2016-02-10 21:28 UTC

Last reconfirmed: 2014-11-21 00:00:00


Attachments
Fix the DNS lookup problem for statically linked binaries (261 bytes, patch)
2014-11-14 04:15 UTC, Yohei Ueda
"Fallback" netgo solution for gccgo (2.63 KB, patch)
2015-03-31 18:27 UTC, boger
libgo/go/go/build/doc.go documentation update (556 bytes, patch)
2015-04-08 14:41 UTC, boger

Description Yohei Ueda 2014-11-04 08:08:44 UTC
When the DNS resolver with CGO fails for statically linked binaries, it should fall back to the pure-go DNS resolver, if I understand correctly.

This fallback mechanism does not work, at least on Linux for x86_64 and ppc64le.

The variable ok in lookupIP in go/net/lookup_unix.go seems to be always set to true, so the fallback path is never taken.

Here is example code that demonstrates the problem.

$ gccgo -v
Using built-in specs.
COLLECT_GCC=gccgo
COLLECT_LTO_WRAPPER=/usr/local/gccgo-216834/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --enable-threads=posix --enable-shared --enable-__cxa_atexit --enable-languages=c,c++,go --enable-secureplt --enable-checking=yes --with-long-double-128 --enable-decimal-float --disable-bootstrap --disable-alsa --disable-multilib --prefix=/usr/local/gccgo-216834
Thread model: posix
gcc version 5.0.0 20141029 (experimental) (GCC) 

$ cat lookup.go 
package main

import (
	"fmt"
	"net"
)

func main() {
	addrs, err := net.LookupHost("gcc.gnu.org")
	if err != nil {
		fmt.Println(err)
	} else {
		for i := 0; i < len(addrs); i++ {
			fmt.Println(addrs[i])
		}
	}
}

$ gccgo lookup.go 
$ ./a.out 
209.132.180.131
$ gccgo -static lookup.go 
/usr/local/gccgo-216834/lib/gcc/x86_64-unknown-linux-gnu/5.0.0/../../../../lib64/libgo.a(net.o): In function `net.cgoLookupPort':
/home/yohei/gccgo.216834/bld/x86_64-unknown-linux-gnu/libgo/../../../src/libgo/go/net/cgo_unix.go:83: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
$ ./a.out 
lookup gcc.gnu.org: Name or service not known
$ LD_LIBRARY_PATH=/lib/x86_64-linux-gnu ./a.out 
209.132.180.131
Comment 1 Yohei Ueda 2014-11-14 04:15:39 UTC
Created attachment 33966
Fix the DNS lookup problem for statically linked binaries

I created a patch file that fixes this problem.
Comment 2 Yohei Ueda 2014-11-14 04:49:49 UTC
I tested this issue again with the latest GC trunk, and noticed that GC always links programs that contain DNS lookups dynamically, even if -static is specified.

$ go version
go version devel +ae495517bd72 Fri Nov 14 11:43:01 2014 +1100 linux/amd64
$ go build -ldflags '-extldflags "-static"' lookup.go
$ ./lookup 
209.132.180.131
$ file lookup
lookup: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
$ ldd lookup
	linux-vdso.so.1 =>  (0x00007fff81550000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f50b943d000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f50b9078000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f50b9677000)

Any suggestions? Should I report this issue to GC?
Comment 3 Ian Lance Taylor 2014-11-14 15:54:24 UTC
The gc linker does not use the external linker merely because you link against the net package.  You need to also pass -linkmode external in ldflags.
Comment 4 Yohei Ueda 2014-11-17 08:16:28 UTC
Thank you for the correction.

I need to use the following command to enable netgo with GC.

$ go version
go version devel +ae495517bd72 Fri Nov 14 11:43:01 2014 +1100 linux/amd64
$ go build -ldflags '-linkmode external -extldflags -static' -a -tags netgo lookup.go 
$ ./lookup 
209.132.180.131

Now, I have a question about how to do the same thing with GCCGO.

I think the -a option for the go command is essential to enable netgo, but the -a option does not seem to work with the standard Go library of GCCGO.

$ go build -compiler gccgo -gccgoflags '-static' -a -tags netgo lookup.go 
# command-line-arguments
/usr/local/gccgo-216834/lib/gcc/x86_64-unknown-linux-gnu/5.0.0/../../../../lib64/libgo.a(net.o): In function `net.cgoLookupPort':
/home/yohei/gccgo.216834/bld/x86_64-unknown-linux-gnu/libgo/../../../src/libgo/go/net/cgo_unix.go:83: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
$ ./lookup 
lookup gcc.gnu.org: Name or service not known
$ LD_LIBRARY_PATH=/lib/x86_64-linux-gnu ./lookup
209.132.180.131

I confirmed that the source code exists under $GOROOT/src, where $GOROOT is the value reported by go env GOROOT. I also copied libgo/go from the GCC source tree into $(PREFIX)/src/pkg.
Comment 5 Ian Lance Taylor 2014-11-17 15:44:48 UTC
You are correct.  The go command does not rebuild the standard library for gccgo.  There is at present no reasonable way to use the netgo tag with gccgo.
Comment 6 boger 2014-11-17 18:27:20 UTC
I understand why some functions like getaddrinfo don't work with static linking unless the LD_LIBRARY_PATH is set to find libc.so, but then what should happen if it isn't?
Comment 7 Ian Lance Taylor 2014-11-18 00:24:07 UTC
That's really a question for the glibc maintainers.  It's entirely a glibc issue.

My understanding is that it's not actually libc.so that needs to be found.  Glibc implements /etc/nsswitch.conf by loading a shared library for each entry listed there ("db", "files", etc.).  That is a very flexible mechanism that allows nsswitch.conf to be extended on a system-specific basis.  However, it only works if those shared libraries are available.  It is those libraries (and libdl.so) that must be found at runtime.  If the libraries can not be found, /etc/nsswitch.conf can not be implemented, and name lookups of various sorts fail by returning "not found."
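
A minimal sketch of the mechanism described above, assuming the conventional glibc naming in which each source on the "hosts:" line is served by a shared library called libnss_<source>.so.2. The function hostsSources and the sample configuration are hypothetical and only illustrate which libraries would have to be found at run time:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hostsSources returns the lookup sources listed on the "hosts:" line of
// nsswitch.conf-style text. Bracketed action items such as
// [NOTFOUND=return] are skipped. This is an illustrative parser, not the
// one glibc uses.
func hostsSources(conf string) []string {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "hosts:" {
			continue
		}
		var sources []string
		for _, f := range fields[1:] {
			if strings.HasPrefix(f, "[") {
				continue // action item, not a source
			}
			sources = append(sources, f)
		}
		return sources
	}
	return nil
}

func main() {
	// Hypothetical configuration, for illustration only.
	sample := "passwd: compat\nhosts: files mdns4_minimal [NOTFOUND=return] dns\n"
	for _, s := range hostsSources(sample) {
		// Assumed naming convention: one shared library per source.
		fmt.Printf("%s -> libnss_%s.so.2\n", s, s)
	}
}
```

If none of the listed libraries can be loaded, every source on the line is effectively unavailable, which matches the "not found" result seen in the static binary.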
Comment 8 Yohei Ueda 2014-11-18 06:15:39 UTC
If we can distinguish "not found" errors caused by invalid host names from those caused by nss being unavailable, we can fall back to netgo only when nss is not available.

I noticed that errno is set to 0 for invalid hostnames, and is set to ENOENT when nss is not available. However, this behavior is not documented, so it is probably implementation dependent.
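
A rough sketch of what an errno-based fallback could look like, assuming the undocumented behavior described above (errno 0 for an invalid host name, ENOENT when nss is unavailable). The type cgoResult, the function lookup, and the stub resolvers are hypothetical stand-ins; they are not the net package's code and make no real DNS queries:

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

var errNotFound = errors.New("name or service not known")

// cgoResult mimics what a getaddrinfo wrapper could report: the addresses,
// the error, and the errno observed after the call.
type cgoResult struct {
	addrs []string
	err   error
	errno syscall.Errno
}

// pureGoCalls counts how often the pure-Go stub is invoked.
var pureGoCalls int

// lookup tries the cgo-style resolver first and falls back to the pure-Go
// one only when the failure carries ENOENT (assumed to mean "nss missing").
func lookup(host string, cgo func(string) cgoResult, pureGo func(string) ([]string, error)) ([]string, error) {
	r := cgo(host)
	if r.err != nil && r.errno == syscall.ENOENT {
		return pureGo(host)
	}
	return r.addrs, r.err
}

// staticCgo models a static binary whose nss libraries cannot be loaded.
func staticCgo(host string) cgoResult {
	return cgoResult{err: errNotFound, errno: syscall.ENOENT}
}

// dynamicCgo models a working nss setup; unknown hosts fail with errno 0.
func dynamicCgo(host string) cgoResult {
	if host == "gcc.gnu.org" {
		return cgoResult{addrs: []string{"209.132.180.131"}}
	}
	return cgoResult{err: errNotFound}
}

// pureGo is a stub for the Go resolver.
func pureGo(host string) ([]string, error) {
	pureGoCalls++
	if host == "gcc.gnu.org" {
		return []string{"209.132.180.131"}, nil
	}
	return nil, errNotFound
}

func main() {
	fmt.Println(lookup("gcc.gnu.org", staticCgo, pureGo))
	fmt.Println(lookup("no-such-host.invalid", dynamicCgo, pureGo))
	fmt.Println("pure-Go calls:", pureGoCalls)
}
```

With this shape an invalid host name on a working system is looked up only once, because the second resolver is consulted only for the ENOENT case.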

I think a practical solution to this issue is to enable the -a option of the go command for the standard library in GCCGO. I know Lynn is working on the go command for GCCGO.
https://groups.google.com/forum/#!topic/gofrontend-dev/1Gm87ieI47Q

What do you think?
Comment 9 boger 2014-11-18 13:16:27 UTC
My question was:  what is supposed to happen on fallback?  Sounds like some code gets rebuilt and used instead?
Comment 10 Yohei Ueda 2014-11-18 13:52:04 UTC
https://code.google.com/p/go/source/browse/src/net/lookup_unix.go#63

func lookupIP(host string) (addrs []IP, err error) {
        addrs, err, ok := cgoLookupIP(host)
        if !ok {
                addrs, err = goLookupIP(host)
        }
        return
}

This code shows how fallback works. If cgoLookupIP returns ok == false, then the pure-Go goLookupIP is called.

When the netgo tag is set, ok is always false.
https://code.google.com/p/go/source/browse/src/net/cgo_stub.go#5

When the netgo tag is not set, ok is always true.
https://code.google.com/p/go/source/browse/src/net/cgo_unix.go#147
https://code.google.com/p/go/source/browse/src/net/cgo_unix.go#84

If cgoLookupIP can also return ok == false when no nss is available, goLookupIP will be called. This is my first idea.
https://code.google.com/p/go/source/browse/src/net/cgo_unix.go#116

However, I realized that we cannot easily distinguish the "not found" errors. getaddrinfo returns "not found" in both cases of invalid hostname lookups and static binaries.

If cgoLookupIP returns ok == false even when an invalid host name is looked up with nss available, goLookupIP looks it up again, so the IP lookup occurs twice and both attempts fail with a "not found" error. I don't think this is the desired behavior.
Comment 11 boger 2014-11-19 20:32:50 UTC
What I was asking is:  what does it mean to call the pure-Go goLookupIP?
Comment 12 Yohei Ueda 2014-11-20 07:54:23 UTC
"pure-Go goLookupIP" means that goLookupIP is written in Go as usual.
https://code.google.com/p/go/source/browse/src/net/dnsclient_unix.go#364

cgoLookupIP uses getaddrinfo via CGO unless the netgo tag is set.
https://code.google.com/p/go/source/browse/src/net/cgo_unix.go#147

Fallback means that goLookupIP is called when cgoLookupIP returns ok == false in this code.
https://code.google.com/p/go/source/browse/src/net/lookup_unix.go#63
func lookupIP(host string) (addrs []IP, err error) {
        addrs, err, ok := cgoLookupIP(host)
        if !ok {
                addrs, err = goLookupIP(host)
        }
        return
}

When code that uses LookupHost is compiled with "go build -ldflags '-linkmode external -extldflags -static' -a -tags netgo", the Go standard library including cgoLookupIP is rebuilt. In this case, cgoLookupIP always returns ok == false as defined here.
https://code.google.com/p/go/source/browse/src/net/cgo_stub.go#19
func cgoLookupIP(name string) (addrs []IP, err error, completed bool) {
        return nil, nil, false
}

This leads to calling goLookupIP from lookupIP. I called this mechanism "fallback".
Comment 13 boger 2014-11-20 13:25:48 UTC
Then my question is: why isn't this "fallback" code always built for GO and available to run instead of waiting until it hits this situation and then built and run?  In the situation you are describing it sounds like it is related to whether or not there is a static libnss available which could be determined at GO build time.
Comment 14 Yohei Ueda 2014-11-20 14:49:11 UTC
I am not the original author, so my description may be inaccurate.

> why isn't this "fallback" code always built for GO and available
> to run instead of waiting until it hits this situation and then 
> built and run?

The fallback code (goLookupIP) is always built, but is called at run time only when the code is built with the netgo tag.

When the netgo tag is not set at build time, both cgoLookupIP and goLookupIP are always built. However, only cgoLookupIP is called, and goLookupIP is never called at run time. So, lookup fails when it is statically linked.

When the netgo tag is set at build time, cgoLookupIP is empty and goLookupIP is built. So only goLookupIP is called at run time. 

lookupIP looks like it has a run time fallback mechanism, but it does not. The current code only selects cgoLookupIP or goLookupIP at build time depending on the netgo tag setting.

If we could distinguish "not found" errors of getaddrinfo as described earlier, lookupIP could have a run time fallback mechanism from cgoLookupIP to goLookupIP.

However, it is impossible to distinguish "not found" errors as far as I know, so I guess the current code depends on the netgo tag to select cgoLookupIP or goLookupIP at build time.

> In the situation you are describing it sounds like it is related to 
> whether or not there is a static libnss available which could be determined 
> at GO build time.

The existence of libnss at build time does not affect the behavior at run time. The netgo tag affects which lookup function is called at runtime.
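
The build-time selection described in this comment can be modeled in a single file. This is only a sketch: in the real net package the choice is made by which source file is compiled (the stub or the getaddrinfo wrapper), whereas here the two variants are passed in as a function value. The names cgoStub and cgoGetaddrinfo and the string results are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// cgoLookupFunc has the same result shape as cgoLookupIP in the thread:
// addresses, an error, and whether the lookup was handled.
type cgoLookupFunc func(host string) (addrs []string, err error, completed bool)

// cgoStub models the variant built with the netgo tag: it never handles
// the lookup.
func cgoStub(host string) ([]string, error, bool) {
	return nil, nil, false
}

// cgoGetaddrinfo models the variant built without netgo inside a static
// binary: the call fails, yet it still reports the lookup as handled.
func cgoGetaddrinfo(host string) ([]string, error, bool) {
	return nil, errors.New("lookup " + host + ": name or service not known"), true
}

// goLookupIP stands in for the pure-Go resolver.
func goLookupIP(host string) ([]string, error) {
	return []string{"go:" + host}, nil
}

// lookupIP mirrors the structure quoted earlier in the thread.
func lookupIP(host string, cgoLookupIP cgoLookupFunc) (addrs []string, err error) {
	addrs, err, ok := cgoLookupIP(host)
	if !ok {
		addrs, err = goLookupIP(host)
	}
	return
}

func main() {
	fmt.Println(lookupIP("gcc.gnu.org", cgoStub))        // pure-Go path
	fmt.Println(lookupIP("gcc.gnu.org", cgoGetaddrinfo)) // error, no fallback
}
```

Because the non-netgo variant always reports the lookup as handled, the pure-Go resolver is never reached at run time, whatever error getaddrinfo returned.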
Comment 15 boger 2014-11-20 20:02:15 UTC
I think what Ian is saying is that the mechanism to rebuild packages in this way doesn't work with gccgo (and probably never should?)

Now I'm finally understanding this.  Originally with gc the net package is built with netgo off, but the netgo tag says to rebuild the GO standard library with the netgo tag set on and then build the program.

Yohei's original fix used the code that was built with netgo off but allowed the go resolver to be called if the call to getaddrinfo failed.  There is probably a small set of errno values that could be checked to determine if the go resolver should be called after getaddrinfo failed, but is that important?  I would expect the errno values to be consistent across platforms but don't know for sure.

To me it seems like Yohei's fix is probably OK for gccgo with some additional checks for errno if needed, but would change existing behavior for gc and because of that should not change there?
Comment 16 Ian Lance Taylor 2014-11-21 04:26:26 UTC
Gccgo and GCC act the same on glibc systems: if you choose static linking, DNS lookups only work if the dynamic libraries are available.

The only difference between Go and C here is that the Go library happens to have code that will work for some people some of the time.  I don't want the library to automatically fall back to the Go code in all cases, because in some cases it will turn a possibly-explicable failure into a completely inexplicable failure, and in a few cases it will turn a correct failure into an incorrect success.

It's OK with me to make the go tool's -a option work with gccgo.  It will only work if people have the library sources available.  If we do that, this issue will be fixed.  I don't know how hard that would be--it might be easy.

I don't know of another good way to fix this.
Comment 17 boger 2014-11-21 15:12:35 UTC
Can you clarify how using -a -tags netgo actually works?  I know it requires that the source be available, but it must mean that it rebuilds the package for the current link only, throws it away after using it and next time it has to rebuild it?

Couldn't some of your concerns about unexpected failures be resolved by providing better error information when there were failures, or by providing other ways to indicate that the go resolver shouldn't be called (like an environment variable)?  It just seems that rebuilding the package every time is a heavy hammer to resolve this problem.

I think if someone wants to have their GO program use libnss if it is present and then the go resolver if not, that should be an option and currently it is not.
Comment 18 Ian Lance Taylor 2014-11-21 17:10:30 UTC
The -a option to "go build" means to rebuild all packages rather than using the installed versions (see http://golang.org/cmd/go for documentation).  The "-tags netgo" option means to build with the build tag netgo (see the build constraints section in http://golang.org/pkg/go/build/).  So, yes, it rebuilds the packages for the current link only.  This is more reasonable with the gc compiler than with gccgo, since the gc compiler is so fast.

I'm OK in principle with coming up with some other approach to direct the Go library to use the Go DNS lookup rather than calling getaddrinfo.  I don't think that can be the default.  I don't think we want a program to unpredictably sometimes use one and sometimes use the other.  I don't think an environment variable would work well, since Yohei presumably wants the statically linked binary to work this way by default.  Unfortunately all I can think of would be adding another function to the net package directing lookups to use pure DNS; this is unfortunate because the net package already has the most complex and confusing API of all the standard Go packages.
Comment 19 boger 2014-11-25 15:35:29 UTC
(In reply to Ian Lance Taylor from comment #18)
> The -a option to "go build" means to rebuild all packages rather than using
> the installed versions (see http://golang.org/cmd/go for documentation). 
> The "-tags netgo" option means to build with the build tag netgo (see the
> build constraints section in http://golang.org/pkg/go/build/).  So, yes, it
> rebuilds the packages for the current link only.  This is more reasonable
> with the gc compiler than with gccgo, since the gc compiler is so fast.
> 

Most of the examples in the documentation show that the built packages are put into the same directories as the source.  I assume that for an official release with a binary distribution, that is not the way it works.  That's how it would have to work with gccgo.  In that case everyone must share the same copy of the source but then if build options are used that would cause packages to be rebuilt, they must go somewhere that is only used for the current build.  And I don't understand what 'go install' would mean in that case.  The 'go install' command documentation has very little information on where built packages are stored or if there are cases when 'go install' can't be used.

> I'm OK in principle with coming up with some other approach to direct the Go
> library to use the Go DNS lookup rather than calling getaddrinfo.  I don't
> think that can be the default.  I don't think we want a program to
> unpredictably sometimes use one and sometimes use the other.  I don't think
> an environment variable would work well, since Yohei presumably wants the
> statically linked binary to work this way by default.  Unfortunately all I
> can think of would be adding another function to the net package directing
> lookups to use pure DNS; this is unfortunate because the net package already
> has the most complex and confusing API of all the standard Go packages.

I think providing another function that called the pure GO resolver would be best.  Then the GO programmer can decide how to handle it if the first call failed.
Comment 20 Yohei Ueda 2014-12-03 13:03:43 UTC
I noticed a Docker issue saying GC 1.4 does not rebuild the standard library with -a.

https://github.com/docker/docker/issues/9449 

I think the problem is now not limited to GCCGO.
Comment 21 boger 2014-12-03 16:10:26 UTC
I'm confused by the description of -a in the go1.4 documentation.

I asked about this before and the answer was that each invocation of 'go build' would create a copy of the built package which was then used for the current  build but then thrown away.  But that must not be the way it works?
Comment 22 Ian Lance Taylor 2014-12-03 18:14:53 UTC
I'm not sure why you say that it must not be the way it works.  It is the way it works.

The recent change to Go 1.4 is that the -a option does not apply to the standard library.  I don't know whether that is a good idea or not.
Comment 23 boger 2014-12-03 18:38:57 UTC
If I look at this documentation:  http://tip.golang.org/doc/go1.4#gocmd

It says this:

The behavior of the go build subcommand's -a flag has been changed for non-development installations. For installations running a released distribution, the -a flag will no longer rebuild the standard library and commands, to avoid overwriting the installation's files. 

When I read this it sounds like the previous behavior with the -a option was to rebuild the packages and put the newly built packages into the installed directories, including the standard library.  If everyone who used 'go build' with -a generated their own copy of the built packages and then threw them away, how would the installation's files ever get overwritten?
Comment 24 Ian Lance Taylor 2014-12-03 19:17:41 UTC
They would not have been overwritten, unless you used "go install -a".  That line in the doc may be misleading.
Comment 25 Tatsushi Inagaki 2015-02-28 05:08:52 UTC
What is the recommended way to use the net package in a statically linked binary? My impression is that a statically linked binary should also call dlopen() (and thus we should export LD_LIBRARY_PATH), if the corresponding dynamically linked binary does so to resolve DNS.
https://sourceware.org/glibc/wiki/FAQ#Even_statically_linked_programs_need_some_shared_libraries_which_is_not_acceptable_for_me.__What_can_I_do.3F

Or, can we expect that netgo can be enabled with 'go build -a' again in Go 1.5?
https://github.com/golang/go/issues/9369
Comment 26 Ian Lance Taylor 2015-02-28 17:59:41 UTC
Tatsushi: are you asking about gccgo, or about gc?
Comment 27 Tatsushi Inagaki 2015-03-01 16:20:15 UTC
(In reply to Ian Lance Taylor from comment #26)
> Tatsushi: are you asking about gccgo, or about gc?

I'm asking about gccgo.
Comment 28 Ian Lance Taylor 2015-03-01 23:39:56 UTC
Currently there is no reasonable way to use the Go DNS resolver when using gccgo.  Any program that uses the net package will call glibc for DNS resolution, meaning that you are limited to what glibc will do, which, as you say, means calling dlopen.

go build -a does not work with gccgo.  The problem is that gccgo uses its own copy of the Go library sources and they can not be built with go build -a.  It would be nice to fix this but it is not at all a priority, since most people will use the installed libgo.
Comment 29 boger 2015-03-02 21:58:22 UTC
Yohei noted in comment 20 that this is also broken with gc in 1.4 when using static linking.  That was a while ago -- is that no longer a problem?
Comment 30 Ian Lance Taylor 2015-03-02 23:34:46 UTC
The problem mentioned in comment #20 has nothing to do with gccgo.  To get around that problem, use the -installsuffix option.  See http://golang.org/issue/9344 .  Note that the docker issue mentioned in comment #20 has been closed.
Comment 31 boger 2015-03-18 19:57:12 UTC
Here are two suggestions to solve this issue without having to use the -a and -tags netgo options to rebuild packages at build time.  Since this is a common problem, it seems best to provide a way to have the packages built and provided with the gccgo build instead of requiring users to know how to rebuild the net package with netgo.

1) Since libgo is provided for both static (libgo.a) and dynamic (libgo.so.N) linking, and the problem only happens with static linking, we could build libgo.a with netgo enabled, so that would be the expected/default behavior with static linking with gccgo.

2) As part of the gccgo and libgo build, build just the net package with netgo enabled and install it somewhere that is easily found, using the normal directory conventions for GO.  Then if someone builds a program for static linking and wants the GO DNS resolver, they could link in that package before they link in libgo.a.
Comment 32 boger 2015-03-31 15:29:40 UTC
I have a prototype working for #2.  I am assuming #1 would not be accepted.

This fix consists of building a library called libnetgo.a which consists of the net files that would be built if the netgo tag was used.  This new library was installed into the same directory as libgo.a.

Once this library has been built and installed in the correct location, I was able to get this to work by explicitly linking in this lib:

go build -gccgoflags '-static -lnetgo' lookup.go

I will attach a patch after some more testing.
Comment 33 boger 2015-03-31 18:27:54 UTC
Created attachment 35195
"Fallback" netgo solution for gccgo

This patch updates the libgo Makefile to build and install the library libnetgo.a into the same install directory as libgo.a.  When libnetgo.a is available then the user can link it into their statically linked program and get the same result as when using the netgo fallback mechanism that is available with golang as mentioned in this bugzilla.

Once libnetgo.a is built and installed, a statically linked program can link it in as follows:

go build -gccgoflags '-static -lnetgo' lookup.go
Comment 34 boger 2015-04-01 14:10:54 UTC
Created a codereview:  https://codereview.appspot.com/217620043
Comment 35 ian@gcc.gnu.org 2015-04-07 18:09:59 UTC
Author: ian
Date: Tue Apr  7 18:09:28 2015
New Revision: 221906

URL: https://gcc.gnu.org/viewcvs?rev=221906&root=gcc&view=rev
Log:
	PR go/63731
libgo: Build and install libnetgo.a

libnetgo.a provides the net package built with the netgo tag enabled.  This provides the netgo fallback solution for gccgo.  This lib must be explicitly linked in using the -gccgoflags, so is not included by default.

Modified:
    trunk/libgo/Makefile.am
    trunk/libgo/Makefile.in
Comment 36 Ian Lance Taylor 2015-04-07 18:10:48 UTC
Lynn added a new facility.  Some notes on docs:

As far as documentation, I tried to find some documentation on build tags in general and netgo specifically because it seems like this should be documented there, but did not find much.  Here are some ideas:
- add something to the 'go help build' output about the use of the netgo tag in general and how it would be used to work around the static linking  warning/problem against libnss and the workarounds for gc and gccgo
- if there is something somewhere else that describes the netgo tag and when to use it, then add something about how to achieve the same effect in gccgo
- seems like it would be good to have documentation describing the differences in using gccgo vs. gc and this would be included in that, however that is a bigger work item and would require more thought on what to include in such a document.
Comment 37 boger 2015-04-08 14:41:11 UTC
Created attachment 35260
libgo/go/go/build/doc.go documentation update

Adding comments about the use of the netgo tag and the equivalent method for use with gccgo.
Comment 38 Ian Lance Taylor 2016-02-10 21:28:56 UTC
This seems to be fixed, and the core problem has become less important now that the net package prefers to use the Go DNS library when possible.
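
As a hedged illustration of selecting the Go resolver explicitly: later Go releases expose a net.Resolver type with a PreferGo field, which did not exist when this bug was filed. The sketch below assumes such a release; the helper newGoResolver is a hypothetical name, and whether a given gccgo/libgo version provides this API is not confirmed by this thread:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that asks for Go's built-in DNS client
// on platforms where that choice is available.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	// Bound the lookup so a missing network does not hang the program.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "gcc.gnu.org")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```

This keeps the choice per resolver value in the program's source, rather than depending on how the standard library was built.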