Pointers to protected functions don't work correctly. When the address of a protected function is taken, gcc should treat the function as if it were an ordinary (default-visibility) one.
Created attachment 7985: A testcase

With the new linker, I got

[hjl@gnu-20 x86_64-3]$ make
gcc -fPIC -c -o x.o x.c
gcc -shared -o libx.so x.o
/usr/local/bin/ld: x.o: relocation R_X86_64_PC32 against `foo' can not be used when making a shared object; recompile with -fPIC
/usr/local/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
make: *** [libx.so] Error 1

With the old linker, I got

[hjl@gnu-20 x86_64-3]$ make CC="gcc -B/usr/bin/"
gcc -B/usr/bin/ -fPIC -c -o x.o x.c
gcc -B/usr/bin/ -shared -o libx.so x.o
gcc -B/usr/bin/ -o foo m.c libx.so -Wl,-rpath,.
./foo
called from main foo_p: 0x400610
called from shared foo: 0x2a9566d8d8
shared foo: 0x2a9566d8d8
shared foo: 0x2a9566d8d8
called from shared foo_p: 0x400610
shared foo: 0x2a9566d8d8
shared foo: 0x2a9566d8d8
called from main foo: 0x400610
got from main foo: 0x2a9566d8d8
Function pointer `foo' are't the same in DSO and main
Isn't this just binutils ld/584? http://sources.redhat.com/bugzilla/show_bug.cgi?id=584 Alan M. claims this is an ld bug rather than a gcc bug.
The same bug also happens on i686-pc-linux-gnu:

gcc -fPIC -c -o x.o x.c
gcc -shared -o libx.so x.o
gcc -o foo m.c libx.so -Wl,-rpath,.
./foo
called from main foo_p: 0x80483e4
called from shared foo: 0x111524
shared foo: 0x111524
shared foo: 0x111524
called from shared foo_p: 0x80483e4
shared foo: 0x111524
shared foo: 0x111524
called from main foo: 0x80483e4
got from main foo: 0x111524
Function pointer `foo' are't the same in DSO and main
They aren't the same. It is function pointer vs. function. The other looks like a linker bug.
This is really a dup of bug 10908.
Protected always binds locally, as you cannot override it, so the bug is in the linker/asm. *** This bug has been marked as a duplicate of 10908 ***
Please take a closer look at the testcase. It is different from bug 10908. Basically, the main executable and the DSO see different function pointer values for the SAME function. From the linker:

/* Will references to this symbol always reference the symbol in this
   object? STV_PROTECTED is excluded from the visibility test here so
   that function pointer comparisons work properly. Since function
   symbols not defined in an app are set to their .plt entry, it's
   necessary for shared libs to also reference the .plt even though the
   symbol is really local to the shared lib. */

On many architectures, the function pointer != the address of the function body.
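The attached testcase is not reproduced in this thread. The following is only a minimal sketch of the kind of comparison it appears to perform, folded into a single file for illustration. The helper names (address_seen_by_library, address_seen_by_main, foo_addresses_match) are hypothetical and not taken from the attachment.

```c
#include <stdio.h>

typedef void (*func_ptr)(void);

/* Library side (x.c in the report): foo is exported but protected. */
__attribute__((visibility("protected")))
void foo(void)
{
}

/* Address of foo as computed inside the defining module. */
func_ptr address_seen_by_library(void)
{
    return foo;
}

/* Executable side (m.c in the report): address of foo as a caller sees it. */
func_ptr address_seen_by_main(void)
{
    return &foo;
}

/* Returns 1 when both views of the address agree, 0 otherwise. */
int foo_addresses_match(void)
{
    func_ptr lib = address_seen_by_library();
    func_ptr exe = address_seen_by_main();

    printf("library view: %p\n", (void *)lib);
    printf("main view:    %p\n", (void *)exe);
    return lib == exe;
}
```

Built as one program, both views agree. The mismatch reported here can only show up when the library half is compiled with -fPIC into libx.so and the other half is linked into the main executable, as in the make output above.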
The difference between non-protected and protected functions in the asm is the following (the first line is the non-protected case, the second the protected one):

        movl    foo@GOT(%ebx), %eax
        leal    foo@GOTOFF(%ebx), %eax

But adding -fPIC to m.c makes this work, so again this looks like an ld bug (maybe it is keeping the symbol protected or something). Or the problem is that gcc is doing:

        cmpl    $foo, -4(%ebp)

which is not wrong in the non-PIC case.
So help me out here: which is more correct, the GOT or the GOTOFF?

(In reply to comment #7)
> Please take a closer look at the testcase. It is different from
> bug 10908. Basically, main executable and DSO see different
> function pointer values for the SAME function. From the linker

That comment is only for the PPC bfd so it cannot apply to x86 :).
Well, I think there is a wrong reloc somewhere, or a reloc being resolved wrongly, because foo binds locally in x.c. Otherwise protected visibility is really useless (except maybe to make sure that the symbol does not get overridden).
Depending on the psABI, a protected symbol has to be treated very carefully, because of copy relocations on data symbols and function pointers on function symbols. We have to check 2 things:

1. If the psABI uses copy relocation, a protected data symbol is the same as a normal symbol.
2. If the psABI doesn't support the "official function address", that is, the psABI guarantees there is one and only one function address, only branches to functions can be treated as local.
Ignore the copy relocation. There is not much a compiler can do when the psABI doesn't support protected symbols with copy relocation. See: http://sources.redhat.com/ml/binutils/2003-03/msg00413.html
I think this bug report is reporting an actual bug. At least when using ELF, when the compiler takes the address of a protected function, it has to act as though it is taking the address of an ordinary function, and rely on the dynamic linker to do the right thing. If the compiler takes the address of a protected function without using the PLT, then as HJ says function symbols can not compare equal, even though they should. This is not something the linker can fix up. The dynamic linker, however, when setting up the PLT, should observe that the symbol is protected, and call the local symbol even if the executable overrides it. In other words, we should only treat protected function symbols as special when we call them. Otherwise they should be treated as ordinary symbols. This only applies to ELF. I don't know what should be done for other object file formats, if there are any others which support protected symbols.
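The rule proposed in this comment can be sketched as a small decision function. This is an illustrative model only, assuming PIC code in an ELF shared object. The enum and function names are hypothetical and are not GCC internals, and the hidden/default cases are included just for contrast.

```c
#include <stdbool.h>

/* Hypothetical model of ELF symbol visibility for a function defined
   in the current module. */
enum visibility { VIS_DEFAULT, VIS_PROTECTED, VIS_HIDDEN };

/* How the function symbol is being referenced. */
enum reference_kind { REF_CALL, REF_ADDRESS };

/* Returns true when the reference may be bound directly to the local
   definition, false when it must be handled like an ordinary
   (preemptible) symbol and left to the dynamic linker. */
bool may_bind_locally(enum visibility vis, enum reference_kind ref)
{
    switch (vis) {
    case VIS_HIDDEN:
        /* Not exported at all, so every reference is local. */
        return true;
    case VIS_PROTECTED:
        /* Special only when called; taking the address is treated
           like an ordinary symbol so pointer comparisons still work. */
        return ref == REF_CALL;
    case VIS_DEFAULT:
    default:
        /* May be interposed by another module. */
        return false;
    }
}
```

Under this model, a call to a protected function stays a direct local branch, while `&foo` for a protected foo goes through the same path as for a default-visibility function.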
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01394.html
This is the updated patch: http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01551.html This is the testcase patch: http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01550.html
Confirming that the bug is real. I can't say I like HJ's solution though. It seems to require that ld.so resolve a protected symbol in a shared library to a symbol defined in the main app. That's weird. In other cases you don't want ld.so to do that, for instance when the main app defines a function with the same name as a protected library function. I think it might be difficult for ld.so to choose the right symbol, especially for the general case of multiple levels of shared libraries. Another problem is that making protected functions non-local prevents certain optimizations, for example see alias.c:mark_constant_function.
Please keep in mind that my proposal affects FUNCTION symbols only and my change won't change function CALL, which will still be local. It only changes the function pointer. BTW, I believe ld.so in the current glibc is OK. It is kind of tricky. I think I covered everything for FUNCTION symbols. If you believe ld.so is wrong in some cases, please send me a testcase. I will fix it.
I posted an updated patch http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00196.html I hope it will work better.
FWIW, the reason this leaves a bad taste in my mouth is that I strongly believe symbol visibility should be consistent between ELF platforms. There's at least one ELF platform where resolving a function pointer to a PLT entry is an absolute no-show (MIPS binding stubs).
Each psABI defines how function addresses work. Not all psABIs treat function addresses the same way. A function address may mean different things in different psABIs. You can't even compare function addresses between the x86 psABI and the mips psABI. Where does the consistency come from?
(In reply to comment #18)
> I posted an updated patch
>
> http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00196.html
>
> I hope it will work better.

Sorry to bother, but where is the updated patch? That link leads to something else.
Is there any update on this bug? According to http://sourceware.org/ml/binutils/2005-01/msg00401.html, a protected function symbol cannot be used in a R_386_GOTOFF. I don't claim to understand the full implications of the issue, but it seems that the ld decision means gcc must not emit that relocation.
I've changed my opinion on this matter. I think GCC is generating the proper code (most efficient). It's ld that should accept this decision.
*** Bug 51880 has been marked as a duplicate of this bug. ***
LD bug: http://sourceware.org/bugzilla/show_bug.cgi?id=13600

The GCC side is a QOI thing and maybe a conformance thing. ICC generates for

__attribute__((visibility("protected"))) void * foo (void) { return (void *)foo; }

        .protected foo
        .globl foo
foo:
..B1.1:                         # Preds ..B1.0
..___tag_value_foo.1:                                           #1.60
        movq      foo@GOTPCREL(%rip), %rax                      #1.77

thus does not resolve the function address to the local symbol, which GCC does and which confuses LD (thus the linker bug):

        .globl  foo
        .protected      foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        leaq    foo(%rip), %rax

I think ICC this way avoids the function pointer comparison issues with symbols with protected visibility (can someone double-check? HJ's testcase doesn't compile for me).
ld *can* link, it just chooses not to.

$ cat > foo.c
__attribute__((visibility("protected"))) void * foo (void) { return (void *)foo; }
$ gcc -fPIC -shared foo.c
/usr/bin/ld: /tmp/cclrufLV.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
$ gcc -Wl,-Bsymbolic-functions -fPIC -shared foo.c && echo success
success
$ cat > empty.dynlist
{ "__this_symbol_isnt_present__"; };
$ gcc -Wl,--dynamic-list,empty.dynlist -fPIC -shared foo.c && echo success
success

I also cannot confirm that icc does anything different:

$ icc -fPIC -shared foo.c
ld: /tmp/iccf15gTK.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
ld: final link failed: Bad value
$ icc -O3 -S -o /dev/stdout -fPIC -shared foo.c | grep -A4 foo:
foo:
..B1.1:                         # Preds ..B1.0
..___tag_value_foo.1:                                           #2.19
        lea       foo(%rip), %rax                               #2.36
        ret                                                     #2.36

What's more, if you actually do compile the following program into a shared library, it succeeds:

$ cat > foo.S
        .text
        .globl foo
        .protected foo
        .type foo, @function
foo:
        movq    foo@GOTPCREL(%rip), %rax
        ret
$ gcc -shared foo.S && echo success
success

But the resulting shared object has the following (extracted from eu-readelf):

Relocation section [ 5] '.rela.dyn' for section [ 0] '' at offset 0x230 contains 1 entry:
  Offset              Type            Value               Addend Name
  0x0000000000200330  X86_64_GLOB_DAT 0x0000000000000248      +0 foo

    2: 0000000000000248      0 FUNC    GLOBAL PROTECTED    6 foo

Now we introduce a third component to this discussion: the dynamic linker. What will it do?

This has become a decision, not a bug: what should the compiler do when taking the address of a function when said function is under protected visibility? Both solutions are technically correct and would load the same function address under the correct circumstances.
The compiler is also taking the "protected" visibility to the letter (at least, according to its own definition of it):

    "protected"
        Protected visibility is like default visibility except that it indicates that references within the defining module will bind to the definition in that module. That is, the declared entity cannot be overridden by another module.

Since the symbol was marked as "protected" in the symbol table, it's expected that the linker and dynamic linker will bind it locally. That being the case, the compiler can optimise for that fact. It can calculate what value would be placed in the GOT entry and load that instead. That's the LEA instruction.

The linker, however, mandates that the address of the symbol should not be loaded directly, but only through the GOT. This is necessary because the psABI requires that the function address resolve to the PLT entry found in the position-dependent executable. If the executable takes the address of this global (but protected) symbol, it will hardcode the address to its own address space, forcing other ELF modules to follow suit.

Finally, what does the dynamic linker do when an "entity (that) cannot be overridden by another module" is overridden by another module? The glibc 2.14 loader will resolve the GOT entry's relocation to the executable's PLT stub, even if the symbol in question has protected visibility. Other loaders might work differently.

As it stands, the psABI requires that the address of a protected function be loaded through the GOT, even though the compiler thinks it knows what the address will be.

However, I really wish the compiler would *not* change its behaviour for PIC code, but instead change its behaviour for ELF position-dependent executables. I am asking for a change in the psABI and requesting that the loading of function addresses for "default" visibility symbols (not protected!) be done via the GOT.
In other words, I'm asking that we optimise for shared libraries, not for executables.

Versions:
GCC: 4.6.0
ld: 2.21.51.0.6-6.fc15 20110118
ICC: 12.1.0 20111011
(In reply to comment #26)
> The linker, however, mandates that the address to symbol should not be loaded
> directly, but only through the GOT. This is necessary because the psABI
> requires that the function address resolve to the PLT entry found in the
> position-dependent executable.

Why on earth does it do that? If we have to go through the GOT, it can just as well contain the function's address and not that of the PLT entry?
Final conclusion: We need to resolve to the executable's PLT consistently, even from inside the shared object where the function binds locally. This is because of references to the function from the executable's .rodata section, which we can't relocate (and which thus have to point to the executable's PLT entry).

Thus, this is a GCC target bug.

__attribute__((visibility("protected"))) void * foo () { return foo; }

needs to return the address of foo via a load from the GOT. HJ's patch isn't correct, as this is really a target ABI choice (another ABI may choose to resolve all references to the function's start address, at the cost of having to put the constants into a .rel.rodata section).
(In reply to comment #28)
> Final conclusion: We need to resolve to the executables PLT consistently,
> even from inside the shared object where the function binds locally. This
> is because of references to the function from the executables .rodata section
> which we can't relocate (and thus have to point to the executables PLT entry).
>
> Thus, this is a GCC target bug.
>
> __attribute__((visibility("protected"))) void * foo () { return foo; }
>
> needs to return the address of foo via a load from the GOT. HJs patch
> isn't correct as this is really a target ABI choice (another ABI may
> choose to resolve all references to the functions start address with

It only applies when we take the address of a protected function. A branch to a protected function doesn't need to go through the PLT.
This does solve the problem. It's just unfortunate that it does so by creating more work for the library even if no executable ever takes the address of this protected function. It would have been preferable to somehow tell the compiler when compiling an executable that this function it's taking the address of is protected elsewhere, so it should use the GOT too.
I think part of the difficulty of this issue is that the behavior of protected is not well-specified. Is it intended to prevent the definition from interposition? Or is it promising the compiler/toolchain that you won't override the definition (and acquiescing that the behavior will be undefined if you break this promise)? If protected's intent is the former, then it's absolutely wrong to resolve the function's address to the main executable's PLT entry for a different function by the same name. To avoid this, the GOT entry for the function in the shared library must point to the PLT entry in the main program if and only if the main program's symbol got resolved to the library's version of the function; otherwise, it must point to the library's version. I don't see an easy way to arrange this without special help from the dynamic linker, and personally, I think it's a slippery slope to try to make promises that are this difficult to keep. As such I'd prefer that protected's behavior be the latter: an optimization hint to the compiler in the form of a promise not to override the definition. In any case, I'm experiencing this bug in the form of not being able to take the address of any external functions when using -fvisibility=protected, and it's making it impossible to use -fvisibility=protected. I get bogus linker errors about not being able to use a protected function for R_386_GOTOFF relocations. So I want to see this solved in one way or another, preferably in the way that results in maximal performance and minimal bloat while ensuring correct behavior as long as the functions are not overridden...
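The "if and only if" condition described in this comment can be modelled roughly as below. This is a sketch of the stated rule, not of any real dynamic linker. The struct, its fields, and protected_got_value are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical inputs a loader would need for one protected function. */
struct fn_resolution {
    /* Address of the function body inside the shared library. */
    const void *library_definition;
    /* Executable's PLT entry for the same name, or NULL if none. */
    const void *exe_plt_entry;
    /* Nonzero if the executable's reference by that name was resolved
       to the library's definition (and not to some other function
       with the same name). */
    int exe_symbol_resolved_to_library;
};

/* Value to store in the library's GOT entry under the
   "protected prevents interposition" reading: use the executable's
   PLT entry only when it really stands for the library's function,
   otherwise use the library's own definition. */
const void *protected_got_value(const struct fn_resolution *r)
{
    if (r->exe_plt_entry != NULL && r->exe_symbol_resolved_to_library)
        return r->exe_plt_entry;
    return r->library_definition;
}
```

The second field pair is the part that needs special help from the dynamic linker: it has to know which definition the executable's symbol ended up bound to before it can fill in the library's GOT entry.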
Protected data symbol with copy relocation doesn't work either.
*** Bug 83110 has been marked as a duplicate of this bug. ***