Bug 98112

Summary: Add -f[no-]direct-access-external-data & drop HAVE_LD_PIE_COPYRELOC
Product: gcc Reporter: Fangrui Song <i>
Component: target Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dimhen, fabian, foom, hjl.tools, hp, ndesaulniers, segher, thiago
Priority: P3    
Version: 11.0   
Target Milestone: 12.0   
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=26815
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37611
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520
Host: Target: x86_64-*-* i?86-*-*
Build: Known to work:
Known to fail: Last reconfirmed:

Description Fangrui Song 2020-12-03 01:26:39 UTC
After "x86-64: Optimize access to globals in PIE with copy reloc", GCC x86-64 asks the assembler to produce an R_X86_64_PC32 for an external data access.

* It introduced a configure-time variable HAVE_LD_PIE_COPYRELOC which has a misleading name: PC32 does not necessarily cause a copy relocation.
  If the external data
* It affects users who want to configure GCC not to emit R_X86_64_PC32 for an external data access so that copy relocations can be avoided if the data turns out to be defined in a different shared object/executable
* While it made sense (in terms of performance) before H.J. Lu added GOTPCRELX to x86-64, it hardly matters, if at all, nowadays.
* This optimization can actually benefit non-x86-64. An option is more suitable.

In Clang, the GCC style HAVE_LD_PIE_COPYRELOC is implemented as -mpie-copy-relocations, which has a misleading name.
I agree that this should be implemented as an option, instead of a configure-time variable.

I suggest that we add a new architecture-independent option -f[no-]direct-access-external-data (I am happy to add a similar one in Clang once consensus is reached) and delete HAVE_LD_PIE_COPYRELOC. The option controls whether a direct access (PC-relative relocation) can be generated for an external data access.
The value can default to true for -fno-pic code (it seems that most architectures behave this way).
For non-x86-64, the value defaults to false for -fpie/-fpic code (I believe most architectures use a GOT).
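
The proposed defaults can be written down as a small decision function. This is only a sketch of the rules stated above, not GCC code: the enum, the function names and the three-state `explicit_value` parameter are all made up for illustration, and the x86-64 -fpic case (GOT) is my understanding of current behavior rather than something spelled out in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they model the defaults described in this report,
   not GCC internals. */
enum pic_mode { MODE_NO_PIC, MODE_PIE, MODE_PIC };

/* Returns true when an undefined data symbol would be accessed directly
   (absolute or PC-relative relocation) by default, false when via the GOT. */
bool default_direct_access(enum pic_mode mode, bool is_x86_64)
{
    assert(mode == MODE_NO_PIC || mode == MODE_PIE || mode == MODE_PIC);
    if (mode == MODE_NO_PIC)
        return true;   /* -fno-pic: direct access on most architectures */
    if (mode == MODE_PIE && is_x86_64)
        return true;   /* current x86-64 -fpie behavior (R_X86_64_PC32) */
    return false;      /* -fpie/-fpic elsewhere, and x86-64 -fpic: GOT */
}

/* An explicit option overrides the default.
   explicit_value: -1 = option not given,
                    0 = -fno-direct-access-external-data,
                    1 = -fdirect-access-external-data. */
bool use_direct_access(enum pic_mode mode, bool is_x86_64, int explicit_value)
{
    if (explicit_value >= 0)
        return explicit_value != 0;
    return default_direct_access(mode, is_x86_64);
}
```

For example, `use_direct_access(MODE_PIE, true, 0)` models an x86-64 -fpie build with -fno-direct-access-external-data and yields false (GOT access).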

In the future, for x86-64, please consider defaulting to -fno-direct-access-external-data for -fpie/-fpic so that issues related to STV_PROTECTED data can be properly fixed (see my analysis last year https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html )
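
To make the STV_PROTECTED problem concrete, here is a toy simulation in plain C of what happens when an executable gets a copy relocation for a variable that the shared object binds locally. Nothing here is real loader code; every name is hypothetical and the "GOT slot" is just a pointer variable.

```c
#include <assert.h>
#include <stddef.h>

/* All names are hypothetical; this simulates the loader in plain C. */
static int dso_var = 1;          /* the definition inside the shared object */
static int exe_copy;             /* space reserved in the executable's .bss */
static int *got_slot = &dso_var; /* GOT entry used to reach the variable */

/* What a copy relocation does at startup: copy the initial value into the
   executable and make the executable's copy the canonical one. */
void apply_copy_relocation(void)
{
    exe_copy = dso_var;
    got_slot = &exe_copy;
}

/* The executable was compiled with direct access, so it touches its copy. */
void exe_store(int value)
{
    exe_copy = value;
}

/* Shared object code that goes through the GOT follows the redirection. */
int dso_load_via_got(void)
{
    assert(got_slot != NULL);
    return *got_slot;
}

/* Shared object code that binds a protected symbol locally does not. */
int dso_load_direct(void)
{
    return dso_var;
}
```

After `apply_copy_relocation()` and `exe_store(5)`, the GOT-based reader sees 5 while the direct reader still sees 1. That divergence is the kind of issue that goes away if the executable does not use direct access for external data.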
Comment 1 H.J. Lu 2020-12-03 12:16:38 UTC
We need comprehensive ABI changes to deal with copy relocation and
STV_PROTECTED, which will impact GCC, glibc and binutils:

https://sourceware.org/bugzilla/show_bug.cgi?id=26815
Comment 2 Fangrui Song 2020-12-03 17:51:41 UTC
Note: -fdirect-access-external-data is architecture-independent. For example, currently Clang on aarch64 can perform the following optimization:

// clang -target aarch64 -fPIE -O3
  adrp    x8, :got:var
  ldr     x8, [x8, :got_lo12:var]
  ldr     w0, [x8]
  ret
// clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
  adrp    x8, var
  ldr     w0, [x8, :lo12:var]
  ret

A better name for -mpie-copy-relocations is -fdirect-access-external-data:

  1. the option can affect -fno-pic and -fpic
  2. for -no-pie and -pie links, there is not necessarily a copy relocation
  (-fpic can use this option as well, but keep in mind that DSOs do not support copy relocations. So if such code is used for -shared links and the data turns out to be undefined, the linker will reject the object file)

---

The second thing about the feature request is that x86-64 should default to -fno-direct-access-external-data for -fpie to address the protected symbol issues.
(-fno-direct-access-external-data for -fpie is the behavior on most architectures.)

  (1): PC32 referencing a protected function is unnecessarily rejected in a -shared link (this also affects aarch64)
  // gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_X86_64_PC32 against protected symbol `f' can not be used when making a shared object
  // aarch64-linux-gnu-gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `f' which may bind externally can not be used when making a shared object; recompile with -fPIC
  // gold is good

  void f() {}
  void *g() { return &f; }

This can be fixed by making GNU ld more permissive.

  (2) protected data access can use slightly more efficient PC32. Currently it uses the slightly pessimized REX_GOTPCRELX.
  int a __attribute__((visibility("protected")));
  int f() { return a; }
Comment 3 Fangrui Song 2020-12-15 00:25:07 UTC
Are you happy with the option name -f[no-]direct-access-external-data ?
https://reviews.llvm.org/D92633 is what I want to add to Clang.

I want GCC and Clang to use the same option names...
Comment 4 Segher Boessenkool 2020-12-28 07:34:46 UTC
(In reply to Fangrui Song from comment #3)
> Are you happy with the option name -f[no-]direct-access-external-data ?

Not at all, no :-(

The name does not explain its purpose at all, and the whole concept only
makes sense for a fraction of all targets.  A -mcopy-relocs ("generate copy
relocations if that is a good idea"), defined *per target*, would be a lot
better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
or something like that.  You want to have this a generic option, while it is
not clear at all what it would mean, what it would *do*, which is especially
important if you want this to be an option used by multiple compilers: if it
is not clear to every user what simple, sensible thing a flag is the knob
for, that flag simply cannot be used at all -- or worse, some users *will*
use it, but then their intentions are not clear to humans, and different
compilers can (and will!) think the user wanted something else!
Comment 5 Fangrui Song 2020-12-28 08:36:14 UTC
(In reply to Segher Boessenkool from comment #4)
> (In reply to Fangrui Song from comment #3)
> > Are you happy with the option name -f[no-]direct-access-external-data ?
> 
> Not at all, no :-(
> 
> The name does not explain its purpose at all, and the whole concept only
> makes sense for a fraction of all targets.

> A -mcopy-relocs ("generate copy
> relocations if that is a good idea"), defined *per target*, would be a lot
> better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
> or something like that.

Please read my first comment why copy relocs is a bad name. The compiler behavior is whether the external data symbol is accessed directly/indirectly. Copy relocs is just the inferred ELF linker behavior (in -no-pie/-pie link mode) when the symbol is external. The option name should mention the direct behavior, instead of the inferred behavior at the linking stage.

-fdirect-access-external-data makes sense on other binary formats, though I won't ask GCC to
implement relevant behaviors for other binary formats.

* For example, on COFF, the behavior is like always -fdirect-access-external-data.  __declspec(dllimport) is needed to use indirect access.
* On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.

If you don't want to think of non-ELF, feel free to make the option specific to ELF.
Also feel free to make it specific to -fno-pic/-fpie (disallowed for -fpic).
I have no plan to implement Clang -fdirect-access-external-data for -fpic as well.

> You want to have this a generic option, while it is
> not clear at all what it would mean, what it would *do*, which is especially
> important if you want this to be an option used by multiple compilers: if it
> is not clear to every user what simple, sensible thing a flag is the knob
> for, that flag simply cannot be used at all -- or worse, some users *will*
> use it, but then their intentions are not clear to humans, and different
> compilers can (and will!) think the user wanted something else!

To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC and I made the proposal to (1) let non-x86-64 leverage the missing optimization for -pie (2) eventually fix the x86-64 STV_PROTECTED story.
I have considered all the potential simplification of internal representations for Clang this option will enable.
(llvm/lib/Target/TargetMachine.cpp shouldAssumeDSOLocal can be further simplified with this option)
Comment 6 Segher Boessenkool 2020-12-28 09:54:53 UTC
(In reply to Fangrui Song from comment #5)
> Please read my first comment why copy relocs is a bad name.

Since I reply to some of that (namely, your argument 1)), you could assume I
have read your comment already ;-)

> The compiler
> behavior is whether the external data symbol is accessed
> directly/indirectly.

Not really, no.  It isn't clear at all what "directly" even means!

> Copy relocs is just the inferred ELF linker behavior
> (in -no-pie/-pie link mode) when the symbol is external. The option name
> should mention the direct behavior, instead of the inferred behavior at the
> linking stage.

Yes.  But your proposed solution just makes this worse :-(

> -fdirect-access-external-data makes sense on other binary formats, though I
> won't ask GCC to
> implement relevant behaviors for other binary formats.

But what does that *mean*?  "direct access"?  (And, "external data", for that
matter!  This isn't as obvious as it was thirty years ago.)

> * For example, on COFF, the behavior is like always
> -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> indirect access.

I don't know what "declspec" is.  Something something mswindows?

> * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> (only available on arm) and the opposite for -fpic.

So what you want is that object that are globally visible will be implemented
as-is?  For if you do not do whole-program optimisation, for example?  So that
a) those objects will actually *exist*, and b) they will be laid out in the way
the program expects?

> If you don't want to think of non-ELF, feel free to make the option specific
> to ELF.

The problem is not that I don't want to think about it, but that the way it
seems to be defined only applies to ELF (and to some specific (sub-)targets
using ELF, even).

> > You want to have this a generic option, while it is
> > not clear at all what it would mean, what it would *do*, which is especially
> > important if you want this to be an option used by multiple compilers: if it
> > is not clear to every user what simple, sensible thing a flag is the knob
> > for, that flag simply cannot be used at all -- or worse, some users *will*
> > use it, but then their intentions are not clear to humans, and different
> > compilers can (and will!) think the user wanted something else!
> 
> To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC

Huh?  That isn't a user-visible thing at all, it's an implementation detail.
It is a quite straight-forward autoxxxx thing, defined to true if the loader
passes some specific test.

- o - o -

So, what you want is to attach the attribute ((used)) variable attribute to all
data (or at least the data not explicitly made static) automatically?
Comment 7 Fangrui Song 2020-12-28 17:43:19 UTC
(In reply to Segher Boessenkool from comment #6)
> (In reply to Fangrui Song from comment #5)
> > Please read my first comment why copy relocs is a bad name.
> 
> Since I reply to some of that (namely, your argument 1)), you could assume I
> have read your comment already ;-)
> 
> > The compiler
> > behavior is whether the external data symbol is accessed
> > directly/indirectly.
> 
> Not really, no.  It isn't clear at all what "directly" even means!

> > Copy relocs is just the inferred ELF linker behavior
> > (in -no-pie/-pie link mode) when the symbol is external. The option name
> > should mention the direct behavior, instead of the inferred behavior at the
> > linking stage.
> 
> Yes.  But your proposed solution just makes this worse :-(

I try to use one term to describe absolute/PC-relative relocation types (e.g. R_X86_64_64, R_X86_64_PC32)...
"Indirect" means GOT-generating relocation types and (PowerPC64) TOC-generating relocation types.

"direct/indirect" are more descriptive and more accurate than "copy relocs" (which is not the case if the symbol turns out to be defined locally; this term does not apply to other binary formats).
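
As a rough illustration of the terminology, the sketch below sorts a handful of x86-64 relocation types into "direct" and "indirect". The numeric values are the ones from the x86-64 psABI; the `EX_` prefix, the enum and the function are hypothetical and only cover the few types mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of a few x86-64 psABI relocation numbers, prefixed EX_ so
   they do not clash with <elf.h>. */
enum ex_reloc {
    EX_R_X86_64_64 = 1,             /* absolute 64-bit address */
    EX_R_X86_64_PC32 = 2,           /* 32-bit PC-relative */
    EX_R_X86_64_GOTPCREL = 9,       /* PC-relative offset to a GOT entry */
    EX_R_X86_64_GOTPCRELX = 41,     /* relaxable GOTPCREL */
    EX_R_X86_64_REX_GOTPCRELX = 42  /* relaxable GOTPCREL, REX-prefixed insn */
};

/* "Direct" = absolute or PC-relative reference to the symbol itself.
   "Indirect" = GOT-generating reference. */
bool ex_is_direct(enum ex_reloc r)
{
    switch (r) {
    case EX_R_X86_64_64:
    case EX_R_X86_64_PC32:
        return true;
    case EX_R_X86_64_GOTPCREL:
    case EX_R_X86_64_GOTPCRELX:
    case EX_R_X86_64_REX_GOTPCRELX:
        return false;
    default:
        assert(0 && "relocation type not covered by this sketch");
        return false;
    }
}
```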

> > -fdirect-access-external-data makes sense on other binary formats, though I
> > won't ask GCC to
> > implement relevant behaviors for other binary formats.
> 
> But what does that *mean*?  "direct access"?  (And, "external data", for that
> matter!  This isn't as obvious as it was thirty years ago.)

In PowerPC64 ELF v2, the term "GOT-indirect addressing" is used.
In the x86-64 psABI, there is a section "Indirect Call via the GOT Slot".
Indirect calls/jumps are pretty common - so it is understood that GOT relocation types generally mean "indirect".

"external data" is the best term I find for things like `extern int var;`
It means the data symbol is undefined in the current translation unit but may be defined
in another translation unit or another linked unit.

> > * For example, on COFF, the behavior is like always
> > -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> > indirect access.
> 
> I don't know what "declspec" is.  Something something mswindows?

Yes. `extern int var; int foo() { return var; }` compiles to `movl var(%rip), %eax` (a "direct access" (PC-relative) relocation type).
Its behavior is like always -fdirect-access-external-data. __declspec(dllimport) annotation can override the command line option.

> > * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> > (only available on arm) and the opposite for -fpic.
> 
> So what you want is that object that are globally visible will be implemented
> as-is?  For if you do not do whole-program optimisation, for example?  So
> that
> a) those objects will actually *exist*, and b) they will be laid out in the
> way
> the program expects?

Undefined global objects and address-taken functions in the current translation unit are affected.
A function whose address is taken behaves very much like a data symbol:

```
// gcc -fno-pic generates an absolute relocation type. If foo is defined in a DSO,
// it will require a "canonical PLT entry" (st_shndx=0, st_value!=0) - a hack agreed by the linker and ld.so
extern void foo();
void *addr() { return foo; }
```

The default ELF behavior on most architectures is: -fno-pic uses an absolute
relocation type, while -fpie uses a GOT-generating relocation type on
non-x86-64 targets and a PC-relative relocation type on x86-64.

If -fno-direct-access-external-data is specified, -fno-pic/-fpie will use GOT-generating relocation types
to prevent
* copy relocations if the symbol turns out to be undefined in the module.
* canonical PLT entry for an address-taken function.
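
The two cases above can be summarized as a lookup. This is a simplified model of an executable link (-no-pie/-pie) only, with hypothetical names throughout; in particular, a GOT entry for a locally defined symbol may be relaxed away by the linker, which the sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a simplified model of -no-pie/-pie links. */
enum sym_kind { SYM_DATA, SYM_FUNC_ADDRESS };

enum link_effect {
    EFFECT_NONE,             /* direct reference resolved within the module */
    EFFECT_GOT_ENTRY,        /* address loaded through the GOT */
    EFFECT_COPY_RELOCATION,  /* data copied into the executable */
    EFFECT_CANONICAL_PLT     /* PLT entry becomes the function's address */
};

/* What the linker has to do for one undefined-in-this-TU symbol reference. */
enum link_effect link_effect_for(enum sym_kind kind, bool direct_access,
                                 bool defined_in_module)
{
    assert(kind == SYM_DATA || kind == SYM_FUNC_ADDRESS);
    if (!direct_access)
        return EFFECT_GOT_ENTRY;   /* -fno-direct-access-external-data */
    if (defined_in_module)
        return EFFECT_NONE;        /* the direct access just works */
    return kind == SYM_DATA ? EFFECT_COPY_RELOCATION
                            : EFFECT_CANONICAL_PLT;
}
```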

The proposed option is local to a translation unit (like most options).
However, if this information is recorded in LTO IR files, the optimizer can
assume the variable can be referenced via a direct relocation type in the
combined IR file.

> > If you don't want to think of non-ELF, feel free to make the option specific
> > to ELF.
> 
> The problem is not that I don't want to think about it, but that the way it
> seems to be defined only applies to ELF (and to some specific (sub-)targets
> using ELF, even).

As I mentioned earlier, this applies to other binary formats.  I'll just show
you evidence by pointing you directly to the code ;-)

In LLVM, generally speaking, a dso_local undefined global object is accessed
directly while a non-dso_local undefined global object is accessed via GOT
indirection.

In Clang, dso_local annotation is added in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenModule.cpp#L913-L988
(The internal abstraction is currently a bit unfortunate. LLVM IR has another set of rules (many are duplicated) https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/TargetMachine.cpp#L94-L178 I intend to eventually clean up the LLVM IR side rules)
(Attributes generally supersede the proposed command line option.)

The few `return true;` places can be refined to check -f[no-]direct-access-external-data.

Two options are similar to -f[no-]direct-access-external-data.

* -fno-plt: it only applies to external function calls (not taking address)
* -fno-semantic-interposition: it only applies to defined function/variable symbols
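
One way to see how the three options divide the space is to map each kind of symbol reference to the option that governs it. The enums and the function below are hypothetical and only encode the scopes stated in this comment.

```c
#include <assert.h>

/* Hypothetical classification of symbol references in one translation unit. */
enum ref_kind {
    REF_CALL_UNDEFINED_FUNC,  /* foo() where foo is not defined here */
    REF_ADDR_UNDEFINED_FUNC,  /* &foo where foo is not defined here */
    REF_UNDEFINED_DATA,       /* extern int var; ... var ... */
    REF_DEFINED_SYMBOL        /* function or variable defined here */
};

enum governing_option {
    OPT_PLT,                         /* -f[no-]plt */
    OPT_SEMANTIC_INTERPOSITION,      /* -f[no-]semantic-interposition */
    OPT_DIRECT_ACCESS_EXTERNAL_DATA  /* -f[no-]direct-access-external-data */
};

enum governing_option option_for(enum ref_kind kind)
{
    switch (kind) {
    case REF_CALL_UNDEFINED_FUNC:
        return OPT_PLT;
    case REF_DEFINED_SYMBOL:
        return OPT_SEMANTIC_INTERPOSITION;
    case REF_ADDR_UNDEFINED_FUNC:
    case REF_UNDEFINED_DATA:
        return OPT_DIRECT_ACCESS_EXTERNAL_DATA;
    default:
        assert(0 && "unknown reference kind");
        return OPT_DIRECT_ACCESS_EXTERNAL_DATA;
    }
}
```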

I've thought about -f[no-]semantic-interposition-external-data, but I don't find it more suitable than
-f[no-]direct-access-external-data.

> > > You want to have this a generic option, while it is
> > > not clear at all what it would mean, what it would *do*, which is especially
> > > important if you want this to be an option used by multiple compilers: if it
> > > is not clear to every user what simple, sensible thing a flag is the knob
> > > for, that flag simply cannot be used at all -- or worse, some users *will*
> > > use it, but then their intentions are not clear to humans, and different
> > > compilers can (and will!) think the user wanted something else!
> > 
> > To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC
> 
> Huh?  That isn't a user-visible thing at all, it's an implementation detail.
> It is a quite straight-forward autoxxxx thing, defined to true if the loader
> passes some specific test.
> 
> - o - o -
> 
> So, what you want is to attach the attribute ((used)) variable attribute to
> all
> data (or at least the data not explicitly made static) automatically?

No. The option is very different from __attribute__((used)).
Comment 8 James Y Knight 2023-01-02 19:54:50 UTC
The requested semantics were subsequently implemented by GCC as `-mno-direct-extern-access` in PR100593, is that right? (Except that it was done only for x86-64, rather than being arch-independent.)

So maybe this PR should be closed?
Comment 9 Thiago Macieira 2023-01-04 18:50:18 UTC
I can't be certain for other architectures' performance, but my feeling is that indeed they would benefit from this. The option that was added as an -m should be an -f (and match Clang's option).

However, maintainers of other architectures need to step up to help with this.

Aside from that, yes, this task can be closed as it's implemented.
Comment 10 Andrew Pinski 2023-01-04 18:54:05 UTC
(In reply to Thiago Macieira from comment #9)
> I can't be certain for other architectures' performance, but my feeling is
> that indeed they would benefit from this. The option that was added as an -m
> should be an -f (and match Clang's option).
NO. This is a target-specific option changing target-specific ELF semantics.

Fixed for GCC 12.