Bug 98112 - Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 11.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-03 01:26 UTC by Fangrui Song
Modified: 2021-01-07 01:27 UTC
CC List: 6 users

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Fangrui Song 2020-12-03 01:26:39 UTC
After "x86-64: Optimize access to globals in PIE with copy reloc", GCC x86-64 asks the assembler to produce an R_X86_64_PC32 for an external data access.

* It introduced a configure-time variable HAVE_LD_PIE_COPYRELOC which has a misleading name: PC32 does not necessarily cause a copy relocation.
  If the external data turns out to be defined in the executable itself, no copy relocation is needed.
* It affects users who want to configure GCC not to emit R_X86_64_PC32 for an external data access, so that copy relocations can be avoided if the data turns out to be defined in a different shared object/executable.
* While it made sense (in terms of performance) before H.J. Lu added GOTPCRELX to x86-64, it hardly matters, if at all, nowadays.
* This optimization can actually benefit non-x86-64. An option is more suitable.

In Clang, the GCC style HAVE_LD_PIE_COPYRELOC is implemented as -mpie-copy-relocations, which has a misleading name.
I agree that this should be implemented as an option, instead of a configure-time variable.

I suggest that we add a new architecture-independent option -f[no-]direct-access-external-data (I am happy to add a similar one in Clang once consensus is reached) and delete HAVE_LD_PIE_COPYRELOC. The option controls whether a direct access (PC-relative relocation) can be generated for an external data access.
The value can default to true for -fno-pic code (it seems that most architectures behave this way).
For non-x86-64, the value defaults to false for -fpie/-fpic code (I believe most architectures use a GOT).

In the future, for x86-64, please consider defaulting to -fno-direct-access-external-data for -fpie/-fpic so that issues related to STV_PROTECTED data can be properly fixed (see my analysis last year https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html )
Comment 1 H.J. Lu 2020-12-03 12:16:38 UTC
We need comprehensive ABI changes to deal with copy relocation and
STV_PROTECTED, which will impact GCC, glibc and binutils:

https://sourceware.org/bugzilla/show_bug.cgi?id=26815
Comment 2 Fangrui Song 2020-12-03 17:51:41 UTC
Note: -fdirect-access-external-data is architecture-independent. For example, currently Clang on aarch64 can perform the following optimization:

// clang -target aarch64 -fPIE -O3
  adrp    x8, :got:var
  ldr     x8, [x8, :got_lo12:var]
  ldr     w0, [x8]
  ret
// clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
  adrp    x8, var
  ldr     w0, [x8, :lo12:var]
  ret

A better name for -mpie-copy-relocations is -fdirect-access-external-data:

  1. the option can affect -fno-pic and -fpic
  2. for -no-pie and -pie links, there is not necessarily a copy relocation
  (-fpic can use this option as well, but keep in mind that DSOs do not support copy relocations. So if such code is used for -shared links and the data turns out to be undefined, the linker will reject the object file)

---

The second thing about the feature request is that x86-64 should default to -fno-direct-access-external-data for -fpie to address the protected symbol issues.
(-fno-direct-access-external-data for -fpie is the behavior on most architectures.)

  (1): PC32 referencing a protected function is unnecessarily rejected in a -shared link (this also affects aarch64)
  // gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_X86_64_PC32 against protected symbol `f' can not be used when making a shared object
  // aarch64-linux-gnu-gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `f' which may bind externally can not be used when making a shared object; recompile with -fPIC
  // gold is good

  void f() {}
  void *g() { return &f; }

This can be fixed by making GNU ld more permissive.

  (2) protected data access can use slightly more efficient PC32. Currently it uses the slightly pessimized REX_GOTPCRELX.
  int a __attribute__((visibility("protected")));
  int f() { return a; }
Comment 3 Fangrui Song 2020-12-15 00:25:07 UTC
Are you happy with the option name -f[no-]direct-access-external-data ?
https://reviews.llvm.org/D92633 is what I want to add to Clang.

I want GCC and Clang to use the same option names...
Comment 4 Segher Boessenkool 2020-12-28 07:34:46 UTC
(In reply to Fangrui Song from comment #3)
> Are you happy with the option name -f[no-]direct-access-external-data ?

Not at all, no :-(

The name does not explain its purpose at all, and the whole concept only
makes sense for a fraction of all targets.  A -mcopy-relocs ("generate copy
relocations if that is a good idea"), defined *per target*, would be a lot
better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
or something like that.  You want to have this a generic option, while it is
not clear at all what it would mean, what it would *do*, which is especially
important if you want this to be an option used by multiple compilers: if it
is not clear to every user what simple, sensible thing a flag is the knob
for, that flag simply cannot be used at all -- or worse, some users *will*
use it, but then their intentions are not clear to humans, and different
compilers can (and will!) think the user wanted something else!
Comment 5 Fangrui Song 2020-12-28 08:36:14 UTC
(In reply to Segher Boessenkool from comment #4)
> (In reply to Fangrui Song from comment #3)
> > Are you happy with the option name -f[no-]direct-access-external-data ?
> 
> Not at all, no :-(
> 
> The name does not explain its purpose at all, and the whole concept only
> makes sense for a fraction of all targets.

> A -mcopy-relocs ("generate copy
> relocations if that is a good idea"), defined *per target*, would be a lot
> better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
> or something like that.

Please read my first comment why copy relocs is a bad name. The compiler behavior is whether the external data symbol is accessed directly/indirectly. Copy relocs is just the inferred ELF linker behavior (in -no-pie/-pie link mode) when the symbol is external. The option name should mention the direct behavior, instead of the inferred behavior at the linking stage.

-fdirect-access-external-data makes sense on other binary formats, though I won't ask GCC to
implement relevant behaviors for other binary formats.

* For example, on COFF, the behavior is like always -fdirect-access-external-data.  __declspec(dllimport) is needed to use indirect access.
* On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.

If you don't want to think of non-ELF, feel free to make the option specific to ELF.
Also feel free to make it specific to -fno-pic/-fpie (disallowed for -fpic).
I have no plan to implement Clang -fdirect-access-external-data for -fpic either.

> You want to have this a generic option, while it is
> not clear at all what it would mean, what it would *do*, which is especially
> important if you want this to be an option used by multiple compilers: if it
> is not clear to every user what simple, sensible thing a flag is the knob
> for, that flag simply cannot be used at all -- or worse, some users *will*
> use it, but then their intentions are not clear to humans, and different
> compilers can (and will!) think the user wanted something else!

To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC, and I made the proposal to (1) let non-x86-64 leverage the missing optimization for -pie and (2) eventually fix the x86-64 STV_PROTECTED story.
I have considered all the potential simplifications of Clang's internal representations that this option will enable.
(llvm/lib/Target/TargetMachine.cpp shouldAssumeDSOLocal can be further simplified with this option)
Comment 6 Segher Boessenkool 2020-12-28 09:54:53 UTC
(In reply to Fangrui Song from comment #5)
> Please read my first comment why copy relocs is a bad name.

Since I reply to some of that (namely, your argument 1)), you could assume I
have read your comment already ;-)

> The compiler
> behavior is whether the external data symbol is accessed
> directly/indirectly.

Not really, no.  It isn't clear at all what "directly" even means!

> Copy relocs is just the inferred ELF linker behavior
> (in -no-pie/-pie link mode) when the symbol is external. The option name
> should mention the direct behavior, instead of the inferred behavior at the
> linking stage.

Yes.  But your proposed solution just makes this worse :-(

> -fdirect-access-external-data makes sense on other binary formats, though I
> won't ask GCC to
> implement relevant behaviors for other binary formats.

But what does that *mean*?  "direct access"?  (And, "external data", for that
matter!  This isn't as obvious as it was thirty years ago.)

> * For example, on COFF, the behavior is like always
> -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> indirect access.

I don't know what "declspec" is.  Something something mswindows?

> * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> (only available on arm) and the opposite for -fpic.

So what you want is that objects that are globally visible will be implemented
as-is?  For if you do not do whole-program optimisation, for example?  So that
a) those objects will actually *exist*, and b) they will be laid out in the way
the program expects?

> If you don't want to think of non-ELF, feel free to make the option specific
> to ELF.

The problem is not that I don't want to think about it, but that the way it
seems to be defined only applies to ELF (and to some specific (sub-)targets
using ELF, even).

> > You want to have this a generic option, while it is
> > not clear at all what it would mean, what it would *do*, which is especially
> > important if you want this to be an option used by multiple compilers: if it
> > is not clear to every user what simple, sensible thing a flag is the knob
> > for, that flag simply cannot be used at all -- or worse, some users *will*
> > use it, but then their intentions are not clear to humans, and different
> > compilers can (and will!) think the user wanted something else!
> 
> To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC

Huh?  That isn't a user-visible thing at all, it's an implementation detail.
It is a quite straight-forward autoxxxx thing, defined to true if the loader
passes some specific test.

- o - o -

So, what you want is to attach the attribute ((used)) variable attribute to all
data (or at least the data not explicitly made static) automatically?
Comment 7 Fangrui Song 2020-12-28 17:43:19 UTC
(In reply to Segher Boessenkool from comment #6)
> (In reply to Fangrui Song from comment #5)
> > Please read my first comment why copy relocs is a bad name.
> 
> Since I reply to some of that (namely, your argument 1)), you could assume I
> have read your comment already ;-)
> 
> > The compiler
> > behavior is whether the external data symbol is accessed
> > directly/indirectly.
> 
> Not really, no.  It isn't clear at all what "directly" even means!

> > Copy relocs is just the inferred ELF linker behavior
> > (in -no-pie/-pie link mode) when the symbol is external. The option name
> > should mention the direct behavior, instead of the inferred behavior at the
> > linking stage.
> 
> Yes.  But your proposed solution just makes this worse :-(

I am trying to use one term, "direct", to describe absolute/PC-relative relocation types (e.g. R_X86_64_64, R_X86_64_PC32).
"Indirect" means GOT-generating relocation types and (PowerPC64) TOC-generating relocation types.

"direct/indirect" are more descriptive and more accurate than "copy relocs": no copy relocation occurs if the symbol turns out to be defined locally, and the term does not apply to other binary formats.

> > -fdirect-access-external-data makes sense on other binary formats, though I
> > won't ask GCC to
> > implement relevant behaviors for other binary formats.
> 
> But what does that *mean*?  "direct access"?  (And, "external data", for that
> matter!  This isn't as obvious as it was thirty years ago.)

In the PowerPC64 ELF v2 ABI, the term "GOT-indirect addressing" is used.
In the x86-64 psABI, there is a section "Indirect Call via the GOT Slot".
Indirect calls/jumps are pretty common, so it is understood that GOT relocation types generally mean "indirect".

"external data" is the best term I can find for things like `extern int var;`
It means the data symbol is undefined in the current translation unit but may be defined
in another translation unit or another linked unit.
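
To make the direct/indirect distinction concrete, here is a minimal C sketch. It is only an analogy, not compiler output: `got_slot_var` and `fake_loader_bind` are hypothetical names standing in for a GOT entry and for the work ld.so does at load time, and `var` is defined in the same file purely so that the sketch is self-contained (real external data would be defined in another translation unit or shared object).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for `extern int var;` whose definition lives in another module. */
static int var = 42;

/* Hypothetical model of a GOT entry: a pointer-sized slot that will hold &var. */
static int *got_slot_var = NULL;

/* Hypothetical model of the dynamic loader filling in the slot at startup. */
void fake_loader_bind(void) { got_slot_var = &var; }

/* "Direct": the access itself encodes the address of var
   (absolute or PC-relative relocation type). */
int load_direct(void) { return var; }

/* "Indirect": load the address from the slot first, then load the value
   (GOT-generating relocation type). */
int load_indirect(void)
{
    assert(got_slot_var != NULL); /* the slot must be bound before use */
    return *got_slot_var;
}
```

Once `fake_loader_bind()` has run, both functions return the same value. The only difference is the extra pointer load in `load_indirect`, which is what allows the definition to live in another module without a copy relocation.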

> > * For example, on COFF, the behavior is like always
> > -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> > indirect access.
> 
> I don't know what "declspec" is.  Something something mswindows?

Yes. On COFF, `extern int var; int foo() { return var; }` compiles to `movl var(%rip), %eax` (a "direct access" (PC-relative) relocation type).
The behavior is as if -fdirect-access-external-data were always in effect. A __declspec(dllimport) annotation can override the command line option.

> > * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> > (only available on arm) and the opposite for -fpic.
> 
> So what you want is that object that are globally visible will be implemented
> as-is?  For if you do not do whole-program optimisation, for example?  So
> that
> a) those objects will actually *exist*, and b) they will be laid out in the
> way
> the program expects?

Undefined global objects and address-taken functions in the current translation unit are affected.
An address-taken function behaves very much like a data symbol:

```
// gcc -fno-pic generates an absolute relocation type. If foo is defined in a DSO,
// it will require a "canonical PLT entry" (st_shndx=0, st_value!=0) - a hack agreed by the linker and ld.so
extern void foo();
void *addr() { return foo; }
```

The default ELF behavior on most architectures is: -fno-pic uses an absolute
relocation type, while -fpie uses a GOT-generating relocation type on
non-x86-64 and a PC-relative relocation type on x86-64.

If -fno-direct-access-external-data is specified, -fno-pic/-fpie will use GOT-generating relocation types
to prevent
* copy relocations if the symbol turns out to be undefined in the module.
* canonical PLT entry for an address-taken function.
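
The rules above can be summarized as a small decision function. This is a sketch of the rules as stated in this comment, not GCC or Clang code: `external_data_access`, the two enums, and the `no_direct_access` parameter (standing for -fno-direct-access-external-data) are made-up names, and the -fno-pic row follows the "most architectures" summary, so individual targets may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two modes discussed above are modeled. */
enum pic_mode { MODE_NO_PIC, MODE_PIE };

enum access_kind { ACCESS_ABSOLUTE, ACCESS_PC_RELATIVE, ACCESS_GOT };

/* Hypothetical helper: which relocation type an access to an undefined
   data symbol would use under the rules described in this comment. */
enum access_kind external_data_access(bool is_x86_64, enum pic_mode mode,
                                      bool no_direct_access)
{
    assert(mode == MODE_NO_PIC || mode == MODE_PIE);

    /* -fno-direct-access-external-data forces GOT-generating relocations. */
    if (no_direct_access)
        return ACCESS_GOT;

    /* Default for -fno-pic on most architectures: absolute relocation. */
    if (mode == MODE_NO_PIC)
        return ACCESS_ABSOLUTE;

    /* Default for -fpie: PC-relative on x86-64, GOT elsewhere. */
    return is_x86_64 ? ACCESS_PC_RELATIVE : ACCESS_GOT;
}
```

For example, `external_data_access(true, MODE_PIE, false)` models today's x86-64 -fpie default (PC-relative), while passing `true` as the last argument models the proposed opt-out.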

The proposed option is local to a translation unit (like most options).
However, if this information is recorded in LTO IR files, the optimizer can
assume the variable can be referenced via a direct relocation type in the
combined IR file.

> > If you don't want to think of non-ELF, feel free to make the option specific
> > to ELF.
> 
> The problem is not that I don't want to think about it, but that the way it
> seems to be defined only applies to ELF (and to some specific (sub-)targets
> using ELF, even).

As I mentioned earlier, this applies to other binary formats.  I'll just show
you evidence by pointing you directly to the code ;-)

In LLVM, generally speaking, a dso_local undefined global object is accessed
directly while a non-dso_local undefined global object is accessed via GOT
indirection.

In Clang, dso_local annotation is added in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenModule.cpp#L913-L988
(The internal abstraction is currently a bit unfortunate. LLVM IR has another set of rules, many of them duplicated (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/TargetMachine.cpp#L94-L178). I intend to eventually clean up the LLVM IR side rules.)
(Attributes generally supersede the proposed command line option.)

The few `return true;` places can be refined to check -f[no-]direct-access-external-data.

Two options are similar to -f[no-]direct-access-external-data.

* -fno-plt: it only applies to external function calls (not taking address)
* -fno-semantic-interposition: it only applies to defined function/variable symbols

I've thought about -f[no-]semantic-interposition-external-data, but I don't find it more suitable than
-f[no-]direct-access-external-data.

> > > You want to have this a generic option, while it is
> > > not clear at all what it would mean, what it would *do*, which is especially
> > > important if you want this to be an option used by multiple compilers: if it
> > > is not clear to every user what simple, sensible thing a flag is the knob
> > > for, that flag simply cannot be used at all -- or worse, some users *will*
> > > use it, but then their intentions are not clear to humans, and different
> > > compilers can (and will!) think the user wanted something else!
> > 
> > To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC
> 
> Huh?  That isn't a user-visible thing at all, it's an implementation detail.
> It is a quite straight-forward autoxxxx thing, defined to true if the loader
> passes some specific test.
> 
> - o - o -
> 
> So, what you want is to attach the attribute ((used)) variable attribute to
> all
> data (or at least the data not explicitly made static) automatically?

No. The option is very different from __attribute__((used)).