Bug 105625 - Support .llvm_addrsig section
Summary: Support .llvm_addrsig section
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa
Version: 11.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Reported: 2022-05-17 03:02 UTC by Rui Ueyama
Modified: 2022-06-18 15:20 UTC
CC List: 5 users

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2022-05-17 00:00:00


Description Rui Ueyama 2022-05-17 03:02:29 UTC
This is a feature request to implement an LLVM-compatible feature so that linkers can optimize GCC-generated object files as well as they currently optimize LLVM-generated ones.

Disclaimer: I'm the creator of the mold linker (https://github.com/rui314/mold)


GNU gold and LLVM lld have a feature called Identical Code Folding (ICF). ICF finds functions that happen to be compiled to exactly the same machine code and merges them. This is known to be an effective optimization, especially for C++ programs, since a function template tends to be compiled to the same machine code for different types. For example, the member functions of `std::vector<int>` and `std::vector<unsigned>` are likely to be instantiated to exactly the same machine code, even though they get different mangled names. ICF can merge such code.
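
As a small illustration of the kind of code ICF targets, the sketch below explicitly instantiates one template for `int` and `unsigned`. The helper name `count_elements` is hypothetical, and whether the two instantiations really compile to byte-identical machine code depends on the compiler, target, and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper used only for illustration: nothing in the body
// depends on the signedness of T, so both instantiations are likely
// (not guaranteed) to produce the same machine code.
template <typename T>
std::size_t count_elements(const std::vector<T>& v) {
    return v.size();
}

// Explicit instantiations: each one is emitted as a separate symbol
// with its own mangled name.
template std::size_t count_elements<int>(const std::vector<int>&);
template std::size_t count_elements<unsigned>(const std::vector<unsigned>&);

// Self-check: the two instantiations behave identically.
inline void check_count_elements() {
    assert(count_elements(std::vector<int>{1, 2, 3}) == 3);
    assert(count_elements(std::vector<unsigned>{1u, 2u, 3u}) == 3);
}
```

Comparing the disassembly of the two symbols in the resulting object file is one way to see whether they are ICF candidates on a given toolchain.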

There's one caveat, though: ICF is not a "safe" optimization. In C/C++, two function pointers are equal if and only if they point to the same function. For example, if you have two different functions `foo` and `bar`, `foo == bar` will never be true. ICF breaks this assumption if it merges `foo` and `bar`, because after merging they will be at the same address.
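
A minimal sketch of the guarantee at stake, using the `foo` and `bar` names from the paragraph above (the `same_function` helper and the `Fn` alias are hypothetical). On a conforming toolchain the comparison is false; a linker that folded the two bodies without checking whether their addresses are taken could make it true.

```cpp
#include <cassert>

// Two distinct functions whose bodies are identical.
int foo() { return 42; }
int bar() { return 42; }

using Fn = int (*)();

// Returns true when both pointers designate the same function.
bool same_function(Fn a, Fn b) { return a == b; }

// Self-check: the language requires distinct functions to compare unequal.
inline void check_distinct_addresses() {
    assert(!same_function(foo, bar));
    assert(same_function(foo, foo));
}
```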

That said, if you know that no code takes the address of `foo` or `bar`, it is safe to merge `foo` with `bar`, since function pointers cannot be compared without first taking the functions' addresses. gold and lld implement a "safe" ICF based on that observation.
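
The observation can be sketched as a toy folding pass. This is a simplified model under stated assumptions, not how gold, lld, or mold are implemented: the `Function` struct and the `safe_icf` function are hypothetical, code is compared as raw bytes only (real linkers also have to compare relocations), and any function whose address is taken is simply left alone.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a function as a linker sees it:
// a symbol name plus the bytes of its machine code.
struct Function {
    std::string name;
    std::string code;
};

// Maps each function name to the name of the function it is folded into.
// Functions whose address is taken always map to themselves.
std::map<std::string, std::string> safe_icf(
    const std::vector<Function>& functions,
    const std::set<std::string>& address_taken) {
    std::map<std::string, std::string> representative_by_code;
    std::map<std::string, std::string> folded_into;
    for (const Function& f : functions) {
        if (address_taken.count(f.name) != 0) {
            folded_into[f.name] = f.name;  // must keep a unique address
            continue;
        }
        // emplace keeps the first function seen for each code sequence.
        auto it = representative_by_code.emplace(f.code, f.name).first;
        folded_into[f.name] = it->second;
    }
    return folded_into;
}

// Self-check: bar has its address taken, so only baz folds into foo.
inline void check_safe_icf() {
    std::vector<Function> fns = {
        {"foo", "AA"}, {"bar", "AA"}, {"baz", "AA"}, {"qux", "BB"}};
    std::set<std::string> taken = {"bar"};
    std::map<std::string, std::string> r = safe_icf(fns, taken);
    assert(r["foo"] == "foo");
    assert(r["bar"] == "bar");
    assert(r["baz"] == "foo");
    assert(r["qux"] == "qux");
}
```

Excluding address-taken functions entirely is the most conservative choice; it keeps the sketch short at the cost of missing some merges that would still be safe.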

gold's safe ICF merges only C++ constructors and destructors. Since there's no way to obtain a pointer to a ctor or dtor within the C++ language spec, they are always safe to merge. gold identifies ctors and dtors by reading their mangled names. What gold does is safe but too conservative, as it cannot merge other functions.

lld's safe ICF relies on an LLVM feature. Since mid-2018, LLVM has emitted a `.llvm_addrsig` section into every object file by default. That section contains the symbol table indices of the symbols whose addresses are taken. Using this table, lld can merge functions more aggressively than gold can.
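
The LLVM extensions documentation describes the section payload as a sequence of ULEB128-encoded symbol table indices. The sketch below decodes such a payload from an in-memory byte buffer. The function name `decode_addrsig` is hypothetical, the sketch assumes well-formed input, and it does not parse the ELF file around the section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a buffer of back-to-back ULEB128 values into symbol indices.
// Assumes well-formed input whose values fit in 64 bits.
std::vector<std::uint64_t> decode_addrsig(
    const std::vector<std::uint8_t>& data) {
    std::vector<std::uint64_t> indices;
    std::uint64_t value = 0;
    unsigned shift = 0;
    for (std::uint8_t byte : data) {
        assert(shift < 64);  // malformed input would overflow the value
        // The low 7 bits carry payload, least significant group first.
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if (byte & 0x80) {
            shift += 7;  // continuation bit set: more bytes follow
        } else {
            indices.push_back(value);  // last byte of this value
            value = 0;
            shift = 0;
        }
    }
    return indices;
}
```

For instance, the bytes `0x02 0x05 0xE5 0x8E 0x26` decode to the three indices 2, 5, and 624485.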

Recently, we implemented an lld-compatible safe ICF in mold. It works great, but not with GCC, as GCC does not produce `.llvm_addrsig` sections.

Feature request:

Can GCC produce the `.llvm_addrsig` section just like LLVM does? That would put GCC-generated executables on par with LLVM-generated ones in terms of file size when ICF is enabled.


Here is an explanation of the `.llvm_addrsig` section: https://llvm.org/docs/Extensions.html#sht-llvm-addrsig-section-address-significance-table

This is the patch that added the feature to LLVM: https://reviews.llvm.org/D47744

Here is an upstream issue for mold: https://github.com/rui314/mold/issues/484
Comment 1 Andrew Pinski 2022-05-17 03:15:24 UTC
I see most of it was implemented in the assembler. So you might want to request support for the .addrsig directive there too.
Comment 2 Andrew Pinski 2022-05-17 03:17:35 UTC
From reading the specifications of the extension, you do need assembler support first as there is no way for GCC to emit a reference to the index of the symbol table as that is not exposed via normal assembler directives.

Suspended until GNU binutils support is added.
Comment 3 Rui Ueyama 2022-05-17 03:22:08 UTC
I think we can implement `.addrsig` support in the assembler, but I wonder if GCC will support it once the GNU assembler gains the feature?
Comment 4 Andrew Pinski 2022-05-17 03:25:15 UTC
(In reply to Rui Ueyama from comment #3)
> I think we can implement `.addrsig` support in the assembler, but I
> wonder if GCC will support it once the GNU assembler gains the feature?

Yes, I think it will. It shouldn't be too hard to implement, because IPA cgraph already has that information and uses it for ICF inside GCC.
Comment 5 Rui Ueyama 2022-05-17 03:30:44 UTC
Cool! We'll add `.llvm_addrsig` support to binutils and get back to you guys.
Comment 6 Martin Liška 2022-05-17 12:12:55 UTC
Good, I can prepare a GCC patch once the binutils support is there.
Comment 7 Tatsuyuki Ishi 2022-05-25 06:47:41 UTC

I just posted an initial revision of the patchset for gas.