Bug 82303 - Better PIE/PIC code generation for kernel code (x86_64 & arm64)
Summary: Better PIE/PIC code generation for kernel code (x86_64 & arm64)
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-09-22 23:52 UTC by Thomas Garnier
Modified: 2018-01-23 23:08 UTC
CC List: 7 users

See Also:
Host:
Target: x86_64, arm64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-09-23 00:00:00


Attachments
A patch to add -fstatic-PIE/-fstatic-pie (1.00 KB, patch)
2017-09-23 22:09 UTC, H.J. Lu
testcase for mcmodel=large (188 bytes, text/x-csrc)
2018-01-19 18:52 UTC, Thomas Garnier
testcase for mcmodel=large (188 bytes, text/x-csrc)
2018-01-19 18:56 UTC, Thomas Garnier
testcase for switch folding (500 bytes, text/x-csrc)
2018-01-23 19:12 UTC, Thomas Garnier

Description Thomas Garnier 2017-09-22 23:52:14 UTC
The current PIE/PIC code generation is not optimal for kernel code.

It makes assumptions about the execution environment that do not hold for freestanding executables such as the Linux kernel: that text relocations must be avoided, that the footprint of CoW'ed pages must be minimized, and that exported symbols must always be referenced via the GOT so they can be preempted. None of these concerns apply to freestanding binaries.

Having a separate flag (like -mcmodel=kernel-pie or -fkernel-pie) would allow better code optimization for PIE/PIC kernel code, notably:

- Select the right segment register for TLS in kernel code (for example, x86_64 uses %gs instead of %fs [1]).
- No need for a GOT or PLT.
- Re-enable code optimizations that are disabled for CoW page support, where the goal is to reduce relocations against code sections. For example, switch statements are not folded for PIE/PIC code, to avoid relocations [2].
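To make "switch folding" concrete, here is a minimal sketch with hypothetical names (mode_switch, mode_folded and mode_table are illustrations, not the attached testcase). A switch that only returns string literals is equivalent to one bounds check plus a load from a constant pointer table, which is the shape GCC emits without PIE. Every table entry is an absolute address, so in PIE/PIC code each entry needs a relocation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: a switch that only maps a small integer
 * to a string literal. */
static const char *mode_switch(unsigned int mode)
{
    switch (mode) {
    case 0:  return "none";
    case 1:  return "mii";
    case 2:  return "gmii";
    case 3:  return "rgmii";
    default: return "unknown";
    }
}

/* Folded form: a bounds check plus a load from a constant table.
 * Each entry holds the absolute address of a string literal, so in
 * PIE/PIC code every entry needs a relocation once the load address
 * is known. */
static const char *const mode_table[] = { "none", "mii", "gmii", "rgmii" };

static const char *mode_folded(unsigned int mode)
{
    if (mode >= sizeof mode_table / sizeof mode_table[0])
        return "unknown";
    return mode_table[mode];
}

/* Returns 1 if both versions return the same string for every mode
 * below limit, 0 otherwise. */
static int modes_agree(unsigned int limit)
{
    for (unsigned int m = 0; m < limit; m++)
        if (strcmp(mode_switch(m), mode_folded(m)) != 0)
            return 0;
    return 1;
}
```

The two functions behave identically; the only difference is whether the string addresses live in code (one instruction per case) or in a data table that must be relocated.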

Note that arm64 PIE uses the small or tiny code model based on UEFI, so this should be taken into consideration for that architecture.

For reference, the discussion on the Linux kernel x86_64 PIE RFC: http://www.openwall.com/lists/kernel-hardening/2017/09/21/16

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828
Comment 1 H.J. Lu 2017-09-23 01:05:37 UTC
A static PIE option can be used for both kernel
and user space.
Comment 2 H.J. Lu 2017-09-23 22:09:35 UTC
Created attachment 42232
A patch to add -fstatic-PIE/-fstatic-pie
Comment 3 Kees Cook 2018-01-17 18:53:46 UTC
Any progress on getting this into a GCC release?
Comment 4 H.J. Lu 2018-01-17 18:55:22 UTC
(In reply to Kees Cook from comment #3)
> Any progress on getting this into a GCC release?

Has anyone tried my patch at all? Does it work?
Comment 5 Thomas Garnier 2018-01-17 19:03:32 UTC
I haven't tried the patch yet; it could be a good starting point (changes are still needed for switch optimization and segment registers). What is the consequence of the change in default_binds_local_p_3? Is it supposed to remove the need for a GOT / PLT?
Comment 6 H.J. Lu 2018-01-17 19:10:40 UTC
(In reply to Thomas Garnier from comment #5)
> I haven't tried the patch yet; it could be a good starting point (changes
> are still needed for switch optimization and segment registers). What is the
> consequence of the change in default_binds_local_p_3? Is it supposed to
> remove the need for a GOT / PLT?

It should.  If not, please provide a testcase.
Comment 7 Thomas Garnier 2018-01-19 18:52:55 UTC
Created attachment 43189
testcase for mcmodel=large

Build with: gcc -mcmodel=large -c -fstatic-pie ./test.c -o test
Dump relocations on the object file: objdump -dr ./test
Comment 8 Thomas Garnier 2018-01-19 18:56:11 UTC
Created attachment 43190
testcase for mcmodel=large
Comment 9 Thomas Garnier 2018-01-19 18:56:58 UTC
I tested the change against a modified version of the proposed Linux x86_64 PIE support. The change removes all the PLT32 and GOT64 entries, but I still get R_X86_64_GOTPC64 and R_X86_64_GOTOFF64 relocations in the head64.c file, which is built with -mcmodel=large (to prevent odd logic during early boot, when the kernel runs at a different virtual address).

Do you think the suggested patch can be changed to remove these?

To reproduce, build the object file with: gcc -mcmodel=large -c -fstatic-pie ./test.c -o test

The objdump -dr output of the testcase:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 05 f9 ff ff ff    lea    -0x7(%rip),%rax        # 8 <main+0x8>
   f:   49 bb 00 00 00 00 00    movabs $0x0,%r11
  16:   00 00 00
                        11: R_X86_64_GOTPC64    _GLOBAL_OFFSET_TABLE_+0x9
  19:   4c 01 d8                add    %r11,%rax
  1c:   89 7d ec                mov    %edi,-0x14(%rbp)
  1f:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
  23:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
  2a:   00 00 00
                        25: R_X86_64_GOTOFF64   _text-0x1023
  2d:   48 8d 14 10             lea    (%rax,%rdx,1),%rdx
  31:   89 55 fc                mov    %edx,-0x4(%rbp)
  34:   8b 55 ec                mov    -0x14(%rbp),%edx
  37:   48 63 d2                movslq %edx,%rdx
  3a:   48 8b 4d e0             mov    -0x20(%rbp),%rcx
  3e:   48 89 ce                mov    %rcx,%rsi
  41:   48 b9 00 00 00 00 00    movabs $0x0,%rcx
  48:   00 00 00
                        43: R_X86_64_GOTOFF64   _text
  4b:   48 8d 3c 08             lea    (%rax,%rcx,1),%rdi
  4f:   48 b9 00 00 00 00 00    movabs $0x0,%rcx
  56:   00 00 00
                        51: R_X86_64_GOTOFF64   memcpy
  59:   48 8d 04 08             lea    (%rax,%rcx,1),%rax
  5d:   ff d0                   callq  *%rax
  5f:   8b 45 fc                mov    -0x4(%rbp),%eax
  62:   c9                      leaveq
  63:   c3                      retq
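In this listing the GOTPC64/GOTOFF64 pair only uses the GOT address as a base register: no GOT entry is ever loaded. As a sketch, the following replays that arithmetic with the relocation formulas from the x86-64 psABI (GOTPC64 = GOT + A - P, GOTOFF64 = S + A - GOT). The load addresses passed in are hypothetical; the instruction offsets (0x8, 0x11) and the +0x9 addend are read off the dump.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 psABI relocation formulas, computed modulo 2^64:
 *   S = symbol address, A = addend, P = address of the patched field,
 *   GOT = address of the global offset table. */
static uint64_t reloc_gotpc64(uint64_t got, uint64_t a, uint64_t p)
{
    return got + a - p;
}

static uint64_t reloc_gotoff64(uint64_t s, uint64_t a, uint64_t got)
{
    return s + a - got;
}

/* Replays the address computation in main() from the dump above.
 * main_addr, got and sym_addr are hypothetical load addresses.
 * Returns the value left in %rax just before "callq *%rax". */
static uint64_t replay_large_pie(uint64_t main_addr, uint64_t got,
                                 uint64_t sym_addr)
{
    /* 8:  lea -0x7(%rip),%rax  -> %rax = main+0x8 */
    uint64_t rax = main_addr + 0x8;

    /* f:  movabs $..,%r11 with R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9;
     *     the 8-byte immediate field starts at main+0x11 */
    uint64_t r11 = reloc_gotpc64(got, 0x9, main_addr + 0x11);

    /* 19: add %r11,%rax -> %rax = GOT, used only as a base */
    rax += r11;
    assert(rax == got);

    /* 4f: movabs $..,%rcx with R_X86_64_GOTOFF64 memcpy
     * 59: lea (%rax,%rcx,1),%rax -> %rax = address of memcpy */
    uint64_t rcx = reloc_gotoff64(sym_addr, 0, got);
    return rax + rcx;
}
```

Because the GOT address serves only as an anchor, the code is correct wherever it is loaded, but both relocation types remain even when no GOT entries are needed.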
Comment 10 H.J. Lu 2018-01-19 19:02:23 UTC
(In reply to Thomas Garnier from comment #9)
> I tested the change against a modified version of the proposed Linux x86_64
> PIE support. The change removes all the PLT32 and GOT64 entries, but I still
> get R_X86_64_GOTPC64 and R_X86_64_GOTOFF64 relocations in the head64.c file,
> which is built with -mcmodel=large (to prevent odd logic during early boot,
> when the kernel runs at a different virtual address).
> 
> Do you think the suggested patch can be changed to remove these?
> 

I believe that is the nature of the large PIE model.
Comment 11 Thomas Garnier 2018-01-23 18:30:32 UTC
I think for this file using only -mcmodel=large makes more sense.

Given that the proposed option (-fstatic-pie) is not kernel-specific, the TLS change is not needed. What do you think about disabling optimizations like switch folding [1]? It seems to exist only to remove relocations, which is not even necessary in a classic -fPIE (or -fstatic-pie) scenario.

[1] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828
Comment 12 H.J. Lu 2018-01-23 18:34:25 UTC
(In reply to Thomas Garnier from comment #11)
> I think for this file using only -mcmodel=large makes more sense.
> 
> Given that the proposed option (-fstatic-pie) is not kernel-specific, the TLS change is

Sounds reasonable.

> not needed. What do you think about disabling optimizations like switch
> folding [1]? It seems to exist only to remove relocations, which is not even
> necessary in a classic -fPIE (or -fstatic-pie) scenario.
> 
> [1] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

Testcase please.
Comment 13 Thomas Garnier 2018-01-23 19:12:49 UTC
Created attachment 43223
testcase for switch folding

No switch folding if built with:

$CC -O2 -fno-PIE -c -o switch ./switch.c 

Switch folding if built with:

$CC -O2 -fPIE -c -o switch ./switch.c
or
$CC -O2 -fstatic-pie -c -o switch ./switch.c
Comment 14 Thomas Garnier 2018-01-23 19:14:57 UTC
Correcting what I said before, it is about re-enabling switch folding (or switch optimization).

Basically without PIE (-fno-PIE) with -O2, a switch can be optimized to be:

0000000000000000 <phy_modes>:
   0:	b8 00 00 00 00       	mov    $0x0,%eax
			1: R_X86_64_32	.rodata.str1.1
   5:	83 ff 16             	cmp    $0x16,%edi
   8:	77 0a                	ja     14 <phy_modes+0x14>
   a:	89 ff                	mov    %edi,%edi
   c:	48 8b 04 fd 00 00 00 	mov    0x0(,%rdi,8),%rax
  13:	00 
			10: R_X86_64_32S	.rodata
  14:	c3                   	retq   

With PIE and -O2 it becomes:

0000000000000000 <phy_modes>:
   0:	83 ff 16             	cmp    $0x16,%edi
   3:	0f 87 87 01 00 00    	ja     190 <phy_modes+0x190>
   9:	48 8d 15 00 00 00 00 	lea    0x0(%rip),%rdx        # 10 <phy_modes+0x10>
			c: R_X86_64_PC32	.rodata-0x4
  10:	89 ff                	mov    %edi,%edi
  12:	48 63 04 ba          	movslq (%rdx,%rdi,4),%rax
  16:	48 01 d0             	add    %rdx,%rax
  19:	ff e0                	jmpq   *%rax
  1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  20:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 27 <phy_modes+0x27>
			23: R_X86_64_PC32	.LC1-0x4
  27:	c3                   	retq   
  28:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  2f:	00 
  30:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 37 <phy_modes+0x37>
			33: R_X86_64_PC32	.LC22-0x4
  37:	c3                   	retq   
  38:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  3f:	00 
  40:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 47 <phy_modes+0x47>
			43: R_X86_64_PC32	.LC21-0x4
  47:	c3                   	retq   
  48:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4f:	00 
  50:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 57 <phy_modes+0x57>
			53: R_X86_64_PC32	.LC20-0x4
  57:	c3                   	retq   
  58:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  5f:	00 
  60:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 67 <phy_modes+0x67>
			63: R_X86_64_PC32	.LC19-0x4
  67:	c3                   	retq   
  68:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  6f:	00 
  70:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 77 <phy_modes+0x77>
			73: R_X86_64_PC32	.LC18-0x4
  77:	c3                   	retq   
  78:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  7f:	00 
  80:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 87 <phy_modes+0x87>
			83: R_X86_64_PC32	.LC17-0x4
  87:	c3                   	retq   
  88:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  8f:	00 
  90:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 97 <phy_modes+0x97>
			93: R_X86_64_PC32	.LC16-0x4
  97:	c3                   	retq   
  98:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  9f:	00 
  a0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # a7 <phy_modes+0xa7>
			a3: R_X86_64_PC32	.LC15-0x4
  a7:	c3                   	retq   
  a8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  af:	00 
  b0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # b7 <phy_modes+0xb7>
			b3: R_X86_64_PC32	.LC14-0x4
  b7:	c3                   	retq   
  b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  bf:	00 
  c0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # c7 <phy_modes+0xc7>
			c3: R_X86_64_PC32	.LC13-0x4
  c7:	c3                   	retq   
  c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  cf:	00 
  d0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # d7 <phy_modes+0xd7>
			d3: R_X86_64_PC32	.LC12-0x4
  d7:	c3                   	retq   
  d8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  df:	00 
  e0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # e7 <phy_modes+0xe7>
			e3: R_X86_64_PC32	.LC11-0x4
  e7:	c3                   	retq   
  e8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  ef:	00 
  f0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # f7 <phy_modes+0xf7>
			f3: R_X86_64_PC32	.LC10-0x4
  f7:	c3                   	retq   
  f8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  ff:	00 
 100:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 107 <phy_modes+0x107>
			103: R_X86_64_PC32	.LC9-0x4
 107:	c3                   	retq   
 108:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 10f:	00 
 110:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 117 <phy_modes+0x117>
			113: R_X86_64_PC32	.LC8-0x4
 117:	c3                   	retq   
 118:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 11f:	00 
 120:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 127 <phy_modes+0x127>
			123: R_X86_64_PC32	.LC7-0x4
 127:	c3                   	retq   
 128:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 12f:	00 
 130:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 137 <phy_modes+0x137>
			133: R_X86_64_PC32	.LC6-0x4
 137:	c3                   	retq   
 138:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 13f:	00 
 140:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 147 <phy_modes+0x147>
			143: R_X86_64_PC32	.LC5-0x4
 147:	c3                   	retq   
 148:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 14f:	00 
 150:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 157 <phy_modes+0x157>
			153: R_X86_64_PC32	.LC4-0x4
 157:	c3                   	retq   
 158:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 15f:	00 
 160:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 167 <phy_modes+0x167>
			163: R_X86_64_PC32	.LC3-0x4
 167:	c3                   	retq   
 168:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 16f:	00 
 170:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 177 <phy_modes+0x177>
			173: R_X86_64_PC32	.LC2-0x4
 177:	c3                   	retq   
 178:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 17f:	00 
 180:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 187 <phy_modes+0x187>
			183: R_X86_64_PC32	.LC23-0x4
 187:	c3                   	retq   
 188:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 18f:	00 
 190:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 197 <phy_modes+0x197>
			193: R_X86_64_PC32	.LC0-0x4
 197:	c3                   	retq
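In the PIE output above, GCC keeps the switch as a jump table of 32-bit offsets relative to the table address (lea, movslq, add, jmpq), and each case then materializes its string with a RIP-relative lea. The table needs no run-time relocations, but the lookup is no longer folded into a single load. As a rough sketch, the same dispatch can be written by hand with the GNU C labels-as-values extension; the GCC manual presents this relative-offset form as friendlier to shared-library code. mode_name() and its three cases are hypothetical stand-ins for phy_modes(), and here the offsets are relative to the first case label rather than to the table itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for phy_modes().
 * Requires GNU C (labels as values). */
static const char *mode_name(unsigned int mode)
{
    /* Each entry is an offset from L_case0, not an absolute address,
     * so the table itself needs no relocations. */
    static const int offsets[] = {
        &&L_case0 - &&L_case0,
        &&L_case1 - &&L_case0,
        &&L_case2 - &&L_case0,
    };

    /* Range check, like "cmp $0x16,%edi; ja default". */
    if (mode >= sizeof offsets / sizeof offsets[0])
        return "unknown";

    /* Base + signed offset, then an indirect jump, like
     * "movslq (%rdx,%rdi,4),%rax; add %rdx,%rax; jmpq *%rax". */
    goto *(&&L_case0 + offsets[mode]);

L_case0:
    return "";
L_case1:
    return "internal";
L_case2:
    return "mii";
}
```

Compared with the folded form, every case costs a jump plus a separate string load, which is the code-size and speed difference the report is about.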
Comment 15 H.J. Lu 2018-01-23 21:18:13 UTC
(In reply to Thomas Garnier from comment #14)
> Correcting what I said before, it is about re-enabling switch folding (or
> switch optimization).
> 
> Basically without PIE (-fno-PIE) with -O2, a switch can be optimized to be:
> 
> 0000000000000000 <phy_modes>:
>    0:	b8 00 00 00 00       	mov    $0x0,%eax
> 			1: R_X86_64_32	.rodata.str1.1
>    5:	83 ff 16             	cmp    $0x16,%edi
>    8:	77 0a                	ja     14 <phy_modes+0x14>
>    a:	89 ff                	mov    %edi,%edi
>    c:	48 8b 04 fd 00 00 00 	mov    0x0(,%rdi,8),%rax
>   13:	00 
> 			10: R_X86_64_32S	.rodata
>   14:	c3                   	retq   
> 

The problem is that R_X86_64_32 isn't PIC:

/usr/local/bin/ld: x.o: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
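The x86-64 psABI requires an R_X86_64_32 value to zero-extend back to the full 64-bit address, and an R_X86_64_32S value to sign-extend back to it. In a PIE the final address is only known at load time and may lie anywhere in the address space, so the linker cannot accept such fields. The two overflow checks can be sketched as follows; the addresses in the note below are examples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the 32-bit field must zero-extend to the full value. */
static bool fits_r_x86_64_32(uint64_t value)
{
    return value == (uint64_t)(uint32_t)value;
}

/* R_X86_64_32S: the 32-bit field must sign-extend to the full value. */
static bool fits_r_x86_64_32s(uint64_t value)
{
    int32_t low = (int32_t)(uint32_t)value;  /* low 32 bits, as signed */
    return value == (uint64_t)(int64_t)low;  /* sign-extend and compare */
}
```

For example, an address such as 0x401000 (typical of a non-PIE link) passes both checks, 0x555555554000 passes neither, and a kernel-text address like 0xffffffff81000000 passes only the 32S check, which is why -mcmodel=kernel code can use sign-extended 32-bit absolute addresses.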
Comment 16 Thomas Garnier 2018-01-23 21:34:35 UTC
Yes, I think you can't just default to the non-PIE mode.

Clang does it well though:

0000000000000000 <phy_modes>:
   0:   83 ff 16                cmp    $0x16,%edi
   3:   77 0f                   ja     14 <phy_modes+0x14>
   5:   48 63 c7                movslq %edi,%rax
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <phy_modes+0xf>
                        b: R_X86_64_PC32        .data.rel.ro-0x4
   f:   48 8b 04 c1             mov    (%rcx,%rax,8),%rax
  13:   c3                      retq   
  14:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 1b <phy_modes+0x1b>
                        17: R_X86_64_PC32       .L.str.23-0x4
  1b:   c3                      retq
Comment 17 H.J. Lu 2018-01-23 21:42:09 UTC
(In reply to Thomas Garnier from comment #16)
> Yes, I think you can't just default to the non-PIE mode.
> 
> Clang does it well though:
> 
> 0000000000000000 <phy_modes>:
>    0:   83 ff 16                cmp    $0x16,%edi
>    3:   77 0f                   ja     14 <phy_modes+0x14>
>    5:   48 63 c7                movslq %edi,%rax
>    8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <phy_modes+0xf>
>                         b: R_X86_64_PC32        .data.rel.ro-0x4
>    f:   48 8b 04 c1             mov    (%rcx,%rax,8),%rax
>   13:   c3                      retq
>   14:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 1b <phy_modes+0x1b>
>                         17: R_X86_64_PC32       .L.str.23-0x4
>   1b:   c3                      retq

Please open a separate bug report to improve switch codegen for PIC.
Comment 18 Thomas Garnier 2018-01-23 23:08:10 UTC
Ok. Opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011