[Bug c/66122] New: Bad uninlining decisions
vda.linux at googlemail dot com
gcc-bugzilla@gcc.gnu.org
Tue May 12 12:15:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Bug ID: 66122
Summary: Bad uninlining decisions
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vda.linux at googlemail dot com
Target Milestone: ---
On a Linux kernel build, I found thousands of cases where functions which the
programmer expects to be inlined aren't actually inlined.
The following script is used to find them:
nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn
It actually finds functions which have the same name and size, and which occur
more than once. There are a few false positives, but the vast majority of them
are functions which were supposed to be inlined, but weren't:
(Count) (size) (name)
473 000000000000000b t spin_unlock_irqrestore
449 000000000000005f t rcu_read_unlock
355 0000000000000009 t atomic_inc
353 000000000000006e t rcu_read_lock
350 0000000000000075 t rcu_read_lock_sched_held
291 000000000000000b t spin_unlock
266 0000000000000019 t arch_local_irq_restore
215 000000000000000b t spin_lock
180 0000000000000011 t kzalloc
165 0000000000000012 t list_add_tail
161 0000000000000019 t arch_local_save_flags
153 0000000000000016 t test_and_set_bit
134 000000000000000b t spin_unlock_irq
134 0000000000000009 t atomic_dec
130 000000000000000b t spin_unlock_bh
122 0000000000000010 t brelse
120 0000000000000016 t test_and_clear_bit
120 000000000000000b t spin_lock_irq
119 000000000000001e t get_dma_ops
117 0000000000000053 t cpumask_next
116 0000000000000036 t kref_get
114 000000000000001a t schedule_work
106 000000000000000b t spin_lock_bh
103 0000000000000019 t arch_local_irq_disable
98 0000000000000014 t atomic_dec_and_test
83 0000000000000020 t sg_page
81 0000000000000037 t cpumask_check
79 0000000000000036 t pskb_may_pull
72 0000000000000044 t perf_fetch_caller_regs
70 000000000000002f t cpumask_next
68 0000000000000036 t clk_prepare_enable
65 0000000000000018 t pci_write_config_byte
65 0000000000000013 t tasklet_schedule
61 0000000000000023 t init_completion
60 000000000000002b t trace_handle_return
59 0000000000000043 t nlmsg_trim
59 0000000000000019 t pci_read_config_dword
59 000000000000000c t slow_down_io
...
...
Note the tiny sizes of some functions. Let's take a look at atomic_inc:
static inline void atomic_inc(atomic_t *v)
{
        asm volatile(LOCK_PREFIX "incl %0"
                     : "+m" (v->counter));
}
You would imagine that this won't ever be deinlined, right? It's one assembly
instruction. Well, it isn't always inlined. Here's the disassembly of vmlinux:
ffffffff81003000 <atomic_inc>:
ffffffff81003000:       55                      push   %rbp
ffffffff81003001:       48 89 e5                mov    %rsp,%rbp
ffffffff81003004:       f0 ff 07                lock incl (%rdi)
ffffffff81003007:       5d                      pop    %rbp
ffffffff81003008:       c3                      retq
This can be fixed using __always_inline, but kernel developers hesitate to slap
thousands of __always_inline annotations everywhere. The mood is that this is
the compiler's fault, and that it should not be accommodated but fixed.
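For illustration, here is a minimal sketch of the __always_inline workaround
mentioned above. It is not the kernel's code: FORCE_INLINE, demo_atomic_t,
demo_atomic_inc and demo_count are hypothetical names, and a GCC atomic builtin
stands in for the "lock incl" asm so that the example does not depend on x86.
The kernel's own __always_inline macro expands to the same GCC attribute.

```c
#include <assert.h>

/* Hypothetical local macro; the kernel spells this __always_inline. */
#define FORCE_INLINE inline __attribute__((__always_inline__))

/* Simplified stand-in for the kernel's atomic_t (assumption). */
typedef struct { int counter; } demo_atomic_t;

/*
 * Stand-in for atomic_inc(): a GCC atomic builtin replaces the
 * LOCK_PREFIX "incl" asm. The attribute asks GCC to inline every
 * call instead of leaving the decision to its heuristics.
 */
static FORCE_INLINE void demo_atomic_inc(demo_atomic_t *v)
{
        __atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
}

/* Increments a fresh counter n times and returns the final value. */
static int demo_count(int n)
{
        demo_atomic_t a = { 0 };

        assert(n >= 0);
        for (int i = 0; i < n; i++)
                demo_atomic_inc(&a);
        return a.counter;
}
```

With the attribute in place, one would expect a build with -Os to produce no
out-of-line demo_atomic_inc symbol in the nm output. This is the per-function
annotation that the kernel developers are reluctant to apply thousands of times.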
This happens quite easily with -Os (IOW: with CC_OPTIMIZE_FOR_SIZE=y kernel
build), but -O2 is not immune either.
I found a file which exhibits an example of bad deinlining for both -O2 and -Os
and I'm going to attach it.