[Bug c/97445] Some functions marked static inline in Linux kernel are not inlined
hubicka at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Oct 19 16:13:12 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
get_order unwinds to:
<bb 2> [local count: 1073741824]:
_1 = __builtin_constant_p (size_68(D));
if (_1 != 0)
goto <bb 3>; [50.00%]
else
goto <bb 71>; [50.00%]
<bb 3> [local count: 536870913]:
if (size_68(D) == 0)
goto <bb 72>; [21.72%]
else
goto <bb 4>; [78.28%]
<bb 4> [local count: 420262548]:
if (size_68(D) <= 4095)
goto <bb 72>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 5> [local count: 210131274]:
_2 = size_68(D) + 18446744073709551615;
_3 = __builtin_constant_p (_2);
if (_3 != 0)
goto <bb 6>; [50.00%]
else
goto <bb 69>; [50.00%]
<bb 6> [local count: 105065637]:
_4 = (signed long) _2;
if (_4 >= 0)
goto <bb 7>; [59.00%]
else
goto <bb 70>; [41.00%]
... [very long code]
<bb 69> [local count: 105065637]:
__asm__("bsrq %1,%q0" : "=r" bitpos_75 : "rm" _2, "0" -1);
iftmp.1_73 = bitpos_75 + -11;
<bb 70> [local count: 210131274]:
# iftmp.1_67 = PHI <52(6), iftmp.1_73(69), 51(7), 50(8), 49(9), 48(10),
47(11), 46(12), 45(13), 44(14), 43(15), 42(16), 41(17), 40(18), 39(19), 38(20),
37(21), 36(22), 35(23), 34(24), 33(25), 32(26), 31(27), 30(28), 29(29), 28(30),
27(31), 26(32), 25(33), 24(34), 23(35), 22(36), 21(37), 20(38), 19(39), 18(40),
17(41), 16(42), 15(43), 14(44), 13(45), 12(46), 11(47), 10(48), 9(49), 8(50),
7(51), 6(52), 5(53), 4(54), 3(55), 2(56), 1(57), 0(58), -1(59), -2(60), -3(61),
-4(62), -5(63), -6(64), -7(65), -8(66), -10(68), -9(67)>
goto <bb 72>; [100.00%]
<bb 71> [local count: 536870913]:
size_69 = size_68(D) + 18446744073709551615;
size_70 = size_69 >> 12;
__asm__("bsrq %1,%q0" : "=r" bitpos_72 : "rm" size_70, "0" -1);
_74 = bitpos_72 + 1;
<bb 72> [local count: 1073741824]:
# _66 = PHI <52(3), 0(4), iftmp.1_67(70), _74(71)>
return _66;
We get this summary:
IPA function summary for get_order/303 inlinable
global time: 8.716289
self size: 201
global size: 201
min size: 4
self stack: 0
global stack: 0
size:4.000000, time:3.000000
size:3.000000, time:2.000000, executed if:(not inlined)
size:4.000000, time:2.000000, executed if:(op0 not constant)
size:2.000000, time:0.782800, executed if:(op0 != 0)
size:3.000000, time:0.391400, executed if:(op0 > 4095) && (op0 != 0)
size:2.000000, time:0.195700, executed if:(op0 > 4095) && (op0 != 0) &&
(op0 not constant)
size:3.000000, time:0.173194, executed if:(op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.086597, executed if:(op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.043299, executed if:(op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.021649, executed if:(op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.010825, executed if:(op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:168.000000, time:0.010825, executed if:(op0,(# +
18446744073709551615),(# & 288230376151711744) == 0) && (op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
calls:
__builtin_constant_p/4546 function body not available
freq:0.20 loop depth: 0 size: 0 time: 0 predicate: (op0 > 4095) && (op0
!= 0)
op0 points to local or readonly memory
__builtin_constant_p/4546 function body not available
freq:1.00 loop depth: 0 size: 0 time: 0
and then, at the calls to get_order, we do not know a constant value for the parameter:
Estimating body: get_order/303
Known to be false: not inlined
size:198 time:6.716289 nonspec time:8.716289 loops with known
iterations:0.000000 known strides:0.000000
The problem here is the size of 198 instructions, while we inline only up to 70
instructions. Of course, once we conclude that the parameter is not constant,
this would all collapse to just a few instructions.
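To spell out where 198 comes from: the estimate is the sum of the size entries
in the summary, skipping those whose predicate is known to be false at the call
site. The following is a toy model of that arithmetic only, not GCC's
implementation; the struct and function names are made up for illustration, and
the sizes are copied from the dump above.

```c
#include <stddef.h>

/* Toy model: one row per "size:" line of the get_order summary.
   Hypothetical names; this is not GCC's data structure. */
struct size_entry {
    int size;                 /* size value from the summary dump */
    int only_if_not_inlined;  /* 1 if the predicate is "(not inlined)" */
};

static const struct size_entry get_order_summary[] = {
    { 4, 0 }, { 3, 1 }, { 4, 0 }, { 2, 0 }, { 3, 0 }, { 2, 0 },
    { 3, 0 }, { 3, 0 }, { 3, 0 }, { 3, 0 }, { 3, 0 }, { 168, 0 },
};

/* Sum every entry whose predicate is not known to be false.
   The only fact modelled here is whether "not inlined" is false. */
static int estimate_size(int inlined)
{
    size_t n = sizeof get_order_summary / sizeof get_order_summary[0];
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        if (inlined && get_order_summary[i].only_if_not_inlined)
            continue;
        total += get_order_summary[i].size;
    }
    return total;
}
```

With nothing known, estimate_size(0) gives 201 (the self size); with only
"not inlined" known to be false, estimate_size(1) gives 198, which is well
above the limit of 70.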
It is difficult to handle builtin_constant_p correctly at this stage: ipa-prop
misses a lot of known constants, and it is quite possible that the parameter
will be folded to a constant after inlining, so we keep both variants.
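For reference, source of roughly the following shape produces a body with two
variants like the one in the GIMPLE dump. This is a simplified sketch, not the
kernel's code: get_order_sketch and highest_bit are hypothetical names, a page
shift of 12 is assumed from the 4095 test in the dump, and __builtin_clzl
stands in for both the bsrq asm and the long unrolled chain of constant tests.

```c
#define PAGE_SHIFT    12  /* assumed: matches the "<= 4095" test in the dump */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * 8))

/* Index of the highest set bit; n must be nonzero.
   Hypothetical stand-in for the bsrq asm in the dump. */
static inline int highest_bit(unsigned long n)
{
    return BITS_PER_LONG - 1 - __builtin_clzl(n);
}

static inline int get_order_sketch(unsigned long size)
{
    if (__builtin_constant_p(size)) {
        /* Variant intended to fold away for constant arguments. */
        if (size == 0)
            return BITS_PER_LONG - PAGE_SHIFT;
        if (size < (1UL << PAGE_SHIFT))
            return 0;
        return highest_bit(size - 1) - PAGE_SHIFT + 1;
    }

    /* Variant for run-time arguments: only a few instructions. */
    size--;
    size >>= PAGE_SHIFT;
    return size ? highest_bit(size) + 1 : 0;
}
```

Both variants compute the same value, so the result does not depend on which
one the compiler ends up keeping; only the size estimate before inlining sees
both of them at once.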
We could teach ipa-predicates that the if is exclusive, so that only the max of
the two variants is accounted for, but that does not fit the way predicates
work very well. One option would be to take it as a hint that a function with
builtin_constant_p on its parameters really wants to be inlined and to increase
the bounds (I think this would be a good idea to do along with functions having
vector builtins in them), but that would cure only some cases, certainly not
all.
It is always possible to mark functions that are intended to be always inlined
as always_inline.
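A minimal sketch of that, using the GCC always_inline attribute directly. The
function pages_needed and the 4096-byte page size are hypothetical and serve
only as illustration; they are not taken from the kernel.

```c
#define ASSUMED_PAGE_SIZE 4096UL  /* assumption for this example */

/* always_inline asks GCC to inline the function regardless of the
   inliner's size limits, so the builtin_constant_p test is resolved
   in the caller, where the argument may be a known constant. */
static inline __attribute__((always_inline)) unsigned long
pages_needed(unsigned long size)
{
    if (__builtin_constant_p(size) && size == 0)
        return 0;  /* can fold away when the caller passes a constant 0 */

    return (size + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}
```

The attribute bypasses the size heuristics entirely, so it is best kept for
small helpers whose body is expected to collapse after inlining.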
Honza