[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

hubicka at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Oct 19 16:13:12 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
get_order unwinds to:

  <bb 2> [local count: 1073741824]:
  _1 = __builtin_constant_p (size_68(D));
  if (_1 != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 71>; [50.00%]

  <bb 3> [local count: 536870913]:
  if (size_68(D) == 0)
    goto <bb 72>; [21.72%]
  else
    goto <bb 4>; [78.28%]

  <bb 4> [local count: 420262548]:
  if (size_68(D) <= 4095)
    goto <bb 72>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 5> [local count: 210131274]:
  _2 = size_68(D) + 18446744073709551615;
  _3 = __builtin_constant_p (_2);
  if (_3 != 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 69>; [50.00%]

  <bb 6> [local count: 105065637]:
  _4 = (signed long) _2;
  if (_4 >= 0)
    goto <bb 7>; [59.00%]
  else
    goto <bb 70>; [41.00%]

... [very long code]

  <bb 69> [local count: 105065637]:
  __asm__("bsrq %1,%q0" : "=r" bitpos_75 : "rm" _2, "0" -1);
  iftmp.1_73 = bitpos_75 + -11;

  <bb 70> [local count: 210131274]:
  # iftmp.1_67 = PHI <52(6), iftmp.1_73(69), 51(7), 50(8), 49(9), 48(10),
47(11), 46(12), 45(13), 44(14), 43(15), 42(16), 41(17), 40(18), 39(19), 38(20),
37(21), 36(22), 35(23), 34(24), 33(25), 32(26), 31(27), 30(28), 29(29), 28(30),
27(31), 26(32), 25(33), 24(34), 23(35), 22(36), 21(37), 20(38), 19(39), 18(40),
17(41), 16(42), 15(43), 14(44), 13(45), 12(46), 11(47), 10(48), 9(49), 8(50),
7(51), 6(52), 5(53), 4(54), 3(55), 2(56), 1(57), 0(58), -1(59), -2(60), -3(61),
-4(62), -5(63), -6(64), -7(65), -8(66), -10(68), -9(67)>
  goto <bb 72>; [100.00%]

  <bb 71> [local count: 536870913]:
  size_69 = size_68(D) + 18446744073709551615;
  size_70 = size_69 >> 12;
  __asm__("bsrq %1,%q0" : "=r" bitpos_72 : "rm" size_70, "0" -1);
  _74 = bitpos_72 + 1;

  <bb 72> [local count: 1073741824]:
  # _66 = PHI <52(3), 0(4), iftmp.1_67(70), _74(71)>
  return _66;
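
For orientation, the dump above corresponds roughly to source of the following shape. This is only a sketch reconstructed from the GIMPLE, not the kernel's actual definition: the names get_order_sketch, fls64_sketch, PAGE_SHIFT_SKETCH and BITS_PER_LONG_SKETCH are made up for illustration, and the GCC/Clang builtin __builtin_clzll stands in for the bsrq asm.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12     /* assumption: 4 KiB pages, matching the "<= 4095" test */
#define BITS_PER_LONG_SKETCH 64  /* assumption: 64-bit unsigned long, as in the dump */

/* Hypothetical stand-in for the bsrq asm: 1-based index of the highest
 * set bit, or 0 when x is 0. */
static inline int fls64_sketch(uint64_t x)
{
    return x ? 64 - __builtin_clzll(x) : 0;
}

static inline int get_order_sketch(uint64_t size)
{
    if (__builtin_constant_p(size)) {
        /* Constant variant: bb 3 and onwards in the dump. */
        if (size == 0)
            return BITS_PER_LONG_SKETCH - PAGE_SHIFT_SKETCH;  /* the 52 in bb 72 */
        if (size < ((uint64_t)1 << PAGE_SHIFT_SKETCH))
            return 0;
        /* The dump shows a long ladder of bit tests here; this sketch
         * computes the same value directly. */
        return fls64_sketch(size - 1) - PAGE_SHIFT_SKETCH;
    }

    /* Non-constant variant: the few instructions of bb 71. */
    size--;
    size >>= PAGE_SHIFT_SKETCH;
    return fls64_sketch(size);
}
```

Both variants return the same value for any given size; only one of them should survive once the compiler knows whether the argument is constant.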

We get this summary:

IPA function summary for get_order/303 inlinable                                
  global time:     8.716289                                                     
  self size:       201                                                          
  global size:     201                                                          
  min size:       4                                                             
  self stack:      0                                                            
  global stack:    0                                                            
    size:4.000000, time:3.000000                                                
    size:3.000000, time:2.000000,  executed if:(not inlined)                    
    size:4.000000, time:2.000000,  executed if:(op0 not constant)               
    size:2.000000, time:0.782800,  executed if:(op0 != 0)                       
    size:3.000000, time:0.391400,  executed if:(op0 > 4095) && (op0 != 0)       
    size:2.000000, time:0.195700,  executed if:(op0 > 4095) && (op0 != 0) &&
(op0 not constant)
    size:3.000000, time:0.173194,  executed if:(op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
    size:3.000000, time:0.086597,  executed if:(op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
    size:3.000000, time:0.043299,  executed if:(op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
    size:3.000000, time:0.021649,  executed if:(op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
    size:3.000000, time:0.010825,  executed if:(op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
    size:168.000000, time:0.010825,  executed if:(op0,(# +
18446744073709551615),(# & 288230376151711744) == 0) && (op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
  calls:                                                                        
    __builtin_constant_p/4546 function body not available                       
      freq:0.20 loop depth: 0 size: 0 time:  0 predicate: (op0 > 4095) && (op0
!= 0)
       op0 points to local or readonly memory                                   
    __builtin_constant_p/4546 function body not available                       
      freq:1.00 loop depth: 0 size: 0 time:  0                                  

and then, at the call sites of get_order, we do not know a constant value for the parameter:

   Estimating body: get_order/303                                               
   Known to be false: not inlined                                               
   size:198 time:6.716289 nonspec time:8.716289 loops with known
iterations:0.000000 known strides:0.000000
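
The arithmetic behind the 201 and 198 figures can be sketched as below. This is a toy model, not the ipa-fnsummary code: the struct, the flag and the function name are hypothetical, the size values are copied from the summary dump, and only the "(not inlined)" predicate is modelled. All other predicates are treated as unknown and therefore counted.

```c
#include <stddef.h>

/* One size entry from the summary dump. */
struct size_entry {
    int size;
    int only_when_not_inlined;  /* 1 for the "executed if:(not inlined)" entry */
};

/* Sizes copied from the dump, in order. */
static const struct size_entry get_order_entries[] = {
    {4, 0}, {3, 1}, {4, 0}, {2, 0}, {3, 0}, {2, 0},
    {3, 0}, {3, 0}, {3, 0}, {3, 0}, {3, 0}, {168, 0},
};

/* Limit quoted in this comment, not read from GCC's parameters. */
#define INLINE_LIMIT_SKETCH 70

/* Sum every entry whose predicate is not known to be false. */
static int estimate_size(int not_inlined_known_false)
{
    int total = 0;
    size_t n = sizeof(get_order_entries) / sizeof(get_order_entries[0]);

    for (size_t i = 0; i < n; i++) {
        if (not_inlined_known_false &&
            get_order_entries[i].only_when_not_inlined)
            continue;
        total += get_order_entries[i].size;
    }
    return total;
}
```

With nothing known, the sum is the self size of 201. Once "not inlined" is known to be false, the 3-instruction entry drops out and 198 remains, well above INLINE_LIMIT_SKETCH.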

The problem here is the size of 198 instructions, while we inline only up to 70
instructions.  Of course, after concluding that the parameter is not constant,
this would all collapse to just a few instructions.

It is difficult to handle builtin_constant_p correctly at this stage: ipa-prop
is missing a lot of known constants, and it is quite possible that the parameter
will be folded to a constant after inlining, so we keep both variants.

We could teach ipa-predicates that the if is exclusive and thus only the max of
both variants should be accounted, but it does not fit the way predicates work
very well.  One option would be to take a hint that a function with
builtin_constant_p on its parameters really wants to be inlined and increase the
bounds (I think this would be a good idea to do along with functions having
vector builtins in them), but that would cure only some cases, certainly not
all.

It is always possible to mark functions that are intended to be always inlined
as always_inline.
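
A minimal sketch of that last suggestion, using GCC's always_inline function attribute (the kernel wraps it as __always_inline). The function names and the 4096 threshold are made up for illustration and are not taken from the kernel.

```c
#include <stdint.h>

/* always_inline asks GCC to inline regardless of the size estimate, so
 * the builtin_constant_p test is resolved in the caller's context. */
static inline __attribute__((always_inline)) int
is_small_sketch(uint64_t size)
{
    if (__builtin_constant_p(size))
        return size < 4096;      /* variant meant to fold to a constant */
    return (size >> 12) == 0;    /* variant for a run-time value */
}

/* Hypothetical caller passing a value that is not known at compile time. */
int caller_sketch(uint64_t runtime_size)
{
    return is_small_sketch(runtime_size);
}
```

This sidesteps the inliner's size heuristics entirely, at the cost of annotating each such function by hand.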
Honza

