This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug ipa/65478] [5 regression] crafty performance regression


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65478

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks, Martin.  I now get Search unduplicated by ipa-cp.  Funnily enough,
however, the fix to the vortex slowdown (caused by a bug in the inliner's LTO
inline_failed bookkeeping) caused differences in the inline decisions, and now we
do not inline FirstOne and LastOne.
This seems to be a very stupid implementation of clz:
  int FirstOne(BITBOARD arg1)                                                   
  {                                                                             
    union doub {                                                                
      unsigned short i[4];                                                      
      BITBOARD d;                                                               
    };                                                                          
#ifndef SPEC_CPU2000                                                            
    register union doub x;                                                      
#else                                                                           
    union doub x;                                                               
#endif /* SPEC_CPU2000 */                                                       
    x.d=arg1;                                                                   
#  if defined(LITTLE_ENDIAN_ARCH)                                               
    if (x.i[3])                                                                 
      return (first_ones[x.i[3]]);                                              
    if (x.i[2])                                                                 
      return (first_ones[x.i[2]]+16);                                           
    if (x.i[1])                                                                 
      return (first_ones[x.i[1]]+32);                                           
    if (x.i[0])                                                                 
      return (first_ones[x.i[0]]+48);                                           
#  endif                                                                        
#  if !defined(LITTLE_ENDIAN_ARCH)                                              
    if (x.i[0])                                                                 
      return (first_ones[x.i[0]]);                                              
    if (x.i[1])                                                                 
      return (first_ones[x.i[1]]+16);                                           
    if (x.i[2])                                                                 
      return (first_ones[x.i[2]]+32);                                           
    if (x.i[3])                                                                 
      return (first_ones[x.i[3]]+48);                                           
#  endif                                                                        
    return(64);                                                                 
  }                                                                             
which unfortunately gets estimated as quite large by the inliner:

Analyzing function body size: FirstOne                                          
                Accounting size:2.00, time:0.00 on new predicate:(not inlined)  

 BB 2 predicate:(true)                                                          
  x.d = arg1_3(D);                                                              
                freq:1.00 size:  1 time:  1                                     
                Accounting size:1.00, time:1.00 on predicate:(true)             
  _5 = x.i[3];                                                                  
                freq:1.00 size:  1 time:  1                                     
                Accounting size:1.00, time:1.00 on predicate:(true)             
  if (_5 != 0)                                                                  
                freq:1.00 size:  2 time:  2                                     
                Accounting size:2.00, time:2.00 on predicate:(true)             

 BB 4 predicate:(true)                                                          
  _9 = x.i[2];                                                                  
                freq:0.61 size:  1 time:  1                                     
                Accounting size:1.00, time:0.61 on predicate:(true)             
  if (_9 != 0)                                                                  
                freq:0.61 size:  2 time:  2                                     
                Accounting size:2.00, time:1.22 on predicate:(true)             
...
So at this point we do not even see that x.d is the value of arg1. If this
were implemented by view_convert_expr, the inliner would at least see that the
return value of FirstOne depends on its parameter.
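
To illustrate the point, here is a minimal sketch (not code from crafty or
from GCC) of FirstOne written with shifts instead of the union, so that every
intermediate value is computed directly from the parameter and nothing goes
through memory. It assumes that BITBOARD is a 64-bit unsigned integer and that
first_ones[v] holds the number of leading zero bits of the 16-bit value v;
that matches how the original uses the table, but the table contents are not
shown in this report. The names init_first_ones, FirstOne_shift,
FirstOne_builtin and check_first_one are hypothetical. __builtin_clzll is the
GCC builtin; its result is undefined for 0, hence the explicit guard.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BITBOARD;            /* assumption: 64-bit unsigned type */

static unsigned char first_ones[65536];

/* Assumed table contents: first_ones[v] = number of leading zero bits in
   the 16-bit value v.  The entry for v == 0 is never read.  */
static void init_first_ones(void)
{
  for (unsigned v = 1; v < 65536; v++) {
    unsigned char n = 0;
    while (!(v & (0x8000u >> n)))
      n++;
    first_ones[v] = n;
  }
}

/* Shift-based variant: each 16-bit word is extracted from arg1 by value,
   independent of host endianness, so the dependence on the parameter is
   explicit.  */
static int FirstOne_shift(BITBOARD arg1)
{
  unsigned short w;

  if ((w = (unsigned short)(arg1 >> 48)) != 0)
    return first_ones[w];
  if ((w = (unsigned short)(arg1 >> 32)) != 0)
    return first_ones[w] + 16;
  if ((w = (unsigned short)(arg1 >> 16)) != 0)
    return first_ones[w] + 32;
  if ((w = (unsigned short)arg1) != 0)
    return first_ones[w] + 48;
  return 64;
}

/* What the function computes, expressed with the GCC builtin.  */
static int FirstOne_builtin(BITBOARD arg1)
{
  return arg1 ? __builtin_clzll(arg1) : 64;
}

/* Self-check: both variants agree on zero and on every single-bit input.  */
static void check_first_one(void)
{
  assert(FirstOne_shift(0) == 64 && FirstOne_builtin(0) == 64);
  for (int bit = 0; bit < 64; bit++) {
    BITBOARD b = (BITBOARD)1 << bit;
    assert(FirstOne_shift(b) == 63 - bit);
    assert(FirstOne_builtin(b) == 63 - bit);
  }
}
```

To try it, call init_first_ones() and then check_first_one() from a main
function. Whether either variant changes the inliner's size estimate is not
something this sketch measures.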

Tree optimizers produce:

Removing basic block 11
FirstOne (BITBOARD arg1)
{
  union doub x;
  int _1;
  short unsigned int _5;
  int _6;
  unsigned char _7;
  int _8;
  short unsigned int _9;
  int _10;
  unsigned char _11;
  int _12;
  int _13;
  short unsigned int _14;
  int _15;
  unsigned char _16;
  int _17;
  int _18;
  short unsigned int _19;
  int _20;
  unsigned char _21;
  int _22;
  int _23;

  <bb 2>:
  x.d = arg1_3(D);
  _5 = x.i[3];
  if (_5 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  _6 = (int) _5;
  _7 = first_ones[_6];
  _8 = (int) _7;
  goto <bb 10>;
...
This is somewhat lame.  Richard, I believed we should synthesize
view_convert_expr in this case?

I am checking how much I need to bump up inline unit growth.

