[Bug ipa/65478] [5 regression] crafty performance regression
hubicka at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sun Mar 29 14:15:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65478
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenther at suse dot de
--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks, Martin. I now get Search unduplicated by ipa-cp. Funnily enough,
however, the fix for the vortex slowdown (caused by a bug in the inliner's LTO
inline_failed bookkeeping) changed the inline decisions, and now we
do not inline FirstOne and LastOne.
This seems to be a rather stupid implementation of clz (count leading zeros) built on a 16-bit lookup table:
int FirstOne(BITBOARD arg1)
{
  union doub {
    unsigned short i[4];
    BITBOARD d;
  };
#ifndef SPEC_CPU2000
  register union doub x;
#else
  union doub x;
#endif /* SPEC_CPU2000 */
  x.d=arg1;
# if defined(LITTLE_ENDIAN_ARCH)
  if (x.i[3])
    return (first_ones[x.i[3]]);
  if (x.i[2])
    return (first_ones[x.i[2]]+16);
  if (x.i[1])
    return (first_ones[x.i[1]]+32);
  if (x.i[0])
    return (first_ones[x.i[0]]+48);
# endif
# if !defined(LITTLE_ENDIAN_ARCH)
  if (x.i[0])
    return (first_ones[x.i[0]]);
  if (x.i[1])
    return (first_ones[x.i[1]]+16);
  if (x.i[2])
    return (first_ones[x.i[2]]+32);
  if (x.i[3])
    return (first_ones[x.i[3]]+48);
# endif
  return(64);
}
which, unfortunately, the inliner estimates as quite large:
Analyzing function body size: FirstOne
Accounting size:2.00, time:0.00 on new predicate:(not inlined)
BB 2 predicate:(true)
x.d = arg1_3(D);
freq:1.00 size: 1 time: 1
Accounting size:1.00, time:1.00 on predicate:(true)
_5 = x.i[3];
freq:1.00 size: 1 time: 1
Accounting size:1.00, time:1.00 on predicate:(true)
if (_5 != 0)
freq:1.00 size: 2 time: 2
Accounting size:2.00, time:2.00 on predicate:(true)
BB 4 predicate:(true)
_9 = x.i[2];
freq:0.61 size: 1 time: 1
Accounting size:1.00, time:0.61 on predicate:(true)
if (_9 != 0)
freq:0.61 size: 2 time: 2
Accounting size:2.00, time:1.22 on predicate:(true)
...
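Judging from the numbers in the dump above, each statement's size is accounted as is, while its time is scaled by the frequency of its basic block (for example, `if (_9 != 0)` in BB 4 has time 2 at freq 0.61 and is accounted as 1.22). Summing only the entries shown before the elision on predicate (true) gives the partial totals below; the 2.00 size on the "(not inlined)" predicate comes on top, and the full body estimate is in the elided part of the dump.

```latex
\begin{aligned}
\text{size} &= 1 + 1 + 2 + 1 + 2 = 7,\\
\text{time} &= 1.00\,(1 + 1 + 2) + 0.61\,(1 + 2) = 4 + 1.83 = 5.83.
\end{aligned}
```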
So at this point we do not even see that x.d is the value of arg1. If this
were implemented via VIEW_CONVERT_EXPR, the inliner would at least see that the
return value of FirstOne depends on its parameter.
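To make that dependence concrete, here is a minimal sketch checking that FirstOne's result is purely a function of arg1: a 64-bit count of leading zeros, with 64 for a zero argument. The contents of first_ones are an assumption (the report does not show the table): we assume first_ones[v] is the number of leading zeros of the 16-bit value v, which is what the +16/+32/+48 offsets imply. The helper names first_one_table, first_one_clz and first_one_self_check are hypothetical, and the byte-order branch uses GCC's predefined __BYTE_ORDER__ macro in place of LITTLE_ENDIAN_ARCH.

```c
#include <assert.h>

typedef unsigned long long BITBOARD; /* assumption: 64-bit bitboard */

/* Hypothetical reconstruction of crafty's table: first_ones[v] is the
   number of leading zero bits of the 16-bit value v (16 when v == 0). */
static unsigned char first_ones[65536];
static int first_ones_ready;

static void init_first_ones(void)
{
  first_ones[0] = 16;
  for (unsigned v = 1; v < 65536; v++) {
    unsigned n = 0;
    while (!(v & (0x8000u >> n)))  /* scan down from bit 15 */
      n++;
    first_ones[v] = (unsigned char) n;
  }
  first_ones_ready = 1;
}

/* FirstOne as quoted in the report, with the byte-order branch chosen
   by GCC's predefined __BYTE_ORDER__ instead of LITTLE_ENDIAN_ARCH. */
static int first_one_table(BITBOARD arg1)
{
  union doub { unsigned short i[4]; BITBOARD d; } x;
  if (!first_ones_ready)
    init_first_ones();
  x.d = arg1;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  if (x.i[3]) return first_ones[x.i[3]];
  if (x.i[2]) return first_ones[x.i[2]] + 16;
  if (x.i[1]) return first_ones[x.i[1]] + 32;
  if (x.i[0]) return first_ones[x.i[0]] + 48;
#else
  if (x.i[0]) return first_ones[x.i[0]];
  if (x.i[1]) return first_ones[x.i[1]] + 16;
  if (x.i[2]) return first_ones[x.i[2]] + 32;
  if (x.i[3]) return first_ones[x.i[3]] + 48;
#endif
  return 64;
}

/* The same result as a count of leading zeros.  __builtin_clzll(0) is
   undefined, so the zero case keeps FirstOne's answer of 64. */
static int first_one_clz(BITBOARD arg1)
{
  return arg1 ? __builtin_clzll(arg1) : 64;
}

/* Spot-check that both versions agree on a few bitboards. */
void first_one_self_check(void)
{
  static const BITBOARD samples[] = {
    0, 1, 0x8000000000000000ULL, 0x00000000ffff0000ULL,
    0x0000100000000000ULL, 0x123456789abcdef0ULL
  };
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    assert(first_one_table(samples[k]) == first_one_clz(samples[k]));
}
```

The sketch assumes unsigned short is 16 bits and BITBOARD is 64 bits, as on common GCC targets; nothing here says how crafty itself defines them.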
Tree optimizers produce:
Removing basic block 11
FirstOne (BITBOARD arg1)
{
union doub x;
int _1;
short unsigned int _5;
int _6;
unsigned char _7;
int _8;
short unsigned int _9;
int _10;
unsigned char _11;
int _12;
int _13;
short unsigned int _14;
int _15;
unsigned char _16;
int _17;
int _18;
short unsigned int _19;
int _20;
unsigned char _21;
int _22;
int _23;
<bb 2>:
x.d = arg1_3(D);
_5 = x.i[3];
if (_5 != 0)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
_6 = (int) _5;
_7 = first_ones[_6];
_8 = (int) _7;
goto <bb 10>;
...
This is somewhat lame. Richard, I believed we should synthesize a
VIEW_CONVERT_EXPR in this case?
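VIEW_CONVERT_EXPR is internal to GCC's middle end and has no C spelling, so the following is only a source-level analogy, not what GCC generates. Each x.i[k] load in FirstOne reads back a fixed 16-bit piece of the value just stored from arg1, so it could in principle be expressed as a bit extraction of arg1 without going through memory. The helpers lane_via_union and lane_via_shift are hypothetical names; the sketch assumes GCC's predefined __BYTE_ORDER__ macros to match lane numbering to the union layout.

```c
#include <stdint.h>

/* Lane k read the way FirstOne reads it: store the 64-bit value into a
   union in memory, then load one 16-bit piece back. */
static uint16_t lane_via_union(uint64_t v, int k)
{
  union { uint16_t i[4]; uint64_t d; } x;
  x.d = v;
  return x.i[k];
}

/* The same bits computed directly from v, so the result is visibly a
   function of the argument.  Lane numbering follows host byte order,
   matching the union layout above. */
static uint16_t lane_via_shift(uint64_t v, int k)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return (uint16_t) (v >> (16 * k));
#else
  return (uint16_t) (v >> (16 * (3 - k)));
#endif
}
```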
I am checking how much I need to bump up the inline-unit-growth limit.