Bug 109445 - r13-6372-g822a11a1e642e0 regression due to no inlining with -Ofast -march=sapphirerapids -funroll-loops -flto, 541.leela_r performance decreases by 2-3%
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-04-07 05:30 UTC by Jun zhang
Modified: 2023-04-20 09:31 UTC
CC List: 4 users

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
random unlined (82.43 KB, image/png)
2023-04-20 08:12 UTC, Jun zhang
leela_r.wpa.085i.inline log (490.69 KB, application/x-zip-compressed)
2023-04-20 08:16 UTC, Jun zhang
set param_inline_unit_growth to 41 (509 bytes, patch)
2023-04-20 08:24 UTC, Jun zhang

Description Jun zhang 2023-04-07 05:30:19 UTC
r13-6372-g822a11a1e642e0 causes a regression: random() is no longer inlined with -Ofast -march=sapphirerapids -funroll-loops -flto, and 541.leela_r performance decreases by 2-3%.

Below is the inline dump. The left column is before the commit, the right column is after the commit.

  <bb 104> [local count: 210861628]:               <bb 104> [local count: 210861628]:
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
  _466 = s_rng;                                    _466 = s_rng;
----------------------------------------------------------------------------------------------
  _607 = _466;                                <>   _561 = _466;
  _118 = _607;                                     _118 = _561;
----------------------------------------------------------------------------------------------
  _35 = this_72(D)->board.D.5191.m_empty_cnt; =    _35 = this_72(D)->board.D.5191.m_empty_cnt;
  _5 = _35 & 65535;                                _5 = _35 & 65535;
  # DEBUG this => _118                             # DEBUG this => _118
  max_458 = (const uint16) _5;                     max_458 = (const uint16) _5;
  # DEBUG max => max_458                           # DEBUG max => max_458
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
----------------------------------------------------------------------------------------------
  # DEBUG this => _118                        <>
  # DEBUG BEGIN_STMT
  # DEBUG mask => 4294967295
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  _467 = _118->s1;
  _468 = _467 << 13;
  _469 = _467 ^ _468;
  b_470 = _469 >> 19;
  # DEBUG b => b_470
  # DEBUG BEGIN_STMT
  _471 = _467 << 12;
  _472 = _471 & 4294959104;
  _473 = b_470 ^ _472;
  _118->s1 = _473;
  # DEBUG BEGIN_STMT
  _474 = _118->s2;
  _475 = _474 << 2;
  _476 = _474 ^ _475;
  b_477 = _476 >> 25;
  # DEBUG b => b_477
  # DEBUG BEGIN_STMT
  _478 = _474 << 4;
  _479 = _478 & 4294967168;
  _480 = b_477 ^ _479;
  _118->s2 = _480;
  # DEBUG BEGIN_STMT
  _481 = _118->s3;
  _482 = _481 << 3;
  _483 = _481 ^ _482;
  b_484 = _483 >> 11;
  # DEBUG b => b_484
  # DEBUG BEGIN_STMT
  _485 = _481 << 17;
  _486 = _485 & 4292870144;
  _487 = b_484 ^ _486;
  _118->s3 = _487;
  # DEBUG BEGIN_STMT
  _488 = _473 ^ _480;
  _489 = _487 ^ _488;
  _611 = _489;                                     _459 = random (_118);
  # DEBUG this => NULL
  # DEBUG b => NULL
  _459 = _611;
----------------------------------------------------------------------------------------------
  _460 = _459 >> 16;                          =    _460 = _459 >> 16;
  _461 = (unsigned int) max_458;                   _461 = (unsigned int) max_458;
  _462 = _460 * _461;                              _462 = _460 * _461;
  _463 = _462 >> 16;                               _463 = _462 >> 16;
----------------------------------------------------------------------------------------------
  _612 = _463;                                <>   _563 = _463;
----------------------------------------------------------------------------------------------
  # DEBUG this => NULL                        =    # DEBUG this => NULL
  # DEBUG max => NULL                              # DEBUG max => NULL
----------------------------------------------------------------------------------------------
  _120 = _612;                                <>   _120 = _563;
----------------------------------------------------------------------------------------------
  vidx_121 = (int) _120;                      =    vidx_121 = (int) _120;
  # DEBUG vidx => vidx_121                         # DEBUG vidx => vidx_121
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
  _37 = this_72(D)->board.D.5191.m_tomove;         _37 = this_72(D)->board.D.5191.m_tomove;
----------------------------------------------------------------------------------------------
  # DEBUG D#1845 => 1                         <>   # DEBUG D#1824 => 1
----------------------------------------------------------------------------------------------
  # DEBUG this => this_72(D)                  =    # DEBUG this => this_72(D)
  # DEBUG color => _37                             # DEBUG color => _37
  # DEBUG vidx => vidx_121                         # DEBUG vidx => vidx_121
  # DEBUG allow_sa => 1                            # DEBUG allow_sa => 1
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
----------------------------------------------------------------------------------------------
  _495 = s_rng;                               <>   _472 = s_rng;
  if (_495 == 0B)                                  if (_472 == 0B)
----------------------------------------------------------------------------------------------
    goto <bb 105>; [17.43%]                   =      goto <bb 105>; [17.43%]
  else                                             else
    goto <bb 106>; [82.57%]                          goto <bb 106>; [82.57%]


  <bb 106> [local count: 210861628]:               <bb 106> [local count: 210861628]:
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
----------------------------------------------------------------------------------------------
  _497 = s_rng;                               <>   _474 = s_rng;
  _619 = _497;                                     _570 = _474;
  _420 = _619;                                     _420 = _570;
----------------------------------------------------------------------------------------------
  # DEBUG this => _420                        =    # DEBUG this => _420
  # DEBUG max => 2                                 # DEBUG max => 2
  # DEBUG BEGIN_STMT                               # DEBUG BEGIN_STMT
----------------------------------------------------------------------------------------------
  # DEBUG this => _420                        <>
  # DEBUG BEGIN_STMT
  # DEBUG mask => 4294967295
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  _498 = _420->s1;
  _499 = _498 << 13;
  _500 = _498 ^ _499;
  b_501 = _500 >> 19;
  # DEBUG b => b_501
  # DEBUG BEGIN_STMT
  _502 = _498 << 12;
  _503 = _502 & 4294959104;
  _504 = b_501 ^ _503;
  _420->s1 = _504;
  # DEBUG BEGIN_STMT
  _505 = _420->s2;
  _506 = _505 << 2;
  _507 = _505 ^ _506;
  b_508 = _507 >> 25;
  # DEBUG b => b_508
  # DEBUG BEGIN_STMT
  _509 = _505 << 4;
  _510 = _509 & 4294967168;
  _511 = b_508 ^ _510;
  _420->s2 = _511;
  # DEBUG BEGIN_STMT
  _512 = _420->s3;
  _513 = _512 << 3;
  _514 = _512 ^ _513;
  b_515 = _514 >> 11;
  # DEBUG b => b_515
  # DEBUG BEGIN_STMT
  _516 = _512 << 17;
  _517 = _516 & 4292870144;
  _518 = b_515 ^ _517;
  _420->s3 = _518;
  # DEBUG BEGIN_STMT
  _519 = _504 ^ _511;
  _520 = _518 ^ _519;
  _623 = _520;
  # DEBUG this => NULL
  # DEBUG b => NULL
  _490 = _623;                                     _467 = random (_420);
  _491 = _490 >> 16;                               _468 = _467 >> 16;
  _492 = 2;                                        _469 = 2;
  _493 = _491 * _492;                              _470 = _468 * _469;
  _494 = _493 >> 16;                               _471 = _470 >> 16;
  _624 = _494;                                     _572 = _471;
----------------------------------------------------------------------------------------------
  # DEBUG this => NULL                        =    # DEBUG this => NULL
  # DEBUG max => NULL                              # DEBUG max => NULL
----------------------------------------------------------------------------------------------
  _421 = _624;                                <>   _421 = _572;
----------------------------------------------------------------------------------------------
  # DEBUG dir => (int) _421                   =    # DEBUG dir => (int) _421
  if (_421 == 0)                                   if (_421 == 0)
    goto <bb 110>; [50.00%]                          goto <bb 110>; [50.00%]
  else                                             else
    goto <bb 118>; [50.00%]                          goto <bb 118>; [50.00%]
----------------------------------------------------------------------------------------------
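
For readers who do not want to trace the GIMPLE by hand, the statements that appear only in the left column (_467 through _489, and again _498 through _520) can be read back as roughly the following C++. This is a sketch reconstructed from the dump, not the 541.leela_r source: the struct name Random and the method name randint16 are assumptions, while the fields s1, s2, s3, the shift counts, and the mask constants are taken from the dump itself.

```cpp
#include <cstdint>

// Hypothetical reconstruction; only s1/s2/s3 and the constants come from the dump.
struct Random {
    std::uint32_t s1, s2, s3;

    // Body that was inlined before the commit and is a call to random() after it.
    std::uint32_t random() {
        std::uint32_t b;
        b  = ((s1 << 13) ^ s1) >> 19;          // _468, _469, b_470
        s1 = ((s1 << 12) & 4294959104u) ^ b;   // _471, _472, _473
        b  = ((s2 << 2) ^ s2) >> 25;           // _475, _476, b_477
        s2 = ((s2 << 4) & 4294967168u) ^ b;    // _478, _479, _480
        b  = ((s3 << 3) ^ s3) >> 11;           // _482, _483, b_484
        s3 = ((s3 << 17) & 4292870144u) ^ b;   // _485, _486, _487
        return s1 ^ s2 ^ s3;                   // _488, _489
    }

    // Caller-side scaling seen in both columns (_460 .. _463):
    // map the top 16 random bits into the range [0, max).
    std::uint32_t randint16(std::uint16_t max) {
        return ((random() >> 16) * static_cast<std::uint32_t>(max)) >> 16;
    }
};
```

In the right column the whole body of random() collapses to a single call (`_459 = random (_118);`), which is the lost inlining this report is about.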
Comment 1 Andrew Pinski 2023-04-07 05:41:46 UTC
This just seems like bad luck.

Maybe look at the inline dumps to see what the cost difference is.
Comment 2 Jun zhang 2023-04-20 08:12:21 UTC
Created attachment 54888 [details]
random unlined
Comment 3 Andrew Pinski 2023-04-20 08:15:58 UTC
Yep, it was just pure luck: the difference causes the unit growth limit to be hit now.
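
To illustrate why a small size change elsewhere can flip an inlining decision, here is a deliberately simplified model of a unit growth budget. It is not GCC's actual implementation: the function names and all sizes are hypothetical, and growth_percent only plays the role of the inline-unit-growth parameter.

```cpp
#include <cstdint>

// Simplified, hypothetical model: the unit may grow by at most
// growth_percent percent over its size before inlining started.
std::int64_t max_unit_size(std::int64_t initial_size, int growth_percent) {
    return initial_size * (100 + growth_percent) / 100;
}

// An inline candidate is accepted only while the unit stays within budget.
bool can_inline(std::int64_t current_size, std::int64_t callee_growth,
                std::int64_t initial_size, int growth_percent) {
    return current_size + callee_growth
           <= max_unit_size(initial_size, growth_percent);
}
```

With made-up numbers, a unit of initial size 10000 that has already grown to 13990 cannot absorb a callee adding 50 under a 40% budget (14040 > 14000), but can under a 41% budget (14040 <= 14100). A tiny shift in the accumulated size is enough to push one candidate over the edge.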
Comment 4 Jun zhang 2023-04-20 08:16:01 UTC
Created attachment 54889 [details]
leela_r.wpa.085i.inline log
Comment 5 Jun zhang 2023-04-20 08:24:40 UTC
Created attachment 54890 [details]
set param_inline_unit_growth to 41

Hello Andrew,
this patch seems to work!