[Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2
Changbin Du
changbin.du@huawei.com
Tue Aug 1 12:45:23 GMT 2023
On Mon, Jul 31, 2023 at 08:55:35PM +0800, Changbin Du wrote:
> The result (p-core, no ht, no turbo, performance mode):
>
>                                                O2              O3             PGO
> cycles                              2,581,832,749   8,638,401,568   9,394,200,585
>                                           (1.07s)         (3.49s)         (3.80s)
> instructions                       12,609,600,094  11,827,675,782  12,036,010,638
> branches                            2,303,416,221   2,671,184,833   2,723,414,574
> branch-misses                               0.00%           7.94%           8.84%
> cache-misses                            3,012,613       3,055,722       3,076,316
> L1-icache-load-misses                  11,416,391      12,112,703      11,896,077
> icache_tag.stalls                       1,553,521       1,364,092       1,896,066
> itlb_misses.stlb_hit                        6,856          21,756          22,600
> itlb_misses.walk_completed                 14,430           4,454          15,084
> baclears.any                              131,573         140,355         131,644
> int_misc.clear_resteer_cycles           2,545,915     586,578,125     679,021,993
> machine_clears.count                       22,235          39,671          37,307
> dsb2mite_switches.penalty_cycles        6,985,838      12,929,675       8,405,493
> frontend_retired.any_dsb_miss          28,785,677      28,161,724      28,093,319
> idq.dsb_cycles_any                  1,986,038,896   5,683,820,258   5,971,969,906
> idq.dsb_uops                       11,149,445,952  26,438,051,062  28,622,657,650
> idq.mite_uops                         207,881,687     216,734,007     212,003,064
>
>
> Above data shows:
> o O3/PGO take *2.3x/2.6x* more cycles than O2 respectively (3.3x/3.6x the
> O2 cycle count).
> o O3/PGO reduced instructions by 6.2% and 4.5%. I attribute this to
> aggressive inlining.
> o O3/PGO introduced very bad branch prediction. I will explain it later.
> o Code built with O3 has more iTLB misses that hit the STLB, but far fewer
> completed page walks (STLB misses). This is not what I expected.
> o O3/PGO introduced 78% and 68% more machine clears. This is interesting and
> I don't know why. (subcategory MC is not measured yet)
The machine clears (MCs) are caused by memory-ordering conflicts. They are
attributable to the kernel RCU lock in the I/O path, when ext4 tries to update
its journal.
> o O3 has much higher dsb2mite_switches.penalty_cycles than O2/PGO.
> o idq.mite_uops increased by only 4%/2% for O3/PGO, while idq.dsb_uops grew
> to 2.4x/2.6x of the O2 value. The DSB hits well, so frontend fetching and
> decoding are not a problem for O3/PGO.
> o The other events are mainly affected by the branch mispredictions.
>
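
For anyone who wants to recheck the percentages quoted above, they can be
recomputed from the raw counters with a few lines of Python. This is only a
sketch: the numbers are copied from the table, while the dictionary layout and
the helper name relative_change are my own choices for illustration.

```python
# Raw counters copied from the table above, in the order (O2, O3, PGO).
counters = {
    "cycles":               (2_581_832_749, 8_638_401_568, 9_394_200_585),
    "instructions":         (12_609_600_094, 11_827_675_782, 12_036_010_638),
    "machine_clears.count": (22_235, 39_671, 37_307),
    "idq.mite_uops":        (207_881_687, 216_734_007, 212_003_064),
    "idq.dsb_uops":         (11_149_445_952, 26_438_051_062, 28_622_657_650),
}


def relative_change(name):
    """Return the (O3, PGO) change relative to O2 as fractions (0.78 means +78%)."""
    o2, o3, pgo = counters[name]
    return (o3 - o2) / o2, (pgo - o2) / o2


# Print one line per counter with the change of O3 and PGO versus O2.
for name in counters:
    o3, pgo = relative_change(name)
    print(f"{name:22s} O3 {o3:+8.1%}   PGO {pgo:+8.1%}")
```

A positive value means the counter grew relative to O2, so the cycles row
gives the slowdown and the instructions row comes out negative.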
--
Cheers,
Changbin Du