[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4
jamborm at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Jan 26 18:27:35 GMT 2024
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600
--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
>
> Would you give a try to see if it fixes the regression, I don't currently
> have a znver4 machine for testing.
Unfortunately it does not.
(In reply to Richard Biener from comment #3)
> I think we need to figure out what exactly gets slower (and hope it's not
> scattered all over the place)
I have collected some profiles:
r14-5602-ge6269bb69c0734
# Samples: 516K of event 'cycles:u'
# Event count (approx.): 468008188417
#
# Overhead       Samples  Command          Shared Object                          Symbol
# ........  ............  ...............  .....................................  ............................
#
    13.55%         69886  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
    11.05%         57017  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
     9.24%         47693  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     8.67%         44733  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.84%         24984  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     4.16%         21484  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.30%         17033  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.28%         11770  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     2.10%         10824  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     2.07%         10694  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     2.05%         10616  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.86%          9593  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.70%          8788  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.57%          8077  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.16%          6324  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.14%          5867  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.11%          5738  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
     1.08%          5736  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_var_16x16

r14-5603-g2b59e2b4dff421
# Samples: 550K of event 'cycles:u'
# Event count (approx.): 498834737657
#
# Overhead       Samples  Command          Shared Object                          Symbol
# ........  ............  ...............  .....................................  ............................
#
    18.21%        100151  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
    12.37%         68006  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
     8.51%         46815  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     7.56%         41560  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.53%         24901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     3.92%         21561  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.08%         16963  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.41%         13239  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     1.99%         10931  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     1.96%         10801  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     1.95%         10764  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.56%          8587  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.49%          8166  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.48%          8124  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.09%          6328  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.07%          5901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.04%          5703  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
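
To make the difference between the two profiles easier to see, the per-symbol sample counts can be diffed. The following is a minimal sketch, not part of the original report: the helper name sample_deltas is hypothetical, only the six hottest symbols are copied in, and it assumes both runs were sampled at the same frequency so that raw sample counts are comparable.

```python
# Sample counts copied by hand from the two perf reports above
# (six hottest symbols only).
before = {  # r14-5602-ge6269bb69c0734
    "mc_chroma": 69886,
    "x264_pixel_satd_16x16": 57017,
    "x264_pixel_satd_8x8": 47693,
    "get_ref": 44733,
    "sub16x16_dct": 24984,
    "x264_me_search_ref": 21484,
}
after = {  # r14-5603-g2b59e2b4dff421
    "mc_chroma": 68006,
    "x264_pixel_satd_16x16": 100151,
    "x264_pixel_satd_8x8": 46815,
    "get_ref": 41560,
    "sub16x16_dct": 24901,
    "x264_me_search_ref": 21561,
}


def sample_deltas(before, after):
    """Hypothetical helper: (symbol, delta) pairs for symbols present in
    both profiles, sorted with the largest increase first."""
    common = before.keys() & after.keys()
    deltas = [(sym, after[sym] - before[sym]) for sym in common]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for sym, delta in sample_deltas(before, after):
        print(f"{sym:30s} {delta:+8d}")
```

With the counts above, x264_pixel_satd_16x16 shows by far the largest increase (57017 to 100151 samples), while the other symbols included here change by a few thousand samples at most.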