This is visible at
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.548.8
and
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.548.8
The two main changes during that time period were jump threading and modref. modref seems more likely to be the culprit, with wrf being Fortran code and even using nested functions and such.
> The two main changes during that time period were jump threading and modref.
> modref seems more likely to be the culprit, with wrf being Fortran code and
> even using nested functions and such.

Yep, I think both are possible. There was also a change enabling ipa-sra on Fortran. There was yet another regression in wrf earlier that I think was related to improving flags propagation. I think those are not caused by modref itself, but by modref triggering some other pass. I will try to look up the regression range (it was before ranger got in).
I filed PR103423. An interesting observation is that both regressions are circa 18% but happen at different time ranges. This one is SPEC2017 WRF while the other is SPEC2006 WRF, and neither reproduces on both. So perhaps modref improved enough in July to regress compile time of 2006 WRF, while for 2017 it needed a couple more months...
I'm going to bisect that.
Started with r12-3903-g0288527f47cec669.
> Started with r12-3903-g0288527f47cec669.

This is a September change (for which we have PR102943); however the regression range was g:1ae8edf5f73ca5c3 (or g:264f061997c0a534 on the second plot) to g:3e09331f6aeaf595, which is the latest regression visible on the graphs, appearing between Nov 12 and Nov 15. The September regression is there too, but it is tracked as PR102943.
Ok, then it started with r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d.
Thanks for bisecting! So not modref, but jump threading this time. Linking with the other PR on WRF and threading (perhaps those are different issues).
There's definitely something in the threader, but I'm not sure it's the cause of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE = -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation and are taking more than 2-3 seconds are (42, 76, and 24). I've distilled -ftime-report for VRP and jump threading, which usually go hand in hand now that VRP2 runs with ranger:

dumping.42: tree VRP                 :  13.70 (  3%)   0.08 (  2%)  13.73 (  3%)    45M (  4%)
dumping.42: backwards jump threading :  26.68 (  5%)   0.00 (  0%)  26.72 (  5%)  3609k (  0%)
dumping.42: TOTAL                    : 524.00          3.31        527.30         1277M

dumping.76: tree VRP                 :  38.30 ( 13%)   0.03 (  2%)  38.31 ( 13%)    19M (  2%)
dumping.76: backwards jump threading :  47.38 ( 17%)   0.01 (  1%)  47.37 ( 16%)  1671k (  0%)
dumping.76: TOTAL                    : 286.03          1.79        287.82         1173M

dumping.24: tree VRP                 :  87.43 (  8%)   0.07 (  2%)  87.53 (  8%)    58M (  3%)
dumping.24: backwards jump threading : 129.81 ( 12%)   0.00 (  0%) 129.81 ( 12%)  8986k (  0%)
dumping.24: TOTAL                    :1042.37          3.58       1045.93         2325M

Threading is usually more expensive than VRP because it tries candidates over and over, but it's not meant to be orders of magnitude slower.

Prior to the bisected patch in r12-5228, we had:

dumping.42: tree VRP                 :  14.58 (  3%)   0.07 (  2%)  14.62 (  3%)    45M (  4%)
dumping.42: backwards jump threading :  13.88 (  3%)   0.00 (  0%)  13.89 (  3%)  3609k (  0%)
dumping.42: TOTAL                    : 484.12          3.06        487.18         1277M

dumping.76: tree VRP                 :  37.68 ( 13%)   0.04 (  2%)  37.79 ( 13%)    19M (  2%)
dumping.76: backwards jump threading :  45.50 ( 15%)   0.03 (  2%)  45.52 ( 15%)  1671k (  0%)
dumping.76: TOTAL                    : 293.74          1.81        295.55         1173M

dumping.24: tree VRP                 :  94.27 (  9%)   0.11 (  3%)  94.39 (  9%)    58M (  3%)
dumping.24: backwards jump threading : 102.63 ( 10%)   0.02 (  0%) 102.67 ( 10%)  8986k (  0%)
dumping.24: TOTAL                    :1021.66          4.28       1025.92         2325M

So at least for ltrans42, there's a big slowdown with this patch. Before, threading was 4.80% faster than VRP, whereas now it's 94.7% slower.

I have a patch for the above slowdown, but I wouldn't characterize the above difference as a "compile hog". When I add up the 3 ltrans unit totals (which are basically the entire compilation), the difference is a 3% slowdown. If this PR is for a larger than 3-4% slowdown, I think we should look elsewhere. I could be wrong though ;-).
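(For reference, those percentages follow directly from the ltrans42 wall-clock numbers above: before the patch, threading vs. VRP is (14.58 - 13.88) / 14.58 ≈ 4.8% faster; after it, (26.68 - 13.70) / 13.70 ≈ 94.7% slower.)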
Created attachment 51896 [details]
untested patch

The threading slowdown here is due to the ssa_global_cache temporary. It doesn't look like ssa_global_cache was meant to be a lightweight temporary cache ;-). We can avoid the temporary altogether by using the bitmap already used to determine if a cache entry is available.

With this (untested) patch the ltrans42 unit is back to:

 tree VRP                 :  13.70 (  3%)   0.04 (  2%)  13.71 (  3%)    45M (  4%)
 backwards jump threading :  13.22 (  3%)   0.01 (  0%)  13.26 (  3%)  3609k (  0%)
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:54ebec35abec09a24b47b997172622ca0d8e2318

commit r12-5694-g54ebec35abec09a24b47b997172622ca0d8e2318
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Mon Nov 29 14:49:59 2021 +0100

    path solver: Use only one ssa_global_cache.

    We're using a temporary range cache while computing ranges for PHIs
    to make sure the real cache doesn't get set until all PHIs are
    computed.  With the ltrans beast in LTO mode this causes undue
    overhead.

    Since we already have a bitmap to indicate whether there's a cache
    entry, we can avoid the extra cache object by clearing it while PHIs
    are being calculated.

    gcc/ChangeLog:

            PR tree-optimization/103409
            * gimple-range-path.cc (path_range_query::compute_ranges_in_phis):
            Do all the work with just one ssa_global_cache.
            * gimple-range-path.h: Remove m_tmp_phi_cache.
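To make the approach in the commit concrete, here is a minimal, self-contained sketch of the pattern (the class and member names below are invented for illustration; this is not the actual code in gimple-range-path.cc): a single cache whose "has entry" bitmap stays cleared while a batch of entries is being computed, so lookups made during the batch behave as if a separate temporary cache were in use, and the results are published by setting the bits only after the whole batch is done.

#include <cstdio>
#include <vector>

struct cache
{
  std::vector<int> values;      // stored values
  std::vector<bool> has_entry;  // "bitmap": is values[i] valid/visible?

  explicit cache (size_t n) : values (n, 0), has_entry (n, false) {}

  bool get (size_t i, int &out) const
  {
    if (!has_entry[i])
      return false;             // treated as "no cache entry"
    out = values[i];
    return true;
  }

  void set (size_t i, int v)
  {
    values[i] = v;
    has_entry[i] = true;
  }
};

// Compute a batch of entries.  The old approach would fill a temporary
// cache and copy it into 'c' afterwards; here we write directly into
// 'c.values' but only flip the visibility bits once the batch is done.
static void
compute_batch (cache &c, const std::vector<size_t> &batch)
{
  for (size_t i : batch)
    {
      int other = 0;
      // Entries computed earlier in this loop are still reported as
      // missing, exactly as if they lived in a separate temporary cache.
      bool visible = c.get ((i + 1) % c.values.size (), other);
      c.values[i] = visible ? other + 1 : (int) (i * 10);  // stored, not yet visible
    }
  for (size_t i : batch)
    c.has_entry[i] = true;      // publish the whole batch at once
}

int
main ()
{
  cache c (4);
  c.set (3, 99);                // a pre-existing, visible entry
  compute_batch (c, {0, 1, 2});
  for (size_t i = 0; i < 4; i++)
    {
      int v;
      if (c.get (i, v))
        std::printf ("entry %zu = %d\n", i, v);
    }
  return 0;
}

The trade-off is the one the commit message describes: the second cache object (and its allocations) disappears, at the cost of a second pass over the batch to flip the visibility bits.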
I've fixed the threading slowdown. Can someone verify and close this PR if all the slowdown has been accounted for? If not, then someone needs to explore any slowdown unrelated to the threader.
> I've fixed the threading slowdown. Can someone verify and close this PR if all
> the slowdown has been accounted for? If not, then someone needs to explore any
> slowdown unrelated to the threader.

The plots linked from the PR are live, so they should come back to the original speed (so far they have not).

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.548.8
and
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.548.8
(In reply to hubicka from comment #13)
> > I've fixed the threading slowdown. Can someone verify and close this PR if all
> > the slowdown has been accounted for? If not, then someone needs to explore any
> > slowdown unrelated to the threader.
> The plots linked from the PR are live, so they should come back to
> the original speed (so far they have not).
>
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.548.8
> and
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.548.8

There's now a big drop for the first graph, and a small drop for the second one.
The first graph seems to be back to normal, and I think the second is within the noise range. If not, I will try to figure out what happens here (one is -O2 and the other -Ofast, so there may be something in that).