I have a switch statement with almost 11,000 cases (produced by a decompiler), which is taking almost 5 minutes to compile with GCC 10.2. Helpful people on Stack Overflow suggested that it was a regression, that it only took 3 seconds to compile with GCC 8.4, and for comparison, only 1.3 seconds in Clang. I haven't been able to confirm those results yet, but I'll look into doing so. SO discussion: https://stackoverflow.com/q/67363813/2854284 Preprocessed source: https://gist.githubusercontent.com/curiousdannii/9476375ff3ae22c403ce2a8132e6a5dc/raw/568f519f2f1b599e98c514f3609a4968d4153eed/functions_unsafe.i The `-ftime-report -ftime-report-details` results (full results in the SO page) show that the slow part is in "tree switch lowering". I also tried compiling it with `-fno-jump-tables`, which, rather than helping, made it much worse, taking over 33 minutes to compile.
Okay, I've confirmed the regression myself, using functions_unsafe.i: gcc-8: real 0m11.450s gcc-10: real 4m46.472s And for comparison clang: real 0m0.711s
Confirmed at -O0 and -O1+.
Samples: 847K of event 'cycles:u', Event count (approx.): 839745061761 Overhead Samples Command Shared Object Symbol 95.05% 804298 cc1 cc1 [.] tree_switch_conversion::jump_table_cluster::can_be_handled 1.69% 14493 cc1 cc1 [.] tree_switch_conversion::cluster::get_range 0.86% 7291 cc1 cc1 [.] optimize_function_for_size_p looks like there's obvious quadraticness here when doing the summing: bool jump_table_cluster::can_be_handled (const vec<cluster *> &clusters, unsigned start, unsigned end) { .. for (unsigned i = start; i <= end; i++) { simple_cluster *sc = static_cast<simple_cluster *> (clusters[i]); comparison_count += sc->m_range_p ? 2 : 1; } and we're almost trying all possible clusters in jump_table_cluster::find_jump_tables Now, there's the possibility of doing a quick guess before doing this loop, once via using (end - start) and once via using 2 * (end - start). Only if the first can be handled and the second not we have to compute. Martin?
I guess that's already done, so it has to be fixed in other ways, like by keeping the partial sum when decreasing the size in for (unsigned j = 0; j < i; j++) { if (min[j].m_count + 1 < min[i].m_count && can_be_handled (clusters, j, i - 1)) min[i] = min_cluster_item (min[j].m_count + 1, j, INT_MAX); } maybe "simply" binary search j in [0, i-1] rather than doing a linear search here.
Mine.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
> Now, there's the possibility of doing a quick guess before doing this > loop, once via using (end - start) and once via using 2 * (end - start). > Only if the first can be handled and the second not we have to compute. Yeah, we already to that, problem with this test-case is that it's density is very close to the max growth limit (8x).
(In reply to Richard Biener from comment #4) > I guess that's already done, so it has to be fixed in other ways, like by > keeping the partial sum when decreasing the size in I can confirm the decreasing works pretty well and we get to 6s with -O0. Hope it's a reasonable fix.
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>: https://gcc.gnu.org/g:c517cf2e685e2903b591d63c1034ff9726cb3822 commit r12-2928-gc517cf2e685e2903b591d63c1034ff9726cb3822 Author: Martin Liska <mliska@suse.cz> Date: Fri Aug 13 17:22:35 2021 +0200 Speed up jump table switch detection. PR tree-optimization/100393 gcc/ChangeLog: * tree-switch-conversion.c (group_cluster::dump): Use get_comparison_count. (jump_table_cluster::find_jump_tables): Pre-compute number of comparisons and then decrement it. Cache also max_ratio. (jump_table_cluster::can_be_handled): Change signature. * tree-switch-conversion.h (get_comparison_count): New.
Fixed on master so far.
The releases/gcc-11 branch has been updated by Martin Liska <marxin@gcc.gnu.org>: https://gcc.gnu.org/g:95c7ef9fbcf2c4b760c296fae597f093b6ee9aaa commit r11-9209-g95c7ef9fbcf2c4b760c296fae597f093b6ee9aaa Author: Martin Liska <mliska@suse.cz> Date: Fri Aug 13 17:22:35 2021 +0200 Speed up jump table switch detection. PR tree-optimization/100393 gcc/ChangeLog: * tree-switch-conversion.c (group_cluster::dump): Use get_comparison_count. (jump_table_cluster::find_jump_tables): Pre-compute number of comparisons and then decrement it. Cache also max_ratio. (jump_table_cluster::can_be_handled): Change signature. * tree-switch-conversion.h (get_comparison_count): New. (cherry picked from commit c517cf2e685e2903b591d63c1034ff9726cb3822)
Fixed on GCC 11, not planning backporting more.