Bug 100393 - [9/10/11 Regression] Very slow compilation of switch statement with thousands of cases
Summary: [9/10/11 Regression] Very slow compilation of switch statement with thousands...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.2.0
: P2 normal
Target Milestone: 9.5
Assignee: Martin Liška
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2021-05-03 11:46 UTC by Dannii Willis
Modified: 2021-11-05 16:10 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 11.2.0, 12.0, 8.4.0
Known to fail: 9.3.1
Last reconfirmed: 2021-05-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dannii Willis 2021-05-03 11:46:51 UTC
I have a switch statement with almost 11,000 cases (produced by a decompiler), which is taking almost 5 minutes to compile with GCC 10.2. Helpful people on Stack Overflow suggested that it was a regression, that it only took 3 seconds to compile with GCC 8.4, and for comparison, only 1.3 seconds in Clang. I haven't been able to confirm those results yet, but I'll look into doing so.

SO discussion: https://stackoverflow.com/q/67363813/2854284

Preprocessed source: https://gist.githubusercontent.com/curiousdannii/9476375ff3ae22c403ce2a8132e6a5dc/raw/568f519f2f1b599e98c514f3609a4968d4153eed/functions_unsafe.i

The `-ftime-report -ftime-report-details` results (full results in the SO page) show that the slow part is in "tree switch lowering".

I also tried compiling it with `-fno-jump-tables`, which, rather than helping, made it much worse, taking over 33 minutes to compile.
Comment 1 Dannii Willis 2021-05-03 12:04:18 UTC
Okay, I've confirmed the regression myself, using functions_unsafe.i:

gcc-8: real	0m11.450s
gcc-10: real	4m46.472s

And for comparison
clang: real	0m0.711s
Comment 2 Richard Biener 2021-05-03 13:37:01 UTC
Confirmed at -O0 and -O1+.
Comment 3 Richard Biener 2021-05-03 13:48:51 UTC
Samples: 847K of event 'cycles:u', Event count (approx.): 839745061761                                                  
Overhead       Samples  Command  Shared Object     Symbol                                                               
  95.05%        804298  cc1      cc1               [.] tree_switch_conversion::jump_table_cluster::can_be_handled
   1.69%         14493  cc1      cc1               [.] tree_switch_conversion::cluster::get_range
   0.86%          7291  cc1      cc1               [.] optimize_function_for_size_p

looks like there's obvious quadraticness here when doing the summing:

bool
jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
                                    unsigned start, unsigned end)
{
..
  for (unsigned i = start; i <= end; i++)
    {
      simple_cluster *sc = static_cast<simple_cluster *> (clusters[i]);
      comparison_count += sc->m_range_p ? 2 : 1;
    }

and we're almost trying all possible clusters in jump_table_cluster::find_jump_tables

Now, there's the possibility of doing a quick guess before doing this
loop, once via using (end - start) and once via using 2 * (end - start).
Only if the first can be handled and the second not we have to compute.

Martin?
Comment 4 Richard Biener 2021-05-03 13:58:48 UTC
I guess that's already done, so it has to be fixed in other ways, like by
keeping the partial sum when decreasing the size in

      for (unsigned j = 0; j < i; j++)
        {
          if (min[j].m_count + 1 < min[i].m_count
              && can_be_handled (clusters, j, i - 1))
            min[i] = min_cluster_item (min[j].m_count + 1, j, INT_MAX);
        }

maybe "simply" binary search j in [0, i-1] rather than doing a linear
search here.
Comment 5 Martin Liška 2021-05-10 12:22:06 UTC
Mine.
Comment 6 Richard Biener 2021-06-01 08:20:49 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 7 Martin Liška 2021-08-13 15:25:16 UTC
> Now, there's the possibility of doing a quick guess before doing this
> loop, once via using (end - start) and once via using 2 * (end - start).
> Only if the first can be handled and the second not we have to compute.

Yeah, we already to that, problem with this test-case is that it's density is very close to the max growth limit (8x).
Comment 8 Martin Liška 2021-08-13 15:26:01 UTC
(In reply to Richard Biener from comment #4)
> I guess that's already done, so it has to be fixed in other ways, like by
> keeping the partial sum when decreasing the size in

I can confirm the decreasing works pretty well and we get to 6s with -O0. Hope it's a reasonable fix.
Comment 9 GCC Commits 2021-08-16 11:38:12 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:c517cf2e685e2903b591d63c1034ff9726cb3822

commit r12-2928-gc517cf2e685e2903b591d63c1034ff9726cb3822
Author: Martin Liska <mliska@suse.cz>
Date:   Fri Aug 13 17:22:35 2021 +0200

    Speed up jump table switch detection.
    
            PR tree-optimization/100393
    
    gcc/ChangeLog:
    
            * tree-switch-conversion.c (group_cluster::dump): Use
              get_comparison_count.
            (jump_table_cluster::find_jump_tables): Pre-compute number of
            comparisons and then decrement it. Cache also max_ratio.
            (jump_table_cluster::can_be_handled): Change signature.
            * tree-switch-conversion.h (get_comparison_count): New.
Comment 10 Martin Liška 2021-08-16 11:38:57 UTC
Fixed on master so far.
Comment 11 GCC Commits 2021-11-05 16:09:39 UTC
The releases/gcc-11 branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:95c7ef9fbcf2c4b760c296fae597f093b6ee9aaa

commit r11-9209-g95c7ef9fbcf2c4b760c296fae597f093b6ee9aaa
Author: Martin Liska <mliska@suse.cz>
Date:   Fri Aug 13 17:22:35 2021 +0200

    Speed up jump table switch detection.
    
            PR tree-optimization/100393
    
    gcc/ChangeLog:
    
            * tree-switch-conversion.c (group_cluster::dump): Use
              get_comparison_count.
            (jump_table_cluster::find_jump_tables): Pre-compute number of
            comparisons and then decrement it. Cache also max_ratio.
            (jump_table_cluster::can_be_handled): Change signature.
            * tree-switch-conversion.h (get_comparison_count): New.
    
    (cherry picked from commit c517cf2e685e2903b591d63c1034ff9726cb3822)
Comment 12 Martin Liška 2021-11-05 16:10:23 UTC
Fixed on GCC 11, not planning backporting more.