[PATCH] switch lowering: limit number of cluster attempts

Jakub Jelinek jakub@redhat.com
Wed Sep 23 10:10:29 GMT 2020


On Tue, Sep 22, 2020 at 02:19:12PM +0200, Richard Biener via Gcc-patches wrote:
> On September 22, 2020 1:22:12 PM GMT+02:00, "Martin Liška" <mliska@suse.cz> wrote:
> >Hi.
> >
> >The patch is about a bail out limit that needs to be added to switch
> >lowering.  Currently the algorithm is quadratic and needs some bail
> >out.  I've tested a value of 100K, which corresponds to about 0.2s in
> >the problematic test-case before it's reached.
> >
> >Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> >Ready to be installed?
> 
> OK. Though the default limit looks high? 

This (with the new very low limit of 10000) broke bootstrap on i686-linux
PATH=~/hbin:$PATH i386 ../configure --enable-languages=default,obj-c++,lto,go,brig,d --enable-checking=yes,rtl,extra
(where ~/hbin contains as/ld/gcc/g++ wrappers that ensure 32-bit
compilation).
When compiling insn-extract.c which contains a single large function with
a huge switch in it, we punt on the switch optimization and run out of
memory during DF (sure, latent problem).
I get
cc1plus: out of memory allocating 65536 bytes after a total of 2854985728 bytes
even with --param=max-switch-clustering-attempts=100000 though, or even with
=1000000.  It works fine with =10000000 (and compiles pretty quickly,
1m50sec).  Surprisingly, with =5000000 it takes a huge amount of time in the
switchconv code (over 6 minutes before it dies in DF) - btw, couldn't that
code use wide_ints rather than const_binop to save memory/compile time?
What I see is that the function has slightly fewer than 3500 switch
clusters, so with an O(n^2) algorithm that is definitely more than the
10000 limit (which roughly corresponds to 100 switch clusters).

With =10000000, I see that it finishes find_jump_tables very quickly, with
attempts equal to 5997916 when it exits the loop successfully.

Now, I wonder if it is really a good idea to measure attempts as the
number of can_be_handled calls or something similar.

And
  static unsigned HOST_WIDE_INT get_range (tree low, tree high)
  {
    tree r = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
    if (!tree_fits_uhwi_p (r))
      return 0;

    return tree_to_uhwi (r) + 1;
  }
is a big waste of time and memory.
If all labels are already checked to be INTEGER_CSTs of the same type
(fold_build2 wouldn't work well if they have different types), then
  static unsigned HOST_WIDE_INT get_range (tree low, tree high)
  {
    wide_int w = wi::to_wide (high) - wi::to_wide (low);
    if (wi::neg_p (w, TYPE_SIGN (TREE_TYPE (low))) || !wi::fits_uhwi_p (w))
      return 0;
    return w.to_uhwi () + 1;
  }
should do it without having to repeatedly look up (and/or create) new
INTEGER_CSTs in the hash tables?

As for the DF problem, seems it is the MIR DF problem used by REE that runs
out of memory, perhaps REE should punt on certain kinds of cfgs?

	Jakub
