[Bug c++/95264] Infinite Loop When Compiling Templated C++ code at -O1 and above

rguenther at suse dot de gcc-bugzilla@gcc.gnu.org
Fri May 22 11:34:15 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 22 May 2020, freddie at witherden dot org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264
> 
> --- Comment #6 from Freddie Witherden <freddie at witherden dot org> ---
> (In reply to Richard Biener from comment #3)
> > So with the [[gnu::flatten]] attributes removed -O1 needs 80 seconds to
> > compile and about 3GB of memory, -O2 needs around 2 minutes (same memory),
> > -O3 is the same as -O2.
> > 
> > Maybe instead of [[gnu::flatten]] you want to bump --param inline-unit-growth
> > or --param large-function-growth more moderately in case you can measure an
> > effect on runtime.
> > 
> > Note multiple [[gnu::flatten]] can really exponentially grow program size
> > since it is not apparent which functions might be used from another
> > translation unit until you can use -fwhole-program (single CU program)
> > or -flto (but there [[gnu::flatten]] is applied too early to avoid such
> > growth - something we might want to fix).  Placing things not used from
> > outside in anonymous namespaces might help.
> 
> The [[gnu::flatten]] was added to get GCC's performance in the case of T =
> double on a par with Clang's.  (We don't care about performance with T = bfloat
> as it is just used as a final polishing pass.)  I can understand why GCC does
> not want to inline it in the case of T = bfloat which is a complex type, but
> for T = double the function is basically just a sequence of mov's to populate
> an array.
> 
> As the function is of the form
> 
> for (int i = 0; i < N; i++) // N = template arg
>   for (int j = 0; j < p[i]; j++) // runtime trip count
>       foo(i, ...); // static polymorphism
> 
> with foo being a large switch-case on its first argument the expectation was
> for the compiler to inline foo, unroll the outer loop, and then prune the dead
> cases such that we have something similar to
> 
> for (int j = 0; j < p[0]; j++)
>     foo(0, ...); // inline i = 0 case
> for (int j = 0; j < p[1]; j++)
>     foo(1, ...); // inline i = 1 case
> // ...

Ah, interesting.  This kind of static polymorphism should be handled
by IPA-CP already but it's of course possible we're confused about
a detail in this very testcase.  Honza?
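
To make the expectation concrete, here is a minimal sketch of the shape
described above, next to a hand-written form of the result one would hope
the optimizers reach.  Everything in it is an assumption for illustration
and not taken from the testcase: the body of foo, the pointer type of p,
and the names run_generic, run_one, run_unrolled and self_check.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the real foo: a switch on the first argument.
inline double foo(int i, double x)
{
    switch (i) {
    case 0:  return x + 1.0;
    case 1:  return x * 2.0;
    case 2:  return x - 3.0;
    default: return x;
    }
}

// Shape from the report: N is a template argument, the inner trip
// counts p[i] are only known at run time.
template <int N>
double run_generic(const int* p, double x)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < p[i]; j++)
            x = foo(i, x);
    return x;
}

// One inner loop with the first argument fixed at compile time, so the
// switch in foo sees a constant once foo is inlined.
template <int I>
double run_one(int trips, double x)
{
    for (int j = 0; j < trips; j++)
        x = foo(I, x);
    return x;
}

template <int... Is>
double run_unrolled_impl(const int* p, double x,
                         std::integer_sequence<int, Is...>)
{
    // Braced initializer lists evaluate left to right, so the inner
    // loops run in the same order as in run_generic.
    int expand[] = {0, ((x = run_one<Is>(p[Is], x)), 0)...};
    (void)expand;
    return x;
}

// Hand-written equivalent of "unroll the outer loop".
template <int N>
double run_unrolled(const int* p, double x)
{
    return run_unrolled_impl(p, x, std::make_integer_sequence<int, N>{});
}

// Both forms must compute the same value.
inline void self_check()
{
    const int p[3] = {2, 1, 3};
    assert(run_generic<3>(p, 1.0) == run_unrolled<3>(p, 1.0));
}
```

The second form only spells out the outer-loop unrolling by hand; whether
GCC gets from run_generic to the same code on its own is exactly the
open question here.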

Instead of [[gnu::flatten]] you could put
__attribute__((always_inline)) on the definition of foo, unless the
outline above is simplified so much that this is infeasible in the
real code.  IIRC we do not have something like

  [[gnu::inline]] foo(i, ...);

to force inlining of a specific call, nor [[gnu::noinline]] foo(i, ...);
both of which seem useful.  Not sure whether the C++ syntax would support
such placement of an attribute, of course.
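
A minimal sketch of the always_inline variant, assuming a much simplified
foo.  The names foo_forced, run_forced and self_check_forced and the
function bodies are made up for illustration.  The attribute sits on the
function definition, and the function is also declared inline, which GCC
expects for always_inline functions.

```cpp
#include <cassert>

// Hypothetical stand-in for foo.  always_inline asks for every direct
// call to be inlined regardless of the usual size limits.
__attribute__((always_inline)) inline double foo_forced(int i, double x)
{
    switch (i) {
    case 0:  return x + 1.0;
    case 1:  return x * 2.0;
    default: return x;
    }
}

// Caller with the loop shape from the report; no [[gnu::flatten]] here.
template <int N>
double run_forced(const int* p, double x)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < p[i]; j++)
            x = foo_forced(i, x);
    return x;
}

inline void self_check_forced()
{
    const int p[2] = {3, 2};
    // 0.5 -> +1 three times -> 3.5 -> *2 twice -> 14.0
    assert(run_forced<2>(p, 0.5) == 14.0);
}
```

This only forces inlining of foo itself.  Unrolling the outer loop and
pruning the dead switch cases is still up to the later passes, so the
effect would need to be measured.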

