[Bug c++/95264] Infinite Loop When Compiling Templated C++ code at -O1 and above
rguenther at suse dot de
gcc-bugzilla@gcc.gnu.org
Fri May 22 11:34:15 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 22 May 2020, freddie at witherden dot org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264
>
> --- Comment #6 from Freddie Witherden <freddie at witherden dot org> ---
> (In reply to Richard Biener from comment #3)
> > So with the [[gnu::flatten]] attributes removed -O1 needs 80 seconds to
> > compile and about 3GB of memory, -O2 needs around 2 minutes (same memory),
> > -O3 is the same as -O2.
> >
> > Maybe instead of [[gnu::flatten]] you want to bump --param inline-unit-growth
> > or --param large-function-growth more moderately in case you can measure an
> > effect on runtime.
> >
> > Note multiple [[gnu::flatten]] can really exponentially grow program size
> > since it is not apparent which functions might be used from another
> > translation unit until you can use -fwhole-program (single CU program)
> > or -flto (but there [[gnu::flatten]] is applied too early to avoid such
> > growth - something we might want to fix). Placing things not used from
> > outside in anonymous namespaces might help.
>
> The [[gnu::flatten]] was added to get GCC's performance in the case of T =
> double on a par with Clang's. (We don't care about performance with T = bfloat
> as it is just used as a final polishing pass.) I can understand why GCC does
> not want to inline it in the case of T = bfloat which is a complex type, but
> for T = double the function is basically just a sequence of mov's to populate
> an array.
>
> As the function is of the form
>
>   for (int i = 0; i < N; i++)          // N = template arg
>     for (int j = 0; j < p[i]; j++)     // runtime trip count
>       foo(i, ...);                     // static polymorphism
>
> with foo being a large switch-case on its first argument the expectation was
> for the compiler to inline foo, unroll the outer loop, and then prune the dead
> cases such that we have something similar to
>
>   for (int j = 0; j < p[0]; j++)
>     foo(0, ...); // inline i = 0 case
>   for (int j = 0; j < p[1]; j++)
>     foo(1, ...); // inline i = 1 case
>   // ...

Ah, interesting. This kind of static polymorphism should be handled
by IPA-CP already but it's of course possible we're confused about
a detail in this very testcase. Honza?
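
For reference, the per-case specialization the reporter expected can also be
spelled out by hand instead of being left to the optimizer. The following is
only a minimal sketch of that idea, not a description of what IPA-CP does
internally. It assumes C++17, and every name in it (foo_case, step,
run_unrolled, the three arithmetic cases) is hypothetical and not taken from
the reporter's code:

```cpp
#include <cstddef>
#include <utility>

// Hypothetical kernel: the selector is a template parameter, so each
// instantiation contains only its own case and no switch remains.
template <int I, typename T>
T foo_case(T x)
{
    if constexpr (I == 0)
        return x + T(1);
    else if constexpr (I == 1)
        return x * T(2);
    else
        return x - T(3);
}

// One inner loop with a runtime trip count p[I] for a fixed selector I.
template <int I, typename T>
void step(const int* p, T& x)
{
    for (int j = 0; j < p[I]; j++)
        x = foo_case<I>(x);
}

// Expand the outer loop at compile time: one call to step per value of I,
// evaluated left to right by the comma fold.
template <typename T, std::size_t... I>
T run_unrolled_impl(const int* p, T x, std::index_sequence<I...>)
{
    (step<static_cast<int>(I)>(p, x), ...);
    return x;
}

// N is the template argument giving the number of outer iterations.
template <std::size_t N, typename T>
T run_unrolled(const int* p, T x)
{
    return run_unrolled_impl(p, x, std::make_index_sequence<N>{});
}
```

A call such as run_unrolled<3>(p, 1.0), with p pointing at three trip counts,
runs the three specialized inner loops back to back. Whether this compiles or
runs faster than the attribute-based approach would have to be measured.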

Instead of [[gnu::flatten]] you could use the
__attribute__((always_inline)) attribute on the definition of foo,
unless the outline above is simplified so much that this is not
feasible for the real code. IIRC we do not have something like

  [[gnu::inline]] foo(i, ...);

to force inlining of a specific call, nor

  [[gnu::noinline]] foo(i, ...);

to suppress it, both of which seem useful. I am not sure whether the C++
syntax would support placing an attribute there, of course.
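
To make the always_inline suggestion concrete, here is a minimal sketch
modelled on the loop outline quoted above. The body of foo, the function name
run and the use of a plain pointer for p are assumptions made for
illustration; the real code from the report is not shown in this thread:

```cpp
// Hypothetical stand-in for the reporter's foo: a switch on the first
// argument. always_inline asks GCC to inline every direct call to this
// function regardless of its usual size heuristics.
template <typename T>
__attribute__((always_inline)) inline T foo(int i, T x)
{
    switch (i) {
    case 0:  return x + T(1);
    case 1:  return x * T(2);
    case 2:  return x - T(3);
    default: return x;
    }
}

// N is a template argument; p[i] is a trip count known only at run time.
// Once foo is inlined and the outer loop is unrolled, the optimizer can
// prune the switch cases that are dead for each constant i.
template <int N, typename T>
T run(const int* p, T x)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < p[i]; j++)
            x = foo(i, x);
    return x;
}
```

The attribute only guarantees the inlining. Whether the outer loop is then
unrolled and the dead cases are removed still depends on the optimization
level and on N, so the generated code is worth checking.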