This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Compile time of Expression Templates in C++
- From: Mike Stump <mrs at apple dot com>
- To: Jochen Haerdtlein <jochen dot haerdtlein at informatik dot uni-erlangen dot de>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 10 Oct 2006 14:48:17 -0700
- Subject: Re: Compile time of Expression Templates in C++
- References: <452B9F55.1090009@informatik.uni-erlangen.de>
On Oct 10, 2006, at 6:25 AM, Jochen Haerdtlein wrote:
I am a PhD student working on the extended use of expression
templates for solving partial differential equations.
Since compile time of huge expression template programs is a severe
problem
I fear that trying to solve that problem will result in orders of
magnitude decrease in compilation speed at best, after you speed it
up, as compared to new competing technologies for larger projects.
The benefit is that those tricks are standard and can be relied upon.
If you're up for innovation, identify those things that slow
compilation to a crawl in traditional C++, design a system, say based
upon a lisp system, that can `compile' that fast. Map that over to a
more static C++ environment, and then implement that in g++ and then
try it out on your problem. If it works, propose the system as
extensions to the C++ language. I think C++ would benefit from it.
Let me give you a concrete example, take:
#include <stdio.h>
template <int I, int J>
class divides {
public:
enum { answer = I%J == 0 || divides<I,J-1>::answer };
};
template<>
template <int I>
class divides<I, 1> {
public:
enum { answer = 0 };
};
template <int I>
class is_prime {
public:
enum { answer = divides<I,(I-1)/2+1>::answer == 0};
};
template <>
class is_prime<1> {
public:
enum { answer = 0 };
};
template <>
class is_prime<2> {
public:
enum { answer = 1 };
};
template <>
class is_prime<3> {
public:
enum { answer = 1 };
};
main() {
#define T(N) printf("Is %d a prime, %s.\n", N, is_prime<N>::answer ?
"yes" : "no");
T(2);
T(3);
T(4);
T(5);
T(6);
T(7);
T(8);
T(9);
T(10);
T(11);
T(12);
T(13);
T(14);
T(47797);
}
for example, it takes around 3m9.947s to compile on my system. The
competition, seen at the end, clocks in at 1862x faster, around
0m0.102s to compile. The problem is that the second isn't guaranteed
to be solved at compile time, while the above is. If this were the
only problem to so fix, one could just change g++ to evaluate all
constant expressions at compile time, no matter what and do all the
required inlining for this, and then one could write the above program
as the one at the end, and have the guarantee, as well as cleaner and
clearer code, benefits all the way around.
Just as there exists an alternative way of doing the above, I'd expect
there is an alternative way of doing what you want to do, that would
be 2-3 orders of magnitude faster than the current compilation speed
of the current solution and hopefully remain 1-2 orders faster _after_
you sped the g++ compilation speed up using more traditional techniques.
If this is secondary to your goal or if you want to remain true to the
standard, you should be able to spot bad things like linear
instantiation scans in the compiler and sped up those things in
straight forward ways and gain 2-10x in speed on some problems (n ==
5-200 type things). Just crank up the problem size and watch for the
n^2 curve and when it dominates, get a gprof at that point, and take a
look at what dominates and then fix it. The usual culprit will be a
linear search of a list (vector) with n things on it, where that is
done n times. When you see something related to templates, say (from
cp-tree.h):
/* For a static member variable template, the
DECL_TEMPLATE_INSTANTIATIONS list contains the explicitly and
implicitly generated instantiations of the variable.
or
/* For a function template, the DECL_TEMPLATE_SPECIALIZATIONS lists
contains all instantiations and specializations of the function,
including partial instantiations.
you know you are about in the right place. If you wanted to
statically look for these things, a quick grep or these two finds
things like:
tree
maybe_process_partial_specialization (tree type)
{
[ ... ]
for (t = DECL_TEMPLATE_INSTANTIATIONS
(most_general_template (CLASSTYPE_TI_TEMPLATE
(type)));
t; t = TREE_CHAIN (t))
and
if (!class_specializations_p
&& TREE_CODE (DECL_TEMPLATE_RESULT (tmpl)) == TYPE_DECL)
sp = &DECL_TEMPLATE_INSTANTIATIONS (tmpl);
else
sp = &DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
head = sp;
/* Iterate through the list until we find a matching template.
*/
while (*sp != NULL_TREE)
{
I'll leave it as an exercise for the reader what algorithm would best
replace these sorts of things, or even if, for any particular one, it
is possible to do better.
Hope that helps.
Anyway, here is the obvious version that compiles faster:
#include <stdio.h>
static int
divides(int I, int J)
{
if (J == 1)
return 0;
return I%J == 0 || divides(I,J-1);
}
static int
is_prime(int I)
{
if (I == 1)
return 0;
if (I == 2 || I == 3)
return 1;
return divides(I,(I-1)/2+1) == 0;
}
main() {
#define T(N) printf("Is %d a prime, %s.\n", N, is_prime(N) ? "yes" :
"no");
T(2);
T(3);
T(4);
T(5);
T(6);
T(7);
T(8);
T(9);
T(10);
T(11);
T(12);
T(13);
T(14);
T(47797);
}