Overload resolution in GCC (and in C++ in general) is quite slow, and I would like to suggest the following enhancement: look-up (in constant or logarithmic time) for exact matches first, then perform regular overload resolution if necessary. The idea is that struct id_0 {}; void f(id_0); struct id_1; void f(id_1) {}; ... and then calling f(id_x()); should be as fast as void f_0(); void f_1(); ... and then calling f_x(); Now if this could be made to work for things like struct h0 {}; struct h1 : h0 {}; struct id_0 {}; template<class T> void f(id_0, h0<T>); template<class T> void f(id_0, h1<T>); to reduce the set of possible overloads to 2 early (templates inserted to make it non-absolutely orderable), that would be perfect. According to my benchmarks, resolving a function with an exact match on the first argument among 1,000 tags with 10 overloads each takes 30s, while with 1,000 differently named functions of 10 overloads each it takes 100ms.
At some point Jason mentioned that other improvements were in principle possible beyond those already implemented relatively recently. No idea if this has been considered already...
Would you happen to have a reference to those changes or discussed improvements? I'm not testing with a very recent GCC.
I recently made a number of improvements that will be in 4.7.0; see PR 48481. It certainly would be possible to do more optimization: 1) Once we have seen a valid exact match, quickly disqualify any later candidates that are not. 2) Try to find that exact match faster with a hash or splay tree lookup. #2 would only be useful for large overload sets, but #1 could be a minor optimization for all code.
(In reply to comment #3) > I recently made a number of improvements that will be in 4.7.0; see PR 48481. The fn_set change in particular should cut your overload resolution time by 50%. The other changes are to avoid memory bloat, which may or may not affect you.
clang was already 50% faster in my tests, so I guess that will put gcc 4.7 on par with it.
By the way, since apparently you ran actual benchmarks, some testcases (and relative numbers?) would not hurt here...
Created attachment 25349 [details] Python script to generate C++ files with many overloads Syntax is ./generate.py [use single function for everything (1 or 0)] [number of function tags] [number of overloads per function tag] [inheritance depth for the arguments (must be higher than number of overloads per function tag)] [arity of the functions]
Created attachment 25350 [details] Bash script to more easily drive python script Same usage as generate.py, but doesn't take the first argument
Thanks for the scripts!