Bug 50495 - Optimize exact matches in overload resolution
Summary: Optimize exact matches in overload resolution
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: unknown
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2011-09-23 14:47 UTC by Mathias Gaunard
Modified: 2021-11-29 14:27 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-09-23 00:00:00


Attachments
Python script to generate C++ files with many overloads (512 bytes, text/x-python)
2011-09-23 17:56 UTC, Mathias Gaunard
Details
Bash script to more easily drive python script (194 bytes, application/x-shellscript)
2011-09-23 17:58 UTC, Mathias Gaunard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mathias Gaunard 2011-09-23 14:47:16 UTC
Overload resolution in GCC (and in C++ in general) is quite slow, and I would like to suggest the following enhancement: look-up (in constant or logarithmic time) for exact matches first, then perform regular overload resolution if necessary.

The idea is that

    struct id_0 {};
    void f(id_0);

    struct id_1;
    void f(id_1) {};

    ...

and then calling

    f(id_x());

should be as fast as

    void f_0();
    void f_1();
    ...

and then calling

    f_x();

Now if this could be made to work for things like

    struct h0 {};
    struct h1 : h0 {};

    struct id_0 {};
    template<class T> void f(id_0, h0<T>);
    template<class T> void f(id_0, h1<T>);

to reduce the set of possible overloads to 2 early (templates inserted to make it non-absolutely orderable), that would be perfect.

According to my benchmarks, resolving a function with an exact match on the first argument among 1,000 tags with 10 overloads each takes 30s, while with 1,000 differently named functions of 10 overloads each it takes 100ms.
Comment 1 Paolo Carlini 2011-09-23 14:54:15 UTC
At some point Jason mentioned that other improvements were in principle possible beyond those already implemented relatively recently. No idea if this has been considered already...
Comment 2 Mathias Gaunard 2011-09-23 15:46:50 UTC
Would you happen to have a reference to those changes or discussed improvements?
I'm not testing with a very recent GCC.
Comment 3 Jason Merrill 2011-09-23 15:58:12 UTC
I recently made a number of improvements that will be in 4.7.0; see PR 48481.

It certainly would be possible to do more optimization:
1) Once we have seen a valid exact match, quickly disqualify any later candidates that are not.
2) Try to find that exact match faster with a hash or splay tree lookup.

#2 would only be useful for large overload sets, but #1 could be a minor optimization for all code.
Comment 4 Jason Merrill 2011-09-23 16:00:37 UTC
(In reply to comment #3)
> I recently made a number of improvements that will be in 4.7.0; see PR 48481.

The fn_set change in particular should cut your overload resolution time by 50%.  The other changes are to avoid memory bloat, which may or may not affect you.
Comment 5 Mathias Gaunard 2011-09-23 16:38:11 UTC
clang was already 50% faster in my tests, so I guess that will put gcc 4.7 on par with it.
Comment 6 Paolo Carlini 2011-09-23 17:08:57 UTC
By the way, since apparently you ran actual benchmarks, some testcases (and relative numbers?) would not hurt here...
Comment 7 Mathias Gaunard 2011-09-23 17:56:48 UTC
Created attachment 25349 [details]
Python script to generate C++ files with many overloads

Syntax is
./generate.py [use single function for everything (1 or 0)]
              [number of function tags] [number of overloads per function tag]
              [inheritance depth for the arguments (must be higher than number of overloads per function tag)]
              [arity of the functions]
Comment 8 Mathias Gaunard 2011-09-23 17:58:41 UTC
Created attachment 25350 [details]
Bash script to more easily drive python script

Same usage as generate.py, but doesn't take the first argument
Comment 9 Paolo Carlini 2011-09-24 22:44:04 UTC
Thanks for the scripts!