This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug lto/69678] New: Missed function specialization + partial devirtualization opportunity
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 04 Feb 2016 20:41:38 +0000
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69678
Bug ID: 69678
Summary: Missed function specialization + partial
devirtualization opportunity
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-*
Created attachment 37583
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37583&action=edit
Tarball with source, object, executable
This is a missed optimization that requires a combination of LTO, function
specialization, profiling, partial devirtualization, and inlining, so there are
plenty of places where we can go wrong. However, another compiler manages it,
and the code GCC generates for this simple test is more than 2x slower as a
result.
The attachment contains two source files, disp.c and dispf.c. dispf.c
defines two functions, "one" and "two". disp.c contains one direct call to
"one" and an indirect call (within a loop) that can reach either "one" or
"two". Both functions are always called with the value 3 as input. "two"
ignores this parameter, while "one" has an early exit for the value 3.
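For readers without the tarball, the layout described above might look roughly
like the following. This is only a sketch: the function bodies, the name
"run", the handler_fn typedef, and the way the loop selects its target are all
assumptions made for illustration, and the attached source may differ. The two
files are shown together here for brevity.

```c
/* Hypothetical stand-in for dispf.c; the attached source may differ. */

/* "one" has an early exit when its argument is 3. */
int one(int x)
{
    if (x == 3)
        return 1;               /* taken on every call in this test */

    int sum = 0;                /* slow path, never reached with x == 3 */
    for (int i = 0; i < x; i++)
        sum += i * x;
    return sum;
}

/* "two" ignores its parameter entirely. */
int two(int x)
{
    (void)x;
    return 2;
}

/* Hypothetical stand-in for disp.c. */
typedef int (*handler_fn)(int);

long run(int iterations)
{
    handler_fn table[2] = { one, two };

    long total = one(3);        /* the single direct call to "one" */

    for (int i = 0; i < iterations; i++) {
        /* Assumed selection rule: "two" on every fourth iteration. */
        handler_fn f = table[(i % 4) == 3];
        total += f(3);          /* indirect call, always passing 3 */
    }
    return total;
}
```

In this sketch "one" is the more frequent indirect target, which is the sort
of bias the profile data would need to record.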
The desired behavior with options -O3 -flto -fprofile-use would be:
(1) Specialize each of "one" and "two" for a parameter value of 3;
(2) Perform partial devirtualization based on profile data for the indirect
call, resulting in conditional calls to "one-prime" and "two-prime" in that
order prior to falling back to the indirect call; and
(3) Inline the specialized functions at the three direct call sites.
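Written out by hand, the result of steps (1) through (3) could look something
like the code below. This is an illustration of the intended shape, not
compiler output: the bodies of "one" and "two", the names one_prime,
two_prime, and run_transformed, and the array of targets are all hypothetical.

```c
typedef int (*handler_fn)(int);

/* Original functions (hypothetical bodies, assumed for illustration). */
int one(int x)
{
    if (x == 3)
        return 1;               /* early exit for the value 3 */
    int sum = 0;
    for (int i = 0; i < x; i++)
        sum += i * x;
    return sum;
}

int two(int x)
{
    (void)x;                    /* parameter is ignored */
    return 2;
}

/* Step (1): clones specialized for a parameter value of 3. With the
 * constant propagated, only the early exit of "one" remains. */
static inline int one_prime(void) { return 1; }
static inline int two_prime(void) { return 2; }

/* Steps (2) and (3): the indirect call is guarded by comparisons against
 * the targets seen in the profile, in that order, and each guarded call
 * goes to a specialized clone that is small enough to inline. */
long run_transformed(const handler_fn *targets, int n)
{
    long total = one_prime();           /* original direct call to "one" */

    for (int i = 0; i < n; i++) {
        handler_fn f = targets[i];
        if (f == one)
            total += one_prime();       /* first profiled target */
        else if (f == two)
            total += two_prime();       /* second profiled target */
        else
            total += f(3);              /* fallback: keep the indirect call */
    }
    return total;
}
```

The three direct call sites mentioned in step (3) are the original call to
"one" plus the two guarded calls inside the loop.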
Currently it appears that GCC will do neither step (1) nor step (2), making
step (3) impossible.