This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug c++/82405] Function not inlined for switch and suboptimal assembly is generated
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 03 Oct 2017 16:52:43 +0000
- Subject: [Bug c++/82405] Function not inlined for switch and suboptimal assembly is generated
- Auto-submitted: auto-generated
- References: <bug-82405-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82405
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So it seems LLVM may perform a more advanced switch conversion (switchconv) in
this case, at least judging from the assembly snippet in #c0. We look solely
for PHIs whose arguments are SSA_NAMEs initialized to constants in the cases,
whereas optimizing this without -ffast-math would require handling at least a
couple of stmts whose operands match except for one constant argument that
changes from case to case:
<L6> [20.00%]:
_5 = r_4(D) * 4.0e+0;
_6 = r_4(D) * _5;
goto <bb 8>; [100.00%]
<L7> [20.00%]:
_7 = r_4(D) * 3.141500000000000181188397618825547397136688232421875e+0;
_8 = r_4(D) * _7;
goto <bb 8>; [100.00%]
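For contrast, the form the current pass does look for is a switch whose cases only select constants, so the PHI at the merge point has constant arguments. The C++ sketch below is hypothetical (the Shape enum, the function names and the reduction to two cases are assumptions, not the reporter's code). It pairs such a switch with a hand-written table lookup of the kind switch conversion emits; whether GCC actually converts a given switch also depends on the pass's own heuristics.

```cpp
#include <cstddef>

// Hypothetical example, not code from the bug report.
enum class Shape { Square, Circle };

// Each case yields a constant, so the PHI at the merge point has constant
// arguments: the pattern switch conversion already looks for.
double factor_switch(Shape s) {
  switch (s) {
    case Shape::Square: return 4.0;
    case Shape::Circle: return 3.1415;
  }
  return 0.0;
}

// Roughly the shape of the converted code: a range check plus a load from a
// static table indexed by the case value.
double factor_table(Shape s) {
  static constexpr double kFactor[] = {4.0, 3.1415};
  const std::size_t i = static_cast<std::size_t>(s);
  return i < sizeof kFactor / sizeof kFactor[0] ? kFactor[i] : 0.0;
}
```

In the dump, by contrast, the PHI merges _6 and _8, which are results of multiplications rather than constants, so the matching described above stops there.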
So, in the above case we'd start from the PHI with the _6 and _8 arguments and,
because their def stmts aren't assignments from constants, notice that each is
a binary (or unary or ternary) assign where one of the operands is identical
(r_4(D)) while the other is another SSA_NAME defined in the case. We'd follow
that SSA_NAME and find another assign where one operand is again the same and
the other is a constant. Thus, we'd build a table with the 4.0e+0 and
3.141500000000000181188397618825547397136688232421875e+0 constants, and after
the load from the table emit:
_21 = r_4(D) * value_loaded_from_table_20;
_22 = r_4(D) * _21;
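The rewrite just described can be sketched at the source level. This is a hypothetical, hand-written C++ illustration (the enum, function names, table name and two-case reduction are assumptions): only the varying constant is loaded from a table, and the shared multiplications are emitted once after the load, in the same order as in the original cases.

```cpp
#include <cstddef>

// Hypothetical reduction of the code in the dump (two cases instead of five).
enum class Shape { Square, Circle };

// Original form: each case computes r * (r * K) with its own constant K.
double area_switch(Shape s, double r) {
  switch (s) {
    case Shape::Square: return r * (r * 4.0);
    case Shape::Circle: return r * (r * 3.1415);
  }
  return 0.0;
}

// Sketch of the proposed rewrite: table load for K, then the common stmts.
double area_table(Shape s, double r) {
  static constexpr double kFactor[] = {4.0, 3.1415};
  const std::size_t i = static_cast<std::size_t>(s);
  if (i >= sizeof kFactor / sizeof kFactor[0]) return 0.0;  // out-of-range case
  const double k = kFactor[i];  // value_loaded_from_table_20
  const double t = r * k;       // _21 = r_4(D) * value_loaded_from_table_20
  return r * t;                 // _22 = r_4(D) * _21
}
```

Because the same multiplications happen in the same order, the result is bit-identical under strict IEEE semantics, which is why no -ffast-math is needed; reassociating to (r * r) * K instead could change the rounding.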
The question is whether we'd require all operands to match except for one that
is eventually a constant, or allow something more general (e.g. a small number
of constant arguments to a computation).
Or should we have a separate pass that performs such an optimization (noticing
similar code blocks that differ only in constant parameters and merging them,
except for the computation of those parameters)?
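Either way, the core check is comparing the case blocks and finding which constants vary between them. The sketch below models that check on a hypothetical toy IR (Kind, Operand, Stmt, Block and varying_constants are invented for illustration and are not GCC's GIMPLE API). Blocks must be identical except for constant operands, and max_slots caps how many differing constants are allowed, i.e. the "small number of constant arguments" option. Unlike the PHI-driven walk described above, it compares whole blocks positionally.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy IR for illustration only; GCC's GIMPLE is far richer.
enum class Kind { Param, Const, Temp };  // argument, literal, earlier result

struct Operand {
  Kind kind;
  double value;       // used when kind == Kind::Const
  std::size_t index;  // parameter number (Param) or statement index (Temp)
};

struct Stmt {
  char op;  // e.g. '*'
  Operand a, b;
};

using Block = std::vector<Stmt>;
using Slot = std::pair<std::size_t, int>;  // (statement index, operand 0 or 1)

// Returns the operand slots whose constants vary between blocks, provided the
// blocks are otherwise identical and at most max_slots slots vary; returns
// std::nullopt otherwise. Operands are compared positionally, so commuted
// operands (r * k vs k * r) are not recognized.
std::optional<std::vector<Slot>> varying_constants(
    const std::vector<Block>& blocks, std::size_t max_slots) {
  if (blocks.empty()) return std::nullopt;
  const Block& first = blocks.front();
  for (const Block& blk : blocks)
    if (blk.size() != first.size()) return std::nullopt;

  std::vector<Slot> slots;
  for (std::size_t s = 0; s < first.size(); ++s) {
    for (int k = 0; k < 2; ++k) {
      const Operand& ref = k == 0 ? first[s].a : first[s].b;
      bool varies = false;
      for (const Block& blk : blocks) {
        if (blk[s].op != first[s].op) return std::nullopt;
        const Operand& o = k == 0 ? blk[s].a : blk[s].b;
        if (o.kind != ref.kind) return std::nullopt;
        if (o.kind == Kind::Const) {
          if (o.value != ref.value) varies = true;  // exact comparison
        } else if (o.index != ref.index) {
          return std::nullopt;  // different non-constant inputs
        }
      }
      if (varies) slots.push_back({s, k});
    }
  }
  if (slots.size() > max_slots) return std::nullopt;
  return slots;
}
```

For the two case blocks in the dump (r * 4.0 then r * t0, and r * 3.1415 then r * t0), this reports a single varying slot, statement 0 operand 1, when max_slots is 1; each such slot would correspond to one table.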