This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/83411] New: function multiversioning should clone the entire sub-callgraph


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83411

            Bug ID: 83411
           Summary: function multiversioning should clone the entire
                    sub-callgraph
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: h2+bugs at fsfe dot org
  Target Milestone: ---

[is this component entry correct?]

The documentation of FMV states: The aim of this project is to make it really
easy for the developer to specify multiple versions of a function, each catered
to a specific target ISA feature. GCC then takes care of creating the
dispatching code necessary to execute the right function version.

This sounds really cool, but in practice there is a huge problem: FMV does not
apply to nested function calls. This forces the developer to move the FMV to
the bottom of the callgraph, incurring a possibly huge run-time penalty because
the dispatch is then called ridiculously often.
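
To make the problem concrete, here is a minimal sketch using the existing
target_clones attribute. The function names and the FMV_POPCNT macro are made
up for illustration, and noinline merely stands in for a callee that the
optimiser cannot inline (for example one defined in another translation unit).
The macro enables the attribute only on GCC/x86-64/glibc, where I assume it is
supported, and expands to nothing elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Enable the real attribute only where it is assumed to be supported;
// elsewhere the macro expands to nothing and the code still compiles.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
#define FMV_POPCNT __attribute__((target_clones("default", "popcnt")))
#else
#define FMV_POPCNT
#endif

// Hypothetical helper. It carries no FMV attribute and cannot be inlined,
// so only a single version of it exists, compiled for the default ISA.
__attribute__((noinline))
int bits_in_word(std::uint64_t word) {
    return __builtin_popcountll(word);
}

// Hypothetical outer function. With the attribute active, GCC emits a
// "default" clone, a "popcnt" clone and a resolver for this function only.
// Both clones call the same default-ISA bits_in_word() above.
FMV_POPCNT
int bits_in_buffer(const std::uint64_t* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += bits_in_word(data[i]);
    return total;
}
```

As far as I understand, the popcnt clone of bits_in_buffer() gains nothing
here, because the work happens in the helper. Moving the attribute onto
bits_in_word() instead does produce a popcnt version of the helper, but then
every iteration of the loop goes through the dispatched call.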

I have described this problem in detail on my blog¹, but I think it should be
quite evident to you. I would humbly suggest adding something like

__attribute__((target_clone_trees("default", "popcnt")))

that also recursively clones the nested function calls (without additional
dispatch steps, and regardless of whether the callees are inlined). For
optimisations that are then generated automatically or by builtins, this would
already solve all my problems :)

For manually curated SIMD code, however, this touches on another problem with
current FMV: one cannot rely on macros for feature detection. To fix this, one
needs another mechanism to find out which features are available for the code
block currently being compiled. I am not a compiler expert, but I guess it is
not possible to make macros work, simply because they are evaluated much
earlier. But could the compiler provide constexpr feature variables to the
code? Then one could simply "clone_tree" early and use
"if constexpr (__has_sse4) ... else ..." in the actual function.
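
The shape I have in mind can be emulated by hand today with templates, which
may help to illustrate the request. In this sketch the bool template parameter
HasPopcnt plays the role of the proposed compiler-provided constexpr feature
variable, and the has_popcnt argument stands in for a run-time CPU check; all
names are hypothetical. Note that the emulation only shows the control
structure: it cannot actually switch the target ISA per instantiation, which is
exactly the part that would need compiler support.

```cpp
#include <cstddef>
#include <cstdint>

// Leaf of the call tree. The branch is chosen at compile time, once per
// instantiation, so no dispatch happens at this level.
template <bool HasPopcnt>
int count_word(std::uint64_t word) {
    if constexpr (HasPopcnt) {
        // In a real popcnt clone this builtin could lower to one instruction.
        return __builtin_popcountll(word);
    } else {
        // Portable fallback: clear the lowest set bit until none remain.
        int n = 0;
        while (word != 0) {
            word &= word - 1;
            ++n;
        }
        return n;
    }
}

// Inner node of the call tree: passes the feature flag down to the leaf, so
// the whole sub-callgraph is "cloned" together.
template <bool HasPopcnt>
int count_buffer_impl(const std::uint64_t* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += count_word<HasPopcnt>(data[i]);
    return total;
}

// Root of the call tree: the only place where a run-time decision is made.
int count_buffer(const std::uint64_t* data, std::size_t n, bool has_popcnt) {
    return has_popcnt ? count_buffer_impl<true>(data, n)
                      : count_buffer_impl<false>(data, n);
}
```

The drawback of doing this manually is that every function in the tree has to
become a template and forward the flag, which is what a "clone_tree" attribute
would make unnecessary.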

Thanks for reading all of this and for your work on GCC in general!

¹ https://hannes.hauswedell.net/post/2017/12/09/fmv/
