[Bug libgomp/97542] New: Enable OpenMP efficient performance profiling via ITT tracing
vitaly.slobodskoy at huawei dot com
gcc-bugzilla@gcc.gnu.org
Fri Oct 23 11:52:42 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97542
Bug ID: 97542
Summary: Enable OpenMP efficient performance profiling via ITT tracing
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vitaly.slobodskoy at huawei dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---
Created attachment 49429
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49429&action=edit
OpenMP runtime changes
To optimize OpenMP workloads, it is important to have a dedicated performance
analysis tool that understands the specifics of the OpenMP runtime.
The typical OpenMP performance issues are:
- Not all the performance-critical code is parallel
  * Serial time significantly affects scaling (Amdahl's law)
- Work balance is not good
  * Not all the cores are doing useful work
- Overhead on
  * Synchronization
  * Scheduling
  * Thread creation
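To illustrate how strongly serial time limits scaling, here is a minimal sketch of Amdahl's law in C. The function names are made up for this example and are not part of libgomp or the attached patches.

```c
#include <stdio.h>

/* Amdahl's law: upper bound on the speedup achievable with `threads`
   workers when a fraction `serial_fraction` (0.0 to 1.0) of the run time
   is executed serially. */
double amdahl_speedup(double serial_fraction, int threads)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads);
}

/* Print the bound for 1, 4, 16 and 64 threads. */
void print_scaling(double serial_fraction)
{
    for (int n = 1; n <= 64; n *= 4)
        printf("threads=%2d  speedup<=%.2f\n",
               n, amdahl_speedup(serial_fraction, n));
}
```

With 10% serial time, amdahl_speedup(0.1, 16) gives a bound of 6.4, so 16 threads cannot run the workload even 7 times faster.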
A performance analysis tool should be able to distinguish the serially executed
portion from parallel execution within work-sharing constructs. Imbalance
within a parallel region can hardly be calculated without dedicated runtime
support.
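As a rough sketch of why runtime support is needed: one possible way to quantify imbalance (an assumption for illustration, not necessarily the definition VTune uses) is the average time threads spend waiting at the barrier that ends a region. That requires per-thread barrier arrival times, which only the runtime can report. The function name and input layout below are hypothetical.

```c
#include <stddef.h>

/* finish[i] is the time (in seconds) at which thread i reached the barrier
   that ends the parallel region. The region ends when the slowest thread
   arrives, so every other thread waits (last - finish[i]).
   Returns the average wait per thread, or 0.0 for an empty input. */
double imbalance_time(const double *finish, size_t nthreads)
{
    if (nthreads == 0)
        return 0.0;

    double last = finish[0];
    for (size_t i = 1; i < nthreads; i++)
        if (finish[i] > last)
            last = finish[i];

    double waited = 0.0;
    for (size_t i = 0; i < nthreads; i++)
        waited += last - finish[i];

    return waited / (double)nthreads;
}
```

For arrival times of 1, 2, 3 and 4 seconds, the threads wait 3, 2, 1 and 0 seconds respectively, so the average is 1.5 seconds.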
The proposal is to instrument the GCC OpenMP runtime with the ITT API
(https://github.com/intel/ittapi), as was already done for LLVM
(https://github.com/llvm/llvm-project/tree/master/openmp/runtime/src/thirdparty/ittnotify),
to enable dedicated OpenMP support in tools such as Intel VTune
(https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/openmp-code-analysis-method.html)
and others. This would enable the "Serial Time", "Parallel Time" and
"Imbalance Time" metrics and would allow performance tools to focus on serial
or parallel execution.
ITT is a lightweight API for source-based instrumentation. The open-source part
is simply a set of API declarations and a single .c file that loads a dynamic
ITT library (a so-called ITT collector, which anyone can easily create). To
enable tracing, the target application needs to be launched with the
"INTEL_LIBITTNOTIFY64=<collector>" environment variable set. Otherwise all the
ITT calls do nothing and cause no noticeable runtime overhead.
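The "do nothing unless a collector is configured" behaviour can be sketched roughly as follows. This is a simplified illustration, not the actual ittnotify code: all identifiers except the INTEL_LIBITTNOTIFY64 variable name are hypothetical, and the step where the real loader opens the collector library and resolves its entry points is replaced by passing a hook directly.

```c
#include <stdlib.h>

/* Hypothetical hook type standing in for a collector entry point. */
typedef void (*region_hook_fn)(const char *name);

static region_hook_fn region_begin_hook; /* NULL means tracing is off */
int hook_calls;                          /* events seen by the sample hook */

/* Sample collector hook: just counts the events it receives. */
void counting_hook(const char *name)
{
    (void)name;
    hook_calls++;
}

/* Install `hook` only if a collector path was given. Returns 1 if tracing
   is now active, 0 otherwise. A real loader would load the library at
   `collector_path` here instead of taking a hook argument. */
int tracing_init(const char *collector_path, region_hook_fn hook)
{
    int wanted = collector_path != NULL && collector_path[0] != '\0';
    region_begin_hook = wanted ? hook : NULL;
    return wanted;
}

/* Same as above, with the path taken from the environment. */
int tracing_init_from_env(region_hook_fn hook)
{
    return tracing_init(getenv("INTEL_LIBITTNOTIFY64"), hook);
}

/* Instrumentation point: a single pointer check when tracing is off.
   Returns 1 if the event was forwarded to a collector, 0 otherwise. */
int trace_region_begin(const char *name)
{
    if (region_begin_hook == NULL)
        return 0;
    region_begin_hook(name);
    return 1;
}
```

When the variable is unset, each instrumentation point costs a single NULL check, which is why the overhead without a collector should be negligible.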
Attached is the initial proposal for the ITT integration, enabling the
Serial/Parallel Time metrics:
- core.patch contains the actual changes within the OpenMP runtime
- Itt.patch integrates the ITT API (the GPLv2 license is used)
- autogenerated.patch contains the autogenerated files that result from
running "autoreconf" within the libgomp directory
This proposal adds a new "--disable-itt-instrumentation" configure option,
which completely disables (removes) all the tracing. Tracing is enabled by
default. OpenMP Imbalance Time calculation is not included in this patch.