
10 OpenACC Profiling Interface

10.1 Implementation Status and Implementation-Defined Behavior

We’re implementing the OpenACC Profiling Interface as defined by the OpenACC 2.6 specification. We’re clarifying some aspects here as implementation-defined behavior, while they’re still under discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as possible for the (very common) case that the Profiling Interface is not enabled. This is relevant, as the Profiling Interface affects all the hot code paths (in the target code, not in the offloaded code). Users of the OpenACC Profiling Interface can be expected to understand that performance will be impacted to some degree once the Profiling Interface is enabled: for example, because the runtime (libgomp) calls into a third-party library for every event that has been registered.

We’re not yet accounting for the fact that OpenACC events may occur during event processing. We just handle one case specially, as required by CUDA 9.0 nvprof: acc_get_device_type may be called from acc_ev_device_init_start and acc_ev_device_init_end callbacks.

We’re not yet implementing initialization via an acc_register_library function that is either statically linked in, or dynamically loaded via LD_PRELOAD. Initialization via acc_register_library functions dynamically loaded via the ACC_PROFLIB environment variable does work, as does directly calling acc_prof_register, acc_prof_unregister, and acc_prof_lookup.
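As an illustration of the supported ACC_PROFLIB route, the following is a minimal sketch of a profiling library, not a reference implementation. It assumes GCC's acc_prof.h header is available; the file name, the build command in the comment, the counter launch_count, and the callback name on_launch are hypothetical and chosen only for this example.

```c
/* prof_lib.c: minimal profiling library sketch.
   Assumed build:  gcc -shared -fPIC prof_lib.c -o libprof.so
   Assumed use:    ACC_PROFLIB=./libprof.so ./openacc_program  */
#include <acc_prof.h>
#include <stdio.h>

/* Hypothetical counter, for illustration only.  */
static unsigned long launch_count;

/* Callback invoked by the runtime for each event it was registered for.  */
static void
on_launch (acc_prof_info *prof_info, acc_event_info *event_info,
           acc_api_info *api_info)
{
  (void) event_info;
  (void) api_info;
  if (prof_info->event_type == acc_ev_enqueue_launch_start)
    ++launch_count;
}

/* Entry point called by the runtime after loading the library named by
   ACC_PROFLIB.  Only the REG function pointer is used here.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;
  (void) lookup;
  reg (acc_ev_enqueue_launch_start, on_launch, acc_reg);
}
```

A program that prefers not to use a separate library could instead call acc_prof_register directly with the same event, callback, and acc_reg arguments.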

As currently there are no inquiry functions defined, calls to acc_prof_lookup will always return NULL.
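Because every lookup currently yields NULL, a profiling library may want to check the result before using it. The helper below is a small sketch under that assumption; the name inquiry_available is hypothetical, and any inquiry function name passed to it is a placeholder, since none are defined.

```c
#include <acc_prof.h>
#include <stddef.h>

/* Returns 1 if LOOKUP offers an inquiry function called NAME, else 0.
   With the implementation described here, this is always 0.  */
static int
inquiry_available (acc_prof_lookup_func lookup, const char *name)
{
  if (lookup == NULL)
    return 0;
  acc_query_fn fn = lookup (name);
  return fn != NULL;
}
```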

There aren’t separate start and stop events defined for the event types acc_ev_create, acc_ev_delete, acc_ev_alloc, and acc_ev_free. It’s not clear if these should be triggered before or after the actual device-specific call is made. We trigger them after.

Remarks about data provided to callbacks:


acc_prof_info.event_type

For nested event callbacks (for example, acc_ev_enqueue_launch_start as part of a parent compute construct), it’s not clear if this should be set for the nested event (acc_ev_enqueue_launch_start), or if the value of the parent construct should remain (acc_ev_compute_construct_start). In this implementation, the value will generally correspond to the innermost nested event type.


Always -1; not yet implemented.


acc_prof_info.async_queue

There is no limited number of asynchronous queues in libgomp. This will always have the same value as acc_prof_info.async.


Always NULL; not yet implemented.


Always NULL; not yet implemented.


Always -1; not yet implemented.


Always -1; not yet implemented.


Always -1; not yet implemented.


Always -1; not yet implemented.

acc_event_info.event_type, acc_event_info.*.event_type

Relating to acc_prof_info.event_type discussed above, in this implementation, this will always be the same value as acc_prof_info.event_type.


For acc_ev_alloc, acc_ev_free, acc_ev_enqueue_upload_start, acc_ev_enqueue_upload_end, acc_ev_enqueue_download_start, and acc_ev_enqueue_download_end, this is currently 1 even for explicit usage.


Always NULL; not yet implemented.


For acc_ev_alloc and acc_ev_free, this is always NULL.

typedef union acc_api_info

… as printed in 5.2.3. Third Argument: API-Specific Information. This should obviously be typedef struct acc_api_info.


Possibly not yet implemented correctly for acc_ev_compute_construct_start, acc_ev_device_init_start, acc_ev_device_init_end: will always be acc_device_api_none for these event types. For acc_ev_enter_data_start, it will be acc_device_api_none in some cases.


Always the same as acc_prof_info.device_type.


Always -1; not yet implemented.


Always NULL; not yet implemented.


Always NULL; not yet implemented.


Always NULL; not yet implemented.
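Since several fields are reported as NULL or -1 until they are implemented, callbacks may want to treat those values as "not available" rather than print them. The following sketch shows one way to do so. The helper names format_location and event_types_agree are hypothetical, and src_file and line_no are used only as examples of such fields: they are members of acc_prof_info in the OpenACC specification, but that they are among the unset ones is an assumption of this sketch.

```c
#include <acc_prof.h>
#include <stdio.h>

/* Format a source location into BUF, substituting a placeholder for a
   NULL file name and omitting a negative line number.  */
static void
format_location (const acc_prof_info *pi, char *buf, size_t len)
{
  const char *file = pi->src_file ? pi->src_file : "<unknown file>";
  if (pi->line_no >= 0)
    snprintf (buf, len, "%s:%d", file, pi->line_no);
  else
    snprintf (buf, len, "%s", file);
}

/* Returns 1 if both structures report the same event type, which is
   what this implementation is described as always providing.  */
static int
event_types_agree (const acc_prof_info *pi, const acc_event_info *ei)
{
  return pi->event_type == ei->event_type;
}
```

A callback could call format_location before logging, and use event_types_agree as a sanity check during development.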

Remarks about certain event types:

acc_ev_device_init_start, acc_ev_device_init_end
acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end

Callbacks for the following event types will be invoked, but dispatch and information provided therein have not yet been thoroughly reviewed:

During device initialization, and finalization, respectively, callbacks for the following event types will not yet be invoked:

Callbacks for the following event types have not yet been implemented, so currently won’t be invoked:

For the following runtime library functions, not all expected callbacks will be invoked (mostly concerning implicit device initialization):

Aside from implicit device initialization, for the following runtime library functions, no callbacks will be invoked for shared-memory offloading devices (it’s not clear if they should be):
