This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts
- From: Martin Jambor <mjambor at suse dot cz>
- To: Chung-Lin Tang <cltang at codesourcery dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 24 Mar 2016 19:40:48 +0100
- Subject: Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts
- Authentication-results: sourceware.org; auth=none
- References: <56EFCB59 dot 7050205 at codesourcery dot com>
Hi,
On Mon, Mar 21, 2016 at 06:22:17PM +0800, Chung-Lin Tang wrote:
> Hi Martin, I think you're the one to CC for this,
> as I mentioned in the first email, this has been build tested, however I did
> not know if I could test this without a Radeon card. If convenient,
> could you or anyone familiar with the setup do a make check-target-libgomp
> with this patch series?
>
> Thanks,
> Chung-Lin
>
>
> * plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
> to 'hsa_error_msg', for clarity.
> (hsa_fatal): Likewise.
> (hsa_error): New function.
> (init_hsa_context): Change return type to bool, adjust to return
> false on error.
> (queue_callback): Adjust to call hsa_error.
> (GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
> return value.
> (GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
> return false on error.
> (get_agent_info): Adjust to return NULL on error.
> (destroy_hsa_program): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
> (destroy_module): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_unload_image): Likewise.
> (GOMP_OFFLOAD_fini_device): Likewise.
> (GOMP_OFFLOAD_alloc): Change to return NULL when called.
> (GOMP_OFFLOAD_free): Change to return false when called.
> (GOMP_OFFLOAD_dev2host): Likewise.
> (GOMP_OFFLOAD_host2dev): Likewise.
> (GOMP_OFFLOAD_dev2dev): Likewise.
On the whole, I am fine with the patch but there are two issues:
First, and generally, when you change the return type of a function,
you must document what return values mean in the comment of the
function. Most importantly, it must be immediately apparent whether a
function returns true or false on failure from its comment. So please
fix that.
Second...
> Index: libgomp/plugin/plugin-hsa.c
> ===================================================================
> --- libgomp/plugin/plugin-hsa.c (revision 234358)
> +++ libgomp/plugin/plugin-hsa.c (working copy)
> @@ -175,10 +175,10 @@ hsa_warn (const char *str, hsa_status_t status)
> if (!debug)
> return;
>
> - const char *hsa_error;
> - hsa_status_string (status, &hsa_error);
> + const char *hsa_error_msg;
> + hsa_status_string (status, &hsa_error_msg);
>
> - fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error);
> + fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error_msg);
> }
>
> /* Report a fatal error STR together with the HSA error corresponding to STATUS
> @@ -187,12 +187,25 @@ hsa_warn (const char *str, hsa_status_t status)
> static void
> hsa_fatal (const char *str, hsa_status_t status)
> {
> - const char *hsa_error;
> - hsa_status_string (status, &hsa_error);
> + const char *hsa_error_msg;
> + hsa_status_string (status, &hsa_error_msg);
> GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
> - hsa_error);
> + hsa_error_msg);
> }
>
> +/* Like hsa_fatal, except only report error message, and return FALSE
> + for propagating error processing to outside of plugin. */
> +
> +static bool
> +hsa_error (const char *str, hsa_status_t status)
> +{
> + const char *hsa_error_msg;
> + hsa_status_string (status, &hsa_error_msg);
> + GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str,
> + hsa_error_msg);
> + return false;
> +}
> +
> struct hsa_kernel_description
> {
> const char *name;
...
> /* Callback of dispatch queues to report errors. */
> @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status,
> hsa_queue_t *queue __attribute__ ((unused)),
> void *data __attribute__ ((unused)))
> {
> - hsa_fatal ("Asynchronous queue error", status);
> + hsa_error ("Asynchronous queue error", status);
> }
...I believe this hunk is wrong. Errors reported in this way mean
that something is very wrong and generally happen during execution of
code on HSA GPU, i.e. within GOMP_OFFLOAD_run. And since you left
calls in create_single_kernel_dispatch, which is called as a part of
GOMP_OFFLOAD_run, intact, I believe you actually want to leave
hsa_fatel here too.
Thanks,
Martin