[PATCH v4] libgfortran: Replace mutex with rwlock

Bernhard Reutner-Fischer rep.dot.nop@gmail.com
Mon May 8 10:28:48 GMT 2023


On Mon,  8 May 2023 17:44:43 +0800
Lipeng Zhu <lipeng.zhu@intel.com> wrote:

> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT

See the typos in the comments noted below.
You did not state whether you regression tested the patch.
Other than that it LGTM, but I cannot approve it.

> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..0033cc74252 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
>    } while (0)
>  
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do {		\
> +    aio_rwlock_debug *n;				\
> +    n = xmalloc (sizeof(aio_rwlock_debug));	\

Missing space before the opening parenthesis: sizeof (aio_rwlock_debug)

> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 82664dc5f98..62f1db21d34 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  
>  
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree
> +   and UNIT_CACHE to increase CPU efficiency.

s/spilt/split/. Maybe:

Using an rwlock improves efficiency by allowing us to separate readers
and writers of both UNIT_ROOT and UNIT_CACHE.

> @@ -350,6 +356,17 @@ retry:
>        if (c == 0)
>  	break;
>      }
> +  /* We did not find a unit in the cache nor in the unit list, create a new
> +    (locked) unit and insert into the unit list and cache.
> +    Manipulating either or both the unit list and the unit cache requires to
> +    hold a write-lock [for obvious reasons]:
> +    1. By separating the read/write lock, it will greatly reduce the contention
> +       at the read part, while write part is not always necessary or most
> +       unlikely once the unit hit in cache.

+    By separating the read/write lock, we will greatly reduce the contention
+    on the read part, while the write part is unlikely once the unit hits
+    the cache.

> +    2. We try to balance the implementation complexity and the performance
> +       gains that fit into current cases we observed by just using a
> +       pthread_rwlock. */

Let's drop 2.
thanks,

