[PATCH] Practical Improvement to libgcc Complex Divide

Tue Nov 17 02:34:28 GMT 2020

On Tue, 8 Sep 2020, Patrick McGehearty via Gcc-patches wrote:

> This project started with an investigation related to
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
> provided an overview of past and recent practice for computing complex
> divide. The current glibc implementation is based on Robert Smith's
> algorithm [2] from 1962.  A google search found the paper by Baudin
> and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
> proposed patch [4] is based on that paper.

Thanks, I've now read Baudin and Smith so can review the patch properly.  
I'm fine with the overall algorithm, so my comments generally relate to 
how the code should best be integrated into libgcc while keeping it 
properly machine-mode-generic as far as possible.

> I developed two sets of test set by randomly distributing values over
> a restricted range and the full range of input values. The current

Are these tests available somewhere?

> Support for half, float, double, extended, and long double precision
> is included as all are handled with suitable preprocessor symbols in a
> single source routine. Since half precision is computed with float
> precision as per current libgcc practice, the enhanced algorithm
> provides no benefit for half precision and would cost performance.
> Therefore half precision is left unchanged.
> 
> The existing constants for each precision:
> float: FLT_MAX, FLT_MIN;
> double: DBL_MAX, DBL_MIN;
> extended and/or long double: LDBL_MAX, LDBL_MIN
> are used for avoiding the more common overflow/underflow cases.

In general, libgcc code works with modes, not types; hardcoding references 
to a particular mapping between modes and types is problematic.  Rather, 
the existing code in c-cppbuiltin.c that has a loop over modes should be 
extended to provide whatever information is needed, as macros defined for 
each machine mode.

  /* For libgcc-internal use only.  */
  if (flag_building_libgcc)
    {
      /* Properties of floating-point modes for libgcc2.c.  */
      opt_scalar_float_mode mode_iter;
      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)
        {
[...]

For example, that defines macros such as __LIBGCC_DF_FUNC_EXT__ and 
__LIBGCC_DF_MANT_DIG__.  The _FUNC_EXT__ definition involves that code 
computing a mapping to types.

I'd suggest defining additional macros such as __LIBGCC_DF_MAX__ in the 
same code - for each supported floating-point mode.  They can be defined 
to __FLT_MAX__ etc. (the predefined macros rather than the ones in 
float.h) - the existing code that computes a suffix for functions can be 
adjusted so it also computes the string such as "FLT", "DBL", "LDBL", 
"FLT128" etc.

(I suggest defining to __FLT_MAX__ rather than to the expansion of 
__FLT_MAX__ because that avoids any tricky interactions with the logic to 
compute such expansions lazily.  I suggest __FLT_MAX__ rather than the 
FLT_MAX name from float.h because that way you avoid any need to define 
feature test macros to access names such as FLT128_MAX.)

> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 74ecca8..02c06d8 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1343,6 +1343,11 @@ c_cpp_builtins (cpp_reader *pfile)
>        builtin_define_with_value ("__LIBGCC_INIT_SECTION_ASM_OP__",
>  				 INIT_SECTION_ASM_OP, 1);
>  #endif
> +      /* For libgcc float/double optimization */
> +#ifdef HAVE_adddf3
> +      builtin_define_with_int_value ("__LIBGCC_HAVE_HWDBL__",
> +				 HAVE_adddf3);
> +#endif

This is another thing to handle more generically - possibly with something 
like the mode_has_fma function, and defining a macro for each mode, named 
after the mode, rather than only for DFmode.  For an alternative, see the 
discussion below.

> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index ccfd6f6..8bd66c5 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2020-08-27  Patrick McGehearty <patrick.mcgehearty@oracle.com>
> +
> +	* libgcc2.c (__divsc3, __divdc3, __divxc3, __divtc3): Enhance
> +	accuracy of complex divide by avoiding underflow/overflow when
> +	ratio underflows or when arguments have very large or very
> +	small exponents.

Note that diffs to ChangeLog files should not now be included in patches; 
the ChangeLog content needs to be included in the proposed commit message 
instead for automatic ChangeLog generation.

> +#if defined(L_divsc3)
> +#define RBIG	((FLT_MAX)/2.0)
> +#define RMIN	(FLT_MIN)
> +#define RMIN2	(0x1.0p-21)
> +#define RMINSCAL (0x1.0p+19)
> +#define RMAX2	((RBIG)*(RMIN2))
> +#endif

I'd expect all of these to use generic macros based on the mode.  For the 
division by 2.0, probably also divide by integer 2 not 2.0 to avoid 
unwanted conversions to/from double.

> +#if defined(L_divdc3)
> +#define RBIG	((DBL_MAX)/2.0)
> +#define RMIN	(DBL_MIN)
> +#define RMIN2	(0x1.0p-53)
> +#define RMINSCAL (0x1.0p+51)

A plausible definition of RMIN2 might be based on the EPSILON macro for 
the mode (with RMINSCAL then being defined in terms of RMIN2).  But you're 
defining RMIN2 for SCmode to a value equal to 4 * 
FLT_EPSILON, while for DCmode you're using DBL_EPSILON / 2.  Why the 
difference?

> +#if defined(L_divhc3)
> +  /* half precision is handled with float precision.
> +     no extra measures are needed to avoid overflow/underflow */

Note for comment style that each sentence should start with an uppercase 
letter (unless that would be incorrect, e.g. for "float" meaning the C 
type) and comments should end with ".  " (two spaces after '.').

You're duplicating essentially the same code for HCmode, and for SCmode 
when hardware DFmode is available.  I think that's a bad idea.  Rather, in 
one place you should conditionally define a macro for "wider-range 
floating-point type to use for computations in these functions" (the 
relevant thing being that it has at least twice (plus a bit) the exponent 
range, rather than extra precision).  This would use types such as SFtype 
or DFtype, not float or double, as elsewhere in libgcc.  And if that macro 
is defined, then use this simple implementation instead of the more 
complicate one.

That leaves the question of how to define the macro.  The simple approach 
is certainly to define it to SFtype in the HCmode case (if there is 
hardware SFmode) and to DFtype in the SCmode case (if there is hardware 
DFmode).  A more complicated approach involves using various macros to 
examine properties of various possible wider-range modes to attempt to 
identity one that could be used (probably new macros from c-cppbuiltin.c 
rather than following the existing code elsewhere in libgcc2.c using 
AVOID_FP_TYPE_CONVERSION, given the incomplete coverage of definitions of 
WIDEST_HARDWARE_FP_SIZE).  But hardcoding two cases would reduce the risk 
of unexpected results for now.

In any case, note libgcc/config/rs6000/_divkc3.c should have similar 
changes to those in libgcc2.c (to implement the more complicated 
algorithm, as no wider mode is available there).

> +  /* Scale by max(c,d) to reduce chances of denominator overflowing */
> +  if (FABS (c) < FABS (d))

The comment seems questionable in the code using the simpler algorithm.  
And do you need this "if" and two cases at all there?

> +      /* prevent underflow when denominator is near max representable */
> +      if (FABS (d) >= RBIG)
> +	{
> +	  a = a * 0.5;
> +	  b = b * 0.5;
> +	  c = c * 0.5;
> +	  d = d * 0.5;

This sort of thing might best use "/ 2" (integer 2) to avoid unwanted use 
of double.

> +	{
> +	  if(((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2)) ||
> +	     ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2)))

GNU style breaks lines before operators such as "||", not after.  Missing 
space after "if".

I think the patch should add some testcases to the GCC testsuite that 
verify expected results for particular inputs to within +/- a few ulp (and 
that are checked to fail before and pass after the patch), to illustrate 
the sort of cases the patch improves.  Those will need to take the inputs 
from volatile variables to ensure the evaluation is at runtime, and it may 
be easiest to write such tests only for the cases of _Complex float and 
_Complex double to avoid complications making them portable when testing 
other types.

-- 
Joseph S. Myers
joseph@codesourcery.com