[PATCH] Practical Improvement to libgcc Complex Divide

Patrick McGehearty patrick.mcgehearty@oracle.com
Tue Nov 17 16:58:36 GMT 2020

Joseph, thank you for your detailed review and comments.

I will get to work on the necessary revisions as well
as find for a suitable place for sharing my random number
generating tests.

- patrick

On 11/16/2020 8:34 PM, Joseph Myers wrote:
> On Tue, 8 Sep 2020, Patrick McGehearty via Gcc-patches wrote:
>> This project started with an investigation related to
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
>> provided an overview of past and recent practice for computing complex
>> divide. The current glibc implementation is based on Robert Smith's
>> algorithm [2] from 1962.  A google search found the paper by Baudin
>> and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
>> proposed patch [4] is based on that paper.
> Thanks, I've now read Baudin and Smith so can review the patch properly.
> I'm fine with the overall algorithm, so my comments generally relate to
> how the code should best be integrated into libgcc while keeping it
> properly machine-mode-generic as far as possible.
>> I developed two sets of test set by randomly distributing values over
>> a restricted range and the full range of input values. The current
> Are these tests available somewhere?
>> Support for half, float, double, extended, and long double precision
>> is included as all are handled with suitable preprocessor symbols in a
>> single source routine. Since half precision is computed with float
>> precision as per current libgcc practice, the enhanced algorithm
>> provides no benefit for half precision and would cost performance.
>> Therefore half precision is left unchanged.
>> The existing constants for each precision:
>> float: FLT_MAX, FLT_MIN;
>> double: DBL_MAX, DBL_MIN;
>> extended and/or long double: LDBL_MAX, LDBL_MIN
>> are used for avoiding the more common overflow/underflow cases.
> In general, libgcc code works with modes, not types; hardcoding references
> to a particular mapping between modes and types is problematic.  Rather,
> the existing code in c-cppbuiltin.c that has a loop over modes should be
> extended to provide whatever information is needed, as macros defined for
> each machine mode.
>    /* For libgcc-internal use only.  */
>    if (flag_building_libgcc)
>      {
>        /* Properties of floating-point modes for libgcc2.c.  */
>        opt_scalar_float_mode mode_iter;
>        FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)
>          {
> [...]
> For example, that defines macros such as __LIBGCC_DF_FUNC_EXT__ and
> __LIBGCC_DF_MANT_DIG__.  The _FUNC_EXT__ definition involves that code
> computing a mapping to types.
> I'd suggest defining additional macros such as __LIBGCC_DF_MAX__ in the
> same code - for each supported floating-point mode.  They can be defined
> to __FLT_MAX__ etc. (the predefined macros rather than the ones in
> float.h) - the existing code that computes a suffix for functions can be
> adjusted so it also computes the string such as "FLT", "DBL", "LDBL",
> "FLT128" etc.
> (I suggest defining to __FLT_MAX__ rather than to the expansion of
> __FLT_MAX__ because that avoids any tricky interactions with the logic to
> compute such expansions lazily.  I suggest __FLT_MAX__ rather than the
> FLT_MAX name from float.h because that way you avoid any need to define
> feature test macros to access names such as FLT128_MAX.)
>> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
>> index 74ecca8..02c06d8 100644
>> --- a/gcc/c-family/c-cppbuiltin.c
>> +++ b/gcc/c-family/c-cppbuiltin.c
>> @@ -1343,6 +1343,11 @@ c_cpp_builtins (cpp_reader *pfile)
>>         builtin_define_with_value ("__LIBGCC_INIT_SECTION_ASM_OP__",
>>   				 INIT_SECTION_ASM_OP, 1);
>>   #endif
>> +      /* For libgcc float/double optimization */
>> +#ifdef HAVE_adddf3
>> +      builtin_define_with_int_value ("__LIBGCC_HAVE_HWDBL__",
>> +				 HAVE_adddf3);
>> +#endif
> This is another thing to handle more generically - possibly with something
> like the mode_has_fma function, and defining a macro for each mode, named
> after the mode, rather than only for DFmode.  For an alternative, see the
> discussion below.
>> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
>> index ccfd6f6..8bd66c5 100644
>> --- a/libgcc/ChangeLog
>> +++ b/libgcc/ChangeLog
>> @@ -1,3 +1,10 @@
>> +2020-08-27  Patrick McGehearty <patrick.mcgehearty@oracle.com>
>> +
>> +	* libgcc2.c (__divsc3, __divdc3, __divxc3, __divtc3): Enhance
>> +	accuracy of complex divide by avoiding underflow/overflow when
>> +	ratio underflows or when arguments have very large or very
>> +	small exponents.
> Note that diffs to ChangeLog files should not now be included in patches;
> the ChangeLog content needs to be included in the proposed commit message
> instead for automatic ChangeLog generation.
>> +#if defined(L_divsc3)
>> +#define RBIG	((FLT_MAX)/2.0)
>> +#define RMIN	(FLT_MIN)
>> +#define RMIN2	(0x1.0p-21)
>> +#define RMINSCAL (0x1.0p+19)
>> +#define RMAX2	((RBIG)*(RMIN2))
>> +#endif
> I'd expect all of these to use generic macros based on the mode.  For the
> division by 2.0, probably also divide by integer 2 not 2.0 to avoid
> unwanted conversions to/from double.
>> +#if defined(L_divdc3)
>> +#define RBIG	((DBL_MAX)/2.0)
>> +#define RMIN	(DBL_MIN)
>> +#define RMIN2	(0x1.0p-53)
>> +#define RMINSCAL (0x1.0p+51)
> A plausible definition of RMIN2 might be based on the EPSILON macro for
> the mode (with RMINSCAL then being defined in terms of RMIN2).  But you're
> defining RMIN2 for SCmode to a value equal to 4 *
> FLT_EPSILON, while for DCmode you're using DBL_EPSILON / 2.  Why the
> difference?
>> +#if defined(L_divhc3)
>> +  /* half precision is handled with float precision.
>> +     no extra measures are needed to avoid overflow/underflow */
> Note for comment style that each sentence should start with an uppercase
> letter (unless that would be incorrect, e.g. for "float" meaning the C
> type) and comments should end with ".  " (two spaces after '.').
> You're duplicating essentially the same code for HCmode, and for SCmode
> when hardware DFmode is available.  I think that's a bad idea.  Rather, in
> one place you should conditionally define a macro for "wider-range
> floating-point type to use for computations in these functions" (the
> relevant thing being that it has at least twice (plus a bit) the exponent
> range, rather than extra precision).  This would use types such as SFtype
> or DFtype, not float or double, as elsewhere in libgcc.  And if that macro
> is defined, then use this simple implementation instead of the more
> complicate one.
> That leaves the question of how to define the macro.  The simple approach
> is certainly to define it to SFtype in the HCmode case (if there is
> hardware SFmode) and to DFtype in the SCmode case (if there is hardware
> DFmode).  A more complicated approach involves using various macros to
> examine properties of various possible wider-range modes to attempt to
> identity one that could be used (probably new macros from c-cppbuiltin.c
> rather than following the existing code elsewhere in libgcc2.c using
> AVOID_FP_TYPE_CONVERSION, given the incomplete coverage of definitions of
> WIDEST_HARDWARE_FP_SIZE).  But hardcoding two cases would reduce the risk
> of unexpected results for now.
> In any case, note libgcc/config/rs6000/_divkc3.c should have similar
> changes to those in libgcc2.c (to implement the more complicated
> algorithm, as no wider mode is available there).
>> +  /* Scale by max(c,d) to reduce chances of denominator overflowing */
>> +  if (FABS (c) < FABS (d))
> The comment seems questionable in the code using the simpler algorithm.
> And do you need this "if" and two cases at all there?
>> +      /* prevent underflow when denominator is near max representable */
>> +      if (FABS (d) >= RBIG)
>> +	{
>> +	  a = a * 0.5;
>> +	  b = b * 0.5;
>> +	  c = c * 0.5;
>> +	  d = d * 0.5;
> This sort of thing might best use "/ 2" (integer 2) to avoid unwanted use
> of double.
>> +	{
>> +	  if(((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2)) ||
>> +	     ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2)))
> GNU style breaks lines before operators such as "||", not after.  Missing
> space after "if".
> I think the patch should add some testcases to the GCC testsuite that
> verify expected results for particular inputs to within +/- a few ulp (and
> that are checked to fail before and pass after the patch), to illustrate
> the sort of cases the patch improves.  Those will need to take the inputs
> from volatile variables to ensure the evaluation is at runtime, and it may
> be easiest to write such tests only for the cases of _Complex float and
> _Complex double to avoid complications making them portable when testing
> other types.

More information about the Gcc-patches mailing list