93871 – COTAN is slow for complex types

Bug 93871 - COTAN is slow for complex types

Summary: COTAN is slow for complex types

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libfortran (show other bugs)
Version:	10.0

Importance:	P3 minor
Target Milestone:	---
Assignee:	Fritz Reese

URL:
Keywords:	ice-on-valid-code

Depends on:
Blocks:

Reported:	2020-02-21 16:52 UTC by Thomas Henlich
Modified:	2020-04-21 15:36 UTC (History)
CC List:	3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2020-02-21 00:00:00

Attachments
Proposed patch for COTAN speedup (283 bytes, patch) 2020-02-21 16:55 UTC, Thomas Henlich	Details \| Diff
Test case (385 bytes, text/plain) 2020-02-21 19:39 UTC, Thomas Henlich	Details
Diff that replaces handling of degree trig functions. (6.21 KB, patch) 2020-02-26 00:59 UTC, kargls	Details \| Diff
Demonstration of range reduction (319 bytes, text/plain) 2020-02-26 12:58 UTC, Thomas Henlich	Details
Patch for trunk (21.87 KB, patch) 2020-04-01 18:08 UTC, Fritz Reese	Details \| Diff
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Thomas Henlich 2020-02-21 16:52:40 UTC

Runtime tests show that the COTAN function (cotangent function, GNU extension)  needs about twice the time as TAN for complex types.

This is surprising, since it basically performs the same operations as TAN, only in a different order.

I would expect COTAN to perform about the same as TAN.

Comment 1 Thomas Henlich 2020-02-21 16:55:26 UTC

Created attachment 47883 [details]
Proposed patch for COTAN speedup

This is basically the same method mpc uses internally to compute mpc_tan (only reversed)

Comment 2 kargls 2020-02-21 18:50:42 UTC

Can you post the code you used for testing?  Your patch to simplify.c
affects compile-time constant folding.  simplify.c has nothing to do
with a run-time evaluation.

Hmmm, upon inspection, the implementation of cotan in the Fortran FE
is certainly broken.

function foo(z) result(a)
   complex a
   complex, intent(in) :: z
   intrinsic cotan
   a = cotan(z)
end function foo
  
% gfcx -c a.f90
a.f90:4:18:

    4 |    intrinsic cotan
      |                  1
Error: 'cotan' declared INTRINSIC at (1) does not exist

Removing the 'intrinsic cotan' line and running in the debugger,
one finds that none of gfc_check_fn_rc2008, gfc_simplify_cotan,
and gfc_resolve_cotan are called.  So, compiling this to assembly

% gfcx -S -o a.s -O -c a.f90


        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        call    cotan_
        movss   %xmm0, -8(%rbp)
        movl    $0x00000000, -4(%rbp)
        movq    -8(%rbp), %xmm0
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

we see that we're calling a function named cotan.  Haven't found
where cotan lives.

% nm /usr/local/lib/gcc9/libgfortran.a | grep cotan
nm: bessel_r10.o: no symbols
nm: atomic.o: no symbols
% nm /usr/lib/libm.a | grep cotan
% find ~/gcc/gccx/libgfortran -type f | xargs grep -i cotan

Completely the program

program bar
  complex b, c
  b = (1.,1.)
  c = cotan(b)
  print *, c
end program bar

% gfcx -o z a.f90
/usr/local/bin/ld: /tmp/ccqbsxb5.o: in function `MAIN__':
a.f90:(.text+0x2d): undefined reference to `cotan_'
collect2: error: ld returned 1 exit status

Comment 3 kargls 2020-02-21 18:53:18 UTC

(In reply to kargl from comment #2)
> Can you post the code you used for testing?  Your patch to simplify.c
> affects compile-time constant folding.  simplify.c has nothing to do
> with a run-time evaluation.
> 

Nevermind, I didn't realize that this required -fdec.  It would help
if you added your test program, and command line used for testing!

Comment 4 Steve Kargl 2020-02-21 19:05:17 UTC

On Fri, Feb 21, 2020 at 06:53:18PM +0000, kargl at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #3 from kargl at gcc dot gnu.org ---
> (In reply to kargl from comment #2)
> > Can you post the code you used for testing?  Your patch to simplify.c
> > affects compile-time constant folding.  simplify.c has nothing to do
> > with a run-time evaluation.
> > 
> 
> Nevermind, I didn't realize that this required -fdec.  It would help
> if you added your test program, and command line used for testing!
> 

Adding -fdec and -fdump-tree-original, see

  c = __builtin_ccosf (b) / __builtin_csinf (b);

which comes from iresolve.c.

Comment 5 Thomas Henlich 2020-02-21 19:39:01 UTC

Created attachment 47884 [details]
Test case

Output:
th@THe-Surface:~$ /opt/gcc/bin/gfortran -L/opt/gcc/lib64 -Wl,-rpath -Wl,/opt/gcc/lib64 -fdec-math /mnt/c/Temp/test_cotan_complex.f90  && ./a.out
 Testing        (+1.00000000000000002E-003,+0.0000000000000000)
 1/tan=                   (+999.99966666664454,+0.0000000000000000)  time:  +0.243003011
 cotan=                   (+999.99966666664443,-0.0000000000000000)  time:  +0.535230994
 cos/sin=                 (+999.99966666664443,-0.0000000000000000)  time:  +0.530906022
 tan(pi/2-a)=             (+1000.0433799820958,+0.0000000000000000)  time:  +0.263039947
 Testing              (+1.0000000000000000,+0.0000000000000000)
 1/tan=                  (+0.64209261593433076,+0.0000000000000000)  time:  +0.264049053
 cotan=                  (+0.64209261593433076,-0.0000000000000000)  time:  +0.602337122
 cos/sin=                (+0.64209261593433076,-0.0000000000000000)  time:  +0.600989103
 tan(pi/2-a)=            (+0.64209267766718214,+0.0000000000000000)  time:  +0.247081995
 Testing              (+1.5700000000000001,+0.0000000000000000)
 1/tan=             (+7.96326963223192592E-004,+0.0000000000000000)  time:  +0.258723021
 cotan=             (+7.96326963223192592E-004,-0.0000000000000000)  time:  +0.588716030
 cos/sin=           (+7.96326963223192592E-004,-0.0000000000000000)  time:  +0.590244293
 tan(pi/2-a)=       (+7.96370674640914985E-004,+0.0000000000000000)  time:  +0.230789661
 Testing              (+3.0000000000000000,+0.0000000000000000)
 1/tan=                   (-7.0152525514345339,-0.0000000000000000)  time:  +0.321346283
 cotan=                   (-7.0152525514345339,-0.0000000000000000)  time:  +0.733272552
 cos/sin=                 (-7.0152525514345339,-0.0000000000000000)  time:  +0.718843937
 tan(pi/2-a)=             (-7.0152503565215953,+0.0000000000000000)  time:  +0.266773224
t

Comment 6 Thomas Henlich 2020-02-21 19:45:39 UTC

(In reply to kargl from comment #2)
> Can you post the code you used for testing?  Your patch to simplify.c
> affects compile-time constant folding.  simplify.c has nothing to do
> with a run-time evaluation.

Ok, I was looking in the wrong place.

It's still a cool patch though ;-)

Comment 7 Steve Kargl 2020-02-21 20:19:01 UTC

On Fri, Feb 21, 2020 at 07:45:39PM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #6 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> (In reply to kargl from comment #2)
> > Can you post the code you used for testing?  Your patch to simplify.c
> > affects compile-time constant folding.  simplify.c has nothing to do
> > with a run-time evaluation.
> 
> Ok, I was looking in the wrong place.
> 
> It's still a cool patch though ;-)

Any patch that makes gfortran better is a cool.

Was gonna suggest a variation of this code 
for a testcase:

program foo
  complex, parameter :: z = cotan((1.,1.))
  print *, z
end program foo

troutmask:sgk[218] gfcx -o z -fdec a.f90 && ./z
f951: internal compiler error: Segmentation fault
0xdf68ca crash_signal
        ../../gccx/gcc/toplev.c:328
0x7bae8f check_null
        ../../gccx/gcc/fortran/expr.c:2796
0x7bae8f gfc_check_init_expr(gfc_expr*)
        ../../gccx/gcc/fortran/expr.c:2897
0x7b7229 check_intrinsic_op
        ../../gccx/gcc/fortran/expr.c:2440
0x7bac91 gfc_check_init_expr(gfc_expr*)
        ../../gccx/gcc/fortran/expr.c:2850
0x7bb427 gfc_reduce_init_expr(gfc_expr*)
        ../../gccx/gcc/fortran/expr.c:3067
0x7bf156 gfc_match_init_expr(gfc_expr**)
        ../../gccx/gcc/fortran/expr.c:3113
0x7a6dde variable_decl
        ../../gccx/gcc/fortran/decl.c:2854
0x7a6dde gfc_match_data_decl()
        ../../gccx/gcc/fortran/decl.c:6130
0x812eec match_word
        ../../gccx/gcc/fortran/parse.c:65
0x812eec decode_statement
        ../../gccx/gcc/fortran/parse.c:376
0x818854 next_free
        ../../gccx/gcc/fortran/parse.c:1279
0x818854 next_statement
        ../../gccx/gcc/fortran/parse.c:1511
0x81b0ac parse_spec
        ../../gccx/gcc/fortran/parse.c:3738
0x81d0bf parse_progunit
        ../../gccx/gcc/fortran/parse.c:5848
0x81e7b4 gfc_parse_file()
        ../../gccx/gcc/fortran/parse.c:6388
0x8718f8 gfc_be_parse_file
        ../../gccx/gcc/fortran/f95-lang.c:210

Comment 8 Steve Kargl 2020-02-21 20:33:04 UTC

On Fri, Feb 21, 2020 at 08:19:01PM +0000, sgk at troutmask dot apl.washington.edu wrote:
> 
> program foo
>   complex, parameter :: z = cotan((1.,1.))
>   print *, z
> end program foo
> 

Something is definitely broken.  I'll need to
revisit intrinsic.c.  To get to the first
executable statement in gfc_simplify_cotan,

(gdb) b simplify.c:7793
(gdb) p *x
$6 = {expr_type = EXPR_FUNCTION, ts = {type = BT_UNKNOWN, kind = 0, u = {derived = 0x0, 
      cl = 0x0, pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, 
    f90_type = BT_UNKNOWN, deferred = false, interop_kind = 0x0}, rank = 0, shape = 0x0, 
  symtree = 0x2024e8570, ref = 0x0, where = {nextc = 0x202db65cc, lb = 0x202db6540}, 
  base_expr = 0x0, is_snan = 0, error = 0, user_operator = 0, mold = 0, must_finalize = 0, 
  no_bounds_check = 0, external_blas = 0, do_not_resolve_again = 0, do_not_warn = 0, 
  from_constructor = 0, representation = {length = 0, string = 0x0}, boz = {len = 0, 
    rdx = 0, str = 0x0}, value = {logical = 38700448, iokind = 38700448, integer = {{
        _mp_alloc = 38700448, _mp_size = 2, _mp_d = 0x0}}, real = {{
        _mpfr_prec = 8628635040, _mpfr_sign = 0, _mpfr_exp = 8647423552, _mpfr_d = 0x0}}, 
    complex = {{re = {{_mpfr_prec = 8628635040, _mpfr_sign = 0, _mpfr_exp = 8647423552, 
            _mpfr_d = 0x0}}, im = {{_mpfr_prec = 0, _mpfr_sign = 0, _mpfr_exp = 0, 
            _mpfr_d = 0x0}}}}, op = {op = 38700448, uop = 0x0, op1 = 0x2036d3640, 
      op2 = 0x0}, function = {actual = 0x2024e85a0, name = 0x0, isym = 0x2036d3640, 
      esym = 0x0}, compcall = {actual = 0x2024e85a0, name = 0x0, 
      base_object = 0x2036d3640, tbp = 0x0, ignore_pass = 0, assign = 0}, character = {
      length = 8628635040, string = 0x0}, constructor = 0x2024e85a0}, param_list = 0x0}

Hmmm, x should be the argument to cotan.

(gdb) p *x->value.function->isym
$9 = {name = 0x20344fa88 "cotan", lib_name = 0x20346a130 "_gfortran_cotan", 
  formal = 0x2036e6928, ts = {type = BT_REAL, kind = 4, u = {derived = 0x0, cl = 0x0, 
      pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type = BT_UNKNOWN, 
    deferred = false, interop_kind = 0x0}, elemental = 1, inquiry = 0, 
  transformational = 0, pure = 1, generic = 1, specific = 1, actual_ok = 1, noreturn = 0, 

but it apprears to be the function itself!

(gdb) p *x->value.function->actual->expr
$11 = {expr_type = EXPR_CONSTANT, ts = {type = BT_COMPLEX, kind = 4, u = {derived = 0x0, 
      cl = 0x0, pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0,

Comment 9 Steve Kargl 2020-02-21 20:40:22 UTC

On Fri, Feb 21, 2020 at 08:33:04PM +0000, sgk at troutmask dot apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #8 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
> On Fri, Feb 21, 2020 at 08:19:01PM +0000, sgk at troutmask dot
> apl.washington.edu wrote:
> > 
> > program foo
> >   complex, parameter :: z = cotan((1.,1.))
> >   print *, z
> > end program foo
> > 
> 
> Something is definitely broken.  I'll need to
> revisit intrinsic.c.  To get to the first
> executable statement in gfc_simplify_cotan,
> 
> (gdb) b simplify.c:7793
> (gdb) p *x
> $6 = {expr_type = EXPR_FUNCTION, ts = {type = BT_UNKNOWN, kind = 0, u =
> {derived = 0x0, 
>       cl = 0x0, pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, 
>     f90_type = BT_UNKNOWN, deferred = false, interop_kind = 0x0}, rank = 0,
> shape = 0x0, 
>   symtree = 0x2024e8570, ref = 0x0, where = {nextc = 0x202db65cc, lb =
> 0x202db6540}, 
>   base_expr = 0x0, is_snan = 0, error = 0, user_operator = 0, mold = 0,
> must_finalize = 0, 
>   no_bounds_check = 0, external_blas = 0, do_not_resolve_again = 0, do_not_warn
> = 0, 
>   from_constructor = 0, representation = {length = 0, string = 0x0}, boz = {len
> = 0, 
>     rdx = 0, str = 0x0}, value = {logical = 38700448, iokind = 38700448,
> integer = {{
>         _mp_alloc = 38700448, _mp_size = 2, _mp_d = 0x0}}, real = {{
>         _mpfr_prec = 8628635040, _mpfr_sign = 0, _mpfr_exp = 8647423552,
> _mpfr_d = 0x0}}, 
>     complex = {{re = {{_mpfr_prec = 8628635040, _mpfr_sign = 0, _mpfr_exp =
> 8647423552, 
>             _mpfr_d = 0x0}}, im = {{_mpfr_prec = 0, _mpfr_sign = 0, _mpfr_exp =
> 0, 
>             _mpfr_d = 0x0}}}}, op = {op = 38700448, uop = 0x0, op1 =
> 0x2036d3640, 
>       op2 = 0x0}, function = {actual = 0x2024e85a0, name = 0x0, isym =
> 0x2036d3640, 
>       esym = 0x0}, compcall = {actual = 0x2024e85a0, name = 0x0, 
>       base_object = 0x2036d3640, tbp = 0x0, ignore_pass = 0, assign = 0},
> character = {
>       length = 8628635040, string = 0x0}, constructor = 0x2024e85a0},
> param_list = 0x0}
> 
> Hmmm, x should be the argument to cotan.
> 
> (gdb) p *x->value.function->isym
> $9 = {name = 0x20344fa88 "cotan", lib_name = 0x20346a130 "_gfortran_cotan", 
>   formal = 0x2036e6928, ts = {type = BT_REAL, kind = 4, u = {derived = 0x0, cl
> = 0x0, 
>       pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type =
> BT_UNKNOWN, 
>     deferred = false, interop_kind = 0x0}, elemental = 1, inquiry = 0, 
>   transformational = 0, pure = 1, generic = 1, specific = 1, actual_ok = 1,
> noreturn = 0, 
> 
> but it apprears to be the function itself!
> 
> (gdb) p *x->value.function->actual->expr
> $11 = {expr_type = EXPR_CONSTANT, ts = {type = BT_COMPLEX, kind = 4, u =
> {derived = 0x0, 
>       cl = 0x0, pad = 0}, interface = 0x0, is_c_interop = 0, is_iso_c = 0,
> 

Ugh, this diff fixes constant-folding (without your mpc_sincos) patch.

Index: gcc/fortran/simplify.c
===================================================================
--- gcc/fortran/simplify.c (revision 280157)
+++ gcc/fortran/simplify.c (working copy)
@@ -7782,26 +7787,32 @@ gfc_simplify_sum (gfc_expr *array, gfc_expr *dim, gfc_
 gfc_expr *
 gfc_simplify_cotan (gfc_expr *x)
 {
-  gfc_expr *result;
+  gfc_expr *arg, *result;
   mpc_t swp, *val;
 
-  if (x->expr_type != EXPR_CONSTANT)
+  if (x->expr_type == EXPR_FUNCTION
+      && strcmp(x->value.function.isym->name, "cotan") == 0)
+    arg = x->value.function.actual->expr;
+  else
+    arg = x;
+
+  if (arg->expr_type != EXPR_CONSTANT)
     return NULL;
 
-  result = gfc_get_constant_expr (x->ts.type, x->ts.kind, &x->where);
+  result = gfc_get_constant_expr (arg->ts.type, arg->ts.kind, &arg->where);
 
-  switch (x->ts.type)
+  switch (arg->ts.type)
     {
     case BT_REAL:
-      mpfr_cot (result->value.real, x->value.real, GFC_RND_MODE);
+      mpfr_cot (result->value.real, arg->value.real, GFC_RND_MODE);
       break;
 
     case BT_COMPLEX:
       /* There is no builtin mpc_cot, so compute cot = cos / sin.  */
       val = &result->value.complex;
       mpc_init2 (swp, mpfr_get_default_prec ());
-      mpc_cos (swp, x->value.complex, GFC_MPC_RND_MODE);
-      mpc_sin (*val, x->value.complex, GFC_MPC_RND_MODE);
+      mpc_cos (swp, arg->value.complex, GFC_MPC_RND_MODE);
+      mpc_sin (*val, arg->value.complex, GFC_MPC_RND_MODE);
       mpc_div (*val, swp, *val, GFC_MPC_RND_MODE);
       mpc_clear (swp);
       break;

But, as I stated something is broken, and I suspect it affects
all -fdec functions.

Comment 10 Fritz Reese 2020-02-21 20:44:25 UTC

Thomas, thank you for discovering this. Steve, thanks for your investigative work and the patch. I will try to look at this and get a patch in within the next week or so.

Comment 11 Steve Kargl 2020-02-21 20:58:59 UTC

On Fri, Feb 21, 2020 at 08:44:25PM +0000, foreese at gcc dot gnu.org wrote:
> 
> --- Comment #10 from Fritz Reese <foreese at gcc dot gnu.org> ---
> Thomas, thank you for discovering this. Steve, thanks for your investigative
> work and the patch. I will try to look at this and get a patch in within the
> next week or so.
> 

Hi Fritz,

I'm scratching my head on this one.  The way you added
'cotan' to intrinsic.c looks exactly like how 'sin'
is added to the list of intrinsics.  I would expect 
it to just work, but ... clearly something is amiss.

Comment 12 Steve Kargl 2020-02-22 00:18:12 UTC

On Fri, Feb 21, 2020 at 08:40:22PM +0000, sgk at troutmask dot apl.washington.edu wrote:
> 
> Ugh, this diff fixes constant-folding (without your mpc_sincos) patch.
> 
> Index: gcc/fortran/simplify.c
> ===================================================================
> --- gcc/fortran/simplify.c (revision 280157)
> +++ gcc/fortran/simplify.c (working copy)
> @@ -7782,26 +7787,32 @@ gfc_simplify_sum (gfc_expr *array, gfc_expr *dim, gfc_
>  gfc_expr *
>  gfc_simplify_cotan (gfc_expr *x)
>  {
> -  gfc_expr *result;
> +  gfc_expr *arg, *result;
>    mpc_t swp, *val;
> 
> -  if (x->expr_type != EXPR_CONSTANT)
> +  if (x->expr_type == EXPR_FUNCTION
> +      && strcmp(x->value.function.isym->name, "cotan") == 0)

  if (x->expr_type == EXPR_FUNCTION
      && (strcmp(x->value.function.isym->name, "cotan") == 0
          || strcmp(x->value.function.isym->name, "dcotan") == 0))

Something is broken. :(

Comment 13 Steve Kargl 2020-02-22 03:35:25 UTC

On Fri, Feb 21, 2020 at 08:58:59PM +0000, sgk at troutmask dot apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #11 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
> On Fri, Feb 21, 2020 at 08:44:25PM +0000, foreese at gcc dot gnu.org wrote:
> > 
> > --- Comment #10 from Fritz Reese <foreese at gcc dot gnu.org> ---
> > Thomas, thank you for discovering this. Steve, thanks for your investigative
> > work and the patch. I will try to look at this and get a patch in within the
> > next week or so.
> > 
> 
> Hi Fritz,
> 
> I'm scratching my head on this one.  The way you added
> 'cotan' to intrinsic.c looks exactly like how 'sin'
> is added to the list of intrinsics.  I would expect 
> it to just work, but ... clearly something is amiss.
> 

Found it.

Index: gcc/fortran/intrinsic.c
===================================================================
--- gcc/fortran/intrinsic.c     (revision 280157)
+++ gcc/fortran/intrinsic.c     (working copy)
@@ -4568,8 +4568,7 @@ do_simplify (gfc_intrinsic_sym *specific, gfc_expr *e)
 
   /* Some math intrinsics need to wrap the original expression.  */
   if (specific->simplify.f1 == gfc_simplify_trigd
-      || specific->simplify.f1 == gfc_simplify_atrigd
-      || specific->simplify.f1 == gfc_simplify_cotan)
+      || specific->simplify.f1 == gfc_simplify_atrigd)
     {
       result = (*specific->simplify.f1) (e);
       goto finish;

gfc_simplify_cotan doesn't need the special treatment that the
degree-family of DEC functions need.

Comment 14 Thomas Henlich 2020-02-22 10:17:17 UTC

Come for the runtime, stay for the ICE!

Comment 15 Steve Kargl 2020-02-22 15:52:37 UTC

On Sat, Feb 22, 2020 at 10:17:17AM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #14 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> Come for the runtime, stay for the ICE!

To improve the runtime, one needs to rework gfc_resolve_cotan().
This is where sin(x) and cos(x) are evaluated, then the quotient
is taken.  The slowness comes from doing the argument reduction
twice.  For some architectures, it can be improved by computed
cexp() and then using the real and imaginary parts to compute
cotan().  Why?  On some architectures, cexp() is computed via
__builtin_sincos().

Comment 16 Thomas Henlich 2020-02-22 16:37:37 UTC

Additional note:

The issue is not restricted to complex types, but also occurs for real types.

On i686-mingw32, by a factor 2...10 depending on kind.

However I could not measure any slowdown on i686-pc-linux-gnu. Due to this, and my earlier confusion about mpc/mpfr, I chose not to include it in this bug report.

Comment 17 Thomas Henlich 2020-02-24 07:42:41 UTC

The following should give an error message, not a ICE:

program test_dtrig2
  real, parameter :: d = asind(1.1)
  print *, d
end

gfortran -fdec-math test_dtrig2.f90
f951.exe: internal compiler error: Segmentation fault
...

program test_dtrig2
  real, parameter :: d = asin(1.1)
  print *, d
end

test_dtrig2.f90:2:30:

    2 |   real, parameter :: d = asin(1.1)
      |                              1
Error: Argument of ASIN at (1) must be between -1 and 1

Comment 18 Richard Biener 2020-02-24 12:09:52 UTC

Given there's no cotan or ccotan in libm indeed the solution looks like somehow
exploiting sincos.  Not sure how exactly that can be done.

Comment 19 Thomas Henlich 2020-02-24 13:44:47 UTC

Regarding the following:

https://stackoverflow.com/questions/3738384/stable-cotangent#56864824
====
Is there a more stable implementation for the cotangent function than return 1.0/tan(x)?

No.

No machine number x can get close enough to multiples of π/2 to cause tan(x) to overflow, therefore tan(x) is well-defined and finite for all floating-point encodings for any of the IEEE-754 floating-point formats, and by extension, so is cot(x) = 1.0 / tan(x).

...

Using a math library with an accurate implementation of tan() with a maximum error of ~= 0.5 ulp, we find that computing cot(x) = 1.0 / tan(x) incurs a maximum error of less than 1.5 ulp, where the additional error compared to tan() itself is contributed by the rounding error of the division.

Repeating this exhaustive test over all float values with cot(x) = cos(x) / sin(x), where sin() and cos() are computed with a maximum error of ~= 0.5 ulp, we find that the maximum error in cot() is less than 2.0 ulps, so slightly larger. This is easily explained by having three sources of error instead of two in the previous formula.
====

it may be best (and fastest) to just implement it like above: cot=1/tan

Comment 20 jsm-csl@polyomino.org.uk 2020-02-25 01:09:01 UTC

Note that for tiny arguments, tan should raise the underflow exception, 
whereas cot should not raise underflow, but maybe you're not concerned 
about exceptions raised.

Comment 21 Thomas Henlich 2020-02-25 07:32:31 UTC

One should also ask: What is the least surprising way to implement the cotangent function?

If someone uses a (non-standard) function bearing a name similar to "tangent", they probably expect a function similar in precision and runtime to the standard tangent function, and nothing which is more accurate in certain ranges, but which is also slower.

For being able to use a construct like "y = a * cotan(x)" instead of "y = a / tan(x)" there should be no additional punishment aside from standard non-compliance.

Comment 22 Steve Kargl 2020-02-25 15:07:08 UTC

On Tue, Feb 25, 2020 at 07:32:31AM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #21 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> One should also ask: What is the least surprising way to implement the
> cotangent function?
> 
> If someone uses a (non-standard) function bearing a name similar to "tangent",
> they probably expect a function similar in precision and runtime to the
> standard tangent function, and nothing which is more accurate in certain
> ranges, but which is also slower.
> 
> For being able to use a construct like "y = a * cotan(x)" instead of "y = a /
> tan(x)" there should be no additional punishment aside from standard
> non-compliance.
> 

Having spent considerable time implementing tanl() that
resides in FreeBSD's libm, I suspect no one is going to
do the work to ensure the max ULP of cotan is no worse
than max ULP of tan.

Comment 23 kargls 2020-02-26 00:59:05 UTC

Created attachment 47913 [details]
Diff that replaces handling of degree trig functions.

Comment 24 kargls 2020-02-26 00:59:31 UTC

(In reply to kargl from comment #23)
> Created attachment 47913 [details]
> Diff that replaces handling of degree trig functions.

Fritz,

I have attached a patch that completely replaces how you handled
the various degree trig functions.  This patch brings these
functions into alignment with how all other functions are handled.
The patch fixes a number of ICE issues with simplification.
Ironically, I did not include Thomas's patch that started me down
this path, so you'll want to revisit his patch.  A simplify ChangeLog
follows.

	* gfortran.h (GFC_ISYM_ACOSD,GFC_ISYM_ASIND, GFC_ISYM_ATAN2D,
	GFC_ISYM_ATAND,	GFC_ISYM_COSD, GFC_ISYM_COTAND,	GFC_ISYM_SIND,
	GFC_ISYM_TAND): New enum values.

	* intrinsic.c (add_functions): Associate acosd, asind, atand, atan2d,
	cosd, cotand, sind, and tand with its enum value, and give each
	routine a simplification routine.

	* intrinsic.c (add_functions): Connect cotan to gfc_simplify_cotan
	(do_simplify): Remove special casing of DEC math functions.

	* intrinsic.h: New prototypes for gfc_simplify_acosd, 
	gfc_simplify_asind, gfc_simplify_atand, gfc_simplify_cosd,
	gfc_simplify_cotand, gfc_simplify_sind, gfc_simplify_tand.
	Remove prototypes for deleted functions gfc_simplify_atrigd,
	gfc_simplify_trigd, gfc_resolve_cotan, gfc_resolve_trigd
	gfc_resolve_atrigd

	* iresolve.c (is_trig_resolved, copy_replace_function_shallow,
	gfc_resolve_cotan, get_radians, get_degrees, resolve_trig_call,
	gfc_resolve_trigd, gfc_resolve_atrigd, gfc_resolve_atan2d): Deleted
	functions.

	* simplify.c (gfc_simplify_acosd, gfc_simplify_asind,
	gfc_simplify_atand, gfc_simplify_atan2d, gfc_simplify_cosd,
	gfc_simplify_sind, gfc_simplify_tand, gfc_simplify_cotand): New
	function for, well, simplification.

	* simplify.c (gfc_simplify_asinh): Fix error message.

 	* simplify.c (simplify_trig_call, radians_f, gfc_simplify_atrigd):
	Delete function.

	* trans-intrinsic.c (rad2deg, deg2rad, gfc_conv_intrinsic_atrigd,
	gfc_conv_intrinsic_trigd, gfc_conv_intrinsic_cotan,
	gfc_conv_intrinsic_atan2d): New function.
	(gfc_conv_intrinsic_function): Use new enum values to select a new
	function.

Comment 25 Steve Kargl 2020-02-26 01:22:48 UTC

On Wed, Feb 26, 2020 at 12:59:31AM +0000, kargl at gcc dot gnu.org wrote:
> 
>         * trans-intrinsic.c (rad2deg, deg2rad, gfc_conv_intrinsic_atrigd,
>         gfc_conv_intrinsic_trigd, gfc_conv_intrinsic_cotan,
>         gfc_conv_intrinsic_atan2d): New function.
>         (gfc_conv_intrinsic_function): Use new enum values to select a new
>         function.
> 

Looks like gfc_conv_intrinsic_cotan isn't handling complex
as I expected.  Need to look at this a bit more closely.

Comment 26 Thomas Henlich 2020-02-26 12:58:21 UTC

Created attachment 47914 [details]
Demonstration of range reduction

There is a danger of some inaccuracy in the degree trigonometric functions (sind, cosd, tand, cotand) because a full period of 360 degrees can not be expressed accurately in radians.

This can be circumvented by using some trigonometric identities to reduce the range and also place the inaccurate x range to where |dy/dx|→min, and away from y values near zero toward high y values so that the error is mostly cancelled.

This is a best effort to still be able to use the standard library functions but also get an increased accuracy expected from the degree functions.

Thus:
1. Calculate cosd(x) = sind(90 - x)
2. Calculate cotand(x) = tand(90 - x)
3. Reduce range of sind() argument from (0...360) further to x-360 if it is above 270, and to 180-x if it is above 90

Considering the demonstration program:

$ gf10 -fdec-math periods.f90 && ./a.exe
 cos(   90.0000000     )=  -4.37113883E-08   0.00000000
 sin(   180.000000     )=  -8.74227766E-08   0.00000000
 cos(   270.000000     )=   1.19248806E-08   0.00000000
 sin(  -360.000000     )=  -0.00000000       0.00000000
 sin(   36000000.0     )=   0.00000000       0.00000000
 sin(   36000180.0     )=  -8.74227766E-08   0.00000000
 cotan(   90.0000000     )=  -4.37113883E-08   0.00000000
$ gf10 -fdec-math -fdefault-real-8 periods.f90 && ./a.exe
 cos(   90.000000000000000 )=   6.1230317691118863E-017  0.0000000000000000
 sin(   180.00000000000000 )=   1.2246063538223773E-016   0.0000000000000000
 cos(   270.00000000000000 )=  -1.8369095307335659E-016   0.0000000000000000
 sin(  -360.00000000000000 )=  -0.0000000000000000        0.0000000000000000
 sin(   36000000.000000000 )=   0.0000000000000000        0.0000000000000000
 sin(   36000180.000000000 )=   1.2246063538223773E-016   0.0000000000000000
 cotan( 90.000000000000000 )=   6.1230317691118863E-017   0.0000000000000000

showing that values for all multiples of 90 degrees can be computed to be exactly zero.

Comment 27 Steve Kargl 2020-02-26 16:02:23 UTC

On Wed, Feb 26, 2020 at 12:58:21PM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #26 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> Created attachment 47914 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47914&action=edit
> Demonstration of range reduction
> 
> There is a danger of some inaccuracy in the degree trigonometric functions
> (sind, cosd, tand, cotand) because a full period of 360 degrees can not be
> expressed accurately in radians.
> 
> This can be circumvented by using some trigonometric identities to reduce the
> range and also place the inaccurate x range to where |dy/dx|→min, and away from
> y values near zero toward high y values so that the error is mostly cancelled.

I implemented FreeBSD's cosl, sinl, and tanl, did the legwork to produce
all of the sincos family of functions, and have implemented sinpi and
friends for FreeBSD .  I'm well aware of argument reduction and ULPs.  If
you're going to go this route, then you'll need to write functions that
can be called from libgfortran.

My patch rewrites the special handling of degree trig functions to bring
these functions into the general framework that gfortran uses for all other
intrinsic routines (except a few in IEEE_ARITHEMETIC).

> This is a best effort to still be able to use the standard library functions
> but also get an increased accuracy expected from the degree functions.
> 
> Thus:
> 1. Calculate cosd(x) = sind(90 - x)
> 2. Calculate cotand(x) = tand(90 - x)
> 3. Reduce range of sind() argument from (0...360) further to x-360 if it is
> above 270, and to 180-x if it is above 90
> 

(snip)

> $ gf10 -fdec-math -fdefault-real-8 periods.f90 && ./a.exe

Any result that uses -fdefault-real-8 will be ingored by me.
That option should have been removed years ago.  I even posted
a patch that removed it.  It was not well-received.

Comment 28 Steve Kargl 2020-02-27 00:25:11 UTC

On Wed, Feb 26, 2020 at 04:02:23PM +0000, sgk at troutmask dot apl.washington.edu wrote:
> 
> > This is a best effort to still be able to use the standard library functions
> > but also get an increased accuracy expected from the degree functions.
> > 
> > Thus:
> > 1. Calculate cosd(x) = sind(90 - x)
> > 2. Calculate cotand(x) = tand(90 - x)
> > 3. Reduce range of sind() argument from (0...360) further to x-360 if it is
> > above 270, and to 180-x if it is above 90
> > 
> 

Exhaustive testing in the domain [2**(-8),720] for REAL(4) gives

troutmask:sgk[347] gfcx -o z -O2 -fdec e.f90 && ./z
Total values tested: 146014209
               ulp > 1.5   max ulp   x at max ulp 
WIP Patch[1]:   6558266   4331134.9   359.999969    
fcn below[2]:       717   1.6212178   1.79070497    

[1] Work in progress has implemented range reduction in the
    interval [0,360], but does not fold the reduced range
    as is done in fcn(x).  The reference solution in the
    ulp computation is fcn(x) converted to double precision.

[2] Only 9 exceed a max ulp of 1.6.

      ! Compute SIND(x) where x is in degrees.
      function fcn(x) result(f)
         real(sp) f
         real(sp), intent(in) :: x
         integer sgn
         real(sp) arg
         real(sp), parameter :: deg2rad = atan(1._sp) / 45
         !
         ! Required for sin(-x) = - sin(x)
         !
         sgn = 1
         if (x < 0) sgn = -1
         arg = abs(x)

         ! 0 <= arg < 360
         arg = modulo(arg, 360._sp)

         ! Fold  into [0,90] range
         if (arg <= 180) then
            if (arg > 90) arg = 180 - arg
         else
            sgn = -sgn
            arg = arg - 180
            if (arg > 90) arg = 180 - arg
         end if
         if (arg == 180) then
            f = sign(0._sp, real(sgn,sp))
         else
            f = sgn * sin(arg * deg2rad)
         end if
      end function fcn

Probably need to punt the above into libgfortran.  Note,
for x < 2**(-8) we can do sin(x) ~ x

Comment 29 Thomas Henlich 2020-02-27 07:31:43 UTC

(In reply to Steve Kargl from comment #28)
>          ! Fold  into [0,90] range
...
>          if (arg == 180) then

I don't understand how (arg == 180) could be true after folding into [0,90] range.

Comment 30 Steve Kargl 2020-02-27 17:26:20 UTC

On Thu, Feb 27, 2020 at 07:31:43AM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #29 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> (In reply to Steve Kargl from comment #28)
> >          ! Fold  into [0,90] range
> ...
> >          if (arg == 180) then
> 
> I don't understand how (arg == 180) could be true after folding into [0,90]
> range.
> 

Whoops.  I quickly threw that code together to
investigate the symmetry needed to fold [0,360]
into [0:90].  My guess is that I did a last
minute edit and moved the (arg == 180) condition
without checking or there is a subtle rounding
issue with (180 - nearest(180,+-1)).  That code
certainly isn't production quality as there is
no special handling of +-Inf or NaN or tiny x.

We should also consider the upper normal x
limit where x > xmax=2**(p-e) where p is the
precision and e is some value where modulo(x,360)
is always integral.  This is similar to the
radian sin(x) for REAL x and x > 2.**22. 

I suppose that the point is that it is likely that 
gfortran should not in-line the sind(x) function.
Creating the if-statement is doable in trans-intrinsic.c,
but may drive someone nuts to get right.  For cosd(x),
your suggestion of cosd(x) = sind(90 - module(x,360))
looks promising and could be in-lined.

Comment 31 Thomas Henlich 2020-02-27 17:53:38 UTC

I wonder, if some "correct" rounding could further increase accuracy: We know the sign and "real" magnitude of the difference deg2rad-π/180 and can round the result of sin() accordingly.

Comment 32 Steve Kargl 2020-02-27 18:55:49 UTC

On Thu, Feb 27, 2020 at 05:53:38PM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #31 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> I wonder, if some "correct" rounding could further increase
> accuracy: We know the sign and "real" magnitude of the
> difference deg2rad-π/180 and can round the result of sin()
> accordingly.

I think we can exploit a table-driven method.  Once x is 
folded into [0,90], x is split by x = n + dx where
n = 0, 1, ..., 90 and dx is in [0.,1.).

sin(x) = sin(n+d) = sin(n) * cos(d) + cos(n) * sin(d)

The table contains a high,low decomposition of {sin(1),cos(1)},
{sin(3),cos(3)}, ..., {sin(90),cos(90)}.  Denote the decomposition
with shi(n), slo(n), chi(n), clo(n), and both C(d) = cos(d) and
S(d) * sin(d) are minimax polynomials. So, we have

if (x == n)  /* Integral value of x.  */
   sin(x) = shi(n) + slo(n)   ! Is addition needed.
else
   sin(x) = shi(n)*C(d) + slo(n)*C(d) + chi(n)*S(d) + clo(n)*S(d) 

The else-branch summation can be done with Kahan summation.

Comment 33 Thomas Henlich 2020-02-28 09:30:25 UTC

Going back one step, I wonder if it would be good enough to perform a correctly rounded conversion from degrees to radians, and just use sin() as is.

  ...
     f = sgn * sin(deg2rad(arg))
  end function fcn
  
  ! Compute correctly rounded value of x * pi / 180 for x = 0 ... 90
  function deg2rad(x) result(y)
    real(sp) y
    real(sp), intent(in) :: x
    real(sp), parameter :: a = atan(1._4) / 45
    real(sp), parameter :: b = atan(1._16) / 45 - a
    y = a * x + b * x
  end function deg2rad

(The exact values of a and b still need some work)

Comment 34 Steve Kargl 2020-02-28 20:51:34 UTC

On Fri, Feb 28, 2020 at 09:30:25AM +0000, thenlich at gcc dot gnu.org wrote:
>
> Going back one step, I wonder if it would be good enough
> to perform a correctly rounded conversion from degrees to
> radians, and just use sin() as is.
> 
>   ...
>      f = sgn * sin(deg2rad(arg))
>   end function fcn
> 
>   ! Compute correctly rounded value of x * pi / 180 for x = 0 ... 90
>   function deg2rad(x) result(y)
>     real(sp) y
>     real(sp), intent(in) :: x
>     real(sp), parameter :: a = atan(1._4) / 45
>     real(sp), parameter :: b = atan(1._16) / 45 - a
>     y = a * x + b * x
>   end function deg2rad
> 
> (The exact values of a and b still need some work)
> 

The only way I know of that will get the correctly rounded
value without punting to a higher precision is to decompose
the scaling and argumet to high and low precision, and then
do the multiplcation.

shi = high_bits_of(pi/180)  ! # of bits = half of precision
slo = pi/180 - shi
xhi = high_bits_of(arg)
xlo = arg - xhi

You then do

value = slo * xlo + slo * xhi + shi * xlo + shi * xhi

The last term is exact, and the summation is accumulated
from smallest terms to largest (ie., you get correct 
rounding). 

For float precision (and little endian), I use

static const float
deg2rad(float ax)
{
   /* Split pi/180 into 12 high bits and 24 low bits. */
   static const float pihi = 1.74560547e-02f, pilo =-2.76216747e-06f;

   float lo, hi;

   /* Need union to allow the clearly of low bits. */
   union {
      unsigned int u;
      float e;
   } v;

   v.e = ax;
   v.u = (v.u >> 12) << 12;  /* 12 high bits. */
   hi = v.e;			
   lo = ax - hi;             /* low part.

   return (lo * pilo + lo * pihi + hi * pilo + hi * pihi);
}

Even this appears to have some irregularities as my exhaustive
test in the interval [1.e-8,1) with direct call to sinf() yields

count: 223622026
ulp > 1.0: 125
ulp > 1.5: 0
ulp > 1.6: 0
ulp > 1.7: 0
 max_ulp: 1.321531
x at max: 0.447627

Comment 35 Thomas Henlich 2020-03-02 11:20:56 UTC

(In reply to Steve Kargl from comment #34)
> Even this appears to have some irregularities as my exhaustive
> test in the interval [1.e-8,1) with direct call to sinf() yields
> 

This looks like a job for FMA: "Correctly rounded multiplication by arbitrary precision constants" (http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html) show that it can be proven to always work.

Comment 36 Steve Kargl 2020-03-02 20:59:53 UTC

On Mon, Mar 02, 2020 at 11:20:56AM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #35 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> (In reply to Steve Kargl from comment #34)
> > Even this appears to have some irregularities as my exhaustive
> > test in the interval [1.e-8,1) with direct call to sinf() yields
> > 
> 
> This looks like a job for FMA: "Correctly rounded multiplication
> by arbitrary precision constants"
> (http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html)
> show that it can be proven to always work.
> 

It can also be show to be slow. :-)

I spent the last few days looking into an implementation.  Here is
a summary of my findings.  Someone, who does numerical analysis for
a living and is smarter than I, may be able to do better.

Exhaustive testing in the indicated intervals for "float sindf(float x)",
"float cosdf(float x)", and "float tandf(float x)".  The reference values
are taken to be that of double (e.g., "double sindd(double x)".

All tests were done with GCC 9.2.0 with FreeBSD's libm on amd64.

The methods are described below.  It is noted the selected algorithm
should be used for all precision to minimize maintenance.  The choice
seems to come down to method 2 or method 4.  Those methods are likely
slower than method 5.

The domain [1,360] is folded into [0,45] by symmetries in sine and
cosine, and the computation is done by selecting sinf() or cosf().
I've only started to consider tandf(), so have only looked at [0,90]
with [0,45] shown below as [45,90] get messes with tanf(x)->inf
for x->90.

sindf in [1, 360]
----------+----------+----------+----------+----------+----------+
          | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | 
----------+----------+----------+----------+----------+----------+
    count | 70516737 | 70516737 | 70516737 | 70516737 | 70516737 |
ulp > 1.0 | 101589   | 101586   | 279027   | 101599   | 303168   |
ulp > 1.5 | 0        | 0        | 682      | 0        | 79       |
ulp > 1.6 | 0        | 0        | 9        | 0        | 18       |
ulp > 1.7 | 0        | 0        | 0        | 0        | 3        |
max ulp   | 1.485345 | 1.485345 | 1.621218 | 1.485345 | 1.717592 |
x max ulp | 7.173496 | 7.173496 | 1.790705 | 7.173496 | 1.790707 |
----------+----------+----------+----------+----------+----------+

cosdf in [1, 360]
----------+-----------+-----------+-----------+-----------+-----------+
          | Method 1  | Method 2  | Method 3  | Method 4  | Method 5  |
----------+-----------+-----------+-----------+-----------+-----------+
    count | 70516737  | 70516737  | 70516737  | 70516737  | 70516737  |
ulp > 1.0 | 59680     | 59681     | 94071     | 59682     | 70892     |
ulp > 1.5 | 0         | 0         | 191       | 0         | 8         |
ulp > 1.6 | 0         | 0         | 0         | 0         | 0         |
ulp > 1.7 | 0         | 0         | 0         | 0         | 0         |
max ulp   | 1.475156  | 1.475156  | 1.587572  | 1.475156  | 1.571779  |
x max ulp | 88.209297 | 88.209297 | 82.834091 | 88.209297 | 86.417046 |
----------+-----------+-----------+-----------+-----------+-----------+

tandf in [1, 45]
----------+-----------+-----------+-----------+-----------+-----------+
          | Method 1  | Method 2  | Method 3  | Method 4  | Method 5  |
----------+-----------+-----------+-----------+-----------+-----------+
    count | 45350913  | 45350913  | 45350913  | 45350913  | 45350913  |
ulp > 1.0 | 779426    | 779588    | 1317335   | 779423    | 981293    |
ulp > 1.5 | 4945      | 4946      | 14704     | 4947      | 5017      |
ulp > 1.6 | 796       | 794       | 4905      | 796       | 820       |
ulp > 1.7 | 25        | 25        | 812       | 25        | 33        |
max ulp   | 1.748144  | 1.748144  | 1.947416  | 1.748144  | 1.771557  |
x max ulp | 44.998692 | 44.998692 | 44.998466 | 44.998692 | 44.998627 |
----------+-----------+-----------+-----------+-----------+-----------+

Method 1:
   Split x = xhi + xlo and (pi/180) = shi + slo into high and low parts.
   For 32-bit float, the splitting has 12 bits in high parts shi and xhi,
   and low parts contain the trailing 12 and 24 bits in xlo and slo.
   The conversion from degrees to radians, is then 

   f = slo * xlo + slo * xhi + shi * xlo + shi * xhi.

Method 2:
   Do the multiplication of x*(pi/180) is a higher precision type.  This
   works for 32-bit float with 53-bit double, because the multiplication
   provides the needed 48 bits.  For higher precisions, the methods fails
   or will depend of software extended precision.

Method 3:
   Break x and (pi/180) into their significands and exponents with frexpf,
   then multiply significands followed by scaling with scalbnf.

Method 4:
   Split (pi/180) = shi + slo into high and low parts, then use
   fmaf(x, shi, slo).

Method 5:
   Use a naive splitting of x into an integer and remainder via 
   n = x - (int)x and r = x - n. Split (pi/180) = shi + slo into high and
   low parts.  The conversion is then

   f = slo * r + slo * n + shi * r + shi * n.

   This method while slightly worse with ULP is likely the fastest!

Comment 37 Thomas Henlich 2020-03-04 11:03:28 UTC

It would make sense to keep optimization in mind:

Several calls to conversions of the same value should be performed only once.

As a special case: Calls to compute sind(x) and cosd(x) should be optimized to a conversion, followed by a call to sincos.

The conversion could also be provided as a user function, for example in code like y=tand(x)-deg2rad(x) the conversion would need to be performed only once.

Comment 38 Thomas Henlich 2020-03-04 12:07:18 UTC

(In reply to Thomas Henlich from comment #37)
> It would make sense to keep optimization in mind:
> 
> Several calls to conversions of the same value should be performed only once.
> 
> As a special case: Calls to compute sind(x) and cosd(x) should be optimized
> to a conversion, followed by a call to sincos.
> 
> The conversion could also be provided as a user function, for example in
> code like y=tand(x)-deg2rad(x) the conversion would need to be performed
> only once.

I just realized this is probably a lot harder than it sounds, since it is incompatible with some of the range reduction steps, or at the least requires some further folding into x <= 45 for sincos to get sufficient accuracy.

On the other hand side, always folding sind(45...90) to cosd(45...0) and cosd(45...90) to sind(45...0) probably wouldn't be such a bad thing.

Comment 39 Steve Kargl 2020-03-04 15:09:51 UTC

On Wed, Mar 04, 2020 at 12:07:18PM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> On the other hand side, always folding sind(45...90) to cosd(45...0) and
> cosd(45...90) to sind(45...0) probably wouldn't be such a bad thing.
> 

It is not only a good thing, it is required to get small max ULP
near zero crossings.  Testing fma is as accurate and as fast as
the bit twiddling methods I devised.  I haven't verified, but it
seems gcc inlines fma, which likely is bit twiddling.

Comment 40 Thomas Henlich 2020-03-04 16:19:08 UTC

Now I get it, symmetry is beautiful: Both sin(x) and cos(x) for any x can always be calculated with one range reduction to 0...45, one call to sincos(), and a sign transfer for each result.

Comment 41 Thomas Henlich 2020-03-04 16:35:02 UTC

One would assume that fast FMA (https://en.wikipedia.org/wiki/FMA_instruction_set) is or will be available to the modern Fortran enthusiast, in the year 202x.

Comment 42 Steve Kargl 2020-03-04 17:01:09 UTC

On Wed, Mar 04, 2020 at 04:35:02PM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #41 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> One would assume that fast FMA
> (https://en.wikipedia.org/wiki/FMA_instruction_set) is or will be available to
> the modern Fortran enthusiast, in the year 202x.
> 

It already is.  F2018, 17.11.3 IEEE_FMA.

gfortran currently does not implement IEEE_FMA along
with a few additional IEEE_ARITHMETIC features added
in F2018.

Note, gcc/builtins.def has fma, fmaf, and fmal covered.
We'll need a mapping in libquadmath if it has a __float128
fma.

Comment 43 Thomas Henlich 2020-03-05 05:32:23 UTC

(In reply to Steve Kargl from comment #42)
> gfortran currently does not implement IEEE_FMA along
> with a few additional IEEE_ARITHMETIC features added
> in F2018.
> 
> Note, gcc/builtins.def has fma, fmaf, and fmal covered.
> We'll need a mapping in libquadmath if it has a __float128
> fma.

It has: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46402

Comment 44 Fritz Reese 2020-04-01 18:08:28 UTC

Created attachment 48164 [details]
Patch for trunk

Final version of patch submitted, pending final review.

cf. https://gcc.gnu.org/pipermail/fortran/2020-March/054162.html and the containing thread for details.

The patch includes and extends Steve's changes which rework the degree-valued trigonometric functions.

COTAN is implemented as -TAN(x+pi/2) for REAL values, and COS(x)/SIN(x) for complex values. COTAND is implemented as -TAND(x+90). These are equivalent to 1/TAN[D] but should be faster than division. SIND, COSD, and TAND are rewritten to use Steve's range folding technique into [0, 45] both at runtime from libgfortran and at simplification time for constant expressions.

Comment 45 GCC Commits 2020-04-07 17:18:53 UTC

The master branch has been updated by Fritz Reese <foreese@gcc.gnu.org>:

https://gcc.gnu.org/g:57391ddaf39f7cb85825c32e83feb1435889da51

commit r10-7603-g57391ddaf39f7cb85825c32e83feb1435889da51
Author: Fritz Reese <foreese@gcc.gnu.org>
Date:   Tue Apr 7 11:59:36 2020 -0400

    Fix PR fortran/93871 and re-implement degree-valued trigonometric intrinsics.
    
    2020-04-01  Fritz Reese  <foreese@gcc.gnu.org>
                Steven G. Kargl  <kargl@gcc.gnu.org>
    
    gcc/fortran/ChangeLog
    
            PR fortran/93871
            * gfortran.h (GFC_ISYM_ACOSD, GFC_ISYM_ASIND, GFC_ISYM_ATAN2D,
            GFC_ISYM_ATAND, GFC_ISYM_COSD, GFC_ISYM_COTAND, GFC_ISYM_SIND,
            GFC_ISYM_TAND): New.
            * intrinsic.c (add_functions): Remove check for flag_dec_math.
            Give degree trig functions simplification and name resolution
            functions (e.g, gfc_simplify_atrigd () and gfc_resolve_atrigd ()).
            (do_simplify): Remove special casing of degree trig functions.
            * intrinsic.h (gfc_simplify_acosd, gfc_simplify_asind,
            gfc_simplify_atand, gfc_simplify_cosd, gfc_simplify_cotand,
            gfc_simplify_sind, gfc_simplify_tand, gfc_resolve_trigd2): Add new
            prototypes.
            (gfc_simplify_atrigd, gfc_simplify_trigd, gfc_resolve_cotan,
            resolve_atrigd): Remove prototypes of deleted functions.
            * iresolve.c (is_trig_resolved, copy_replace_function_shallow,
            gfc_resolve_cotan, get_radians, get_degrees, resolve_trig_call,
            gfc_resolve_atrigd, gfc_resolve_atan2d): Delete functions.
            (gfc_resolve_trigd, gfc_resolve_trigd2): Resolve to library functions.
            * simplify.c (rad2deg, deg2rad, gfc_simplify_acosd, gfc_simplify_asind,
            gfc_simplify_atand, gfc_simplify_atan2d, gfc_simplify_cosd,
            gfc_simplify_sind, gfc_simplify_tand, gfc_simplify_cotand): New
            functions.
            (gfc_simplify_atan2): Fix error message.
            (simplify_trig_call, gfc_simplify_trigd, gfc_simplify_atrigd,
            radians_f): Delete functions.
            * trans-intrinsic.c: Add LIB_FUNCTION decls for sind, cosd, tand.
            (rad2deg, gfc_conv_intrinsic_atrigd, gfc_conv_intrinsic_cotan,
            gfc_conv_intrinsic_cotand, gfc_conv_intrinsic_atan2d): New functions.
            (gfc_conv_intrinsic_function): Handle ACOSD, ASIND, ATAND, COTAN,
            COTAND, ATAN2D.
            * trigd_fe.inc: New file. Included by simplify.c to implement
            simplify_sind, simplify_cosd, simplify_tand with code common to the
            libgfortran implementation.
    
    gcc/testsuite/ChangeLog
    
            PR fortran/93871
            * gfortran.dg/dec_math.f90: Extend coverage to real(10) and real(16).
            * gfortran.dg/dec_math_2.f90: New test.
            * gfortran.dg/dec_math_3.f90: Likewise.
            * gfortran.dg/dec_math_4.f90: Likewise.
            * gfortran.dg/dec_math_5.f90: Likewise.
    
    libgfortran/ChangeLog
    
            PR fortran/93871
            * Makefile.am, Makefile.in: New make rule for intrinsics/trigd.c.
            * gfortran.map: New routines for {sind, cosd, tand}X{r4, r8, r10, r16}.
            * intrinsics/trigd.c, intrinsics/trigd_lib.inc, intrinsics/trigd.inc:
            New files. Defines native degree-valued trig functions.

Comment 46 Fritz Reese 2020-04-07 17:20:45 UTC

Fixed on trunk.

Comment 47 GCC Commits 2020-04-08 15:56:41 UTC

The master branch has been updated by Tobias Burnus <burnus@gcc.gnu.org>:

https://gcc.gnu.org/g:faa0817311f43e0d4d223d53c816b0c74ec35c4e

commit r10-7631-gfaa0817311f43e0d4d223d53c816b0c74ec35c4e
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Wed Apr 8 17:54:04 2020 +0200

    Move gfortran.dg/dec_math_5.f90 to ./ieee/
    
            PR fortran/93871
            * gfortran.dg/dec_math_5.f90: Move to ...
            * gfortran.dg/ieee/dec_math_1.f90: ... here; change
            dg-options to dg-additional-options.

Comment 48 ktkachov 2020-04-20 14:22:36 UTC

(In reply to CVS Commits from comment #45)
> The master branch has been updated by Fritz Reese <foreese@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:57391ddaf39f7cb85825c32e83feb1435889da51
> 
> commit r10-7603-g57391ddaf39f7cb85825c32e83feb1435889da51
> Author: Fritz Reese <foreese@gcc.gnu.org>
> Date:   Tue Apr 7 11:59:36 2020 -0400
> 
>     Fix PR fortran/93871 and re-implement degree-valued trigonometric
> intrinsics.
>     
>     2020-04-01  Fritz Reese  <foreese@gcc.gnu.org>
>                 Steven G. Kargl  <kargl@gcc.gnu.org>
>     
>     gcc/fortran/ChangeLog
>     
>             PR fortran/93871
>             * gfortran.h (GFC_ISYM_ACOSD, GFC_ISYM_ASIND, GFC_ISYM_ATAN2D,
>             GFC_ISYM_ATAND, GFC_ISYM_COSD, GFC_ISYM_COTAND, GFC_ISYM_SIND,
>             GFC_ISYM_TAND): New.
>             * intrinsic.c (add_functions): Remove check for flag_dec_math.
>             Give degree trig functions simplification and name resolution
>             functions (e.g, gfc_simplify_atrigd () and gfc_resolve_atrigd
> ()).
>             (do_simplify): Remove special casing of degree trig functions.
>             * intrinsic.h (gfc_simplify_acosd, gfc_simplify_asind,
>             gfc_simplify_atand, gfc_simplify_cosd, gfc_simplify_cotand,
>             gfc_simplify_sind, gfc_simplify_tand, gfc_resolve_trigd2): Add
> new
>             prototypes.
>             (gfc_simplify_atrigd, gfc_simplify_trigd, gfc_resolve_cotan,
>             resolve_atrigd): Remove prototypes of deleted functions.
>             * iresolve.c (is_trig_resolved, copy_replace_function_shallow,
>             gfc_resolve_cotan, get_radians, get_degrees, resolve_trig_call,
>             gfc_resolve_atrigd, gfc_resolve_atan2d): Delete functions.
>             (gfc_resolve_trigd, gfc_resolve_trigd2): Resolve to library
> functions.
>             * simplify.c (rad2deg, deg2rad, gfc_simplify_acosd,
> gfc_simplify_asind,
>             gfc_simplify_atand, gfc_simplify_atan2d, gfc_simplify_cosd,
>             gfc_simplify_sind, gfc_simplify_tand, gfc_simplify_cotand): New
>             functions.
>             (gfc_simplify_atan2): Fix error message.
>             (simplify_trig_call, gfc_simplify_trigd, gfc_simplify_atrigd,
>             radians_f): Delete functions.
>             * trans-intrinsic.c: Add LIB_FUNCTION decls for sind, cosd, tand.
>             (rad2deg, gfc_conv_intrinsic_atrigd, gfc_conv_intrinsic_cotan,
>             gfc_conv_intrinsic_cotand, gfc_conv_intrinsic_atan2d): New
> functions.
>             (gfc_conv_intrinsic_function): Handle ACOSD, ASIND, ATAND, COTAN,
>             COTAND, ATAN2D.
>             * trigd_fe.inc: New file. Included by simplify.c to implement
>             simplify_sind, simplify_cosd, simplify_tand with code common to
> the
>             libgfortran implementation.
>     
>     gcc/testsuite/ChangeLog
>     
>             PR fortran/93871
>             * gfortran.dg/dec_math.f90: Extend coverage to real(10) and
> real(16).
>             * gfortran.dg/dec_math_2.f90: New test.
>             * gfortran.dg/dec_math_3.f90: Likewise.
>             * gfortran.dg/dec_math_4.f90: Likewise.
>             * gfortran.dg/dec_math_5.f90: Likewise.
>     
>     libgfortran/ChangeLog
>     
>             PR fortran/93871
>             * Makefile.am, Makefile.in: New make rule for intrinsics/trigd.c.
>             * gfortran.map: New routines for {sind, cosd, tand}X{r4, r8,
> r10, r16}.
>             * intrinsics/trigd.c, intrinsics/trigd_lib.inc,
> intrinsics/trigd.inc:
>             New files. Defines native degree-valued trig functions.

I think this broke the build for bare-metal (newlib) targets like aarch64-none-elf:

libgfortran/intrinsics/trigd_lib.inc:55:56: error: implicit declaration of function 'copysignl' [-Werror=implicit-function-declaration]
   55 | #define mpfr_copysign(rop, op1, op2, rnd) rop = SUFFIX(copysign)((op1), (op2))
      |                                                        ^~~~~~~~

I think newlib doesn't support long double functions well so likely doesn't have copysignl. Is there some way this use can be conditionalised on library support?

Comment 49 Fritz Reese 2020-04-20 23:39:09 UTC

(In reply to ktkachov from comment #48)
> (In reply to CVS Commits from comment #45)
[...]
> 
> I think this broke the build for bare-metal (newlib) targets like
> aarch64-none-elf:
> 
> libgfortran/intrinsics/trigd_lib.inc:55:56: error: implicit declaration of
> function 'copysignl' [-Werror=implicit-function-declaration]
>    55 | #define mpfr_copysign(rop, op1, op2, rnd) rop =
> SUFFIX(copysign)((op1), (op2))
>       |                                                        ^~~~~~~~
> 
> I think newlib doesn't support long double functions well so likely doesn't
> have copysignl. Is there some way this use can be conditionalised on library
> support?

I believe I should use HAVE_* macros from libgfortran/config.h to check availability of the required functions fmod[x], fabs[x], and copysign[x]. I will check this out and patch a fix shortly, thanks for the report.

Comment 50 akrl 2020-04-21 14:39:51 UTC

Hi,

I've opened a dedicated bug for the discussed build issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871

Comment 51 Jakub Jelinek 2020-04-21 15:35:01 UTC

You meant PR94694, right?

Comment 52 akrl 2020-04-21 15:36:20 UTC

Yeah sorry