[Bug target/103008] poor inlined builtin_fmod on x86_64

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Feb 11 07:59:40 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just as a data point: on znver2, Uros' testcase shows

rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out 
19.18user 0.00system 0:19.18elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -fno-builtin-fmod
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out 
19.26user 0.00system 0:19.26elapsed 99%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
rguenther@ryzen:/tmp> gcc-11 t.c -Ofast -lm -march=znver2 -Dfmodf=_fmodf   
rguenther@ryzen:/tmp> numactl --physcpubind=3 /usr/bin/time ./a.out 
4.40user 0.00system 0:04.40elapsed 100%CPU (0avgtext+0avgdata 1528maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps

That's with glibc 2.31.  So the _fmodf variant is more than four times
as fast (4.40s vs. 19.18s).  But, as Joseph says, a general expansion
like that is probably a bad idea.

The specific case of blender, which uses doubles and fmod (x, 1.), shows
that glibc is much slower than x87 in the test below on znver2, while the
proposed inline is much faster than either.

Note that using modf(x, &tem) is more than three times as fast as
using fmod (x, 1.) with glibc 2.31.  While we have an optab for fmod
we don't have one for modf (which has an unfortunate pointer output API).
I'm not sure whether fmod (x, 1.) == modf (x, &tem).

#include <math.h>

double
__attribute__((noinline))
_fmod (double x, double y __attribute__((unused)))
{
  return x - trunc (x);
}

int
main ()
{

  double a, b;
  volatile double z;

  for (a = -1000.0; a < 1000.0; a += 0.01)
    for (b = -1000.0; b < 1000.0; b += 0.1)
      {
        volatile double tem = a;
        z = fmod (tem, 1.);
      }

  return 0;
}

Note that replacing a call to fmod (x, 1.) with x - trunc (x) would
not be a simplification on GIMPLE, so that should possibly be done
by RTL expansion?  Replacing it with modf (x, &tem) would be OK,
I think (unfortunately modf doesn't seem to accept a NULL arg).
Both functions are part of C99 / POSIX, so replacing one with the
other should generally be OK.

Maybe there's a function that does not compute the integer part
as well.

