This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Complex multiplication in gcc
- From: Gabriel Paubert <paubert at iram dot es>
- To: Sean McAllister <smcallis at gmail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 17 Jul 2017 20:32:11 +0200
- Subject: Re: Complex multiplication in gcc
- References: <CALJHFB6A-jn6S+SJRhry5wp5t+SaNGhi+DoSJqreifzyhGZJxg@mail.gmail.com>
On Mon, Jul 17, 2017 at 10:51:21AM -0600, Sean McAllister wrote:
> When generating code for a simple inner loop (instantiated with
> std::complex<float>)
>
> template <typename cx>
> void __attribute__((noinline)) benchcore(const cx* __restrict__ aa,
>     const cx* __restrict__ bb, const cx* __restrict__ cc,
>     cx* __restrict__ dd, cx uu, cx vv, size_t nn) {
>   for (ssize_t ii=0; ii < nn; ii++) {
>     dd[ii] = (
>       aa[ii]*uu +
>       bb[ii]*vv +
>       cc[ii]
>     );
>   }
> }
>
> g++ generates the following assembly code (g++ 7.1.0) (compiled with:
> g++ -I. test.cc -O3 -ggdb3 -o test)
[snipped]
>
> The interesting part is the two calls to __mulsc3, which the docs
> indicate computes complex multiplication according to Annex G of the
> C99 standard. This leads me to two questions.
>
> First, disassembling __mulsc3 doesn't seem to contain anything:
>
> (gdb) disassemble __mulsc3
> Dump of assembler code for function __mulsc3@plt:
> 0x0000000000400aa0 <+0>: jmpq *0x2035d2(%rip) # 0x604078
> 0x0000000000400aa6 <+6>: pushq $0xc
> 0x0000000000400aab <+11>: jmpq 0x4009d0
> End of assembler dump.
>
> What's the cause of this?
The cause is that you are disassembling the PLT entry (note the name
__mulsc3@plt), not the function itself. The PLT entry only redirects to
the real function, which is provided by libgcc (on my computer the
exact location is /lib/x86_64-linux-gnu/libgcc_s.so.1).
>
> Second, since I don't think I'll convince anyone to generate
> non-standard conforming code by default, could the default performance
> of complex multiplication be enhanced significantly by performing the
> isnan() checks required by Annex G and only calling the function to
> fix the results if they fail? That would move the function call
> overhead out of the critical path at least.
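
A minimal sketch of the scheme proposed in the quoted paragraph above,
not what GCC emits today: compute the product with the plain
four-multiply formula, and only fall through to the full multiplication
when both parts come out as NaN, which is the condition the Annex G
example code tests before it tries to recover infinities. The name
mul_fast_path is hypothetical, made up for this illustration. The sketch
assumes default floating-point flags (no -ffast-math or
-fcx-limited-range), so that the isnan() checks are kept and the
fallback really is the Annex G multiplication.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper, not part of GCC or libstdc++.
// Fast path: plain four-multiply formula, no function call.
// Slow path: the library's full multiplication, taken only when both
// parts of the fast result are NaN (the case Annex G tries to repair).
inline std::complex<float> mul_fast_path(std::complex<float> a,
                                         std::complex<float> b) {
    const float re = a.real() * b.real() - a.imag() * b.imag();
    const float im = a.real() * b.imag() + a.imag() * b.real();
    if (std::isnan(re) && std::isnan(im)) {
        // Rare case: let the standard operator* sort out infinities.
        return a * b;
    }
    return {re, im};
}
```

In the loop above this would be used as
dd[ii] = mul_fast_path(aa[ii], uu) + mul_fast_path(bb[ii], vv) + cc[ii];
whether it is actually faster would have to be measured.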
Gabriel