43108 – [4.5 regression] mixed complex<T> multiplication horribly inefficient

Bug 43108 - [4.5 regression] mixed complex<T> multiplication horribly inefficient

Summary: [4.5 regression] mixed complex<T> multiplication horribly inefficient

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c++ (show other bugs)
Version:	4.5.0

Importance:	P3 normal
Target Milestone:	4.5.0
Assignee:	Jason Merrill

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2010-02-17 21:04 UTC by Jan van Dijk
Modified:	2010-02-18 20:18 UTC (History)
CC List:	4 users (show)

See Also:
Host:	x86_64-unknown-linux-gnu
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2010-02-18 17:54:17

Attachments
testcase (377 bytes, text/plain) 2010-02-17 21:05 UTC, Jan van Dijk	Details
patch (229 bytes, patch) 2010-02-17 22:15 UTC, Jan van Dijk	Details \| Diff
changelog entry (182 bytes, text/plain) 2010-02-17 22:16 UTC, Jan van Dijk	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jan van Dijk 2010-02-17 21:04:00 UTC

Dear all,

=== problem description ===

Mixed T - complex<T> algebra is horribly inefficient in svn-trunk. This is demonstrated by the attached testcase 'test.cpp'.

*** gcc-4.4.1:

$ g++ -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.817s

$ g++ -DUSE_MY_CMULT -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.825s

*** gcc 4.5.0 20100217 (revision 156834):

$ g++ -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m28.574s  # Aaaaaaargh!!!

$ g++ -DUSE_MY_CMULT -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.817s

=== analysis ===

The libstdc++ implementation of std::complex<T>::operator*=(T val) does a '/= val' on its '__complex__ T' data representation (_GLIBCXX_USE_C99_COMPLEX is enabled on my OpenSUSE GNU/Linux box). In present gcc, 'val' is promoted to a (degenerate) complex type. Next, in 'expand_complex_multiplication' (tree-complex.c) this leads to the emission of a libcall (to __muldc3), since -fcx-limited-range is not in effect, so no reduction is allowed.

Indeed, when the straightforward custom operator*= is active in the testcase above (-DUSE_MY_CMULT), we get back the performance that we wish: this version simply multiplies the scalar components of the complex number separately.

Would a <complex> patch that implements scalar multiplication (and division) of complex specialisations this way be accepted? The factor 3 performance loss is unacceptable for any serious numerical work in C++.

=== Discussion, caveats, related issues ===

* such patch would not address the C99-issue of complex*scalar; but at least math-oriented C++ users will be happy.

* If this change is implemented, PR28408 is a C(99)-only issue. All signs are then correct, the inf/nan issues have apparently been previously fixed.

* From the C++0x working draft (26.4.6, item 7):

        template<class T> complex<T> operator*(const complex<T>& lhs, const complex<T>& rhs);
        template<class T> complex<T> operator*(const complex<T>& lhs, const T& rhs);
        template<class T> complex<T> operator*(const T& lhs, const complex<T>& rhs);

        Returns: complex<T>(lhs) *= rhs.

Note the third overload: of course this should be implemented efficiently as 'return complex<T>(rhs) *= lhs', which is what libstdc++ does --- fortunately. This is however *not* the same as the return value which is specified by the standard (think signed zeros). Is this a libstdc++ bug, or an editorial oversight in N3000=09-0190. In   the last case, is someone on this list in the position to raise this issue?

Comment 1 Jan van Dijk 2010-02-17 21:05:26 UTC

Created attachment 19898 [details]
testcase

Comment 2 Paolo Carlini 2010-02-17 21:24:31 UTC

I don't see why you want to deal with special cases in the library instead of the optimizers: indeed, your *very* analysis shows that the missed optimization is happening in tree-complex.c and improving it would benefit *many* more uses outside the library. Actually, I'm pretty sure we rejected already proposals to change <complex> to deal more explicitly with special cases...

Anyway, Gaby, can you please comment on this issue?

Comment 3 Richard Biener 2010-02-17 21:30:33 UTC

(In reply to comment #2)
> I don't see why you want to deal with special cases in the library instead of
> the optimizers: indeed, your *very* analysis shows that the missed optimization
> is happening in tree-complex.c and improving it would benefit *many* more uses
> outside the library.

Well, but float * complex<float> isn't the same as complex<float> * complex<float>.  But the library (and also C promotion if you write
that in literal C) presents us with complex<float> * complex<float>.

Which we have to handle according to the standard (thus "slow").

So either this is WONTFIX or a library issue.

Comment 4 Paolo Carlini 2010-02-17 21:43:24 UTC

(In reply to comment #3)
> Well, but float * complex<float> isn't the same as complex<float> *
> complex<float>.  But the library (and also C promotion if you write
> that in literal C) presents us with complex<float> * complex<float>.

Where, exactly, the library presents complex * complex instead of complex * float? I really don't see it. I see things like:

      complex<float>&
      operator*=(float __f)
      {
	_M_value *= __f;
	return *this;
      }

and I don't see where the library explicitly transforms __f to a complex, I see something like complex = complex * float. But really, we have been through this *already*, I'm absolutely sure, and I have zero problems with changing the above, if Gaby agrees and you all are sure something changed since the lat time, it's a 5 minutes work, but I'm still convinced that this is something much more generally useful to optimize than the library.

Comment 5 Jan van Dijk 2010-02-17 22:15:06 UTC

Created attachment 19901 [details]
patch

Comment 6 Richard Biener 2010-02-17 22:15:55 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > Well, but float * complex<float> isn't the same as complex<float> *
> > complex<float>.  But the library (and also C promotion if you write
> > that in literal C) presents us with complex<float> * complex<float>.
> 
> Where, exactly, the library presents complex * complex instead of complex *
> float? I really don't see it. I see things like:
> 
>       complex<float>&
>       operator*=(float __f)
>       {
>         _M_value *= __f;
>         return *this;
>       }
> 
> and I don't see where the library explicitly transforms __f to a complex, I see
> something like complex = complex * float. But really, we have been through this
> *already*, I'm absolutely sure, and I have zero problems with changing the
> above, if Gaby agrees and you all are sure something changed since the lat
> time, it's a 5 minutes work, but I'm still convinced that this is something
> much more generally useful to optimize than the library.

Well, also in C++ you get promotion to _M_value * complex <__f, 0.0> which
we cannot optimize in case we care for the sign of zeros (you get the
desired optimization with -fno-signed-zeros).

Comment 7 Jan van Dijk 2010-02-17 22:16:16 UTC

Created attachment 19902 [details]
changelog entry

Comment 8 Richard Biener 2010-02-17 22:17:36 UTC

The patch will make results incorrect regarding the sign of zeros.  Does
the C++ standard library allow this?  Thus, does it specify multiplication
with a scalar float as component-wise multiplication?

Comment 9 Paolo Carlini 2010-02-17 22:27:28 UTC

To be clear, once more: the proposed change has been suggested *already* and rejected, being something which the compiler should be perfectly able to figure out, when legal according to the optimization options. Thus, unless Gaby says something different, the proposed patch will *not* go in.

Comment 10 Paolo Carlini 2010-02-17 22:34:15 UTC

Regarding the specific semantics, Richard, there is little to say: we want the C99 semantics, by and large. Maybe there are some missing details, but we decided already that, when we switched the complex multiplication to be the slow, C99 annex, by default. Thus, in a first, and second too, approx, the C++ standard doesn't count, really, this is just, sketched:

   __complex__ double val;
   double d;
   val *= d;

Comment 11 Richard Biener 2010-02-17 22:38:56 UTC

Then there is nothing to fix.  Use -fno-signed-zeros if you do not care
about the difference of +0.0 and -0.0.  Then you get the desired optimization
which tree-complex indeed can figure out.

Comment 12 Paolo Carlini 2010-02-17 22:40:06 UTC

Excellent.

Comment 13 Jan van Dijk 2010-02-17 22:54:16 UTC

(In reply to comment #10)
> Regarding the specific semantics, Richard, there is little to say: we want the
> C99 semantics, by and large. Maybe there are some missing details, but we
> decided already that, when we switched the complex multiplication to be the
> slow, C99 annex, by default. Thus, in a first, and second too, approx, the C++
> standard doesn't count, really, this is just, sketched:
> 
>    __complex__ double val;
>    double d;
>    val *= d;
> 

But the point in question is that gcc *does not implement Annex G-4.1 (item 2) correctly*. That clearly states that u*(x+iy) = (ux)+i(uy), whereas gcc does (u+0i)*(x+iy), leading to wrongly signed zeros. The patch fixes exactly that (for C++) by circumventing gcc's flawed implementation of mixed arithmetic involving C99's complex type.

Paolo, Richard, thank you for your time and patience. In view of the above, I hope you reconsider this issue. 

PS: could you (or Gabriel dos Reis, if he reads this) also comment on the last bullit of my original message (From the C++0x working draft...)? As indicated, the standard differs from what libstdc++ implements.

Thank you in advance

Comment 14 Paolo Carlini 2010-02-17 22:57:26 UTC

If the compiler does something wrong, then we should fix the compiler, not add hacks to the library, without fixing the plethora of uses of __complex__ outside the library. Thus, if you are sure, please double check Bugzilla, and then file a specific compiler PR.

Comment 15 Jan van Dijk 2010-02-17 23:16:24 UTC

(In reply to comment #8)
> The patch will make results incorrect regarding the sign of zeros.  Does
> the C++ standard library allow this?  Thus, does it specify multiplication
> with a scalar float as component-wise multiplication?

OK, before giving up on this, let me state for the record that 

* this patch *fixes* the signs (also see comment #13). 
* It also fixes a serious regression when std::complex is used (that std::complex relies on __complex__ for its implementation is an implementation detail). So closing this as INVALID does not feel right.
* I agree that a long-term solution would involve fixing the middle-end, for example by using the concept of a 'strong 0 ' (00) in the promotion of scalars to (degenerate) doubles (see LIA-3, ISO/IEC 10967-3:2006(E) page 77). Then
u*(x+iy) -> (u+00i)*(x+iy), and expand_complex_multiplication() can eliminate the terms 00*x and 00*iy. I have no idea how difficult it would be implement that, since this is my first experience with gcc 'under the hood'. Ideas?

With kind regards, and thank you again for your time. Jan.

Comment 16 Paolo Carlini 2010-02-17 23:18:06 UTC

Which regression?!? Nothing changed in this code for *years*.

Comment 17 Jan van Dijk 2010-02-17 23:28:59 UTC

(In reply to comment #16)
> Which regression?!? Nothing changed in this code for *years*.

I use std::complex<double>. With gcc-4.5.0 the performance of c*f is three times worse than with gcc.4.4.x.

That regression. 

And the c*f zero-signs are wrong at present. 

And performance can only be restored by specifying -fno-signed-zeros. Which makes even more signs wrong.

Which is all fixed by the proposed 4-line patch.

The bottom line is: I think the math-oriented libstdc++ users will be unhappy with the present state of affairs. And that is sad.

Comment 18 Paolo Carlini 2010-02-17 23:33:54 UTC

Then file a compiler PR, because between 4.4 and 4.5 *nothing* changed in the library.

Comment 19 Richard Biener 2010-02-17 23:43:44 UTC

(In reply to comment #18)
> Then file a compiler PR, because between 4.4 and 4.5 *nothing* changed in the
> library.

4.5 has fixed tree-complex to honor signed zeros properly.  4.4 is broken
in this regard.

Comment 20 Paolo Carlini 2010-02-17 23:52:31 UTC

Richard, I'm pretty sure I read something about signed zeros and slowness, but now I can't find anything in the 4.5 <changes> web page. Should we add something to it, in particular about -fno-signed-zeros to restore performance?

Comment 21 Richard Biener 2010-02-18 00:02:15 UTC

It was fixed by

2009-05-08  Joseph Myers  <joseph@codesourcery.com>

        PR c/24581
        * c-typeck.c (build_binary_op): Handle arithmetic between one real
        and one complex operand specially.
        * tree-complex.c (some_nonzerop): Do not identify a real value as
        zero if flag_signed_zeros.

eventually the C++ frontend needs similar treatment.

Comment 22 Paolo Carlini 2010-02-18 00:16:54 UTC

Now I'm confused: I can understand that the C++ front-end is still sometimes incorrect about signed zeros in mixed arithmetic (these are very old issues), but why 4.5 is slower than 4.4?!? I something different in the middle-end about that? Is it more strict and doesn't optimize anymore something (partially incorrect) handled by the C++ front-end? Maybe we should do something about that now, Joseph' patch to the C front-end doesn't look terribly complex to me...

Comment 23 Paolo Carlini 2010-02-18 00:45:17 UTC

Now I understand the issue this way: the slow down is unavoidable if we want a correct treatment of signed zero, and the slow down itself is due to the way the middle-end handles mixed arithmetic. However, the front-ends have to be also fixed, otherwise we get only the slow down only without the correctness. If my summary is correct, that seems really, really bad for C++ users, I think we should consider fixing the C++ front-end too in time for 4.5.

And eventually we should anyway tell the users in the release notes about -fno-signed-zeros, if they want to restore the performance of 4.4 and can live with the wrong zeros.

Comment 24 Paolo Carlini 2010-02-18 00:50:53 UTC

I'm re-opening the PR tentatively as C++

Comment 25 Gabriel Dos Reis 2010-02-18 04:55:26 UTC

Subject: Re:  mixed complex<T> multiplication horribly 
	inefficient

On Wed, Feb 17, 2010 at 4:54 PM, jan at epgmod dot phys dot tue dot nl
<gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #13 from jan at epgmod dot phys dot tue dot nl  2010-02-17 22:54 -------
> (In reply to comment #10)
>> Regarding the specific semantics, Richard, there is little to say: we want the
>> C99 semantics, by and large. Maybe there are some missing details, but we
>> decided already that, when we switched the complex multiplication to be the
>> slow, C99 annex, by default. Thus, in a first, and second too, approx, the C++
>> standard doesn't count, really, this is just, sketched:
>>
>>    __complex__ double val;
>>    double d;
>>    val *= d;
>>
>
> But the point in question is that gcc *does not implement Annex G-4.1 (item 2)
> correctly*. That clearly states that u*(x+iy) = (ux)+i(uy), whereas gcc does
> (u+0i)*(x+iy), leading to wrongly signed zeros. The patch fixes exactly that
> (for C++) by circumventing gcc's flawed implementation of mixed arithmetic
> involving C99's complex type.

then, the proper fix is to fix should be in the compiler.  Not in the library.

Comment 26 Gabriel Dos Reis 2010-02-18 04:56:48 UTC

Subject: Re:  mixed complex<T> multiplication horribly 
	inefficient

On Wed, Feb 17, 2010 at 4:27 PM, paolo dot carlini at oracle dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #9 from paolo dot carlini at oracle dot com  2010-02-17 22:27 -------
> To be clear, once more: the proposed change has been suggested *already* and
> rejected, being something which the compiler should be perfectly able to figure
> out, when legal according to the optimization options. Thus, unless Gaby says
> something different, the proposed patch will *not* go in.

If there is any fix, it should be in the compiler.
std::complex<T> is just a simple notation around _Complex T, for scalar T.

Comment 27 Gabriel Dos Reis 2010-02-18 04:59:43 UTC

Subject: Re:  mixed complex<T> multiplication horribly 
	inefficient

On Wed, Feb 17, 2010 at 3:24 PM, paolo dot carlini at oracle dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #2 from paolo dot carlini at oracle dot com  2010-02-17 21:24 -------
> I don't see why you want to deal with special cases in the library instead of
> the optimizers: indeed, your *very* analysis shows that the missed optimization
> is happening in tree-complex.c and improving it would benefit *many* more uses
> outside the library. Actually, I'm pretty sure we rejected already proposals to
> change <complex> to deal more explicitly with special cases...
>
> Anyway, Gaby, can you please comment on this issue?

I agree with your comments.  From my point of view, the fix should
be in the compiler.

PS:  I was away from mail for a good part of the day.  I'm catching up.

Comment 28 Paolo Carlini 2010-02-18 09:42:46 UTC

Thanks Gaby, eventually I learned something ;)

Hopefully the remaining correctness issues can be fixed quickly in the C++ front-end, honestly before this PR I wasn't aware of any correctness regression in this area, that is treatment of signed zeros. Performance, that could be also a problem for some users, I think we should anyway add a note to the 4.5 changes explaining the use of -fno-signed-zeros, both for C and C++, in that case.

Comment 29 Richard Biener 2010-02-18 09:46:02 UTC

(In reply to comment #28)
> Thanks Gaby, eventually I learned something ;)
> 
> Hopefully the remaining correctness issues can be fixed quickly in the C++
> front-end, honestly before this PR I wasn't aware of any correctness regression
> in this area, that is treatment of signed zeros. Performance, that could be
> also a problem for some users, I think we should anyway add a note to the 4.5
> changes explaining the use of -fno-signed-zeros, both for C and C++, in that
> case.

C performance is fine.  Joseph fixed it for the C frontend and the fallout
is that it regressed for the C++ frontend.

Comment 30 Paolo Carlini 2010-02-18 09:53:37 UTC

Argh, so the C front-end manages to be both correct and fast. We should definitely do something about C++...

Comment 31 Jason Merrill 2010-02-18 16:52:42 UTC

Performance regression.

Comment 32 Jason Merrill 2010-02-18 19:58:55 UTC

Subject: Bug 43108

Author: jason
Date: Thu Feb 18 19:58:41 2010
New Revision: 156874

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156874
Log:
	PR c++/43108
	* typeck.c (cp_build_binary_op): Adapt mixed complex/non handling from
	C build_binary_op.
	* cp-tree.h (WANT_VECTOR_OR_COMPLEX): Rename from WANT_VECTOR.
	* cvt.c (build_expr_type_conversion): Allow COMPLEX_TYPE.

Added:
    trunk/gcc/testsuite/c-c++-common/complex-alias-1.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-alias-1.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-add.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-add.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mixed-add.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-add.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mixed-div.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-div.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mixed-mul.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-mul.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mixed-sub.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-sub.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mul-minus-one.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul-minus-one.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mul-one.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul-one.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-mul.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul.c
    trunk/gcc/testsuite/c-c++-common/complex-sign-sub.c
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign-sub.c
    trunk/gcc/testsuite/c-c++-common/complex-sign.h
      - copied, changed from r156873, trunk/gcc/testsuite/gcc.dg/torture/complex-sign.h
Removed:
    trunk/gcc/testsuite/gcc.dg/torture/complex-alias-1.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-add.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-add.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-div.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-mul.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mixed-sub.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul-minus-one.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul-one.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-mul.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign-sub.c
    trunk/gcc/testsuite/gcc.dg/torture/complex-sign.h
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/cvt.c
    trunk/gcc/cp/typeck.c
    trunk/gcc/testsuite/ChangeLog

Comment 33 Jason Merrill 2010-02-18 19:59:32 UTC

Fixed.

Comment 34 Paolo Carlini 2010-02-18 20:18:43 UTC

Thanks Jason!