This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][GCC] Complex division improvements in GCC
- From: Jeff Law <law at redhat dot com>
- To: Elen Kalda <Elen dot Kalda at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: nd <nd at arm dot com>, "joseph at codesourcery dot com" <joseph at codesourcery dot com>
- Date: Tue, 3 Sep 2019 12:50:39 -0600
- Subject: Re: [PATCH][GCC] Complex division improvements in GCC
- References: <VI1PR08MB33738567A1263B792449D0F284A20@VI1PR08MB3373.eurprd08.prod.outlook.com>
On 8/29/19 9:20 AM, Elen Kalda wrote:
> Hi all,
>
> Advice and help needed!
>
> This patch makes changes to the the complex division in GCC. The algorithm
> used is same as in https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01629.html -
> same problems, same improvement in robustness, same loss in accuracy.
>
> Since Baudin adds another underflow check, two more basic blocks get added
> during the cplxlower1 pass.
>
> No problems with bootstrap on aarch64-none-linux-gnu. Unsurprisingly, there
> are regressions in gcc/testsuite/gcc.dg/torture/builtin-math-7.c. As in the
> patch linked above, the regressions in that test are due to the loss in
> accuracy.
>
> To evaluate the performance, the same test which generates 360 000 000 random
> numbers was used. Doing one less division results in a nice 11.32%
> improvement in performance:
>
> | CPU time
> --------------------
> smiths | 7 290 996
> b1div | 6 465 590
>
> That implementation works (in a sense that it produces an expected result),
> but it could be made more efficient and clean. As an example, the cplxlower1
> pass assigns one variable to another variable, which seems redundant:
>
> [...]
>
> <bb 3> [local count: 1063004407]:
> # i_19 = PHI <0(2), i_15(7)>
> _9 = REALPART_EXPR <x[i_19]>;
> _7 = IMAGPART_EXPR <x[i_19]>;
> _1 = COMPLEX_EXPR <_9, _7>;
> _18 = REALPART_EXPR <y[i_19]>;
> _17 = IMAGPART_EXPR <y[i_19]>;
> _2 = COMPLEX_EXPR <_18, _17>;
> _16 = ABS_EXPR <_18>;
> _21 = ABS_EXPR <_17>;
> _22 = _16 > _21;
> if (_22 != 0)
> goto <bb 9>; [50.00%]
> else
> goto <bb 10>; [50.00%]
>
> <bb 9> [local count: 531502204]:
> _23 = _17 / _18;
> _24 = _17 * _23;
> _25 = _18 + _24;
> _26 = 1.0e+0 / _25;
> _27 = _23 == 0.0;
> if (_27 != 0)
> goto <bb 12>; [50.00%]
> else
> goto <bb 13>; [50.00%]
>
> <bb 12> [local count: 265751102]:
> _28 = _7 / _18;
> _29 = _17 * _28;
> _30 = _9 + _29;
> _31 = _26 * _30;
> _32 = _9 / _18;
> _33 = _17 * _32;
> _34 = _7 - _33;
> _35 = _26 * _34;
> _83 = _31; <----------------------- could these extra assignments be avoided?
> _84 = _35; <-----------------------|
> goto <bb 11>; [100.00%]
I wouldn't really worry about them at this point. They'll almost
certainly get propagated away as they're simple copies.
>
> 2019-08-29 Elen Kalda <elen.kalda@arm.com>
>
> * fold-const.c (fold_negate_const): Make the fold_negate_const function
> non-static
> (const_binop): Implement Baudin's algorithm for complex division
> * fold-const.h (fold_negate_const): Add a fold_negate_const function
> declaration
> * tree-complex.c (complex_div_internal_wide): New function to aid with the
> wide complex division
> (expand_complex_div_wide): Implement Baudin's algorithm for complex
> division
I'd like Joseph to chime in when he can.
jeff