Bug 57073 - __builtin_powif (-1.0, k) should be optimized to "1.0 - 2.0 * (K%2)"
Summary: __builtin_powif (-1.0, k) should be optimized to "1.0 - 2.0 * (K%2)"
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2013-04-25 18:04 UTC by Tobias Burnus
Modified: 2013-06-01 08:41 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-05-31 00:00:00


Attachments
patch that fails (760 bytes, patch)
2013-04-28 22:03 UTC, Thomas Koenig
Details | Diff
Patch that should work, but doesn't. (564 bytes, patch)
2013-05-02 19:56 UTC, Thomas Koenig
Details | Diff
Another patch that doesn't work... (1.02 KB, patch)
2013-05-17 23:08 UTC, Thomas Koenig
Details | Diff
Yet another patch that doesn't work (1.30 KB, patch)
2013-05-30 15:33 UTC, Thomas Koenig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tobias Burnus 2013-04-25 18:04:25 UTC
Motivated by PR57071.

In numerical code, it is not unlike to find code of the form "(-1.0) raised to the power of k", in Fortran: (-1.0)**k.

That translates into:
  __builtin_powif (-1.0e+0, k)
which stays that way even with -O3.

Expected: It gets optimized to "1.0 - 2.0 * (K%2)"


Fortran test case:

real function f(k)
  integer, value :: k
  f = (-1.0)**k
end
Comment 1 Jakub Jelinek 2013-04-26 07:06:53 UTC
Modulo is an integer only operation, so I'd say you want
(k & 1) ? -1.0 : 1.0
instead, converting (k & 1) into floating point and multiplying and subtracting it or (k & 1) << 1 into floating point and subtracting it is is likely going to be more expensive, though of course it should be benchmarked.
Comment 2 Tobias Burnus 2013-04-26 07:09:54 UTC
As James pointed out at c.l.f, the algorithm fails if k < 0 - and suggests:

   1.0 - (float) ((k & 1) << 1)
Comment 3 Thomas Koenig 2013-04-28 09:05:31 UTC
(In reply to comment #1)
> Modulo is an integer only operation, so I'd say you want
> (k & 1) ? -1.0 : 1.0
> instead, converting (k & 1) into floating point and multiplying and subtracting
> it or (k & 1) << 1 into floating point and subtracting it is is likely going to
> be more expensive, though of course it should be benchmarked.

You're right, it is much faster:

g25@linux-fd1f:~/Krempel/P2> cat a.c
#include <stdio.h>

volatile float b;

int main()
{
  int i;
  for (i=0; i<1000000000; i++)
    b = (i & 1) ? -1.0 : 1.0;

  printf("%f\n",b);
}
ig25@linux-fd1f:~/Krempel/P2> gcc -O a.c && time ./a.out
-1.000000

real    0m1.948s
user    0m1.946s
sys     0m0.000s
ig25@linux-fd1f:~/Krempel/P2> cat b.c
#include <stdio.h>

volatile float b;

int main()
{
  int i;
  for (i=0; i<1000000000; i++)
    b = 1.0 - ((i&1)<<1);

  printf("%f\n",b);
}
ig25@linux-fd1f:~/Krempel/P2> gcc -O b.c && time ./a.out
-1.000000

real    0m7.059s
user    0m7.053s
sys     0m0.001s
Comment 4 Thomas Koenig 2013-04-28 22:03:35 UTC
Created attachment 29968 [details]
patch that fails

The patch in the attachment fails with the test
case

program main
  integer :: i
  real(8) :: a
  a = -1.0_8
  read (*,*) i
  print *,(-1.0_8)**i,a**i
end program main

gfortran -O foo.f90
foo.f90: In function 'MAIN__':
foo.f90:6:0: internal compiler error: in gimplify_modify_expr, at gimplify.c:4941
   print *,(-1.0_8)**i,a**i
 ^
0x7fbc17 gimplify_modify_expr
        ../../trunk/gcc/gimplify.c:4941
0x7f0211 gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        ../../trunk/gcc/gimplify.c:7156

I wonder why.
Comment 5 Tobias Burnus 2013-04-29 08:40:59 UTC
(In reply to comment #4)
> patch that fails

The Fortran patch of the attachments looks fine, except for:

+      one = gfc_copy_expr (op1);
+      gfc_free_expr (op1);
+      gfc_free_expr (op2);
+      *e = *one;

I would simply use "op1" directly instead of copying it and then freeing the original expression. Otherwise, I am okay with that patch (with a test case).

 * * *

For the middle-end patch: Richard things that the problem is probably in the COND_EXPR re-gimplification - which means the problem is elsewhere.
Comment 6 Thomas Koenig 2013-05-02 19:56:33 UTC
Created attachment 30011 [details]
Patch that should work, but doesn't.

Here's the actual patch that fails, without the Fortran
front end part.
Comment 7 Tobias Burnus 2013-05-02 21:08:36 UTC
The problem is the following in gimplify.c:

In gimplify_cond_expr, one has:

  /* If this COND_EXPR has a value, copy the values into a temporary within
     the arms.  */
  if (!VOID_TYPE_P (type))
      ...      
      /* If either an rvalue is ok or we do not require an lvalue, create the
	 temporary.  But we cannot do that if the type is addressable.  */
      if (((fallback & fb_rvalue) || !(fallback & fb_lvalue))
	  && !TREE_ADDRESSABLE (type))
          ...
	  tmp = create_tmp_var (type, "iftmp");
	  result = tmp;

This is then used in the then_ and else_ branches, e.g.

 <modify_expr 0x7ffff67f2e38
    type <real_type 0x7ffff66f5e70 real(kind=4) SF
    arg 0 <var_decl 0x7ffff67ea850 iftmp.1 type <real_type ... real(kind=4)>
    arg 1 <real_cst 0x7ffff67f32a0 type <real_type ... real(kind=4)>
        constant 1.0e+0>>

That's passed to gimplify_modify_expr, where one runs into the assert:

  if (gimplify_ctxp->into_ssa && is_gimple_reg (*to_p))
    {
      /* We should have got an SSA name from the start.  */
      gcc_assert (TREE_CODE (*to_p) == SSA_NAME);
    }

which fails as the TREE_CODE of "iftmp.1" is VAR_DECL.
Comment 8 Tobias Burnus 2013-05-02 22:00:53 UTC
For completeness, gimplify_ctxp->into_ssa is 0 for the -O0 optimization and it gets set to 1 in gimple_regimplify_operands. Thus, it is not surprising that it works with the -O0 example ("(-1.0)**k") but not with the -O1 example ("a=-1;a**k").

If one ignores the assert, one runs into: control flow within basic block 2.
Comment 9 Tobias Burnus 2013-05-03 09:58:20 UTC
After some discussion with Jakub and Richard - and after doing some code reads, I think it is best do handle this in tree-ssa-math-opts.c (search for gimple_expand_builtin_powi and BUILT_IN_POWI).

For the code insertion, asan.c's build_check_stmt can serve as template.
Comment 10 Thomas Koenig 2013-05-17 23:08:34 UTC
Created attachment 30142 [details]
Another patch that doesn't work...

This time with the right PR...

I tried to follow the advice in comment#9, but I hit a wall
(again).

With the attached patch, I hit

foo.f90:1:0: error: SSA_NAME_DEF_STMT is wrong
 program main
 ^
Expected definition statement:
_16 = 1.0e+0;

Actual definition statement:
_16 = _24;

so I suspect I need PHI nodes here.

Is this the right approach in general?  Am I missing something
simple here?
Comment 11 Thomas Koenig 2013-05-30 15:33:25 UTC
Created attachment 30228 [details]
Yet another patch that doesn't work

This one fails with

 program main
 ^
_24 = i_1 & 1;
power_6.f90:4:0: internal compiler error: verify_gimple failed
0x98ca4c verify_gimple_in_cfg(function*)
        ../../trunk/gcc/tree-cfg.c:4792
0x8c4d87 execute_function_todo
        ../../trunk/gcc/passes.c:1969
0x8c56e7 execute_todo
        ../../trunk/gcc/passes.c:2002

We have a few alternatives now:

1. This doesn't get fixed at all
2. This doesn't get fixed by me
3. This gets fixed in the Fortran front end
4. Somebody points out where my patch is going wrong, and helps me
   understant a bit more of the basics of GIMPLE.
Comment 12 Tobias Burnus 2013-05-30 21:33:36 UTC
Author: burnus
Date: Thu May 30 21:32:53 2013
New Revision: 199461

URL: http://gcc.gnu.org/viewcvs?rev=199461&root=gcc&view=rev
Log:
2013-05-30  Tobias Burnus  <burnus@net-b.de>
            Thomas Koenig  <tkoenig@gcc.gnu.org>

        PR middle-end/57073
        * tree-ssa-math-opts.c (execute_cse_sincos): Optimize
        powi (-1.0, k) to (k & 1) ? -1.0 : 1.0.

2013-05-30  Tobias Burnus  <burnus@net-b.de>

        PR middle-end/57073
        * gfortran.dg/power_6.f90: New.


Added:
    trunk/gcc/testsuite/gfortran.dg/power_6.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c
Comment 13 Tobias Burnus 2013-05-30 21:34:32 UTC
FIXED on the trunk (for 4.9).

Thanks, Thomas and Jakub, for the help and the suggestions.
Comment 14 Thomas Koenig 2013-05-31 20:40:03 UTC
The patch at

http://gcc.gnu.org/ml/gcc-cvs/2013-05/msg01050.html

is wrong; the optimization is only done when, for
2.0**i, i is a constant integer.  power_6.f90 also
fails as a result.

Tobias, could you revert this patch?
Comment 15 Tobias Burnus 2013-05-31 20:44:19 UTC
(In reply to Thomas Koenig from comment #14)
> The patch at http://gcc.gnu.org/ml/gcc-cvs/2013-05/msg01050.html 
> is wrong

I realized this myself about 6 hours ago, see http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01884.html - I was hoping to get some feedback - otherwise, I was intending to simply revert it.
Comment 16 Tobias Burnus 2013-06-01 08:41:10 UTC
Now it should be FIXED again.

Author: burnus
Date: Sat Jun  1 08:39:59 2013
New Revision: 199575

URL: http://gcc.gnu.org/viewcvs?rev=199575&root=gcc&view=rev
Log:
2013-06-01  Tobias Burnus  <burnus@net-b.de>

        Partially reverted:
        2013-05-31  Tobias Burnus  <burnus@net-b.de>

        PR middle-end/57073
        * tree-ssa-math-opts.c (execute_cse_sincos): Move check
        further up.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c