Bug 59424 - Optimization issue on min/max
Summary: Optimization issue on min/max
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.7.2
: P3 enhancement
Target Milestone: 14.0
Assignee: Andrew Pinski
URL: https://gcc.gnu.org/pipermail/gcc-pat...
Keywords: missed-optimization, patch
Depends on:
Blocks: 51964 89018 109424
  Show dependency treegraph
 
Reported: 2013-12-09 06:01 UTC by Jean-Marc Valin
Modified: 2023-05-08 07:41 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-03-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jean-Marc Valin 2013-12-09 06:01:24 UTC
gcc seems to be having problems optimizing float min/max code and behaves unpredictably. If I compile the following code on an x86-64 (SSE enabled):

#include <math.h>

float func1(float a, float b) {
  return (a < b ? a : b);
}

float func2(float a, float b) {
  return sqrtf(a < b ? a : b);
}

float func3(float a, float b) {
  return fabsf(a < b ? a : b);
}

I get the following disassembly:

0000000000000000 <func1>:
func1():
   0:   f3 0f 5d c1             minss  %xmm1,%xmm0
   4:   c3                      retq   
   5:   66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
   c:   00 00 00 00 

0000000000000010 <func2>:
func2():
  10:   f3 0f 5d c1             minss  %xmm1,%xmm0
  14:   f3 0f 51 c0             sqrtss %xmm0,%xmm0
  18:   c3                      retq   
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000000020 <func3>:
func3():
  20:   0f 2f c8                comiss %xmm0,%xmm1
  23:   77 13                   ja     38 <func3+0x18>
  25:   f3 0f 10 05 00 00 00    movss  0x0(%rip),%xmm0        # 2d <func3+0xd>
  2c:   00 
  2d:   0f 54 c1                andps  %xmm1,%xmm0
  30:   c3                      retq   
  31:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  38:   f3 0f 10 0d 00 00 00    movss  0x0(%rip),%xmm1        # 40 <func3+0x20>
  3f:   00 
  40:   0f 54 c1                andps  %xmm1,%xmm0
  43:   c3                      retq   


So gcc is correctly using maxss in func1 and func2, but not in func2. In practice, I'm using MIN/MAX macros extensively in libopus and gcc is missing the optimization most of the time, slowing down the code.
Comment 1 Marek Polacek 2013-12-09 11:13:12 UTC
I can observe the same behavior with trunk.

For func2, we compute a < b ? a : b first and then call sqrtf on that, while for func3 we have return a < b ? ABS_EXPR <a> : ABS_EXPR <b>;.  Perhaps that's the reason why ifcvt doesn't optimize this using UNSPEC_IEEE_MIN.
Comment 2 Richard Biener 2013-12-09 14:49:22 UTC
(In reply to Marek Polacek from comment #1)
> I can observe the same behavior with trunk.
> 
> For func2, we compute a < b ? a : b first and then call sqrtf on that, while
> for func3 we have return a < b ? ABS_EXPR <a> : ABS_EXPR <b>;.  Perhaps
> that's the reason why ifcvt doesn't optimize this using UNSPEC_IEEE_MIN.

Yep.  Obviously it cannot after that distribution of ABS_EXPR.
Comment 3 Jean-Marc Valin 2013-12-09 17:05:17 UTC
What's strange is that adding -ffast-math makes gcc use maxss on func3() too, even though it was already allowed to without -ffast-math. I had the same problem with absolute value, although in that case fabs() solved the problem. In the case of max(), using fmax() doesn't help because it's too strict about NaN behaviour to match maxss.
Comment 4 Richard Biener 2016-03-14 13:34:20 UTC
Note that phiopt could still do min/max detection of a derived value, thus
for the PHI args look through unary ops it can associate a min/max with
and re-write as op (min/max (a, b)).  Not sure if that's worth it complicating
the code.

Other than that removing the

      else if (TREE_CODE (arg0) == COND_EXPR)
        {

from fold_unary that does the operation sinking would "fix" this as well.
Comment 5 Andrew Pinski 2021-07-06 08:18:47 UTC
I have a fix which I was working on for PR 56223.
Basically factor_out_conditional_conversion can be extended to support more than conversions :).
Comment 6 GCC Commits 2023-05-08 07:38:22 UTC
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:6d6c17e45f62cfe0b7de502af299348fca548b01

commit r14-575-g6d6c17e45f62cfe0b7de502af299348fca548b01
Author: Andrew Pinski <apinski@marvell.com>
Date:   Thu Apr 27 12:21:54 2023 -0700

    PHIOPT: factor out unary operations instead of just conversions
    
    After using factor_out_conditional_conversion with diamond bb,
    we should be able do use it also for all normal unary gimple and not
    just conversions. This allows to optimize PR 59424 for an example.
    This is also a start to optimize PR 64700 and a few others.
    
    OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
    
    An example of this is:
    ```
    static inline unsigned long long g(int t)
    {
      unsigned t1 = t;
      return t1;
    }
    static int abs1(int a)
    {
      if (a < 0)
        a = -a;
      return a;
    }
    unsigned long long f(int c, int d, int e)
    {
      unsigned long long t;
      if (d > e)
        t = g(abs1(d));
      else
        t = g(abs1(e));
      return t;
    }
    ```
    
    Which should be optimized to:
      _9 = MAX_EXPR <d_5(D), e_6(D)>;
      _4 = ABS_EXPR <_9>;
      t_3 = (long long unsigned intD.16) _4;
    
    gcc/ChangeLog:
    
            * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
            (factor_out_conditional_operation): This and add support for all unary
            operations.
            (pass_phiopt::execute): Update call to factor_out_conditional_conversion
            to call factor_out_conditional_operation instead.
    
            PR tree-optimization/109424
            PR tree-optimization/59424
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/abs-2.c: Update tree scan for
            details change in wording.
            * gcc.dg/tree-ssa/minmax-17.c: Likewise.
            * gcc.dg/tree-ssa/pr103771.c: Likewise.
            * gcc.dg/tree-ssa/minmax-18.c: New test.
            * gcc.dg/tree-ssa/minmax-19.c: New test.
Comment 7 Andrew Pinski 2023-05-08 07:41:02 UTC
Fixed.