Bug 105715 - [13 Regression] missed RTL if-conversion with COND_EXPR change
Summary: [13 Regression] missed RTL if-conversion with COND_EXPR change
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 13.0
: P2 normal
Target Milestone: 14.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-05-24 09:26 UTC by Richard Biener
Modified: 2026-03-23 12:23 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 14.0
Known to fail: 13.1.0
Last reconfirmed: 2023-06-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2022-05-24 09:26:11 UTC
gcc.target/i386/pr45685.c with -march=cascadelake shows missing RTL if-conversion.  The cruical GIMPLE difference is

  _36 = _3 > 0;
  iftmp.0_13 = _36 ? 1 : -1;
  prephitmp_31 = ABS_EXPR <_3>;
  prephitmp_32 = _36 ? -1 : 1;
  prephitmp_33 = _36 ? 4294967295 : 1;
  prephitmp_35 = _36 ? 1 : 4294967295;
...
  _29 = prephitmp_31 != _42;
  val_12 = _29 ? prephitmp_32 : iftmp.0_13;
  prephitmp_37 = _29 ? prephitmp_33 : prephitmp_35;

vs.

  iftmp.0_13 = _3 > 0 ? 1 : -1;
  prephitmp_31 = ABS_EXPR <_3>;
  prephitmp_32 = _3 > 0 ? -1 : 1;
  prephitmp_33 = _3 > 0 ? 4294967295 : 1;
  prephitmp_35 = _3 > 0 ? 1 : 4294967295;
...
  val_12 = i.1_6 == prephitmp_31 ? iftmp.0_13 : prephitmp_32;
  prephitmp_37 = i.1_6 != prephitmp_31 ? prephitmp_33 : prephitmp_35;

where the split out condition is now CSEd and the multi-use makes us not
TER the comparison.  Previously we got two compare & jump sequences while
now we get the compare computing a QImode value and the then two
compare & jump sequences.

While without -march=cascadelake we do get the desired number of cmovs
the generated code is still worse.

The testcase is unfortunately a bit obfuscated due to the many
if-conversions taking place.  Smaller GIMPLE testcases do not exhibit
jumpy RTL expansion.

void __GIMPLE(ssa, startwith("optimized"))
foo (long *p, long a, long b, long c, long d, long e, long f)
{
  _Bool _2;
  long _3;
  long _8;

  __BB(2):
      _2 = a_1(D) < b_10(D);
      _3 = _2 ? c_4(D) : d_5(D);
      _8 = _2 ? f_6(D) : e_7(D);
      __MEM <long> (p_9(D)) = _3;
      __MEM <long> (p_9(D) + 4) = _8;
      return;
}

#if __GNUC__ < 13
void __GIMPLE(ssa, startwith("optimized"))
bar (long *p, long a, long b, long c, long d, long e, long f)
{
  long _3;
  long _8;

  __BB(2):
      _3 = a_1(D) < b_10(D) ? c_4(D) : d_5(D);
      _8 = a_1(D) >= b_10(D) ? e_7(D) : f_6(D);
      __MEM <long> (p_9(D)) = _3;
      __MEM <long> (p_9(D) + 4) = _8;
      return;
}
#endif
Comment 1 Richard Biener 2022-05-24 10:29:11 UTC
Btw, I'm hoping for a smaller/simpler testcase to appear - and yes, something like this was expected I guess (but also latent since the new IL was valid before as well).
Comment 2 Richard Biener 2022-05-24 10:33:43 UTC
(In reply to Richard Biener from comment #0)
>
> void __GIMPLE(ssa, startwith("optimized"))
> foo (long *p, long a, long b, long c, long d, long e, long f)
> {
>   _Bool _2;
>   long _3;
>   long _8;
> 
>   __BB(2):
>       _2 = a_1(D) < b_10(D);
>       _3 = _2 ? c_4(D) : d_5(D);
>       _8 = _2 ? f_6(D) : e_7(D);
>       __MEM <long> (p_9(D)) = _3;
>       __MEM <long> (p_9(D) + 4) = _8;
>       return;
> }
> 
> #if __GNUC__ < 13
> void __GIMPLE(ssa, startwith("optimized"))
> bar (long *p, long a, long b, long c, long d, long e, long f)
> {
>   long _3;
>   long _8;
> 
>   __BB(2):
>       _3 = a_1(D) < b_10(D) ? c_4(D) : d_5(D);
>       _8 = a_1(D) >= b_10(D) ? e_7(D) : f_6(D);
>       __MEM <long> (p_9(D)) = _3;
>       __MEM <long> (p_9(D) + 4) = _8;
>       return;
> }
> #endif

So with the above we do have (on the GCC 12 branch)

foo:
.LFB0:
        .cfi_startproc
        cmpq    %rdx, %rsi
        setl    %al
        testb   %al, %al
        cmovne  8(%rsp), %r9
        cmove   %r8, %rcx
        movq    %rcx, (%rdi)
        movq    %r9, 4(%rdi)
        ret

vs.

bar:
.LFB1:
        .cfi_startproc
        cmpq    %rdx, %rsi
        cmovl   8(%rsp), %r9
        cmovge  %r8, %rcx
        movq    %rcx, (%rdi)
        movq    %r9, 4(%rdi)
        ret

which shows an optimization difference when the condition is split out.
It also shows the condition is split out anyway in the end, it's just
the RTL representation with CCnn modes doesn't nicely match up the
GIMPLE so we somehow need to improve the plumbing here.  But CCnn
get clobbered quite a lot and so any clever "CSE"ing of the (different!)
modes involved is going to require code motion or CCnn spilling (bad).
Comment 3 Richard Biener 2022-05-24 10:38:42 UTC
So I guess the trick might be to notice that ...

;; _2 = a_1(D) < b_10(D);

(insn 12 11 13 (set (reg:CCGC 17 flags)
        (compare:CCGC (reg/v:DI 86 [ a ])
            (reg/v:DI 87 [ b ]))) -1
     (nil))

(insn 13 12 14 (set (reg:QI 92)
        (lt:QI (reg:CCGC 17 flags)
            (const_int 0 [0]))) -1
     (nil))

(insn 14 13 0 (set (reg:QI 82 [ _2 ])
        (reg:QI 92)) -1
     (nil))

... we expand to CCGCmode here ...

_3 replace with --> _3 = _2 ? c_4(D) : d_5(D);
 ;; *p_9(D) = _3;

(insn 15 14 16 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 82 [ _2 ])
            (const_int 0 [0]))) "t2.c":12:7 -1
     (nil))

(insn 16 15 17 (set (reg:DI 93)
        (if_then_else:DI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:DI 88 [ c ])
            (reg/v:DI 89 [ d ]))) "t2.c":12:7 -1
     (nil))

... and thus want to use a CCGCmode based compare here as well?  We can
of course force-forward ("un-CSE") the condition during RTL expansion.
But the question would be what's the best approach to deal with the
situation so that followup RTL passes have a chance to optimize the
redundant compares?
Comment 4 Richard Biener 2023-04-26 06:56:03 UTC
GCC 13.1 is being released, retargeting bugs to GCC 13.2.
Comment 5 Richard Biener 2023-07-18 08:12:54 UTC
I'm testing a patch.
Comment 6 GCC Commits 2023-07-18 13:43:24 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:cbe5f6859a73b2acf203bd7d13f9fb245d63cbd4

commit r14-2620-gcbe5f6859a73b2acf203bd7d13f9fb245d63cbd4
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 18 10:02:52 2023 +0200

    middle-end/105715 - missed RTL if-conversion with COND_EXPR expansion
    
    When the COND_EXPR condition operand was split out to a separate stmt
    it became subject to CSE with other condition evaluations.  This
    unfortunately leads to TER no longer applying and in turn RTL
    expansion of COND_EXPRs no longer seeing the condition and thus
    failing to try conditional move expansion.  This can be seen with
    gcc.target/i386/pr45685.c when built with -march=cascadelake which
    then FAILs to produce the expected number of cmovs.
    
    It can also be seen when we create more COND_EXPRs early like for
    instruction selection of MIN/MAX operations that map to IEEE
    a > b ? a : b expression semantics.
    
            PR middle-end/105715
            * gimple-isel.cc (gimple_expand_vec_exprs): Merge into...
            (pass_gimple_isel::execute): ... this.  Duplicate
            comparison defs of COND_EXPRs.
Comment 7 Richard Biener 2023-07-18 13:43:53 UTC
Fixed on trunk.
Comment 8 Richard Biener 2023-07-27 09:23:05 UTC
GCC 13.2 is being released, retargeting bugs to GCC 13.3.
Comment 9 Jakub Jelinek 2024-05-21 09:11:25 UTC
GCC 13.3 is being released, retargeting bugs to GCC 13.4.
Comment 10 Jakub Jelinek 2025-06-05 17:06:27 UTC
GCC 13.4 is being released, retargeting bugs to GCC 13.5.
Comment 11 Richard Biener 2026-03-23 12:23:14 UTC
Fixed for GCC 14.