94006 – Poor codegen for cond ? lval1 : lval2

Bug 94006 - Poor codegen for cond ? lval1 : lval2

Summary: Poor codegen for cond ? lval1 : lval2

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	10.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2020-03-02 22:48 UTC by Jonathan Wakely
Modified:	2023-09-21 14:01 UTC (History)
CC List:	2 users (show)

See Also:	94566 94589
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2023-05-30 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jonathan Wakely 2020-03-02 22:48:44 UTC

I noticed that this (compiled with -std=c++2a and any optimization level except -O0):

#include <compare>

std::strong_ordering f1(int a, int b)
{
  return a == b ? std::strong_ordering::equal
   : std::strong_ordering::less;
}

compiles to:

_Z2f1ii:
        cmp     edi, esi
        jne     .L3
        mov     al, BYTE PTR _ZNSt15strong_ordering5equalE[rip]
        ret
.L3:
        mov     al, BYTE PTR _ZNSt15strong_ordering4lessE[rip]
        ret
_ZNSt15strong_ordering5equalE:
        .zero   1
_ZNSt15strong_ordering4lessE:
        .byte   -1


Whereas this less readable form:

#include <compare>

std::strong_ordering f2(int a, int b)
{
  return (a == b) <=> true;
}

produces much smaller code:

_Z2f2ii:
        cmp     edi, esi
        setne   al
        neg     eax
        ret


Jason observed that it seems to be because the result of <=> is a prvalue, whereas the ?: version is returning from lvalues.

We get similar codegen (but with a less extreme difference in size) for this:

struct A { int i; };
constexpr A a1 { 1 };
constexpr A a2 { 0 };

A f1 (int a, int b)
{
  return a == b ? a1 : a2;
}

A f2 (int a, int b)
{
  return A{a == b ? a1.i : a2.i};
}

The first function is obviously easier to read, but the second produces smaller code because the two global variables are not emitted.

Comment 1 Andrew Pinski 2020-03-03 00:00:44 UTC

Confirmed.

Reduced testcase:
struct f
{
  int t;
};

static const struct f g0 = {0};
static const struct f g1 = {-1};

struct f f1(int a, int b)
{
  return a == b ? g0 : g1;
}

---- CUT -----

NOTE the C front-end has it done correctly.

Comment 2 Andrew Pinski 2020-03-03 00:15:05 UTC

I thought SRA would have done it.
BUT it does not.
The C front-end does the optimization while it is gimplifying the original tree for some reason ...

Comment 3 Richard Biener 2020-03-03 08:22:34 UTC

(In reply to Andrew Pinski from comment #1)
> Confirmed.
> 
> Reduced testcase:
> struct f
> {
>   int t;
> };
> 
> static const struct f g0 = {0};
> static const struct f g1 = {-1};
> 
> struct f f1(int a, int b)
> {
>   return a == b ? g0 : g1;
> }
> 
> ---- CUT -----
> 
> NOTE the C front-end has it done correctly.

Guess it fully-folds this.  We're going through phiprop and then indeed
SRA doesn't scalarize things, obviously because it cannot scalarize globals.

Rejected (2365): not aggregate: a
Rejected (2366): not aggregate: b
Candidate (2370): D.2370
! Disqualifying D.2370 - No scalar replacements to be created.
f1 (int a, int b)
{
  struct f D.2370;

  <bb 2> [local count: 1073741824]:
  if (a_2(D) == b_3(D))
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072224]:
  D.2370 = g0;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 708669601]:
  D.2370 = g1;

  <bb 5> [local count: 1073741824]:
  return D.2370;

}

There's nothing that would replace the "aggregate copy" with load/store
pairs here.  I guess we could have update-address-taken do "scalar replacement"
of "aggregates" that have an integer mode, but then here we have the return
stmt which wouldn't be happy with a SSA register argument... (maybe we
need to allow VIEW_CONVERT <aggregate-type> (SSA reg) there?).

For GIMPLE this is a representational issue for aggregates we'll expand
as registers (based on TYPE/DECL_MODE) and again not exposing ABI details
at return/arguments which should really drive as to whether to represent
arguments/return values as registers or memory.

Comment 4 Andrew Pinski 2021-08-22 00:11:22 UTC

If we do this, we optimize it at the gimple level:
std::strong_ordering f1(int a, int b)
{
  std::strong_ordering t =  a == b ? std::strong_ordering::equal
   : std::strong_ordering::less;
   return t;
}

:)