Bug 94566 - conversion between std::strong_ordering and int
Summary: conversion between std::strong_ordering and int
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 33315
Blocks: 101701
  Show dependency treegraph
 
Reported: 2020-04-11 22:04 UTC by Marc Glisse
Modified: 2023-09-21 13:55 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-06-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2020-04-11 22:04:21 UTC
#include <compare>

int conv1(std::strong_ordering s){
  if(s==std::strong_ordering::less) return -1;
  if(s==std::strong_ordering::equal) return 0;
  if(s==std::strong_ordering::greater) return 1;
  __builtin_unreachable();
}
std::strong_ordering conv2(int i){
  switch(i){
    case -1: return std::strong_ordering::less;
    case 0: return std::strong_ordering::equal;
    case 1: return std::strong_ordering::greater;
    default: __builtin_unreachable();
  }
}

Compiling with -std=gnu++2a -O3. I would like the compiler to notice that those are just NOP (at most a sign-extension). Clang manages it for conv2. Gcc generates:

	movl	$-1, %eax
	cmpb	$-1, %dil
	je	.L1
	xorl	%eax, %eax
	testb	%dil, %dil
	setne	%al
.L1:
	ret

and

	xorl	%eax, %eax
	testl	%edi, %edi
	je	.L10
	cmpl	$1, %edi
	sete	%al
	leal	-1(%rax,%rax), %eax
.L10:
	ret


(apparently the C++ committee thinks it is a good idea to provide a type that is essentially an int that can only be -1, 0 or 1, but not provide any direct way to convert to/from int)
Comment 1 Richard Biener 2020-04-14 06:57:51 UTC
So for conv2 the most immediate issue is that we're failing to sink and common the assignment to D.8516._M_value (I had patches for this).

conv2 (int i)
{
  int i_2(D) = i;
  struct strong_ordering D.8516;

  <bb 2> [local count: 1073741824]:
  if (i_2(D) == 0)
    goto <bb 5>; [33.33%]
  else
    goto <bb 3>; [66.67%]

  <bb 3> [local count: 1073741824]:
  if (i_2(D) == 1)
    goto <bb 6>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 357878150]:
  D.8516._M_value = -1;
  goto <bb 7>; [100.00%]

  <bb 5> [local count: 357878150]:
  D.8516._M_value = 0;
  goto <bb 7>; [100.00%]

  <bb 6> [local count: 357878150]:
  D.8516._M_value = 1;

  <bb 7> [local count: 1073634451]:
  return D.8516;

for conv1 this is a missed phi-opt, at phiopt2 time:

conv1 (struct strong_ordering s)
{
  int SR.4;
  int _1;

  <bb 2> [local count: 1073741824]:
  SR.4_4 = s._M_value;
  if (SR.4_4 == -1)
    goto <bb 6>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870913]:
  if (SR.4_4 == 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 268435456]:
  if (SR.4_4 == 1)
    goto <bb 6>; [100.00%]
  else
    goto <bb 5>; [0.00%]

  <bb 5> [count: 0]:
  __builtin_unreachable ();

  <bb 6> [local count: 1073741824]:
  # RANGE [-1, 1]
  # _1 = PHI <-1(2), 0(3), 1(4)>
  return _1;

it's a bit of a convoluted case of course but the only chance to pattern
match this ...
Comment 2 Richard Biener 2020-04-15 10:23:20 UTC
I've verified my old fix for PR33315 does

t.C:11:45: optimized: sinking common stores to D.8528._M_value
...
conv2 (int i)
{
  struct strong_ordering D.8528;
  int _7;

  <bb 2> [local count: 1073741817]:
  switch (i_2(D)) <default: <L3> [0.00%], case -1: <L5> [33.33%], case 0: <L1> [33.33%], case 1: <L2> [33.33%]>

  <bb 3> [local count: 357913942]:
<L1>:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 357913942]:
<L2>:
  goto <bb 6>; [100.00%]

  <bb 5> [count: 0]:
<L3>:
  __builtin_unreachable ();

  <bb 6> [local count: 1073741824]:
  # _7 = PHI <-1(2), 0(3), 1(4)>
<L5>:
  D.8528._M_value = _7;
  return D.8528;

still not optimized further tho.
Comment 3 Marc Glisse 2020-04-15 10:32:38 UTC
I thought we had code to recognize a switch that represents a linear function, I was hoping that it would kick in with your hoisting patch...
Comment 4 Andrew Pinski 2021-06-15 01:15:03 UTC
I wonder if we should always convert if-cases to switches and then do switchconv and then lower the smaller ones (3 if max) right after switchconv and then lower the rest in late as we do right now.  This should get the conv1 case.

The conv2 case problem is don't sink until late and never redo switchconv.

So here is the proposed pipeline:
iftoswitch - change to do it even without an heurstics
sink
switchconv
switchlower - early
Comment 5 Martin Liška 2021-08-03 09:37:41 UTC
(In reply to Richard Biener from comment #2)
> I've verified my old fix for PR33315 does

Can you please show me which patch from the PR does that?

> 
> still not optimized further tho.

When does the sinking happen in the optimization pipeline?
Comment 6 Martin Liška 2021-08-03 09:39:34 UTC
(In reply to Marc Glisse from comment #3)
> I thought we had code to recognize a switch that represents a linear
> function, I was hoping that it would kick in with your hoisting patch...

Yep, we have the code but as mentioned the sinking needs to happen first. Otherwise, you'll see:

g++ pr94566.C -O2 -fdump-tree-switchconv=/dev/stdout -std=gnu++2a

;; Function conv2 (_Z5conv2i, funcdef_no=90, decl_uid=8463, cgraph_uid=59, symbol_order=88)

beginning to process the following SWITCH statement (pr94566.C:4) : ------- 
switch (i_2(D)) <default: <L3> [INV], case -1: <L0> [INV], case 0: <L1> [INV], case 1: <L2> [INV]>

Bailing out - bad case - a non-final BB not empty
--------------------------------
struct strong_ordering conv2 (int i)
{
  struct strong_ordering D.8476;

  <bb 2> :
  switch (i_2(D)) <default: <L3> [INV], case -1: <L0> [INV], case 0: <L1> [INV], case 1: <L2> [INV]>

  <bb 3> :
<L0>:
  D.8476._M_value = -1;
  goto <bb 7>; [INV]

  <bb 4> :
<L1>:
  D.8476._M_value = 0;
  goto <bb 7>; [INV]

  <bb 5> :
<L2>:
  D.8476._M_value = 1;
  goto <bb 7>; [INV]

  <bb 6> :
<L3>:
  __builtin_unreachable ();

  <bb 7> :
  return D.8476;

}
Comment 7 Martin Liška 2021-08-03 09:41:56 UTC
> So here is the proposed pipeline:
> iftoswitch - change to do it even without an heurstics
> sink
> switchconv
> switchlower - early

This one is also not ideal as there might be an optimization that eliminates some of the switch case branches.

Note that we have a similar issue PR92005.
Comment 8 Oliver Schönrock 2022-03-14 16:53:33 UTC
how about:

#include <bit>
#include <compare>
#include <cstdint>

int conv3(std::strong_ordering s){
    return std::bit_cast<std::int8_t>(s);
}
std::strong_ordering conv4(int i){
    return std::bit_cast<std::strong_ordering>(static_cast<std::int8_t>(i));
}


conv3(std::strong_ordering):
        movsbl  %dil, %eax
        ret
conv4(int):
        movl    %edi, %eax
        ret


https://godbolt.org/z/szP5MGq4T
Comment 9 Jakub Jelinek 2022-03-14 17:10:10 UTC
That one isn't portable, relies on both the strong_ordering underlying implementation using an 8-bit integer member rather than something else, and also hardcodes the exact values where in C++ the -1 / 0 / 1 are exposition only and unordered value is -127 rather than what gcc uses (2).
By writing a series of ifs or switch one achieves portability and we'd just like to get efficient code if the values the user chose match those used by the implementation.
Comment 10 Oliver Schönrock 2022-03-14 17:15:18 UTC
I agree the switch optimisation is better, but...

 shouldn't std::bit_cast prevent incorrect casting with different underlying implementaion? (ie if the size doesn't match, and the size could be deduced with TMP) 

and "unordered value" doesn't apply to std::strong_ordering?
Comment 11 Jakub Jelinek 2022-03-14 17:25:06 UTC
(In reply to Oliver Schönrock from comment #10)
> I agree the switch optimisation is better, but...
> 
>  shouldn't std::bit_cast prevent incorrect casting with different underlying
> implementaion? (ie if the size doesn't match, and the size could be deduced
> with TMP) 

The size can be deduced, yes.  What the bits actually mean can't be.
 
> and "unordered value" doesn't apply to std::strong_ordering?

Sure, but this PR isn't just about strong_ordering, same problem applies for partial_ordering.
And actually not just those, but any case of some set of enumerators or macros where you don't know the values exactly and mapping them to or from a set of integer constants, ideally with 1:1 mapping but not guaranteed that way.
Comment 12 Andrew Pinski 2023-06-07 17:31:20 UTC
Aldy or Andrew, why in conv1 we don't get a range for 
  SR.4_4 = sD.8798._M_valueD.7665;

Even though the range we have is [-1,1] according to the __builtin_unreachable()?
It seems like we should get that range. Once we do get that the code works.
E.g. If we add:
  signed char *t = (signed char*)&s;
  signed char tt = *t;
  if (tt < -1 || tt > 1) __builtin_unreachable();

In the front before the other ifs, we get the code we are expecting.

conv2 has a similar issue too, though it has also a different issue of ordering for the comparisons.
Comment 13 Andrew Macleod 2023-06-07 20:27:28 UTC
(In reply to Andrew Pinski from comment #12)
> Aldy or Andrew, why in conv1 we don't get a range for 
>   SR.4_4 = sD.8798._M_valueD.7665;
> 
> Even though the range we have is [-1,1] according to the
> __builtin_unreachable()?
> It seems like we should get that range. Once we do get that the code works.
> E.g. If we add:
>   signed char *t = (signed char*)&s;
>   signed char tt = *t;
>   if (tt < -1 || tt > 1) __builtin_unreachable();
> 
> In the front before the other ifs, we get the code we are expecting.
> 
> conv2 has a similar issue too, though it has also a different issue of
> ordering for the comparisons.

its because the unreachable is after the branches, and we have multiple uses of SR.4_4 before the unreachable.

  <bb 2> [local count: 1073741824]:
  SR.4_4 = s._M_value;
  if (SR.4_4 == -1)
    goto <bb 6>; [50.00%]
  else
    goto <bb 3>; [50.00%]
  
  <bb 3> [local count: 536870913]:
  if (SR.4_4 == 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 4>; [50.00%]
    
  <bb 4> [local count: 268435456]:
  if (SR.4_4 == 1)
    goto <bb 6>; [100.00%]
  else
    goto <bb 5>; [0.00%]
    
  <bb 5> [count: 0]:
  __builtin_unreachable ();

  <bb 6> [local count: 1073741824]:
  # _1 = PHI <-1(2), 0(3), 1(4)>

We know when we get to bb5 that SR.4_4 is [-1, 1] for sure.
But we dont know that before we reach that spot.
if there was a call to 
  foo(SR.4_4)
in bb 3 for instance,   we wouldn't be able to propagate [-1,1] to the call to foo because it happens before we know for sure.    and foo may go and do something if it has a value of 6 and exit the compilation, thus never returning.

So we can only provide a range of [-1, 1] AFTER the unreachable, or if there is only a single use of it.. the multiples uses are what tricks it.

This has come up before.  we need some sort of backwards propagation that can propagate discovered values earlier into the IL to a point where it is known to be safe (ie, it wouldnt be able to propagate it past a call to foo() for instance)
In cases like this, we could discover it is safe to propagate that range back to the def point, and then we could set the global.

Until we add some smarts, either to the builtin unreachable elimination code, or elsewhere, which is aware of how to handle such side effects, we can't set the global because we dont know if it is safe at each use before the unreachable call.