Bug 95699 - __builtin_constant_p inconsistencies
Summary: __builtin_constant_p inconsistencies
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-16 15:39 UTC by Vincent Lefèvre
Modified: 2021-05-30 20:05 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
gcc11-pr95699.patch (1.62 KB, patch)
2020-06-17 11:51 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vincent Lefèvre 2020-06-16 15:39:34 UTC
Consider the following code on x86_64 (written for a 64-bit long), compiled with -O2.

With gcc-9 (Debian 9.3.0-13) 9.3.0, I get:
0
1
1

With gcc-10 (Debian 10.1.0-3) 10.1.0, I get:
0
1
0

I'm not sure about the exact __builtin_constant_p specification, i.e. whether it may be duplicated into two branches so that the argument can become a constant for GCC (I think it should as this may allow the selection of code optimized for constants), and how this can be influenced by "volatile".

But the first two cases are very similar, so that I would expect the same value, but I get 0 / 1 with both GCC 9 and GCC 10. Concerning the third case, the constantness has been lost in GCC 10, which is unexpected if a result 1 is allowed.

Moreover, if I remove the second condition ("&& ...") in the printf, I always get 0 (i.e. false), which is counter-intuitive: adding a "&& ..." condition should never change a false condition to true (the results 1 I obtain above).

int printf (const char *__restrict __format, ...);

#undef C
#define C 0x7fffffffffffffffUL

static void tst1a (void)
{
  volatile unsigned long r0 = 0;
  unsigned long r;

  r = r0;
  if (r < C)
    r = C;

  printf ("%d\n", __builtin_constant_p (r) && r == C);
}

#undef C
#define C 0x8000000000000000UL

static void tst1b (void)
{
  volatile unsigned long r0 = 0;
  unsigned long r;

  r = r0;
  if (r < C)
    r = C;

  printf ("%d\n", __builtin_constant_p (r) && r == C);
}

static void tst2 (void)
{
  volatile unsigned long r0 = 0;
  unsigned long r;

  r = r0;
  if (r < 0x8000000000000000)
    r = 0x8000000000000000;
  r *= r;

  printf ("%d\n", __builtin_constant_p (r) && r < 1);
}

int main (void)
{
  tst1a ();
  tst1b ();
  tst2 ();
  return 0;
}
Comment 1 Andrew Pinski 2020-06-16 18:31:22 UTC
Signed integer overflow is undefined.
Comment 2 Andrew Pinski 2020-06-16 18:34:36 UTC
Oh never mind. It is about [

if (r < 0x8000000000000000)
    r = 0x8000000000000000;
  r *= r;
__builtin_constant

Well it is Jump threading related.
Comment 3 Jakub Jelinek 2020-06-16 18:59:26 UTC
Yeah, in the GCC9 case, there is jump threading of the __builtin_constant_p, so
it is essentially
if (r < 0x8000000000000000)
  printf ("%d\n", __builtin_constant_p (0) && 0 < 1);
else
  printf ("%d\n", __builtin_constant_p (r * r) && r * r < 1);
where in the first printf __builtin_constant_p evaluates to 1 and 0 < 1 too,
while in the second one __builtin_constant_p (r * r) evaluates to 0.
In GCC 10, this just doesn't happen, since r10-3106-g46dfa8ad6c18feb45d35734eae38798edb7c38cd aka PR90387 fix.
Comment 4 Richard Biener 2020-06-17 06:17:11 UTC
I'm inclined to close as WONTFIX or INVALID.  There are several other PRs which
show "surprising" behavior with respect to __builtin_constant_p and jump threading.
Comment 5 Vincent Lefèvre 2020-06-17 10:03:04 UTC
(In reply to Richard Biener from comment #4)
> I'm inclined to close as WONTFIX or INVALID.  There are several other PRs
> which
> show "surprising" behavior with respect to __builtin_constant_p and jump
> threading.

But concerning tst1a and tst1b is there any reason that 0x7fffffffffffffffUL and 0x8000000000000000UL are handled in a different way, while the only type involved in these tests is unsigned long? I don't see why the value of the most significant bit would matter here.

And isn't a missed optimization regarded as a valid bug?
Comment 6 Jakub Jelinek 2020-06-17 10:12:17 UTC
I don't see why that should be considered a bug.
All the tests are using __builtin_constant_p in a way that it wasn't designed for, where it changes the behavior of the program whether it evaluates to 0 or 1.
The intended use of this builtin is to allow optimizations when something can be determined to be a compile time constant and have perhaps slower fallback that does the same thing, where e.g. the former can actually be much more expensive if it is not compile time constant and the latter e.g. a library call.
Comment 7 Jakub Jelinek 2020-06-17 10:33:18 UTC
As for the difference between the first two functions, that boils down to:
unsigned long long f1 (unsigned long long x) { if (x < 0x7fffffffffffffffULL) x = 0x7fffffffffffffffULL; return x; }
unsigned long long f2 (unsigned long long x) { if (x < 0x8000000000000000ULL) x = 0x8000000000000000ULL; return x; }
where the former is optimized into MAX_EXPR by phiopt1, but the latter is not, because another optimization which transforms such comparison into (long long) x >= 0 stands in a way.
Guess we can modify minmax_replacement to virtually undo that.
Comment 8 Vincent Lefèvre 2020-06-17 11:02:07 UTC
(In reply to Jakub Jelinek from comment #6)
> I don't see why that should be considered a bug.
> All the tests are using __builtin_constant_p in a way that it wasn't
> designed for, where it changes the behavior of the program whether it
> evaluates to 0 or 1.

Well, this was just meant to be a simplified testcase and to easily see whether __builtin_constant_p gave true or false. But in GMP's longlong.h file (used by GNU MPFR), there is similar code for aarch64, where the assembly code is different whether the variable is regarded as a constant or not:

#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
  do {                                                                  \
    if (__builtin_constant_p (bl) && -(UDItype)(bl) < 0x1000)           \
      __asm__ ("adds\t%1, %x4, %5\n\tsbc\t%0, %x2, %x3"                 \
               : "=r,r" (sh), "=&r,&r" (sl)                             \
               : "rZ,rZ" ((UDItype)(ah)), "rZ,rZ" ((UDItype)(bh)),      \
                 "r,Z"   ((UDItype)(al)), "rI,r" (-(UDItype)(bl)) __CLOBBER_CC);\
    else                                                                \
      __asm__ ("subs\t%1, %x4, %5\n\tsbc\t%0, %x2, %x3"                 \
               : "=r,r" (sh), "=&r,&r" (sl)                             \
               : "rZ,rZ" ((UDItype)(ah)), "rZ,rZ" ((UDItype)(bh)),      \
                 "r,Z"   ((UDItype)(al)), "rI,r"  ((UDItype)(bl)) __CLOBBER_CC);\
  } while(0);

The assembly code is actually buggy in the "if" case (this was how one could find out that there was a difference between GCC 9, where the "if" case was selected, and GCC 10, where the "else" case was selected), but I doubt that GCC is aware about this bug:

  https://sympa.inria.fr/sympa/arc/mpfr/2020-06/msg00052.html
  https://sympa.inria.fr/sympa/arc/mpfr/2020-06/msg00059.html

I suppose that the general goal of this test using __builtin_constant_p is to have faster assembly code when some value is known at compile time. So the intent (with the fixed assembly code[*]) is to have the same behavior, but have faster code if possible. This is what we got with GCC 9, but this is no longer the case with GCC 10.

[*] https://gmplib.org/list-archives/gmp-bugs/2020-June/004807.html
Comment 9 Jakub Jelinek 2020-06-17 11:51:58 UTC
Created attachment 48749 [details]
gcc11-pr95699.patch

Untested patch to improve the minmax optimization.
Comment 10 CVS Commits 2020-06-18 10:13:05 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2e0f4a18bc978c73624dd016e4cce229c2809c9c

commit r11-1504-g2e0f4a18bc978c73624dd016e4cce229c2809c9c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 18 12:11:09 2020 +0200

    phiopt: Improve minmax optimization [PR95699]
    
    As discussed in the PR, the
    x < 0x80000000U to (int) x >= 0
    optimization stands in the way of minmax_replacement optimization,
    so for comparisons with most of the constants it works well, but when the
    above mentioned optimization triggers, it is unable to do it.
    The match.pd (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2)
    optimization is able to look through that and this patch
    teaches minmax_replacement about it too.
    
    2020-06-18  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/95699
            * tree-ssa-phiopt.c (minmax_replacement): Treat (signed int)x < 0
            as x > INT_MAX and (signed int)x >= 0 as x <= INT_MAX.  Move variable
            declarations to the statements that set them where possible.
    
            * gcc.dg/tree-ssa/pr95699.c: New test.