Bug 97312 - [11 regression] aarch64/pr90838.c fails
Summary: [11 regression] aarch64/pr90838.c fails
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: 11.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-07 07:06 UTC by Christophe Lyon
Modified: 2020-10-12 09:31 UTC (History)
2 users (show)

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-10-07 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christophe Lyon 2020-10-07 07:06:43 UTC
I've noticed that
FAIL: gcc.target/aarch64/pr90838.c scan-assembler-times and\t 2

This appeared between r11-3681 (g:29c650cd899496c4f9bc069d03d0d7ecfb632176)
and r11-3685 (g:fcae5121154d1c3382b056bcc2c563cedac28e74)
but r11-3684 broke the toolchain build so the wrong commit may be either of those.
Comment 1 Aldy Hernandez 2020-10-07 13:10:34 UTC
Confirmed.

This test is checking the final assembly for a specific sequence.  I don't speak aarch64 assembly, but the IL is different coming out of evrp.

The first culprit is this difference in the mergephi1 dump:

   _9 = .CTZ (x_6(D));
-  _10 = _9 & 31;
+  _10 = _9;

These are unsigned ints, so assuming they are 32 bits on aarch64, __builtin_ctz is always less than 32.  This is because a CTZ of 0 is undefined according to the GCC manual:

Built-in Function: int __builtin_ctz (unsigned int x)

    Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined. 

So a bitwise AND of anything less than 32 with 0x1f (31) is a no-op.

Are aarch64 int's 32-bits?

Here are the full IL differences:

--- legacy-evrp/pr90838.c.038t.mergephi1        2020-10-07 08:44:12.152358885 -0400
+++ ranger/pr90838.c.038t.mergephi1     2020-10-07 08:39:12.339296502 -0400
@@ -1,41 +1,41 @@
 
 ;; Function ctz1 (ctz1, funcdef_no=0, decl_uid=3587, cgraph_uid=1, symbol_order=0)
 
 ctz1 (unsigned int x)
 {
   static const char table[32] = "\x00\x01\x1c\x02\x1d\x0e\x18\x03\x1e\x16\x14\x0f\x19\x11\x04\b\x1f\x1b\r\x17\x15\x13\x10\x07\x1a\f\x12\x06\v\x05\n\t";
   unsigned int _1;
   unsigned int _2;
   unsigned int _3;
   unsigned int _4;
   char _5;
   int _9;
   int _10;
 
   <bb 2> :
   _1 = -x_6(D);
   _2 = _1 & x_6(D);
   _3 = _2 * 125613361;
   _4 = _3 >> 27;
   _9 = .CTZ (x_6(D));
-  _10 = _9 & 31;
+  _10 = _9;
   _5 = (char) _10;
   return _10;
 
 }
 
 
 
 ;; Function ctz2 (ctz2, funcdef_no=1, decl_uid=3591, cgraph_uid=2, symbol_order=1)
 
 ctz2 (unsigned int x)
 {
   static short int table[64] = {32, 0, 1, 12, 2, 6, 0, 13, 3, 0, 7, 0, 0, 0, 0, 14, 10, 4, 0, 0, 8, 0, 0, 25, 0, 0, 0, 0, 0, 21, 27, 15, 31, 11, 5, 0, 0, 0, 0, 0, 9, 0, 0, 
24, 0, 0, 20, 26, 30, 0, 0, 0, 0, 23, 0, 19, 29, 0, 22, 18, 28, 17, 16, 0};
   unsigned int _1;
   unsigned int _2;
   unsigned int _3;
   short int _4;
   int _8;
 
   <bb 2> :
   _1 = -x_5(D);
@@ -87,27 +87,27 @@
 
 
 ;; Function ctz4 (ctz4, funcdef_no=3, decl_uid=3601, cgraph_uid=4, symbol_order=5)
 
 ctz4 (long unsigned int x)
 {
   long unsigned int lsb;
   long unsigned int _1;
   long long unsigned int _2;
   long long unsigned int _3;
   char _4;
   int _9;
   int _10;
 
   <bb 2> :
   _1 = -x_5(D);
   lsb_6 = _1 & x_5(D);
   _2 = lsb_6 * 283881067100198605;
   _3 = _2 >> 58;
   _9 = .CTZ (x_5(D));
-  _10 = _9 & 63;
+  _10 = _9;
   _4 = (char) _10;
   return _10;
 
 }

The difference in assembly matches.  We have 2 less AND's in the final output:

$ diff -u legacy.s ranger.s
--- legacy.s    2020-10-07 09:06:13.420446783 -0400
+++ ranger.s    2020-10-07 09:06:42.646646949 -0400
@@ -8,7 +8,6 @@
 ctz1:
        rbit    w0, w0
        clz     w0, w0
-       and     w0, w0, 31
        ret
        .size   ctz1, .-ctz1
        .align  2
@@ -36,7 +35,6 @@
 ctz4:
        rbit    x0, x0
        clz     x0, x0
-       and     w0, w0, 63
        ret
        .size   ctz4, .-ctz4

If my analysis is correct, someone aarch64 savvy should adjust this:

/* { dg-final { scan-assembler-times "and\t" 2 } } */
Comment 2 GCC Commits 2020-10-09 08:20:02 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:781634daea8cb788efb33994f4a19df76598542e

commit r11-3744-g781634daea8cb788efb33994f4a19df76598542e
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Oct 9 10:19:16 2020 +0200

    vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
    
    > Perhaps another way out of this would be document and enforce that
    > __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
    > calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2
    
    The following patch implements that, i.e. __builtin_c?z* now take full
    advantage of them being UB at zero, while the ifns are well defined at zero
    if *_DEFINED_VALUE_AT_ZERO (*) == 2.  That is what fixes PR94801.
    
    Furthermore, to fix PR97312, if it is well defined at zero and the value at
    zero is prec, we don't lower the maximum unless the argument is known to be
    non-zero.
    For gimple-range.cc I guess we could improve it if needed e.g. by returning
    a [0,7][32,32] range for .CTZ of e.g. [0,137], but for now it (roughly)
    matches what vr-values.c does.
    
    2020-10-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94801
            PR target/97312
            * vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ,
            CASE_CFN_CTZ>: When stmt is not an internal-fn call or
            C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero
            and thus use [0, prec-1] range unless it can be further improved.
            For CTZ, don't update maxi from upper bound if it was previously prec.
            * gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ,
            CASE_CFN_CTZ>: Likewise.
    
            * gcc.dg/tree-ssa/pr94801.c: New test.
Comment 3 Aldy Hernandez 2020-10-12 09:31:14 UTC
fixed in trunk