I've noticed that FAIL: gcc.target/aarch64/pr90838.c scan-assembler-times and\t 2 This appeared between r11-3681 (g:29c650cd899496c4f9bc069d03d0d7ecfb632176) and r11-3685 (g:fcae5121154d1c3382b056bcc2c563cedac28e74) but r11-3684 broke the toolchain build so the wrong commit may be either of those.
Confirmed. This test is checking the final assembly for a specific sequence. I don't speak aarch64 assembly, but the IL is different coming out of evrp. The first culprit is this difference in the mergephi1 dump: _9 = .CTZ (x_6(D)); - _10 = _9 & 31; + _10 = _9; These are unsigned ints, so assuming they are 32 bits on aarch64, __builtin_ctz is always less than 32. This is because a CTZ of 0 is undefined according to the GCC manual: Built-in Function: int __builtin_ctz (unsigned int x) Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined. So a bitwise AND of anything less than 32 with 0x1f (31) is a no-op. Are aarch64 int's 32-bits? Here are the full IL differences: --- legacy-evrp/pr90838.c.038t.mergephi1 2020-10-07 08:44:12.152358885 -0400 +++ ranger/pr90838.c.038t.mergephi1 2020-10-07 08:39:12.339296502 -0400 @@ -1,41 +1,41 @@ ;; Function ctz1 (ctz1, funcdef_no=0, decl_uid=3587, cgraph_uid=1, symbol_order=0) ctz1 (unsigned int x) { static const char table[32] = "\x00\x01\x1c\x02\x1d\x0e\x18\x03\x1e\x16\x14\x0f\x19\x11\x04\b\x1f\x1b\r\x17\x15\x13\x10\x07\x1a\f\x12\x06\v\x05\n\t"; unsigned int _1; unsigned int _2; unsigned int _3; unsigned int _4; char _5; int _9; int _10; <bb 2> : _1 = -x_6(D); _2 = _1 & x_6(D); _3 = _2 * 125613361; _4 = _3 >> 27; _9 = .CTZ (x_6(D)); - _10 = _9 & 31; + _10 = _9; _5 = (char) _10; return _10; } ;; Function ctz2 (ctz2, funcdef_no=1, decl_uid=3591, cgraph_uid=2, symbol_order=1) ctz2 (unsigned int x) { static short int table[64] = {32, 0, 1, 12, 2, 6, 0, 13, 3, 0, 7, 0, 0, 0, 0, 14, 10, 4, 0, 0, 8, 0, 0, 25, 0, 0, 0, 0, 0, 21, 27, 15, 31, 11, 5, 0, 0, 0, 0, 0, 9, 0, 0, 24, 0, 0, 20, 26, 30, 0, 0, 0, 0, 23, 0, 19, 29, 0, 22, 18, 28, 17, 16, 0}; unsigned int _1; unsigned int _2; unsigned int _3; short int _4; int _8; <bb 2> : _1 = -x_5(D); @@ -87,27 +87,27 @@ ;; Function ctz4 (ctz4, funcdef_no=3, decl_uid=3601, cgraph_uid=4, symbol_order=5) ctz4 (long unsigned int x) { long unsigned int lsb; long unsigned int _1; long long unsigned int _2; long long unsigned int _3; char _4; int _9; int _10; <bb 2> : _1 = -x_5(D); lsb_6 = _1 & x_5(D); _2 = lsb_6 * 283881067100198605; _3 = _2 >> 58; _9 = .CTZ (x_5(D)); - _10 = _9 & 63; + _10 = _9; _4 = (char) _10; return _10; } The difference in assembly matches. We have 2 less AND's in the final output: $ diff -u legacy.s ranger.s --- legacy.s 2020-10-07 09:06:13.420446783 -0400 +++ ranger.s 2020-10-07 09:06:42.646646949 -0400 @@ -8,7 +8,6 @@ ctz1: rbit w0, w0 clz w0, w0 - and w0, w0, 31 ret .size ctz1, .-ctz1 .align 2 @@ -36,7 +35,6 @@ ctz4: rbit x0, x0 clz x0, x0 - and w0, w0, 63 ret .size ctz4, .-ctz4 If my analysis is correct, someone aarch64 savvy should adjust this: /* { dg-final { scan-assembler-times "and\t" 2 } } */
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:781634daea8cb788efb33994f4a19df76598542e commit r11-3744-g781634daea8cb788efb33994f4a19df76598542e Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Oct 9 10:19:16 2020 +0200 vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801] > Perhaps another way out of this would be document and enforce that > __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn > calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2 The following patch implements that, i.e. __builtin_c?z* now take full advantage of them being UB at zero, while the ifns are well defined at zero if *_DEFINED_VALUE_AT_ZERO (*) == 2. That is what fixes PR94801. Furthermore, to fix PR97312, if it is well defined at zero and the value at zero is prec, we don't lower the maximum unless the argument is known to be non-zero. For gimple-range.cc I guess we could improve it if needed e.g. by returning a [0,7][32,32] range for .CTZ of e.g. [0,137], but for now it (roughly) matches what vr-values.c does. 2020-10-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94801 PR target/97312 * vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ, CASE_CFN_CTZ>: When stmt is not an internal-fn call or C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero and thus use [0, prec-1] range unless it can be further improved. For CTZ, don't update maxi from upper bound if it was previously prec. * gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ, CASE_CFN_CTZ>: Likewise. * gcc.dg/tree-ssa/pr94801.c: New test.
fixed in trunk