cat test.c void foo(void); static char a; static short d(unsigned e) { char b; short c; a = b = e; if (b) return 0; if (1 >= e) { c = e == 0; if (c) foo(); } return 0; } int main() { d(a ^ 233); } 10.3.0 at -O3 can eliminate the call to foo but neither trunk nor 11.2.0 at -O3 can: gcc-10 -O3 -S test.c -o /dev/stdout main: .LFB1: .cfi_startproc xorb $-23, a(%rip) xorl %eax, %eax ret .cfi_endproc gcc-11 -O3 -S test.c -o /dev/stdout ... main: .LFB1: .cfi_startproc movsbl a(%rip), %eax xorb $-23, %al movb %al, a(%rip) cmpl $1, %eax ja .L10 testb %al, %al je .L14 .L10: xorl %eax, %eax ret .L14: pushq %rax .cfi_def_cfa_offset 16 call foo xorl %eax, %eax popq %rdx .cfi_def_cfa_offset 8 ret .cfi_endproc gcc-trunk -O3 -S test.c -o /dev/stdout main: .LFB1: .cfi_startproc movsbl a(%rip), %eax xorb $-23, %al movb %al, a(%rip) cmpl $1, %eax ja .L10 testb %al, %al je .L14 .L10: xorl %eax, %eax ret .L14: pushq %rax .cfi_def_cfa_offset 16 call foo xorl %eax, %eax popq %rdx .cfi_def_cfa_offset 8 ret .cfi_endproc gcc-trunk -v Using built-in specs. Target: x86_64-pc-linux-gnu Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.0.0 20211022 (experimental) (GCC) Introduced with https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fcae5121154d1c3382b056bcc2c563cedac28e74
Confirmed. In GCC 10, forwprop2 gets: a.0_1 = a; _2 = (int) a.0_1; _3 = _2 ^ 233; _4 = (unsigned int) _3; b_7 = (char) _4; a = b_7; if (b_7 != 0) goto <bb 6>; [34.00%] else goto <bb 3>; [66.00%] <bb 3> [local count: 708669601]: if (_4 <= 1) goto <bb 4>; [41.00%] else goto <bb 6>; [59.00%] <bb 4> [local count: 290554533]: if (_4 == 0) goto <bb 5>; [33.00%] else goto <bb 6>; [67.00%] <bb 5> [local count: 95882995]: foo (); While in GCC 11 we get: a.0_1 = a; _2 = (int) a.0_1; _3 = _2 ^ 233; _4 = (unsigned int) _3; b_7 = (char) _4; a = b_7; if (b_7 != 0) goto <bb 5>; [34.00%] else goto <bb 3>; [66.00%] <bb 3> [local count: 708669601]: if (_4 <= 1) goto <bb 4>; [25.50%] else goto <bb 5>; [74.50%] I still can't figure out why forwprop2 can do it in GCC 10 but not in GCC 11.
r11-3685 is bad and r11-3683 is good.
Im not sure what the pre-ranger trick was, but the shortcoming we have it the following: a.0_1 = a; _2 = (int) a.0_1; _3 = _2 ^ 233; _4 = (unsigned int) _3; b_8 = (char) _3; a = b_8; if (b_8 != 0) we know _2 : int [-128, 127] but when we calculate _3, [-128, 127] ^ 233 uses the original bitwise XOR code, and it returns VARYING for that range. therefore We only know _3 is VARYING and therefore 2->3 (F) _3 : int [-INF, -256][0, 0][256, +INF] 2->3 (F) _4 : unsigned int [0, 0][256, 4294967040] When when we later get to if (_4 <= 1) goto <bb 4>; [25.50%] we're kinda of stuck. whereas in reality, properly calculated, we'd know that _3 = [-128, 127], _4 = [-128, 127] And as you can see on the outgoing edges, we see thru the casts to trim out the other bits in _3 and _4 on the 2->3 edge, so with those proper inputs, we would end up with _4 and _3 == [0,0]. so, if no one else gets to it, I'll eventually teach range-op.cc::operator_bitwise_xor::wi_fold to do something about this. special case constants, or maybe look at the ranges and if the RHS fits within the LHS effective precision, produce a better result.
Patch proposed https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>: https://gcc.gnu.org/g:b3e98eb3396e16ae8b20c94916bc2bd7862d2c97 commit r13-89-gb3e98eb3396e16ae8b20c94916bc2bd7862d2c97 Author: Roger Sayle <roger@nextmovesoftware.com> Date: Tue May 3 14:38:50 2022 -0400 PR tree-optimization/102950: Improved EVRP for signed BIT_XOR_EXPR. This patch fixes PR tree-optimization/102950, which is a P2 regression, by providing better range bounds for BIT_XOR_EXPR, BIT_AND_EXPR and BIT_IOR_EXPR on signed integer types. In general terms, any binary bitwise operation on sign-extended or zero-extended integer types will produce results that are themselves sign-extended or zero-extended. More precisely, we can derive signed bounds from the number of leading redundant sign bit copies, from the equation: clrsb(X op Y) >= min (clrsb (X), clrsb(Y)) and from the property that for any (signed or unsigned) range [lb, ub] that clrsb([lb, ub]) >= min (clrsb(lb), clrsb(ub)). These can be used to show that [-1, 0] op [-1, 0] is [-1, 0] or that [-128, 127] op [-128, 127] is [-128, 127], even when tracking nonzero bits would result in VARYING (as every bit can be 0 or 1). This is equivalent to determining the minimum type precision in which the operation can be performed then sign extending the result. One additional refinement is to observe that X ^ Y can never be zero if the ranges of X and Y don't overlap, i.e. X can't be equal to Y. Previously, the expression "(int)(char)a ^ 233" in the PR was considered VARYING, but with the above changes now has the range [-256, -1][1, 255], which is sufficient to optimize away the call to foo. 2022-05-03 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR tree-optimization/102950 * range-op.cc (wi_optimize_signed_bitwise_op): New function to determine bounds of bitwise operations on signed types. (operator_bitwise_and::wi_fold): Call the above function. (operator_bitwise_or::wi_fold): Likewise. (operator_bitwise_xor::wi_fold): Likewise. Additionally, the result can't be zero if the operands can't be equal. gcc/testsuite/ChangeLog PR tree-optimization/102950 * gcc.dg/pr102950.c: New test case. * gcc.dg/tree-ssa/evrp10.c: New test case.
This has now been fixed on mainline.
Fixed. Thanks Roger.