As mentioned in a recent discussion (https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01617.html), a truncating unsigned conversion such as from unsigned long to unsigned in LP64 appears to defeat VRP and prevent GCC from emitting optimal code.  In the test case below, the indirect call to abort() in g() is optimized away because n is known to be less than 3.  However, the same indirect call to abort() is not optimized away in h().

$ cat t.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout t.c
void f (unsigned long n)
{
  if (n > 3)
    __builtin_abort ();
}

void g (unsigned n)
{
  if (n < 3)
    f (n);
}

void h (unsigned long m)
{
  unsigned n = m;
  if (n < 3)
    f (n);
}

;; Function f (f, funcdef_no=0, decl_uid=1795, cgraph_uid=0, symbol_order=0)

f (long unsigned int n)
{
  <bb 2> [100.00%]:
  if (n_1(D) > 3)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [0.04%]:
  __builtin_abort ();

  <bb 4> [99.96%]:
  return;

}

;; Function g (g, funcdef_no=1, decl_uid=1798, cgraph_uid=1, symbol_order=1)

g (unsigned int n)
{
  <bb 2> [100.00%]:
  return;

}

;; Function h (h, funcdef_no=2, decl_uid=1801, cgraph_uid=2, symbol_order=2)

Removing basic block 6
Removing basic block 7
h (long unsigned int m)
{
  unsigned int n;
  long unsigned int _5;

  <bb 2> [100.00%]:
  n_2 = (unsigned int) m_1(D);
  if (n_2 <= 2)
    goto <bb 3>; [54.00%]
  else
    goto <bb 5>; [46.00%]

  <bb 3> [54.00%]:
  _5 = m_1(D) & 4294967295;
  if (_5 > 3)
    goto <bb 4>; [0.04%]
  else
    goto <bb 5>; [99.96%]

  <bb 4> [0.02%]:
  __builtin_abort ();

  <bb 5> [99.98%]:
  return;

}
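As a point of comparison (my own sketch, not part of the report, reusing f() from the test case above): if the guard is written in the wider type, no truncating conversion stands between the known range and the later use, and the abort() path should fold away just as it does for g().

/* Hypothetical variant of h(): the guard constrains m itself, so the
   range on the taken edge is a single contiguous [0, 2] and is not
   split by a truncating conversion.  */
void h2 (unsigned long m)
{
  if (m < 3)   /* range of m on this path: [0, 2]                      */
    f (m);     /* f()'s "n > 3" test should fold away after inlining   */
}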
I suspect this might be the same issue as pointed out in bug 71690.
Looks like h is optimized the same as g when compiling for 32 bits:

$ /usr/local/bin/gcc -c -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -Wextra -pedantic -Wconversion 79191.c

;; Function f (f, funcdef_no=0, decl_uid=1769, cgraph_uid=0, symbol_order=0)

f (long unsigned int n)
{
  <bb 2> [100.00%] [count: INV]:
  if (n_1(D) > 3)
    goto <bb 3>; [0.04%] [count: 0]
  else
    goto <bb 4>; [99.96%] [count: INV]

  <bb 3> [0.04%] [count: 0]:
  __builtin_abort ();

  <bb 4> [99.96%] [count: INV]:
  return;

}

;; Function g (g, funcdef_no=1, decl_uid=1772, cgraph_uid=1, symbol_order=1)

g (unsigned int n)
{
  <bb 2> [100.00%] [count: INV]:
  return;

}

;; Function h (h, funcdef_no=4, decl_uid=1775, cgraph_uid=2, symbol_order=2)

h (long unsigned int m)
{
  <bb 2> [100.00%] [count: INV]:
  return;

}

$

When I add an explicit "-m64" to the compile command I get the same results as you, though. Also, it'd be nice if there were a warning from -Wconversion for the assignment from m to n in h. Or a warning from -Wimplicit-int for uses of bare "unsigned" without the "int" in general (I've actually been meaning to file a separate bug for that, but haven't gotten around to it yet).

Anyways, confirming for LP64.
(In reply to Eric Gallager from comment #2)
> When I add an explicit "-m64" to the compile command I get the same results
> as you, though. Also, it'd be nice if there were a warning from -Wconversion
> for the assignment from m to n in h. Or a warning from -Wimplicit-int for
> uses of bare "unsigned" without the "int" in general (I've actually been
> meaning to file a separate bug for that, but haven't gotten around to it
> yet).

Actually, looking back at the output, there is a warning from -Wconversion when I use -m64, so never mind about that part:

$ /usr/local/bin/gcc -c -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -Wextra -pedantic -Wconversion -m64 79191.c
79191.c: In function ‘h’:
79191.c:15:15: warning: conversion from ‘long unsigned int’ to ‘unsigned int’ may change value [-Wconversion]
   unsigned n = m;
               ^

;; Function f (f, funcdef_no=0, decl_uid=1832, cgraph_uid=0, symbol_order=0)

f (long unsigned int n)
{
  <bb 2> [100.00%] [count: INV]:
  if (n_1(D) > 3)
    goto <bb 3>; [0.04%] [count: 0]
  else
    goto <bb 4>; [99.96%] [count: INV]

  <bb 3> [0.04%] [count: 0]:
  __builtin_abort ();

  <bb 4> [99.96%] [count: INV]:
  return;

}

;; Function g (g, funcdef_no=1, decl_uid=1835, cgraph_uid=1, symbol_order=1)

g (unsigned int n)
{
  <bb 2> [100.00%] [count: INV]:
  return;

}

;; Function h (h, funcdef_no=2, decl_uid=1838, cgraph_uid=2, symbol_order=2)

Removing basic block 6
Removing basic block 7
h (long unsigned int m)
{
  unsigned int n;
  long unsigned int _5;

  <bb 2> [100.00%] [count: INV]:
  n_2 = (unsigned int) m_1(D);
  if (n_2 <= 2)
    goto <bb 3>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 3> [50.00%] [count: INV]:
  _5 = m_1(D) & 4294967292;
  if (_5 != 0)
    goto <bb 4>; [0.04%] [count: 0]
  else
    goto <bb 5>; [99.96%] [count: INV]

  <bb 4> [0.02%] [count: 0]:
  __builtin_abort ();

  <bb 5> [99.98%] [count: INV]:
  return;

}

$
Where exactly in the compiler is this optimization supposed to be done and who's the maintainer of that file?
(In reply to Eric Gallager from comment #4)
> Where exactly in the compiler is this optimization supposed to be done and
> who's the maintainer of that file?

Eric, that's part of the problem. The optimization already exists in VRP, but it is defeated by another optimization in match.pd.

We could restrict the match.pd optimization with single_use, although this might cause us to miss other optimizations.

If we replace Y=(unsigned long)(unsigned)X with Y=X&4294967295, we could try to replace (unsigned)X<3 with Y<3 at the same time, but it isn't clear how/where/when to do that (restrict the transformation to single_use in match.pd and add the super special code in tree-ssa-forwprop.c?).

We could try to introduce some magic in VRP (without value numbering, it could be, when processing X & 4294967295, checking the other uses of X for a conversion to a 32-bit int and using its interval if there is a suitable domination relation between the relevant statements), but that seems hard and ugly.

Basically the PR seems to be asking for a better idea on where to perform this optimization ;-)
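For readers unfamiliar with the rewrite being discussed, here is a source-level sketch (my own illustration, not the actual match.pd pattern, and it assumes an LP64 target where unsigned is 32 bits and unsigned long is 64):

/* The conversion pair and the mask compute the same value, but the
   masked form no longer mentions the 32-bit intermediate that the
   guard "(unsigned) x < 3" constrains, which is what trips up VRP.  */
unsigned long
widen_trunc (unsigned long x)
{
  unsigned long y1 = (unsigned long) (unsigned) x;  /* truncate, then widen */
  unsigned long y2 = x & 4294967295ul;              /* the rewritten form   */
  return y1 == y2;                                  /* always 1             */
}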
There are a couple of things going on.

VRP understands casts perfectly well. If we were to run EVRP "earlier", we get this fine in EVRP, i.e. with -fno-tree-forwprop we get

  <bb 2> :
  n_4 = (unsigned int) m_3(D);
  if (n_4 <= 2)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _1 = (long unsigned int) n_4;
  if (_1 > 3)
    goto <bb 4>; [0.00%]
  else
    goto <bb 5>; [100.00%]

  <bb 4> [count: 0]:
  __builtin_abort ();

and that works out fine:

  <bb 2> :
  n_4 = (unsigned int) m_3(D);
  if (n_4 <= 2)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  _1 = (long unsigned int) n_4;

  <bb 4> :
  return;

With forwprop, we get the masking of the original value:

  n_4 = (unsigned int) m_3(D);
  if (n_4 <= 2)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _7 = m_3(D) & 4294967295;
  _8 = m_3(D) & 4294967292;
  if (_8 != 0)
    goto <bb 4>; [0.00%]

Ranger knows the range of m_3 on entry to BB3 is:
  [0, 2][4294967296, 18446744069414584322]
We can't enumerate all the ranges that have [0,2] in the lower words, or there'd just be too many subranges.

And sadly, if that code were instead:
  _7 = m_3(D) & 4294967295;
  _8 = _7 & 4294967292;
we would again get [0,2] for a range for _8.  But no, someone thinks they are smarter again.  Why the heck don't we do that??

Another way Ranger is going to be able to figure this out would be when we implement bitmask tracking along with ranges.

  n_4 = (unsigned int) m_3(D);
  if (n_4 <= 2)
    goto <bb 3>; [INV]

In addition to [0, 2][4294967296, 18446744069414584322], we should be able to track that the lower bits are masked with 0x03.

The final option that may eventually handle this is the new equivalency tracking slated for next release.  It will track n_4 as a "less precision" equivalence of m_3, and when that is tied in with the ranger's knowledge that n_4 is [0,2], it should (hopefully) be able to recognize that
  _8 = m_3(D) & 4294967292;
brings m_3 into the same precision as n_4 and then use the value of n_4.  But we aren't there yet.

Maybe we need an early early VRP pass... right after early optimizations, before the IL gets mucked with too much :-P
(In reply to Andrew Macleod from comment #6)
> Ranger knows the range of m_3 on entry to BB3 is:
>   [0, 2][4294967296, 18446744069414584322]
> We can't enumerate all the ranges that have [0,2] in the lower words, or
> there'd just be too many subranges.
> 
> And sadly, if that code were instead:
>   _7 = m_3(D) & 4294967295;
>   _8 = _7 & 4294967292;
> 
> we would again get [0,2] for a range for _8.  But no, someone thinks they
> are smarter again.  Why the heck don't we do that??
> 

Scratch that; we still wouldn't get it. I was delusional.

We still don't know that all the upper ranges don't fill some of those bits, i.e. 0x100000000 thru 0x1FFFFFFF is in the range of m_3... and so we'd still fail the masking test.

Only bitmask tracking or precision-sensitive equivalence will capture this case.
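To make that concrete, here is a worked example (mine, not from the thread): a value inside the upper subrange of m_3's computed range whose low 32 bits exceed 3. Such a value cannot actually reach bb3 (the real guard n_4 <= 2 excludes it), but the multi-range approximation alone cannot express that, so the masked compare cannot be folded from the range by itself.

/* 0x100000005 lies inside [4294967296, 18446744069414584322], the upper
   subrange computed for m_3 on the 2->3 edge, yet its low 32 bits are 5.
   Only bitmask tracking or a precision-aware equivalence with n_4 can
   rule such values out.  */
#include <stdio.h>

int
main (void)
{
  unsigned long m = 0x100000005ul;
  printf ("%lu\n", m & 4294967295ul);   /* prints 5              */
  printf ("%lu\n", m & 4294967292ul);   /* prints 4, i.e. != 0   */
  return 0;
}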
If we take a look at the current dump:
```
=========== BB 2 ============
Imports: m_1(D)
Exports: m_1(D)  n_2
         n_2 : m_1(D)(I)
m_1(D)  [irange] long unsigned int VARYING
Partial equiv (n_2 pe32 m_1(D))
 <bb 2> [local count: 1073741824]:
    n_2 = (unsigned int) m_1(D);
    if (n_2 <= 2)
      goto <bb 3>; [50.00%]
    else
      goto <bb 5>; [50.00%]

2->3  (T) m_1(D) :  [irange] long unsigned int [0, 2][4294967296, 18446744069414584322]
2->3  (T) n_2 :     [irange] unsigned int [0, 2] NONZERO 0x3
```
Since we know that n_2 has a NONZERO mask of 0x3, m_1 has a NONZERO mask of 0xffffffff00000003: since it is a truncating cast, we can only say what the lower bits are. And then it should just work correctly, I think ...
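A quick sanity check of that bitmask arithmetic (my own sketch; the masks are the ones assumed above, not compiler output): if only the bits in 0xffffffff00000003 can be nonzero in m_1 on the 2->3 edge, then m_1 & 4294967292 is necessarily zero, so the branch guarding __builtin_abort () in h() is dead.

/* Verifies that the assumed nonzero mask for m_1 makes the masked test
   in h() fold to false.  */
#include <stdio.h>

int
main (void)
{
  unsigned long nonzero_m1 = 0xffffffff00000003ul;  /* upper half unknown,
                                                       lower half from n_2 */
  unsigned long mask       = 4294967292ul;          /* 0xfffffffc          */
  printf ("%#lx\n", nonzero_m1 & mask);             /* prints 0            */
  return 0;
}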