Since r236831 I get: ... LD init/built-in.o kernel/built-in.o:memremap.c:function update_wall_time: error: undefined reference to '____ilog2_NaN' make: *** [Makefile:955: vmlinux] Error 1 markus@x4 linux % cat timekeeping.i int a, b; extern int ____ilog2_NaN(void); void by(void) { int c = 1; b = a ?: c; __builtin_constant_p(b) ? b ? ____ilog2_NaN() : 0 : 0; } markus@x4 linux % icc -O2 -c timekeeping.i -S -o - | grep ____ilog2_NaN markus@x4 linux % clang -O2 -c timekeeping.i -S -o - | grep ____ilog2_NaN markus@x4 linux % gcc-6 -O2 -c timekeeping.i -S -o - | grep ____ilog2_NaN markus@x4 linux % gcc-trunk -O2 -c timekeeping.i -S -o - | grep ____ilog2_NaN jmp ____ilog2_NaN
Confirmed.
tree-ssa-threadedge.c has some BUILT_IN_CONSTANT_P code already, but clearly it isn't enough. "Similarly for __builtin_constant_p: r = PHI <1(2), 2(3)> __builtin_constant_p (r) Both PHI arguments are constant, but x ? 1 : 2 is still not constant." is what it does right now, but we have instead: # iftmp.0_2 = PHI <a.1_5(2), 1(3)> b = iftmp.0_2; _1 = __builtin_constant_p (iftmp.0_2); where thread1 turns it into: # iftmp.0_8 = PHI <1(2)> b = iftmp.0_8; _10 = __builtin_constant_p (iftmp.0_8); ... # iftmp.0_2 = PHI <a.1_5(2)> b = iftmp.0_2; _1 = __builtin_constant_p (iftmp.0_2); This is undesirable, iftmp.0_2 really isn't constant, so we shouldn't turn it into sometimes constant, sometimes non-constant.
If someone has a .i file for this issue, it'd be awful handy.
(In reply to Jeffrey A. Law from comment #3) > If someone has a .i file for this issue, it'd be awful handy. Well, see comment0: markus@x4 linux % cat timekeeping.i int a, b; extern int ____ilog2_NaN(void); void by(void) { int c = 1; b = a ?: c; __builtin_constant_p(b) ? b ? ____ilog2_NaN() : 0 : 0; }
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-October/461597.html
So this is a very interesting little issue that highlights a deficiency in the semantics of __builtin_constant_p. So prior to threading we have: ;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) ;; pred: 2 [50.0%] (TRUE_VALUE,EXECUTABLE) ;; 3 [100.0%] (FALLTHRU,EXECUTABLE) # iftmp.0_2 = PHI <a.1_5(2), 1(3)> b = iftmp.0_2; _1 = __builtin_constant_p (iftmp.0_2); if (_1 != 0) goto <bb 5>; else goto <bb 7>; ;; succ: 5 [54.0%] (TRUE_VALUE,EXECUTABLE) ;; 7 [46.0%] (FALSE_VALUE,EXECUTABLE) ;; basic block 5, loop depth 0, count 0, freq 5400, maybe hot ;; prev block 4, next block 6, flags: (NEW, REACHABLE, VISITED) ;; pred: 4 [54.0%] (TRUE_VALUE,EXECUTABLE) if (iftmp.0_2 != 0) goto <bb 6>; else goto <bb 7>; ;; succ: 6 [36.6%] (TRUE_VALUE,EXECUTABLE) ;; 7 [63.4%] (FALSE_VALUE,EXECUTABLE) ;; basic block 6, loop depth 0, count 0, freq 1979, maybe hot ;; prev block 5, next block 7, flags: (NEW, REACHABLE, VISITED) ;; pred: 5 [36.6%] (TRUE_VALUE,EXECUTABLE) ____ilog2_NaN (); That's all fine and good. Threading wants to lookup the value of iftmp.0_2 in an attempt to eliminate the conditional in BB5. And it finds that the path 3->4->5 will result in iftmp.0_2 having the value 1. So it isolates that path. After isolation things look like this: ;; basic block 3, loop depth 0 ;; pred: 2 # iftmp.0_8 = PHI <1(2)> b = iftmp.0_8; _10 = __builtin_constant_p (iftmp.0_8); if (_10 != 0) goto <bb 6>; else goto <bb 7>; ;; succ: 6 ;; 7 ;; basic block 4, loop depth 0 ;; pred: 2 # iftmp.0_2 = PHI <a.1_5(2)> b = iftmp.0_2; _1 = __builtin_constant_p (iftmp.0_2); if (_1 != 0) goto <bb 5>; else goto <bb 7>; ;; succ: 5 ;; 7 ;; basic block 5, loop depth 0 ;; pred: 4 if (iftmp.0_2 != 0) goto <bb 6>; else goto <bb 7>; ;; succ: 6 ;; 7 ;; basic block 6, loop depth 0 ;; pred: 5 ;; 3 ____ilog2_NaN (); ;; succ: 7 Note how the path leading out of block 3 will never pass through the conditional in block 5. That's the primary effect of jump threading. The most "obvious" interpretation is that "b" is not constant in this case at the source level and thus __b_c_p must return zero. But how does one deal with cases where analysis such as VRP, CPROP, etc prove certain edges are not executable and thus certain paths to a b_c_p call are eliminated and the "obvious" non-constant b_c_p argument turns into a constant argument. This could happen virtually with any optimization that eliminates an edge in the CFG. Proving edge removal does not change the runtime result of a b_c_p call is almost certainly akin to the halting problem. Similarly such an interpretation would essentially mean that a block with a b_c_p call can only be duplicated if we can prove that the original and duplicate have the same runtime result as the original -- which we might be able to prove in some cases, but the context necessary to do so is likely not available in the places where we have to make that determination -- essentially we'll likely have to disallow duplication of blocks with b_c_p. I want folks to realize that the "obvious" interpretation may have very significant secondary effects and may ultimately result in cases where we must either totally cripple optimizers or declare that the "obvious" semantics simply can't be supported.
(In reply to Jakub Jelinek from comment #2) > This is undesirable, iftmp.0_2 really isn't constant, so we shouldn't turn > it into sometimes constant, sometimes non-constant. I am not sure about that. The way I use __builtin_constant_p for optimization purposes, I'd be happy to see the kind of threading that is going on here. The discussion on the linux ML also doesn't clearly conclude that the code absolutely has to be written this way. Maybe we could get 2 separate builtins, one for optimization, and one for checking, since the desired behavior is not the same?
I've spent a goodly part of the morning pondering this BZ. While I think the semantics of b_c_p are under/ill defined and they will continue to cause problems, the as-if rule requires us to not optimize this case until such time as the semantics are changed. If look at the observable effects, "b" is not a constant in the source and b_c_p will always evaluate to zero and we can never call ilog_NaN. After path isolation, if "a" is zero then we call ilog_NaN. This violates the as-if rule. So the existence of a b_c_p call (and likely a b_o_s call) in a path means that path must not be isolated. More generally a block with a b_c_p call must not be duplicated. That is sufficient to fix this problem. There's a secondary concern that removing edges which potentially lead to a b_c_p call can cause similar problems, but I'm inclined to punt that for now. -- In response to c#7, I'm sympathetic to the behavior you want Marc. But I can't justify it under the current semantics. -- I'd love someone to step up and suggest semantics that would allow this kind of path isolation, but I think that's going to be difficult and the result would be highly non-obvious to folks trying to use b_c_p.
(In reply to Jeffrey A. Law from comment #8) > I've spent a goodly part of the morning pondering this BZ. While I think > the semantics of b_c_p are under/ill defined and they will continue to cause > problems, the as-if rule requires us to not optimize this case until such > time as the semantics are changed. Note that we can similarly get "spurious" fortify warnings from glibc headers. On the other hand, those seem comparable to me to maybe_uninitialized warnings, you have to accept some false positives from code that is dead but not in a way the compiler can notice. > If look at the observable effects, "b" is not a constant in the source and > b_c_p will always evaluate to zero and we can never call ilog_NaN. After > path isolation, if "a" is zero then we call ilog_NaN. This violates the > as-if rule. Uh, that's a very strict interpretation. In particular, a function parameter can never satisfy it, whereas many users of bcp rely on inlining turning parameters into constants. > So the existence of a b_c_p call (and likely a b_o_s call) in a path means > that path must not be isolated. More generally a block with a b_c_p call > must not be duplicated. That is sufficient to fix this problem. There's a > secondary concern that removing edges which potentially lead to a b_c_p call > can cause similar problems, but I'm inclined to punt that for now. No duplication, so functions cannot be inlined or cloned until all the bcp calls they contain have been folded. That's going to cause regressions. And it actually breaks the fortify functions. I would also assume that bos is more useful if you can isolate paths where you can give a better estimate for it. > I'd love someone to step up and suggest semantics that would allow this kind > of path isolation, but I think that's going to be difficult and the result > would be highly non-obvious to folks trying to use b_c_p. I think it is the reverse, actually. Looking at random uses of bcp in the headers on my system, most uses are for optimization, with a safe fallback, and are fine with any kind of optimization that duplicates them. The intuition is that any time the compiler can prove it is handling a constant (be it through inlining, threading, whatever), it may use the special code. This is fairly easy for users to grasp. Checking uses (as in the linux testcase, or fortify glibc headers) are the odd ones out, which require a bcp with unclear semantics.
Yes, it's a fairly strict interpretation, but I think it's the right one to be using in the absence of additional language around the semantics of b_c_p. In particular refer to c#6 in BZ38789. Essentially b_c_p or b_o_s is more like a PHI than a function call. All paths to the b_c_p/b_o_s site participate in the result. Thus isolating a path where there's a b_c_p/b_o_s is an invalid transformation unless we prove the original and isolated paths produce the same results for the b_c_p/b_o_s call. But there's no real infrastructure in place to do that. Even something like block duplication for loop header copying or unrolling has the potential to make a b_c_p/b_o_s call give different results between the two paths. So ISTM the only correct thing to do is disable duplication of blocks with those builtins.
(In reply to Marc Glisse from comment #9) > Uh, that's a very strict interpretation. In particular, a function parameter > can never satisfy it, whereas many users of bcp rely on inlining turning > parameters into constants. Yeah, it would be certainly very bad if b_c_p stopped returning true in such cases. I believe in the past we had to tweak a couple of optimizations to be more careful about b_c_p (before it is finally folded), e.g. PR49642.
I believe the testcase is simply abusing b_c_p. Can you elaborate on the "as-if" rule here? Note we drop all b_c_p to true/false in the first VRP pass (after inlining), so a less agressive fix would be to simply make sure this happens before any path isolation is done (of course we now have early threading...) The BB duplication CFG hook is certainly too aggressive for this "fix" as it doesn't affect only path isolation.
Another possibility, at least for handling ilog2(), could be to provide __builtin_ilog2(unsigned long x) as an alternative. Note that the kernel ilog2() has the property that the result is undefined if x==0 (and will jump to ____ilog2_NaN() if x is constant 0).
(In reply to dhowells@redhat.com from comment #13) > Another possibility, at least for handling ilog2(), could be to provide > __builtin_ilog2(unsigned long x) as an alternative. > > Note that the kernel ilog2() has the property that the result is undefined > if x==0 (and will jump to ____ilog2_NaN() if x is constant 0). Ugh, no. Why not just x && (x & -x) == x ? __builtin_ctz (x) : -1 (or ctzl or ctzll depending on what the type is)?
(In reply to Jakub Jelinek from comment #14) > (In reply to dhowells@redhat.com from comment #13) > ... > Ugh, no. Why not just x && (x & -x) == x ? __builtin_ctz (x) : -1 > (or ctzl or ctzll depending on what the type is)? Comparing the kernel's ilog2() to your suggestion: int kernel_ilog2(unsigned long x) { return ilog2(x); } int jakub_ilog2(unsigned long x) { return x && (x & -x) == x ? __builtin_ctz (x) : -1; } compiled with -Os for x86_64 gives: 0000000000000000 <kernel_ilog2>: 0: 83 c8 ff or $0xffffffff,%eax 3: 48 0f bd c7 bsr %rdi,%rax 7: c3 retq 0000000000000008 <jakub_ilog2>: 8: 83 c8 ff or $0xffffffff,%eax b: 48 85 ff test %rdi,%rdi e: 74 17 je 27 <jakub_ilog2+0x1f> 10: 48 89 fa mov %rdi,%rdx 13: 0f bc c7 bsf %edi,%eax 16: 48 f7 da neg %rdx 19: 48 21 fa and %rdi,%rdx 1c: 48 39 d7 cmp %rdx,%rdi 1f: ba ff ff ff ff mov $0xffffffff,%edx 24: 0f 45 c2 cmovne %edx,%eax 27: c3 retq Note that in the kernel variant, the initial OR instruction is superfluous, but is required by fls() and fls64() which x86_64 is using behind the scenes. Not all arches, however, use fls() and fls64() to implement ilog2().
I guess the following could be used: int clz_ilog2(unsigned long x) { return __builtin_clz(x); } which compiles to: 0000000000000027 <clz_ilog2>: 27: 0f bd c7 bsr %edi,%eax 2a: 83 f0 1f xor $0x1f,%eax 2d: c3 retq though the XOR is superfluous - for any valid input to ilog2(), I think the output is always in the range 0-31.
(In reply to dhowells@redhat.com from comment #16) > ... > 0000000000000027 <clz_ilog2>: > 27: 0f bd c7 bsr %edi,%eax > 2a: 83 f0 1f xor $0x1f,%eax > 2d: c3 retq > > though the XOR is superfluous - for any valid input to ilog2(), I think the > output is always in the range 0-31. Ah - it's not actually an AND, so the XOR isn't necessarily superfluous.
*** Bug 78653 has been marked as a duplicate of this bug. ***
Lets close this one as invalid. It looks like a kernel bug after all.
*** Bug 78879 has been marked as a duplicate of this bug. ***
(In reply to Markus Trippelsdorf from comment #20) > *** Bug 78879 has been marked as a duplicate of this bug. *** Kernel bug or not, it should be noted that this means that you cannot use gcc from r236831 to compile any kernel from the introduction and use of ilog2() to the current day - and these kernel versions cannot be retroactively fixed.
(In reply to dhowells@redhat.com from comment #21) > (In reply to Markus Trippelsdorf from comment #20) > > *** Bug 78879 has been marked as a duplicate of this bug. *** > > Kernel bug or not, it should be noted that this means that you cannot use > gcc from r236831 to compile any kernel from the introduction and use of > ilog2() to the current day - and these kernel versions cannot be > retroactively fixed. No. I build allmodconfig kernels regularly with gcc trunk and it works fine.
(In reply to Markus Trippelsdorf from comment #22) > (In reply to dhowells@redhat.com from comment #21) > > (In reply to Markus Trippelsdorf from comment #20) > > > *** Bug 78879 has been marked as a duplicate of this bug. *** > > > > Kernel bug or not, it should be noted that this means that you cannot use > > gcc from r236831 to compile any kernel from the introduction and use of > > ilog2() to the current day - and these kernel versions cannot be > > retroactively fixed. > > No. I build allmodconfig kernels regularly with gcc trunk and it works fine. I still hit this when building an allmodconfig arm64 kernel with trunk
I still see this when building an allmodconfig 4.9 Linux kernel for aarch64 (arm64 in kernel-speak). Just do: <Set CROSS_COMPILE to the aarch64 toolchain> make ARCH=arm64 allmodconfig make ARCH=arm64 -j<n> Does anyone know of a kernel patch in flight to fix this in the kernel? If not, should GCC work around this somehow? (perhaps by making jump-threading less aggressive when __builtin_constant_p is involved?)
see the following thread: http://lists.infradead.org/pipermail/linux-arm-kernel/2016-October/462224.html
(In reply to dhowells@redhat.com from comment #21) > (In reply to Markus Trippelsdorf from comment #20) > > *** Bug 78879 has been marked as a duplicate of this bug. *** > > Kernel bug or not, it should be noted that this means that you cannot use > gcc from r236831 to compile any kernel from the introduction and use of > ilog2() to the current day - and these kernel versions cannot be > retroactively fixed. This is really the crux of the issue. Even if we were to agree that this is a kernel bug, and even if that kernel bug has a fix, there is a lot of kernel code out there that won't ever carry that patch. As far as I can see the patch from Will Dacon you've linked above hasn't made it to a mainline tree, so the kernel bug continues to propagate around mainline and stable versions. If this were a clear case of GCC being right, I'd agree with the bug being closed, but GCC's documentation of __builtin_constant_p doesn't make clear just how unintuitive __builtin_constant_p semantics are. At best this is a grey area in need of documentation clarification, at worst this is GCC being "weird" for the sake of it. Pragmatically, the question is whether we think path specialization which turns a previously FALSE __b_c_p call TRUE is more likely to confuse our users than improve the code generation. Personally, I think it is confusing, but I appreciate we've not all agreed on that position.
(In reply to James Greenhalgh from comment #26) > (In reply to dhowells@redhat.com from comment #21) > > (In reply to Markus Trippelsdorf from comment #20) > > > *** Bug 78879 has been marked as a duplicate of this bug. *** > > > > Kernel bug or not, it should be noted that this means that you cannot use > > gcc from r236831 to compile any kernel from the introduction and use of > > ilog2() to the current day - and these kernel versions cannot be > > retroactively fixed. > > This is really the crux of the issue. Even if we were to agree that this is > a kernel bug, and even if that kernel bug has a fix, there is a lot of > kernel code out there that won't ever carry that patch. > > As far as I can see the patch from Will Dacon you've linked above hasn't > made it to a mainline tree, so the kernel bug continues to propagate around > mainline and stable versions. > > If this were a clear case of GCC being right, I'd agree with the bug being > closed, but GCC's documentation of __builtin_constant_p doesn't make clear > just how unintuitive __builtin_constant_p semantics are. > > At best this is a grey area in need of documentation clarification, at worst > this is GCC being "weird" for the sake of it. > > Pragmatically, the question is whether we think path specialization which > turns a previously FALSE __b_c_p call TRUE is more likely to confuse our > users than improve the code generation. Personally, I think it is confusing, > but I appreciate we've not all agreed on that position. As far as I can tell the kernel is the only project where this issue ever popped up. The fix is straightforward. It just needs to be send to the correct kernel maintainer. Why do you think this case is any different from any other buggy application code that needs adjustments as gcc improves?
> As far as I can tell the kernel is the only project where this issue ever > popped up. The fix is straightforward. It just needs to be send to the > correct kernel maintainer. Right, but getting the patch in mainline is the easy bit! This bug hits many released kernels. > Why do you think this case is any different from any other buggy application > code that needs adjustments as gcc improves? In most cases we improve GCC to exploit well defined behaviors of the standard. In this case we created defined __builtin_constant_p with insufficient documentation to allow a user to reasonably predict the surprising behavior shown in this testcase. GCC has created a path which will never be executed and used that to introduce a constant which does not exist in the source. Unless you know what jump-threading can do, this transformation isn't obvious.
*** Bug 79778 has been marked as a duplicate of this bug. ***
For references, Linus himself fixed the issue: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=474c90156c8dcc2fa815e6716cc9394d7930cb9c