I don't quite know how to characterize this one, so i'll let it up to those in the know to fix the summary/description. Primo, AFAIK, gcc has always struggled to get this right but 4.1 is setting a new record; so i'll qualify that as a regression vs 3.3/3.4. Secundo, i haven't found any related bugreports but one about PPC. So here we go. Consider this testcase: #include <algorithm> template<class T> static inline T max(const T a, const T b) { return a<b ? b : a; } template<class T> static inline T min(const T a, const T b) { return a<b ? a : b; } template<class T> static inline const T &ref_max(const T &a, const T &b) { return a<b ? b : a; } template<class T> static inline const T &ref_min(const T &a, const T &b) { return a<b ? a : b; } template<class T> struct foo_t { T a0, a1; T bar(const T b, const T c) { return max(min(a0, c), min(max(a1, c), b)); } T bar_ref(const T b, const T c) { return ref_max(ref_min(a0, c), ref_min(ref_max(a1, c), b)); } T bar_stl(const T b, const T c) { return std::max(std::min(a0, c), std::min(std::max(a1, c), b)); } }; template struct foo_t<int>; template struct foo_t<float>; int main() { return 0; } With g++-4120050501 -O3 i get such creative & entertaining code as: 0000000000400610 <foo_t<float>::bar_ref(float, float)>: 400610: ucomiss 0x4(%rdi),%xmm1 400614: lea 0x4(%rdi),%rax 400618: lea 0xfffffffffffffff8(%rsp),%rdx 40061d: movss %xmm0,0xfffffffffffffffc(%rsp) 400623: movss %xmm1,0xfffffffffffffff8(%rsp) 400629: movaps %xmm1,%xmm2 40062c: cmova %rdx,%rax 400630: movss (%rax),%xmm1 400634: ucomiss %xmm1,%xmm0 400637: ja 400641 <foo_t<float>::bar_ref(float, float)+0x31> 400639: lea 0xfffffffffffffffc(%rsp),%rax 40063e: movaps %xmm0,%xmm1 400641: ucomiss (%rdi),%xmm2 400644: cmova %rdi,%rdx 400648: movss (%rdx),%xmm0 40064c: ucomiss %xmm0,%xmm1 40064f: jbe 400655 <foo_t<float>::bar_ref(float, float)+0x45> 400651: movss (%rax),%xmm0 400655: repz retq Compare that to what g++-3.4.4-20050314 gives (g++ 3.3.6 is similar): 0000000000400610 <foo_t<float>::bar_ref(float, float)>: 400610: ucomiss (%rdi),%xmm1 400613: lea 0xfffffffffffffffc(%rsp),%rsi 400618: mov %rdi,%rcx 40061b: lea 0x4(%rdi),%rdx 40061f: movss %xmm0,0xfffffffffffffff8(%rsp) 400625: lea 0xfffffffffffffff8(%rsp),%rax 40062a: movss %xmm1,0xfffffffffffffffc(%rsp) 400630: cmovbe %rsi,%rcx 400634: ucomiss 0x4(%rdi),%xmm1 400638: cmova %rsi,%rdx 40063c: ucomiss (%rdx),%xmm0 40063f: cmova %rdx,%rax 400643: movss (%rax),%xmm0 400647: ucomiss (%rcx),%xmm0 40064a: cmovbe %rcx,%rax 40064e: movss (%rax),%xmm0 400652: retq Certainly not optimal, and in fact quite ugly, but at least there's no branch. Happens as soon as references are used and enough min/max are piled up, and that means that unless you use your own min/max instead of the STL version, you're doomed.
I forgot to say that using ternary operators or if/else while changing the codegen slightly doesn't make much difference.
*** This bug has been marked as a duplicate of 21462 ***
Actually I take that back.
Confirmed, the problem here is slightly different from 21462 but closely related.
Additional note, using -fno-gcse slightly reduce the cruft (this one is my new pet peeve :).
On the mainline we get: _ZN5foo_tIfE7bar_refEff: .LFB661: ucomiss 4(%rdi), %xmm1 leaq -8(%rsp), %rdx leaq 4(%rdi), %rax movss %xmm0, -4(%rsp) movss %xmm1, -8(%rsp) cmova %rdx, %rax ucomiss (%rdi), %xmm1 movss (%rax), %xmm0 minss -4(%rsp), %xmm0 cmova %rdi, %rdx movss (%rdx), %xmm1 maxss %xmm1, %xmm0 ret But on the tree level, there are a couple of issues. The first is that, we have a_1 = &b_2->c; temp_3 = *a_1; ... use a_1 Which really should be converted to temp_3 = b_2->c; Which will help.
The problem here is actually not I would I had orginally thought but instead the following issue: struct f { int i; }; int g(int i, int c, struct f *ff) { int *t; if (c) t = &i; else t = &ff->i; return *t; } We don't change the *t into the i and ff->i which causes the min/max not be reconized. This is related to PR 19431 which is a similar issue.
The patch I have also fixes this one.
Subject: Bug 21463 Author: rguenth Date: Wed Apr 18 12:45:09 2007 New Revision: 123946 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123946 Log: 2007-04-18 Richard Guenther <rguenther@suse.de> PR tree-optimization/19431 PR tree-optimization/21463 * tree-pass.h (pass_phiprop): Declare. * passes.c (init_optimization_passes): New phiprop pass. * tree-ssa-forwprop.c (struct phiprop_d): New structure. (phivn_valid_p): New helper function. (phiprop_insert_phi): Likewise. (propagate_with_phi): Likewise. (tree_ssa_phiprop): New propagator propagating loads through phi nodes if profitable. * gcc.c-torture/execute/20070212-1.c: New testcase. * gcc.c-torture/execute/20070212-2.c: Likewise. * gcc.c-torture/execute/20070212-3.c: Likewise. * gcc.dg/tree-ssa/pr19431.c: Likewise. * gcc.dg/tree-ssa/pr21463.c: Likewise. * g++.dg/tree-ssa/pr21463.C: Likewise. * g++.dg/tree-ssa/pr30738.C: Likewise. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/pr21463.C trunk/gcc/testsuite/g++.dg/tree-ssa/pr30738.C trunk/gcc/testsuite/gcc.c-torture/execute/20070212-1.c trunk/gcc/testsuite/gcc.c-torture/execute/20070212-2.c trunk/gcc/testsuite/gcc.c-torture/execute/20070212-3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr19431.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c Modified: trunk/gcc/ChangeLog trunk/gcc/passes.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-pass.h trunk/gcc/tree-ssa-forwprop.c
Fixed.