Bug 45267 - [4.5 regression] inlining fails with -m32
Summary: [4.5 regression] inlining fails with -m32
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.5.1
: P2 normal
Target Milestone: 4.5.3
Assignee: Jason Merrill
URL:
Keywords: rejects-valid
Depends on:
Blocks:
 
Reported: 2010-08-12 16:15 UTC by Matthias Kretz (Vir)
Modified: 2011-04-20 06:48 UTC (History)
5 users (show)

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work: 4.4.4, 4.6.0
Known to fail: 4.5.0
Last reconfirmed: 2010-08-12 19:09:41


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Kretz (Vir) 2010-08-12 16:15:41 UTC
My code, which previously compiled with GCC 4.3 and 4.4 with -m32 as well as for 64 bit, fails to compile with GCC 4.5.[01] when compiling for 32 bit (-m32).

I tried to reduce the problem to a minimal testcase and arrived at this:

typedef int __v4si __attribute__ ((__vector_size__ (16)));
typedef long long __v2di __attribute__ ((__vector_size__ (16)));
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0)
{
  return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 };
}
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set1_epi32 (int __A)
{
  return _mm_set_epi32 (__A, __A, __A, __A);
}
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpeq_epi32 (__m128i __A, __m128i __B)
{
  return (__m128i)__builtin_ia32_pcmpeqd128 ((__v4si)__A, (__v4si)__B);
}
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_slli_epi32 (__m128i __A, int __B)
{
  return (__m128i)__builtin_ia32_pslldi128 ((__v4si)__A, __B);
}
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_testc_si128 (__m128i __M, __m128i __V)
{
  return __builtin_ia32_ptestc128 ((__v2di)__M, (__v2di)__V);
}

template<typename T>
class Vector
{
    public:
        inline Vector(__m128i x) : d(x) {}
        inline Vector(T a) : d(_mm_set1_epi32(a)) {}

        inline Vector<T> operator<<(int x) const __attribute__((always_inline));

        inline bool operator==(const Vector<T> &x) const
        {
            return !_mm_testc_si128(_mm_cmpeq_epi32(d, x.d), _mm_set1_epi32(0xffffffffu));
        }
    private:
        __m128i d;
};

template<> inline Vector<int> Vector<int>::operator<<(int x) const
{
    return _mm_slli_epi32(d, x);
}

template<typename T1, typename M> inline void foo(const T1 &, const M &) {}
class Fail {};

int main()
{
    Vector<int> a(1);
    if ((a << 2) == (a << 2)) {
        foo(a << 2, (a << 2) == (a << 2));
        throw Fail();
    }
    return 0;
}

g++ -m32 -O3 -Wall -march=core2 -msse4 -ansi -o arithmetics arithmetics.cpp
arithmetics.cpp: In function &#8216;int main()&#8217;:
arithmetics.cpp:47:31: sorry, unimplemented: inlining failed in call to &#8216;Vector<T> Vector<T>::operator<<(int) const [with T = int]&#8217;: call is unlikely and code size would grow
arithmetics.cpp:59:40: sorry, unimplemented: called from here
arithmetics.cpp:47:31: sorry, unimplemented: inlining failed in call to &#8216;Vector<T> Vector<T>::operator<<(int) const [with T = int]&#8217;: call is unlikely and code size would grow
arithmetics.cpp:59:40: sorry, unimplemented: called from here
arithmetics.cpp:47:31: sorry, unimplemented: inlining failed in call to &#8216;Vector<T> Vector<T>::operator<<(int) const [with T = int]&#8217;: call is unlikely and code size would grow
arithmetics.cpp:59:41: sorry, unimplemented: called from here
Comment 1 H.J. Lu 2010-08-12 19:09:41 UTC
It is fixed by revision 158732:

http://gcc.gnu.org/ml/gcc-cvs/2010-04/msg00839.html

on trunk.
Comment 2 H.J. Lu 2010-08-12 20:16:46 UTC
It was triggered by revision 151511:

http://gcc.gnu.org/ml/gcc-cvs/2009-09/msg00257.html
Comment 3 Steven Bosscher 2010-08-12 21:07:01 UTC
Re. comment #1: Looks more like that commit made this problem latent on the trunk.
Comment 4 Matthias Kretz (Vir) 2010-08-12 21:17:33 UTC
Actually the function where inlining fails here consists of one single instruction in the end. Possibly the code in trunk is smarter at recognizing this.

Anyway thanks for the pointer to the "guilty" commit. I will add --param early-inlining-insns=12 to my GCC options whenever GCC 4.5.[01] are used.
Comment 5 Steven Bosscher 2010-08-12 21:30:48 UTC
I don't think this is a heuristics bug at all. You're not getting an error or a warning about a missed optimization (inlining).

You get a sorry, and that only happens if GCC tries but fails to inline an always_inline function. But the reason for the sorry is that the "call is unlikely and code size would grow". That should never be checked for always_inline functions.

The always_inline attribute is basically an attribute to shut off the inlining heuristics. A "do as I say"-attribute. The sorry means that GCC is not doing what you tell it to do. The heuristics changes in GCC 4.6 just paper over this problem.

If I'm right, anyway, that this is an always_inline problem.

Comment 6 Richard Biener 2010-08-12 21:35:17 UTC
Yes, something sets the inline failed reason but shouldn't.
Comment 7 Richard Biener 2010-08-12 21:49:26 UTC
Slightly reduced testcase, fails at -O1 -msse2:

typedef int __v4si __attribute__ ((__vector_size__ (16)));
typedef long long __m128i __attribute__ ((__vector_size__ (16)));
__inline __m128i __attribute__((__always_inline__))
_mm_set1_epi32 (int __A) { }
extern __m128i foobar(__m128i, int);
__inline __m128i __attribute__((__always_inline__)) 
_mm_slli_epi32 (__m128i __A, int __B) { return foobar (__A, __B); }
template<typename T> class Vector {
public:
    inline Vector(__m128i x) : d(x) { }
    inline Vector(T a) : d(_mm_set1_epi32(a)) { }
    inline Vector<T> operator<<(int x) const __attribute__((always_inline));
    inline bool operator==(const Vector<T> &x) const { }
    __m128i d;
};
template<> inline Vector<int> Vector<int>::operator<<(int x) const {
    return _mm_slli_epi32(d, x);
}
template<typename T1> inline void foo(const T1 &) { }
class Fail { };
int main() {
    Vector<int> a(1);
    if ((a << 2) == (a << 2)) {
        foo(a << 2);
        throw Fail();
    }
}
Comment 8 Matthias Kretz (Vir) 2010-08-12 22:08:37 UTC
Right, but then I probably have two bugs to report:
1. always_inline is not honored (heuristics are applied even though they shouldn't)
2. inlining of small functions does not work reliably (i.e. a function declared with "inline" and containing just one ore two instructions is discarded for inlining by GCC). I think I've seen this on earlier compiler versions too - which is why I added the always_inline attributes in the first place. I've never been able to reduce a testcase, though.

I believe this testcase shows both problems, no?
Comment 9 Richard Biener 2010-08-12 22:16:07 UTC
116		{
1117	          if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
1118					            &edge->inline_failed))
1119		    {
1120		      edge->inline_failed = not_good;
1121		      if (dump_file)
1122			fprintf (dump_file, " inline_failed:%s.\n",
1123				 cgraph_inline_failed_string (edge->inline_failed));
1124		    }
1125		  continue;
(gdb) p not_good
$13 = CIF_UNLIKELY_CALL
(gdb) p edge->callee->local.disregard_inline_limits 
$16 = 0
(gdb) p edge->callee->decl->function_decl.disregard_inline_limits 
$17 = 0
(gdb) call debug_tree ( edge->callee->decl->decl_common.attributes)
 <tree_list 0xb779da68
    purpose <identifier_node 0xb77a4784 always_inline
    bindings <(nil)>
    local bindings <(nil)>>>

the flag is cleared from the template instantiation functin decl here:

  if (TREE_CODE (newdecl) == FUNCTION_DECL)
    {
      int function_size;

      function_size = sizeof (struct tree_decl_common);

      memcpy ((char *) olddecl + sizeof (struct tree_common),
              (char *) newdecl + sizeof (struct tree_common),
              function_size - sizeof (struct tree_common));

      memcpy ((char *) olddecl + sizeof (struct tree_decl_common),
              (char *) newdecl + sizeof (struct tree_decl_common),
              sizeof (struct tree_function_decl) - sizeof (struct tree_decl_common));

overwritten from a variant built via

#0  0x0811eb3e in grokfndecl (ctype=0xb77a64e0, type=0xb77a67e0, 
    declarator=0xb76f0888, parms=0xb77aa230, orig_declarator=0xb76f0888, 
    virtualp=0, flags=NO_SPECIAL, quals=1, raises=0x0, check=1, friendp=0, 
    publicp=1, inlinep=1, sfk=sfk_none, funcdef_flag=1 '\001', 
    template_count=1, in_namespace=0x0, attrlist=0xbffff088, location=2025)
    at /home/richard/src/gcc-4_5-branch/gcc/cp/decl.c:6655
#1  0x0812c65c in grokdeclarator (declarator=0x90a7814, declspecs=0xbffff19c, 
    decl_context=NORMAL, initialized=1, attrlist=0xbffff088)
    at /home/richard/src/gcc-4_5-branch/gcc/cp/decl.c:9586
#2  0x081466c4 in start_function (declspecs=0xbffff19c, declarator=0x90a78b8, 
    attrs=0x0) at /home/richard/src/gcc-4_5-branch/gcc/cp/decl.c:12096
#3  0x0826a771 in cp_parser_function_definition_from_specifiers_and_declarator
    (parser=0xb77a4854, decl_specifiers=0xbffff19c, attributes=0x0, 
    declarator=0x90a78b8)
    at /home/richard/src/gcc-4_5-branch/gcc/cp/parser.c:18548

we can again check the attribute lists all the time, but really
DECL_DISREGARD_INLINE_LIMITS should be kept valid here.

We already merge that in various places in duplicate_decls, but appearantly
not here.
Comment 10 Richard Biener 2010-08-30 15:57:04 UTC
Wonder what fixed it on the trunk.
Comment 11 Richard Biener 2010-12-16 13:03:40 UTC
GCC 4.5.2 is being released, adjusting target milestone.
Comment 12 Richard Biener 2011-04-18 14:08:04 UTC
C++ people - you should know this area better than me.  See comment #9.
Comment 13 Jason Merrill 2011-04-19 00:31:21 UTC
The language issue here is that 14.7.3/12 says,

An explicit specialization of a function template is inline only if it is declared with the inline specifier or defined as deleted, and independently of whether its function template is inline.

So in this case G++ is not propagating inline-related flags from the in-class declaration to the explicit specialization.

It's not clear to me that the above passage should apply to this case, since the declaration in the class template is not itself a function template.  But it probably should if there is a definition in the class body, so perhaps the most straightforward thing is to have it apply in this case as well.

Since we aren't propagating inline flags, I suppose we should prune the always_inline attribute from DECL_ATTRIBUTES as well.
Comment 14 Jason Merrill 2011-04-19 23:48:18 UTC
OTOH, since the specialization is also declared inline we could decide to set DECL_DISREGARD_INLINE_LIMITS
Comment 15 Jason Merrill 2011-04-20 00:06:22 UTC
Author: jason
Date: Wed Apr 20 00:06:19 2011
New Revision: 172745

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172745
Log:
	PR c++/45267
	* decl.c (duplicate_decls): Keep always_inline attribute
	in sync with DECL_DISREGARD_INLINE_LIMITS.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/attrib41.C
Modified:
    branches/gcc-4_5-branch/gcc/cp/ChangeLog
    branches/gcc-4_5-branch/gcc/cp/decl.c
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Comment 16 Jason Merrill 2011-04-20 00:06:22 UTC
Author: jason
Date: Wed Apr 20 00:06:19 2011
New Revision: 172744

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172744
Log:
	PR c++/45267
	* decl.c (duplicate_decls): Keep always_inline attribute
	in sync with DECL_DISREGARD_INLINE_LIMITS.

Added:
    trunk/gcc/testsuite/g++.dg/ext/attrib41.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/decl.c
    trunk/gcc/testsuite/ChangeLog
Comment 17 Jason Merrill 2011-04-20 00:25:26 UTC
Fixed.
Comment 18 Jakub Jelinek 2011-04-20 06:42:19 UTC
Is it intentional that the change hasn't been committed to 4.6, just 4.5 and 4.7?
Comment 19 Jason Merrill 2011-04-20 06:48:18 UTC
(In reply to comment #18)
> Is it intentional that the change hasn't been committed to 4.6, just 4.5 and
> 4.7?

Yes.  In 4.5 it fixes an ICE; in 4.6 and 4.7 it is a minor correctness fix that doesn't seem important to apply to the release branch since we inline the function anyway.