This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Spurious optimization failures - unnecessary stack frame management


Hi,

[please CC me, I am not subscribed to this list]

I am writing a C++ expression template wrapper library for FLINT [0]. I am finding that across gcc versions, and with no apparent pattern, the optimizer sometimes fails to properly eliminate stack frame management. Is this a known problem? What parameter values should one increase to have the optimizer do this more aggressively?

I am working on x86-64, if that is relevant.

Please excuse my being so vague; unfortunately, I do not know much about optimizer internals. Let me show you an example. Consider the function

void
test_fmpzxx_asymadd_1 (fmpzxx& out, const fmpzxx& a,
        const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
    out = (a + (((b + (c + (a + b))) + c) + d));
}

The type fmpzxx has a single data member, which is a "long". One may obtain a pointer to this data member using the _fmpz() method. Using some expression template magic [1], the above line is turned into function calls to a C library, essentially equivalent to the following:

void
test_fmpzxx_asymadd_2 (fmpzxx& out, const fmpzxx& a,
        const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
    fmpz_t tmp;
    fmpz_init (tmp);

    fmpz_add (tmp, a._fmpz (), b._fmpz ());
    fmpz_add (tmp, c._fmpz (), tmp);
    fmpz_add (tmp, b._fmpz (), tmp);
    fmpz_add (tmp, tmp, c._fmpz ());
    fmpz_add (tmp, tmp, d._fmpz ());
    fmpz_add (out._fmpz(), a._fmpz (), tmp);

    fmpz_clear (tmp);
}

However, to get there, the optimizer has to eliminate many temporaries, inline calls, track pointers, etc. It seems to me that this sometimes goes wrong for no apparent reason. For example, with g++-4.6.4 or g++-4.8.1, both of the above functions yield essentially identical machine code, with a stack frame size of about 56 bytes. On the other hand, g++-4.7.3 produces the attached code [NB: this is compiled without exception support, to simplify comparison to the pure C code]. (I obtained this via objdump, since I did not find the extra labels etc. produced by g++ -S helpful.) Notice that the stack frame size has grown to 376 bytes! I have been trying to understand the produced code, but could not make much sense of it. Some parts of the stack frame are initialized, then copied around, and then other data is used in calling the C functions. It seems like the optimizer just stopped arbitrarily, presumably because of some heuristic cutoff. My main question is: is there a switch to tune this heuristic?
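
To give a rough idea of the kind of code the optimizer has to see through, here is a minimal sketch of the expression template technique. It is not the actual wrapper library: the names c_add, expr, sum_expr, num and asymadd are hypothetical, and a plain long with ordinary addition stands in for the FLINT type and for fmpz_add. Each operator+ only builds a small node object holding references; the arithmetic happens when the whole tree is assigned, and every nested node introduces its own temporaries that the compiler is expected to fold away.

```cpp
// Hypothetical stand-in for a C library routine such as fmpz_add:
// stores *x + *y in *out.
static void c_add(long* out, const long* x, const long* y) { *out = *x + *y; }

// CRTP base: marks a type as an expression and recovers the derived type.
template <class D>
struct expr {
    const D& self() const { return static_cast<const D&>(*this); }
};

// Node meaning "lhs + rhs"; nothing is computed on construction.
template <class L, class R>
struct sum_expr : expr<sum_expr<L, R> > {
    const L& lhs;
    const R& rhs;
    sum_expr(const L& l, const R& r) : lhs(l), rhs(r) {}

    // Evaluate both operands into local temporaries, then call the C routine.
    void eval_into(long* out) const {
        long l, r;
        lhs.eval_into(&l);
        rhs.eval_into(&r);
        c_add(out, &l, &r);
    }
};

// Leaf type with a single long data member (assumption: mirrors the
// description of fmpzxx only in that respect).
struct num : expr<num> {
    long value;
    explicit num(long v = 0) : value(v) {}

    void eval_into(long* out) const { *out = value; }

    // Assigning an expression tree triggers the actual evaluation.
    template <class L, class R>
    num& operator=(const sum_expr<L, R>& e) {
        long tmp;              // evaluate into a temporary first, so that
        e.eval_into(&tmp);     // "x = x + y" style aliasing is harmless
        value = tmp;
        return *this;
    }
};

// Only matches operands derived from expr<...>, so it does not hijack
// operator+ for unrelated types.
template <class L, class R>
sum_expr<L, R> operator+(const expr<L>& l, const expr<R>& r) {
    return sum_expr<L, R>(l.self(), r.self());
}

// Same expression shape as test_fmpzxx_asymadd_1, on plain longs.
long asymadd(long a, long b, long c, long d) {
    num out, xa(a), xb(b), xc(c), xd(d);
    out = (xa + (((xb + (xc + (xa + xb))) + xc) + xd));
    return out.value;
}
```

In this sketch every sum_expr node lives only until the end of the full expression, which is why storing references is safe here. The ideal outcome is that all node objects and the l, r, tmp temporaries disappear after inlining, leaving just the chain of c_add calls.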

Please note that this problem is not specific to version 4.7.3. There are other (similar) examples where e.g. 4.7.3 optimizes just fine, but say 4.8.1 produces similarly silly code, etc.

Thanks,
Tom

[0] http://www.flintlib.org/
[1] It is a rather big library by now, so I am trying to avoid showing the relevant C++ code. In particular, all my attempts at isolating a "minimal problematic example" have caused the optimizer to kick in before the code reached an acceptably small size.

You can find all the code at https://github.com/ness01/flint2/tree/gsoc; the functions test_fmpzxx_asymadd_? discussed above are in cxx/test/t-codegen.cpp.

Attachment: asymadd_1.s
Description: Text document

