Spurious optimization failures - unnecessary stack frame management

Tom Bachmann e_mc_h2@web.de
Tue Jul 9 12:04:00 GMT 2013


Hi,

[please CC me, I am not subscribed to this list]

I am writing a C++ expression template wrapper library for FLINT [0]. I 
am finding that across gcc versions, and with no apparent pattern, the 
optimizer sometimes fails to properly eliminate stack frame management. 
Is this a known problem? What parameter values should one increase to 
have the optimizer do this more aggressively?

I am working on x86-64, if that is relevant.

Please excuse my being so vague; unfortunately, I do not know much about 
optimizer internals. Let me show you an example. Consider the function

void
test_fmpzxx_asymadd_1 (fmpzxx& out, const fmpzxx& a,
         const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
     out = (a + (((b + (c + (a + b))) + c) + d));
}

The type fmpzxx has a single data member, which is a "long". One may 
obtain a pointer to this data member using the _fmpz() method. Using 
some expression template magic [1], the above line is turned into 
function calls to a C library, essentially equivalent to the following:

void
test_fmpzxx_asymadd_2 (fmpzxx& out, const fmpzxx& a,
         const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
     fmpz_t tmp;
     fmpz_init (tmp);

     fmpz_add (tmp, a._fmpz (), b._fmpz ());
     fmpz_add (tmp, c._fmpz (), tmp);
     fmpz_add (tmp, b._fmpz (), tmp);
     fmpz_add (tmp, tmp, c._fmpz ());
     fmpz_add (tmp, tmp, d._fmpz ());
     fmpz_add (out._fmpz (), a._fmpz (), tmp);

     fmpz_clear (tmp);
}
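To make the equivalence between the two functions easy to check, the call sequence above can be replayed with a stand-in for fmpz_add. In this sketch `fake_fmpz_add` and `asymadd_2_replay` are hypothetical names, and a plain `long` plays the role of fmpz_t, so it only tracks the arithmetic, not FLINT's bignum handling. Several calls pass tmp as both output and input, so whatever routine fills fmpz_add's role must tolerate that aliasing; the stand-in does this by reading both inputs before writing. Each call is annotated with the partial sum it produces.

```cpp
#include <cassert>  // for the usage checks

// Hypothetical stand-in for fmpz_add: *r = *x + *y on plain longs.
// Both operands are read before *r is written, so r may alias x or y.
inline void fake_fmpz_add(long* r, const long* x, const long* y) {
    const long sum = *x + *y;
    *r = sum;
}

// Mirrors test_fmpzxx_asymadd_2, with a long playing the role of fmpz_t.
inline void asymadd_2_replay(long* out, const long* a, const long* b,
                             const long* c, const long* d) {
    long tmp = 0;                  // fmpz_init (tmp)
    fake_fmpz_add(&tmp, a, b);     // tmp = a + b
    fake_fmpz_add(&tmp, c, &tmp);  // tmp = c + (a + b)
    fake_fmpz_add(&tmp, b, &tmp);  // tmp = b + (c + (a + b))
    fake_fmpz_add(&tmp, &tmp, c);  // tmp = (b + (c + (a + b))) + c
    fake_fmpz_add(&tmp, &tmp, d);  // tmp = ((b + (c + (a + b))) + c) + d
    fake_fmpz_add(out, a, &tmp);   // out = a + tmp
    // fmpz_clear (tmp) has no counterpart for a plain long.
}
```

With a = 1, b = 2, c = 3, d = 4 the replay leaves 16 in out, which is what the parenthesized expression in test_fmpzxx_asymadd_1 gives when evaluated by hand.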

However, to achieve this, the optimizer has to eliminate many 
temporaries, inline calls, track pointers, etc. It seems to me that, for 
no apparent reason, this sometimes goes wrong. For example, with g++-4.6.4 
or g++-4.8.1, both of the above functions yield essentially identical 
machine code, with a stack frame size of about 56 bytes. On the other 
hand, g++-4.7.3 produces the attached code [NB: this is compiled without 
exception support, to simplify comparison with the pure C code]. (I 
obtained this via objdump, since I did not find the extra labels etc. 
produced by g++ -S helpful.) Notice that the stack frame size has grown 
to 376 bytes! I have been trying to understand the produced code, but 
could not make much sense of it. Some parts of the stack frame are 
initialized, then copied around, and then other data is used in calling 
the C functions. It seems as if the optimizer just stopped arbitrarily, 
presumably because of some heuristic cutoff. My main question is: is 
there a switch to tune this heuristic?
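For readers unfamiliar with expression templates, the following minimal sketch shows why the result depends on the optimizer eliminating temporaries and inlining calls. It is not the FLINT wrapper's code: `toy_add`, `Expr`, `toyxx`, `AddExpr` and `toy_asymadd` are hypothetical names, and `toy_add` merely stands in for a C routine like fmpz_add by adding two `long` values through pointers. Writing `a + b` only builds a node holding references; the arithmetic happens when the node is assigned. Each node evaluates its operands into local temporaries, so getting code as tight as the hand-written version relies entirely on the compiler inlining the nested `eval_into` calls and removing those temporaries. Unlike the real library, which the post describes as reducing to a single temporary, this sketch makes no attempt to reuse temporaries itself.

```cpp
#include <cassert>      // for the usage checks
#include <type_traits>

// Hypothetical stand-in for a C routine such as fmpz_add: *r = *x + *y.
inline void toy_add(long* r, const long* x, const long* y) {
    const long sum = *x + *y;
    *r = sum;
}

// Tag base used to restrict operator+ to our own types.
struct Expr {};

// Hypothetical wrapper with a single long member, exposed via _fmpz().
struct toyxx : Expr {
    long v;
    explicit toyxx(long x = 0) : v(x) {}
    long* _fmpz() { return &v; }
    const long* _fmpz() const { return &v; }

    // A leaf just copies its value into the destination.
    void eval_into(long* out) const { *out = v; }

    // Assigning an expression evaluates it directly into this object.
    // This is safe even if *this appears on the right-hand side, because
    // AddExpr reads all operands into temporaries before writing.
    template <class E,
              class = typename std::enable_if<std::is_base_of<Expr, E>::value>::type>
    toyxx& operator=(const E& e) {
        e.eval_into(_fmpz());
        return *this;
    }
};

// Node for "l + r". Constructing it does no arithmetic.
template <class L, class R>
struct AddExpr : Expr {
    const L& l;
    const R& r;
    AddExpr(const L& l_, const R& r_) : l(l_), r(r_) {}

    void eval_into(long* out) const {
        long lt, rt;  // temporaries the optimizer is expected to remove
        l.eval_into(&lt);
        r.eval_into(&rt);
        toy_add(out, &lt, &rt);
    }
};

// operator+ only participates when both operands are toyxx or nodes.
template <class L, class R,
          class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                          std::is_base_of<Expr, R>::value>::type>
AddExpr<L, R> operator+(const L& l, const R& r) {
    return AddExpr<L, R>(l, r);
}

// Same shape as test_fmpzxx_asymadd_1. Nested nodes are temporaries that
// live until the end of the full expression, i.e. through the assignment.
inline void toy_asymadd(toyxx& out, const toyxx& a, const toyxx& b,
                        const toyxx& c, const toyxx& d) {
    out = (a + (((b + (c + (a + b))) + c) + d));
}
```

With a = 1, b = 2, c = 3, d = 4, `toy_asymadd` leaves 16 in `out.v`. Every `+` in the source becomes one more level of `eval_into` calls and one more pair of temporaries, which is the kind of structure the optimizer has to flatten for the stack frame to stay small.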

Please note that this problem is not specific to version 4.7.3. There 
are other (similar) examples where, e.g., 4.7.3 optimizes just fine but, 
say, 4.8.1 produces similarly silly code.
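The post does not establish which optimization pass gives up, so no particular `--param` value can be recommended from it. One per-function control GCC does document is the `flatten` function attribute, which asks the compiler to inline every call inside the marked function where possible instead of leaving that decision to its usual size heuristics. The sketch below only shows the syntax on hypothetical helpers `leaf`, `add_node` and `chain`; whether it shrinks the 376-byte frame seen with g++-4.7.3 is untested, and it would not help if the extra frame comes from temporaries that were inlined but never eliminated.

```cpp
#include <cassert>  // for the usage checks

// Hypothetical helpers standing in for nested layers of
// expression-template evaluation code.
inline long leaf(const long* p) { return *p; }
inline long add_node(const long* x, const long* y) { return leaf(x) + leaf(y); }

// GCC extension: asks the compiler to inline every call made in this
// function body (add_node, and leaf through it) where possible, rather
// than leaving it to the inlining size limits. The result is unchanged.
__attribute__((flatten))
long chain(const long* a, const long* b, const long* c) {
    long ab = add_node(a, b);   // ab = a + b
    return add_node(c, &ab);    // c + (a + b)
}
```

Applied to a function such as test_fmpzxx_asymadd_1, the attribute would go on its definition in the same way; comparing the objdump output with and without it would show whether inlining is the part that stops early.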

Thanks,
Tom

[0] http://www.flintlib.org/
[1] It is a rather big library by now. I am trying to avoid showing the 
relevant C++ code. In particular, all my attempts at isolating a "minimal 
problematic example" have caused the optimizer to kick in before the 
code reached an acceptably small size.

You can find all the code at https://github.com/ness01/flint2/tree/gsoc, 
the functions test_fmpzxx_asymadd_? discussed are found in 
cxx/test/t-codegen.cpp.
-------------- next part --------------
Name: asymadd_1.s
URL: <https://gcc.gnu.org/pipermail/gcc-help/attachments/20130709/18651009/attachment.ksh>

