This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: How to efficiently unpack 8 bytes from a 64-bit integer?
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Phil Ruffwind <rf at rufflewind dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Fri, 19 Feb 2016 10:53:11 +0100
On Fri, Feb 19, 2016 at 10:44 AM, Phil Ruffwind <rf@rufflewind.com> wrote:
> I tried to look for a workaround for this. It seemed that using a
> union instead of memcpy was enough to convince GCC to optimize into a
> single "mov".
>
> struct alpha unpack(uint64_t x)
> {
>     union {
>         struct alpha r;
>         uint64_t i;
>     } u;
>     u.i = x;
>     return u.r;
> }
>
> But that trick turned out to be short-lived. If I wrap the wrapper
> with another function:
>
> struct alpha wrapperwrapper(uint64_t y)
> {
>     return wrapper(y);
> }
>
> I get the same 37-line assembly generated for this function. What's
> even more strange is that if I just define two identical wrappers in
> the same translation unit:
>
> struct alpha wrapper(uint64_t y)
> {
>     return unpack(y);
> }
>
> struct alpha wrapper2(uint64_t y)
> {
>     return unpack(y);
> }
>
> One of them gets optimized perfectly, while the other fails, even
> though the bodies of the two functions are completely identical!
Yes, as I said, GCC tries to optimize the copy that results from copying
the return value aggregate to the caller's return value slot. GCC hopes
for follow-up optimization opportunities here, but obviously there are none
in this case.
Can you please open a bug report? We may eventually be able to tweak the
SRA heuristics in some way here. Note that you only get good code because
the aggregate is passed and returned in a register (and thus "alignment"
doesn't matter here), something which is exposed to GCC too late for SRA
to make use of that fact (well, easily at least).
Richard.
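
For anyone who wants to reproduce the comparison discussed above, here is a
minimal, self-contained sketch. The definition of struct alpha is not part of
this message, so the eight one-byte fields below are an assumption chosen only
to make the struct exactly 8 bytes; the names unpack_union and unpack_memcpy
are likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: struct alpha is not defined in this message. Eight one-byte
   fields are used here only so that the struct is exactly 8 bytes wide. */
struct alpha {
    uint8_t a, b, c, d, e, f, g, h;
};

/* Both unpack variants rely on the struct and the integer having the same size. */
static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be 8 bytes");

/* Union-based unpack, following the quoted code. */
struct alpha unpack_union(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}

/* memcpy-based unpack, the variant the union was meant to replace. */
struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Two wrappers with identical bodies, as in the quoted report. */
struct alpha wrapper(uint64_t y)
{
    return unpack_union(y);
}

struct alpha wrapper2(uint64_t y)
{
    return unpack_union(y);
}
```

Compiling this with something like "gcc -O2 -S" and comparing the assembly
emitted for wrapper and wrapper2 should be enough to check whether a given
GCC version shows the behaviour described in this thread; the result may
differ between versions and targets.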