Bug 87062 - mis-optimized code with -O3 and std::pair
Summary: mis-optimized code with -O3 and std::pair
Status: RESOLVED DUPLICATE of bug 84101
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: unknown
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2018-08-22 17:40 UTC by Tom Tromey
Modified: 2018-10-16 10:28 UTC (History)
0 users

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-08-23 00:00:00


Description Tom Tromey 2018-08-22 17:40:21 UTC
I'm filing this on behalf of someone who posted this bug on reddit.
https://www.reddit.com/r/cpp/comments/99e1ri/interesting_gcc_optimizer_bug/

Copying text from there:

Looks like there is an interesting gcc optimizer bug in gcc 7+.

#include <utility>
std::pair<long, long> fret(long i) { return {i, i}; }

With -O2 gcc generates the expected:

        mov     rdx, rdi
        mov     rax, rdi

But with -O3 it generates:

        mov     QWORD PTR [rsp-24], rdi
        movq    xmm0, QWORD PTR [rsp-24]
        punpcklqdq      xmm0, xmm0
        movaps  XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]

https://godbolt.org/z/lXoaA4
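
For anyone who wants to compare the variants locally rather than on Compiler Explorer, a minimal sketch of the reproducer as a standalone translation unit follows. The build lines in the comments are an assumption about how one would inspect the output; -fno-tree-slp-vectorize is included only because the linked discussion points at SLP vectorization, and its effect on this case is not confirmed here.

```cpp
// Compile to assembly and compare, for example:
//   g++ -O2 -S repro.cpp                          (report: two register moves)
//   g++ -O3 -S repro.cpp                          (report: round trip through the stack and xmm0)
//   g++ -O3 -fno-tree-slp-vectorize -S repro.cpp  (assumption: turns off the suspected pass)
// The file name repro.cpp is hypothetical.
#include <utility>

// Reproducer from the report: both halves of the pair receive the same value.
std::pair<long, long> fret(long i) { return {i, i}; }
```

The generated code is correct at every optimization level; only the instruction sequence differs.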
Comment 1 Tom Tromey 2018-08-22 17:41:32 UTC
Analysis in the comments there puts the blame on -ftree-slp-vectorize
Comment 2 Andrew Pinski 2018-08-22 17:48:30 UTC
(In reply to Tom Tromey from comment #1)
> Analysis in the comments there puts the blame on -ftree-slp-vectorize

Actually it is a cost model issue ...
Comment 3 Richard Biener 2018-08-23 09:21:48 UTC
A dup of PR84101 and others.  The vectorizer has a hard time accounting for
ABI details of parameter passing and return value handling because those are
not reflected in GIMPLE.  There's a patch posted that maybe handles this
case, but I don't see a RESULT_DECL in the IL so it might not:

fret (long int i)
{
  struct pair D.7982;

  <bb 2> [local count: 1073741825]:
  MEM[(struct pair *)&D.7982] = i_2(D);
  MEM[(struct pair *)&D.7982 + 8B] = i_2(D);
  return D.7982;

}

that is, the vectorizer doesn't know D.7982 is forcefully allocated to
a rax/rdx register pair but thinks it is memory (it is memory in GIMPLE).

A heuristic besides the one in the posted patch would be to slightly
pessimize non-TREE_ADDRESSABLE sources/destinations for vectorization,
but if the ABI would return std::pair<long, long> in %xmm0 we'd lose.
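
To make the GIMPLE above easier to map back to source, here is a rough hand-written equivalent. It is a sketch, not compiler output: pair_like and fret_lowered are hypothetical names, and the comments tie each statement to the IL line it is meant to mirror. The two stores go to adjacent 8-byte slots of one temporary, which is the shape an SLP vectorizer can merge into a single 16-byte store when it treats the temporary as memory.

```cpp
#include <cstddef>

// Hypothetical stand-in for std::pair<long, long> with the same two members.
struct pair_like {
    long first;
    long second;
};

// The second member sits directly after the first, so the two stores are adjacent.
static_assert(offsetof(pair_like, second) == sizeof(long),
              "second is expected immediately after first");
static_assert(sizeof(pair_like) == 2 * sizeof(long),
              "no padding expected between or after the members");

pair_like fret_lowered(long i) {
    pair_like tmp;    // plays the role of D.7982
    tmp.first = i;    // MEM[(struct pair *)&D.7982] = i_2(D);
    tmp.second = i;   // MEM[(struct pair *)&D.7982 + 8B] = i_2(D);
    return tmp;       // return D.7982;
}
```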
Comment 4 Richard Biener 2018-08-23 09:23:15 UTC
Actually quite exact dup.

*** This bug has been marked as a duplicate of bug 84101 ***