I'm filing this on behalf of someone who posted this bug on Reddit:
https://www.reddit.com/r/cpp/comments/99e1ri/interesting_gcc_optimizer_bug/

Copying the text from there:

Looks like there is an interesting gcc optimizer bug in gcc 7+.

    #include <utility>

    std::pair<long, long> fret(long i) {
        return {i, i};
    }

With -O2 gcc generates the expected:

    mov rdx, rdi
    mov rax, rdi

But with -O3 it generates:

    mov QWORD PTR [rsp-24], rdi
    movq xmm0, QWORD PTR [rsp-24]
    punpcklqdq xmm0, xmm0
    movaps XMMWORD PTR [rsp-24], xmm0
    mov rax, QWORD PTR [rsp-24]
    mov rdx, QWORD PTR [rsp-16]

https://godbolt.org/z/lXoaA4
Analysis in the comments there puts the blame on -ftree-slp-vectorize.
(In reply to Tom Tromey from comment #1)
> Analysis in the comments there puts the blame on -ftree-slp-vectorize

Actually, it is a cost model issue ...
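The generated code is still correct. The problem is the extra stack round-trip. One possible workaround, which the thread does not propose, is to opt only the affected function out of SLP vectorization while the rest of the translation unit stays at -O3. The sketch below relies on GCC's `optimize` function attribute. GCC treats the string "no-tree-slp-vectorize" as if it were `-fno-tree-slp-vectorize`. The macro name `NO_SLP` is made up for illustration. GCC's documentation intends the `optimize` attribute mainly for debugging. Passing `-fno-tree-slp-vectorize` for the whole file is the more conventional route.

```cpp
#include <cassert>  // used by the usage checks below
#include <utility>

// Hypothetical macro: on GCC, disable SLP vectorization for one function.
// GCC treats the attribute string as if it were -fno-tree-slp-vectorize.
// On other compilers (including clang, which also defines __GNUC__) it expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP
#endif

// Same function as in the report. With SLP disabled for it, GCC at -O3
// should fall back to the plain two-register return seen at -O2.
NO_SLP std::pair<long, long> fret(long i) {
    return {i, i};
}
```

Whatever the flags, the values returned are the same. Only the instruction sequence should change, so it is worth re-checking the assembly on godbolt after applying this.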
A dup of PR84101 and others. The vectorizer has a hard time accounting for ABI details of parameter passing and return-value handling because those are not reflected in GIMPLE. There's a patch posted that may handle this case, but I don't see a RESULT_DECL in the IL, so it might not:

    fret (long int i)
    {
      struct pair D.7982;

      <bb 2> [local count: 1073741825]:
      MEM[(struct pair *)&D.7982] = i_2(D);
      MEM[(struct pair *)&D.7982 + 8B] = i_2(D);
      return D.7982;
    }

That is, the vectorizer doesn't know that D.7982 is forcibly allocated to the rax/rdx register pair. It thinks D.7982 is memory, and in GIMPLE it is memory. Besides the one in the posted patch, another heuristic would be to slightly pessimize vectorization of non-TREE_ADDRESSABLE sources/destinations. However, if the ABI returned std::pair<long, long> in %xmm0, we would lose.
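To make the cost-model point concrete, here is a source-level sketch of roughly what the -O3 sequence does. It assumes an x86-64 target with SSE2. The function name `fret_like_o3` is made up for illustration, and the compiler is free to generate different code for it. The vectorizer sees two adjacent 8-byte stores to the temporary D.7982 and merges them into one 16-byte vector store, which looks like a win if D.7982 really is memory. Because the ABI returns the pair in rax/rdx, both halves then have to be reloaded from the stack, and the cost model does not see that reload.

```cpp
#include <cassert>  // used by the usage checks below
#include <utility>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper mirroring the -O3 code path, not GCC's actual output.
std::pair<long, long> fret_like_o3(long i) {
#if defined(__SSE2__)
    alignas(16) long long tmp[2];                         // stack temporary, plays the role of D.7982
    __m128i v = _mm_set1_epi64x(i);                       // broadcast i to both lanes (movq + punpcklqdq)
    _mm_store_si128(reinterpret_cast<__m128i*>(tmp), v);  // single 16-byte store (movaps)
    // Reload both halves so they can be returned in two integer registers.
    return {static_cast<long>(tmp[0]), static_cast<long>(tmp[1])};
#else
    // Scalar fallback for targets without SSE2: two separate stores.
    long tmp[2] = {i, i};
    return {tmp[0], tmp[1]};
#endif
}
```

The result equals `{i, i}` on either path. The sketch only shows why a store-merging view of D.7982 looks profitable in GIMPLE while the register-returned reality is not.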
Actually, quite an exact dup.

*** This bug has been marked as a duplicate of bug 84101 ***