87062 – mis-optimized code with -O3 and std::pair

Status:	RESOLVED DUPLICATE of bug 84101

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	unknown

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer

Reported:	2018-08-22 17:40 UTC by Tom Tromey
Modified:	2018-10-16 10:28 UTC
CC List:	0 users

See Also:
Host:
Target:	x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2018-08-23 00:00:00

Description Tom Tromey 2018-08-22 17:40:21 UTC

I'm filing this on behalf of someone who posted this bug on reddit.
https://www.reddit.com/r/cpp/comments/99e1ri/interesting_gcc_optimizer_bug/

Copying text from there:

Looks like there is an interesting gcc optimizer bug in gcc 7+.

#include <utility>
std::pair<long, long> fret(long i) { return {i, i}; }



With -O2 gcc generates the expected:

        mov     rdx, rdi
        mov     rax, rdi

But with -O3 it generates:

        mov     QWORD PTR [rsp-24], rdi
        movq    xmm0, QWORD PTR [rsp-24]
        punpcklqdq      xmm0, xmm0
        movaps  XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]

https://godbolt.org/z/lXoaA4
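
The two listings above compute the same result, so this reads as a missed optimization rather than wrong code. A minimal sketch for checking that locally follows. The helper `fret_matches` and the file name `fret.cpp` are hypothetical and not part of the report, and the compile commands in the comments assume g++ on x86_64.

```cpp
#include <utility>

// Function from the report, unchanged.
std::pair<long, long> fret(long i) { return {i, i}; }

// Hypothetical helper (not in the report): true when both halves of the
// returned pair equal the input, whichever code sequence the compiler chose.
bool fret_matches(long i) {
    const std::pair<long, long> p = fret(i);
    return p.first == i && p.second == i;
}

// To compare the generated assembly (assuming g++ on x86_64):
//   g++ -O2 -S -masm=intel -o fret-O2.s fret.cpp
//   g++ -O3 -S -masm=intel -o fret-O3.s fret.cpp
```

Diffing the two `.s` files should show the register-only sequence against the stack round trip quoted above, depending on the GCC version in use.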

Comment 1 Tom Tromey 2018-08-22 17:41:32 UTC

Analysis in the comments there puts the blame on -ftree-slp-vectorize

Comment 2 Andrew Pinski 2018-08-22 17:48:30 UTC

(In reply to Tom Tromey from comment #1)
> Analysis in the comments there puts the blame on -ftree-slp-vectorize

Actually it is a cost model issue ...
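
To test the claim from comment 1 locally, one option is to disable SLP vectorization for just this function. The sketch below uses GCC's `optimize` function attribute for that. The names `NO_SLP` and `fret_no_slp` are hypothetical, and this only sidesteps the vectorizer; it does nothing about the cost model issue mentioned in comment 2. Passing `-fno-tree-slp-vectorize` on the command line is the whole-file equivalent.

```cpp
#include <utility>

// The optimize attribute is GCC-specific, so it is guarded here.
// NO_SLP is a hypothetical macro name used only for this sketch.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP
#endif

// Same body as fret() in the report, with SLP vectorization turned off
// for this one function when built with GCC.
NO_SLP std::pair<long, long> fret_no_slp(long i) { return {i, i}; }

// Whole-file alternative (assuming g++):
//   g++ -O3 -fno-tree-slp-vectorize -S -masm=intel fret.cpp
```

GCC's documentation describes the `optimize` attribute as a debugging aid, so this is a way to check the analysis rather than a recommended fix.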

Comment 3 Richard Biener 2018-08-23 09:21:48 UTC

A dup of PR84101 and others.  The vectorizer has a hard time accounting for
ABI details of parameter passing and return value handling because those are
not reflected in GIMPLE.  There's a patch posted that maybe handles this
case, but I don't see a RESULT_DECL in the IL so it might not:

fret (long int i)
{
  struct pair D.7982;

  <bb 2> [local count: 1073741825]:
  MEM[(struct pair *)&D.7982] = i_2(D);
  MEM[(struct pair *)&D.7982 + 8B] = i_2(D);
  return D.7982;

}

that is, the vectorizer doesn't know D.7982 is forcefully allocated to
a rax/rdx register pair but thinks it is memory (it is memory in GIMPLE).

A heuristic besides the one in the posted patch would be to slightly
pessimize non-TREE_ADDRESSABLE sources/destinations for vectorization,
but if the ABI returned std::pair<long, long> in %xmm0, that heuristic
would make us lose.
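
The GIMPLE above amounts to two adjacent long-sized stores into a two-member aggregate, which is the pattern SLP vectorization looks for. A rough source-level mirror of it is sketched below with a plain struct. The names `pair_like`, `fret_like` and `second_offset` are hypothetical stand-ins, not anything from GCC or libstdc++.

```cpp
#include <cstddef>

// Hypothetical stand-in for std::pair<long, long>: two adjacent longs.
struct pair_like {
    long first;
    long second;
};

// The aggregate is exactly two longs wide, matching the stores at +0 and
// +8B in the dump on x86_64.
static_assert(sizeof(pair_like) == 2 * sizeof(long),
              "pair_like is expected to hold two adjacent longs");

// Mirrors the dump: store i at offset 0, store i at the next long, then
// return the aggregate by value. In GIMPLE the local looks like memory;
// under the x86_64 System V ABI an aggregate of two integer eightbytes
// comes back in rax and rdx.
pair_like fret_like(long i) {
    pair_like d;
    d.first = i;
    d.second = i;
    return d;
}

// Byte offset of the second member, for comparison with the dump.
std::size_t second_offset() { return offsetof(pair_like, second); }
```

Whether a given GCC version vectorizes `fret_like` the same way as `fret` is not something the report states, so treat this only as an illustration of the store pattern.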

Comment 4 Richard Biener 2018-08-23 09:23:15 UTC

Actually, quite an exact dup.

*** This bug has been marked as a duplicate of bug 84101 ***