Summary: | Generates 20 lines of assembly while only one assembly instruction is enough. | ||
---|---|---|---|
Product: | gcc | Reporter: | mcccs |
Component: | tree-optimization | Assignee: | Not yet assigned to anyone |
Status: | RESOLVED FIXED | ||
Severity: | normal | Keywords: | missed-optimization |
Priority: | P3 | ||
Version: | 8.0.1 | ||
Target Milestone: | 11.0 | ||
Host: | x86_64-linux-gnu | Target: | x86_64-linux-gnu |
Build: | x86_64-linux-gnu | Known to work: | |
Known to fail: | | Last reconfirmed: | 2018-04-09 00:00:00 |
Bug Depends on: | |||
Bug Blocks: | 53947 |
Description
mcccs
2018-04-08 07:56:45 UTC
This isn't handled by basic-block vectorization because there are no stores, and CONSTRUCTORs are not SLP "seeds". IIRC there are duplicate reports.

We can vectorize a variant with doubles, but that results in awful code because the ABI isn't known at that point. The float variant now looks like the following before vectorization:

    _1 = a.x;
    _2 = b.x;
    _3 = _1 + _2;
    _4 = a.y;
    _5 = b.y;
    _6 = _4 + _5;
    MEM[(struct *)&D.1915] = _3;
    MEM[(struct *)&D.1915 + 4B] = _6;
    return D.1915;

Here the issue is again that we do not know the ABI details. In addition, MMX is disabled and the vectorizer expects 4 floats for vectorization; that is, it cannot vectorize using partial vector registers (the ABI may, for example, specify that the upper half of %xmm0 is zero).

Fixed in GCC 11, where the x86_64 target emulates 2-float vectors inside SSE registers.
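The original test case is not reproduced on this page. As a hedged sketch, assuming the source was a function that adds two structs of two floats (which is what the GIMPLE dump is consistent with), the C below shows the scalar form the dump corresponds to and, for comparison, the same computation written with GCC's `vector_size` extension as one 2-lane addition. The names `vec2`, `vec2_add`, `vec2_add_vec` and `v2sf` are hypothetical and not taken from the report.

```c
#include <string.h>

/* Hypothetical reconstruction: the report's source is not shown, so
   these names are illustrative only. */
struct vec2 {
    float x;
    float y;
};

/* Scalar form matching the dump: load a.x/b.x and a.y/b.y, do two
   scalar additions, store both results into the returned struct. */
struct vec2 vec2_add(struct vec2 a, struct vec2 b)
{
    struct vec2 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}

/* GCC/Clang vector extension: an 8-byte vector of two floats. */
typedef float v2sf __attribute__((vector_size(8)));

/* The struct and the vector must have the same size for the memcpy
   reinterpretation below to copy both lanes. */
_Static_assert(sizeof(struct vec2) == sizeof(v2sf),
               "struct vec2 must be exactly two packed floats");

/* Explicit 2-lane form of the same computation. memcpy reinterprets
   the struct as a vector without violating strict aliasing. */
struct vec2 vec2_add_vec(struct vec2 a, struct vec2 b)
{
    v2sf va, vb, vr;
    struct vec2 r;
    memcpy(&va, &a, sizeof va);
    memcpy(&vb, &b, sizeof vb);
    vr = va + vb; /* element-wise addition over both lanes */
    memcpy(&r, &vr, sizeof r);
    return r;
}
```

Under the x86_64 System V ABI a struct of two floats is classified as SSE and is passed and returned in the low 8 bytes of a single XMM register, which is why one packed addition can in principle cover the whole function. Whether GCC actually emits that for either form depends on the compiler version and target, as discussed above.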