Summary: | x86 -mavx256-split-unaligned-load (and store) is affecting AVX2 code, but probably shouldn't be. | ||
---|---|---|---|
Product: | gcc | Reporter: | Peter Cordes <pcordes> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | Keywords: | missed-optimization |
Priority: | P3 | ||
Version: | 7.0 | ||
Target Milestone: | --- | ||
Host: | | Target: | x86_64-*-*, i?86-*-* |
Build: | | Known to work: | 6.3.0 |
Known to fail: | 8.0 | Last reconfirmed: | |
Attachments: | bswap16.cc |
Description
Peter Cordes
2017-04-29 21:12:12 UTC
> It was a bugfix and it's now working as intended AFAIK. You can search for duplicate bug reports.

Using ISA-extension options removes some microarchitectures from the set of CPUs that can run the code, so it would be appropriate for them to have some effect on tuning. A "generic AVX2 CPU" is much more specific than a "generic x86-64 CPU". For example, rep ret is useless with -mavx, since Phenom II doesn't support AVX (or SSE4.1, actually).

As it stands now, gcc doesn't have a way to tune for a "generic AVX2 CPU" (i.e. only try to avoid problems on Haswell, Skylake, KNL, Excavator, and Ryzen, and don't care about things that are slow on Ivy Bridge, Steamroller, or Atom). -mtune=haswell tells gcc that bsf/bsr are fast, but that's not the case on Excavator (at least it isn't on Steamroller). So -mtune=intel or -mtune=haswell aren't necessarily appropriate, especially if we're just talking about -mavx, not -mavx2.

---

In the absence of any -mtune or -march option, -mavx could imply -mtune=generic-avx, the way -march implies a tuning that can still be overridden with -march=foo -mtune=bar. Or maybe the default -mtune option should be changed to -mtune=generic-isa, so users can think of it as a tuning that looks at which -m options are enabled to decide which uarches it can ignore. It might be easier to maintain if those tune options are limited to only disabling workaround options like rep ret and splitting 256b loads/stores. Or maybe this suggestion would already add too much maintenance work.

---

I don't know whether -mavx256-split-unaligned-load/store is still worth it if we take SnB/IvB out of the picture. If it helps a lot on Excavator/Zen, then maybe. It probably hurts on KNL, which easily bottlenecks on decode throughput according to Agner Fog, so more instructions are definitely bad.

---

I didn't find any related bug reports, searching even closed bugs for "split unaligned load" or for -mavx256-split-unaligned-load. I did search some (including in git for the commit that changed this), but didn't find anything. Thanks for confirming that it was an intentional bugfix.
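A rough C++ sketch of the -mtune=generic-isa idea: treat the enabled -m options as a filter on which uarches can run the code, and keep a workaround only if some remaining uarch wants it. Everything here is hypothetical: tune_for_isa, Uarch, Tuning and kUarches are made-up names, not GCC internals. The per-uarch flags are also assumptions for illustration only. rep ret is marked as wanted only by Phenom II, and the 256b load/store split only by Sandy Bridge/Ivy Bridge, since whether Excavator/Zen benefit is the open question in the comment.

```cpp
#include <cassert>

// Hypothetical model of "-mtune=generic-isa": NOT GCC's real tuning code.
// The uarch list and the per-uarch workaround flags are illustrative assumptions.

// ISA-extension bits, roughly what -msse4.1 / -mavx / -mavx2 enable.
constexpr unsigned kSse41 = 1u << 0;
constexpr unsigned kAvx   = 1u << 1;
constexpr unsigned kAvx2  = 1u << 2;

struct Uarch {
    const char* name;
    unsigned isa;          // extensions this uarch supports
    bool wants_rep_ret;    // assumption: only K10 (Phenom II) needs rep ret
    bool wants_split_256;  // assumption: only SnB/IvB gain from split 256b loads/stores
};

const Uarch kUarches[] = {
    {"PhenomII",    0,                     true,  false},
    {"Silvermont",  kSse41,                false, false},  // Atom
    {"SandyBridge", kSse41 | kAvx,         false, true },
    {"IvyBridge",   kSse41 | kAvx,         false, true },
    {"Steamroller", kSse41 | kAvx,         false, false},
    {"Haswell",     kSse41 | kAvx | kAvx2, false, false},
    {"Skylake",     kSse41 | kAvx | kAvx2, false, false},
    {"KNL",         kSse41 | kAvx | kAvx2, false, false},
    {"Excavator",   kSse41 | kAvx | kAvx2, false, false},
    {"Zen",         kSse41 | kAvx | kAvx2, false, false},
};

struct Tuning {
    bool rep_ret = false;
    bool split_unaligned_256 = false;
};

// Keep a workaround only if at least one uarch that can actually run code
// built with `enabled_isa` wants it. Uarches missing any enabled extension
// are ignored entirely.
Tuning tune_for_isa(unsigned enabled_isa) {
    Tuning t;
    for (const Uarch& u : kUarches) {
        if ((u.isa & enabled_isa) != enabled_isa) continue;  // can't run this code
        t.rep_ret = t.rep_ret || u.wants_rep_ret;
        t.split_unaligned_256 = t.split_unaligned_256 || u.wants_split_256;
    }
    return t;
}
```

Under these assumptions, tune_for_isa(kSse41 | kAvx) drops rep ret but keeps the split (SnB/IvB can still run the code), while adding kAvx2 drops both, which is the behaviour this report asks for with AVX2 code.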