[Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

amonakov at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sat Jan 21 11:48:06 GMT 2023


Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
          Component|rtl-optimization            |tree-optimization
           Keywords|                            |needs-bisection
            Summary|~20-30x slowdown in         |[10/11/12/13 Regression]
                   |populating std::vector from |~20-30x slowdown in
                   |std::ranges::iota_view      |populating std::vector from
                   |                            |std::ranges::iota_view
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Regarding fn1, would you mind re-running the test on your Xeon CPU with fn2
removed from the source code and -falign-loops=32 added to the gcc command line?
For fn1, the assembly of the inner loop should be identical, so I think the 20%
you were seeing may result from different loop alignment with respect to 32-byte
fetch blocks.

Also please note that cloud instances backing godbolt.org have different CPUs,
so timing results from different runs are not directly comparable.

Regarding fn2, this may partially be a library issue: compiling preprocessed
source from gcc-10.4 using gcc-10.2 also exhibits the problem. The inner loop
becomes significantly more complicated. Bisecting should be helpful.
