This is the mail archive of the
mailing list for the GCC project.
Re: GCC missing -flto optimizations? SPEC lbm benchmark
On Fri, Feb 15, 2019 at 4:46 AM Hi-Angel <email@example.com> wrote:
> I never could understand why field reordering was removed from GCC. I
> mean, I know that it's prohibited in C and C++, but surely GCC can
> detect whether it could possibly influence application behavior, and
> if not, just do the reorder.
> The veto is important to C/C++ as programming languages, but not to
> the machine code that is generated from them. As long as an app can't
> detect that its fields were reordered through means defined by C/C++,
> field reordering by the compiler is fine, isn't it?
In my opinion field reordering is very hard for the compiler to do
correctly and trivial for a human programmer to do correctly. So in
practice the best approach is for the compiler, or some other tool, to
say "you should reorder the fields here." As far as I can see, the
only real reason to implement field reordering in a compiler is for
benchmark cracking, since benchmarks typically don't let you modify
the source code. It's not a useful optimization in practice other
than for benchmarks.
(Array transformations and struct splitting, on the other hand, can be useful.)
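
To make the "trivial for a human programmer" point concrete, here is a minimal sketch of reordering fields by hand. The struct and function names are hypothetical and are not taken from GCC or from any benchmark; the exact sizes depend on the target ABI (on common 64-bit ABIs the two structs are typically 24 and 16 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with fields in the order someone happened to write
   them: small fields next to large ones force alignment padding. */
struct record_as_written {
    uint8_t  flag;
    uint64_t id;
    uint8_t  kind;
    uint32_t count;
};

/* Same fields, sorted by decreasing alignment by hand. */
struct record_reordered {
    uint64_t id;
    uint32_t count;
    uint8_t  flag;
    uint8_t  kind;
};

/* Bytes saved per element by the manual reordering (ABI-dependent). */
static size_t bytes_saved_per_element(void)
{
    assert(sizeof(struct record_reordered) <= sizeof(struct record_as_written));
    return sizeof(struct record_as_written) - sizeof(struct record_reordered);
}
```

Calling bytes_saved_per_element() reports how much each array element shrinks. A tool only has to point at the struct; the edit itself is a few moved lines, provided nothing else depends on the original layout.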
> On Fri, 15 Feb 2019 at 12:49, Jun Ma <firstname.lastname@example.org> wrote:
> > Bin.Cheng <email@example.com> wrote on Fri, Feb 15, 2019 at 5:12 PM:
> > > On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <firstname.lastname@example.org> wrote:
> > > >
> > > > I have a question about SPEC CPU 2017 and what GCC can and cannot do
> > > > with -flto. As part of some SPEC analysis I am doing I found that with
> > > > -Ofast, ICC and GCC were not that far apart (especially spec int rate,
> > > > spec fp rate was a slightly larger difference).
> > > >
> > > > But when I added -ipo to the ICC command and -flto to the GCC command,
> > > > the difference got larger. In particular the 519.lbm_r was more than
> > > > twice as fast with ICC and -ipo, but -flto did not help GCC at all.
> > > >
> > > > There are other tests that also show this type of improvement with -ipo
> > > > like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
> > > > 548.exchange2_r, but none are as dramatic as 519.lbm_r. Anyone have
> > > > any idea on what ICC is doing that GCC is missing? Is GCC just not
> > > > aggressive enough with its inlining?
> > >
> > > IIRC Jun did some investigation before? CCing.
> > >
> > > Thanks,
> > > bin
> > > >
> > > > Steve Ellcey
> > > > email@example.com
> > ICC is doing much more than GCC in ipo, especially memory layout
> > optimizations. See https://software.intel.com/en-us/node/522667.
> > ICC is more aggressive in array transposition, structure splitting,
> > and field reordering. However, these optimizations were removed
> > from GCC a long time ago.
> > As for lbm_r, IIRC the most time-consuming part is a loop whose
> > memory accesses have a stride of 20. ICC will optimize the array
> > (maybe structure?) and vectorize the loop under ipo.
> > Thanks
> > Jun
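
The strided access Jun describes can be illustrated with a small sketch. This is not the lbm source: the sizes, names, and flat-array layout below are assumptions chosen only to show why a stride-20 loop is awkward to vectorize and how a transposed layout turns it into a unit-stride loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 20 values per cell, as in the stride mentioned above. */
enum { N_FIELDS = 20, N_CELLS = 256 };

/* Cell-major layout: field f of cell c lives at grid[c * N_FIELDS + f].
   Reading one field across all cells jumps N_FIELDS elements each step. */
static double sum_field_strided(const double *grid, size_t field)
{
    assert(field < N_FIELDS);
    double sum = 0.0;
    for (size_t c = 0; c < N_CELLS; c++)
        sum += grid[c * N_FIELDS + field];
    return sum;
}

/* Array transposition: copy into a field-major layout where field f of
   cell c lives at out[f * N_CELLS + c]. */
static void transpose_to_field_major(const double *grid, double *out)
{
    for (size_t c = 0; c < N_CELLS; c++)
        for (size_t f = 0; f < N_FIELDS; f++)
            out[f * N_CELLS + c] = grid[c * N_FIELDS + f];
}

/* Same reduction over the transposed data: consecutive elements, stride 1. */
static double sum_field_contiguous(const double *field_major, size_t field)
{
    assert(field < N_FIELDS);
    const double *column = field_major + field * N_CELLS;
    double sum = 0.0;
    for (size_t c = 0; c < N_CELLS; c++)
        sum += column[c];
    return sum;
}
```

Both functions compute the same sum; only the memory layout differs. A compiler that performs the transposition itself has to prove that every access to the array in the whole program can be rewritten consistently, which is why this kind of transformation is tied to whole-program modes such as -ipo or -flto.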