PR 23551: why should we coalesce inlined variables?

Thu Jun 28 09:04:00 GMT 2007

On 6/27/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> Sorry that it took me so long to get back to this.
>
> On Jun  1, 2007, "Andrew Pinski" <pinskia@gmail.com> wrote:
>
> > On 6/1/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> >> I didn't.  Today I set some time aside to try to get SPEC2000 to run
> >> on an x86_64 box with Fedora 7.  I still haven't got all of the
> >> benchmarks to compile and run successfully with the GCC trunk, but the
> >> results I got for the patch are promising:
>
> >> Again, the left column is the run-time WITH the patch, the right
> >> column is the run-time WITHOUT the patch.  That's right, removing the
> >> patch actually slows things down.  I couldn't quite believe it, after
> >> what you said, so I triple-checked.
>
> > Oh I can believe it for x86_64 :)  You might want to try on more than
> > x86_64, the register allocator might not be doing a good job for
> > x86_64.  Without the patch is causing more register pressure than with
> > the patch.  Try either on ia64 or PPC where you have more registers.
> > In that case with the patch might slow down the runtime.
>
> Looks like variations for worse are mostly in the noise, and there are
> some variations for better that look consistent.  Here's what I got on
> ppc32 SPEC2K with -O3 -fomit-frame-pointer.  Left column is pristine,
> right column is patched to avoid coalescing of inlined variables:
>
> 164_gzip.000.reported_time: 201.922768 201.103212
> 164_gzip.001.reported_time: 198.960087 198.624947
> 164_gzip.002.reported_time: 199.073588 198.71297
> 175_vpr.000.reported_time: 298.687033 297.300598
> 175_vpr.001.reported_time: 297.122948 297.58685
> 175_vpr.002.reported_time: 295.630361 297.723687
> 176_gcc.000.reported_time: 123.435803 126.320587
> 176_gcc.001.reported_time: 122.343189 124.88423
> 176_gcc.002.reported_time: 122.800357 123.864006
> 181_mcf.000.reported_time: 361.04524 360.337646
> 181_mcf.001.reported_time: 359.981651 360.322147
> 181_mcf.002.reported_time: 359.908608 360.219215
> 186_crafty.000.reported_time: 121.013495 117.454093
> 186_crafty.001.reported_time: 117.082273 117.642486
> 186_crafty.002.reported_time: 117.416734 117.723788
> 197_parser.000.reported_time: 281.625919 279.478479
> 197_parser.001.reported_time: 281.570151 279.82866
> 197_parser.002.reported_time: 281.804008 279.595665
> 253_perlbmk.000.reported_time: 287.251103 286.796575
> 253_perlbmk.001.reported_time: 286.697103 287.089939
> 253_perlbmk.002.reported_time: 286.286923 286.806686
> 254_gap.000.reported_time: 159.230154 153.683678
> 254_gap.001.reported_time: 157.404963 152.992977
> 254_gap.002.reported_time: 158.465634 150.772173
> 256_bzip2.000.reported_time: 257.393719 256.039831
> 256_bzip2.001.reported_time: 256.692201 254.992797
> 256_bzip2.002.reported_time: 256.105407 255.93728
>
> 177_mesa.000.reported_time: 163.103962 163.715843
> 177_mesa.001.reported_time: 162.948224 163.274986
> 177_mesa.002.reported_time: 162.827828 163.231781
> 179_art.000.reported_time: 433.203321 428.189829
> 179_art.001.reported_time: 434.796632 433.193104
> 179_art.002.reported_time: 433.542143 431.804417
> 183_equake.000.reported_time: 124.691579 124.512015
> 183_equake.001.reported_time: 124.418219 124.450906
> 183_equake.002.reported_time: 124.455124 124.621132
> 188_ammp.000.reported_time: 595.206248 595.92599
> 188_ammp.001.reported_time: 596.071531 595.285807
> 188_ammp.002.reported_time: 595.5348 595.350713
>
> vortex output was wrong for both builds, so I cut it out from the
> report above.  A few other testcases failed to compile and are not
> reported, most (all?) of them were just missing an f77 compiler (I
> didn't even enable fortran in the tools I built).
>
> Is this enough evidence that the patch is not harmful to run-time
> performance, and that it may actually help debugging?
>
> http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00703.html

Can you show me one example of code
where this patch helps, and explain why it helps run-time performance?
gap, for example.

The performance difference could simply be due to
a small number of inlining decisions on critical paths.
And the amount of performance variation suggests
that this might be simply *luck*
(e.g. gap is known to be a bit volatile in performance).
Also, the slowdown of gcc worries me more,
as it has one of the flattest profiles in the SPEC int suite
and presumably has more automatic inlining happening.
Alas, we don't have data for 252.eon.
Most of the fp benchmarks are not interesting (except fma3d and sixtrack)
since they spend most of their time in their hot loops.
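
To put rough numbers on the "luck" question, here is a minimal Python
sketch that compares the mean pristine-vs-patched difference against the
run-to-run spread for gcc, crafty and gap.  The timings are copied from
the table quoted above; the helper name summarize() and the choice of
"largest max-min spread within either build" as a noise estimate are my
own assumptions, not anything SPEC or the patch prescribes.

```python
from statistics import mean

# (pristine, patched) run times in seconds, copied from the quoted
# ppc32 SPEC2K table; only three benchmarks are included here.
runs = {
    "176_gcc": [(123.435803, 126.320587), (122.343189, 124.88423),
                (122.800357, 123.864006)],
    "186_crafty": [(121.013495, 117.454093), (117.082273, 117.642486),
                   (117.416734, 117.723788)],
    "254_gap": [(159.230154, 153.683678), (157.404963, 152.992977),
                (158.465634, 150.772173)],
}


def summarize(pairs):
    """Return (delta_pct, spread_pct) for one benchmark.

    delta_pct:  change of the patched mean relative to the pristine mean
                (negative means the patched build is faster).
    spread_pct: largest max-min spread seen within either build, relative
                to the pristine mean (an assumed, crude noise estimate).
    """
    pristine = [p for p, _ in pairs]
    patched = [q for _, q in pairs]
    base = mean(pristine)
    delta_pct = 100.0 * (mean(patched) - base) / base
    spread = max(max(pristine) - min(pristine), max(patched) - min(patched))
    spread_pct = 100.0 * spread / base
    return delta_pct, spread_pct


if __name__ == "__main__":
    for name, pairs in runs.items():
        delta, spread = summarize(pairs)
        verdict = "exceeds" if abs(delta) > spread else "is within"
        print(f"{name}: delta {delta:+.2f}%, spread {spread:.2f}% "
              f"(delta {verdict} the spread)")
```

With only three runs per build this is a sanity check rather than a
significance test; more runs would be needed before reading much into
any single benchmark.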

So while your data is still very useful and interesting,
it doesn't give nearly enough information to determine
whether this is the right thing to do in general,
especially when in theory this isn't "The Right Thing To Do".

This is not to say I object to your patch - rather, I'm more flabbergasted
than anything that disabling coalescing at the GIMPLE level has such a low
impact - I have to wonder if there are other passes that coalesce names later
(or if RTL level coalescing/register assignment makes the lack of tree level
coalescing less of a problem).
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"