This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [tree-ssa] A pass to remove a large number of casts in C++ code



On Apr 5, 2004, at 00:30, Diego Novillo wrote:
Andrew,

Good stuff. Thanks. A couple of things that I'd like to address before
we commit to this:


     1. The patch is poorly formatted and contains several typos and
        grammar problems.  Please format everything so that it fits on
        80 columns.  Functions need to have documentation about each of
        their arguments and you'll need to add some spacing to make
        things more clear.

When I submitted it, I knew I had not taken my time writing the
comments; I am cleaning that up right now. At the point when you asked me
to submit it, I had one big function with almost no comments, so it was
much worse then. I had forgotten about documenting the arguments; I need
more practice at commenting code :).



     2. Why did you implement it as a separate pass?  The
        transformations use almost no data flow information.  Wouldn't
        it be better to implement these routines as subroutines of
        fold_stmt()?  I want to understand what led you to choose this
        route.  It doesn't seem to take long, but it does require a full
        IL scan, and since all the transformations are related to
        "folding", perhaps they belong there?

The reason it cannot be in fold_stmt is that later on I would like to be
able to create some PHI nodes where the variable has a different type than
the original variable had, so that we can optimize PR 14820 and PR 14843.
Both have to do with selecting the right type in the first place, either by
the coder (14843) or by the gimplifier (14820).


It is run multiple times, right after PHI-OPT each time. We still catch
cast eliminations, mainly in C++ code, even in the third pass and sometimes
even in the fourth and final pass.


Also, after tree-ssa is merged into the mainline, I will submit a
patch for PHI-OPT which handles cases like the following:

int a(int i)
{
  int t;
  if (i > 10)
    t = 1;
  else
    t = 0;
  return t;
}
To turn it into:

int a(int i)
{
  int t;
  _Bool t1;
  t1 = i > 10;
  t = (int) t1;
  return t;
}
But it is not handled right now, as PHI-OPT does not handle arguments of
different types. So we will want to clean up after PHI-OPT and other passes
too. Cleaning up after PHI-OPT also helps because jump threading will then
not be confused. Note that with the cast pass the previous case gets turned
into the following (this is PR 14466):

int a(int i)
{
  int t;
  t = i > 10;
  return t;
}



     3. If we decide to have it as a separate pass, it should be
        documented in passes.texi (I think there are other tree-ssa
        passes missing from passes.texi that we will need to add before
        the merge).

Okay, I will document it then (along with my other passes which I will add
in the future, and the ones I already added to the lno branch). I can also
add more documentation for the PHI node optimizations.



I did some tests over the weekend and it looks pretty decent,
particularly for some C++ codes:

      * For DLV, code size was reduced by 1.8% and compile time reduced
        by 2.7%.
      * For cc1-i-files, there was a 0.2% code reduction and almost no
        reduction in compile time (less than a second).
      * For tramp3d-v3.cpp (compiled with -O2) I noticed no change in
        compile time, code size was reduced by 0.4% and run time was
        reduced by 1.5% (from 6.8s/it to 6.7s/it).

Thanks for testing the patch and seeing that it improves code generation
and compile time; I figured it would, but I did hardly any timings myself.
Also, I noticed that for DLV the *.vars dump file was half the size with my
patch. I did not look at the first RTL dump file, but I would assume it was
also much smaller, and I noticed there were fewer basic blocks in the dump
file.



SPEC2000 results are within the usual values: gzip, perlbmk and twolf
are the best performers, but we lose some in crafty and eon.  Overall,
the scores are very similar, though.

                          Estimated                     Estimated
                Base      Base      Base      Peak      Peak      Peak
Benchmarks    Ref Time  Run Time   Ratio   Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
164.gzip          1400       220      637*     1400       218      643*
175.vpr           1400       336      417*     1400       332      422*
176.gcc                                  X                           X
181.mcf           1800       412      437*     1800       429      435*
186.crafty        1000       153      652*     1000       155      645*
197.parser        1800       314      573*     1800       319      564*
252.eon           1300       228      569*     1300       242      537*
253.perlbmk       1800       242      743*     1800       231      778*
254.gap           1100       152      725*     1100       149      737*
255.vortex        1900       223      852*     1900       224      850*
256.bzip2         1500       284      528*     1500       284      528*
300.twolf         3000       561      534*     3000       555      540*
Est. SPECint_base2000                593
Est. SPECint2000                                                   593



The eon result is a 6% regression, which means it needs to be looked into,
but when I looked into the dump files they were about the same; I need to
look again. This might also just be normal run-to-run variation that I did
not know about before.


I will add a flag to make my life easier and to make it possible to disable
the pass if it causes a regression I do not know about at this point.



I also tested the patch on ia64, alpha, ia32e and x86-64.  No problems
on any arch, modulo the fortran regressions which I think we should
address by removing support for MINUS_EXPR in is_gimple_min_invariant.

This regression has now been fixed, and I can now also run the fortran
testsuite more reliably on powerpc-apple-darwin.


Thanks,
Andrew Pinski

