This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Hi,

Several optimization passes seem to badly need liveness information. Of what is currently in the mainline tree, at least if-conversion and jump threading immediately become stronger when liveness is available; similarly GCSE, which could then hoist instructions that clobber hard registers, and so on. Currently these run only after the flow1 pass, which is a bit too late, since CSE can no longer clean up after the transformations. Early dead code removal is also an important step, as observed on the Stepanov benchmark.

On the cfg-branch I have had, for a while, a third liveness pass just before GCSE. Because of the recent discussions about compiler performance I see this decision as somewhat controversial, so I would like to discuss it.

I've implemented a simple patch that adds the third liveness pass to mainline (attached) and asked Andreas to do the benchmarking. The results are interesting. As expected, the bootstrap is about 1% slower, and there is just a small increase in performance (and decrease in size) for the C benchmarks. But interestingly the only C++ benchmark, eon, shows different figures. The savings are about 2.2% in code size and 1.6% in performance (*). I am not sure how representative this is for C++, but it seems to suggest that abstraction penalties can be significantly lowered, since the Stepanov results are similar as well. The overall savings for SPEC are 0.2% in size and, I guess, a similar amount in performance, but SPEC benchmarks generally have very low abstraction penalties, as many loops are hand-optimized.

Would this be considered a strong enough justification to have the pass? As mentioned, I believe it will pay back more with extra effort, once GCSE is made stronger (for i386 this has a neutral performance effect, but I guess that is a register allocation problem) and once other passes use it. I believe that, for instance, CSE can easily be hacked to use the notes to reduce register pressure instead of the current local heuristics.
I also have a double-test conversion pass that should be somewhat stronger when run before CSE rather than before combine, where I run it currently.

Similarly, I think the cost of liveness can be made much lower: currently we don't compute bitmaps of local properties and instead re-scan the instructions every time, which can be expensive, especially now that dead store removal has been added. Perhaps dead code removal is better done using DU/UD chains, with the current ssa-dce code converted to these, but I am not sure how popular a step that would be.

I am attaching the patch and results for reference.

Honza

(*) For some reason the mainline eon run failed for Andreas, but the machine is the same as the one used by the periodic tester and the other results are consistent, so I've just filled in the gap from the official results.
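For readers following the point about bitmaps of local properties, here is a minimal sketch of backward liveness analysis that scans each basic block once to build its local use/def sets and then iterates only on those sets. It is an illustration under stated assumptions, not the code in GCC or in the attached patch: the `Block` class, the `local_sets` and `liveness` functions, the tuple instruction format, and the register names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical instruction format: (dest or None, [source registers]).
    insns: list
    # Names of successor blocks in the CFG.
    succs: list = field(default_factory=list)


def local_sets(block):
    """Scan the block once and return (use, defs).

    use  = registers read before any write to them in this block
    defs = registers written in this block
    """
    use, defs = set(), set()
    for dest, srcs in block.insns:
        use.update(s for s in srcs if s not in defs)
        if dest is not None:
            defs.add(dest)
    return use, defs


def liveness(cfg):
    """Iterate live_in = use | (live_out - defs) to a fixed point."""
    # Local properties are computed once, not re-scanned per iteration.
    local = {name: local_sets(b) for name, b in cfg.items()}
    live_in = {name: set() for name in cfg}
    live_out = {name: set() for name in cfg}
    changed = True
    while changed:
        changed = False
        # Visiting blocks in reverse order speeds up a backward problem.
        for name in reversed(list(cfg)):
            out = set()
            for s in cfg[name].succs:
                out |= live_in[s]
            use, defs = local[name]
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# Tiny example CFG: entry -> loop -> (loop | exit).
cfg = {
    "entry": Block([("a", []), ("b", []), ("d", [])], ["loop"]),
    "loop": Block([("c", ["a", "b"]), ("a", ["c"])], ["loop", "exit"]),
    "exit": Block([(None, ["a"])], []),
}
live_in, live_out = liveness(cfg)
# "d" is set in entry but is not live on exit from it, so the
# instruction setting it is a candidate for early dead code removal.
dead_in_entry = local_sets(cfg["entry"])[1] - live_out["entry"]
```

In this sketch the fixed-point loop touches only the precomputed sets, so the per-iteration cost depends on the number of blocks and registers rather than on the number of instructions. Real bitmaps would replace the Python sets.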
Attachment:
le
Description: Text document
Attachment:
live
Description: Text document