This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Enable EBX for x86 in 32bits PIC code
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: Ilya Enkovich <enkovich dot gnu at gmail dot com>, gcc at gnu dot org, gcc-patches at gcc dot gnu dot org
- Cc: Evgeny Stupachenko <evstupac at gmail dot com>, Richard Biener <richard dot guenther at gmail dot com>, Uros Bizjak <ubizjak at gmail dot com>, law at redhat dot com
- Date: Mon, 25 Aug 2014 11:08:52 -0400
- Subject: Re: Enable EBX for x86 in 32bits PIC code
- References: <CAOvf_xxsQ_oYGqNAVQ1+BW+CuD3mzebZ2xma0jpF=WfyZMCRCA at mail dot gmail dot com> <CAFiYyc1mFtTezkTJORmJJq+yht=qPSwiN7KDn19+bSuSdaqvMQ at mail dot gmail dot com> <CAOvf_xyeVeg2oB9Xxz8RMEQ6gyfJY5whd9s4ygoAAEaMU9efnA at mail dot gmail dot com> <20140707114750 dot GB31640 at tucnak dot redhat dot com> <CAMbmDYZV_fx0jxmKHhLsC2pJ7pDzuu6toEAH72izOdpq6KGyfg at mail dot gmail dot com> <20140822121151 dot GA60032 at msticlxl57 dot ims dot intel dot com>
On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
Hi,
At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode. It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address, and let the allocator do the rest. This should be similar to how clang and icc handle the GOT base address. I've been working on such a patch for some time and now want to share my results.
The idea of the patch was very simple and included a few things:
1. Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do not have any hard reg fixed for PIC.
2. Initialize pic_offset_table_rtx with a new pseudo register at the beginning of function expansion.
3. Change the ABI so that there is a possible implicit PIC argument for calls; pic_offset_table_rtx is used as the arg value if such an implicit arg exists.
This approach worked well on small tests, but when we tried to run some benchmarks we hit a problem with the reload of address constants. The problem is that when we try to rematerialize an address constant or some constant memory reference, we have to use pic_offset_table_rtx. This means we insert new uses of a pseudo register, and the allocator cannot handle that correctly. The same problem also applies to float and vector constants.
Rematerialization is not the only case that creates new uses of pic_offset_table_rtx. Another case is the split of instructions that use a constant but do not have proper constraints. E.g. the pushtf pattern allows a push of a constant, but it has to be replaced with a push from memory in the reload pass, which causes an additional use of pic_offset_table_rtx.
There are two ways to fix it. The first one is to support modifications of pseudo register live range during reload and correctly allocate hard regs for its new usages (currently we have some hard reg allocated for new usage of pseudo reg but it may contain value of some other pseudo reg; thus we reveal the problem at runtime only).
I believe there is already code to deal with this situation. It is the
code for risky transformations (please check the flag
lra_risky_transformations_p). If this flag is set, the next LRA assign
subpass runs and checks the correctness of the assignments (e.g. it
checks for the situation where two different pseudos have intersecting
live ranges and the same assigned hard reg; if such a dangerous
situation is found, it is fixed).
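A minimal sketch of the kind of check described above, written as a toy model in Python rather than as LRA code. Everything here is an assumption made for illustration: the function names, the representation of live ranges as inclusive (start, end) program points, and the "move to a free hard reg or spill" repair policy are hypothetical and do not reproduce how LRA actually implements its assign subpass.

```python
from itertools import combinations


def overlaps(ranges_a, ranges_b):
    """True if any inclusive (start, end) range in one list intersects the other."""
    return any(a[0] <= b[1] and b[0] <= a[1]
               for a in ranges_a for b in ranges_b)


def find_conflicts(live_ranges, assignment):
    """Return pairs of pseudos that share a hard reg while both are live.

    live_ranges: pseudo number -> list of inclusive (start, end) ranges
    assignment:  pseudo number -> hard reg name, or None if spilled
    """
    conflicts = []
    for p, q in combinations(sorted(assignment), 2):
        reg = assignment[p]
        if reg is None or reg != assignment[q]:
            continue
        if overlaps(live_ranges[p], live_ranges[q]):
            conflicts.append((p, q))
    return conflicts


def fix_conflicts(live_ranges, assignment, hard_regs):
    """Toy repair policy (hypothetical): move the second pseudo of each
    conflicting pair to a hard reg that is free over its live range,
    or spill it (None) if no such reg exists."""
    fixed = dict(assignment)
    while True:
        conflicts = find_conflicts(live_ranges, fixed)
        if not conflicts:
            return fixed
        _, q = conflicts[0]
        fixed[q] = None  # spilled unless a free hard reg is found below
        for reg in hard_regs:
            clash = any(other != q and fixed[other] == reg
                        and overlaps(live_ranges[other], live_ranges[q])
                        for other in fixed)
            if not clash:
                fixed[q] = reg
                break


# Pseudo 100 stands in for the PIC pseudo; a new use inserted during
# reload has stretched its live range so that it now covers pseudo 101.
live = {100: [(0, 10)], 101: [(5, 7)], 102: [(0, 3)]}
assign = {100: "ebx", 101: "ebx", 102: "eax"}

bad_pairs = find_conflicts(live, assign)
repaired = fix_conflicts(live, assign, ["eax", "ebx", "ecx"])
```

In this toy example the only conflicting pair is (100, 101), and the repair moves pseudo 101 to eax, which is free over points 5 to 7 because pseudo 102 dies at point 3.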
The second way is to avoid all cases where new uses of pic_offset_table_rtx appear in reload. That is the way I chose, because it appeared simpler to me and would allow me to get some performance data faster. Also, rematerializing address and float constants in PIC mode would mean higher register pressure, so keeping them on the stack should be even more efficient. To achieve this I had to cut off reg equivs to all exprs using symbol references and to all constants living in memory. I also had to avoid instructions that require a split in reload causing a load of a constant from memory (*push[txd]f).
The resulting compiler successfully passes make check and compiles the EEMBC and SPEC2000 benchmarks. I am not confident that I covered all cases, and there may still be some templates that cause a split in reload with new pic_offset_table_rtx uses. I think supporting reload with a pseudo PIC register would be a better and more general solution, but I don't know how difficult it is to implement. Any ideas on resolving this reload issue?
Please see what I mentioned above. Maybe it can fix the degradation.
Rematerialization is important for performance, and switching it off
completely is not wise.
I collected some performance numbers for the EEMBC and SPEC2000 benchmarks. Here are the patch results for the -Ofast optimization level with LTO, collected on an Avoton server:
AUTOmark +1,9%
TELECOMmark +4,0%
DENmark +10,0%
SPEC2000 -0,5%
There are few degradations on the EEMBC benchmarks, but on SPEC2000 the situation is different and we see more performance losses. Some of them are caused by the disabled rematerialization of address constants. In some cases relaxed ebx causes more spills/fills in places where the GOT is frequently used. There are also some minor fixes required in the patch to allow a more efficient function prolog (avoid unnecessary GOT register initialization and allow its initialization without ebx usage). I suppose some of the performance problems may be resolved, but a good fix for reload should go first.
Ilya, the optimization you are trying to implement is important in many
cases and should be included in gcc in some way. If the degradations
can be solved in the way I mentioned above, we could introduce a
machine-dependent flag.