This is the mail archive of the
mailing list for the GCC project.
Re: Enable EBX for x86 in 32bits PIC code
- From: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>
- Cc: gcc at gnu dot org, gcc-patches <gcc-patches at gcc dot gnu dot org>, Evgeny Stupachenko <evstupac at gmail dot com>, Richard Biener <richard dot guenther at gmail dot com>, Uros Bizjak <ubizjak at gmail dot com>, Jeff Law <law at redhat dot com>
- Date: Tue, 26 Aug 2014 11:49:44 +0400
- Subject: Re: Enable EBX for x86 in 32bits PIC code
- Authentication-results: sourceware.org; auth=none
- References: <CAOvf_xxsQ_oYGqNAVQ1+BW+CuD3mzebZ2xma0jpF=WfyZMCRCA at mail dot gmail dot com> <CAFiYyc1mFtTezkTJORmJJq+yht=qPSwiN7KDn19+bSuSdaqvMQ at mail dot gmail dot com> <CAOvf_xyeVeg2oB9Xxz8RMEQ6gyfJY5whd9s4ygoAAEaMU9efnA at mail dot gmail dot com> <20140707114750 dot GB31640 at tucnak dot redhat dot com> <CAMbmDYZV_fx0jxmKHhLsC2pJ7pDzuu6toEAH72izOdpq6KGyfg at mail dot gmail dot com> <20140822121151 dot GA60032 at msticlxl57 dot ims dot intel dot com> <53FB5184 dot 3030500 at redhat dot com>
2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmakarov at redhat dot com>:
> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>> 32bit PIC mode. It was decided that the best approach would be to not fix
>> ebx register, use a pseudo register for GOT base address and let allocator do
>> the rest. This should be similar to how clang and icc work with GOT base
>> address. I've been working for some time on such patch and now want to
>> share my results.
>> The idea of the patch was very simple and included a few things:
>> 1. Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>> not have any hard reg fixed for PIC.
>> 2. Initialize pic_offset_table_rtx with a new pseudo register in the
>> beginning of a function expand.
>> 3. Change ABI so that there is a possible implicit PIC argument for
>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>> Such an approach worked well on small tests, but when trying to run some benchmarks
>> we faced a problem with reload of address constants. The problem is that
>> when we try to rematerialize address constant or some constant memory
>> reference, we have to use pic_offset_table_rtx. It means we insert new
>> usages of a pseudo register and the allocator cannot handle it correctly. The same
>> problem also applies to float and vector constants.
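To make the problem concrete, here is a small C fragment of the kind that needs the GOT base when built as 32-bit PIC (for example with -m32 -fPIC). It is only an illustration: the names counter, bump and scale are made up and do not come from the patch. On i386 the address of a global is typically formed relative to the GOT base, and a floating-point constant such as 1.5 is typically loaded from a constant pool in memory that is addressed the same way. If reload rematerializes such an address or constant, it creates a new use of pic_offset_table_rtx.

```c
/* Hypothetical example; the names are illustrative only. */

int counter = 0;            /* global data: in 32-bit PIC code its address
                               is typically formed via the GOT base */

int bump(void)
{
    /* Needs the address of 'counter', hence the PIC register. */
    return ++counter;
}

double scale(double x)
{
    /* 1.5 usually lives in a constant pool in memory; loading it in
       PIC code is another use of the GOT base register. */
    return x * 1.5;
}
```

Comparing the assembly from gcc -m32 -fPIC -S with and without -fPIC should show where the GOT base register is used.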
>> Rematerialization is not the only case causing new pic_offset_table_rtx
>> usage. Another case is a split of some instructions using constant but not
>> having proper constraints. E.g. pushtf pattern allows push of constant but
>> it has to be replaced with push of memory in reload pass causing additional
>> usage of pic_offset_table_rtx.
>> There are two ways to fix it. The first one is to support modifications
>> of pseudo register live range during reload and correctly allocate hard regs
>> for its new usages (currently we have some hard reg allocated for new usage
>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>> reveal the problem at runtime only).
> I believe there is already code to deal with this situation. It is code for
> risky transformations (please check flag lra_risky_transformation_p). If
> this flag is set, next lra assign subpass is running and checking
> correctness of assignments (e.g. checking situation when two different
> pseudos have intersected live ranges and the same assigned hard reg. If
> such dangerous situation is found, it is fixed).
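The kind of check described above can be sketched with a toy model. This is not LRA code and uses none of its data structures: struct toy_pseudo, toy_ranges_intersect and toy_fix_conflicts are hypothetical names, live ranges are simplified to a single closed interval, and "fixing" a conflict by spilling the later pseudo is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; NOT the LRA data structures. */
struct toy_pseudo {
    int start;     /* first program point where the pseudo is live */
    int end;       /* last program point where it is live (inclusive) */
    int hard_reg;  /* assigned hard register number, or -1 if spilled */
};

/* Two closed intervals intersect if each one starts before the other ends. */
static bool toy_ranges_intersect(const struct toy_pseudo *a,
                                 const struct toy_pseudo *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Find pairs of pseudos that share a hard register while their live
   ranges intersect.  The later pseudo of each such pair is spilled
   (hard_reg set to -1).  Returns the number of pseudos spilled. */
size_t toy_fix_conflicts(struct toy_pseudo *p, size_t n)
{
    size_t fixed = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].hard_reg < 0)
            continue;               /* already spilled, cannot conflict */
        for (size_t j = i + 1; j < n; j++) {
            if (p[j].hard_reg == p[i].hard_reg
                && toy_ranges_intersect(&p[i], &p[j])) {
                p[j].hard_reg = -1; /* assumed fix: spill the later pseudo */
                fixed++;
            }
        }
    }
    return fixed;
}
```

For three pseudos all assigned hard register 3 with ranges [0,10], [5,15] and [11,20], the sketch spills only the second one: the first and third do not overlap, and after the spill the third no longer conflicts with anything.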
I tried to remove my restrictions from setup_reg_equiv and initialize
lra_risky_transformation_p with 'true' in lra_constraints instead. I
got only 50% pass rate for SPEC2000 on Ofast with LTO. Will search
for fail reason.
>> The second way is to avoid all cases when new usages of
>> pic_offset_table_rtx appear in reload. That is a way I chose because it
>> appeared simpler to me and would allow me to get some performance data
>> faster. Also having rematerialization of address and float constants in PIC
>> mode would mean we have higher register pressure, thus having them on stack
>> should be even more efficient. To achieve it I had to cut off reg equivs to
>> all exprs using symbol references and all constants living in the memory. I
>> also had to avoid instructions requiring split in reload causing load of
>> constant from memory (*push[txd]f).
>> Resulting compiler successfully passes make check, compiles EEMBC and
>> SPEC2000 benchmarks. There is no confidence I covered all cases and there
>> still may be some templates causing split in reload with new
>> pic_offset_table_rtx usages. I think support of reload with pseudo PIC
>> would be better and more general solution. But I don't know how difficult
>> is to implement it though. Any ideas on resolving this reload issue?
> Please see what I mentioned above. Maybe it can fix the degradation.
> Rematerialization is important for performance and switching it off
> completely is not wise.
>> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.
>> Here are patch results for -Ofast optlevel with LTO collected on Avoton
>> AUTOmark +1,9%
>> TELECOMmark +4,0%
>> DENmark +10,0%
>> SPEC2000 -0,5%
>> There are few degradations on EEMBC benchmarks but on SPEC2000 situation
>> is different and we see more performance losses. Some of them are caused by
>> disabled rematerialization of address constants. In some cases relaxed ebx
>> causes more spills/fills in places where GOT is frequently used. There are
>> also some minor fixes required in the patch to allow more efficient function
>> prolog (avoid unnecessary GOT register initialization and allow its
>> initialization without ebx usage). Suppose some performance problems may be
>> resolved but a good fix for reload should go first.
> Ilya, the optimization you are trying to implement is important in many
> cases and should be in some way included in gcc. If the degradations can be
> solved in a way I mentioned above we could introduce a machine-dependent