This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Enable EBX for x86 in 32bits PIC code


On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
Hi,

At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address, and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.

The idea of the patch was very simple and included a few things:
  1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do not have any hard reg fixed for PIC.
  2.  Initialize pic_offset_table_rtx with a new pseudo register at the beginning of function expand.
  3.  Change the ABI so that there is a possible implicit PIC argument for calls; pic_offset_table_rtx is used as the arg value if such an implicit arg exists.

This approach worked well on small tests, but when we tried to run some benchmarks we faced a problem with reload of address constants.  The problem is that when we try to rematerialize an address constant or some constant memory reference, we have to use pic_offset_table_rtx.  This means we insert new usages of a pseudo register, and the allocator cannot handle that correctly.  The same problem also applies to float and vector constants.
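
To make the problem concrete, here is a toy model in Python.  It is not GCC code: the pseudo names, the instruction indices, and the helpers live_range and overlaps are all made up for illustration.  It only shows how a new use inserted after allocation can stretch the live range of the GOT base pseudo so that it overlaps another pseudo it did not overlap before.

```python
# Toy model (not GCC code): a pseudo's live range is the span from its
# first definition to its last use, measured in instruction indices.

def live_range(points):
    """Return (start, end) covering all def/use points of a pseudo."""
    return (min(points), max(points))

def overlaps(a, b):
    """True if two closed intervals share at least one instruction."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical function body: 'pic' is the GOT base pseudo, 'tmp' is
# some unrelated pseudo that becomes live only after 'pic' is dead.
pic_points = [0, 3]   # defined at insn 0, last used at insn 3
tmp_points = [5, 9]   # defined at insn 5, last used at insn 9

before = overlaps(live_range(pic_points), live_range(tmp_points))

# Assumption: reload rematerializes an address constant at insn 7,
# which needs the GOT base and therefore adds a new use of 'pic'.
pic_points_after = pic_points + [7]
after = overlaps(live_range(pic_points_after), live_range(tmp_points))

print(before, after)
```

If the allocator gave 'pic' and 'tmp' the same hard register while their ranges were disjoint, the new use at insn 7 reads a register that now holds the value of 'tmp'.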

Rematerialization is not the only case causing new pic_offset_table_rtx usages.  Another case is the split of some instructions that use a constant but do not have proper constraints.  E.g. the pushtf pattern allows a push of a constant, but it has to be replaced with a push of memory in the reload pass, causing an additional usage of pic_offset_table_rtx.

There are two ways to fix it.  The first one is to support modifications of the pseudo register's live range during reload and correctly allocate hard regs for its new usages (currently we have some hard reg allocated for a new usage of the pseudo reg, but it may contain the value of some other pseudo reg; thus the problem shows up only at runtime).


I believe there is already code to deal with this situation.  It is the code for risky transformations (please check the flag lra_risky_transformation_p).  If this flag is set, the next LRA assignment subpass runs and checks the correctness of assignments (e.g. it checks for the situation where two different pseudos have intersecting live ranges and the same assigned hard reg).  If such a dangerous situation is found, it is fixed.
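
The kind of check described above can be sketched as follows.  This is a minimal Python illustration, not LRA's implementation: find_conflicts, the pseudo names, and the register names are hypothetical, and the real code also repairs the conflict rather than just reporting it.

```python
from itertools import combinations

def find_conflicts(live_ranges, assignment):
    """Report pairs of pseudos whose live ranges intersect and which
    were assigned the same hard register.

    live_ranges: dict mapping pseudo name -> (start, end), inclusive
    assignment:  dict mapping pseudo name -> hard register name
    """
    conflicts = []
    for a, b in combinations(sorted(live_ranges), 2):
        (a_start, a_end), (b_start, b_end) = live_ranges[a], live_ranges[b]
        intersect = a_start <= b_end and b_start <= a_end
        if intersect and assignment[a] == assignment[b]:
            conflicts.append((a, b, assignment[a]))
    return conflicts

# Hypothetical state after a risky transformation: p100 (say, the GOT
# base pseudo) had its range extended from (0, 3) to (0, 7).
live_ranges = {"p100": (0, 7), "p101": (5, 9), "p102": (10, 12)}
assignment = {"p100": "bx", "p101": "bx", "p102": "bx"}

conflicts = find_conflicts(live_ranges, assignment)
print(conflicts)
```

Here only p100 and p101 are reported; p102 shares the register but its range does not intersect either of the others, so that assignment is still valid.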

The second way is to avoid all cases where new usages of pic_offset_table_rtx appear in reload.  That is the way I chose because it appeared simpler to me and would allow me to get some performance data faster.  Also, rematerializing address and float constants in PIC mode would mean higher register pressure, so keeping them on the stack should be even more efficient.  To achieve this I had to cut off reg equivs to all exprs using symbol references and to all constants living in memory.  I also had to avoid instructions that require a split in reload causing a load of a constant from memory (*push[txd]f).

The resulting compiler successfully passes make check and compiles the EEMBC and SPEC2000 benchmarks.  I am not confident that I covered all cases, and there may still be some templates causing a split in reload with new pic_offset_table_rtx usages.  I think supporting reload with a pseudo PIC register would be a better and more general solution, but I don't know how difficult it is to implement.  Any ideas on resolving this reload issue?


Please see what I mentioned above.  Maybe it can fix the degradation.  Rematerialization is important for performance, and switching it off completely is not wise.


I collected some performance numbers for the EEMBC and SPEC2000 benchmarks.  Here are the patch results for the -Ofast optlevel with LTO, collected on an Avoton server:
AUTOmark +1,9%
TELECOMmark +4,0%
DENmark +10,0%
SPEC2000 -0,5%

There are few degradations on the EEMBC benchmarks, but on SPEC2000 the situation is different and we see more performance losses.  Some of them are caused by the disabled rematerialization of address constants.  In some cases relaxed ebx causes more spills/fills in places where the GOT is frequently used.  There are also some minor fixes required in the patch to allow a more efficient function prolog (avoid unnecessary GOT register initialization and allow its initialization without ebx usage).  I suppose some of the performance problems can be resolved, but a good fix for reload should come first.



Ilya, the optimization you are trying to implement is important in many cases and should in some way be included in gcc.  If the degradations can be solved in the way I mentioned above, we could introduce a machine-dependent flag.

