This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Can we speed up the gcc_target structure?


Back in the old days, gcc had a lot of code which was conditionally
compiled with #ifdef.  That was ugly, but the resulting code was fast.
Over time, a lot of the parameters checked with #ifdef were converted
into macros which were checked at runtime using if.  That was less
ugly, and, since the macros normally had constant values, when gcc was
compiled with an optimizing compiler, the code was just as fast in the
normal case.  When it was slower, it was generally because the
compiler was doing something it couldn't do before.

More recently, some of those parameters have moved into the gcc_target
structure.  They are still checked at run time, but now the if
condition never has a constant value.  It always requires fetching a
value from memory in the target vector, and often requires calling a
function.  This results in cleaner, more comprehensible code.

However, it also slows the compiler down.

Just for fun, I converted every instance of
    targetm.calls.xxxx
to be
    TARGETM_CALLS_XXXX
instead.  Then I added stuff like this to the end of target.h:

#ifndef TARGETM_CALLS_PROMOTE_FUNCTION_ARGS
#define TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) \
  targetm.calls.promote_function_args ((FNTYPE))
#endif

Then I added stuff like this to i386.h:

#define TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) false

Then I rebuilt the compiler and tried it on some reasonably small C++
examples (with a native i386 GNU/Linux compiler).  I saw compilation
speedups of up to 3% when compiling without optimization.  The
resulting assembler output was, as expected, identical.

These tests were far from rigorous.  However, compilation speed is a
concern these days, and this suggests that the target vector is a
measurable speed problem.

Somebody must have noticed this before, but I couldn't find anything
in the gcc mailing list.

It seems to me that we should try to find a way to regain the speed
which was lost when we switched to the target vector, without losing
the comprehensibility which was gained.

Here is a sketch of a possible approach which would require fairly
minimal changes in the way the target vector works today:

1) Turn hooks.c and targhooks.c into .h files which define inline
   functions (with appropriate fallbacks to support older non-gcc
   compilers for bootstrapping, of course).

2) Move all definitions of target initializer macros from tm.c files
   into new CPU-target.h files.

3) Include CPU-target.h at the end of target-def.h, where it will
   redefine and undefine target initializer macros.  For cases in
   which targetm is changed at run time, CPU-target.h must #undef the
   corresponding initializer macro (and CPU.c must #define it before
   initializing targetm) (alternatively, force targetm to be const,
   and adjust the relatively few cases in which it is changed at run
   time).

4) Change all uses of targetm.xxxx into code which uses TARGETM_XXXX
   macros, as above.

5) Define the TARGETM macros as either using the target vector or
   using the initializer macro from target-def.h.  The choice would be
   made based on whether the initializer macro was defined, and
   probably on some additional control as well.

6) Now code which uses the new inline versions of hooks.c and
   targhooks.c, and which includes target-def.h and CPU-target.h, will
   automatically use the inlined versions of the functions when
   possible, and will see constant variable definitions when possible.

The main problem that I see with this approach is the requirement to
#undef an initializer macro which is changed at run time.  That's why
I suggest the alternative of making targetm const.

We can convert to this approach over time if we require a particular
macro to be defined in order to define the TARGETM macros as using the
initializer macros rather than the target vector.  Then a backend
which has been converted to use CPU-target.h would define that macro.

If we eventually want to configure gcc to support multiple target
vectors, that would still be possible.  When more than one target
vector was to be supported, the code would force the TARGETM macros to
always use the target vector.  This would be determined at configure
time.

I considered more complex approaches, such as creating a target.def
file which defined the target vector, but the basic problem boils down
to detecting when the target does not use the default version of a
target vector field.  Inventing CPU-target.h seems as effective an
approach as any to solving this particular problem.

Any thoughts?  Does anybody think this would be a waste of time?  Does
anybody have a better approach to solving the general problem?

Ian

