This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: RFA: Fix other/44566

From: Joern Rennecke <amylaar at spamcop dot net>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Sat, 26 Jun 2010 16:37:01 -0400
Subject: Re: RFA: Fix other/44566
References: <20100626061802.45a3nzzz4088o840-nzlynne@webmail.spamcop.net> <Pine.LNX.4.64.1006261615410.30463@digraph.polyomino.org.uk>

Quoting "Joseph S. Myers" <joseph@codesourcery.com>:

I should point out that the model I was thinking of in that analysis (of
the features supported by the multi-target toolchain) was somewhat
different from yours; the two overlap in what changes are needed, but each
model involves significant changes not needed by the other model.  Thus,
for several issues I identify, in your model they may well be less of an
issue because they relate to things you explicitly do not expect to work.
Nevertheless, I think they are relevant to understanding what the
limitations of any proposed approach are.


I had a limited set of requirements, but I think my approach can be readily
extended to cover your requirements as well.

Instead of having just one main target and a set of secondary targets, we'd
have a set of targets, some or all of which would be eligible as main
targets, and some or none of which would be eligible as secondary targets -
and not every secondary target would be necessarily compatible with
all the main targets.

All the !EXTRA_TARGET code (outside of the Makefile) would probably have
to be hookized or dispatched to or something similar.

The model I had was that all the targets supported by the compiler are
essentially equal.  It would be possible to run the compiler - or other
components such as the assembler - with a --target= option to select a
particular target, and the results would be identical to those of a
single-target compiler for that target.


Yes, that's also a useful goal.  Of course, during compilation of a specific
module, you would have one target which is in charge - in your model, it
would be the only one active.

One thing you haven't mentioned that I'd need to do to accommodate this
would be to remove the special significance of targetm_array[0] - instead
there would have to be a variable that is an index in targetm_array to
indicate the currently active main target.
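
To make the idea concrete, here is a minimal sketch of selecting the active main target through an index instead of relying on slot 0.  Only targetm_array is a name from the actual patch; the reduced gcc_target struct, main_target_index, select_main_target and main_target are names assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for GCC's target vector.
struct gcc_target {
  std::string name;
  bool main_eligible;  // may this target be in charge of a compilation?
};

// All configured targets; no slot has special significance any more.
static std::vector<gcc_target> targetm_array = {
  {"powerpc64-hurd", true},
  {"powerpc-linux", true},
  {"powerpc-eabispe", false},
};

// Index of the currently active main target (assumed name).
static std::size_t main_target_index = 0;

// Select the main target by name, e.g. from a --target= option.
// Returns false if the name is unknown or not eligible as a main target;
// in that case the previous selection is left untouched.
bool select_main_target(const std::string &name) {
  for (std::size_t i = 0; i < targetm_array.size(); ++i) {
    if (targetm_array[i].name == name && targetm_array[i].main_eligible) {
      main_target_index = i;
      return true;
    }
  }
  return false;
}

// Accessor that replaces direct uses of targetm_array[0].
const gcc_target &main_target() {
  assert(main_target_index < targetm_array.size());
  return targetm_array[main_target_index];
}
```

Code that used to read targetm_array[0] would go through main_target() instead, so a target that is not eligible as a main target can still sit anywhere in the array.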

Some method would be added for
configure options to be specified to apply to a single target only.

Sounds sensible.

(Your
patch appears to discard all configure options when running sub-configures
for secondary targets, which doesn't seem right either.)


Well, I just had no need for options passed to sub-configures.  So although
the generated Makefile has the gcc_config_arguments variable, I ended up
not making any use of it.
It's just a matter of defining a syntax and implementing it.

 Thus each target
could have its own search paths for libraries and headers, for example,
and its own set of multilibs.  Runtime libraries would be configured (for
each multilib, for each target) with the appropriate --host for that
target, and when the testsuite is run DejaGnu would be told it was run for
one of the targets involved, so hardcoded target triplet names in those
places would not need to change.


Sounds sensible.  I think that'll be mostly a matter of adding some more
stanzas to EXTRA_TARGET_RULES.

Can the testsuite be run at all for
secondary targets with your implementation?


No, I didn't add a command line option to switch the target.  And although
basic libraries (basically, libgcc) for all targets were planned, I didn't get
around to implementing them, and besides, program startup was supposed to be
strictly controlled from the main target.
The original work done for Milepost targeted the ARCompact / mxp
architectures; IIRC the mxp had something like 32K words of instruction
memory, so you wouldn't want to put too much there anyway.

(When I refer to target macros, macros in config.in that are defined by
configure, and might be defined differently for different targets, are
subject to exactly the same issues of needing to depend on the target and
needing to identify which files use them.)


Well, they already depend on the target, as configure gets run in the various
EXTRA_TARGETS subdirectories.  But you do have to make sure they are only
used in places that are bracketed between START_TARGET_SPECIFIC and
END_TARGET_SPECIFIC.
And I agree that getting these regions to file granularity is likely to
make maintenance easier.

We can use grep to get a list of all tm.h and config.in defines, and feed
that into another grep to find affected files - but then it'll get tedious:
deciding which macros we can ignore because we don't really require any of
the targets that use them in a multi-target configuration (and documenting
these targets), and fixing up all the remaining files.

Changes I expected to be needed
-------------------------------

* Completing the move of every last target macro into the target
structure.  The intermediate conversion step of functions in targhooks.c
that call target macros is not enough.


By having targhooks.c compiled multiple times in different namespaces,
keeping the old macros is generally fine; we can change the way
the target macros are used without having to patch in every single hook
for every single target.
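
As a rough sketch of that trick (not the code from the patch): each copy of the hooks file sees its own target's macro values and lands in its own namespace.  Here a macro expansion per target stands in for a separate compilation with that target's tm.h; DEFINE_TARGET_HOOKS, target_hooks and the namespace main_target_ns are assumed names, and the real main target would live in the global namespace rather than a named one.

```cpp
#include <cassert>

// Stand-in for a hooks file that is compiled once per target.
// BITS_PER_WORD_VALUE plays the role of an old-style target macro.
#define DEFINE_TARGET_HOOKS(NS, BITS_PER_WORD_VALUE)                 \
  namespace NS {                                                     \
  /* Hook implemented by reading the old macro's value.  */          \
  inline int default_bits_per_word() { return BITS_PER_WORD_VALUE; } \
  }

// One expansion per configured target (illustrative values).
DEFINE_TARGET_HOOKS(main_target_ns, 64)
DEFINE_TARGET_HOOKS(powerpc_linux, 32)

// Reduced hypothetical target vector holding just this one hook.
struct target_hooks {
  int (*bits_per_word)();
};

static const target_hooks hooks_main = {main_target_ns::default_bits_per_word};
static const target_hooks hooks_ppc_linux = {powerpc_linux::default_bits_per_word};

// Target-independent code only ever goes through the hook.
int word_bits(const target_hooks &t) {
  assert(t.bits_per_word != nullptr);
  return t.bits_per_word();
}
```

The point is that no per-target source has to change: the macro keeps its old definition, and only the wrapper that turns it into a hook is built once per target.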

* Likewise in some way for all the target macros specific to a given CPU
target (e.g. those local to config/rs6000).

* All target macros used in the driver gcc.c (etc.) must move into a
separate target structure (that doesn't presently exist) in the driver
program.  (This covers many spec strings, for example.)  Other programs
such as collect2 also need their own target structures.


We could also do the multiple-namespaces trick with gcc.c, although it
will probably be cleaner to factor out the target dependencies; we can just
move them to a new file à la targhooks.c, to convert target macros to
driver target hooks / driver target vector data items, without having to
patch every target; of course, targets can then start to define the driver
target vector components instead of the old target macros if we see any
benefit in that.

* libgcc must stop using target macros from tm.h;


I don't see why this is necessary, as long as you make sure each target's
libgcc sees the right tm.h.  I would think that will come automatically from
the build directory tree structure.

* Generated data and functions used in the compiler proper must move into
the target structure.  This includes the gen_* functions for individual
instructions, for which there are very many calls hardcoded throughout the
compiler: some sort of renaming is needed to allow multiple targets which
get different generated function implementations to be used.


I've put the optab table in the target vector.
Any code that uses gen_* rtx functions directly needs to be marked target
specific, or weaned from the direct gen_* use.

* There must be configure syntax to make all the configure options depend
on the particular target for which the compiler is run, instead of being
global.

Yes, for a set of equal targets, that is needed. It was a complete non-issue for ARCompact/mxp: the mxp compiler is really not very useful on its own because of the severe limitations on the code and data sizes, and both targets were maintained by me anyway. They lived in different config directories because they were just way too different (CISCy RISC vs. VLIW, small general-purpose register file with hand-tuned register classes vs. large vector-only register file with vector CC0-like flags and vector accumulator and thousands of auto-generated register classes).

Once a configure syntax is agreed, a few lines of lispy GNU make stanzas
will probably take care of this.

* Similar issues arise elsewhere.  The assembler has no beginnings of a
target structure like that in GCC.


That was also a non-issue for ARCompact/mxp because an assembler
existed which supported both targets.  Well, sort of, the support wasn't
quite what we wanted it to be, but it was a unified assembler from the
start.

I did anticipate that some target combinations might have problems with that,
which is why the documentation for the hook to switch output targets says
that it may switch output files.
So if you could write your Makefiles so that they expect multiple *.s files
which have to be assembled with different assemblers, that'll work.
Or if the Makefile rewriting pain is too much, you might have an assembler
wrapper script that takes a single multi-target assembly file, separates
it into single-target *.s files, runs the individual assemblers, and packages
the results up again into combined 'object' files, and do a similar trick with
the linker, possibly also involving objcopy to get the secondary target
partially linked files into appropriate sections of the executable.
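
The splitting step of such a wrapper could look roughly like the sketch below.  It assumes the combined file switches targets with lines of the form "#@target NAME"; that marker is invented for this example and is not a gas directive or anything the patch emits.  Running the individual assemblers and repackaging the objects are left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Split a combined assembly text into one chunk per target.
// Assumption: a line "#@target NAME" (invented marker) switches the target
// that the following lines belong to; text before any marker belongs to
// main_target.
std::map<std::string, std::string>
split_by_target(const std::string &combined, const std::string &main_target) {
  const std::string marker = "#@target ";
  std::map<std::string, std::string> per_target;
  std::string current = main_target;
  std::istringstream in(combined);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, marker.size(), marker) == 0) {
      current = line.substr(marker.size());
      assert(!current.empty());  // a marker must name a target
      continue;                  // the marker itself is not assembly
    }
    per_target[current] += line + "\n";
  }
  return per_target;
}
```

Each resulting chunk would then be written to its own *.s file and handed to the assembler for that target.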

Of course, if you can find time/manpower to clean up / rewrite gas to
handle this all nicely internally, more power to you.

Differences with your approach
------------------------------

With your approach, you define some specific areas where secondary targets
will not behave like primary targets.  Thus header search paths do not
need defining for secondary targets, since headers and predefined macros
will always be based on the primary target, and secondary targets are not
expected to get their own type sizes.  You would also appear to have the
driver's behavior based on the primary target only.


Yes, but that's only the implementation, not a constraint dictated by the
basic design, and it's a direct result of the requirements.
We wanted machine learning to decide (and succeeded at making it decide)
when to move a loop from the main target to a secondary target so that its
auto-vectorization on the secondary target would save more time than the
overhead of moving the data and control around costs.

Changing the driver to support a set of multiple alternative main targets
can follow the same pattern as libbackend.a of using a target vector and
target hooks file.

I would however expect functions built for secondary targets still to need
libgcc for some purposes, so you would need to work out how it is to be
built, where it is to be installed and how it is to be found at runtime
(for each secondary target multilib) ... with additional complications
relating to functions for different targets being combined in a single
object.


Whether libgcc is actually needed depends on what transformations you allow.
For the milepost project we avoided the issue by only shifting code to the
mxp that didn't need libgcc.  Well, a lot of libgcc functions, when considered
with their dependencies, wouldn't have fit on the hardware anyway.

But in general, I'd agree that it would be desirable that libgcc can be built
for every target.
And of course it is a requirement for a target that is supposed to be able
to become a main target, like in your model all are.

I noted the issues of target macros used within a config/$arch directory
to control compilation of code within that directory.  Do you treat every
source file in such a directory as target-specific, so they are built
multiple times if there are multiple targets for that architecture?  (For
example, Power targets that do not include e500.h have e500 support
disabled at compile time.  If configured with target powerpc-linux and
extra target powerpc-eabispe, would two copies of the config/rs6000 files
be built, and two copies of all the generated insn-* files, the copies for
powerpc-linux having e500 support disabled and those for powerpc-eabispe
having it present?)

Yes.

In general, you'd be expected to have one config directory supply only
one target, for efficiency reasons if for nothing else.

But in principle, you can combine target triplets that map to the same
config dir, and they will map to different namespaces, according to the
target triples (or pairs, or quadruplets) that you supply to configure.
The main target gets the global namespace, while extra targets get a
namespace named after the configure target name.

E.g. if you configure with --target=powerpc64-hurd
--with-extra-targets-list='powerpc-linux powerpc-eabispe',
powerpc64-hurd code will live in the global namespace,  powerpc-linux will
live in the powerpc_linux namespace, and powerpc-eabispe code will live in
the powerpc_eabispe namespace.
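
The name mapping in that example can be sketched as follows.  The rule used here (replace every character that is not valid in an identifier with an underscore) is an assumption inferred from the powerpc-linux / powerpc_linux example, and target_namespace is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Derive a namespace name from the target name as given to configure
// (not the canonicalized triplet).  Assumed rule: keep letters, digits and
// underscores, turn everything else (typically '-' and '.') into '_'.
std::string target_namespace(const std::string &configure_name) {
  assert(!configure_name.empty());
  std::string ns;
  for (unsigned char c : configure_name)
    ns += (std::isalnum(c) || c == '_') ? static_cast<char>(c) : '_';
  return ns;
}
```

Under this rule "powerpc-linux" maps to powerpc_linux and "powerpc-eabispe" to powerpc_eabispe, matching the example above; the main target would skip the mapping and use the global namespace.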

FWIW I didn't use the canonicalized names because I thought that would make
mangled identifiers unnecessarily long.  But of course you could also play
games with this by making a target appear multiple times with different
non-canonical names.  And the name is also part of the target vector, so
you could make them behave slightly differently, e.g. have different
settings in OPTIMIZATION_OPTIONS.  The name of the target for target-specific
code is this_targetm.name, whereas the dynamically chosen target is
targetm_pnt->name.  In most files targetm.name is the same as the latter,
but in tm.c it is the same as the former.

By building files multiple times you avoid dealing with *some* target
macros - but my claim is that it needs to be a well-defined set of files
that are built multiple times, and all macros used outside that set of
files either need to be defined not to matter for clear reasons, or need
to be converted to hooks.


Yes, but that hook conversion can take the form of creating a new file that
is built multiple times and uses the existing macros to provide the hooks
for each target.

I noted issues with the intermediate conversion of macros to hooks via
default definitions in targhooks.c that call a macro for unconverted
targets.  Your patch places a lot of conditionals / target-specific
markers in targhooks.c.  It might be significantly simplified if you
completed the transition for all the partly-transitioned macros....


Well, we could just separate the target-specific and the non-target specific
hooks to reduce the number of these markers.  Or we could change non-target
specific hooks to pro-forma target specific hooks - there are a number that
just return a constant.

I mentioned command-line options.  This is an area where I would think
your approach makes things much *more* complicated since you need to deal
with option settings on a per-target basis (different targets change the
settings of target-independent optimization flags in different ways, for
example, even if you don't actually allow command-line options to be
marked as applying to an individual target or allow "target" attributes to
be used in conjunction with your per-architecture functions).


Yes, part of the problem is that it's so hard to define it properly.
E.g. a user might want to have delay branch scheduling for one target
but not the other.  OTOH (s)he might not even think of the possibility
and get exasperated when -f{no-,}delayed-branch won't work on all
functions as expected.
Conceptually it would be simpler if all options are passed to all targets,
but all options variables are target-specific, so that one target doesn't see
the other's change.
However, that wouldn't make semantic sense.  E.g. the SH defaults to
flag_finite_math_only (because the hard FP comparison instruction set was
designed so that it's expensive to do -fno-finite-math-only), while
most other targets default to -fno-finite-math-only.
If your main target is an i386 and you have sh4 as a secondary target, you
wouldn't want optimizations to randomly disregard the finer points of
NaNs in comparisons.
So at least the flags that potentially change program behaviour must be
global, since they represent the user's choices in the context of the
selected main target.
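
One way to picture that split, purely as a sketch: flags that can change program behaviour exist once, while flags that only tune code generation exist once per target.  All type and function names below are assumptions made for this example, and the sh4 delayed-branch default is illustrative rather than taken from the SH port.

```cpp
#include <cassert>
#include <map>
#include <string>

// Flags that can change program behaviour: a single copy, set by the user
// in the context of the selected main target.
struct semantic_options {
  bool finite_math_only = false;
};

// Flags that only tune code generation: one copy per target.
struct tuning_options {
  bool delayed_branch = false;
};

struct option_state {
  semantic_options semantics;                         // global
  std::map<std::string, tuning_options> per_target;   // keyed by target name
};

// Hypothetical per-target defaults hook.  It may adjust tuning options for
// its own target but never touches opts.semantics.
void apply_target_defaults(option_state &opts, const std::string &target) {
  tuning_options &t = opts.per_target[target];
  if (target == "sh4")
    t.delayed_branch = true;  // illustrative default only
}

// Look up the tuning options of a target that has been set up already.
const tuning_options &tuning_for(const option_state &opts,
                                 const std::string &target) {
  auto it = opts.per_target.find(target);
  assert(it != opts.per_target.end());
  return it->second;
}
```

With i386 as the main target and sh4 as a secondary target, setting up sh4 can change how sh4 code is scheduled, but finite_math_only stays whatever the user chose for the program as a whole.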

As you will have seen I've been working on option handling machinery
lately with a view to changes to how multilib selection works.  Your patch
seems to insert a lot of conditionals in option-handling code (and
inserting a lot of conditionals in *any* file generally seems a bad idea).


On the plus side, the changes to override_options are only needed on targets
that are expected to function as secondary targets.  I mainly used the SH as
secondary target in early tests because it was easy for me to hack; I'm not
sure if anyone would really want to use it this way.
Targets that would be natural choices as secondary targets are mxp in
ARCompact/mxp, spu in Cell, and GPUs in CPU/GPU combinations.

Part of the model I envisage implementing is that option-handling hooks
will set not global variables but elements of an options structure and
that the compiler can contain multiple such structures.  I'd think that
multi-target option handling should be a lot easier after those changes
are in place, since you would have separate structures (including both
target-independent and target-dependent options) for each target and hooks
could set target-independent options in a target-specific copy of the
structure.


This still doesn't solve the problem that the program should have one set
of semantics, irrespective of what optimizers decide to do.

What would be helpful would be if your project could help us to get a better grasp on what options are actually tuning code generation for a single target (even though nominally target-independent), which ones represent a speed/size/compile time/debug precision tradeoff choice that the user wants to make on all targets, and which ones actually influence the semantics of the program.

 (Properly, you'd then change all references to global options
to explicitly look at current-function options instead; the quick hack
would be to swap options structures when changing target architecture.)


No, then you'd swap semantics.  That might be acceptable if the user
selected the target on the command line, but not if a function was moved
to a different target by the optimizer.

Follow-Ups:
- Re: RFA: Fix other/44566
  - From: Joern Rennecke
- Re: RFA: Fix other/44566
  - From: Joseph S. Myers

References:
- RFA: Fix other/44566
  - From: Joern Rennecke
- Re: RFA: Fix other/44566
  - From: Joseph S. Myers
