This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Multilib selection issues


This message describes problems with how GCC presently handles
multilib selection, and proposes changes (at least some hopefully to
be implemented for GCC 4.6) to fix some of those problems; please let
me know any comments on these proposals.

In summary, multilibs are selected using textual matching of options
whose logic is largely independent of that used in the compiler proper
(cc1) to determine what options are enabled when compiling.  If the
options passed when linking are not exactly those used when building a
multilib, but include an option (say a -mcpu option) that implies
other options, or one option on the command line partially overrides
another option, this can lead to inappropriate multilibs being
selected, and existing mechanisms such as MULTILIB_MATCHES are unable
to handle all cases of this reliably and effectively.  For example, if
there is a "-march=armv7-a -mfloat-abi=softfp -mfpu=neon" multilib,
and the user instead passes the option -mcpu=cortex-a8 (which
logically should imply the above options), the multilib will not be
matched.  MULTILIB_MATCHES can map that option to only one of the
three options it implies, and using it requires duplicating all the
information about such implications.

To fix this, it is proposed to share more option processing logic
between the compiler driver and the compiler proper and to base
multilib selection more closely on the actual features enabled by
options instead of the text of those options.

The present multilib selection logic
------------------------------------

A multilib consists of a collection of files (.o files, .a files,
shared libraries and headers) to be used when compiling and linking
with particular options (for example, for 32-bit compilation, or for
big-endian compilation, or for compilation with hardware floating
point).

These files are scattered across the filesystem and may be located by
GCC in several different ways.  GCC has a "multilib directory", shown
when -print-multi-directory is passed to the GCC driver along with
other compilation options, and a "multilib OS directory", shown with
-print-multi-os-directory.  Both of these are relative directory
names, appended to various other directory locations; the OS directory
is intended to be relative to a directory called "lib" and so may have
values such as "../lib64".  In addition to these values, there are the
"sysroot suffix" and "sysroot headers suffix" in sysrooted
toolchains.  The latter does not typically have a different value for
each multilib; in a toolchain with both GLIBC and uClibc multilibs,
the sysroot headers suffix would typically be set so there is one set
of GLIBC headers and one of uClibc headers.

The first two directories are determined by MULTILIB_* settings in the
target makefile fragments t-* when GCC is built; the genmultilib
script converts these settings into a header file used by the GCC
driver.  The latter two directories are determined by
SYSROOT_SUFFIX_SPEC and SYSROOT_HEADERS_SUFFIX_SPEC in the target
headers.  Targets may, optionally, use one of three different
print-sysroot-suffix.sh scripts that attempt to reimplement the
genmultilib logic to produce SYSROOT_SUFFIX_SPEC settings replicating
the other multilib directory layouts.

Problems with the present logic
-------------------------------

The two sorts of multilib selection logic - for GCC multilib
directories and for sysroot suffixes - both work entirely based on the
text of command-line options, as processed by the GCC driver with
logic that is not necessarily the same as that used in cc1 to convert
options to features.  Much the same issue applies to specs used for
other purposes than multilib selection: they are based on textual
processing only with nothing to keep it in sync with how cc1 handles
options.  It is especially complicated to keep them in sync when
various configure options affect the compiler's default settings.

Those problems with multilib selection logic are to an extent
theoretical - in any particular case it might be possible to make
specs reflect cc1's logic accurately, at least as long as options
overriding each other are not present on the command line.  (There is
a limited mechanism for Negative markings in .opt files to allow some
overriding options to be pruned before multilib selection and spec
processing, but it certainly doesn't cover all relevant cases or all
ways in which options can imply or partially override others.)  There
are however cases that cannot effectively be represented.

In the present system, an option either is or is not an option
affecting multilib selection.  If it affects multilib selection, it
may be one of the options directly involved in multilib selection, or
it may be mapped to such an option with MULTILIB_MATCHES.
MULTILIB_MATCHES only allows for one option being treated as an exact
alias of another for multilib selection purposes.  This suffices for
the use in config/rs6000/t-fprules to map certain -mcpu options to
-msoft-float libraries, for example.

However, it is useful to have non-orthogonal multilib configurations
with libraries for various common configurations, in which case each
combination of multilib options that does not have its own multilib
should be mapped to some "best" compatible multilib - but the present
system does not allow a combination of multiple options (and the
absence of others) to serve as the source of a match.  It is also
useful for an option or combination to map not to one option but to
multiple ones.  A motivating example of that is that, on ARM,
-mcpu=cortex-a8 ought to imply -mfloat-abi=softfp -mfpu=neon (in
addition to -march=armv7-a), and with such implications implemented in
the compiler the option should select a NEON multilib if one is
available.  Just as a match cannot presently have multiple options as
its source, so it cannot have multiple options as its target.
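
To make the limitation concrete, here is a minimal Python sketch (not
GCC code) contrasting verbatim option matching with matching after
implications have been applied.  The IMPLIES table and all function
names are hypothetical, and the sketch ignores overriding between
options entirely.

```python
# Hypothetical illustration only; none of these names exist in GCC.

# Options used to build the NEON multilib from the example above.
NEON_MULTILIB = ("-march=armv7-a", "-mfloat-abi=softfp", "-mfpu=neon")

# Assumed implication table: what -mcpu=cortex-a8 ought to imply.
IMPLIES = {
    "-mcpu=cortex-a8": ("-march=armv7-a", "-mfloat-abi=softfp", "-mfpu=neon"),
}


def textual_match(command_line, multilib_options):
    """Text-based matching: every multilib option must appear verbatim."""
    return all(opt in command_line for opt in multilib_options)


def expand(command_line):
    """Return the command line plus the options each option implies."""
    expanded = list(command_line)
    for opt in command_line:
        expanded.extend(IMPLIES.get(opt, ()))
    return expanded


def feature_match(command_line, multilib_options):
    """Match only after implications have been applied."""
    return textual_match(expand(command_line), multilib_options)
```

With these definitions, textual_match(["-mcpu=cortex-a8"],
NEON_MULTILIB) is false while feature_match on the same arguments is
true, which is the behavior a user would expect.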

The notion of a single OS directory is problematic as well; these
directories need to be unique outside the sysroot because GCC installs
libraries in them for each multilib, but if there are multiple
sysroots each of which has libraries for multiple ABIs, names such as
../lib64 should properly be used in multiple sysroots.  There is a
workaround for this, used by config/mips/t-st and config/mips/st.h,
whereby STARTFILE_PREFIX_SPEC specifies the paths to search within the
sysroot, but this way the files are only found when GCC is searching
what it thinks is the non-multilib path.

It should be noted that there are other problems relating to multilibs
that do not relate to how a multilib is selected when compiling and
linking but to such issues as how the set of multilibs to build is
configured (and that it is fixed at compiler build time rather than it
being possible to add extra multilibs later without rebuilding the
compiler).  This proposal does not address such problems, although it
is not intended to hinder fixing any of them.

Proposed changes
----------------

Although it is possible to make local changes to genmultilib to allow
additional forms of multilib aliasing, and similarly to extend the
mechanism for matches, manually maintained lists of mappings from
options to other options and multilibs rapidly increase in size and
become unmaintainable as the number of multilibs increases.  Thus, it
is proposed to change multilib selection - and potentially related
spec handling - so that the same logic used in cc1 to determine how
the compiler is configured when compiling is also used to determine
the best multilib to use with given options.

There are two specific new structures the compiler needs to deal with
for this.  It needs a structure representing a multilib explicitly,
including the four directories used for that multilib, so replacing
the separate mechanism for sysroot suffixes with something better tied
into multilib selection.  It also needs a structure containing the
various variables derived from the command line that might be used in
multilib selection; the present global variable names would become
macros expanding to a reference to the relevant element in a
global_options variable.[1]
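
As a rough illustration of the first structure, the following Python
sketch (GCC itself would use C) groups the four directories into one
record.  The field names, the os_library_dir helper and the
"<sysroot><suffix>/usr/lib/<OS directory>" layout are assumptions made
for the example, not a description of existing GCC code.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Multilib:
    """One multilib with its four directory settings (hypothetical names)."""
    multilib_dir: str            # cf. -print-multi-directory
    os_dir: str                  # cf. -print-multi-os-directory
    sysroot_suffix: str
    sysroot_headers_suffix: str


def os_library_dir(sysroot, multilib):
    """Assumed layout: <sysroot><sysroot suffix>/usr/lib/<OS directory>."""
    base = sysroot + multilib.sysroot_suffix
    return posixpath.normpath(posixpath.join(base, "usr/lib", multilib.os_dir))
```

Keeping all four values in one record means that selecting a multilib
also selects its sysroot suffixes, rather than recomputing them with a
separate mechanism.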

When it is required to select a multilib, the structures would be
computed both for the options passed to the compiler and for the
options specified for each available multilib.  Some tests would then
be applied to the resulting structures:

* Has a new option, say --multilib=, been used to say which multilib
  to use?  If so, override all the following tests and just use the
  multilib requested.

* Can any multilib be linked at all with the newly compiled code?  For
  example, if all multilibs are little-endian but big-endian code was
  requested, linking is not possible.  If there are no compatible
  multilibs, the default multilib would be used to find headers for
  compiling (it can be useful to build testcases you can't link) and
  an error would be given on linking.  If at least one compatible
  multilib remains, only the compatible multilibs are considered in
  what follows.

* Is any multilib conservatively compatible with the newly compiled
  code, so that the code in the libraries for that multilib will run
  on all systems that will run the newly compiled code?  If so,
  restrict attention to the conservatively compatible multilibs, and
  pick the best one according to target-specific scoring for how
  important matches on particular features are.

* If no multilib is conservatively compatible with the newly compiled
  code, pick the best match among all multilibs that can be linked
  with the newly compiled code.  For example, if you compile with
  -march=i386 but your only multilibs are -march=i486 and -march=i686,
  this might pick the -march=i486 multilib.  Again, this is useful for
  testcases, but also for some code hardcoding options to the compiler
  that may now be suboptimal.
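
The four tests above can be sketched as follows in Python.  This is a
toy model under stated assumptions: features are reduced to an
endianness and an integer "arch_level" (a made-up ordering with
i386=3, i486=4, i686=6), a multilib counts as conservatively
compatible when its level does not exceed that of the compiled code,
and the scoring simply prefers the nearest level.  The "forced"
argument stands in for the suggested --multilib= option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Multilib:
    name: str
    endian: str      # "little" or "big"
    arch_level: int  # hypothetical ordering: i386=3, i486=4, i686=6


def select_multilib(code_endian, code_arch_level, multilibs, forced=None):
    """Return the chosen multilib, or None if nothing can be linked."""
    # Test 1: an explicit request overrides everything else.
    if forced is not None:
        return {m.name: m for m in multilibs}[forced]
    # Test 2: keep only multilibs that can be linked with the new code.
    linkable = [m for m in multilibs if m.endian == code_endian]
    if not linkable:
        return None  # caller would use default headers, then fail at link
    # Test 3: prefer multilibs that run wherever the new code runs.
    conservative = [m for m in linkable if m.arch_level <= code_arch_level]
    if conservative:
        return max(conservative, key=lambda m: m.arch_level)
    # Test 4: otherwise take the closest linkable multilib.
    return min(linkable, key=lambda m: m.arch_level - code_arch_level)
```

With only i486 and i686 multilibs available, a request at level 3
(i386) falls through to the last test and picks the i486 multilib, as
in the example above.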

Logically, this selects a single multilib that is best for the given
options.  Because of various cases that may rely on the present
behavior of searching non-multilib directories after multilib ones, as
discussed in
<http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00826.html>, it is
proposed that at least initially the directories for the default
multilib would still be searched after those for any non-default
multilib selected above, but fixes could be added incrementally, as
could a split of MULTILIB_OSDIRNAMES into two settings for uses inside
and outside sysroots.  The logic described above could in fact give an
ordering of many multilibs, with some libraries being present for only
some multilibs and each multilib having its directories searched in
turn; although such searching of directories for further multilibs is
not initially proposed, feature-based selection would make it much
more straightforward to implement than it would be at present.

In the absence of target-specific logic for matching features, each
option specified in MULTILIB_OPTIONS may be mapped to the
corresponding feature, with all features being considered equally
important to match and all non-exact matches of a feature being
considered alike, while matches and non-matches of features not
controlled by an option in MULTILIB_OPTIONS would be disregarded.
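
A minimal Python sketch of this default rule, with hypothetical
feature names: every feature controlled by MULTILIB_OPTIONS counts
equally, an inexact match scores nothing however near it is, and
other features are ignored.

```python
def default_score(code_features, multilib_features, controlled):
    """Count exact matches over the features MULTILIB_OPTIONS controls.

    code_features and multilib_features are dicts mapping feature names
    to values; controlled lists the feature names to compare.  All names
    here are hypothetical.
    """
    return sum(
        1
        for name in controlled
        if code_features.get(name) == multilib_features.get(name)
    )


def best_multilib(code_features, multilibs, controlled):
    """Pick the (name, features) pair with the highest default score."""
    return max(
        multilibs,
        key=lambda item: default_score(code_features, item[1], controlled),
    )
```

Target-specific logic would replace default_score where some features
matter more than others or where near misses should be ranked.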

This new logic is proposed to be able to replace MULTILIB_MATCHES
altogether (although transitionally it might be helpful to retain
MULTILIB_MATCHES for a while so targets can be converted one at a
time).  In some cases, such as that in config/rs6000/t-fprules, where
MULTILIB_MATCHES is used to map an option to another one it exactly
implies, no target-specific logic is needed for the replacement to
work.  In others, such as where a CPU option is mapped to a multilib
for a sufficiently similar CPU, target-specific matching code may be
needed.

It was noted above that specs have similar issues with matching option
text.  It is proposed that where an option sets a single feature
through the .opt file, specs matching that option will instead act to
match that feature.  This can be overridden with
%:option(option-name), and %:feature(feature-name) can be used to
match a feature not directly corresponding to a single option.  For
example, the setting of BE8_LINK_SPEC in config/arm/bpabi.h has the
problem of not accounting for tools where big-endian is the default,
but with the proposed changes a spec matching -mbig-endian would match
big-endian whatever the default, while a spec matching the v7-A
architecture (and any future architectures for which big-endian
implies BE8) would replace the need to list all relevant -mcpu
options.

Matching based on features runs into the existence of aliases for
groups of options that are defined specifically for the purpose of
controlling multilib selection.  For example, config/arm/vxworks.h
defines such -t aliases in CC1_SPEC.  It is proposed that a mechanism
be provided to define such aliases; each alias would define a feature,
such that if an alias is used then only multilibs matching the last
alias used are considered compatible in multilib selection.  If
however a user does not specify any such -t option, the best multilib
would be selected based on matching to the options they specify.
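
The alias rule might look like this in a Python sketch.  The alias
names -tfoo and -tbar are invented for the example (real -t aliases
are target-specific), and each multilib is modeled as a
(name, alias) pair.

```python
# Hypothetical alias names; real -t aliases are defined per target.
KNOWN_ALIASES = {"-tfoo", "-tbar"}


def restrict_by_alias(command_line, multilibs):
    """Keep only multilibs matching the last alias on the command line.

    multilibs is a list of (name, alias) pairs.  If no alias is given,
    every multilib remains a candidate for feature-based selection.
    """
    used = [opt for opt in command_line if opt in KNOWN_ALIASES]
    if not used:
        return list(multilibs)
    last = used[-1]
    return [m for m in multilibs if m[1] == last]
```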

For multilib selection to be based on features, it is necessary that
the driver has access to feature information as computed by cc1.
There are two basic possible approaches to this:

* Have cc1 output feature information on request by the driver.  It is
  necessary to ensure that no specs required to invoke cc1 correctly
  rely on this feature information, and that -t option aliases are
  passed as is to cc1 so it has the information that an alias was used
  for use in multilib selection; spec processing might need to ask cc1
  about the settings of many features.

* Use the same logic in both the driver and cc1 to process options.
  It is necessary to ensure that all logic relevant to determining
  feature settings that are used for multilib selection or specs will
  work the same when linked into both places; for example, that all
  relevant variables are present in both places.  Code that creates
  trees or RTL or otherwise depends on features not present in the
  driver needs to execute after option processing, and not be used to
  determine feature settings.

Both these options involve per-target checks for option aliases that
need changing to a different mechanism; the first also involves checks
of spec strings for any cases where there would be problems with specs
for invoking cc1 relying on information obtained by calling cc1, while
the second involves moving each target's option processing code and
making it work in the driver as well as in cc1, as well as changing
code and specs that add options when calling cc1 to add them at an
earlier stage so the driver can process them effectively.  I prefer
the second option; there is already some code in that direction
(opts-common.c), used in the driver for pruning certain options
indicated as overridden using Negative markers in .opt files, and it
seems more beneficial to users - and to GCC developers calling cc1
directly - to have predictable, common logic for processing options.
Further reasons for this to be the logically correct approach are
discussed in Appendix 2.

Common logic means that option handling in the driver should be driven
by .opt files as it is in cc1.  This would replace the present ad hoc
handling, including macros such as SWITCH_TAKES_ARG and the code in
gcc.c for translating various long options.[2]

Footnotes
---------

[1] These structures could also help make the "optimize" attribute
implementation more reliable; the compiler could save the structure as
it is immediately after all options have been handled but final
defaults based on what options were or were not present have not been
applied, and "optimize" attributes would act like adding some options
to the end of the command line, applying those to a copy of this saved
structure before doing any defaulting.

[2] Common logic, with a more structured notion of the sequence of
options and arguments passed, could also naturally be used to produce
better -frecord-gcc-switches output.

Appendix 1: Partial overriding of options
-----------------------------------------

Many of the problems with option handling, multilibs and specs relate
to cases where one option overrides another, or implies another, in
whole or in part, so it seems desirable to define general principles
for how partially overriding options are handled.  Most options should
be considered to set the value for some feature; this may be boolean
(-foption / -fno-option), enumerated (-mcpu=), string, integer or some
other type.  (The exceptions are options such as -I which logically
append to a list of values, so that having more than one of the same
option, with different arguments, makes sense.)  Of the options
directly setting the value for a feature, the last one on the command
line takes precedence.

There is a directed acyclic graph of implications among feature
settings; if a feature has not been explicitly set by an option, its
value is that implied by the "closest" other feature that was
explicitly set and is able to imply a setting for the feature not
explicitly set.  "Closest" is defined thus: if A implies a setting for
B, and B implies a setting for C, then B is closer to C than A is,
while if more than one equally close feature could imply a setting for
C, the last one on the command line takes precedence.  Finally,
default settings are applied for features whose settings are not
implied by any command-line option.
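
The resolution rule can be sketched in Python as below.  The model is
an assumption made for illustration: the command line is a list of
(feature, value) pairs, "implies" maps a (feature, value) pair to the
settings it directly implies, and distance is the number of
implication steps from an explicitly set feature.  Chains are followed
through implied values without rechecking whether a nearer source
later replaces an intermediate value, which a real implementation
would need to consider.

```python
def resolve(command_line, implies, defaults):
    """Resolve feature values from explicit settings, implications, defaults."""
    # The last explicit setting of each feature wins; remember its position.
    explicit = {}
    for pos, (feat, val) in enumerate(command_line):
        explicit[feat] = (val, pos)

    # best[feature] = ((distance, -position), value); a smaller key wins,
    # so nearer sources beat farther ones and later ones break ties.
    best = {}
    for feat, (val, pos) in explicit.items():
        frontier, seen, dist = [(feat, val)], {feat}, 0
        while frontier:
            dist += 1
            nxt = []
            for f, v in frontier:
                for f2, v2 in implies.get((f, v), {}).items():
                    if f2 in explicit or f2 in seen:
                        continue  # explicit settings are never overridden
                    seen.add(f2)
                    nxt.append((f2, v2))
                    key = (dist, -pos)
                    if f2 not in best or key < best[f2][0]:
                        best[f2] = (key, v2)
            frontier = nxt

    result = dict(defaults)                                   # lowest priority
    result.update({f: v for f, (_, v) in best.items()})       # implied values
    result.update({f: v for f, (v, _) in explicit.items()})   # explicit values
    return result
```

For instance, with A=a1 implying B=b1 and each value of B implying a
value of C, setting both A=a1 and B=b2 leaves C determined by B in
either order, because B is closer to C than A is.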

(There are some slight further complications to this model.  In
particular defaults for features not explicitly set may be functions
of the explicit or implicit values of other features, as with
configured defaults such as --with-arch-64, so defaults must be
computed in the correct order.  Simple cases of --with-arch and
--with-tune fit cleanly into the above model: architecture settings
imply tuning settings, so the --with-arch value is used unless an
explicit -march is given, while a --with-tune value is used unless
either -march or -mtune is given.  But on ARM, -mcpu logically implies
-march and -mtune, while a --with-cpu setting is presently ignored if
-march is given; it may be appropriate to treat this as a default
-mcpu setting that depends on the -march setting.  Furthermore,
implications may be dynamically computed in the driver; if
-march=native is the final -march setting, explicit or defaulted, it
may imply a series of other options.)

Appendix 2: Maps between options, features and multilibs
--------------------------------------------------------

The compiler - driver and cc1 considered as a whole - has a mapping
(f) from options (passed to the driver) to multilibs, and one (g) to
features affecting how cc1 generates code.

The problem being addressed is essentially that multilibs do not
depend in the natural, desired way on features.  That is, we want
there to exist a mapping h from features to multilibs, such that hg =
f (see commutative diagram below) *and* h is a mapping that seems
natural and desired to compiler users.

Options
  |    \
  |g    \
  |      \f
  |       \
  |        \
  V        _\|
Features--->Multilibs
         h

For the desired properties to hold reliably, we want the relation hg =
f to be how f is calculated.

Now the information about features is fully reliably available inside
cc1, so it may seem most reliable for cc1 to be where the mapping h is
computed, or for the features information to be sent from cc1 back to
the driver.  But let's look at cc1 from the point of view of the
driver.  cc1 is one of several binaries, along with as and ld, that
the driver can run.  Given the options passed to the driver, command
lines for all of these are computed.  The multilibs form one aspect of
the command line passed to ld; they also affect those for as and cc1.
Furthermore, the command line for as is affected by the same issues as
those affecting multilib selection.  When we add a -t option, it
should not be necessary to define in specs that it implies certain
options to the assembler (as in VXWORKS_ENDIAN_SPEC in arm/vxworks.h);
these should logically be determined by features as well.  (The ARM
approach of putting much of this information in the .s output has its
advantages in this regard, but also the disadvantages Richard
Sandiford has pointed out with inconsistency between the handling of
.c and .S input to GCC.)

So actually the natural approach would seem to be that features are
computed, then used to determine the command lines for each
subprocess.  cc1 is not anything special (of course many different
compiler binaries may be used for different source languages, such as
cc1plus).  To ensure consistency of features between the driver and
cc1, the natural approach in this model would be that the features
structure is passed to cc1, rather than rewriting a command line given
to the driver to pass the same options to cc1.  (The reordering
involved in how cc1's command line is generated has been a source of
bugs in the past.)  So the driver should pass some binary or textual
representation of the features structure to cc1 instead of the normal
command line, with almost all the option processing code in cc1 only
being a convenience for manual invocation and not normally being
used.  This is not part of this proposal, but a logical conclusion
from it as to how consistency could best be assured.

-- 
Joseph S. Myers
joseph@codesourcery.com

