target macro removal (fwd)

Joseph Myers joseph@codesourcery.com
Mon Feb 23 21:24:00 GMT 2015


I sent this discussion of how one might go about large-scale target macro 
removal in response to an off-list enquiry last month, but it may be of 
more general interest.

-- 
Joseph S. Myers
joseph@codesourcery.com

---------- Forwarded message ----------
Date: Tue, 20 Jan 2015 18:04:27 +0000 (UTC)
From: Joseph Myers <joseph@codesourcery.com>
Subject: Re: target macro removal

Say you want to convert all (or nearly all) 680 (or thereabouts) target 
macros into hooks, and have several person-months to spend on this 
conversion.  (Much the same applies even if dealing with smaller subsets 
such as all target macros used in front ends.)  This won't by itself give
target-independence, even in code that no longer needs to include tm.h - in
particular, option handling involves a global enumeration of all options 
and brings defines relating to one part of the compiler into other parts 
of the compiler (similarly, insn-* files would also need considering) - 
but it's a reasonable starting point and we can discuss further 
target-dependence removal after the target macro removal.

Although this would involve 680 conversion patches (except where it makes 
sense to convert a set of closely related target macros at once), it 
should not need to involve 680 manually-written patches.  Rather, if doing 
a large-scale target macro removal project I think a good starting point 
would be to write a set of robust Python scripts that (a) parse the 
structure of GCC source code at the preprocessor level (so understanding, 
for example, what macros are used directly in #if / #elif conditions; what 
are used indirectly through being in the expansion of another macro used 
in such a condition; and, for each macro definition, what #if conditions 
apply for that definition to be active), and (b) can carry out 
refactorings based on that understanding.  The results of such a 
refactoring may need manual editing where e.g. it's hard to get the 
scripts to get the formatting of new hooks completely right, or where the 
English wording of the documentation of the macro, converted to 
documentation for a hook, needs fine-tuning, but having refactoring 
scripts should save a lot of work with the actual conversions.
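
As a rough illustration of part (a), the following Python sketch collects the identifiers that appear directly in preprocessor conditions.  The function names are made up for illustration.  It does not strip comments and does not follow indirect uses through the expansions of other macros, both of which a real tool would need to do.

```python
import re

# Identifiers only; the leading \b keeps "x10" in "0x10" from matching.
IDENT = re.compile(r"\b[A-Za-z_]\w*")
CONDITION = re.compile(r"\s*#\s*(if|elif|ifdef|ifndef)\b(.*)")


def join_continuations(text):
    """Fold backslash-newline pairs so each directive sits on one line."""
    return re.sub(r"\\[ \t]*\n", "", text)


def macros_in_conditions(text):
    """Return the identifiers used directly in #if-family directives."""
    found = set()
    for line in join_continuations(text).splitlines():
        m = CONDITION.match(line)
        if m:
            found.update(IDENT.findall(m.group(2)))
    # "defined" is an operator, not a macro.
    found.discard("defined")
    return found
```

Intersecting the result with a list of known target macros gives the macros that cannot be converted to hooks until their conditional uses are removed.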

Now, such refactoring scripts do not need to handle the fully general case 
so that one script can handle converting all 680 macros.  It's quite 
reasonable to have a script that detects problems and gives up, and 
separate refactorings to make things ready for that script.  And some of 
the preparation patches might well be completely manual.

I listed the main problem cases in 
<https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>.  Code built 
for the target simply cannot use hooks, which are for the host; tests in 
#if, whether direct or indirect, again simply cannot use hooks.  The 
following approaches apply to fixing such cases to prepare for hook 
conversions; I think of these as high priorities because they open the way 
to automatic refactorings converting more macros to hooks.

In all cases, for multi-target changes it's a good idea to test, as a
minimum, building cc1 for all affected architectures (configured with
--enable-werror=always and built starting from a native compiler of the
same version, if possible; see contrib/config-list.mk for a more thorough
check, though I don't think the full set of targets there is needed for
every patch), as well as a clean bootstrap and regression test for at
least one configuration.


(a) Where a target macro is used in code built for the host and in code 
built for the target, predefine a macro with -fbuilding-libgcc and then 
make the code built for the target use that new predefine; in general I 
think such patches would be manually written rather than generated by 
refactoring scripts.  Note that:

(i) Sometimes an existing macro may suffice (e.g. LONG_LONG_TYPE_SIZE in 
target code could be changed to __SIZEOF_LONG_LONG__ * __CHAR_BIT__).

(ii) Sometimes a literal one-to-one conversion may not be cleanest (e.g.
<https://gcc.gnu.org/wiki/Top-Level_Libgcc_Migration> lists
GTHREAD_USE_WEAK, SUPPORTS_WEAK and TARGET_ATTRIBUTE_WEAK - do all really
need separate defines?).  But if it's not clear how to do a cleaner
conversion, it may be best to do a literal conversion and leave further
cleanup to later.

(iii) Some source files are used both for the host and for the target.  
Consider, for example, libgcc/libgcov.h.  The code inside "#ifndef 
IN_GCOV_TOOL" is for the target, so it needs e.g. LONG_LONG_TYPE_SIZE 
converted as above (and BITS_PER_UNIT - whether BITS_PER_UNIT strictly 
counts as a target macro now is unclear, but it comes from host-side code 
so is best treated as one for target-side code).  But other code is for 
the host - or for both host and target and so needs handling accordingly.  
Much the same applies to various files under gcc/ada/ - they are built for 
both host and target.

(iv) After such a change there may still be other obstacles to converting 
to a hook (e.g. the macro may be used in #if on the host).  So there may 
be multiple incremental changes for a macro before it becomes possible to 
convert it to a hook.
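
The substitution in (a)(i) could be sketched in Python as below, assuming a hand-maintained table of replacements.  Only the LONG_LONG_TYPE_SIZE entry comes from the text above.  The table name, the function name and the extra parentheses around the replacement (added so the product keeps its meaning inside a larger expression) are assumptions of this sketch.  It rewrites whole identifiers only, but it does not know about comments or strings, so its output would still need review.

```python
import re

# Target macro -> expression built from compiler predefines.
# Only this one entry is taken from the discussion; a real table
# would need each entry checked by hand.
PREDEFINE_EQUIVALENTS = {
    "LONG_LONG_TYPE_SIZE": "(__SIZEOF_LONG_LONG__ * __CHAR_BIT__)",
}


def use_predefines(target_source):
    """Replace whole-identifier uses of the listed macros in target code."""
    def repl(match):
        name = match.group(0)
        return PREDEFINE_EQUIVALENTS.get(name, name)

    return re.sub(r"\b[A-Za-z_]\w*\b", repl, target_source)
```

For a file used on both host and target, as in (iii), this would have to be applied only to the target-side regions of the file.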


(b) Where a target macro is used only in code built for the target, it can 
be moved to libgcc_tm.h (the headers in libgcc/config/).  It's not 
problematic to have target macros defined there.  I think it should be 
possible to generate such patches by refactoring scripts (or by hand - 
there aren't that many).  The scripts would need to ensure that the macros 
are defined in libgcc_tm.h under the same conditions as in the host tm.h - 
this includes making sure the header lists in libgcc/config.host 
correspond appropriately to those in gcc/config.gcc (some manual checking 
might be involved there) and that any conditionals in the gcc/config/ 
headers controlling when a macro is defined are also appropriately 
reflected in the libgcc/config/ header.  Note that:

(i) In some cases, a macro in this category might be so closely related to 
one that's also used on the host that it's better to handle both the same 
(i.e. define macros with -fbuilding-libgcc instead of moving the macro to 
libgcc_tm.h).

(ii) Some libgcc files include tm.h but not libgcc_tm.h (unwind-seh.h and 
config/cr16/unwind-cr16.c at least; maybe more).  If you change a macro 
used in such a file, you need to add the missing libgcc_tm.h includes.

(iii) Some code built for the target outside the libgcc/ directory may not 
include libgcc_tm.h - this could include e.g. libobjc and some Ada files.  
If such code uses the macro in question and can't readily be made to 
include libgcc_tm.h then it may be necessary to use the -fbuilding-libgcc 
approach instead for that code.

(iv) Sometimes a macro used for the target may have a definition that 
depends on other macros that are also used for the host (whether a 
dependency in #if conditionals, or an expansion using those other macros).

(v) When a target macro stops being used on the host it should be poisoned 
in system.h - this applies whether it was converted to a hook, moved to 
libgcc_tm.h, or eliminated from host code in any other way.
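
For (b)(ii), a check along the following lines could find the files that need a libgcc_tm.h include added.  This is only a sketch: the function name is invented, the input is assumed to be a mapping from path to file contents, and only direct quoted #include lines are examined, so a header pulled in indirectly would be missed.

```python
import re

# Matches lines such as:  #include "tm.h"
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def missing_libgcc_tm(sources):
    """Return paths that include tm.h directly but not libgcc_tm.h.

    `sources` maps a file path to that file's text.
    """
    missing = []
    for path, text in sorted(sources.items()):
        included = set(INCLUDE.findall(text))
        if "tm.h" in included and "libgcc_tm.h" not in included:
            missing.append(path)
    return missing
```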


(c) Where a target macro is used in #if or #ifdef or #elif, it's a good 
idea to convert to a more restricted pattern of defining the macro to a 
default definition if not already defined, with that default going in 
defaults.h and all #if uses elsewhere being removed - such a restricted 
pattern is more amenable to automatic refactoring into a hook.  Of course, 
you need to take care not to change the semantics in the process.  If a 
macro was tested with #ifdef before, definitions to empty or 0 or 1 would 
all have had the same effect.  If you're changing the macro to be 0/1 
valued then existing definitions need to be made to define it to 1 and the 
#if tests need to change to C "if (MACRO)" tests.  Or, if you have e.g.

#ifdef MACRO
  if (MACRO (args))
    {
      lots of code
    }
#endif

then the new default definition might be "#define MACRO(args) 0".  There 
are probably lots of other cases, each of which requires understanding the 
code enough to satisfy yourself, and explain in the patch submission 
write-up, why the semantics are not changed for any target.
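
The mechanical part of that example might look like the following Python sketch, which removes an "#ifdef MACRO" / "#endif" pair around guarded code and produces a default definition for defaults.h.  The function names are hypothetical.  In the spirit of a script that detects problems and gives up, it returns None when the guard has an #else or #elif branch.  It makes no attempt to judge whether the guarded code still compiles, or means the same thing, on targets that never defined the macro; that remains a manual check.

```python
import re

DIRECTIVE = re.compile(r"\s*#\s*(\w+)")


def drop_ifdef_guard(lines, macro):
    """Remove '#ifdef MACRO' ... '#endif' pairs, keeping the guarded code.

    Returns None (gives up) if such a guard has an #else or #elif branch.
    """
    guard = re.compile(r"\s*#\s*ifdef\s+%s\s*$" % re.escape(macro))
    out = []
    stack = []  # one entry per open conditional; True if it is our guard
    for line in lines:
        m = DIRECTIVE.match(line)
        name = m.group(1) if m else None
        if name in ("if", "ifdef", "ifndef"):
            ours = bool(guard.match(line))
            stack.append(ours)
            if ours:
                continue
        elif name in ("else", "elif"):
            if stack and stack[-1]:
                return None
        elif name == "endif":
            if stack and stack.pop():
                continue
        out.append(line)
    return out


def default_stanza(macro, params, value="0"):
    """Text for defaults.h: define MACRO(params) unless already defined."""
    return "#ifndef %s\n#define %s(%s) %s\n#endif\n" % (
        macro, macro, params, value)
```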

In some cases, the defaults already exist - just not in defaults.h.  A 
move to defaults.h is simple, but still needs checking that you don't e.g. 
have different defaults in different source files, or other source files 
using #ifdef/#ifndef on the macro in a way that would be affected by 
adding a default definition to defaults.h.
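
A check for conflicting defaults could be sketched as follows.  It assumes defaults are written in the simple three-line "#ifndef / #define / #endif" form with the definition on a single line, and the function name and input format (a mapping from path to file contents) are made up for the sketch.  Defaults written any other way, or with continuation lines, would not be seen.

```python
import re
from collections import defaultdict

# #ifndef NAME / #define NAME <body> / #endif, on consecutive lines.
DEFAULT = re.compile(
    r"^[ \t]*#[ \t]*ifndef[ \t]+(\w+)[ \t]*\n"
    r"[ \t]*#[ \t]*define[ \t]+\1\b([^\n]*)\n"
    r"[ \t]*#[ \t]*endif",
    re.MULTILINE)


def conflicting_defaults(sources):
    """Map each macro with more than one distinct default to its bodies.

    `sources` maps a file path to that file's text.
    """
    seen = defaultdict(set)
    for text in sources.values():
        for macro, body in DEFAULT.findall(text):
            # Compare bodies modulo whitespace differences.
            seen[macro].add(" ".join(body.split()))
    return {m: bodies for m, bodies in seen.items() if len(bodies) > 1}
```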

It's likely such patches are largely manually written.  Each such patch 
reduces the risk of GCC changes breaking the build for targets they 
weren't tested for, by reducing the amount of code that's conditionally 
compiled (if (0) code still gets checked for syntax, not referring to 
undeclared variables, etc., whereas #if 0 code doesn't).


(d) Some target macros are used in contexts such as enum definitions, case 
labels and array or bit-field sizes that can't readily be changed to 
hooks.  Let's ignore these for now.  These mainly relate to the RTL parts 
of the compiler and we can take it that front ends and GIMPLE optimizers 
are higher priority to wean off target macros.  A design for 
target-independence for these few macros will be harder.  It's best also 
to ignore BITS_PER_UNIT for now except for target-side code.  (It's no 
longer defined in tm.h anyway - rather, the definition is output by 
genmodes - so uses of BITS_PER_UNIT don't require you to include tm.h.)


(e) Now let's suppose you have a target macro or macros to which the above 
issues do not apply - probably hundreds right now, and the vast bulk of 
target macros after cleanups (a), (b), (c) are applied to all macros for 
which they are applicable.  You wish to do a target hook conversion.  This 
includes moving the documentation of the macro to target.def (CC me on the 
patch and say you want docstring relicensing approval), appropriately 
edited, with an @hook line going in tm.texi.in.  It includes setting a 
default hook definition (typically from one of the files such as hooks.c, 
if such a hook is available), and adding hook definitions for each target 
whose definition was not the default.  You'll need to select the prototype 
for the hook manually - but then the replacement of macro calls by 
function calls will eliminate a potential source of architecture-specific 
build failures from type differences.

(i) Some targets define their target structure at the bottom of <arch>.c.  
Others define it near the top of <arch>.c, which requires forward 
declarations of all the functions used as hooks.  Any refactoring scripts 
will need to allow for this variation.  (My view would be that all targets 
should define it at the bottom of <arch>.c, and generally topologically 
sort static functions to reduce the need for forward declarations - if you 
get consensus for that on the mailing list you could do a preliminary 
refactoring pass to move all targets to that approach, so subsequent 
refactorings don't need to deal with this issue.)

(ii) Sometimes the target macro has the same definition for all OS targets 
for an architecture.  These are the simple cases to convert.  Sometimes it 
depends on the target OS or other aspects of configuration (e.g. being 
defined in <arch>/<os>.h - or being undefined there, or being defined in 
one header based on macros defined in another).  Refactoring tools will 
need to take account of this.  Typically hooks are functions not data so 
can have conditionals, e.g. "if (IS_LINUX_TARGET) return 2; else return 
3;".  That is, you can move from a tm.h target macro visible to the whole 
compiler to an architecture-specific macro visible only within the back 
end.  While it would be desirable to eliminate such macros as well (with 
e.g. a back-end-specific target structure) I think that's another thing to 
defer and separate from the main target macro removal.

(iii) Some target macros are used not just in the compilers proper but 
also in the driver, or are used only in the driver.  ("driver" includes 
collect2 and lto-wrapper for these purposes, and front-end-specific 
drivers.)  Those used in both places can go in the existing "common" 
target structure.  Those used only in the driver would go in a new driver 
target structure.  That driver target structure would probably be defined 
in a separate C file including driver_tm.h (given the extent to which such 
macros are OS-specific); the driver/config/ refactoring, as I noted in 
<https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>, would be 
similar to the move of macros to libgcc/config/ in that the refactoring 
tool needs to take care of #if structure of tm.h headers (and in that the 
definitions may depend on other tm.h macros used outside the driver).  It 
might well make sense to defer the move for macros used in the driver, 
since they are mostly well-separated from those used elsewhere.

(iv) Sometimes the correct design for hooks is not a direct one-to-one 
conversion from macros.  If there's a group of closely-related macros, 
such as *_TYPE_SIZE or the *_TYPE macros for various typedefs that 
currently expand to strings, it's best to start by discussing on the GCC 
mailing list what the right design for corresponding hooks is.
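
For the variation described in (e)(i), a refactoring script could start with a heuristic like the one below.  It assumes the target structure is defined with the usual "struct gcc_target targetm = TARGET_INITIALIZER;" line and that function bodies open with a brace in column 0, as in GNU style.  The function name and the "top" / "bottom" / "missing" labels are inventions of this sketch, and a file that departs from those conventions would be misclassified.

```python
import re

TARGETM = re.compile(
    r"^struct\s+gcc_target\s+targetm\s*=\s*TARGET_INITIALIZER\s*;",
    re.MULTILINE)


def targetm_position(text):
    """Classify where <arch>.c defines targetm.

    Returns "missing" if no definition is found, "top" if something that
    looks like a function body follows it, and "bottom" otherwise.
    """
    m = TARGETM.search(text)
    if m is None:
        return "missing"
    rest = text[m.end():]
    # A '{' in column 0 after the definition suggests functions follow.
    if re.search(r"^\{", rest, re.MULTILINE):
        return "top"
    return "bottom"
```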


(f) For information I attach my scripts for listing and classifying target 
macros.  Note that these have false positives (and maybe false negatives), 
as well as hardcoded paths, but they may be a helpful starting point for 
identifying target macros and where they are used.  In the semi-automated 
refactoring approach I envisage above, I'd expect an early step to be 
replacing these scripts by a rather more robust and general set of Python 
modules that deal with understanding the target macro structure of GCC 
source code.
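
As one possible first step in that direction, the classification script could be restated in Python roughly as below.  The path patterns are copied from the attached shell script.  The input lines are assumed to have the form "MACRO [#][=]path" that the listing script writes to tmac-target-macro-uses, where "#" marks a use in a preprocessor condition and "=" marks a target-side file.  Unlike the shell version, this sketch looks for "spec" in the path only rather than in the whole line, and the function names are illustrative.

```python
import re
from collections import defaultdict

FRONT_END = re.compile(
    r"gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)")
DRIVER = re.compile(
    r"gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|prefix"
    r"|lto-wrapper)\.[ch]")


def classify_path(where):
    """Return the set of classes for one '[#][=]path' field."""
    classes = set()
    if FRONT_END.search(where):
        classes.add("FrontEndDriver" if "spec" in where else "FrontEnd")
    if "=" in where:
        classes.add("Target")
    if DRIVER.search(where):
        classes.add("Driver")
    if "gcc/defaults.h" in where:
        classes.add("Defaults")
    # Anything matching none of the above counts as middle-end code.
    return classes or {"MiddleEnd"}


def classify(lines):
    """Map each macro to a space-separated, sorted list of its classes."""
    result = defaultdict(set)
    for line in lines:
        macro, where = line.split(" ", 1)
        result[macro] |= classify_path(where.strip())
    return {m: " ".join(sorted(c)) for m, c in sorted(result.items())}
```

Keeping the per-path classification in its own function makes it easier to test the patterns separately and to refine them where they give false positives.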

-- 
Joseph S. Myers
joseph@codesourcery.com
-------------- next part --------------
#! /usr/bin/env bash

# Run this from the toplevel GCC source directory.  Produces a list of
# target macros: macros defined in headers in gcc/config/ or in
# defaults.h or in tm_defines in config.gcc and used outside
# gcc/config/ and gcc/common/config/ (or in explicitly listed
# target-side gcc/config/ files).  This excludes macros defined only
# in generated files, including those from config.in and from .opt
# files as well as those such as HAVE_<insn> from generator programs.
# It also excludes macros defined via makefiles.  For uses, it
# excludes uses inside gcc/config/ except for the explicitly listed
# target-side files.  Macros defined in libgcc/config/ are also
# excluded now those headers are no longer included on the host.

set -e

outdir=$HOME/gcc/target-macros
if [ "$1" ]; then
    outdir=$1
fi
script_dir=$(cd $(dirname "$0") && pwd)
process_script=$script_dir/process-source-file

# gcc/config/ headers, plus gcc/defaults.h.
config_file_list=$outdir/tmac-config-files

# Files outside gcc/config/ and gcc/common/config/ that might
# potentially use a macro from a gcc/config/ header.
non_config_file_list=$outdir/tmac-non-config-files

# List of potential target macros.
maybe_target_macro_list=$outdir/tmac-maybe-target-macros

# List of target macro uses.
target_macro_use_list=$outdir/tmac-target-macro-uses

# List of files using tm.h but not directly using target macros.
target_macro_no_use_list=$outdir/tmac-stray-tm-h

# defaults.h is included to catch some target macros with no
# definition, or only defined there by derivation from other macros
# but used in target code.  Intrinsic headers are excluded as they are
# installed target code rather than providing macros saying how to
# configure the libraries or GCC.  frv-asm.h likewise.
{
    echo gcc/defaults.h
    find gcc/config -name '*.h'
} | sort | egrep -v '^(gcc/config/(.*/xm-.*\.h|.*[0-9a-z]intrin\.h|arm/arm_neon\.h|m68k/math-68881\.h|i386/cpuid\.h|i386/mm3dnow\.h|i386/cross-stdarg\.h|mips/loongson\.h|rs6000/(ppc-asm|altivec|spe|ppu_intrinsics|paired|spu2vmx|vec_types|si2vmx)\.h|alpha/va_list\.h|sh/.*shmedia\.h|spu/(spu_intrinsics|spu_internals|vmx2spu|spu_mfcio|vec_types|spu_cache)\.h|frv/frv-asm\.h))$' > $config_file_list

{
    # Empirical list of directories using tm.h.
    # .s .S .asm files excluded on the basis that tm.h has C
    # declarations not just macros so cannot be used there.
    find gcc libdecnumber libgcc libobjc \
	-name '*.c' -o -name '*.h' -o -name '*.cc' -o -name '*.def' \
	| sort | egrep -v '^gcc/(config|testsuite|common/config)/'
} | sort > $non_config_file_list

: > $non_config_file_list.new

for f in $(cat $non_config_file_list); do
    target=false
    case $f in
	(lib*)
	    target=true
	    ;;
	(gcc/coretypes.h | gcc/defaults.h)
	    ;;
	(*)
	    if egrep -q 'COPYING\.RUNTIME|if *you *link' $f; then
		target=true
	    fi
	    ;;
    esac
    if $target; then
	echo "=$f" >> $non_config_file_list.new
    else
	echo "$f" >> $non_config_file_list.new
    fi
done

mv $non_config_file_list.new $non_config_file_list

: > $maybe_target_macro_list
for f in $(cat $config_file_list); do
    $process_script $f >> $maybe_target_macro_list
done
tm_defines_list=$(grep tm_defines= gcc/config.gcc|perl -pe 's/^.*?tm_defines=//; s/\"//g; s/\$\{?tm_defines\}?//g; s/=\S*//g; s/;;//g; s/'\''//g; s/SUPPORT_\`.*//g; s/\$sh_.*//g;')
for d in $tm_defines_list; do
    echo $d >> $maybe_target_macro_list
done
sort < $maybe_target_macro_list | uniq | egrep -v '^(__int64|ALTIVEC_VECTOR_MODE|PV_FOR|RA_REGNUM|REG_AT|SP_REGNUM|UNW_FLAG_EHANDLER|UNW_LENGTH|UNW_FLAG_UHANDLER|R_LR)$' \
    > $maybe_target_macro_list.new
mv $maybe_target_macro_list.new $maybe_target_macro_list

: > $target_macro_use_list
: > $target_macro_no_use_list
for f in $(cat $non_config_file_list); do
    fname="${f#=}"
    $process_script $fname $maybe_target_macro_list \
	| sed -e "s|\$| $f|" -e "s|# | #|" > $target_macro_use_list.tmp
    cat $target_macro_use_list.tmp >> $target_macro_use_list
    if grep -q '"tm\.h"' $fname && ! [ -s $target_macro_use_list.tmp ]; then
	echo $fname >> $target_macro_no_use_list
    fi
    rm $target_macro_use_list.tmp
done
-------------- next part --------------
#! /usr/bin/perl -w

# $ARGV[0] names a source file from which target macros, or macro
# uses, are to be extracted.  If $ARGV[1] is defined, it names a file
# with a list of target macros, and uses of those macros should be
# checked for; otherwise a list of definitions should be printed.

undef $/;

$source = $ARGV[0];

if ($#ARGV >= 1) {
    $macro_list_file = $ARGV[1];
    $print_uses = 1;
} else {
    $print_uses = 0;
}

open(SOURCE, "<$source") || die("open $source: $!\n");
$contents = <SOURCE>;
close(SOURCE) || die("close $source: $!\n");

$contents = "\n$contents\n\n";
$contents =~ s/\r\n/\n/g;
$contents =~ s/\\[ \t]*\n//g;
$contents =~ s/[ \t]*\n[ \t]*/\n/g;

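# Copy the text to $left, dropping comments and emptying string and
# character literals so that their contents are not mistaken for
# macro names.
$left = "";
while ($contents ne "") {
    # Take everything up to the next comment or literal.
    $contents =~ s/^((?:[^\/\"\']|\/(?![\/\*]))*)//s;
    $left = "$left$1";
    if ($contents =~ s/^\/\/[^\n]*\n//s) {
	# C++-style comment.
	$left = "$left\n";
    } elsif ($contents =~ s/^\/\*.*?\*\///s) {
	# C-style comment.
	$left = "$left ";
    } elsif ($contents =~ s/^\"(?:[^\"\\\n]|\\[^\n])*\"//s) {
	# String literal.
	$left = "$left\"\"";
    } elsif ($contents =~ s/^\'(?:[^\'\\\n]|\\[^\n])*\'//s) {
	# Character literal.
	$left = "$left\'\'";
    } elsif ($contents ne "") {
	warn "Lex error in $source\n";
	$contents = "";
    }
}

# Normalize whitespace so directives can be matched at line starts.
$left =~ s/[ \t]*\n[ \t]*/\n/g;
$left =~ s/[ \t]+/ /g;
$left =~ s/\n\# /\n\#/g;
$left =~ s/\n+/\n/g;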

if ($print_uses) {
    open(MACROS, "<$macro_list_file") || die("open $macro_list_file: $!\n");
    $maclist_text = <MACROS>;
    close(MACROS) || die("close $macro_list_file: $!\n");
    @maclist = split(/\n/, $maclist_text);
    $left =~ s/\n\#define (\w+)\b/\n/g;
    foreach my $macro (@maclist) {
	if ($left =~ /\n#(if|elif)[^\n]*\b$macro\b/) {
	    print "$macro#\n";
	} elsif ($left =~ /\b$macro\b/) {
	    print "$macro\n";
	}
    }
} else {
    @lines = split(/\n/, $left);
    foreach my $line (@lines) {
	if ($line =~ /^\#define (\w+)\b/) {
	    print "$1\n";
	}
    }
}
-------------- next part --------------
#! /usr/bin/env bash

set -e

{
    egrep 'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)' tmac-target-macro-uses |grep -v spec|sed -e 's/ .*/ FrontEnd/'
    egrep 'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)' tmac-target-macro-uses |grep spec|sed -e 's/ .*/ FrontEndDriver/'
    egrep '=' tmac-target-macro-uses |sed -e 's/ .*/ Target/'
    egrep 'gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|prefix|lto-wrapper)\.[ch]' tmac-target-macro-uses |sed -e 's/ .*/ Driver/'
    egrep 'gcc/defaults\.h' tmac-target-macro-uses |sed -e 's/ .*/ Defaults/'
    egrep -v 'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)|=|gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|prefix|lto-wrapper)\.[ch]|gcc/defaults\.h' tmac-target-macro-uses |sed -e 's/ .*/ MiddleEnd/'
} | sort | uniq | perl -ne 'chomp; if (/^(\S*) (\S*)$/) { if (defined($type{$1})) { $type{$1} .= " $2"; } else { $type{$1} = $2; } } else { die "bad line $_\n"; } END { foreach my $k (sort keys %type) { printf "%-44s %s\n", $k, $type{$k}; } }'

