This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
I sent this discussion of how one might go about large-scale target macro removal in response to an off-list enquiry last month, but it may be of more general interest.

--
Joseph S. Myers
joseph@codesourcery.com

---------- Forwarded message ----------
Date: Tue, 20 Jan 2015 18:04:27 +0000 (UTC)
From: Joseph Myers <joseph@codesourcery.com>
Subject: Re: target macro removal

Say you want to convert all (or nearly all) 680 (or thereabouts) target macros into hooks, and have several person-months to spend on this conversion. (Much the same applies even if dealing with smaller subsets such as all target macros used in front ends.) This won't get target-independence in code that no longer needs to include tm.h - in particular, option handling involves a global enumeration of all options and brings defines relating to one part of the compiler into other parts of the compiler (similarly, insn-* files would also need considering) - but it's a reasonable starting point and we can discuss further target-dependence removal after the target macro removal.

Although this would involve 680 conversion patches (except where it makes sense to convert a set of closely related target macros at once), it should not need to involve 680 manually-written patches. Rather, if doing a large-scale target macro removal project I think a good starting point would be to write a set of robust Python scripts that (a) parse the structure of GCC source code at the preprocessor level (so understanding, for example, what macros are used directly in #if / #elif conditions; what are used indirectly through being in the expansion of another macro used in such a condition; and, for each macro definition, what #if conditions apply for that definition to be active), and (b) can carry out refactorings based on that understanding. The results of such a refactoring may need manual editing where e.g.
it's hard to get the scripts to get the formatting of new hooks completely right, or where the English wording of the documentation of the macro, converted to documentation for a hook, needs fine-tuning, but having refactoring scripts should save a lot of work with the actual conversions.

Now, such refactoring scripts do not need to handle the fully general case so that one script can handle converting all 680 macros. It's quite reasonable to have a script that detects problems and gives up, and separate refactorings to make things ready for that script. And some of the preparation patches might well be completely manual.

I listed the main problem cases in <https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>. Code built for the target simply cannot use hooks, which are for the host; tests in #if, whether direct or indirect, again simply cannot use hooks. The following approaches apply to fixing such cases to prepare for hook conversions; I think of these as high priorities because they open the way to automatic refactorings converting more hooks.

In all cases, for multi-target changes it's a good idea to test, as a minimum, building cc1 for all affected architectures (with --enable-werror=always building starting from a native compiler of the same version, if possible; see contrib/config-list.mk for a more thorough check, though I don't think the full set of targets there is needed for every patch), as well as a clean bootstrap and regression test for at least one configuration.

(a) Where a target macro is used in code built for the host and in code built for the target, predefine a macro with -fbuilding-libgcc and then make the code built for the target use that new predefine; in general I think such patches would be manually written rather than generated by refactoring scripts. Note that:

(i) Sometimes an existing macro may suffice (e.g. LONG_LONG_TYPE_SIZE in target code could be changed to __SIZEOF_LONG_LONG__ * __CHAR_BIT__).

(ii) Sometimes a literal one-to-one conversion may not be cleanest (e.g. <https://gcc.gnu.org/wiki/Top-Level_Libgcc_Migration> lists GTHREAD_USE_WEAK SUPPORTS_WEAK TARGET_ATTRIBUTE_WEAK - do all really need separate defines?). But if it's not clear how to do a cleaner conversion, it may be best to do a literal conversion and leave further cleanup to later.

(iii) Some source files are used both for the host and for the target. Consider, for example, libgcc/libgcov.h. The code inside "#ifndef IN_GCOV_TOOL" is for the target, so it needs e.g. LONG_LONG_TYPE_SIZE converted as above (and BITS_PER_UNIT - whether BITS_PER_UNIT strictly counts as a target macro now is unclear, but it comes from host-side code so is best treated as one for target-side code). But other code is for the host - or for both host and target and so needs handling accordingly. Much the same applies to various files under gcc/ada/ - they are built for both host and target.

(iv) After such a change there may still be other obstacles to converting to a hook (e.g. the macro may be used in #if on the host). So there may be multiple incremental changes for a macro before it becomes possible to convert it to a hook.

(b) Where a target macro is used only in code built for the target, it can be moved to libgcc_tm.h (the headers in libgcc/config/). It's not problematic to have target macros defined there. I think it should be possible to generate such patches by refactoring scripts (or by hand - there aren't that many). The scripts would need to ensure that the macros are defined in libgcc_tm.h under the same conditions as in the host tm.h - this includes making sure the header lists in libgcc/config.host correspond appropriately to those in gcc/config.gcc (some manual checking might be involved there) and that any conditionals in the gcc/config/ headers controlling when a macro is defined are also appropriately reflected in the libgcc/config/ header.
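
As a rough illustration of the "same conditions" check for (b), here is a minimal Python sketch of how a script might record, for each definition of a macro in a header, the chain of enclosing #if conditions, so that the result for a gcc/config/ header can be compared with the result for the corresponding libgcc/config/ header. The function name and the sample macro names are hypothetical, and the sketch deliberately ignores backslash continuations, comments, #undef and #include, all of which a robust tool would have to handle.

```python
import re

# Matches any preprocessor directive: group 1 is its name, group 2 the rest.
DIRECTIVE = re.compile(r'^\s*#\s*(\w+)\s*(.*)$')


def _render(prior, current):
    """Render one conditional level as a condition string."""
    parts = ['!(%s)' % p for p in prior]
    if current is not None:
        parts.append('(%s)' % current if prior else current)
    return ' && '.join(parts)


def definition_conditions(text, macro):
    """Return, for each #define of macro, a tuple of enclosing conditions.

    Hypothetical helper: one string per nesting level, outermost first.
    """
    stack = []   # each entry: (conditions already rejected, active condition)
    found = []
    for line in text.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            continue
        kind, rest = m.group(1), m.group(2).strip()
        if kind == 'if':
            stack.append(([], rest))
        elif kind == 'ifdef':
            stack.append(([], 'defined(%s)' % rest))
        elif kind == 'ifndef':
            stack.append(([], '!defined(%s)' % rest))
        elif kind == 'elif':
            prior, current = stack[-1]
            stack[-1] = (prior + [current], rest)
        elif kind == 'else':
            prior, current = stack[-1]
            stack[-1] = (prior + [current], None)
        elif kind == 'endif':
            stack.pop()
        elif kind == 'define' and re.match(re.escape(macro) + r'\b', rest):
            found.append(tuple(_render(p, c) for p, c in stack))
    return found
```

A script could then flag a macro for manual checking whenever definition_conditions gives different results for the host-side and target-side headers, rather than trying to reconcile them automatically.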
Note that:

(i) In some cases, a macro in this category might be so closely related to one that's also used on the host that it's better to handle both the same (i.e. define macros with -fbuilding-libgcc instead of moving the macro to libgcc_tm.h).

(ii) Some libgcc files include tm.h but not libgcc_tm.h (unwind-seh.h and config/cr16/unwind-cr16.c at least; maybe more). If you change a macro used in such a file, you need to add the missing libgcc_tm.h includes.

(iii) Some code built for the target outside the libgcc/ directory may not include libgcc_tm.h - this could include e.g. libobjc and some Ada files. If such code uses the macro in question and can't readily be made to include libgcc_tm.h then it may be necessary to use the -fbuilding-libgcc approach instead for that code.

(iv) Sometimes a macro used for the target may have a definition that depends on other macros that are also used for the host (whether a dependency in #if conditionals, or an expansion using those other macros).

(v) When a target macro stops being used on the host it should be poisoned in system.h - this applies whether it was converted to a hook, moved to libgcc_tm.h, or eliminated from host code in any other way.

(c) Where a target macro is used in #if or #ifdef or #elif, it's a good idea to convert to a more restricted pattern of defining the macro to a default definition if not already defined, with that default going in defaults.h and all #if uses elsewhere being removed - such a restricted pattern is more amenable to automatic refactoring into a hook.

Of course, you need to take care not to change the semantics in the process. If a macro was tested with #ifdef before, definitions to empty or 0 or 1 would all have had the same effect. If you're changing the macro to be 0/1 valued then existing definitions need to be made to define it to 1 and the #if tests need to change to C "if (MACRO)" tests. Or, if you have e.g.

    #ifdef MACRO
      if (MACRO (args))
        {
          lots of code
        }
    #endif

then the new default definition might be "#define MACRO(args) 0". There are probably lots of other cases, each of which requires understanding the code enough to satisfy yourself, and explain in the patch submission write-up, why the semantics are not changed for any target.

In some cases, the defaults already exist - just not in defaults.h. A move to defaults.h is simple, but still needs checking that you don't e.g. have different defaults in different source files, or another source file using #ifdef/#ifndef on the macro in a way that would be affected by adding a default definition to defaults.h.

It's likely such patches are largely manually written. Each such patch reduces the risk of GCC changes breaking the build for targets they weren't tested for, by reducing the amount of code that's conditionally compiled (if (0) code still gets checked for syntax, not referring to undeclared variables, etc., whereas #if 0 code doesn't).

(d) Some target macros are used in contexts such as enum definitions, case labels and array or bit-field sizes that can't readily be changed to hooks. Let's ignore these for now. These mainly relate to the RTL parts of the compiler and we can take it that front ends and GIMPLE optimizers are higher priority to wean off target macros. A design for target-independence for these few macros will be harder.

It's best also to ignore BITS_PER_UNIT for now except for target-side code. (It's no longer defined in tm.h anyway - rather, the definition is output by genmodes - so uses of BITS_PER_UNIT don't require you to include tm.h.)

(e) Now let's suppose you have a target macro or macros to which the above issues do not apply - probably hundreds right now, and the vast bulk of target macros after cleanups (a), (b), (c) are applied to all macros for which they are applicable. You wish to do a target hook conversion.
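
Before attempting such a conversion, a script can first confirm that issue (c) really no longer applies to the macro, and give up if it does. The following is a minimal Python sketch of that check, under simplifying assumptions: it only sees macros named directly in #if / #ifdef / #ifndef / #elif lines (not ones reached indirectly through another macro's expansion), it does not strip comments or join continuation lines, and the function names, file names and macro names are all hypothetical.

```python
import re

# A conditional directive; group 2 is the condition text.
CONDITIONAL = re.compile(r'^\s*#\s*(if|ifdef|ifndef|elif)\b(.*)$')
# \b keeps the tail of a literal such as 0x10 from being read as a name.
IDENTIFIER = re.compile(r'\b[A-Za-z_]\w*')


def macros_in_conditions(text):
    """Return the identifiers named directly in preprocessor conditions."""
    used = set()
    for line in text.splitlines():
        m = CONDITIONAL.match(line)
        if m:
            used.update(IDENTIFIER.findall(m.group(2)))
    used.discard('defined')   # the operator, not a macro
    return used


def blockers(macro, sources):
    """Return the names of files whose conditions mention macro.

    sources maps a file name to that file's contents.  An empty result
    means this particular obstacle to a hook conversion is absent.
    """
    return sorted(name for name, text in sources.items()
                  if macro in macros_in_conditions(text))
```

A conversion script built along these lines would refuse to touch a macro for which blockers returns a non-empty list, leaving it for a preparatory patch of type (c); uses in ordinary C "if (MACRO)" tests are not reported, since those can call a hook.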
This includes moving the documentation of the macro to target.def (CC me on the patch and say you want docstring relicensing approval), appropriately edited, with an @hook line going in tm.texi.in. It includes setting a default hook definition (typically from one of the files such as hooks.c, if such a hook is available), and adding hook definitions for each target whose definition was not the default. You'll need to select the prototype for the hook manually - but then the replacement of macro calls by function calls will eliminate a potential source of architecture-specific build failures from type differences.

(i) Some targets define their target structure at the bottom of <arch>.c. Others define it near the top of <arch>.c, which requires forward declarations of all the functions used as hooks. Any refactoring scripts will need to allow for this variation. (My view would be that all targets should define it at the bottom of <arch>.c, and generally topologically sort static functions to reduce the need for forward declarations - if you get consensus for that on the mailing list you could do a preliminary refactoring pass to move all targets to that approach, so subsequent refactorings don't need to deal with this issue.)

(ii) Sometimes the target macro has the same definition for all OS targets for an architecture. These are the simple cases to convert. Sometimes it depends on the target OS or other aspects of configuration (e.g. being defined in <arch>/<os>.h - or being undefined there, or being defined in one header based on macros defined in another). Refactoring tools will need to take account of this. Typically hooks are functions not data so can have conditionals, e.g. "if (IS_LINUX_TARGET) return 2; else return 3;". That is, you can move from a tm.h target macro visible to the whole compiler to an architecture-specific macro visible only within the back end. While it would be desirable to eliminate such macros as well (with e.g.
a back-end-specific target structure) I think that's another thing to defer and separate from the main target macro removal.

(iii) Some target macros are used not just in the compilers proper but also in the driver, or are used only in the driver. ("driver" includes collect2 and lto-wrapper for these purposes, and front-end-specific drivers.) Those used in both places can go in the existing "common" target structure. Those used only in the driver would go in a new driver target structure. That driver target structure would probably be defined in a separate C file including driver_tm.h (given the extent to which such macros are OS-specific); the driver/config/ refactoring, as I noted in <https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>, would be similar to the move of macros to libgcc/config/ in that the refactoring tool needs to take care of #if structure of tm.h headers (and in that the definitions may depend on other tm.h macros used outside the driver). It might well make sense to defer the move for macros used in the driver, since they are mostly well-separated from those used elsewhere.

(iv) Sometimes the correct design for hooks is not a direct one-to-one conversion from macros. If there's a group of closely-related macros, such as *_TYPE_SIZE or the *_TYPE macros for various typedefs that currently expand to strings, it's best to start by discussing on the GCC mailing list what the right design for corresponding hooks is.

(f) For information I attach my scripts for listing and classifying target macros. Note that these have false positives (and maybe false negatives), as well as hardcoded paths, but they may be a helpful starting point for identifying target macros and where they are used. In the semi-automated refactoring approach I envisage above, I'd expect an early step to be replacing these scripts by a rather more robust and general set of Python modules that deal with understanding the target macro structure of GCC source code.

--
Joseph S. Myers
joseph@codesourcery.com

Attachment: list-target-macros (Text document)
Attachment: process-source-file (Text document)
Attachment: classify-target-macros (Text document)