Google Summer of Code 2016 Project
Replace libiberty with gnulib
Student: Ayush Goel (Email: ayushgoel1610 google com)
Mentor: Manuel López-Ibáñez
Link to project summary: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01551.html
Synopsis
Libiberty is a collection of subroutines used by various GNU programs, such as GCC and binutils. Gnulib is a central location for common GNU code, intended to be shared among GNU packages. For projects using libiberty, most of its functionality can be provided by gnulib, so there is no real advantage to the GNU project in maintaining two libraries for substantially the same purpose. This project involves replacing libiberty with gnulib within the scope of GCC.
Details about the project
1. Libiberty and Gnulib both take a different approach from other libraries. Their components are intended to be shared at the source level, rather than being built, installed and linked against. The idea is to copy these files into the source tree.
2. GCC leverages a lot of functionality provided by libiberty, and the GCC project keeps the libiberty source code inside its own source tree. The first thing that needs to be done is to set up gnulib inside the GCC source tree. Since gnulib components are intended to be shared at the source level, this involves importing the gnulib sources into the GCC source tree. This setup will be somewhat similar to how GDB uses gnulib. GDB has imported only those files from gnulib which it depends on. To be precise, it uses 74 files from the gnulib source directory, which it has placed inside its own directory.
3. Another important aspect of this gnulib dependency is being able to update this set of files from time to time. This can be done using the gnulib-tool --import command. Once updated, these files will need to be built and installed. This entire setup can be automated using a bash script, once it is known which set of files needs to be imported.
4. Replacing libiberty with gnulib is going to be an iterative process. I will start by identifying small and mostly self-contained gnulib modules which can be used to replace as few functions from libiberty as possible, preferably something that won't require me to modify either libiberty or gnulib, but simply to replace one header with another in gcc. Another way to go about this is to delete just one function from libiberty and replace it with exactly one module from gnulib. To find these small functions which can be replaced one step at a time, I will go through the list of libiberty functions that gcc uses, see which of them can be replaced by a single gnulib module, and proceed in a similar manner for the rest. Once these functions have been identified, I will modify gcc so that it finds the gnulib version of these files first. The functions in libiberty which aren't available in gnulib can be put in a header file, libiberty-gcc.h.
5. Another approach is to include libiberty only in those GCC files which actually need it. When a file doesn't use any function provided by libiberty, it can simply stop including libiberty.h or any other sub-header.
6. After replacing a few functions, or maybe even a file from libiberty with gnulib, I would make sure that the gcc test suite runs fine and test my changes with as many host and target combinations as is practical. Wherever possible, I will automate the tests to cover the changes and add them to the GCC test-suite. I will also perform regression tests to ensure that my change does not break anything else. Typically, this would mean comparing post-change test results to pre-change results.
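The comparison of pre-change and post-change test results described in step 6 can be sketched as follows. This is only an illustration, assuming the results are available as the text of DejaGnu .sum files, where each result line has the form "STATUS: test name". The function names parse_sum and regressions are hypothetical and are not part of GCC; GCC's own contrib/compare_tests script serves a similar purpose in practice.

```python
import re

# Result lines in a DejaGnu .sum file look like
# "PASS: gcc.dg/foo.c (test for excess errors)".
RESULT_LINE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$"
)


def parse_sum(text):
    """Map each test name to its status, as found in the text of a .sum file.

    If a test name occurs more than once, the last status wins.
    """
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            status, name = match.groups()
            results[name] = status
    return results


def regressions(before_text, after_text):
    """Return (name, new status) for tests that passed before but not after."""
    before = parse_sum(before_text)
    after = parse_sum(after_text)
    broken = []
    for name, status in sorted(before.items()):
        if status == "PASS" and after.get(name) != "PASS":
            # A test that disappeared entirely is reported as MISSING.
            broken.append((name, after.get(name, "MISSING")))
    return broken


# Hypothetical excerpts standing in for real gcc.sum files.
before_change = "PASS: gcc.dg/a.c\nFAIL: gcc.dg/b.c\nPASS: gcc.dg/c.c\n"
after_change = "PASS: gcc.dg/a.c\nFAIL: gcc.dg/b.c\nFAIL: gcc.dg/c.c\n"
print(regressions(before_change, after_change))
```

Only tests that go from PASS to something else are reported, so failures that already existed before the change (gcc.dg/b.c in the excerpt) are not counted against the patch.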
Overview of the Project
These are the high-level steps to leverage the gnulib source library.
- Differential analysis of libiberty and gnulib to identify code which provides the same functionality
- Import these modules from gnulib source library into gcc using gnulib-tool
- Hook gnulib into Autoconf
- Hook gnulib into Automake
- Ensure that proper include statements are used in gcc source files, to replace the dependence on libiberty for functions which are now being used from gnulib
Steps (Progress)
- Importing Gnulib files
(Note that even though this is not the first step in the above mentioned overview, this seemed like an important starting point of the project. This step is about setting up the import process with placeholders in place of module names as the exact set of files to be imported will be figured out in the subsequent steps.)
- In order to replace the libiberty library with gnulib, it is necessary to import the source files from gnulib into the gcc tree. Gnulib provides gnulib-tool for importing gnulib modules. Gnulib clients do this in one of two ways:
- Most gnulib clients place the gnulib source files directly in the directory containing their own source files that use them. All the required configuration and makefile changes are also made in the main configuration files in the source directory. However, this convention is usually followed by smaller projects.
- Bigger projects like gdb have a separate directory dedicated to the gnulib source files. There is also a script to update this set of gnulib source files from time to time. The required configuration files and makefiles are also kept in this directory.
- Since gcc is an extremely large package, I decided to follow the gdb convention for importing gnulib modules. I have created a gnulib directory in the root directory alongside the other host/build libraries. This will contain all the source files from gnulib (once the set is finalized). It also contains a script to update these gnulib source files whenever needed, using gnulib-tool and the entire gnulib tree.
- Following this step, gnulib was added as a build and host library and all-gcc was made to depend on gnulib.
- With all the previously mentioned changes, gnulib built successfully in the build directory. To make use of this library, the gnulib headers and the compiled static library (libgnu.a) were added to gcc/Makefile.in.
- All the above changes have already been submitted as a patch.
- Once all these changes are done, gcc is ready to use any function that gnulib provides. In order to use a function from gnulib, or to replace an existing function with its corresponding function in gnulib, the following steps are required:
- Import the function from gnulib. In order to do this, edit the update-gnulib.sh file inside the gnulib directory and enter the corresponding module to be imported under the IMPORTED_GNULIB_MODULES.
- Once this is done, run the script (./update-gnulib.sh) which will import the module and create the configure and makefiles.
- To make sure that gcc uses gnulib's version of the function and not another one, replace the path for the corresponding header file in gcc/Makefile.in so that it points to gnulib's header file.
- Following this, you'll have to build and test to see that gcc correctly uses gnulib's version of the function.
- For reference, these are two patches, replacing fnmatch and md5 functions.
For more details regarding the above patches, refer to the Patches Explained and Future Work section.
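The import step can be pictured as assembling one gnulib-tool command line from the module list. The sketch below is not the actual update-gnulib.sh script: it only builds the argument list, the function name gnulib_import_command is hypothetical, and the directory layout passed to --source-base and --m4-base is an assumption for illustration. The module names are the ones mentioned on this page; the exact gnulib module names may differ.

```python
import shlex


def gnulib_import_command(modules, gnulib_tool="gnulib-tool"):
    """Build the argument list for one gnulib-tool import run."""
    if not modules:
        raise ValueError("at least one gnulib module is required")
    return [
        gnulib_tool,
        "--import",
        "--lib=libgnu",          # static library name, matching libgnu.a
        "--source-base=import",  # assumed subdirectory for imported sources
        "--m4-base=import/m4",   # assumed subdirectory for Autoconf macros
        "--no-libtool",
        "--no-vc-files",
    ] + sorted(set(modules))     # de-duplicated, in a stable order


# Plays the role of IMPORTED_GNULIB_MODULES in update-gnulib.sh.
IMPORTED_GNULIB_MODULES = ["obstack", "md5", "fnmatch"]

command = gnulib_import_command(IMPORTED_GNULIB_MODULES)
print(shlex.join(command))
```

Keeping the module list as the only thing that changes between runs mirrors the workflow above: adding a function means adding one module name and re-running the script.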
Differential Analysis of libiberty and gnulib
The idea is to go through the functions provided by libiberty and see which of them can be replaced by corresponding functions in gnulib. However, this is not a simple syntax-based differential analysis. It is important to find functions that are not only syntactically similar but also semantically similar, that is, functionally equivalent.
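A first, purely name-based pass over the two libraries might look like the sketch below. It only produces candidates: two files sharing a name says nothing about functional equivalence, so each candidate still has to be compared by hand as described above. The function name name_level_candidates and the short file listings are illustrative assumptions; real listings would come from the libiberty directory and from gnulib's lib directory.

```python
def name_level_candidates(libiberty_files, gnulib_files):
    """Return the .c file names that appear in both source listings."""
    libiberty_sources = {name for name in libiberty_files if name.endswith(".c")}
    gnulib_sources = {name for name in gnulib_files if name.endswith(".c")}
    return sorted(libiberty_sources & gnulib_sources)


# Hypothetical excerpts of the two directory listings.
libiberty_listing = ["md5.c", "obstack.c", "hashtab.c", "concat.c", "libiberty.h"]
gnulib_listing = ["md5.c", "obstack.c", "fnmatch.c", "obstack.h"]

candidates = name_level_candidates(libiberty_listing, gnulib_listing)
print(candidates)
```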
Once this is done, gcc will be compiled with these new files and tested. The following table lists the functions that are similar in both libraries.
Similar functions between libiberty and gnulib
Libiberty Function/File | Gnulib Function/File | Summary |
bcopy | bcopy | Not used by GCC |
obstack | obstack | Used. Memory Management GNU extension to the C standard library |
fnmatch | fnmatch | Manuel Lopez: in both OSX and GNU/Linux, fnmatch is provided by the GNU libc already, so the copy in libiberty is not used in your systems, thus you have no way of testing it. |
xasprintf | xasprintf | |
memmem.c | memmem.c | Not used by GCC ? Imported from gnulib on 2009. Perhaps diverged?? Provided by glibc (therefore can't be tested) |
sha1.c | sha1.c | Used only by Go FE Imported from gnulib Perhaps diverged?? |
crc32.c | Not used by GCC ?? | |
alloca.c | alloca.c | |
asprintf.c | asprintf.c | Provided by glibc (therefore can't be tested) |
atexit.c | atexit.c | Provided by glibc (therefore can't be tested) |
basename.c | basename.c | Provided by glibc (therefore can't be tested) |
calloc.c | calloc.c | |
copysign.c | copysign.c | |
ffs.c | ffs.c | Provided by glibc (therefore can't be tested) |
getcwd.c | getcwd.c | Provided by glibc (therefore can't be tested) |
getopt.c | getopt.c | Not used by gcc (Manuel: This is wrong, gentype.c, gcov-tool.c, and cp-demangle.c do use getopt or getopt_long). Provided by glibc (therefore can't be tested) |
getpagesize.c | getpagesize.c | Provided by glibc (therefore can't be tested) |
gettimeofday.c | gettimeofday.c | Provided by glibc (therefore can't be tested) |
md5.c | md5.c | used?? Functions to compute MD5 message digest of files or memory blocks according to the definition of MD5 in RFC 1321 from April 1992. |
memchr.c | memchr.c | Provided by glibc (therefore can't be tested) |
memcmp.c | memcmp.c | Provided by glibc (therefore can't be tested) |
memcpy.c | memcpy.c | Provided by glibc (therefore can't be tested) |
memmove.c | memmove.c | Provided by glibc (therefore can't be tested) |
mempcpy.c | mempcpy.c | Not used by GCC. Provided by glibc (therefore can't be tested) |
memset.c | memset.c | Provided by glibc (therefore can't be tested) |
mkstemps.c | mkstemps.c | Provided by glibc (therefore can't be tested) |
physmem.c | physmem.c | |
putenv.c | putenv.c | |
random.c | random.c | Not used by GCC |
regex.c | regex.c | |
rename.c | rename.c | Provided by glibc (therefore can't be tested) |
setenv.c | setenv.c | Provided by glibc (therefore can't be tested) |
snprintf.c | snprintf.c | Provided by glibc (therefore can't be tested) |
stpcpy.c | stpcpy.c | Provided by glibc (therefore can't be tested) |
stpncpy.c | stpncpy.c | Not used by GCC. Provided by glibc (therefore can't be tested) |
strcasecmp.c | strcasecmp.c | Provided by glibc (therefore can't be tested) |
strdup.c | strdup.c | Not used by GCC (Manuel: ./lto-plugin/lto-symtab.c) |
strerror.c | strerror.c | Provided by glibc (therefore can't be tested) |
strncasecmp.c | strncasecmp.c | |
strndup.c | strndup.c | Not used by GCC |
strnlen.c | strnlen.c | Used |
strsignal.c | strsignal.c | Used |
strstr.c | strstr.c | Used |
strtod.c | strtod.c | used |
strtol.c | strtol.c | |
strtoll.c | strtoll.c | |
strtoul.c | strtoul.c | |
strtoull.c | strtoull.c | |
strverscmp.c | strverscmp.c | |
vasprintf.c | vasprintf.c | |
vfprintf.c | vfprintf.c | used |
vprintf.c | vprintf.c | used |
vsnprintf.c | vsnprintf.c | used |
vsprintf.c | vsprintf.c | Not used |
waitpid.c | waitpid.c | Provided by glibc (therefore can't be tested) |
xmalloc.c | xmalloc.c | used |
xstrndup.c | xstrndup.c | used |
xvasprintf.c | xvasprintf.c | used |
dupargv (argv.c) | Not available | Not used |
buildargv (argv.c) | Not available | Used (Manuel: where?) |
writeargv, expandargv, countargv (argv.c) | Not available | used |
bsearch.c | Not available | Used. Provided by glibc (therefore can't be tested) |
bcmp.c | NA | Not used |
bzero.c | NA | used. Provided by glibc (therefore can't be tested) |
choose-temp.c | NA | Not used |
clock.c | NA | used. Provided by glibc (therefore can't be tested) |
concat.c | NA | used |
dyn-string.c | NA | not used |
fdmatch.c | NA | not used |
fibheap.c | NA | Not used |
filename_cmp.c | NA | used |
floatformat.c | NA | Not used |
fopen_unlocked.c | NA | Not used |
getpwd.c | NA | used |
getruntime.c | NA | Not used |
hashtab.c | NA | used |
hex.c | NA | used |
index.c | NA | used |
insque.c | NA | Not used |
lbasename.c | NA | used |
lrealpath.c | NA | used |
make-relative-prefix.c | NA | Not used |
make-temp-file.c | NA | Not used |
msdos.c | NA | Not used |
objalloc.c | NA | Not used |
partition.c | NA | used |
pex-common.c | NA | Not used |
pex-djgpp.c | NA | Not used |
pexecute.c | NA | Not used |
pex-msdos.c | NA | Not used |
pex-one.c | NA | Not used |
pex-unix.c | NA | Not used |
pex-win32.c | NA | Not used |
rindex.c | NA | Not used |
safe-ctype.c | NA | Not used |
setproctitle.c | NA | Not used |
sigsetmask.c | sigsetmask.c | used |
simple-object.c | NA | Not used |
simple-object-coff.c | available | used |
simple-object-common.h | NA | Not used |
simple-object-elf.c | NA | Not used |
simple-object-mach-o.c | NA | Not used |
simple-object.txh | NA | Not used |
simple-object-xcoff.c | NA | Not used |
sort.c | NA | used |
spaces.c | NA | used |
splay-tree.c | NA | Not used (Manuel: It is actually used ./gcc/ipa-utils.c) |
stack-limit.c | NA | Not used (Manuel: actually used ./gcc/toplev.c) |
strchr.c | NA | Provided by glibc (therefore can't be tested) |
strncmp.c | NA | used |
strrchr.c | NA | used |
timeval-utils.c | NA | Not used |
tmpnam.c | tmpnam.c | used |
unlink-if-ordinary.c | NA | Not used |
vfork.c | Available | used |
xatexit.c | NA | used |
xexit.c | NA | used |
xmemdup.c | NA | used |
xstrdup.c | Available | used |
xstrerror.c | NA | used |
- The first function to be replaced from libiberty was obstack. Obstack is a memory-management GNU extension to the C standard library. An "obstack" is a "stack" of "objects" (data items) which is dynamically managed. It implements a region-based memory management scheme. This replacement was done to test that the gnulib build setup indeed works. First, the obstack module was imported from gnulib using the update-gnulib.sh script. Once imported, the necessary header file location was replaced inside gcc/Makefile.in. Finally, to ensure that the system actually relies on gnulib's version of obstack and not libiberty's, the files obstack.[ch] were removed locally. The entire system built and tested fine.
- I replaced bcopy as well so that the system would use gnulib's version of bcopy. However, as per link no. 3 below, the changes were undone.
- md5.c, which is used to compute the MD5 message digest of files, has also been replaced.
- fnmatch has also been replaced. Despite Manuel's comment mentioned above, it has been replaced so that the code no longer shows that dependency.
Links
Joseph Myers proposed replacing libiberty with gnulib: Ideally I think most of libiberty would be replaced by use of gnulib in the projects using libiberty - I see no real advantage to the GNU Project in having those two separate libraries for substantially the same purposes - but that's a much larger and harder task, which would also involve, for each libiberty file replaced by a gnulib version, ascertaining whether there are any features or local changes in the libiberty version that should be merged into the gnulib version or any common upstream such as glibc. And some files in libiberty would probably need adding to gnulib as part of such a project. [ source ]
Jeff Law said: I suspect we'll probably want to go with direct use of gnulib obstack at some point. [ source ]
- Joseph Myers: GCC should not depend on bcopy. Any bcopy use is a bug and it should be replaced by memcpy or memmove as appropriate. The poisoning in system.h should prevent such uses from building in the first place.
Patches Explained and Future Work
This section will provide some more insights into the patches that have already been filed as a part of this project.
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01554.html (Older version: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01302.html)
- This patch contains the essential code changes required to insert gnulib as a host/build library.
GDB has already written scripts to import gnulib. After discussion, it was decided to make use of these scripts for gcc. https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gdb/gnulib;h=cdf326774716ae427dc4fb47c9a410fcdf715563;hb=HEAD .
- A directory with the name gnulib was created in the src directory and the above scripts were placed inside it.
- The update-gnulib.sh was edited to import only those modules which are relevant to gcc.
- Once the gnulib directory was created and the necessary modules imported, changes were made to add gnulib as a host/build library.
- To do this, gnulib was added as a build/host library inside src/Makefile.def and src/configure.ac, and the corresponding Makefile.in and configure files were regenerated.
- Changes were also made to src/gcc/Makefile.in: the paths to the gnulib static library (libgnu.a) and the gnulib header files were added.
- All these changes were successfully compiled and tested.
Review: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01208.html
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01649.html
- This patch replaces libiberty's md5 with gnulib's md5
- In order to do this md5 was imported using the update-gnulib.sh script.
- The header path to md5.h file was also modified inside the gcc/Makefile.in
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01648.html
- This patch replaces libiberty's fnmatch with gnulib's fnmatch, using exactly the same kind of changes as the previous patch.
Note that the above patches are still undergoing review. Once these patches are accepted and applied, it will be extremely easy to replace any function from libiberty with the corresponding function from gnulib.
The table above lists all the libiberty functions available, whether a corresponding function is available in gnulib, and whether the function is used by GCC. The functions from libiberty can be divided into the following categories:
Functions not used by gcc and not present in gnulib: Nothing has to be done for these functions.
Functions not used by gcc but present in gnulib: These functions should be replaced as a part of the process; however, it will be difficult to test them, since gcc does not exercise them.
Functions used by gcc and present in gnulib: These can be further divided into two categories depending on whether they are provided by glibc or not. If they are not, replacing and testing is a straightforward process. However, if they are provided by glibc, then certain hacks are needed in order to test them.
Functions used by gcc and not present in gnulib: All these functions would need to be introduced into gnulib, which requires filing patches against gnulib.
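The four categories above amount to a small decision procedure over the columns of the table. The sketch below encodes it; the function name classify, the returned strings and the flags passed in the example calls are illustrative and follow the table entries for those files, except for the last call, which is a hypothetical case.

```python
def classify(used_by_gcc, in_gnulib, in_glibc=False):
    """Return the follow-up work implied by one row of the table."""
    if not used_by_gcc and not in_gnulib:
        return "nothing to do"
    if not used_by_gcc and in_gnulib:
        return "replace, but hard to test"
    if used_by_gcc and in_gnulib:
        if in_glibc:
            # The system copy is picked up, so the replacement is not exercised.
            return "replace, needs a workaround to test"
        return "replace and test directly"
    # Used by gcc but missing from gnulib.
    return "submit to gnulib first"


print(classify(used_by_gcc=False, in_gnulib=False))  # fibheap.c
print(classify(used_by_gcc=False, in_gnulib=True))   # random.c
print(classify(used_by_gcc=True, in_gnulib=True))    # xmalloc.c
print(classify(used_by_gcc=True, in_gnulib=False))   # hashtab.c
# Hypothetical row: used by gcc, present in gnulib, also provided by glibc.
print(classify(used_by_gcc=True, in_gnulib=True, in_glibc=True))
```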