Google Summer of Code 2016 Project

Replace libiberty with gnulib

Student: Ayush Goel (Email: ayushgoel1610 google com)

Mentor: Manuel López-Ibáñez

Link to project summary: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01551.html

Synopsis

Libiberty is a collection of subroutines used by various GNU programs, GCC and binutils to name a few. Gnulib is a central location for common GNU code, intended to be shared among GNU packages. Gnulib can provide most of the functionality that projects currently take from libiberty, so the GNU project gains no real advantage from maintaining two libraries for substantially the same purpose. This project replaces libiberty with gnulib within the scope of GCC.

Details about the project

1. Libiberty and Gnulib both take a different approach from other libraries. Their components are intended to be shared at the source level, rather than being built, installed and linked against. The idea is to copy these files into the source tree.

2. GCC leverages a lot of functionality provided by libiberty, and the GCC project keeps the libiberty source code inside its own source tree. The first task is to set up gnulib inside the GCC source tree. Since gnulib components are intended to be shared at the source level, this means importing gnulib sources into the GCC source tree. The setup will be similar to the way gdb uses gnulib: gdb has imported only those files from gnulib that it depends on. To be precise, it uses 74 files from the gnulib source directory, which it has placed inside its own directory.

3. Another important aspect of depending on gnulib is being able to update this set of files from time to time. This can be done with the gnulib-tool --import command. Once updated, the files need to be built and installed. The entire process can be automated with a bash script once the set of files to import is known.

4. Replacing libiberty with gnulib is going to be an iterative process. I will start by identifying small, mostly self-contained gnulib modules that can replace as few libiberty functions as possible, preferably ones that require no modification to libiberty or gnulib, only the replacement of one header with another in GCC. Another way to go about this is to delete just one function from libiberty and replace it with exactly one module from gnulib. To find functions that can be replaced one step at a time, I will go through the list of libiberty functions that GCC uses, see which of them can be replaced by a single gnulib module, and proceed in the same manner for the rest. Once these functions have been identified, I will modify GCC so that it finds the gnulib version of these files first. The libiberty functions that are not available in gnulib can be put in a header file, libiberty-gcc.h.

5. Another approach is to include libiberty only in those GCC files that actually need it. A file that doesn't use any function provided by libiberty can then simply stop including libiberty.h or any of its sub-headers.

6. After replacing a few functions, or maybe even a file from libiberty with gnulib, I would make sure that the gcc test suite runs fine and test my changes with as many host and target combinations as is practical. Wherever possible, I will automate the tests to cover the changes and add them to the GCC test-suite. I will also perform regression tests to ensure that my change does not break anything else. Typically, this would mean comparing post-change test results to pre-change results.
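The pre-change versus post-change comparison described in step 6 can be sketched as follows. This is a minimal illustration, not the project's actual tooling: it assumes each test run produced a DejaGnu-style summary in which every result line has the form `STATUS: test name`. The function names `parse_results` and `find_regressions` and the sample test names are hypothetical.

```python
import re

# Result line in a DejaGnu-style summary, e.g.
# "PASS: gcc.dg/example-1.c (test for excess errors)".
RESULT_LINE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.+)$"
)

# Outcomes treated as "bad" for the purpose of this sketch (an assumption).
BAD = {"FAIL", "XPASS", "UNRESOLVED"}


def parse_results(summary_text):
    """Map each test name to its status, ignoring non-result lines."""
    results = {}
    for line in summary_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            status, name = match.groups()
            results[name] = status
    return results


def find_regressions(before_text, after_text):
    """Return sorted names of tests that are bad after but were not bad before."""
    before = parse_results(before_text)
    after = parse_results(after_text)
    return sorted(
        name
        for name, status in after.items()
        if status in BAD and before.get(name) not in BAD
    )


# Hypothetical sample summaries for the pre-change and post-change runs.
before = """\
PASS: gcc.dg/example-1.c (test for excess errors)
FAIL: gcc.dg/example-2.c execution test
PASS: gcc.dg/example-3.c execution test
"""
after = """\
PASS: gcc.dg/example-1.c (test for excess errors)
FAIL: gcc.dg/example-2.c execution test
FAIL: gcc.dg/example-3.c execution test
"""
print(find_regressions(before, after))
```

A test that already failed before the change is not reported, so only failures introduced by the replacement stand out.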

Overview of the Project

These are the high-level steps to leverage the gnulib source library.

Steps (Progress)

Importing Gnulib files

(Note that even though this is not the first step in the above mentioned overview, this seemed like an important starting point of the project. This step is about setting up the import process with placeholders in place of module names as the exact set of files to be imported will be figured out in the subsequent steps.)

  1. In order to replace the libiberty library with gnulib, it is necessary to import the source files from gnulib into the gcc tree. Gnulib provides gnulib-tool for importing gnulib modules. Gnulib clients do this in one of two ways:
    • Most gnulib clients place the gnulib source files directly in the directory containing their own source files that use them. All the required configuration and makefile changes are made in the main configuration files in the source directory. However, this convention is usually followed by smaller projects.
    • Bigger projects like gdb keep the gnulib source files in a separate directory. That directory also holds a script to periodically update the set of gnulib source files, along with the required configuration files and makefiles.
  2. Since gcc is an extremely large package, I decided to follow the gdb convention for importing gnulib modules. I have created a gnulib directory in the root directory, alongside the other host/build libraries. It will contain all the source files from gnulib (once that set is finalized). It also contains a script that updates these gnulib source files periodically, using gnulib-tool and the entire gnulib tree.
  3. Following this step, gnulib was added as a build and host library and all-gcc was made to depend on gnulib.
  4. With all the previously mentioned changes, gnulib built successfully in the build directory. To make use of this library, the gnulib headers and the statically compiled library (libgnu.a) were added to gcc/Makefile.in.
  5. All the above changes have already been submitted as a patch.
  6. Once all these changes are done, gcc is ready to use any function that gnulib provides. To use a function from gnulib, or to replace an existing function with its corresponding function in gnulib, the following steps are required:
    1. Import the function from gnulib. To do this, edit the update-gnulib.sh file inside the gnulib directory and add the module to be imported under IMPORTED_GNULIB_MODULES.
    2. Run the script (./update-gnulib.sh), which imports the module and creates the configure files and makefiles.
    3. To make sure that gcc uses gnulib's version of the function and not some other copy, replace the path of the corresponding header file in gcc/Makefile.in so that it points to the gnulib version of the header.
    4. Build and test to check that gcc correctly uses gnulib's version of the function.
    5. For reference, these are two patches, replacing fnmatch and md5 functions.
  7. For more details regarding the above patches, refer to the Patches Explained and Future Work section.
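To make the first two sub-steps of step 6 concrete, the sketch below assembles the kind of gnulib-tool invocation that an update script would run. It is illustrative only: the real update-gnulib.sh is a shell script whose contents are not shown here, and the module list, the checkout path, and the option values passed to `--lib`, `--source-base` and `--m4-base` are assumptions. The function name `build_import_command` is hypothetical.

```python
import shlex

# Assumed module list for illustration; these are the modules this page
# reports as replaced (obstack, md5, fnmatch).
IMPORTED_GNULIB_MODULES = ["obstack", "md5", "fnmatch"]


def build_import_command(gnulib_checkout, modules, target_dir="."):
    """Build the argument list for a `gnulib-tool --import` run."""
    if not modules:
        raise ValueError("at least one gnulib module is required")
    return [
        f"{gnulib_checkout}/gnulib-tool",
        "--import",
        f"--dir={target_dir}",
        "--lib=libgnu",          # static library name, matching libgnu.a
        "--source-base=import",  # assumed subdirectory for imported sources
        "--m4-base=import/m4",   # assumed subdirectory for autoconf macros
        "--no-libtool",
        *sorted(set(modules)),   # de-duplicated, in a stable order
    ]


# "/path/to/gnulib" is a placeholder for a local gnulib checkout.
command = build_import_command("/path/to/gnulib", IMPORTED_GNULIB_MODULES)
print(shlex.join(command))
```

The sketch only prints the command. Keeping the module list in one variable mirrors the idea of editing a single place in the script whenever a new module is needed.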

Differential Analysis of libiberty and gnulib

The idea is to go through the functions provided by libiberty and see which of them can be replaced by corresponding functions in gnulib. However, this is not a simple syntax-based differential analysis. Functions that look syntactically similar must also be semantically similar; that is, the two pieces of code need to be functionally equivalent.
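One way to check functional equivalence rather than textual similarity is differential testing: feed the same inputs to both implementations and report every input on which they disagree. The sketch below illustrates the idea with two hypothetical stand-in functions. `basename_a` and `basename_b` are not the libiberty or gnulib implementations, and the helper `find_disagreements` is an assumption made for illustration.

```python
def basename_a(path):
    """Stand-in A: everything after the last '/'."""
    return path.rsplit("/", 1)[-1]


def basename_b(path):
    """Stand-in B: like A, but ignores trailing slashes."""
    stripped = path.rstrip("/")
    if not stripped:
        # Path was empty or consisted only of slashes.
        return "/" if path else ""
    return stripped.rsplit("/", 1)[-1]


def find_disagreements(impl_a, impl_b, inputs):
    """Return (input, result_a, result_b) for every input where results differ."""
    disagreements = []
    for value in inputs:
        result_a, result_b = impl_a(value), impl_b(value)
        if result_a != result_b:
            disagreements.append((value, result_a, result_b))
    return disagreements


inputs = ["a/b.c", "b.c", "a/b/", "/", ""]
for value, result_a, result_b in find_disagreements(basename_a, basename_b, inputs):
    print(f"{value!r}: A={result_a!r} B={result_b!r}")
```

The two stand-ins agree on ordinary paths and differ only on paths ending in a slash, which is the kind of edge-case divergence that a purely syntactic comparison would miss.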

Once this is done, gcc will be compiled with the new files and tested. The following table lists the functions that are similar in both libraries.

Similar functions between libiberty and gnulib

Libiberty Function/File | Gnulib Function/File | Summary
----------------------- | -------------------- | -------
bcopy | bcopy | Not used by GCC
obstack | obstack | Used. Memory Management GNU extension to the C standard library
fnmatch | fnmatch | Manuel Lopez: in both OSX and GNU/Linux, fnmatch is provided by the GNU libc already, so the copy in libiberty is not used in your systems, thus you have no way of testing it.
xasprintf | xasprintf |
memmem.c | memmem.c | Not used by GCC ? Imported from gnulib in 2009. Perhaps diverged?? Provided by glibc (therefore can't be tested)
sha1.c | sha1.c | Used only by Go FE. Imported from gnulib. Perhaps diverged??
crc32.c | crc.c | Not used by GCC ??
alloca.c | alloca.c |
asprintf.c | asprintf.c | Provided by glibc (therefore can't be tested)
atexit.c | atexit.c | Provided by glibc (therefore can't be tested)
basename.c | basename.c | Provided by glibc (therefore can't be tested)
calloc.c | calloc.c |
copysign.c | copysign.c |
ffs.c | ffs.c | Provided by glibc (therefore can't be tested)
getcwd.c | getcwd.c | Provided by glibc (therefore can't be tested)
getopt.c | getopt.c | Not used by gcc (Manuel: This is wrong, gentype.c, gcov-tool.c, and cp-demangle.c do use getopt or getopt_long). Provided by glibc (therefore can't be tested)
getpagesize.c | getpagesize.c | Provided by glibc (therefore can't be tested)
gettimeofday.c | gettimeofday.c | Provided by glibc (therefore can't be tested)
md5.c | md5.c | Used?? Functions to compute MD5 message digest of files or memory blocks according to the definition of MD5 in RFC 1321 from April 1992.
memchr.c | memchr.c | Provided by glibc (therefore can't be tested)
memcmp.c | memcmp.c | Provided by glibc (therefore can't be tested)
memcpy.c | memcpy.c | Provided by glibc (therefore can't be tested)
memmove.c | memmove.c | Provided by glibc (therefore can't be tested)
mempcpy.c | mempcpy.c | Not used by GCC. Provided by glibc (therefore can't be tested)
memset.c | memset.c | Provided by glibc (therefore can't be tested)
mkstemps.c | mkstemps.c | Provided by glibc (therefore can't be tested)
physmem.c | physmem.c |
putenv.c | putenv.c |
random.c | random.c | Not used by GCC
regex.c | regex.c |
rename.c | rename.c | Provided by glibc (therefore can't be tested)
setenv.c | setenv.c | Provided by glibc (therefore can't be tested)
snprintf.c | snprintf.c | Provided by glibc (therefore can't be tested)
stpcpy.c | stpcpy.c | Provided by glibc (therefore can't be tested)
stpncpy.c | stpncpy.c | Not used by GCC. Provided by glibc (therefore can't be tested)
strcasecmp.c | strcasecmp.c | Provided by glibc (therefore can't be tested)
strdup.c | strdup.c | Not used by GCC (Manuel: ./lto-plugin/lto-symtab.c)
strerror.c | strerror.c | Provided by glibc (therefore can't be tested)
strncasecmp.c | strncasecmp.c |
strndup.c | strndup.c | Not used by GCC
strnlen.c | strnlen.c | Used
strsignal.c | strsignal.c | Used
strstr.c | strstr.c | Used
strtod.c | strtod.c | Used
strtol.c | strtol.c |
strtoll.c | strtoll.c |
strtoul.c | strtoul.c |
strtoull.c | strtoull.c |
strverscmp.c | strverscmp.c |
vasprintf.c | vasprintf.c |
vfprintf.c | vfprintf.c | Used
vprintf.c | vprintf.c | Used
vsnprintf.c | vsnprintf.c | Used
vsprintf.c | vsprintf.c | Not used
waitpid.c | waitpid.c | Provided by glibc (therefore can't be tested)
xmalloc.c | xmalloc.c | Used
xstrndup.c | xstrndup.c | Used
xvasprintf.c | xvasprintf.c | Used
dupargv (argv.c) | Not available | Not used
buildargv (argv.c) | Not available | Used (Manuel: where?)
writeargv, expandargv, countargv (argv.c) | Not available | Used
bsearch.c | Not available | Used. Provided by glibc (therefore can't be tested)
bcmp.c | NA | Not used
bzero.c | NA | Used. Provided by glibc (therefore can't be tested)
choose-temp.c | NA | Not used
clock.c | NA | Used. Provided by glibc (therefore can't be tested)
concat.c | NA | Used
dyn-string.c | NA | Not used
fdmatch.c | NA | Not used
fibheap.c | NA | Not used
filename_cmp.c | NA | Used
floatformat.c | NA | Not used
fopen_unlocked.c | NA | Not used
getpwd.c | NA | Used
getruntime.c | NA | Not used
hashtab.c | NA | Used
hex.c | NA | Used
index.c | NA | Used
insque.c | NA | Not used
lbasename.c | NA | Used
lrealpath.c | NA | Used
make-relative-prefix.c | NA | Not used
make-temp-file.c | NA | Not used
msdos.c | NA | Not used
objalloc.c | NA | Not used
partition.c | NA | Used
pex-common.c | NA | Not used
pex-djgpp.c | NA | Not used
pexecute.c | NA | Not used
pex-msdos.c | NA | Not used
pex-one.c | NA | Not used
pex-unix.c | NA | Not used
pex-win32.c | NA | Not used
rindex.c | NA | Not used
safe-ctype.c | NA | Not used
setproctitle.c | NA | Not used
sigsetmask.c | sigsetmask.c | Used
simple-object.c | NA | Not used
simple-object-coff.c | Available | Used
simple-object-common.h | NA | Not used
simple-object-elf.c | NA | Not used
simple-object-mach-o.c | NA | Not used
simple-object.txh | NA | Not used
simple-object-xcoff.c | NA | Not used
sort.c | NA | Used
spaces.c | NA | Used
splay-tree.c | NA | Not used (Manuel: It is actually used ./gcc/ipa-utils.c)
stack-limit.c | NA | Not used (Manuel: actually used ./gcc/toplev.c)
strchr.c | NA | Provided by glibc (therefore can't be tested)
strncmp.c | NA | Used
strrchr.c | NA | Used
timeval-utils.c | NA | Not used
tmpnam.c | tmpnam.c | Used
unlink-if-ordinary.c | NA | Not used
vfork.c | Available | Used
xatexit.c | NA | Used
xexit.c | NA | Used
xmemdup.c | NA | Used
xstrdup.c | Available | Used
xstrerror.c | NA | Used

  1. The first function to be replaced from libiberty was obstack. Obstack is a memory-management GNU extension to the C standard library. An "obstack" is a "stack" of "objects" (data items) that is dynamically managed; it implements a region-based memory management scheme. This replacement was done to test that the gnulib build setup works. First, the obstack module was imported from gnulib using the update-gnulib.sh script. Once it was imported, the location of the header file was replaced inside gcc/Makefile.in. Finally, to ensure that the system actually relies on gnulib's version of obstack and not libiberty's, the files obstack.[ch] were removed locally. The entire system then built and tested fine.
  2. I replaced bcopy as well, so that the system would use gnulib's version of bcopy; however, as per link no 3 below, the changes were undone.
  3. md5.c, which is used to compute the MD5 message digest of files, has also been replaced.
  4. fnmatch has also been replaced. Despite the comment from Manuel mentioned above, it was replaced so that the code no longer shows that dependency.
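The two attributes recorded in the table above (whether gnulib offers a counterpart, and whether GCC uses the function) suggest a simple triage of the remaining entries. The sketch below is one possible reading, not the categorisation the project settled on: the category names and the `triage` helper are hypothetical, and the sample rows are copied from the table.

```python
from collections import defaultdict

# (libiberty entry, gnulib counterpart available?, used by GCC?)
# Sample rows taken from the table above.
ROWS = [
    ("obstack", True, True),
    ("bcopy", True, False),
    ("concat.c", False, True),
    ("fibheap.c", False, False),
]


def triage(rows):
    """Group libiberty entries by what could be done with them."""
    groups = defaultdict(list)
    for name, in_gnulib, used_by_gcc in rows:
        if not used_by_gcc:
            category = "unused by GCC"
        elif in_gnulib:
            category = "replace with gnulib module"
        else:
            # No gnulib counterpart; the proposal suggests collecting such
            # functions in a header such as libiberty-gcc.h.
            category = "keep (no gnulib counterpart)"
        groups[category].append(name)
    return dict(groups)


for category, names in triage(ROWS).items():
    print(f"{category}: {', '.join(names)}")
```

Entries whose status in the table is uncertain (marked with "??") or disputed by the mentor would need to be checked by hand before being fed into such a triage.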

Patches Explained and Future Work

This section will provide some more insights into the patches that have already been filed as a part of this project.

Note that the above patches are still undergoing review. Once these patches are accepted and applied, it will be extremely easy to replace any function from libiberty with the corresponding function from gnulib.

The table above lists all the libiberty functions available and also tells whether a corresponding function from gnulib is available or not, and also whether the particular function is being used by GCC or not. The functions from libiberty can be divided into the following-

None: replacelibibertywithgnulib (last edited 2017-09-06 12:19:44 by ManuelLopezIbanez)