Google Summer Of Code
Thanks for your interest in the GNU Compiler Collection as your mentoring organization in Google's Summer of Code (GSoC). GCC has applied independently to be a GSoC mentoring organization in 2021. The primary org-admin is Martin Jambor. In the past, GCC has also applied under the umbrella of the GNU project mentoring organization.
If you are a student with a project idea or want to work on any of the ideas below, please discuss it as soon as possible (way before the application) via the mailing list and feel free to raise it on IRC. Also make sure you have read the Before you apply and Application sections on this page.
GCC is owned by the Free Software Foundation (FSF). As such, all contributors must assign their copyright to the FSF before any of their changes are accepted. The copyright assignment process is described in Contributing to GCC. See also GettingStarted.
Selected Project Ideas for 2021
When discussing GSoC project ideas for 2021 in the community, we found that we are especially interested in the following few. One of their main advantages, apart from their particular relevance and usefulness this year, is that we are confident we can find mentors for them. We will, however, also consider other projects, and we will be happy to discuss your own ideas with you. Nevertheless, please do consider applying for the following:
Make cp-demangle non-recursive. C++ mangled names use a recursive grammar, which naturally led to a recursive demangler in libiberty (used for __cxa_demangle and other entry points). Very long symbols and malformed inputs can cause deep recursion (even unbounded in the malformed case). Stack overflow is unpredictable and has terrible failure modes, which has led to imposing arbitrary recursion limits in the demangler. The demangler should be converted to use a bespoke heap-allocated data stack for the recursive state, and the recursion removed. Resource allocation failure can be indicated via the existing failure code. The demangler is implemented in C, and the project consists of turning recursion into a LIFO worklist. You will also learn about C++ mangled names. Stretch goal: add demangler support for C++20 module manglings. This project would be mentored by Nathan Sidwell. Difficulty: Medium
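As a minimal sketch of the recursion-to-worklist transformation described above, the following C fragment walks a binary tree using a heap-allocated stack instead of recursive calls. The node type and the function name are hypothetical and are not taken from the demangler sources; allocation failure is reported through the return value, in the spirit of reusing an existing failure code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree node; the real demangler has its own data types.  */
struct node
{
  int value;
  struct node *left;
  struct node *right;
};

/* Sum all values in the tree without recursion.
   Returns 1 on success and 0 on allocation failure.  */
int
sum_tree (const struct node *root, long *result)
{
  size_t cap = 16, len = 0;
  const struct node **stack = malloc (cap * sizeof *stack);
  long sum = 0;

  if (stack == NULL)
    return 0;
  if (root != NULL)
    stack[len++] = root;

  while (len > 0)
    {
      const struct node *n = stack[--len];
      sum += n->value;

      /* Each pop pushes at most two children; grow the stack if needed.  */
      if (len + 2 > cap)
        {
          size_t new_cap = cap * 2;
          const struct node **tmp = realloc (stack, new_cap * sizeof *tmp);
          if (tmp == NULL)
            {
              free (stack);
              return 0;
            }
          stack = tmp;
          cap = new_cap;
        }
      if (n->right != NULL)
        stack[len++] = n->right;
      if (n->left != NULL)
        stack[len++] = n->left;
    }

  free (stack);
  *result = sum;
  return 1;
}
```

The depth of the input now only affects heap usage, so a pathological input ends in a reported allocation failure rather than a stack overflow.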
Bypass assembler when generating LTO object files. Currently we create link-time optimization (LTO) object files with the help of the assembler, which in this case only wraps the provided binary contents in ELF files. The aim of this project is to create them directly from the compiler. A preliminary patch is at https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html. Finishing this would require work on the libiberty simple object file handling, in the GCC wrapper and in GCC itself. If finished, compile-time performance should improve by several percent. This project would be mentored by Jan Hubička. Required skills include: C/C++, working with the ELF file format.
Extend the static analysis pass. GCC 10 has gained an experimental static analysis pass which performs some rudimentary checking of malloc/free and the stdio FILE stream API. There is plenty of scope for extending this pass in ways that may interest a student, such as
- checking of the POSIX file-descriptor APIs (int rather than FILE *), or some other POSIX API that we're not yet checking,
- writing a plugin to add a project-specific checker for a project of interest to the student (Linux kernel?),
- C++ support (new/delete checking, exceptions, etc.),
- adding support for SARIF as an output format.
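To illustrate the kind of code a file-descriptor checker would have to reason about, here is a small sketch using the POSIX open/read/close calls. The function name is hypothetical, and the comment marks the error path where a forgotten close would leak the descriptor; this is an illustration of the problem, not a description of how the analyzer is implemented.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read the first byte of PATH.
   Returns the byte (0..255) on success or -1 on error.  */
int
first_byte (const char *path)
{
  unsigned char c;
  int fd = open (path, O_RDONLY);

  if (fd < 0)
    return -1;

  if (read (fd, &c, 1) != 1)
    {
      /* Dropping this close would leak FD on the error path: the kind
         of defect a file-descriptor checker could report.  */
      close (fd);
      return -1;
    }

  close (fd);
  return c;
}
```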
Rust Front-End. A new compiler front-end for Rust is in development at https://github.com/Rust-GCC/gccrs, and there are many open development opportunities, see https://github.com/Rust-GCC/gccrs/wiki/Google-Summer-of-Code
- Improve the debugging experience by extending the existing HIR dump to include HIR mappings information akin to rustc. We can then leverage this work to emit a name and type resolution dump. This will greatly improve the debugging experience with gcc-rust. We can use the official rustc compiler as a reference for the HIR dump. Difficulty: Medium
- Add more unreachable-code static analysis: if a function is unused, all functions that are called only from it are unused as well. For example, if bar is unused and is the only caller of foo, then foo is unused too. Difficulty: Medium
- Warn for names that could be made immutable: if a name is only written to once, we can warn the user that it could be made immutable. Difficulty: Medium
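The unused-function idea above amounts to a reachability computation over the call graph. The following C sketch marks every function reachable from an entry point; anything left unmarked is unused. The adjacency-matrix representation, the MAX_FUNCS limit and the function name are assumptions made for illustration and do not reflect how gccrs represents its call graph.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* calls[i][j] is true when function i calls function j.
   On return, used[i] is true for every function reachable from ENTRY.
   NFUNCS must not exceed MAX_FUNCS.  */
void
mark_used (bool calls[MAX_FUNCS][MAX_FUNCS], size_t nfuncs,
           size_t entry, bool used[MAX_FUNCS])
{
  size_t worklist[MAX_FUNCS];
  size_t len = 0;

  for (size_t i = 0; i < nfuncs; i++)
    used[i] = false;

  used[entry] = true;
  worklist[len++] = entry;

  /* Every function is pushed at most once, so the worklist cannot
     overflow.  */
  while (len > 0)
    {
      size_t f = worklist[--len];
      for (size_t g = 0; g < nfuncs; g++)
        if (calls[f][g] && !used[g])
          {
            used[g] = true;
            worklist[len++] = g;
          }
    }
}
```

With main as the entry point, a bar that nobody calls stays unmarked, and so does a foo that is called only by bar.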
Fortran – run-time argument checking. Older Fortran code that does not use modules, as well as code that uses implicit-size or explicit-size arrays, is prone to argument mismatches. The goal of this item is to add an optional run-time test that stores the argument type and size data in a global variable before the call and checks against it in the callee. (A pointer to the called function is stored alongside to permit calls from uninstrumented code to instrumented code.) This project would be mentored by Tobias Burnus. Required skills include C/C++; some knowledge of Fortran helps, but is not needed.
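The mechanism described above can be sketched in C roughly as follows. All names (call_info, pending_call, fill_ten, call_fill_ten) are hypothetical, the record holds only an element count rather than full type data, and the single global is not thread-safe; it is an illustration of the idea, not a proposal for the gfortran ABI.

```c
#include <stddef.h>

typedef void (*generic_fn) (void);

/* Hypothetical record an instrumented caller fills in before a call.  */
struct call_info
{
  generic_fn callee;   /* function the caller is about to call */
  size_t arg_elems;    /* number of array elements actually passed */
};

static struct call_info pending_call;

/* Instrumented callee that expects an array of at least 10 elements.
   Returns 0 on success and -1 on a detected argument mismatch.  */
int
fill_ten (double *a)
{
  /* Check only if the record was written for this very function; a
     call from uninstrumented code leaves no matching record and is
     therefore not checked.  */
  int checked = pending_call.callee == (generic_fn) fill_ten;
  size_t elems = pending_call.arg_elems;

  pending_call.callee = NULL;   /* consume the record */

  if (checked && elems < 10)
    return -1;

  for (size_t i = 0; i < 10; i++)
    a[i] = 0.0;
  return 0;
}

/* What an instrumented call site would do.  */
int
call_fill_ten (double *a, size_t elems)
{
  pending_call.callee = (generic_fn) fill_ten;
  pending_call.arg_elems = elems;
  return fill_ten (a);
}
```

Storing the callee pointer next to the size data is what lets the callee distinguish a fresh record from a stale or missing one.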
Fortran – improved compile-time argument checking. The compiler already checks arguments within the same file, but it could do better in some cases, e.g. by checking the interface data more thoroughly or by refining the expected input from the uses. This project would be mentored by Tobias Burnus. Required skills include C/C++; some knowledge of Fortran helps, but is not needed.
Fortran – shared-memory coarrays. NOTE: A partial implementation is available in the devel/coarray_native branch; if interested, ask about the remaining tasks in this area. Coarrays are a means of parallelizing code; conceptually, all memory is local memory, except for coarrays, which exist on multiple processes ("images") and whose remote instances can be accessed directly. (Internally: one-sided communication.) GCC/gfortran supports "single" (compiles but does not do any actual parallelization) and "lib" (requires a communication library). The goal of this task is to add a shared-memory implementation, such that parallel coarray programs run out of the box without additional external libraries. This project would be mentored by Tobias Burnus. It consists mostly of work on a run-time library written in C, but also on the compiler itself, written in C/C++. Hence, required skills include C/C++ and knowledge of POSIX Threads; some knowledge of Fortran helps, but is not needed.
Enable incremental Link Time Optimization (LTO) linking. At the moment, LTO re-optimizes and generates code for the whole program or library if just one object file changes, even in an insignificant way. The student working on this project will write code determining what changes in LTO input are significant for various stages of LTO processing and try not to redo work that would lead to the same results as before. The mentor of this project would be Jan Hubička. Required skills include C/C++ and familiarity with the LTO model.
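One simple way to picture "not redoing work that would lead to the same results" is to hash the part of the input that matters for a given stage and skip the stage when the hash is unchanged. The sketch below uses the 64-bit FNV-1a hash; the cache_entry structure and the needs_rebuild function are hypothetical and say nothing about how GCC's LTO streams or caches its data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash of a byte buffer.  */
uint64_t
fnv1a (const unsigned char *data, size_t len)
{
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < len; i++)
    {
      h ^= data[i];
      h *= 1099511628211ULL;
    }
  return h;
}

/* Hypothetical per-input cache record.  */
struct cache_entry
{
  bool valid;
  uint64_t hash;
};

/* Return true when the work for this input must be redone, and
   remember the new hash; return false when the input is unchanged.  */
bool
needs_rebuild (struct cache_entry *e, const unsigned char *data, size_t len)
{
  uint64_t h = fnv1a (data, len);

  if (e->valid && e->hash == h)
    return false;

  e->valid = true;
  e->hash = h;
  return true;
}
```

The interesting part of the project lies in choosing what to feed into such a hash, so that insignificant changes do not alter it.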
Binutils support for AIX 7.2. GNU Binutils (Gas, Gld, etc.) currently support AIX 4.3.3 and partially support AIX 5.1. The GNU BFD library has existing support for XCOFF and GDB functions. This project would update AIX support in GNU Binutils for the latest release, AIX 7.2. The main goal is the ability to bootstrap GCC with GNU Binutils (accept all current GCC instructions, directives and options for the AIX assembler and linker) and produce a correct, functioning GCC executable and GCC runtime shared object libraries (libgcc, libstdc++). This project would be mentored by David Edelsohn. Required skills include: C/C++, Binutils, AIX or at least willingness to learn it (access to an AIX system will be provided).
Unless a project above is explicitly marked with a different difficulty, consider it to be hard. Generally speaking, GCC is a production compiler and working on one of those is always hard, especially if you are new. On the other hand, the community of GCC developers is very helpful and goes out of its way to assist newcomers with the various difficulties they inevitably encounter.
If the list above is not exhaustive enough for you, you can also have a look at the Other Project Ideas section below.
Before you apply
...and perhaps before you even reach out to us on the mailing list, make sure that you can check out the GCC source code from its Git repository, build GCC from it and run the testsuite (this is something that would need doing very many times in the course of any project working on GCC).
The following links should help you:
How to check out our sources using Git is described at https://gcc.gnu.org/git.html.
Steps linked from https://gcc.gnu.org/install/ show you how to configure, build and test GCC (look for --disable-bootstrap, among other things). The Installing GCC page shows an easy way to obtain the libraries required to build GCC, which people often find most problematic, and gives other advice related to building and installing GCC for the first time.
Make sure you also look at the Getting Started wiki page.
Wiki page DebuggingGCC, David Malcolm's blogpost on Debugging GCC and the manual page about Developer options are of particular interest. Read through those, compile a simple but non-trivial program with
-O3 -S -fdump-tree-all -fdump-ipa-all -fdump-rtl-all
and look through the generated files. Look at the source code, especially in the gcc subdirectory, and try to set a breakpoint somewhere and hit it. Then look around in gdb.
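If you do not have a suitable test program at hand, something like the following could serve as a starting point. It is only a suggestion: the function names are made up, and the file contains no main because -S stops before linking. Saved as, say, example.c, it can be compiled with gcc -O3 -S -fdump-tree-all -fdump-ipa-all -fdump-rtl-all example.c.

```c
#include <stddef.h>

/* Small helper that the inliner is likely to fold into its caller.  */
static int
square (int x)
{
  return x * x;
}

/* A simple loop over an array, which gives the loop optimizers and
   the vectorizer something to work on at -O3.  */
long
sum_of_squares (const int *v, size_t n)
{
  long sum = 0;

  for (size_t i = 0; i < n; i++)
    sum += square (v[i]);
  return sum;
}
```

Comparing the dumps from the early passes with the later ones shows, for instance, at which point the call to square disappears.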
If you have done all of the above and still find it a little bit intimidating or if you have difficulties figuring out where to start looking for particular things, do not despair. That is something the mentors and the community at large are willing to help you with.
Application
Students applying for a GCC Google Summer of Code project need to have experience coding in C/C++. Furthermore, if you want to work on the actual compiler you must have at least rudimentary theoretical background in the area of compilers and compiler optimizations. This may not be strictly necessary if your project aims to improve a different tool or library that is part of GCC, such as the demangler.
First, you need to select a project. If you have been following GCC development, you might have an idea of your own; otherwise, look at the suggested projects above and try to pick one there. In the course of selecting a project, do not hesitate to ask questions or request more details from the community by email to the gcc@gcc.gnu.org mailing list with the string "GSoC" in the email subject or on our #gcc IRC channel at irc.oftc.net. Please note that the mailing list does not accept HTML messages, so you must set your email client to plain text. We also encourage you to browse through our web site at https://gcc.gnu.org/ and of course this wiki.
After you have chosen your project, please make sure you send an email about your intention to apply to the gcc@gcc.gnu.org mailing list with the string "GSoC" in the email subject, in addition to any general steps required to apply to the GSoC program.
Last but not least, GCC is owned by the Free Software Foundation (FSF). As such, all contributors must assign their copyright to the FSF before any of their changes are accepted. The copyright assignment process is described in Contributing to GCC; see also GettingStarted.
Formal application document
GCC does not have any application form or a mandatory application format to follow.
In the formal application document that you submit to GSoC you should primarily describe the project and clearly define its goals. Generally speaking, it is probably a good idea to accompany the proposed project description with a brief motivation, an expected time-line (we understand it is likely to change) and a brief introduction of your technical background, skills and/or accomplishments. The project description is the most important part however and each project is perhaps best explained differently. We will mostly judge your ability to finish the project from your interactions with us, on mailing lists and IRC, rather than from a CV.
Further tips and guidelines
A gcc Summer of Code participant for 2006, Laurynas Biveinis, wrote a blog about it.
The Drupal project has a great page on How to write an SOC application.
- Be honest and realistic. We prefer a smaller project with clearly defined goals to a far-reaching but vague proposal (that is likely never going to be finished by the student).
- Students that have already submitted good patches give a much better impression to reviewers and potential mentors.
Starting with a small patch in the area you are interested in before the proposal submission period can help (ask for guidance and a simple enough project). It helps you get to know the code and decide whether you really want to do the project, it shows how the development procedure works, and it helps potential mentors judge the proposal based on actual work. Besides, small fixes are also valuable, and getting to know people through email (or IRC) exchanges is nice in itself.
And let's stress again that you need to present your project on the gcc@gcc.gnu.org mailing list to be sure it is a good idea. Prepend "GSoC" to the subject.
Other Project Ideas
Note that some of the ideas found below might be fully or partially obsolete. This is another reason why it is always a good idea to discuss the project of interest on the mailing list and/or via IRC before submitting a GSoC proposal.
Link-time and interprocedural optimization improvements
Link-time optimization (LTO) is a powerful infrastructure in GCC and there are many ways to improve it, for example:
- Implement tree level section anchors to improve code generation on ARM/PPC.
Language front-ends and run-time libraries
- Refactor libstdc++ for optional smaller footprint configuration, e.g., iostreams without templates
Fortran front end (please discuss ideas on the Fortran mailing list):
Extend OOP support (possible mentors: Janus Weil, Tobias Burnus) by improving partially-implemented features, such as:
- Associate construct
Coarray support for SMP and distributed memory systems (possible mentor: Tobias Burnus)
Coarrays are a PGAS extension, which is part of Fortran 2008.
Possible task: Improving the multi-image support, which uses the OpenCoarrays communication library
- Implement Fortran 2018 additions (TS 18508)
- Run-time argument checking: Uninitialized variables, correctness of arguments
- Improve handling of allocatable characters
- Improve parameterized derived types (possible sponsor: Paul Thomas)
- Better IEEE support
Other Unimplemented Fortran 2003, Fortran 2008 features, TS 29113 features
- IO optimization. Currently formatted scalar IO is quite slow and uses lots of stack space.
OpenMP runtime improvements: openmp
- GCC Go escape analysis: in Go, taking the address of something means that it lives on the heap--Go has no such thing as a dangling pointer. It is possible to use escape analysis to determine whether the pointer ever escapes its scope. If it does not, then the object whose address is taken can be allocated on the stack rather than the heap, which is more efficient. A particular example is calls to functions like fmt.Printf, which allocate a slice of the arguments passed in. Escape analysis can allocate that slice on the stack rather than the heap.
Enhance the GimpleFrontEnd with CFG and SSA annotation reconstruction to make writing and extracting unit-tests easier.
New optimization passes
- Implement code motion of stores towards entry (and use this to improve code for int to float conversion on rs6000-based targets)
- Implement a prototype for early instruction selection
- Propagate interprocedural dataflow from GIMPLE to RTL
- Add Factored Use-Def (FUD) chains to RTL
Loop optimizations and automatic parallelization based on Graphite
- Implement a basic-block local scheduling pass to improve SSA name coalescing opportunities at RTL expansion time
Implement a (prototype) addressing mode selection (AMS) pass as a replacement of auto-inc-dec. For more details see PR 56590.
Other projects and project ideas
Type Sanitizer. The LLVM and GCC compilers share a common sanitizer library called libsanitizer. The library has recently received support for type-based sanitization (TySan). The goal of the task would be to investigate and prototype the use of type-based aliasing rules information provided by GCC in order to detect violations of strict aliasing rules.
Replace libiberty with gnulib. See http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00362.html Initial work was done in GSoC 2016 (replacelibibertywithgnulib).
Finish the implementation of a stable introspection plugin API (with the possibility of extending it to cover non-introspection cases)
- Modify any GCC optimization decisions externally through plugins (see MILEPOST GCC, for example). -- G. Fursin, 2014.
- Systematize learning of optimal optimization decisions for multiple benchmarks, data sets and architectures (see c-mind.org/repo, for example). -- G. Fursin, 2014.
- Extend GCC plugin framework to enable code instrumentation (insert calls to external function after individual instructions) for dynamic code analysis. We need it to extend our TM/TLS models. -- G. Fursin, 2014.
- Fix -ftrapv so that it works.
- Improve the regression testing system, for example to detect places where the generated code changed (useful for refactoring).
Promote C++ operator new to alloca when pointer does not escape and user allows non-conformance to C++ standard
- Improve loop unrolling heuristics and enable loop unrolling with default optimization
- Analyze and improve inlining, loop unrolling, reassociation and predictive commoning heuristics for PowerPC architecture
- Use TARGET_EXPAND_TO_RTL_HOOK for pipelined divide on PowerPC
- Support AIX XCOFF file format for LTO (David Edelsohn)
Implement something similar to Clang's -ftime-trace feature which generates performance reports that show where the compiler spends compile time. For more information, please check the following blog post. There's also an existing bugzilla entry for this (if this becomes a GSoC project, the assignee will of course change). Required skills include C/C++ and finding a way through a large code-base.
There are several pages with general ideas for GCC, many of which we linked below for easy access. These ideas usually are not just one project but a group of distinct projects.
- This category of projects deals with a range of changes, from simple to challenging. These projects are of great interest to us, because they address some long-standing architectural issues that we want to fix.
Other project ideas can be found in the bug database; look for old bugs which are still open.
Or invent your own project. We're always open to good ideas. But note that we are probably not too interested in projects to add new extensions to the C or C++ languages. We've found over time that these tend to introduce more problems than they solve.
Thanks, and we look forward to your submissions!
Improving GCC Developer Documentation
The rules of the GSoC program do not allow projects to consist of documentation improvements only. Nevertheless, note that writing documentation may be an important part of your project, or even an essential one if you introduce user-visible changes, so plan your work accordingly.
Accepted GCC Projects
2020
Project |
Student |
Mentors |
Giuliano Belinassi |
Richard Biener |
|
John Ravi |
Martin Liška and Nathan Sidwell |
|
Implementation of OMPD in GCC and libgomp |
Tony Sim |
Jakub Jelínek and Martin Jambor |
2019
Project |
Student |
Mentors |
On vector<bool> and optimized Standard Algorithms in libstdc++ |
ThePhD |
Thomas Rodgers, Jonathan Wakely and Ville Voutilainen |
Tejas Joshi |
Martin Jambor and Jan Hubička |
|
Shubham Narlawar |
Martin Liška and Andi Kleen |
|
Khurai Kim |
Jakub Jelínek |
|
Make C/C++ not automatically promote memory_order_consume to memory_order_acquire |
akshatg |
Paul E. McKenney and Ramana Radhakrishnan |
Parallelize GCC with Threads (see also ParallelGcc) |
Giuliano Belinassi |
Richard Biener |
2018
Project |
Student |
Mentor |
Hrishikesh Kulkarni |
Martin Liška and Jan Hubička |
2016
Project |
Student |
Mentor |
Ayush Goel |
Manuel Lopez-Ibanez |
|
Prasad Ghangal |
Richard Biener |
|
erikvarga |
Oleg Endo |
2015
Project |
Student |
Mentor |
C++ Library Fundamentals: shared_ptr and polymorphic memory resources |
Fan You |
Tim Shen |
Erik Krisztian Varga |
Oleg Endo |
2014
Project |
Student |
Mentor |
Coarray support in GNU GFortran |
Alessandro Fanfarillo |
Tobias Burnus |
Concepts Separate Checking |
Braden Obrzut |
Andrew Sutton |
Integration of ISL code generator into Graphite |
Roman Gareev |
Tobias Grosser |
Generating folding patterns from meta description |
Prathamesh Kulkarni |
Richard Biener |
GCC Go escape analysis |
Ray Li |
Ian Lance Taylor |
2013
Project |
Student |
Mentor |
Fotis Koutoulakis |
Thomas Schwinge |
|
Martin Liška |
Jan Hubicka |
|
Tim Shen |
Stephen M. Webb |
2012
Project |
Student |
Mentor |
Dimitrios Apostolou |
Andrey Belevantsev |
|
Morgen Matvey |
Benjamin De Kosnik |
|
Gimple FE : Extending the text gimple recognizer to a real front end |
Sandeep Soni |
Diego Novillo |
Sergey Lega |
Benjamin De Kosnik |
2011
Project |
Student |
Mentor |
Extend GFortran's Coarray support with MPI-based paralellization (project page) |
Daniel Carrera |
Tobias Burnus |
GCC Optimisation Final Report, Various Notes: (1) (2) (3) |
Dimitrios Apostolou |
Steven Bosscher |
Integration of transactional memory support into a data-flow extension of OpenMP |
Ismail KURU |
Richard Henderson |
Ketaki |
Diego Novillo |
|
Philip Herron |
Ian Lance Taylor |
|
Piervit |
Basile Starynkevitch |
|
Sho Nakatani (中谷 翔) |
Jakub Jelínek |
2010
The source code for finished projects can be found at Google's code hosting site and their respective SVN branches.
Project |
Student |
Mentor |
Yi-Hong Lu |
H. J. Lu |
|
Sandeep Soni |
Diego Novillo |
|
Artjoms Sinkarovs |
Richard Günther |
|
Philip Herron |
Ian Taylor |
|
Improving the static control part detection mechanism in Graphite |
Vladimir Kargov |
Sebastian Pop |
Ankur Deshwal |
David Edelsohn |
|
ScopLib support for Graphite - Linking Graphite to the huge industrial and research community |
Sebastian Pop |
|
Andreas Simbuerger |
Tobias Grosser |
|
Tobias Burnus |
||
Extending Fortran 2003 and 2008 support for gfortran (esp. Co-Arrays) |
Daniel Kraft |
Tobias Burnus |
2009
The source code for finished projects can be found at Google's code hosting site.
Project |
Student |
Mentor |
Li Feng |
Tobias Grosser |
|
Enable generic function cloning and program instrumentation in GCC to be able to create static binaries adaptable to varying program and system behavior or different architectures at run-time |
Liang Peng |
Grigori Fursin |
gfortran: Procedure Pointer Components & OOP |
Tobias Burnus |
|
Traditional Loop Transformations |
pranav garg |
Sebastian Pop |
Make the OpenCL Platform Layer API and Runtime API for the Cell Processor and CPUs |
phil prattszeliga |
Paolo Bonzini |
Provide fine-grain optimization selection and tuning abilities in GCC to be able to tune default optimization heuristic of the compiler or fine optimizations for a given program on a given architecture entirely automatically using statistical and machine learning techniques from the MILEPOST project. |
Yuanjie Huang |
Grigori Fursin |
2008
The source code for finished projects can be found at Google's code hosting site.
Project |
Student |
Mentor |
Steven Bosscher |
||
Improving Dead Store Elimination |
Jakub Staszak |
|
Extend Fortran 2003 support for gfortran |
Daniel Kraft |
François-Xavier Coudert |
C++0x lambda functions for GCC |
John Freeman |
|
Tobias Grosser |
||
Finish work on propagation aliasing and array dependence information from Tree-SSA to RTL. |
Alexander Monakov |
Diego Novillo |
Tobias Burnus |
2007
The source code for finished projects can be found at Google's code hosting site.
Project |
Student |
Mentor |
Propagating array data dependence information from Tree-SSA to RTL |
Alexander Monakov |
Daniel Berlin |
Manuel López-Ibáñez |
Diego Novillo |
|
Speeding up GCC for fun and profit |
James Webber |
Eric Marshall Christopher |
Steven Bosscher |
||
Open Mutliprogramming Interprocedural Analasis and Optimalizations |
Jakub Staszak |
Daniel Berlin |
Integrating OpenJDK's javac bytecode compiler into gcj |
Dalibor Topic |
Mark J. Wielaard |
New static scheduling heuristic for GCC |
Dmitry Zhurikhin |
Vladimir Makarov |
GCC support for Windows-compatible Structured Exception Handling (SEH) on the i386 platform |
Michele Cicciotti |
Ian Lance Taylor |
2006
Project |
Student |
Mentor |
Code parallelization using polyhedral model |
Plesco Alexandru |
Daniel Berlin |
Paul Biggar |
Daniel Berlin |
|
Laurynas Biveinis |
Daniel Berlin |
|
java.lang.management in Classpath |
Andrew John Hughes |
Mark Wielaard |
Lock free C++ containers |
Phillip Jordan |
Benjamin Kosnik |
Manuel López-Ibáñez |
Ian Lance Taylor |