GNU Tools Cauldron 2015



Organizers

Organizing committee:

Sponsors

Slides and Notes

The videos for all recorded presentations are available on the Cauldron 2015 playlist on YouTube. youtube-dl can be used to download the videos using only free software.

Title

Slides

Video

Accelerator BoF

slides1, slides2

video

Aditya Kumar, Sebastian Pop: Loop optimizer and vectorization BOF

slides

video

Andreas Arnez: Debugging Linux kernel dumps with GDB

slides

video

Andreas Arnez: Debugging versus hardware transactional memory

slides

video

Bill Schmidt & Michael Gschwind: Supporting Vector Programming on a Bi-Endian Architecture

slides

Bin Cheng: IVOPTs current implementation and challenges

slides

video

Claudiu Zissulescu: Scheduling for ARC HS cores

slides

video

Sameera Deshpande: Improving the Effectiveness and Generality of GCC Auto-Vectorization

slides

video

Dodji Seketeli: ABI comparison with Libabigail based tools

slides

video

Ed Jones, Simon Cook & Jeremy Bennett: Keeping Other Compilers Honest: Validating LLVM with the GCC Regression Test Suite

slides

video

Hafiz Abid Qadeer: What is new in DWARF5

slides

video

James Pallister & Jeremy Bennett: GNU Superoptimizer 2.0

slides

video

Jan Hubicka: LTO BOF

slides

video

Jan Hubicka: Types and type based optimizations in GCC

slides

video

Kirill Yukhin: OpenMP 4 Offloading Features implementation in GCC

slides

video

Ilya Enkovich: Vectorization for Intel AVX-512

slides

video

Marek Polacek: Fedora Mass Rebuilds: Testing GCC in the wild

slides

Martin Jambor: Compiling for HSA accelerators with GCC

slides

video

Martin Liska: Inter-procedural Identical Code Folding in GCC

slides

video

Michael Meissner: GNU PowerPC support in 2015

slides

video

Olga Golovanevsky: Memory Layout Optimizations of Structures and Objects

slides

video

Siddhesh Poyarekar: Tunables for the C Library

slides

video

Siddhesh Poyarekar: glibc microbenchmarking and whole system benchmarking BoF

slides

video

Torvald Riegel: Modern concurrent code in C/C++

slides

video

Torvald Riegel: Updating glibc concurrency

slides

video

Ulrich Weigand: Supporting the new IBM z13 mainframe and its SIMD vector unit

slides

video

Fabian Schnell, Mircea Gherzan: Debugging Compute Shaders on Intel(R) GPUs

video

Peter Bergner: PowerPC BOF

video

David Edelsohn: GCC Cost Model BoF

video

David Edelsohn, Jim Wilson: GCC Steering Committee Q&A

video

Carlos O'Donell: GNU C Library BOF

video

Ian Taylor: Go BOF

video

Nathan Sidwell: OpenACC

video

Ramana Radhakrishnan: BoF for the ARM / AArch64 ports

video

Carlos O'Donell, Marek Polacek: Continuous Integration

video

Mailing lists

  1. Abstract submissions, registration, administrivia questions: tools-cauldron-admin@googlegroups.com

  2. Announcements and discussions related to the conference: gcc@gcc.gnu.org.

Workshop description

We are pleased to announce another gathering of GNU tools developers. The basic format of this meeting will be similar to the previous meetings.

The purpose of this workshop is to gather all GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing projects, discuss development plans for the next 12 months, hold developer tutorials, and have any other related discussions.

This time we will meet again at the Lesser Town Campus of Charles University in Prague (Malostranske Namesti 25, Prague, Czech Republic; map1, map2), the same location as GNU Tools Cauldron 2012.

We are inviting every developer working on the GNU toolchain: GCC, GDB, binutils, runtimes, etc. In addition to discussion topics selected at the conference, we are looking for advance submissions.

If you have a topic that you would like to present, please submit an abstract describing what you plan to present. We are accepting three types of submissions:

Note that we will not be doing in-depth reviews of the presentations. We are mainly checking for applicability and deciding on scheduling. There will be time at the conference to add other topics of discussion, as we did at previous meetings.

To register your abstract, send e-mail to tools-cauldron-admin@googlegroups.com.

Your submission should contain the following information:

If you intend to participate, but not necessarily present, please let us know as well. Send a message to tools-cauldron-admin@googlegroups.com stating your intent to participate. Please indicate your affiliation, dietary requirements and t-shirt size.

Schedule

Abstracts

Talks

Andreas Arnez: Debugging versus hardware transactional memory

For a few years now, some commercial server CPUs, like the IBM zEC12, Intel Haswell, and IBM POWER8, have implemented "hardware transactional memory", a feature that lets a sequence of operations appear as a single atomic transaction. The feature aims to simplify and/or speed up certain synchronization scenarios in parallel computing. But what about debuggability? When a transaction is interrupted, it is rolled back, and the state at the time of interruption is lost. This leads to various difficulties with breakpoints, watchpoints, single-stepping, etc. The talk outlines these issues and discusses possible solutions.
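As a rough illustration of the problem (our own portable sketch, not code from the talk, and not real HTM), the following C code simulates "all-or-nothing" transaction semantics with setjmp/longjmp: any state computed inside the "transaction" vanishes on abort, which is exactly what makes a breakpoint inside a real hardware transaction so hard to service. The function names are invented for the example.

```c
#include <setjmp.h>

/* Simulated transaction state: on abort, execution resumes at the
   checkpoint and all speculative work is discarded, just as a CPU
   discards it on a real transaction abort. */
static jmp_buf tx_checkpoint;
static int committed_value;            /* only updated on commit */

static void tx_abort(void) {
    longjmp(tx_checkpoint, 1);         /* roll back: speculative state is lost */
}

/* Try to add 'delta' transactionally; abort if a (simulated) conflict
   occurs -- e.g. the interruption a debugger would cause. */
int tx_add(int delta, int simulate_conflict) {
    if (setjmp(tx_checkpoint) != 0)
        return committed_value;        /* aborted: state is unchanged */
    int speculative = committed_value + delta;  /* work inside the txn */
    if (simulate_conflict)
        tx_abort();                    /* the speculative value vanishes */
    committed_value = speculative;     /* commit */
    return committed_value;
}
```

A debugger that stops inside tx_add between the setjmp and the commit would be observing state that, after the rollback, never existed.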

Andreas Arnez: Debugging Linux kernel dumps with GDB

GDB can in principle be used for analyzing Linux kernel dumps, but there are significant limitations. For user space, GDB contains special logic for the runtime, such as for the threading library and for the dynamic loader; analogous functionality is currently lacking for the kernel runtime. Also, GDB does not offer any capabilities specific to kernel dump debugging, like reading compressed kdumps or deploying C-like dumper functions that can be derived from existing code in the Linux kernel sources. In general, there are various possible ways of improving GDB's Linux kernel dump support. Such improvements may also benefit live kernel debugging, such as with a JTAG probe or with QEMU's gdbserver.

James Pallister, Jeremy Bennett: GNU Superoptimizer 2.0

For nearly 20 years, GSO has been the reference superoptimizer, and it has proved successful in uncovering new peephole optimizations for compilers. The code has been relatively stable for some years. In this talk we'll discuss our work on a new version of GSO, drawing on more recent research in the field. We'll cover:

The result is a complete rewrite - GSO 2.0. We won't have finished the job by the time of the Cauldron, but we'll give a progress update.

Jeremy Bennett: Keeping other compilers honest: How to validate LLVM with the GCC regression test suite

LLVM comes with two test suites. The regression test suite is used to validate the compiler, and does not involve any execution tests. The LLVM nightly tests are a collection of applications which can be used to test execution of generated code on larger targets. However, LLVM lacks any large body of small tests exercising all aspects of a compiler. By comparison, the latest GCC regression test suite includes around 75,000 C tests and 50,000 C++ tests, many of which are execution tests.

For a long time, Apple and Intel have used the GCC 4.2.2 and GDB 6.3 regression tests to validate their LLVM implementations. However, these are heavily hand-modified versions, and it is hard either to roll them forward to newer tests or to port them to different architectures. So, for example, there is no testing of the 2011 and 2014 C/C++ standards.

In this short talk, I'll present our experience of using the GCC regression tests with an LLVM compiler for a deeply embedded target. I'll outline a unified solution which makes it feasible to use the latest GCC regression test suite with other compilers and other architectures. It requires a generic patch to DejaGnu (in gdb.exp), which provides a mechanism for a central database to control which tests should run and their expected output.

The result is that only tests which should give a result on the architecture are run. There is no problem with large numbers of tests timing out, and a good compiler should be able to achieve zero FAIL, XPASS and UNRESOLVED results. The approach is generic, so it can be applied to other GNU regression test suites, such as GDB and binutils, providing further validation.

The intention of this talk is to stimulate discussion about one aspect of interaction between LLVM and GCC. LLVM is in many areas led by GCC - if only because it needs to be able to compile code which has historically been compiled with GCC. To what extent should GCC actively support LLVM? For example, should DejaGnu tests include dg- directives to indicate relevance to LLVM?

Bin Cheng: IVOPTs current implementation and challenges

The topic consists of two parts. The first part is an overview of how IVO currently works. It has changed a lot since tree-level IVO was first merged in the early 4.x releases. This might also result in a refined top-level comment for tree-ssa-loop-ivopts.c. The second part is about some non-trivial problems that current IVO doesn't handle very well (in other words, points where I think IVO could be improved):

This problem list might change since I am still working on and learning IVO, but these are the major IVO problems I have noticed over the last two years. Though most of them are difficult, I expect to fix some of them before the Cauldron. I will talk about each problem (and possible solutions) in detail with the help of examples. As a possible outcome of this talk, I would expect to get feedback on further IVO work.
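As a hedged illustration of what the induction variable optimizer decides (our own example, not an excerpt from GCC), the two loops below compute the same sum; IVOPTs is the pass that chooses between such forms, for instance replacing an index induction variable and its per-iteration address computation with a single pointer induction variable:

```c
#include <stddef.h>

/* Before: two induction variables -- the index i, plus the implied
   address computation &a[i] performed on every iteration. */
int sum_indexed(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* After (conceptually, one choice IVOPTs may make): a single pointer
   induction variable, with the exit test rewritten against an end
   pointer -- no per-iteration multiply/add for the address. */
int sum_pointer(const int *a, size_t n) {
    int s = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}
```

Which form is cheaper depends on the target's addressing modes and register pressure, which is precisely what makes the cost modeling in IVOPTs hard.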

Ilya Enkovich: Vectorization for Intel AVX-512

Intel AVX-512 is a major instruction set extension which introduces new 512-bit vector registers, mask registers, embedded masking, embedded rounding and exception control, new instructions, and more. This presentation focuses on the new masking feature and how it can be used by GCC to improve loop vectorization for the Intel architecture.
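For illustration (our own sketch, not material from the presentation), a conditional loop of the following shape is what embedded masking targets: the comparison can produce a mask register, and the update can be applied only in the enabled lanes, with no per-element branch. With a suitable GCC invocation (e.g. -O3 -mavx512f) such a loop can be vectorized that way.

```c
#include <stddef.h>

/* A conditional loop that is awkward to vectorize with branches, but
   natural with AVX-512 embedded masking: lanes where a[i] <= 0 simply
   keep their old value, controlled by a mask register. */
void add_where_positive(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0.0f)
            dst[i] = a[i] + b[i];   /* masked-out lanes are untouched */
}

/* Tiny self-check driver: only elements 0 and 2 are updated. */
float demo(void) {
    float a[4]   = {1.0f, -1.0f, 2.0f, -2.0f};
    float b[4]   = {10.0f, 10.0f, 10.0f, 10.0f};
    float dst[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    add_where_positive(dst, a, b, 4);
    return dst[0] + dst[1] + dst[2] + dst[3];
}
```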

Olga Golovanevsky: Memory Layout Optimizations of Structures and Objects.

This presentation comprises two parts: first, we want to share with the GCC community a number of aspects of our experience porting the data layout optimizations (aka struct-reorg) to the LTO/WHOPR infrastructure (version 4.9.0). The second part shows our exploratory work on extending the original idea of "changing C-like structure memory layout" to "changing object memory layout" in the C++ language paradigm.

Previously intended to run under the whole-program flag, the data layout optimizations required restructuring to run under LTO/WHOPR, with a strict division into three separate optimization phases: (1) compile-time collection of the data, (2) propagation and analysis of the combined data, and (3) a transformation phase. Technical routines were added for streaming both the collected data and the optimization decisions in and out of a data-layout-specific section in the LTO object file.

The resolution of symbols provided by the linker at phase (2) essentially simplified the data collection and analysis phases ((1) and (2)), mostly relaxing the conservative assumptions of the type-escape analysis. If all symbols related to a candidate type (such as variables, or functions with parameters or return values of the candidate type) are defined inside the current compilation (i.e. specified as PREVAILING_DEF_IRONLY), then we can transform the candidate type safely, given that it was never cast. In the previous version of the type-escape analysis we conservatively assumed that an address-taken operation causes a candidate type to "escape" (since, for example, its address might be passed as an actual parameter into a function defined externally). With known symbol resolutions it is possible to transform candidate types across function boundaries even when function parameters are pointers to a candidate type, as happens, for example, in the SPEC2006 462.libquantum benchmark. However, reasons such as a custom-defined malloc, the presence of bitfields, inline assembly, or pointer arithmetic with a candidate type might still prevent a candidate type from being transformed.

For phase (3), we used both function and variable transforms, extending the compiler pass manager with variable_transform () support. In our case, the generation of new global variables should precede the transformation of the individual functions in which these variables should be visible. We leveraged the existing jump-function mechanism to change function definitions. For example, in the case of structure peeling, if a function parameter is a pointer to a candidate type, then the function prototype is changed to receive multiple parameters that are pointers to the new peeled types.

Finally, we extended the original set of data layout optimizations, comprising structure splitting, peeling and reordering, with structure inlining. Being a combination of full peeling of a substructure with subsequent inlining into the containing structure, this optimization has the definite benefit of reducing the number of indirections required for individual field accesses. It also opens further opportunities for reordering the containing structure. As a result of applying this optimization to the SPEC 2006 libquantum benchmark, we achieved approximately +30% run-time improvement on Intel Xeon and armv7 platforms, and as much as +90% on an armv8 platform, which emphasizes the efficiency of this optimization for platforms with a medium-size L2 cache.
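To make the terminology concrete, here is a minimal, hypothetical sketch (the type and field names are ours, not from the work above) of what structure peeling does to a data layout: a hot field is split out of the original structure into a dense array, so that scanning it no longer drags cold bytes through the cache.

```c
#include <stddef.h>

/* Original layout: the hot field 'count' is interleaved with cold data,
   so a scan over 'count' pulls the cold bytes into the cache as well. */
struct node_orig { int count; char cold[60]; };

/* After peeling (conceptually what struct-reorg produces): the hot
   field lives in its own dense array, so a 64-byte cache line now
   holds 16 counts instead of 1; the cold data is stored separately. */
struct node_peeled { int *counts; char (*cold)[60]; };

long sum_orig(const struct node_orig *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i].count;
    return s;
}

long sum_peeled(const int *counts, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += counts[i];
    return s;
}

/* Both layouts yield the same answer; only the memory traffic differs. */
int layouts_agree(void) {
    struct node_orig v[3] = {{1, {0}}, {2, {0}}, {3, {0}}};
    int counts[3] = {1, 2, 3};
    return sum_orig(v, 3) == sum_peeled(counts, 3);
}
```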

The idea of extending structure layout reorganization to C++ objects came up during analysis of multiple applications. It became clear that the order in which data members are specified inside a class bears no particular meaning for developers, reflecting logical convenience rather than application performance. The purpose of the initial experiment was to estimate the efficiency of a simple reordering of data members, done in a manner similar to structure reordering, when guided by profile information. Further experiments aimed at reordering virtual table entries based on profile and/or trace information. We present the results of these experiments applied to pagerank (http://en.wikipedia.org/wiki/PageRank), a multithreaded and distributed application based on the GraphLab (https://dato.com/products/create/open_source.html) library.

For both parts of our presentation we look forward to an open discussion and feedback from the GCC community, and would like to invite its members to participate actively in their development.

Jan Hubicka: Types and type based optimizations in GCC

The GCC middle end is able to represent the types of C, C++, Java, Fortran, Ada, and Go. A number of interesting questions arise during link-time optimization, when types originating from different languages are merged into a single translation unit and unified semantics need to be established. I will discuss the current representation, issues, and future plans. I will also cover the current type-based optimizations - alias analysis and devirtualization - and possible new uses in the future.
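As a small, self-contained illustration of the alias-analysis side (our example, not from the talk): under C's type-based aliasing rules an int * and a float * may be assumed not to point to the same object, which lets GCC avoid reloading memory after the float store.

```c
/* Type-based alias analysis at work: because 'int' and 'float' cannot
   legally alias, GCC may assume the store through 'f' does not clobber
   '*i', and can fold the final load to the constant 1. */
int set_and_read(int *i, float *f) {
    *i = 1;
    *f = 0.0f;      /* assumed not to alias *i, per TBAA */
    return *i;      /* GCC can return 1 without reloading */
}

/* Well-defined usage: distinct objects, so the assumption holds. */
int demo_tbaa(void) {
    int i = 0;
    float f = 2.0f;
    return set_and_read(&i, &f);
}
```

During LTO, the merged type information from several languages must preserve exactly these non-aliasing guarantees, which is one reason the unified representation matters.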

Martin Jambor: Compiling for HSA accelerators with GCC

The talk will describe the HSA development branch of GCC. We will describe what it can and cannot do, how it is structured so that it does not need LTO, and how it is going to co-exist with the other LTO-based accelerators. Because our effort primarily targets OpenMP 4.0, we will also describe what changes we deemed necessary in OpenMP expansion so that we can generate efficient GPGPU code. We will conclude by presenting plans to merge the branch into trunk.

Martin Liska: Inter-procedural Identical Code Folding in GCC

I will talk about the initial implementation of the IPA ICF pass, which is part of GCC starting from version 5.0. The presentation will include a comparison with the current implementation in the gold linker, as well as unexpected issues observed during development of the pass. Moreover, I would like to introduce possible improvements which can make ICF even more powerful. Finally, I will explain how we can adapt the current infrastructure to replace the comparison engine of the tree-ssa-tail-merge pass.
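A minimal example of what identical code folding looks for (ours, not from the talk): the two functions below differ only in parameter names, so an ICF pass can prove their bodies equivalent and fold one into an alias for the other, shrinking the text section.

```c
/* Two semantically identical functions: after IPA ICF, only one body
   needs to be emitted, with the other becoming an alias or thunk.
   (The multiplier is Knuth's 32-bit golden-ratio hashing constant.) */
int hash_a(unsigned x) { return (int)((x * 2654435761u) >> 16); }
int hash_b(unsigned y) { return (int)((y * 2654435761u) >> 16); }
```

The interesting engineering is in proving equivalence cheaply and safely, e.g. when function addresses are taken and pointer equality must be preserved.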

Mikhail Maltsev: High Level Loop Optimizations in GCC

We present a simple framework for performing iteration-domain-modifying loop transformations, with an implementation of loop splitting and thoughts on how to implement loop fusion.

Michael Meissner: GNU PowerPC support in 2015

This talk will cover changes that we have made for PowerPC support in 2014-2015. Among other things, it will include:

Bill Schmidt, Michael Gschwind: Supporting Vector Programming on a Bi-Endian Processor Architecture

The POWER instruction set architecture is designed to support both big-endian and little-endian memory models. However, many of the instructions designed for vector support assume that vector elements in registers appear in big endian order, that is, with the lowest-numbered vector element in the most significant portion of the register. This is not particularly natural for programmers used to vector programming on little-endian architectures such as x86. We have designed a vector programming model that provides more natural interfaces for porting from standard little-endian environments, and that facilitates writing vector library code that runs in both endian modes with minimal changes. We also have an alternate model to facilitate porting existing big-endian POWER vector code to little-endian. This talk will outline some of the issues faced in designing a sensible vector programming model on a bi-endian architecture with a big-endian bias, and how we've addressed them. We will also discuss some of the more interesting implementation and performance issues we've encountered.
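A tiny, self-contained probe (our illustration, not from the talk) of why vector element numbering is ambiguous on a bi-endian machine: the same bytes in memory belong to the least significant half of a wider value on a little-endian machine, but to the most significant half on a big-endian one, so "element 0" means different things in the two views.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit value as two 32-bit "elements" in memory order.
   On little-endian, the lowest-addressed element holds the least
   significant half; on big-endian, the most significant half.  Vector
   element numbering faces exactly this ambiguity, scaled up to 128-bit
   registers. */
int low_byte_is_low_element(void) {
    uint32_t words[2];
    uint64_t whole = 0x1122334455667788ull;
    memcpy(words, &whole, sizeof whole);
    return words[0] == 0x55667788u;   /* 1 on little-endian hosts */
}
```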

Siddhesh Poyarekar: Tunables for the C Library

The GNU C library has a number of magic constants that were decided based on performance and resource data available when they were first introduced. Those constants may be suboptimal for some workloads and may even have been rendered incorrect by advances in other components or hardware. Further, a number of global configuration variables were added over the years to work around the problems posed by such magic constants (the MMAP_THRESHOLD in malloc is one such example). These variables have ad hoc names, and each has its own scheme of initialization and maintenance.

A tunables framework aims to provide a layer that manages such global configuration and provides a unified interface for programmers, system administrators, and integrators to tweak this configuration. This talk describes the architecture of this layer and the interface it provides. If the feature is not ready by then, this will instead be a BoF to decide on the architecture and interface of the tunables layer.
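As an example of the existing ad hoc knobs the talk refers to (this is current glibc API, not the proposed tunables interface), the malloc mmap threshold can already be adjusted programmatically with mallopt; a unified tunables layer would subsume such one-off mechanisms under one naming and initialization scheme.

```c
#include <malloc.h>   /* glibc-specific header */

/* glibc's existing knob: allocations at or above the threshold are
   served by mmap instead of the main heap.  mallopt returns nonzero
   on success.  The same setting is also reachable via the
   MALLOC_MMAP_THRESHOLD_ environment variable. */
int lower_mmap_threshold(void) {
    return mallopt(M_MMAP_THRESHOLD, 64 * 1024);
}
```

Note the asymmetry a tunables framework would fix: this knob has a mallopt constant, an environment variable with its own spelling, and no relation to how other glibc subsystems expose their configuration.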

Hafiz Abid Qadeer: What is new in DWARF5

Version 5 of the DWARF standard is expected to be published later this year. In this talk, I will cover the new features of DWARF5 and where these features can be helpful for debug information consumers.

Torvald Riegel: Updating glibc concurrency

I will give an overview of recent and future changes to concurrent code in glibc. In particular, I will cover (1) the transition to a C11-like memory model and data-race-freedom, (2) updates to the futex documentation and how this relates to POSIX/C++ mutex destruction requirements, and (3) the new semaphore and condition variable algorithms. I will also give an outlook on ongoing or future work: read-write lock scalability and spinning vs. blocking.

Note: We also thought about perhaps proposing a glibc BoF. This presentation could also be a BoF with a presentation side to it. I'm not sure whether you'd want to provide different slots for presentations and BoFs, or put one of those in parallel tracks but not the other. Therefore, if you think a BoF should be better, just let me know.

Sameera Deshpande: Improving the Effectiveness and Generality of GCC Auto-Vectorization

The presentation will demonstrate an approach to improving the efficiency and generality of vectorization in GCC by

Dodji Seketeli and Sinny Kumari: ABI comparison with Libabigail based tools: state of the onion

slides

Many interesting developments have occurred in the Libabigail space since our last presentation at the 2014 edition of GNU Cauldron in Cambridge.

The purpose of this talk is to walk the audience through the main achievements, provide guidelines about the ways upstream projects and distributions can now include continuous ABI comparison into their work flow and give hints about the new challenges that we see coming next in this area.

Ulrich Weigand: Supporting the new IBM z13 mainframe and its SIMD vector unit

The IBM z13, the latest model of the IBM z Systems line of mainframe computers, has been recently announced. For the first time in the history of z/Architecture, this model provides a Single Instruction Multiple Data (SIMD) vector unit, intended to speed workloads such as analytics and mathematical modeling.

Supporting a significant new architecture feature like this on Linux requires changes across the stack, starting from the kernel and system libraries, through assemblers and related binary utilities, up to all compilers and debugging tools.

In this talk I'll give an overview of the z13 architecture changes, in particular the integer, floating-point, and string vector instructions. I'll also describe the ABI choices we made to support SIMD, as well as the language extensions we defined to allow source code to exploit vector instructions across the various compilers on the platform. In particular, I'll address similarities and differences to vector extensions on other platforms, like VMX/VSX on Power.

Finally, I'll report on where we stand in implementing those new features across the Linux on z ecosystem, with particular focus on the implementation in the GNU toolchain, and address a couple of challenges that still need to be resolved.

Kirill Yukhin: OpenMP 4 Offloading Features implementation in GCC

GCC 5 was released with support for OpenMP 4.0 offloading to the Intel Xeon Phi (Knights Landing) target. The offloading infrastructure was implemented in a generic way, so support for almost any accelerator can be integrated easily (provided a corresponding back end is contributed). This talk presents a high-level overview of the offloading internals, with Xeon Phi taken as the example target card.

Claudiu Zissulescu: Scheduling for ARC HS cores

Synopsys's ARC HS Family processors are 32-bit high-performance CPUs that can be customized for a wide range of uses, from deeply embedded to high-performance host applications. To achieve the desired performance level, we need to properly schedule the instruction stream on two ALUs designed for a low-latency configuration. The present talk will cover the GCC backend port modifications that were required to obtain the desired performance. We will cover the following topics:

Tutorials

Torvald Riegel: Modern concurrent code in C/C++

In this tutorial, I will present foundations, tools, and guidelines for writing modern concurrent code in C and C++. I will (1) give a brief introduction to concurrency and the kind of reasoning necessary to write correct concurrent code, (2) explain the C11/C++11 memory model and data-race-freedom, why it should be used as a foundation, and tools that can make this easier, (3) discuss trade-offs between complexity and performance of different synchronization programming abstractions, and (4) propose guidelines for documenting concurrent code so that it is easier for other people to maintain.
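As a minimal example in the spirit of point (2) above (our sketch, not the tutorial's material): making a shared counter atomic removes the data race that unsynchronized concurrent increments of a plain int would constitute, and relaxed ordering suffices when only the final count matters, not the order of the increments.

```c
#include <stdatomic.h>

/* Data-race-freedom: concurrent unsynchronized accesses to a plain
   'long' would be undefined behavior.  An atomic counter with relaxed
   ordering is the cheapest correct alternative when the count is the
   only thing that matters. */
static atomic_long hits;

void record_hit(void) {
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

long hit_count(void) {
    return atomic_load_explicit(&hits, memory_order_relaxed);
}

/* Single-threaded driver exercising the interface. */
long demo_hits(int n) {
    for (int i = 0; i < n; i++)
        record_hit();
    return hit_count();
}
```

Choosing the weakest ordering that is still correct, and documenting why it is correct, is exactly the kind of trade-off the tutorial addresses.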

BoFs

Peter Bergner: PowerPC BOF

Carlos O'Donell: GNU C Library BOF.

The GNU C Library is used as the C library in GNU systems and in most systems with the Linux kernel. The library is primarily designed to be a portable and high-performance C library. It follows all relevant standards, including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.

This BOF aims to bring together developers of other components that have dependencies on glibc and glibc developers to talk about the following topics:

David Edelsohn, Jim Wilson: GCC Steering Committee Q&A

David Edelsohn: GCC Cost Model BoF

Discuss cost models descriptions within GCC (RTX costs, register costs, memory costs, addressing costs, etc.) and utilization of those costs models to affect compiler optimization heuristics (Combine, register allocator, loop optimization, etc.).

Jan Hubicka: LTO BoF

Siddhesh Poyarekar: glibc microbenchmarking and whole system benchmarking BoF

The aim of this BoF is to discuss the direction of the benchmarks going forward, and also to come up with a framework for whole-system benchmarking that feeds back into glibc development to help us decide on algorithmic tweaks and tweaks to tunables within the library.

Ramana Radhakrishnan: BoF for the ARM / AArch64 ports

Aditya Kumar, Sebastian Pop: Loop optimizer and vectorization BOF

We would like to discuss the state of GCC's vectorizer compared to other compilers, and areas that need improvement. We will present test cases and performance differences for vectorization opportunities.

The second point to be discussed is how to use Graphite to enable more loops to be vectorized, similarly to the way Polly drives the LLVM vectorizer. We will lay out a plan of action to get better loop transforms for vectorization.

Roland McGrath: BoF for glibc hackers.

Carlos O'Donell, Marek Polacek: Continuous Integration

The topics are Continuous Integration (Carlos O'Donell), news from libabigail (Dodji Seketeli), and I might utter a few words about how we do Fedora mass rebuilds.

Martin Jambor: Accelerator BoF

BoF to bring together all those involved in supporting compilation for accelerators so that we can coordinate and share experience and expectations.

Accommodation

The conference venue can be conveniently reached by public transport: either by Metro (subway) line A (the green line) to Malostranská station and then a short walk, or by tram lines 12, 20 or 22 to the Malostranské náměstí stop. The tram stop is situated right across the square from the conference venue. Public transport maps can be downloaded at http://www.dpp.cz/en/transport-around%20prague/transit-schematics/.

Because the venue is right in the center of Prague, it is easy to check lodging options on common booking sites, like http://www.marys.cz/.

Some options in walking distance from the venue include:

None: cauldron2015 (last edited 2016-09-07 09:36:01 by SimonCook)