DCAT at CGO 2016

International Workshop on Dynamic Code Auto-Tuning

The READEX project is organising the International Workshop on Dynamic Code Auto-Tuning (DCAT) on March 12, 2016 in Barcelona, as part of the workshop programme of the 2016 International Symposium on Code Generation and Optimization (CGO).

The workshop will consist of a selection of short invited talks covering the basics and concepts behind auto-tuning, novel developments in tuning strategies, approaches from both the embedded systems domain and high-performance computing, and background information on the READEX project (status, goals, outlook, etc.). The workshop is meant to bring together researchers from different areas to discuss current and future challenges in auto-tuning and to give interested users insight into the state of the art in auto-tuning.

Schedule

The workshop will run from 2:00pm to 6:00pm on March 12, 2016 in room BNC-B. The following schedule is preliminary and subject to change.

Session 1 (2:00pm – 3:30pm)

Michael Gerndt: Welcome

Martin Schulz: Performance Tuning in a Power-Limited World

Power and energy consumption are critical design factors for any next-generation large-scale HPC system. Energy costs are shifting budgets from investment to operating costs, and increasingly the size of a system will be determined by its power needs. As a consequence, we are likely to end up with power-limited systems that can no longer power all their components at peak power. In these systems, system software must manage power caps at all layers of the system, which creates new tuning challenges. In this talk I will cover a novel runtime system, Conductor, that optimizes application performance under strict power limits by first determining the best application configurations for a series of power limits and then dynamically adjusting power limits to ensure the fastest possible progress on the critical path. As a result, Conductor can reduce execution time by up to 30% compared to naïve static power distributions.

Dr. Martin Schulz is a Computer Scientist at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). He earned his Doctorate in Computer Science in 2001 from the Technische Universitaet Muenchen (Munich, Germany) and also holds a Master of Science in Computer Science from the University of Illinois at Urbana-Champaign. He has published over 200 peer-reviewed papers. He currently serves as the chair of the MPI Forum, the standardization body for the Message Passing Interface. He is the PI for the Office of Science X-Stack project “Performance Insights for Programmers and Exascale Runtimes” (PIPER) as well as for the ASC/CCE project on Open|SpeedShop, and is involved in the DOE/Office of Science exascale projects CESAR, ExMatEx, and ARGO. Dr. Schulz’s research interests include parallel and distributed architectures and applications; performance monitoring, modeling and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; power-aware parallel computing; and fault tolerance at the application and system level. Dr. Schulz was a recipient of the IEEE/ACM Gordon Bell Award in 2006 and an R&D 100 Award in 2011.

Marc Geilen: Scenario-based Design of Dynamic Streaming Applications on Predictable Embedded Multiprocessors

Data-intensive, dynamic, streaming applications for heterogeneous embedded systems display wide variations in execution times and resource usage. Some of this variation can be classified offline and predicted ahead of time at runtime. Other variation cannot be anticipated and should be addressed by online tuning methods. This presentation introduces the scenario-driven design methodology for embedded systems. It advocates combining offline profiling and classification with online exploitation of that classification to adapt to dynamic changes that can be predicted. The classification discerns system scenarios, which are run-time situations that are similar with respect to multi-objective cost criteria. The run-time system responds in a feedforward manner to anticipated scenarios. Unpredictable variation in execution time and other costs occurring online is addressed with online calibration of the scenarios and other feedback-based auto-tuning methods. We introduce a dedicated model of computation (Scenario-Aware Data Flow) that is specifically designed and equipped to support the methodology in analysis and synthesis for predictable multiprocessor platforms.

Marc Geilen is an assistant professor in the Department of Electrical Engineering at Eindhoven University of Technology. He holds an MSc in Information Technology and a Ph.D. from the Eindhoven University of Technology. In 2010, he was a McKay Visiting Professor at the University of California, Berkeley. His research interests include modeling, simulation and programming of multimedia systems, formal models-of-computation, model-based design, multiprocessor systems-on-chip, networked embedded systems and cyber-physical systems, and multi-objective optimization and trade-off analysis. He has served as member or chair of more than 25 technical program committees. He has been involved with several Dutch and European research projects and programs on the above topics with strong industrial connections.

Session 2 (4:00pm – 6:00pm)

Andrea Bartolini: Trends in Energy and Thermal Efficiency of High-Performance Computing Infrastructure

Supercomputer peak performance is expected to reach the ExaFlops (10¹⁸ FLOPS) scale in 2023. With almost 100x more computational capability than today’s most powerful supercomputers, the Exascale machine will revolutionise many aspects of our society. However, to be sustainably powered at the Exascale level, current supercomputers must achieve a “quantum leap” in energy efficiency, pushing towards the goal of 50 GFlops/W. In the first part of the talk we evaluate and quantify the impact of variability on supercomputers’ energy-performance trade-offs across a wide range of workload intensities. Our experiments demonstrate that variability stems from hardware component mismatches as well as from the interplay between run-time energy management and workload variations. Variability therefore has a significant impact on energy efficiency even at the moderate scale of the Eurora machine, which underlines the critical importance of variability management in future green supercomputers. In the second part of the talk I will present the Antarex project, which aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to the Exascale level.

Andrea Bartolini received a Ph.D. degree in Electrical Engineering from the University of Bologna, Italy, in 2011. He is currently a postdoctoral researcher in the Department of Electrical, Electronic and Information Engineering Guglielmo Marconi (DEI) at the University of Bologna. He also holds a postdoctoral position in the Integrated Systems Laboratory at ETH Zurich. His research interests concern dynamic resource management ranging from embedded to large-scale HPC systems, with special emphasis on software-level thermal and power-aware techniques. His research interests also include ultra-low-power design strategies for biosensor nodes operating in the near-threshold regime.

Thomas Fahringer: Multi-objective auto-tuning with Insieme: Optimization and Trade-Off Analysis for Time, Energy and Resource Usage

The increasing complexity of modern multi- and many-core hardware design makes performance tuning of parallel applications a difficult task. In the past, auto-tuners have been successfully applied to minimize execution time. However, besides execution time, additional optimization goals have recently arisen, such as energy consumption or computing costs. Therefore, more sophisticated methods capable of identifying and exploiting the trade-offs among these goals are required. In this work, we present and discuss results of applying a multi-objective search-based auto-tuner to optimize for three conflicting criteria: execution time, energy consumption, and resource usage. We examine a method called RS-GDE3 for tuning HPC codes using the Insieme parallelizing and optimizing compiler. Our results demonstrate that RS-GDE3 offers solutions of higher quality than those provided by a hierarchical and a random search, at a fraction of the time (5%) or energy (8%) required. A comparison to a state-of-the-art multi-objective optimizer (NSGA-II) shows that RS-GDE3 computes solutions of higher quality.

Thomas Fahringer is a Professor of Computer Science at the University of Innsbruck in Austria. He leads a research group in the area of distributed and parallel processing which develops the ASKALON programming environment to support researchers worldwide in various fields of science and engineering to develop, analyse, optimize and run distributed applications for Cloud systems. Furthermore, he leads a research team that created the Insieme parallelizing and optimizing compiler for heterogeneous multicore parallel computers ranging from mobile systems to high-end supercomputers. Before joining the University of Innsbruck, Fahringer worked as an assistant and associate professor at the University of Vienna in Austria, where his research focused on compiler technology and tools for high-performance applications. Fahringer is a graduate of the Technical University of Vienna with a doctorate in computer science. Fahringer has been involved in numerous national and international research projects, including 12 EU-funded projects, three of which he coordinated. Fahringer has published 5 books, 35 journal and magazine articles and more than 200 reviewed conference papers, including 4 best/distinguished IEEE/ACM/Springer papers.

Joseph Schuchart: Introducing the READEX Project: Runtime Exploitation of Application Dynamism for Energy-efficient eXascale Computing

The READEX project aims at combining technologies from two different ends of the computing spectrum. Based on the system scenario methodology from the embedded systems domain, a tool-aided methodology for dynamic auto-tuning of scalable parallel applications for performance and energy efficiency will be developed. During design-time analysis, the application will be analyzed for dynamism using the Periscope Tuning Framework (PTF). A tuning model containing optimal system configurations for different program regions will be derived. At runtime, i.e., during production runs, this tuning model will be applied by predicting the upcoming runtime situation and adjusting the system configuration if deemed beneficial. Additionally, a calibration step will be used to further improve the tuning model and to handle previously unseen runtime situations. This talk will give a short introduction to the background and ideas of the READEX project.

Joseph Schuchart is a computer scientist at the Center for Information Services and High Performance Computing (ZIH) at the Technische Universität Dresden. He has a strong background in performance analysis tools such as Score-P and VampirTrace. After a research stay at the Oak Ridge National Laboratory, Joseph Schuchart joined the energy-efficiency research group at ZIH and has been working on the HDEEM project, building a scalable power-instrumentation framework in collaboration with Bull. Joseph Schuchart is currently coordinating the Horizon 2020 READEX project.

Organizing Committee

Joseph Schuchart, Technische Universität Dresden
Prof. Michael Gerndt, Technische Universität München
Renato Miceli, SENAI CIMATEC