Advanced Computing, Mathematics and Data
A Publishing Machine
Gioiosa gets the word out about what it takes to build viable extreme-scale systems
Of late, Roberto Gioiosa, a research scientist with PNNL’s High Performance Computing group, seemingly has been a publishing machine with several papers slated for presentation at the upcoming 31st IEEE International Parallel & Distributed Processing Symposium, or IPDPS 2017, and the Seventh International Workshop on Accelerators and Hybrid Exascale Systems; a poster at the International Symposium on Cluster, Cloud and Grid Computing; and a chapter in the recently released book, Rugged Embedded Systems: Computing in Harsh Environments.
One IPDPS paper, “Argo NodeOS: Toward Unified Resource Management for Exascale,” co-authored with scientists from Argonne and Lawrence Livermore national laboratories, examines the effort to create a new operating system (OS) expressly for exascale computing systems. In their work, the authors describe NodeOS, an approach that affords integrated control over individual hardware and software resources on HPC nodes, creating “compute containers” that focus on hardware resource management to simplify complex workloads.
Roberto Gioiosa with the Data Vortex system, Jolt, which is part of PNNL’s Center for Advanced Technology Evaluation.
Gioiosa explained that a computer’s OS manages the interactions between software and hardware, or, in more familiar terms, the OS is what makes a person’s laptop or desktop work in a way that appears seamless to users. The problem with most operating systems is that, as the number of active applications increases, the OS has very little knowledge about how to prioritize their activities, so they conflict and “run into each other.” One recognizable result of that conflict is the “wheel of death” that can appear when, for example, anti-virus software suddenly launches while other applications are running, considerably slowing system operations. While this can be a mild nuisance for a laptop user, the problem becomes far more pronounced at the scope of an exascale system with its complex hardware and workflows.
“For exascale, they want a different execution model in which scientific simulations and data analytics run next to each other with improved efficiency,” Gioiosa said. “NodeOS isolates activities in a single compute node and provides a task scheduler that prioritizes the important tasks that need to be completed before engaging the lesser tasks that can wait. For example, physics simulations that produce data are prioritized over data analytics that consume data.”
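The prioritization Gioiosa describes can be pictured with a small sketch. This is only an illustration of priority-ordered scheduling in general, not NodeOS code: the function name `run_by_priority`, the task labels, and the numeric priorities are all assumptions made for the example.

```python
import heapq

def run_by_priority(tasks):
    """Return task names in the order a simple priority scheduler would run them.

    `tasks` is a list of (priority, name) pairs; a lower number means
    more important. All names here are hypothetical.
    """
    # The sequence number keeps submission order among equal priorities.
    queue = [(priority, seq, name) for seq, (priority, name) in enumerate(tasks)]
    heapq.heapify(queue)
    order = []
    while queue:
        _, _, name = heapq.heappop(queue)
        order.append(name)  # stand-in for actually executing the task
    return order

# Simulation steps (data producers) get priority 0, analytics (consumers) 1.
order = run_by_priority([
    (1, "analytics: histogram"),
    (0, "simulation: step 1"),
    (1, "analytics: summary"),
    (0, "simulation: step 2"),
])
```

In this toy run, both simulation steps are dispatched before either analytics task, even though an analytics task was submitted first.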
In a demonstration for DOE last summer, NodeOS showed it could efficiently run complex workloads faster than even the current OS can run a single operation.
According to Gioiosa, the Linux-based custom NodeOS software currently is housed at the National Energy Research Scientific Computing Center (Berkeley, Calif.), but it also is available for use at PNNL to help run applications in a better way.
Another IPDPS paper, “Exploring Data Vortex Systems for Irregular Applications,” features several PNNL co-authors, including Antonino Tumeo (HPC group), Jian Yin (ACMD Division Data Sciences), and Thomas Warfel (with NSD's Sensors & Measurement Systems). In it, they examine the Data Vortex self-routed hierarchical architecture and its potential to support emerging irregular applications, that is, ones with unpredictable communication patterns that currently do not work well with traditional HPC systems. These irregular applications, which employ algorithms to process enormous data sets and require large-scale clusters to provide sufficient memory and necessary performance, are becoming ever-more commonplace, especially for knowledge-discovery and data analytics applications in fields such as bioinformatics, cybersecurity, and machine learning.
With its distinctly designed network that employs synchronous timing and distributed traffic-control signaling, Data Vortex takes the opposite approach from current supercomputers, which have been optimized for scientific simulations where communications are mostly predictable. In their examination, Gioiosa and his co-authors analyzed the performance improvement provided by the Data Vortex system over corresponding message passing interface (MPI) implementations. They determined that the Data Vortex structure showed distinct promise for improving performance of irregular applications that experience difficulties when aggregating messages directed to the same destination.
“While Data Vortex still is in its nascent operating stages and is not a simple plug-and-play improvement, it has definite promise to be a game changer for Big Data, machine learning, and graph applications if they can be restructured to exploit its hardware features,” Gioiosa added.
The Road Toward Recovery
For his contribution to Rugged Embedded Systems, Gioiosa took a closer look at what it will take to keep supercomputers operational—or resilient—in the age of exascale. Despite advances that have put powerful computers in your pocket (cell phones), solving big problems that affect domains spanning quantum mechanics, climate research, physical simulations, and much more still requires big computers. Over time, building supercomputers out of common off-the-shelf components, known as COTS, has helped curtail some of the costs of these systems, resulting in their broader availability. However, COTS also introduced a new level of “harsh” operation, where such components are more vulnerable than more costly, special-purpose parts that formed the hallmark of early HPC systems.
According to Gioiosa, by their sheer construction, HPC systems are vulnerable to hard errors because they have so many components. In turn, they also can be plagued by soft errors due to broken fixtures or design mistakes. Add in thermal and mechanical stresses, and it becomes a recipe for system failure that is detrimental to achieving the goals for future exascale systems.
As a remedy, Gioiosa examined task-based programming models that follow a “divide-and-conquer” approach, where a large problem is subdivided into smaller sub-problems recursively until a task is identified. This approach is good for exascale systems because it allows for adapting a computation to a given set of resources whose availability varies in time and for performing automatic load balancing, meaning application traffic is distributed to increase capacity and improve reliability.
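The divide-and-conquer idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the chapter: the helper names (`split_into_tasks`, `sum_range`, `parallel_sum`), the `grain` threshold, and the choice of summing a range of integers are all hypothetical, and a standard-library thread pool stands in for a real task runtime.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_tasks(lo, hi, grain):
    """Recursively halve [lo, hi) until each piece is at most `grain` long."""
    if hi - lo <= grain:
        return [(lo, hi)]  # small enough: this piece becomes one task
    mid = (lo + hi) // 2
    return split_into_tasks(lo, mid, grain) + split_into_tasks(mid, hi, grain)

def sum_range(bounds):
    """One task: sum the integers in [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, grain=1000, workers=4):
    """Sum 0..n-1 by handing tasks to however many workers are available."""
    tasks = split_into_tasks(0, n, grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Idle workers pick up the next pending task, which is a
        # simple form of automatic load balancing.
        return sum(pool.map(sum_range, tasks))

total = parallel_sum(10_000)
```

Because the task list is independent of the worker count, the same decomposition runs unchanged whether one worker or many are available.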
“On the plus side, task-based programming models also can help curtail soft errors,” Gioiosa explained. “If a fault is detected before a new task initiates and the task at hand has only modified local variables, the current task potentially could re-execute without issuing notice of the detected failure. This is particularly well suited for data-flow programming models, where tasks exchange data through input and output parameters.”
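A toy version of that re-execution idea is sketched below. It is an assumption-laden illustration rather than the chapter's method: `DetectedFault`, `run_with_reexecution`, and `make_flaky_square` are hypothetical names, and the "fault" is simulated by raising an exception. The key property it relies on is the one Gioiosa names: the task receives its data as input parameters and changes only local variables, so running it again is safe.

```python
class DetectedFault(Exception):
    """Hypothetical signal that a fault was detected while a task ran."""

def run_with_reexecution(task, inputs, max_attempts=3):
    """Run `task(*inputs)`, silently retrying if a fault is detected.

    Returns (result, attempts). Safe only because the task reads its
    inputs and writes nothing outside its own local variables.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(*inputs), attempt
        except DetectedFault:
            continue  # inputs are untouched, so simply run the task again
    raise RuntimeError("task kept failing; escalate to higher-level recovery")

def make_flaky_square(failures):
    """Build a task that simulates `failures` detected faults, then succeeds."""
    remaining = [failures]  # test scaffolding, not part of the task's data
    def square(x):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise DetectedFault()
        return x * x
    return square

result, attempts = run_with_reexecution(make_flaky_square(1), (7,))
```

Here the first attempt hits the simulated fault and the second succeeds, so the caller sees only the final result and never the failure.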
- Gioiosa R. 2016. “Resilience for extreme scale computing.” Chapter 5 in Rugged Embedded Systems: Computing in Harsh Environments, pp. 123-148, eds. A Vega, P Bose, and A Buyuktosunoglu, Morgan Kaufmann, Cambridge, Massachusetts. DOI: 10.1016/B978-0-12-802459-1.00005-1.
- Gioiosa R, A Tumeo, J Yin, T Warfel, D Haglin, and S Betelu. 2017. “Exploring Data Vortex Systems for Irregular Applications.” To be presented at: 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2017). May 29-June 02, 2017, Orlando, Florida.
- Peng IB, S Markidis, G Kestor, P Cicotti, E Laure, and R Gioiosa. 2017. “Exploring the Performance Benefit of Hybrid Memory System on HPC Environments.” In: Seventh International Workshop on Accelerators and Hybrid Exascale Systems (AsHES 2017) to be held in conjunction with the 31st IEEE International Parallel and Distributed Processing Symposium. May 29, 2017, Orlando, Florida.
- Perarnau S, JA Zounmevo, M Dreher, BC Van Essen, R Gioiosa, K Iskra, MB Gokhale, K Yoshii, and P Beckman. 2017. “Argo NodeOS: Toward Unified Resource Management for Exascale.” To be presented at: 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2017). May 29-June 02, 2017, Orlando, Florida.
- Rivas-Gomez S, S Markidis, IB Peng, E Laure, G Kestor, and R Gioiosa. 2017. “Extending Message Passing Interface Windows to Storage.” Poster in: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2017), May 14-17, 2017, Madrid, Spain.