Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing.
Published: April 16, 2022
Citation
Perarnau S., J.A. Zounmevo, M. Dreher, B.C. Van Essen, R. Gioiosa, K. Iskra, and M. Gokhale, et al. 2017. Argo NodeOS: Toward Unified Resource Management for Exascale. In IEEE 31st International Parallel and Distributed Processing Symposium (IPDPS 2017), May 29-June 2, 2017, Orlando, FL, 153-162. Los Alamitos, California: IEEE Computer Society. PNNL-SA-123277. doi:10.1109/IPDPS.2017.25