Nathan Tallent

Chief Computer Scientist

Biography

Nathan Tallent is a chief computer scientist in the Future Computing Technologies Group within the Advanced Computing, Mathematics, and Data Division at Pacific Northwest National Laboratory.

Tallent's research is motivated by emerging challenges in distributed systems, scientific workflows, machine learning, and data management. He leads activities in continuum computing and the Performance Lab for EXtreme Computing and daTa, where his contributions span performance measurement, modeling, bottleneck diagnosis, and optimization, with special attention to bottlenecks in networks, storage, and memory. He has made notable contributions to performance tools, both for performance modeling and for parallel performance analysis. He has more than 70 peer-reviewed publications, serves on several reviewing committees, and received a DOE Early Career award. He is one of the original developers of HPCToolkit, a widely used suite of performance tools on supercomputers. He received a Ph.D. in 2010 from Rice University.

More information is available on his personal page and his Google Scholar page.

Tallent has led development of several research software prototypes for distributed AI systems, scientific workflows, and performance analysis and prediction:

Distributed Workflows
- DataFlowDrs is a new comprehensive suite of tools (DataLife, DaYu, FastFlow, FlowForecaster) for performance optimization of scientific HPC workflows that automates several previously difficult manual analyses and substantially reduces the impact of data flow bottlenecks.
- TAZeR, BigFlowSim: A remote I/O framework and a workflow I/O simulator-emulator and trace generator, respectively.
AI Systems • Data Analytics
- MassiveGNN: A framework for scaling graph neural networks to new levels with communication-efficient training for massive (distributed) GNNs within the state-of-the-art Amazon DistDGL (distributed Deep Graph Library).
- PowerMorph and PowerTrip, for addressing the power constraints of large-scale training with federated heterogeneous datacenter power and intelligent adaptation of demand-response power.
- SamIAm: Microstructure segmentation for transmission electron microscopy that recognizes geometric and textural features and that is based on semantic boosting of the Segment Anything Model (SAM).
Hardware/Software Co-design • Application Performance Analysis
- MemGaze/MemFriend, a memory analysis toolset that combines low-overhead measurement; sophisticated, high-resolution trace analysis; and emulation of memory-placement policies.
- OCEAN (Open-source CXL Emulation at Hyperscale Architecture and Networking), an emerging tool for emulating CXL-extended distributed memory systems.
- Palm: Palm is a suite of performance modeling tools (Palm, Palm-Task, Representative-Paths, Palm/FastFootprints, MIAMI-NW) to assist performance analysis and predictive model generation.
- HPCToolkit: HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to GPU-accelerated supercomputers.
Workload Benchmarking and Characterization
- SEAK Suite: The SEAK Suite is a collection of constraining problems for common embedded computing challenges.
- PERFECT Suite: The PERFECT Suite consists of kernels and applications for evaluating tradeoffs between performance, power, and architecture within the domains of radar and image processing.

He contributed to OpenAD, a tool for automatic differentiation (AD) of numerical computer programs.

Research Interests

Algorithm Analysis
Compilers
Computer Information Systems
Hardware Analysis
Networking Software
Parallel Algorithms
Parallel Computing
Performance Benchmarking
Programming Languages
Workload Characterization

Education

PhD in Computer Science, Rice University
MS in Computer Science, Rice University
MDiv in Theology, Westminster Theological Seminary
BA in Computer Science, Rice University

Affiliations and Professional Service

Institute of Electrical and Electronics Engineers (IEEE) Computer Society (CS)
Association for Computing Machinery (ACM)

Awards and Recognitions

Department of Energy (DOE) Early Career (FY 2021)
Best paper nominations: International Symposium on Workload Characterization (IISWC) 2018 and 2015; Programming Language Design and Implementation (PLDI) 2009
2009 ACM/IEEE-CS George Michael Memorial HPC Fellowship

Publications

2021

Bel O., J. Pata, J. Vlimant, N. R. Tallent, J. Balcas, and M. Spiropulu. 2021. "Diolkos: Improving ethernet throughput through dynamic port selection." In Proceedings of the 18th ACM International Conference on Computing Frontiers. CF 2021 Virtual Event, Italy, May 11–13, 2021, 83–92. New York, New York: ACM. PNNL-SA-160853. doi:10.1145/3457388.3458659
Gawande N. A., S. Ghosh, M. Halappanavar, M. H. Khan, A. Kalyanaraman, M. Minutoli, and N. R. Tallent, et al. 2021. "ExaGraph: Graph and Combinatorial Methods for Enabling Exascale Applications." The International Journal of High Performance Computing Applications, 35 (6):109434E. PNNL-SA-155863. doi:10.1177/10943420211029299
Ghosh S., N. R. Tallent, and M. Halappanavar. “Characterizing performance of graph neighborhood communication patterns.” IEEE Transactions on Parallel and Distributed Systems, August 2021. doi:10.1109/TPDS.2021.3101425
Ghosh S., N. R. Tallent, M. Minutoli, M. Halappanavar, R. Peri, and A. Kalyanaraman. 2021. "Single-node Partitioned-Memory for Huge Graph Analytics: Cost and Performance Trade-offs." In Proceedings of the International Conference for High Performance Computing, Network, Storage and Analysis. SC 2021, Virtual, November 14–19, 2021, No. 55. New York, New York: Association for Computing Machinery. PNNL-SA-161359. doi:10.1145/3458817.3476156

2020

Barik R., M. Minutoli, M. Halappanavar, N. R. Tallent, and A. Kalyanaraman. 2020. "Vertex Reordering for Real-world Graphs and Applications: An Empirical Evaluation." In IEEE International Symposium on Workload Characterization. IISWC 2020, Beijing, China, October 27–30, 2020, 240–251. Piscataway, New Jersey: IEEE. PNNL-SA-154319. doi:10.1109/IISWC50251.2020.00031
Bel O., K. Chang, N. R. Tallent, D. Duellmann, E. L. Miller, F. Nawab, and D. D. E. Long. “Geomancy: Automated performance enhancement through data layout optimization.” In 36th Intl. Conf. on Massive Storage Systems and Technology, October 2020. doi:10.1109/ISPASS48437.2020.00025
Friese R. D., B. Mutlu, N. R. Tallent, J. D. Suetterlein, and J. F. Strube. 2020. "Effectively Using Remote I/O For Work Composition in Distributed Workflows." In IEEE International Conference on Big Data. Big Data 2020, Atlanta, GA, December 10–13, 2020, 426–433. Piscataway, New Jersey: IEEE. PNNL-SA-155757. doi:10.1109/BigData50022.2020.9378352
Gawande N. A., J. A. Daily, C. Siegel, N. R. Tallent, and A. Vishnu. 2020. "Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing." Future Generation Computer Systems 108. PNNL-SA-134513. doi:10.1016/j.future.2018.04.073
Kilic O. O., N. R. Tallent, and R. D. Friese. 2020. "Rapid Memory Footprint Access Diagnostics." In 2020 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2020, Boston, MA, August 23–25, 2020, 273–284. Piscataway, New Jersey: IEEE. PNNL-SA-151215. doi:10.1109/ISPASS48437.2020.00047
Li A., S. Song, J. Chen, J. Li, X. Liu, N. R. Tallent, and K. J. Barker. 2020. "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect." IEEE Transactions on Parallel and Distributed Systems 31(1): 94–110. PNNL-SA-141707. doi:10.1109/TPDS.2019.2928289

2019

Bhuiyan T.H., M. Halappanavar, R. D. Friese, H. Medal, L. de la Torre, A. Sathanur, and N. R. Tallent. “Stochastic programming approach for resource selection under demand uncertainty.” In Dalibor Klusáček, Walfredo Cirne, and Narayan Desai, editors, Job Scheduling Strategies for Parallel Processing, pages 107–126, Cham, 2019. Springer International Publishing. doi:10.1007/978-3-030-10632-4_6
Kilic O. O., N. R. Tallent, and R. D. Friese. 2019. "Rapidly Measuring Loop Footprints." In IEEE International Conference on Cluster Computing. CLUSTER 2019, Albuquerque, NM, September 23–26, 2019. Piscataway, New Jersey: IEEE. PNNL-SA-146801. doi:10.1109/CLUSTER.2019.8891025
Schram M., N. R. Tallent, R. D. Friese, A. Singh, and I. Altintas. 2019. "Application of Deep Learning on Integrating Prediction, Provenance, and Optimization." In Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics. CHEP 2018, EPJ Web of Conferences, 214(06007). PNNL-SA-147454. doi:10.1051/epjconf/201921406007
Suetterlein J. D., R. D. Friese, N. R. Tallent, and M. Schram. 2019. "TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows." In IEEE International Conference on Big Data. Big Data 2019, Los Angeles, CA, December 9–12, 2019, 383–394. Piscataway, New Jersey: IEEE. PNNL-SA-148879. doi:10.1109/BigData47090.2019.9006418

2018

Bhuiyan T. H., M. Halappanavar, R. D. Friese, H. Medal, L. De La Torre, A. Visweswara Sathanur, and N. R. Tallent. 2018. "Stochastic Programming Approach for Resource Selection under Demand Uncertainty." In 22nd International Workshop on Job Scheduling Strategies for Parallel Processing. JSSPP 2018, Vancouver, BC, May 25, 2018. Klusacek D, W Cirne, and N Desai, eds. Lecture Notes in Computer Science, 11332: 107–126. Cham: Springer. PNNL-SA-130071. doi:10.1007/978-3-030-10632-4_6
Friese R.D., N. R. Tallent, M. Schram, M. Halappanavar, and K. J. Barker. “Optimizing distributed data-intensive workflows.” In Proc. of the 2018 IEEE Conf. on Cluster Computing, pages 279–289. IEEE, September 2018. doi:10.1109/CLUSTER.2018.00045
Gawande N.A., J. A. Daily, C. Siegel, N. R. Tallent, and A. Vishnu. “Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing.” Future Generation Computer Systems, May 2018. doi:10.1016/j.future.2018.04.073
Li A., S. Song, J. Chen, X. Liu, N. R. Tallent, and K. J. Barker. 2018. "Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite." In IEEE International Symposium on Workload Characterization. IISWC 2018, September 30–October 2, 2018, 191–202. Piscataway, New Jersey: IEEE. PNNL-SA-137642. doi:10.1109/IISWC.2018.8573483
Singh A., I. Altintas, M. Schram, and N. R. Tallent. 2018. "Deep Learning for Enhancing Fault Tolerant Capabilities of Scientific Workflows." In Proceedings of the IEEE International Conference on Big Data. Big Data 2018, Seattle, WA, December 10–13, 2018, (8622509): 3905–3914. Song Y, et al., eds. Piscataway, New Jersey: IEEE. PNNL-SA-143406. doi:10.1109/BigData.2018.8622509
Tallent N. R., N. A. Gawande, C. M. Siegel, A. Vishnu, and A. Hoisie. 2018. "Evaluating On-Node GPU Interconnects for Deep Learning Workloads." In Proceedings of the 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems. PMBS 2017, Denver, CO, November 13, 2017. Hammond S., S. Jarvis, and S. Wright, eds. Lecture Notes in Computer Science 10724: 3–21. Cham: Springer Verlag. PNNL-SA-129849. doi:10.1007/978-3-319-72971-8_1

2017

Friese R. D., N. R. Tallent, A. Vishnu, D. J. Kerbyson, and A. Hoisie. 2017. "Generating Performance Models for Irregular Applications." In IEEE International Parallel and Distributed Processing Symposium. IPDPS 2017, Orlando, FL May 29–June 2, 2017, 317–326. Piscataway, New Jersey: IEEE. PNNL-SA-123945. doi:10.1109/IPDPS.2017.61
Gawande N. A., J. B. Landwehr, J. A. Daily, N. R. Tallent, A. Vishnu, and D. J. Kerbyson. 2017. "Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing." In IEEE International Parallel and Distributed Processing Symposium Workshops. IPDPSW 2017, Lake Buena Vista, FL, May 29–June 2, 2017, 399–408. Los Alamitos, California: IEEE Computer Society. PNNL-SA-129129. doi:10.1109/IPDPSW.2017.36
Schram M., V. Bansal, R. D. Friese, N. R. Tallent, J. Yin, K. J. Barker, and E. G. Stephan, et al. 2017. "Integrating prediction, provenance, and optimization into high energy workflows." Journal of Physics: Conference Series 898(6): 062052. PNNL-SA-129007. doi:10.1088/1742-6596/898/6/062052
Tallent N.R., D. J. Kerbyson, and A. Hoisie. “Representative paths analysis.” In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 34:1–34:12, New York, NY, USA, November 2017. ACM. doi:10.1145/3126908.3126962
Tallent N.R., N. A. Gawande, C. Siegel, A. Vishnu, and A. Hoisie. “Evaluating on-node GPU interconnects for deep learning workloads.” In Stephen Jarvis, Steven Wright, and Simon Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, pages 3–21. Springer International Publishing, December 2017. doi:10.1007/978-3-319-72971-8_1

2016

Tallent N.R., K. J. Barker, D. Chavarría-Miranda, A. Tumeo, M. Halappanavar, A. Márquez, D. J. Kerbyson, and A. Hoisie. “Modeling the impact of silicon photonics on graph analytics.” In Proc. of the 11th IEEE Intl. Conf. on Networking, Architecture, and Storage, pages 1–11. IEEE Computer Society, Aug 2016. doi:10.1109/NAS.2016.7549410
Tallent N. R., J. B. Manzano Franco, N. A. Gawande, S. Kang, D. J. Kerbyson, A. Hoisie, and J. Cross. 2016. "Algorithm and Architecture Independent Benchmarking with SEAK." In IEEE International Parallel and Distributed Processing Symposium. IPDPS 2016, Chicago, IL, May 23–27, 2016, 63–72. Piscataway, New Jersey: IEEE. PNNL-SA-115612. doi:10.1109/IPDPS.2016.25
Tallent N. R., K. J. Barker, R. Gioiosa, A. Marquez, G. Kestor, S. Song, and A. Tumeo, et al. 2016. "Assessing Advanced Technology in CENATE." In Proceedings of the IEEE International Conference on Networking, Architecture, and Storage. NAS 2016, Long Beach, CA, August 8–10, 2016. Piscataway, New Jersey: IEEE. PNNL-SA-119257. doi:10.1109/NAS.2016.7549392
Vishnu A., H. van Dam, N. R. Tallent, D. J. Kerbyson, and A. Hoisie. “Fault modeling of extreme scale applications using machine learning.” In Proc. of the 30th IEEE Intl. Parallel and Distributed Processing Symp., pages 222–231, Los Alamitos, CA, USA, May 2016. IEEE Computer Society. doi:10.1109/IPDPS.2016.111

2015

Gawande N. A., J. B. Manzano Franco, A. Tumeo, N. R. Tallent, D. J. Kerbyson, and A. Hoisie. 2015. "Power and Performance Trade-offs for Space Time Adaptive Processing." In IEEE 20th International Conference on Application-specific Systems, Architectures and Processors. ASAP 2015, Toronto, Canada, July 27–29, 2015, 41–48. Piscataway, New Jersey: IEEE. PNNL-SA-110779. doi:10.1109/ASAP.2015.7245703
Halappanavar M., M. Schram, L. de La Torre, K. Barker, N. R. Tallent, and D. Kerbyson. “Towards efficient scheduling of data intensive high energy physics workflows.” In WORKS '15: Workshop on Workflows in Support of Large-Scale Science, held in conjunction with SuperComputing 15, November 2015. doi:10.1145/2822332.2822335
Tallent N. R., A. Vishnu, H. van Dam, J. A. Daily, D. J. Kerbyson, and A. Hoisie. 2015. "Diagnosing the Causes and Severity of One-sided Message Contention." In 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP '15, San Francisco, CA, February 7–11, 2015, 130–139. New York, New York: ACM. PNNL-SA-106916. doi:10.1145/2688500.2688516
Venkatesh A., A. Vishnu, K. Hamidouche, N. R. Tallent, D. Panda, D. J. Kerbyson, and A. Hoisie. 2015. "A Case for Application Oblivious Energy-Efficient MPI Runtime." In SC15 Proceedings: International Conference on High Performance Computing, Networking, Storage and Analysis. Austin, Texas, November 15–20, 2015, Paper No. 29. New York, New York: ACM. PNNL-SA-113351. doi:10.1145/2807591.2807658

2014

Tallent N.R., A. Hoisie, and C. Plata. “Palm: Making application modeling easier.” PNNL Computational Sciences and Mathematics Division Research Highlights, May 2014. http://www.pnnl.gov/science/highlights/highlight.asp?id=2652.
Tallent N.R. and A. Hoisie. “Palm: Easing the burden of analytical performance modeling.” In Proc. of the 28th ACM Intl. Conf. on Supercomputing, pages 221–230, New York, NY, USA, 2014. ACM. doi:10.1145/2597652.2597683

2013

Barker K., T. Benson, D. Campbell, D. Ediger, R. Gioiosa, A. Hoisie, D. Kerbyson, J. Manzano, A. Marquez, L. Song, N. R. Tallent, and A. Tumeo. PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual. Pacific Northwest National Laboratory and Georgia Tech Research Institute, December 2013. http://hpc.pnnl.gov/projects/PERFECT/.
Song S., N. R. Tallent, and A. Vishnu. 2013. "Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems." In Modeling & Simulation of Exascale Systems & Applications: Workshop on Modeling & Simulation of Exascale Systems & Applications. Seattle, WA, September 18–19, 2013. Washington DC: Department of Energy, Office of Advanced Scientific Computing Research. PNNL-SA-105672.

2012

Liu X., J. Mellor-Crummey, and N. R. Tallent. “Analyzing application performance bottlenecks on Intel's SCC.” Proc. of the TACC-Intel Highly Parallel Computing Symp., 2012.
Tallent N.R. and D. Kerbyson. “Data-centric performance analysis of PGAS applications.” In WHIST 2012: Proc. of the 2nd Intl. Workshop on High-performance Infrastructure for Scalable Tools, held with the 26th Intl. Conf. on Supercomputing, 2012.
Tallent N.R. and J. Mellor-Crummey. “Using sampling to understand parallel program performance.” In Holger Brunst, Matthias S. Müller, Wolfgang E. Nagel, and Michael M. Resch, editors, Tools for High Performance Computing 2011, pages 13–25. Springer, 2012. doi:10.1007/978-3-642-31476-6_2

2011

Tallent N.R., J. M. Mellor-Crummey, M. Franco, R. Landrum, and L. Adhianto. “Scalable fine-grained call path tracing.” In Proc. of the 25th Intl. Conf. on Supercomputing, pages 63–74, New York, NY, USA, 2011. ACM. doi:10.1145/1995896.1995908

2010

Adhianto L., S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. “HPCToolkit: Tools for performance analysis of optimized parallel programs.” Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010. doi:10.1002/cpe.1553
Adhianto L., J. Mellor-Crummey, and N. R. Tallent. “Effectively presenting call path profiles of application performance.” In PSTI 2010: Proc. of the 2010 Workshop on Parallel Software Tools and Tool Infrastructures, held with the 2010 Intl. Conf. on Parallel Processing, pages 179–188, Los Alamitos, CA, USA, 2010. IEEE Computer Society. doi:10.1109/ICPPW.2010.35
Tallent N.R., L. Adhianto, and J. M. Mellor-Crummey. “Scalable identification of load imbalance in parallel executions using call path profiles.” In Proc. of the 2010 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, Washington, DC, USA, 2010. IEEE Computer Society. doi:10.1109/SC.2010.47
Tallent N.R., J. M. Mellor-Crummey, and A. Porterfield. “Analyzing lock contention in multithreaded applications.” In Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 269–280, New York, NY, USA, 2010. ACM. doi:10.1145/1693453.1693489

2009

Fowler R., L. Adhianto, B. de Supinski, M. Fagan, T. Gamblin, M. Krentel, J. Mellor-Crummey, M. Schulz, and N. Tallent. “Frontiers of performance analysis on leadership-class systems.” Journal of Physics: Conference Series, 180:012041 (6pp), 2009.
Tallent N.R. and J. Mellor-Crummey. “Effective performance measurement and analysis of multithreaded applications.” In Proc. of the 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 229–240, New York, NY, USA, 2009. ACM. doi:10.1145/1504176.1504210
Tallent N.R. and J. M. Mellor-Crummey. “Identifying performance bottlenecks in work-stealing computations.” Computer, 42(12):44–50, 2009. doi:10.1109/MC.2009.396
Tallent N.R., J. M. Mellor-Crummey, L. Adhianto, M.W. Fagan, and M. Krentel. “Diagnosing performance bottlenecks in emerging petascale applications.” In Proc. of the 2009 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, New York, NY, USA, 2009. ACM. doi:10.1145/1654059.1654111
Tallent N.R., J. Mellor-Crummey, and M.W. Fagan. “Binary analysis for measurement and attribution of program performance.” In Proc. of the 2009 ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 441–452, New York, NY, USA, 2009. ACM. Distinguished Paper. doi:10.1145/1542476.1542526

2008

Adhianto L., M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. “HPCToolkit: Performance measurement and analysis for supercomputers with node-level parallelism.” In Proc. of the Workshop on Node Level Parallelism for Large Scale Supercomputers, held with Supercomputing 2008, November 2008.
Mellor-Crummey J. and N. R. Tallent. “A methodology for accurate, effective and scalable performance analysis of application programs.” In Proc. of the Workshop on Tools, Infrastructures and Methodologies for the Evaluation of Research Systems, held with the 2008 IEEE Intl. Symp. on Performance Analysis of Systems and Software, pages 4–11, February 2008.
Tallent N., J. Mellor-Crummey, L. Adhianto, M. Fagan, and M. Krentel. “HPCToolkit: Performance tools for scientific computing.” Journal of Physics: Conference Series, 125:012088 (5pp), 2008.
Utke J., U. Naumann, M. Fagan, N. Tallent, M. Strout, P. Heimbach, C. Hill, and C. Wunsch. “OpenAD/F: A modular open-source tool for automatic differentiation of Fortran codes.” ACM Trans. Math. Softw., 34(4):1–36, 2008. doi:10.1145/1377596.1377598

2006

Froyd N., N. Tallent, J. Mellor-Crummey, and R. Fowler. “Call path profiling for unmodified, optimized binaries.” In GCC Summit '06: Proc. of the GCC Developers' Summit, 2006, pages 21–36, 2006.

2002

Mellor-Crummey J., R. Fowler, G. Marin, and N. Tallent. “HPCView: A tool for top-down analysis of node performance.” The Journal of Supercomputing, 23(1):81–104, 2002. doi:10.1023/A:1015789220266