Computer Scientist


Nathan Tallent is a computer scientist and team leader for the Scalable Computing & Data team in the High Performance Computing Group within the Advanced Computing, Mathematics, and Data Division at PNNL.

His research covers performance analysis, parallelism, scalable and distributed architectures, and data-intensive analytics. His work spans the challenges of characterizing, modeling, analyzing, and accelerating the performance of current and emerging workloads in scientific workflows, data analytics, and domain modeling. His recent work has focused on exploiting dynamic intertask data locality and minimizing data movement through storage, network, and memory. He is also interested in performance tools, both for performance modeling and for parallel performance analysis. A recent theme has been lightweight and scalable techniques for workloads with irregular parallelism, data structures, and access patterns. He is one of the original developers of HPCToolkit, a widely used suite of performance tools for supercomputers.


Tallent has led development of the following research software prototypes:

  • TAZeR: TAZeR is a remote I/O framework for transparently minimizing the access latencies of remote I/O in workflows. TAZeR's primary strategy is capturing dynamic and irregular intertask locality, both temporal and spatial, via adaptive hierarchical staging that ensures the most frequently accessed data is “close.”
  • BigFlowSim: BigFlowSim is a workflow I/O simulator-emulator and trace generator that captures several parameters that affect local and remote I/O performance. BigFlowSim generates a large variety of flows within and between tasks of distributed workflows. With BigFlowSim, TAZeR's performance has been systematically studied on different data flows.
  • Palm: Palm is a suite of performance modeling tools (Palm, Palm-Task, Representative-Paths, Palm/FastFootprints, MIAMI-NW) to assist performance analysis and predictive model generation. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. Palm has been used to model irregular applications with sparse data structures and unpredictable access patterns. Recent additions focus on rapid characterization of memory behavior.
  • SEAK Suite: The SEAK Suite is a collection of constraining problems for common embedded computing challenges. A constraining problem is a mission-centric and goal-oriented problem specification that separates problem-domain constraints from solution implementations so as to encourage creative solutions that meet goals but that may deviate from standard implementations.
  • PERFECT Suite: The PERFECT Suite consists of kernels and applications for evaluating tradeoffs between performance, power, and architecture within the domains of radar and image processing.

Research Interests

  • Algorithm Analysis
  • Compilers
  • Computer Information Systems
  • Hardware Analysis
  • Networking Software
  • Parallel Algorithms
  • Parallel Computing
  • Performance Benchmarking
  • Programming Languages
  • Workload Characterization

Education


  • PhD in Computer Science, Rice University
  • MS in Computer Science, Rice University
  • MDiv in Theology, Westminster Theological Seminary
  • BA in Computer Science, Rice University

Affiliations and Professional Service

  • Institute of Electrical and Electronics Engineers (IEEE) Computer Society (CS)
  • Association for Computing Machinery (ACM)

Awards and Recognitions

  • Department of Energy (DOE) Early Career (FY 2021)
  • Best paper nominees: International Symposium on Workload Characterization (IISWC) 2018, 2015, Programming Language Design and Implementation (PLDI) 2009
  • 2009 ACM/IEEE-CS George Michael Memorial HPC Fellowship

Publications



  • Bel O., J. Pata, J. Vlimant, N. R. Tallent, J. Balcas, and M. Spiropulu. 2021. "Diolkos: Improving ethernet throughput through dynamic port selection." In Proceedings of the 18th ACM International Conference on Computing Frontiers. CF 2021 Virtual Event, Italy, May 11–13, 2021, 83–92. New York, New York: ACM. PNNL-SA-160853. doi:10.1145/3457388.3458659
  • Gawande N. A., S. Ghosh, M. Halappanavar, M. H. Khan, A. Kalyanaraman, M. Minutoli, and N. R. Tallent, et al. 2021. "ExaGraph: Graph and Combinatorial Methods for Enabling Exascale Applications." The International Journal of High Performance Computing Applications, 35 (6):109434E. PNNL-SA-155863. doi:10.1177/10943420211029299
  • Ghosh S., N. R. Tallent, and M. Halappanavar. “Characterizing performance of graph neighborhood communication patterns.” IEEE Transactions on Parallel and Distributed Systems, August 2021. doi:10.1109/TPDS.2021.3101425
  • Ghosh S., N. R. Tallent, M. Minutoli, M. Halappanavar, R. Peri, and A. Kalyanaraman. 2021. "Single-node Partitioned-Memory for Huge Graph Analytics: Cost and Performance Trade-offs." In Proceedings of the International Conference for High Performance Computing, Network, Storage and Analysis. SC 2021, Virtual, November 14–19, 2021, No. 55. New York, New York: Association for Computing Machinery. PNNL-SA-161359. doi:10.1145/3458817.3476156


  • Barik R., M. Minutoli, M. Halappanavar, N. R. Tallent, and A. Kalyanaraman. 2020. "Vertex Reordering for Real-world Graphs and Applications: An Empirical Evaluation." In IEEE International Symposium on Workload Characterization. IISWC 2020, Beijing, China, October 27–30, 2020, 240–251. Piscataway, New Jersey: IEEE. PNNL-SA-154319. doi:10.1109/IISWC50251.2020.00031
  • Bel O., K. Chang, N. R. Tallent, D. Duellmann, E.L. Miller, F. Nawab, and D. D. E. Long. “Geomancy: Automated performance enhancement through data layout optimization.” In 36th Intl. Conf. on Massive Storage Systems and Technology, October 2020. doi:10.1109/ISPASS48437.2020.00025
  • Friese R. D., B. Mutlu, N. R. Tallent, J. D. Suetterlein, and J. F. Strube. 2020. "Effectively Using Remote I/O For Work Composition in Distributed Workflows." In IEEE International Conference on Big Data. Big Data 2020, Atlanta, GA, December 10–13, 2020, 426–433. Piscataway, New Jersey: IEEE. PNNL-SA-155757. doi:10.1109/BigData50022.2020.9378352
  • Gawande N. A., J. A. Daily, C. Siegel, N. R. Tallent, and A. Vishnu. 2020. "Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing." Future Generation Computer Systems 108. PNNL-SA-134513. doi:10.1016/j.future.2018.04.073
  • Kilic O. O., N. R. Tallent, and R. D. Friese. 2020. "Rapid Memory Footprint Access Diagnostics." In 2020 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2020, Boston, MA, August 23–25, 2020, 273–284. Piscataway, New Jersey: IEEE. PNNL-SA-151215. doi:10.1109/ISPASS48437.2020.00047
  • Li A., S. Song, J. Chen, J. Li, X. Liu, N. R. Tallent, and K. J. Barker. 2020. "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect." IEEE Transactions on Parallel and Distributed Systems 31(1): 94–110. PNNL-SA-141707. doi:10.1109/TPDS.2019.2928289


  • Bhuiyan T.H., M. Halappanavar, R. D. Friese, H. Medal, L. de la Torre, A. Sathanur, and N. R. Tallent. “Stochastic programming approach for resource selection under demand uncertainty.” In Dalibor Klusáček, Walfredo Cirne, and Narayan Desai, editors, Job Scheduling Strategies for Parallel Processing, pages 107–126, Cham, 2019. Springer International Publishing. doi:10.1007/978-3-030-10632-4_6
  • Kilic O. O., N. R. Tallent, and R. D. Friese. 2019. "Rapidly Measuring Loop Footprints." In IEEE International Conference on Cluster Computing. CLUSTER 2019, Albuquerque, NM, September 23–26, 2019. Piscataway, New Jersey: IEEE. PNNL-SA-146801. doi:10.1109/CLUSTER.2019.8891025
  • Schram M., N. R. Tallent, R. D. Friese, A. Singh, and I. Altintas. 2019. "Application of Deep Learning on Integrating Prediction, Provenance, and Optimization." In Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics. CHEP 2018, EPJ Web of Conferences, 214(06007). PNNL-SA-147454. doi:10.1051/epjconf/201921406007
  • Suetterlein J. D., R. D. Friese, N. R. Tallent, and M. Schram. 2019. "TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows." In IEEE International Conference on Big Data. Big Data 2019, Los Angeles, CA, December 9–12, 2019, 383–394. Piscataway, New Jersey: IEEE. PNNL-SA-148879. doi:10.1109/BigData47090.2019.9006418


  • Bhuiyan T. H., M. Halappanavar, R. D. Friese, H. Medal, L. De La Torre, A. Visweswara Sathanur, and N. R. Tallent. 2018. "Stochastic Programming Approach for Resource Selection under Demand Uncertainty." In 22nd International Workshop on Job Scheduling Strategies for Parallel Processing. JSSPP 2018, Vancouver, BC, May 25, 2018. Klusacek D, W Cirne, and N Desai, eds. Lecture Notes in Computer Science, 11332: 107–126. Cham: Springer. PNNL-SA-130071. doi:10.1007/978-3-030-10632-4_6
  • Friese R.D., N. R. Tallent, M. Schram, M. Halappanavar, and K. J. Barker. “Optimizing distributed data-intensive workflows.” In Proc. of the 2018 IEEE Conf. on Cluster Computing, pages 279–289. IEEE, September 2018. doi:10.1109/CLUSTER.2018.00045
  • Gawande N.A., J. A. Daily, C. Siegel, N. R. Tallent, and A. Vishnu. “Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing.” Future Generation Computer Systems, May 2018. doi:
  • Li A., S. Song, J. Chen, X. Liu, N. R. Tallent, and K. J. Barker. 2018. "Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite." In IEEE International Symposium on Workload Characterization. IISWC 2018, September 30–October 2, 2018, 191–202. Piscataway, New Jersey: IEEE. PNNL-SA-137642. doi:10.1109/IISWC.2018.8573483
  • Singh A., I. Altintas, M. Schram, and N. R. Tallent. 2018. "Deep Learning for Enhancing Fault Tolerant Capabilities of Scientific Workflows." In Proceedings of the IEEE International Conference on Big Data. Big Data 2018, Seattle, WA, December 10–13, 2018, (8622509): 3905–3914. Song Y, et al., eds. Piscataway, New Jersey: IEEE. PNNL-SA-143406. doi:10.1109/BigData.2018.8622509
  • Tallent N. R., N. A. Gawande, C. M. Siegel, A. Vishnu, and A. Hoisie. 2018. "Evaluating On-Node GPU Interconnects for Deep Learning Workloads." In Proceedings of the 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems. PMBS 2017, Denver, CO, November 13, 2017. Hammond S., S. Jarvis, and S. Wright, eds. Lecture Notes in Computer Science 10724: 3–21. Cham: Springer Verlag. PNNL-SA-129849. doi:10.1007/978-3-319-72971-8_1


  • Friese R. D., N. R. Tallent, A. Vishnu, D. J. Kerbyson, and A. Hoisie. 2017. "Generating Performance Models for Irregular Applications." In IEEE International Parallel and Distributed Processing Symposium. IPDPS 2017, Orlando, FL May 29–June 2, 2017, 317–326. Piscataway, New Jersey: IEEE. PNNL-SA-123945. doi:10.1109/IPDPS.2017.61
  • Gawande N. A., J. B. Landwehr, J. A. Daily, N. R. Tallent, A. Vishnu, and D. J. Kerbyson. 2017. "Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing." In IEEE International Parallel and Distributed Processing Symposium Workshops. IPDPSW 2017, Lake Buena Vista, FL, May 29–June 2, 2017, 399–408. Los Alamitos, California: IEEE Computer Society. PNNL-SA-129129. doi:10.1109/IPDPSW.2017.36
  • Schram M., V. Bansal, R. D. Friese, N. R. Tallent, J. Yin, K. J. Barker, and E. G. Stephan, et al. 2017. "Integrating prediction, provenance, and optimization into high energy workflows." Journal of Physics: Conference Series 898(6): 062052. PNNL-SA-129007. doi:10.1088/1742-6596/898/6/062052
  • Tallent N.R., D. J. Kerbyson, and A. Hoisie. “Representative paths analysis.” In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 34:1–34:12, New York, NY, USA, November 2017. ACM. doi:10.1145/3126908.3126962
  • Tallent N.R., N. A. Gawande, C. Siegel, A. Vishnu, and A. Hoisie. “Evaluating on-node GPU interconnects for deep learning workloads.” In Stephen Jarvis, Steven Wright, and Simon Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, pages 3–21. Springer International Publishing, December 2017. doi:10.1007/978-3-319-72971-8_1


  • Tallent N.R., K. J. Barker, D. Chavarría-Miranda, A. Tumeo, M. Halappanavar, A. Márquez, D. J. Kerbyson, and A. Hoisie. “Modeling the impact of silicon photonics on graph analytics.” In Proc. of the 11th IEEE Intl. Conf. on Networking, Architecture, and Storage, pages 1–11. IEEE Computer Society, Aug 2016. doi:10.1109/NAS.2016.7549410
  • Tallent N. R., J. B. Manzano Franco, N. A. Gawande, S. Kang, D. J. Kerbyson, A. Hoisie, and J. Cross. 2016. "Algorithm and Architecture Independent Benchmarking with SEAK." In IEEE International Parallel and Distributed Processing Symposium. IEEE, Chicago, IL, May 23–27, 2016, 63–72. Piscataway, New Jersey: IEEE. PNNL-SA-115612. doi:10.1109/IPDPS.2016.25
  • Tallent N. R., K. J. Barker, R. Gioiosa, A. Marquez, G. Kestor, S. Song, and A. Tumeo, et al. 2016. "Assessing Advanced Technology in CENATE." In Proceedings of the IEEE International Conference on Networking, Architecture, and Storage. NAS 2016, Long Beach, CA, August 8–10, 2016. Piscataway, New Jersey: IEEE. PNNL-SA-119257. doi:10.1109/NAS.2016.7549392
  • Vishnu A., H. van Dam, N. R. Tallent, D. J. Kerbyson, and A. Hoisie. “Fault modeling of extreme scale applications using machine learning.” In Proc. of the 30th IEEE Intl. Parallel and Distributed Processing Symp., pages 222–231, Los Alamitos, CA, USA, May 2016. IEEE Computer Society. doi:10.1109/IPDPS.2016.111


  • Gawande N. A., J. B. Manzano Franco, A. Tumeo, N. R. Tallent, D. J. Kerbyson, and A. Hoisie. 2015. "Power and Performance Trade-offs for Space Time Adaptive Processing." In IEEE 20th International Conference on Application-specific Systems, Architectures and Processors. ASAP 2015, Toronto, Canada, July 27–29, 2015, 41–48. Piscataway, New Jersey: IEEE. PNNL-SA-110779. doi:10.1109/ASAP.2015.7245703
  • Halappanavar M., M. Schram, L. de La Torre, K. Barker, N. R. Tallent, and D. Kerbyson. “Towards efficient scheduling of data intensive high energy physics workflows.” In WORKS '15: Workshop on Workflows in Support of Large-Scale Science, held in conjunction with SuperComputing 15, November 2015. doi:10.1145/2822332.2822335
  • Tallent N. R., A. Vishnu, H. van Dam, J. A. Daily, D. J. Kerbyson, and A. Hoisie. 2015. "Diagnosing the Causes and Severity of One-sided Message Contention." In 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP '15, San Francisco, CA, February 7–11, 2015, 130–139. New York, New York: ACM. PNNL-SA-106916. doi:10.1145/2688500.2688516
  • Venkatesh A., A. Vishnu, K. Hamidouche, N. R. Tallent, D. Panda, D. J. Kerbyson, and A. Hoisie. 2015. "A Case for Application Oblivious Energy-Efficient MPI Runtime." In SC15 Proceedings: International Conference on High Performance Computing, Networking, Storage and Analysis. Austin, Texas, November 15–20, 2015, Paper No. 29. New York, New York: ACM. PNNL-SA-113351. doi:10.1145/2807591.2807658


  • Tallent N.R., A. Hoisie, and C. Plata. “Palm: Making application modeling easier.” PNNL Computational Sciences and Mathematics Division Research Highlights, May 2014.
  • Tallent N.R. and A. Hoisie. “Palm: Easing the burden of analytical performance modeling.” In Proc. of the 28th ACM Intl. Conf. on Supercomputing, pages 221–230, New York, NY, USA, 2014. ACM. doi:10.1145/2597652.2597683


  • Barker K., T. Benson, D. Campbell, D. Ediger, R. Gioiosa, A. Hoisie, D. Kerbyson, J. Manzano, A. Marquez, L. Song, N. R. Tallent, and A. Tumeo. PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual. Pacific Northwest National Laboratory and Georgia Tech Research Institute, December 2013.
  • Song S., N. R. Tallent, and A. Vishnu. 2013. "Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems." In Modeling & Simulation of Exascale Systems & Applications: Workshop on Modeling & Simulation of Exascale Systems & Applications. Seattle, WA, September 18–19, 2013. Washington DC: Department of Energy, Office of Advanced Scientific Computing Research. PNNL-SA-105672.


  • Liu X., J. Mellor-Crummey, and N. R. Tallent. “Analyzing application performance bottlenecks on Intel's SCC.” Proc. of the TACC-Intel Highly Parallel Computing Symp., 2012.
  • Tallent N.R. and D. Kerbyson. “Data-centric performance analysis of PGAS applications.” In WHIST 2012: Proc. of the 2nd Intl. Workshop on High-performance Infrastructure for Scalable Tools, held with the 26th Intl. Conf. on Supercomputing, 2012.
  • Tallent N.R. and J. Mellor-Crummey. “Using sampling to understand parallel program performance.” In Holger Brunst, Matthias S. Müller, Wolfgang E. Nagel, and Michael M. Resch, editors, Tools for High Performance Computing 2011, pages 13–25. Springer, 2012. doi:10.1007/978-3-642-31476-6_2


  • Tallent N.R., J. M. Mellor-Crummey, M. Franco, R. Landrum, and L. Adhianto. “Scalable fine-grained call path tracing.” In Proc. of the 25th Intl. Conf. on Supercomputing, pages 63–74, New York, NY, USA, 2011. ACM. doi:10.1145/1995896.1995908


  • Adhianto L., S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. “HPCToolkit: Tools for performance analysis of optimized parallel programs.” Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010. doi:10.1002/cpe.1553
  • Adhianto L., J. Mellor-Crummey, and N. R. Tallent. “Effectively presenting call path profiles of application performance.” In PSTI 2010: Proc. of the 2010 Workshop on Parallel Software Tools and Tool Infrastructures, held with the 2010 Intl. Conf. on Parallel Processing, pages 179–188, Los Alamitos, CA, USA, 2010. IEEE Computer Society. doi:10.1109/ICPPW.2010.35
  • Tallent N.R., L. Adhianto, and J. M. Mellor-Crummey. “Scalable identification of load imbalance in parallel executions using call path profiles.” In Proc. of the 2010 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, Washington, DC, USA, 2010. IEEE Computer Society. doi:10.1109/SC.2010.47
  • Tallent N.R., J. M. Mellor-Crummey, and A. Porterfield. “Analyzing lock contention in multithreaded applications.” In Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 269–280, New York, NY, USA, 2010. ACM. doi:10.1145/1693453.1693489


  • Fowler R., L. Adhianto, B. de Supinski, M. Fagan, T. Gamblin, M. Krentel, J. Mellor-Crummey, M. Schulz, and N. Tallent. “Frontiers of performance analysis on leadership-class systems.” Journal of Physics: Conference Series, 180:012041 (6pp), 2009.
  • Tallent N.R. and J. Mellor-Crummey. “Effective performance measurement and analysis of multithreaded applications.” In Proc. of the 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 229–240, New York, NY, USA, 2009. ACM. doi:10.1145/1504176.1504210
  • Tallent N.R. and J. M. Mellor-Crummey. “Identifying performance bottlenecks in work-stealing computations.” Computer, 42(12):44–50, 2009. doi:10.1109/MC.2009.396
  • Tallent N.R., J. M. Mellor-Crummey, L. Adhianto, M.W. Fagan, and M. Krentel. “Diagnosing performance bottlenecks in emerging petascale applications.” In Proc. of the 2009 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, New York, NY, USA, 2009. ACM. doi:10.1145/1654059.1654111
  • Tallent N.R., J. Mellor-Crummey, and M.W. Fagan. “Binary analysis for measurement and attribution of program performance.” In Proc. of the 2009 ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 441–452, New York, NY, USA, 2009. ACM. Distinguished Paper. doi:10.1145/1542476.1542526


  • Adhianto L., M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. “HPCToolkit: Performance measurement and analysis for supercomputers with node-level parallelism.” In Proc. of the Workshop on Node Level Parallelism for Large Scale Supercomputers, held with Supercomputing 2008, November 2008.
  • Mellor-Crummey J. and N. R. Tallent. “A methodology for accurate, effective and scalable performance analysis of application programs.” In Proc. of the Workshop on Tools, Infrastructures and Methodologies for the Evaluation of Research Systems, held with the 2008 IEEE Intl. Symp. on Performance Analysis of Systems and Software, pages 4–11, February 2008.
  • Tallent N., J. Mellor-Crummey, L. Adhianto, M. Fagan, and M. Krentel. “HPCToolkit: Performance tools for scientific computing.” Journal of Physics: Conference Series, 125:012088 (5pp), 2008.
  • Utke J., U. Naumann, M. Fagan, N. Tallent, M. Strout, P. Heimbach, C. Hill, and C. Wunsch. “OpenAD/F: A modular open-source tool for automatic differentiation of Fortran codes.” ACM Trans. Math. Softw., 34(4):1–36, 2008. doi:10.1145/1377596.1377598


  • Froyd N., N. Tallent, J. Mellor-Crummey, and R. Fowler. “Call path profiling for unmodified, optimized binaries.” In GCC Summit '06: Proc. of the GCC Developers' Summit, 2006, pages 21–36, 2006.


  • Mellor-Crummey J., R. Fowler, G. Marin, and N. Tallent. “HPCView: A tool for top-down analysis of node performance.” The Journal of Supercomputing, 23(1):81–104, 2002. doi:10.1023/A:1015789220266