ACMD Division

Staff information

Shuaiwen Leon Song

High Performance Computing
Scientist
Pacific Northwest National Laboratory
PO Box 999
MSIN: J4-30
Richland, WA 99352
509/372-4189

Biography

I am currently a staff research scientist in the High Performance Computing Group at Pacific Northwest National Laboratory (PNNL). I am also affiliated with the College of William & Mary as a courtesy scholar in the Computer Science Department. I received my Ph.D. in Computer Science from Virginia Tech in 2013. Prior to joining the PNNL HPC group in May 2013, I worked as an R&D intern with several government and industrial labs, including the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL), the Performance Analysis Lab (PAL) at PNNL, and the Architecture Research Division at NEC Laboratories America in Princeton.

I was a 2011 Livermore ISCR scholar and a recipient of the 2011 Paul E. Torgersen Excellent Research Award and the 2016 PNNL PCSD Outstanding Performance Award. I have published in major HPC-related conferences, including ASPLOS, MICRO, HPCA, SC, PACT, HPDC, ICS, and IPDPS. My SC'15 paper was a best student paper runner-up, and my SC'17 paper was a best paper runner-up. I serve as an organizing committee or PC member for several major HPC venues, including ASPLOS, SC, ICS, IPDPS, and HPDC. My past and current research has been funded by several major government agencies and programs, including DOE ASCR, DoD, DARPA, and Lab LDRD. I have collaborated with both academia and industry labs (e.g., Intel Labs and NVIDIA Research).

My research can be found at: https://sites.google.com/site/shuaiwenleonsongresearch/

Research Interests

  • Performance and energy evaluation and optimization for large-scale HPC systems
  • Heterogeneous programming systems
  • Emerging HPC architectures (e.g., emerging many-core accelerators, memory architectures, and machine-learning architectures)
  • Accelerator-supported acceleration toolsets for machine learning and big data applications
  • Approximate computing and accuracy-aware computing
  • Big data analytics, deep learning, and dynamic modeling techniques
  • Fault tolerance and system reliability
  • Software-architecture co-design

Education and Credentials

  • Ph.D. in Computer Science and Application, Virginia Tech, May 2013
  • Master's in Computer Science and Application, Virginia Tech, May 2009

Affiliations and Professional Service

  • Affiliate faculty at College of William and Mary
  • IEEE professional
  • ACM professional
  • ACM SIGHPC
  • ACM SIGARCH
  • ACM SIGPLAN
  • ACM SIGMETRICS
  • Upsilon Pi Epsilon

Awards and Recognitions

  • Best paper runner-up for SC'17
  • Courtesy Scholar, Computer Science Department, College of William & Mary
  • DOE Lab Directed Research and Development (LDRD) Award on Machine Learning Initiative (MLI)
  • DOE Lab Directed Research and Development (LDRD) Award on EvoGraph framework
  • PNNL PCSD Outstanding Performance Award
  • Chair, IEEE Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM), in conjunction with IPDPS.
  • Chair, The Twelfth IEEE Workshop on High-Performance Power-Aware Computing (HPPAC), in conjunction with IPDPS'16, Chicago.
  • PNNL staff research highlight award 2015
  • PNNL research award 2015
  • Best student paper runner-up for SC'15
  • Recipient of 2011 Paul E. Torgersen excellent research award

PNNL Publications

2017

  • Li A, S Song, W Liu, X Liu, A Kumar, and H Corporaal. 2017. "Locality-Aware CTA Clustering For Modern GPUs." In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2017), April 8-12, 2017, Xi'an, China, pp. 297-311.  ACM, New York, NY.  doi:10.1145/3037697.3037709
  • Xie C, S Song, J Wang, W Zhang, and X Fu. 2017. "Processing-in-Memory Enabled Graphics Processors for 3D Rendering." In IEEE International Symposium on High Performance Computer Architecture (HPCA 2017), February 4-8, 2017, Austin, Texas, pp. 637-648.  IEEE Computer Society, Los Alamitos, CA.  doi:10.1109/HPCA.2017.37

2016

  • Tan J, S Song, K Yan, X Fu, A Marquez, and DJ Kerbyson. 2016. "Combating the Reliability Challenge of GPU Register File at Low Supply Voltage." In Proceedings of the 25th International Conference on Parallel Architectures and Compilation Techniques (PACT '16), September 11-15, 2016, Haifa, Israel, pp. 3-15.  ACM, New York, NY.  doi:10.1145/2967938.2967951
  • Li A, S Song, M Wijtvliet, A Kumar, and H Corporaal. 2016. "SFU-Driven Transparent Approximation Acceleration on GPUs." In Proceedings of the International Conference on Supercomputing (ICS 2016), June 1-3, 2016, Istanbul, Turkey, p. Paper No. 15.  Association for Computing Machinery, New York, NY.  doi:10.1145/2925426.2926255
  • Li A, S Song, A Kumar, E Zhang, D Chavarría-Miranda, and H Corporaal. 2016. "Critical Points Based Register-Concurrency Autotuning for GPUs." In Proceedings of the Design, Automation and Test in Europe Conference (DATE 2016), March 14-18, 2016, Dresden, Germany, pp. 1273-1278.  IEEE, Piscataway, NJ. 
  • Li A, S Song, E Brugel, A Kumar, D Chavarría-Miranda, and H Corporaal. 2016. "X: A Comprehensive Analytic Model for Parallel Machines." In IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), May 23-27, 2016, Chicago, Illinois, pp. 242-252.  IEEE, Piscataway, NJ.  doi:10.1109/IPDPS.2016.89
  • Li L, A Hayes, S Song, and E Zhang. 2016. "Tag-Split Cache for Efficient GPGPU Cache Utilization." In Proceedings of the International Conference on Supercomputing (ICS 2016), June 1-3, 2016, Istanbul, Turkey, p. Paper No. 43.  ACM, New York, NY.  doi:10.1145/2925426.2926253
  • Roy P, X Liu, and S Song. 2016. "SMT-Aware Instantaneous Footprint Optimization." In Proceedings of the 25th ACM International Symposium on High-Performance and Distributed Computing (HPDC 2016), May 31-June 4, 2016, Kyoto, Japan, pp. 267-279.  ACM, New York, NY.  doi:10.1145/2907294.2907308
  • Tallent NR, KJ Barker, R Gioiosa, A Marquez, G Kestor, S Song, A Tumeo, DJ Kerbyson, and A Hoisie. 2016. "Assessing Advanced Technology in CENATE." In Proceedings of the IEEE International Conference on Networking, Architecture, and Storage (NAS 2016), August 8-10, 2016, Long Beach, California.  IEEE, Piscataway, NJ.  doi:10.1109/NAS.2016.7549392
  • Tan L, Z Chen, and S Song. 2016. "Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology." ACM Transactions on Architecture and Code Optimization 12(4):Article No. 35.  doi:10.1145/2822893
  • Tan L, Z Chen, and S Song. 2016. "Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology." In 11th International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC 2016), January 18-20, 2016, Prague, Czech Republic.  ACM, New York, NY.
  • Tao D, S Song, S Krishnamoorthy, P Wu, X Liang, E Zhang, DJ Kerbyson, and Z Chen. 2016. "New-Sum: A Novel Online ABFT Scheme For General Iterative Methods." In Proceedings of the 25th ACM International Symposium on High-Performance and Distributed Computing (HPDC 2016), May 31-June 4, 2016, Kyoto, Japan, pp. 43-55.  ACM, New York, NY.  doi:10.1145/2907294.2907306

2015

  • Li C, S Song, H Dai, A Sidelnik, S Hari, and H Zhou. 2015. "Locality-Driven Dynamic GPU Cache Bypassing." In Proceedings of the 29th ACM on International Conference on Supercomputing (ICS 2015), June 8-11, 2015, Newport Beach, California, pp. 66-77.  ACM, New York, NY.  doi:10.1145/2751205.2751237
  • Sengupta D, S Song, K Agarwal, and K Schwan. 2015. "GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15), November 15-20, 2015, Austin, Texas, p. Paper No. 28.  ACM, New York, NY.  doi:10.1145/2807591.2807655
  • Sengupta D, K Agarwal, S Song, and K Schwan. 2015. "GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems." In IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW 2015), May 25-29, 2015, Hyderabad, India, pp. 604-609.  IEEE, Piscataway, NJ.  doi:10.1109/IPDPSW.2015.16
  • Shrestha S, JB Manzano Franco, A Marquez, S Zuckerman, S Song, and GR Gao. 2015. "Gregarious Data Re-structuring in a Many Core Architecture." In IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), August 24-26, 2015, New York, pp. 712-720.  IEEE, Piscataway, NJ.  doi:10.1109/HPCC-CSS-ICESS.2015.291
  • Tan L, S Song, P Wu, Z Chen, R Ge, and DJ Kerbyson. 2015. "Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing." In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29, 2015, Hyderabad, India, pp. 786-796.  IEEE Computer Society, Los Alamitos, CA.  doi:10.1109/IPDPS.2015.108
  • You Y, H Fu, S Song, A Randles, DJ Kerbyson, A Marquez, G Yang, and A Hoisie. 2015. "Scaling Support Vector Machines On Modern HPC Platforms." Journal of Parallel and Distributed Computing 76:16-31.  doi:10.1016/j.jpdc.2014.09.005

2014

  • Li B, HC Chang, S Song, CY Su, T Meyer, J Mooring, and K Cameron. 2014. "Extending PowerPack for Profiling and Analysis of High Performance Accelerator-Based Systems." Parallel Processing Letters 24(4):Article No. 144200.  doi:10.1142/S0129626414420018
  • Li B, HC Chang, S Song, CY Su, T Meyer, J Mooring, and K Cameron. 2014. "The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications." In IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW 2014), May 19-23, 2014, Phoenix, Arizona, pp. 1448-1456.  IEEE, Piscataway, NJ.  doi:10.1109/IPDPSW.2014.162
  • Marquez A, JB Manzano Franco, S Song, B Meister, S Shrestha, T St. John, and GR Gao. 2014. "ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution." In The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), December 16-19, 2014, Hsinchu, Taiwan, pp. 289-297.  IEEE, Piscataway, NJ.  doi:10.1109/PADSW.2014.7097820
  • You Y, S Song, and DJ Kerbyson. 2014. "An Adaptive Cross-Architecture Combination Method for Graph Traversal." In Proceedings of the 28th ACM International Conference on Supercomputing (ICS'14), June 10-13, 2014, Munich, Germany, pp. 169-169.  Association for Computing Machinery, New York, NY.  doi:10.1145/2597652.2600110
  • You Y, H Fu, S Song, M Mehri Dehanavi, L Gan, X Huang, and G Yang. 2014. "Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax-Wendroff Correction." International Journal of High Performance Computing Applications 28(3):301-318.  doi:10.1177/1094342014524807
  • You Y, S Song, H Fu, A Marquez, M Mehri Dehanavi, KJ Barker, K Cameron, A Randles, and G Yang. 2014. "MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures." In IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS 2014), May 19-23, 2014, Phoenix, Arizona, pp. 809-818.  IEEE Computer Society, Los Alamitos, CA.  doi:10.1109/IPDPS.2014.88

2013

  • Vishnu A, S Song, A Marquez, KJ Barker, DJ Kerbyson, K Cameron, and P Balaji. 2013. "Designing Energy Efficient Communication Runtime Systems: A View from PGAS Models." Journal of Supercomputing 63(3):691-709.  doi:10.1007/s11227-011-0699-9
  • Li B, S Song, I Bezakova, and K Cameron. 2013. "EDR: An Energy-Aware Runtime Load Distribution System for Data-Intensive Applications in the Cloud." In IEEE International Conference on Cluster Computing (CLUSTER 2013), September 23-27, 2013, Indianapolis, IN, pp. 1-8.  Institute of Electrical and Electronics Engineers, Piscataway, NJ.  doi:10.1109/CLUSTER.2013.6702674
  • Song S, KJ Barker, and DJ Kerbyson. 2013. "Unified Performance and Power Modeling of Scientific Workloads." In E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, November 17-21, 2013, Denver, Colorado, p. Article No. 4.  Association for Computing Machinery, New York, NY.  doi:10.1145/2536430.2536435
  • Song S, NR Tallent, and A Vishnu. 2013. "Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems." In Modeling & Simulation of Exascale Systems & Applications: Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18-19, 2013, Seattle, Washington.  US Department of Energy, Office of Advanced Scientific Computing Research, Washington DC. 

2011

  • Song S, C Si Yu, R Ge, A Vishnu, and K Cameron. 2011. "Iso-Energy-Efficiency: An Approach to Power Constrained Parallel Computation." In IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), May 16-20, 2011, Anchorage, Alaska, pp. 128-139.  IEEE, Piscataway, NJ.  doi:10.1109/IPDPS.2011.22

2010

  • Vishnu A, HJJ van Dam, WA De Jong, P Balaji, and S Song. 2010. "Fault Tolerant Communication Runtime Support for Data-Centric Programming Models." In International Conference on High Performance Computing (HiPC 2010), December 19-22, 2010, Goa, India.  Institute of Electrical and Electronics Engineers, Piscataway, NJ.  doi:10.1109/HIPC.2010.5713195
  • Vishnu A, S Song, A Marquez, KJ Barker, DJ Kerbyson, K Cameron, and P Balaji. 2010. "Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models." In IEEE/ACM International Conference on Green Computing and Communications (GreenCom 2010) and the International Conference on Cyber, Physical and Social Computing (CPSCom 2010), December 18-20, 2010, Hangzhou, China, ed. P Zhu, et al, pp. 229-236.  Institute of Electrical and Electronics Engineers, Inc., Piscataway, NJ.  doi:10.1109/GreenCom-CPSCom.2010.133
