Extracting useful, actionable information from data can be a formidable challenge for the safeguards, nonproliferation, and arms control verification communities. Data scientists are often on the “front-lines” of making sense of complex and large datasets. They require flexible tools that make it easy to rapidly reformat large datasets, interactively explore and visualize data, develop statistical algorithms, and validate their approaches—and they need to perform these activities with minimal lines of code. Existing commercial software solutions often lack extensibility and the flexibility required to address the nuances of the demanding and dynamic environments where data scientists work. To address this need, Pacific Northwest National Laboratory developed Tessera, an open source software suite designed to enable data scientists to interactively perform their craft at the terabyte scale. Tessera automatically manages the complicated tasks of distributed storage and computation, empowering data scientists to do what they do best: tackling critical research and mission objectives by deriving insight from data. We illustrate the use of Tessera with an example analysis of computer network data.
Revised: September 7, 2016 |
Published: June 30, 2014
Citation
Sego L.H., R.P. Hafen, H.M. Director, and R.R. LaMothe. 2014.Tessera: Open source software for accelerated data science. In Information Analysis Technologies, Techniques and Methods for Safeguards, Nonproliferation and Arms Control Verification Workshop, May 12-14, 2014, Portland, Oregon. Oakbrook Terrace, Illinois:Institute of Nuclear Materials Management (INMM).PNNL-SA-103342.