Streaming Canvas (Open Source)
The Streaming Canvas is an interactive visualization tool for high-dimensional data such as text documents, allowing users to analyze document collections interactively. Our goal is to open source this system to support a collaboration with Alex Endert at Georgia Tech and to position the work for continued contributions from others interested in studying user interaction data.
lrc package: Logistic Regression Classification (Open Source Copyright)
Methods for fitting logistic regression classifiers (LRCs) and generating predictions, with an arbitrary loss function, using elastic net or best subsets. This package adds model-fitting features to the existing glmnet and bestglm R packages.
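lrc itself is an R package; as a rough Python illustration of the elastic-net side of this workflow, the scikit-learn sketch below fits a penalized logistic regression. The data and settings are invented, and this is an analogous technique, not lrc's actual API.

    # Hypothetical elastic-net logistic regression, analogous to (but
    # not the same as) the model fitting the lrc R package provides.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))           # 200 samples, 10 features
    y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)

    # l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties,
    # much like glmnet's alpha parameter.
    clf = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=1.0, max_iter=5000)
    clf.fit(X, y)
    print(clf.coef_)                          # sparse-ish coefficients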
CHISSL: Intuitive, Scalable, Interactive Machine Learning
We developed CHISSL, a human-machine interface that combines unsupervised and semi-supervised machine learning to let a non-expert user organize large collections of data instances according to her own mental model. The user interacts with individual examples, dragging and dropping to move items between groups or double-clicking to create new groups. The algorithm rapidly re-evaluates the distance from all instances to those the user has placed, using the result both to re-classify the unlabeled data and to choose the recommendations shown for each group the user has created. Our main contribution is a technique that incorporates user feedback rapidly, incrementally, and predictably, and scales easily beyond hundreds of thousands of instances. The algorithm is partitioned between a lightweight client and a heavyweight server. The server is responsible for the initial batch processing and representation of the data. A tree representation of the data is sent to the client, without the full representation of every instance, saving a great deal of memory and bandwidth. All computation that incorporates user feedback is performed in the web browser, with no round trip to the server; this reduces interaction latency and server load, in principle allowing many analysts to use the system simultaneously.
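As a minimal sketch of the client-side update idea, assuming a simple nearest-exemplar rule (CHISSL's actual algorithm and tree-based data structures are more involved), reassigning unlabeled instances after each drag-and-drop might look like this; all names here are illustrative:

    # Hypothetical nearest-exemplar reassignment, assuming the client
    # already holds a vector representation of each instance.
    import numpy as np
    from scipy.spatial.distance import cdist

    def reclassify(instances, exemplars, exemplar_groups):
        """instances: (n, d) array; exemplars: (k, d) array of
        user-placed items; exemplar_groups: length-k group labels."""
        d = cdist(instances, exemplars)       # distances to user examples
        nearest = d.argmin(axis=1)            # index of closest exemplar
        labels = np.asarray(exemplar_groups)[nearest]
        # Items closest to a group's exemplars can double as that
        # group's recommendations.
        confidence = -d.min(axis=1)
        return labels, confidence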
Sci-Vis Framework
SVF is a full-featured OpenGL 3D framework for rapid creation of complex visualizations. It handles much of the lifecycle and many of the complex tasks required for a 3D visualization. Unlike a game framework, SVF is designed to use fewer resources, work well in a windowed environment, and render only when necessary. The scene also takes advantage of multiple threads to keep work off the UI thread as much as possible. Shapes (actors) in the scene are built by adding or removing functionality (through support objects) at runtime. This allows a flexible, dynamic way to assemble complex actors without the usual code complexity, and it helps overcome the lack of multiple inheritance in Java. All classes are highly customizable, and abstract classes are provided for subclassing so a developer can create more complex and more performant actors. Multiple demos are included to help developers get started; together they show off nearly all of the functionality. Simple shapes (actors) are provided out of the box, including text, bordered text, radial text, text areas, complex paths, NURBS paths, cubes, disks, grids, planes, geometric shapes, and volumetric areas. The framework also ships with various camera types that can be dragged, zoomed, and rotated. Picking (selecting items in the scene) can be accomplished in several ways depending on your needs (ray casting or color picking). The framework currently supports tooltips, animation, actor pools, color gradients, 2D physics, text, 1D/2D/3D textures, child actors, blending, clipping planes, view-frustum culling, custom shaders, and custom actor states.
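SVF is a Java framework; the small Python sketch below only illustrates the composition pattern described above, where support objects attach functionality to an actor at runtime. Every class and method name here is invented for illustration and is not SVF's API.

    # Hypothetical actor/support-object composition, echoing SVF's
    # design of assembling actor behavior at runtime.
    class Actor:
        def __init__(self):
            self.supports = {}                # functionality keyed by type

        def add_support(self, support):
            self.supports[type(support)] = support

        def remove_support(self, support_type):
            self.supports.pop(support_type, None)

    class ColorSupport:
        def __init__(self, rgba):
            self.rgba = rgba

    class TransformSupport:
        def __init__(self, x=0.0, y=0.0, z=0.0):
            self.position = (x, y, z)

    cube = Actor()
    cube.add_support(ColorSupport((1.0, 0.0, 0.0, 1.0)))
    cube.add_support(TransformSupport(z=-5.0))   # behavior added at runtime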
SocialSim Metrics Library
We present example measurements obtained using approximately three years of GitHub data (the GitHub training data), from January 2015 to August 2017. The original GitHub graph has 52,260,372 nodes and 870,532,947 edges. We first subsampled the original graph to eliminate inactive user and repo nodes using weakly connected components as implemented in networkX. The subsampled GitHub graph has 50,677,259 nodes and 773,974,620 edges. We present node-level measurement examples for popular repos (e.g., tensorflow) and rockstar users, as well as population-level measurement examples. See: https://confluence.pnnl.gov/confluence/display/SOCIALSIM/Implementing+Measurements
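A minimal sketch of the subsampling step using networkX follows; the function name, variable names, and the size threshold used as the activity criterion are assumptions, and the actual pipeline may filter differently.

    # Hypothetical subsampling: keep only nodes in weakly connected
    # components large enough to indicate activity.
    import networkx as nx

    def subsample(G, min_component_size=2):
        """G: directed user/repo event graph."""
        keep = set()
        for comp in nx.weakly_connected_components(G):
            if len(comp) >= min_component_size:
                keep.update(comp)
        return G.subgraph(keep).copy()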
Sensitivity calculator for low-background experiments
The code calculates sensitivity for low-background counting experiments that require radioassays to determine the expected background rate. It allows the user to choose among several Bayesian priors for the radioassay results.
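As an illustration of the idea (not the tool's actual interface or its prior options), a Monte Carlo treatment of a single radioassay under a truncated-Gaussian prior might look like the sketch below; every parameter value is invented.

    # Hypothetical background estimate marginalized over an assay prior.
    import numpy as np

    rng = np.random.default_rng(1)
    assay_mean, assay_sigma = 2.0, 0.8   # mBq/kg, invented numbers
    exposure = 100.0                     # counts per (mBq/kg), invented

    # Truncated-Gaussian prior: activities cannot be negative.
    activity = rng.normal(assay_mean, assay_sigma, size=100_000)
    activity = activity[activity >= 0]

    expected = activity * exposure       # expected background counts
    counts = rng.poisson(expected)       # simulated observations
    print(np.mean(counts), np.percentile(counts, 90))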
Hydrological Emulator
The hydrological emulator (HE) was built to mimic complex global hydrological models (GHMs) at a range of spatial scales. The HE is written in MATLAB and uses a genetic algorithm to calibrate the a, b, c, d, and m parameters of the ABCD runoff algorithm to the runoff of GHMs when forced by the same climate data. The HE offers two methods for processing basin data: lumped (by basin) and distributed (gridded). Both can be evaluated with the built-in Kling-Gupta efficiency as a measure of model performance relative to the outputs of the more complex GHM. Testing shows that the HE is seven orders of magnitude faster than the widely used Variable Infiltration Capacity (VIC) GHM.
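The Kling-Gupta efficiency used for evaluation combines correlation, variability bias, and mean bias into a single score; a standard implementation (here in Python rather than the emulator's MATLAB) is:

    # Kling-Gupta efficiency (Gupta et al., 2009):
    # KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), where r is
    # the correlation, alpha the ratio of standard deviations, and
    # beta the ratio of means between simulated and reference runoff.
    import numpy as np

    def kge(simulated, observed):
        r = np.corrcoef(simulated, observed)[0, 1]
        alpha = np.std(simulated) / np.std(observed)
        beta = np.mean(simulated) / np.mean(observed)
        return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2
                             + (beta - 1) ** 2)

    # KGE = 1 indicates a perfect match to the reference GHM runoff.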
Iterative Method Fault Injection Collection (IMIC)
Soft errors caused by transient bit flips can significantly impact an application's behavior. This has motivated the design of an array of techniques to detect, isolate, and correct soft errors using microarchitectural, architectural, compilation-based, or application-level approaches to minimize their impact on the executing application. The first step toward designing good error detection and correction techniques is understanding an application's vulnerability to soft errors. To study the behavior of iterative methods in the presence of soft errors, we inject errors during the execution of these methods. In particular, we study the impact of a single error (single- or multi-bit) on the execution of iterative methods. We use real-life datasets from the SuiteSparse Matrix Collection (https://sparse.tamu.edu) and a widely used iterative solver library (Iterative Methods Library, IML++ v1.2a). We instrument the solver implementations so that our error injection methodology can control the iteration, the target vector, the element position, and the number and positions of the flipped bits. We employed 6 solvers and 28 datasets, performed a total of 1,744,800 error injection runs, and collected more than 2.5 TB of data.
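A single-bit flip in a double-precision value can be injected by toggling one bit of its IEEE 754 representation. The sketch below shows the core operation in Python with numpy; the function name and parameters are invented, and the actual instrumentation sits inside the C++ IML++ solvers.

    # Hypothetical single-bit error injection into a solver vector.
    import numpy as np

    def flip_bit(vec, index, bit):
        """Toggle one bit (0..63) of vec[index] in a float64 array."""
        bits = vec.view(np.uint64)            # reinterpret bits, no copy
        bits[index] ^= np.uint64(1) << np.uint64(bit)

    x = np.ones(8)
    flip_bit(x, index=3, bit=52)              # lowest exponent bit
    print(x)                                  # x[3] is now 0.5, not 1.0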
StreamWorks (Open Source)
The software is a network analysis framework with which a user may detect and identify precursor events and patterns as they emerge in complex networks. Scalable subgraph matching algorithms allow users to identify precursor events based on both structural and semantic subgraph properties and enable efficient subgraph pattern matching in massive, evolving networks. The framework is intended for dynamic environments where network data is streamed in and represented as a large-scale evolving graph. It may be applied to identify emerging graph patterns that are known to users in advance, or patterns that emerge spontaneously and are deemed 'significant' or 'interesting', which are then reported to users as significant events.
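StreamWorks' scalable matching algorithms are incremental and far more sophisticated, but the basic query can be pictured as a pattern check that runs as each edge arrives. A deliberately naive networkX sketch follows, with an invented triangle pattern and edge stream.

    # Naive streaming pattern check, for illustration only: StreamWorks
    # matches incrementally rather than rescanning the whole graph.
    import networkx as nx
    from networkx.algorithms import isomorphism

    pattern = nx.cycle_graph(3)               # invented query: a triangle

    G = nx.Graph()
    stream = [(1, 2), (2, 3), (3, 4), (3, 1), (4, 5)]
    for u, v in stream:
        G.add_edge(u, v)
        gm = isomorphism.GraphMatcher(G, pattern)
        if gm.subgraph_is_isomorphic():
            print(f"pattern emerged after edge ({u}, {v})")
            break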
The Suite for Embedded Applications and Kernels (SEAK) - Open Source
Many applications of high-performance embedded computing are limited by performance or power bottlenecks. We have designed SEAK, a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions, and (b) to facilitate rigorous, objective, end-user evaluation of those solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from any particular algorithm) and goal-oriented (functional) specification. To encourage solutions implemented in any combination of software and hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. These tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. Creating goal-oriented specifications that are both concise and precise for mission-centric problems is challenging. This work describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.