Contextual Embedding Learning for Heterogeneous Graph Databases
Representation learning methods for heterogeneous networks produce a low-dimensional vector embedding for each node that is typically fixed for all tasks involving the node. Many existing methods focus on obtaining a static vector representation for a node in a way that is agnostic to the downstream application in which it is used. In practice, however, downstream tasks require specific contextual information that can be extracted from the subgraphs related to the nodes provided as input to the task. To tackle this challenge, we develop SLiCE, a framework that bridges static representation learning methods, which use global information from the entire graph, with localized attention-driven mechanisms to learn contextual node representations. We first pre-train our model in a self-supervised manner by introducing higher-order semantic associations and masking nodes, and then fine-tune it for a specific link prediction task. Instead of training node representations by aggregating information from all semantic neighbors connected via metapaths, we automatically learn the composition of different metapaths that characterize the context for a specific task, without the need for any pre-defined metapaths. SLiCE significantly outperforms both static and contextual embedding learning methods on several publicly available benchmark network datasets. We also interpret the semantic association matrix and demonstrate its utility and relevance in making successful link predictions between heterogeneous nodes in the network.
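The node-masking step of the self-supervised pre-training can be pictured with a small sketch. The description above does not say how SLiCE builds its context subgraphs or chooses which nodes to mask, so everything here is an assumption made for illustration: the context is a plain random walk, the toy graph, the `[MASK]` token, the mask rate, and the helper names `random_walk` and `mask_nodes` are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for a hidden node


def random_walk(adj, start, length, rng):
    """Sample a walk of up to `length` nodes as a stand-in for a context subgraph."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break  # dead end: return the shorter walk
        walk.append(rng.choice(neighbors))
    return walk


def mask_nodes(context, mask_rate, rng):
    """Hide a fraction of the context nodes; return the masked copy and the targets.

    The targets map each masked position to the original node, which a model
    would be trained to recover from the remaining context.
    """
    n_mask = max(1, int(len(context) * mask_rate))
    positions = rng.sample(range(len(context)), n_mask)
    masked = list(context)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Toy heterogeneous graph (assumed): node type is encoded in the node name.
adj = {
    "author:a1": ["paper:p1", "paper:p2"],
    "paper:p1": ["author:a1", "venue:v1"],
    "paper:p2": ["author:a1", "venue:v1"],
    "venue:v1": ["paper:p1", "paper:p2"],
}

rng = random.Random(0)
context = random_walk(adj, "author:a1", 5, rng)
masked, targets = mask_nodes(context, 0.2, rng)
print(context)
print(masked, targets)
```

A model pre-trained this way would receive `masked` as input and be scored on recovering the entries of `targets`; fine-tuning for link prediction would then reuse the same contextual encoder.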
Machine Learning Toolkit for Extreme Scale
Support vector machines (SVMs) are a popular supervised machine learning technique that has been applied in a wide range of domains such as science, finance, and social networks. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Several techniques are proposed to improve speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence and a sparse-format representation of data samples. MaTEx considers several heuristics for eliminating non-contributing samples, ranging from earliest-possible to lazy elimination. For the many cases where an early sample elimination might turn out to be a false positive, low-overhead mechanisms for reconstructing key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.
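The two ideas named above, sparse sample storage and elimination of non-contributing samples, can be sketched in a few lines. This is not the MaTEx algorithm or any of its heuristics, which are not specified here; it is a generic illustration under stated assumptions: a linear kernel, samples stored as `{feature_index: value}` dictionaries, and a simple rule that drops a sample when its multiplier is zero and it lies safely outside the margin. The helper names (`sparse_dot`, `decision_value`, `split_active`, `recheck`) are hypothetical.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {feature_index: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[i] for i, v in a.items() if i in b)


def decision_value(x, support, bias):
    """f(x) = sum_j alpha_j * y_j * <x_j, x> + b, with a linear kernel (assumed)."""
    return sum(alpha * y * sparse_dot(xj, x) for xj, y, alpha in support) + bias


def split_active(samples, alphas, bias, tol=1e-3):
    """Separate samples into an active set and an eliminated set.

    A sample is eliminated when alpha == 0 and y * f(x) > 1 + tol, i.e. it
    currently does not contribute to the model and is outside the margin.
    """
    support = [(x, y, a) for (x, y), a in zip(samples, alphas) if a > 0]
    active, eliminated = [], []
    for idx, ((x, y), a) in enumerate(zip(samples, alphas)):
        margin = y * decision_value(x, support, bias)
        if a == 0 and margin > 1 + tol:
            eliminated.append(idx)
        else:
            active.append(idx)
    return active, eliminated


def recheck(samples, eliminated, support, bias, tol=1e-3):
    """Return eliminated samples that violate the margin under the current model.

    These are the false positives that would have to be brought back.
    """
    return [i for i in eliminated
            if samples[i][1] * decision_value(samples[i][0], support, bias) < 1 - tol]


# Toy data: (sparse features, label). Only feature 0 separates the classes.
samples = [({0: 1.0}, 1), ({0: -1.0}, -1), ({0: 3.0, 2: 1.0}, 1), ({0: -4.0}, -1)]
alphas = [0.5, 0.5, 0.0, 0.0]  # assumed intermediate solver state
bias = 0.0

active, eliminated = split_active(samples, alphas, bias)
support = [(x, y, a) for (x, y), a in zip(samples, alphas) if a > 0]
violators = recheck(samples, eliminated, support, bias)
print(active, eliminated, violators)
```

In a real solver the training loop would then iterate only over `active`, and a periodic `recheck` against the updated model is one simple way to catch samples that were eliminated too early.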
Tools for Interactive EDA of FTICR Data
We have three (potentially four) R packages that we wish to make available on GitHub and that will be part of an upcoming manuscript:
MetaCycData - a parsed form of the MetaCyc database, which is accessed and used by the fticRanalysis package to map observed molecules to functional pathways. The license for MetaCyc can be found here: https://metacyc.org/download-flatfiles.shtml
fticRanalysis - a package that handles formatting and storage of Fourier-transform ion cyclotron resonance (FTICR) data in a generalizable way. Additional functionality is available for manipulating and transforming the data, filtering the data based on different criteria, creating summary plots (e.g., Van Krevelen plots, Kendrick plots), summarizing group-level data, comparing treatment groups, and producing complementary plots.
KeggData - a parsed form of the KEGG database, which is accessed and used by the fticRanalysis package. KEGG is not an open source database, so this package may need to be modified or may not be releasable externally.
FREDA - a Shiny application that provides a user interface for interactively processing and exploring FTICR data.
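For readers unfamiliar with the summary plots mentioned above, the quantities behind them are simple to compute. The sketch below is written in Python for illustration only and does not use or mirror the fticRanalysis API; the function names are hypothetical. It assumes the usual conventions: a Van Krevelen plot places each molecular formula at (O:C, H:C), and the Kendrick mass rescales an observed mass so that CH2 (exact mass 14.01565) weighs exactly 14. The sign convention for the Kendrick mass defect varies between sources; the one used here is marked as an assumption in the code.

```python
import re
from collections import Counter

CH2_EXACT_MASS = 14.01565  # monoisotopic mass of CH2 (12.00000 + 2 * 1.007825)


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def van_krevelen_coords(formula):
    """Return (O:C, H:C), the x and y coordinates on a Van Krevelen plot."""
    counts = parse_formula(formula)
    if counts["C"] == 0:
        raise ValueError("formula contains no carbon: " + formula)
    return counts["O"] / counts["C"], counts["H"] / counts["C"]


def kendrick_mass(observed_mass):
    """Rescale a mass so that each CH2 unit contributes exactly 14."""
    return observed_mass * 14.0 / CH2_EXACT_MASS


def kendrick_mass_defect(observed_mass):
    """Nominal Kendrick mass minus Kendrick mass (assumed sign convention)."""
    km = kendrick_mass(observed_mass)
    return round(km) - km


oc, hc = van_krevelen_coords("C6H12O6")  # glucose
print(oc, hc)
print(kendrick_mass(CH2_EXACT_MASS))
```

Compounds that differ only by CH2 units share the same Kendrick mass defect, which is why they line up horizontally on a Kendrick plot.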
TRANSFORMATIVE REMEDIAL ACTION SCHEME TOOL (TRAST)
The Transformative Remedial Action Scheme Tool (TRAST) can be applied to improve and validate power system remedial action schemes (RAS), and thereby to improve the performance of power system operation and control. The tool provides a full suite of advanced functionalities: advanced statistical data analysis; OPF-based automated power flow case generation; customized dynamic simulation on HPC/cloud platforms; machine-learning-based RAS coefficient prediction; and a reliable RAS validation strategy across multiple commercial platforms.