Abstract
In this work we consider the problem of statistically modeling a given graph dataset enriched with vertex and edge labels. Our motivation is to generate expanded versions of the same dataset for benchmarking purposes. The goal is to mimic the statistical properties of the original dataset with regard to graph connectivity, vertex label distributions and the correlations between them. We build on the existing Multiplicative Attribute Graph (MAG) model and present a general framework that relaxes the assumptions of the MAG model, thereby allowing us to perform the dataset expansion while replicating these statistical properties. We follow a three-stage approach. The first stage captures the joint distribution of the vertex labels, preserving both the marginals and the correlations. Next, we capture the joint distribution of the graph connectivity and pairs of vertex labels. Finally, the dataset is expanded by sampling vertex labels and then connecting vertex pairs, each according to the respective joint distribution. We discuss the limitations of this model and propose a novel strategy of augmentation with additional latent labels based on node degree to better replicate skewed degree distributions.
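The three stages can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes categorical vertex labels stored as tuples, an undirected simple graph, and plain empirical frequencies. In MAG, the link probability between two vertices is a product of per-attribute affinity entries. Here that product is replaced by a single table indexed by the full label tuples, which is one simple (assumed) way of relaxing the factorization. The helper `add_degree_label` and its log2 bucketing are hypothetical stand-ins for the degree-based latent labels and are not taken from the source.

```python
import random
from collections import Counter
from itertools import combinations


def fit_label_distribution(labels):
    """Stage 1: empirical joint distribution over vertex label tuples.

    Keeping the full joint (not a product of marginals) preserves both the
    per-label marginals and the correlations between labels.
    """
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {lab: c / total for lab, c in counts.items()}


def fit_edge_probabilities(labels, edges):
    """Stage 2: P(edge | pair of label tuples) for an undirected simple graph.

    Estimated as (#edges between the two label classes) divided by
    (#vertex pairs between the two label classes). Keys are frozensets,
    so a same-class pair is a one-element frozenset.
    """
    class_size = Counter(labels.values())
    edge_count = Counter(frozenset((labels[u], labels[v])) for u, v in edges)
    probs = {}
    for a in class_size:
        for b in class_size:
            key = frozenset((a, b))
            if key in probs:
                continue
            if a == b:
                pairs = class_size[a] * (class_size[a] - 1) // 2
            else:
                pairs = class_size[a] * class_size[b]
            probs[key] = edge_count[key] / pairs if pairs else 0.0
    return probs


def expand(label_dist, edge_probs, n, seed=None):
    """Stage 3: sample n label tuples, then connect each pair independently."""
    rng = random.Random(seed)
    support = list(label_dist)
    weights = [label_dist[lab] for lab in support]
    new_labels = dict(enumerate(rng.choices(support, weights=weights, k=n)))
    new_edges = []
    for u, v in combinations(range(n), 2):  # O(n^2) pairs; fine for a sketch
        p = edge_probs[frozenset((new_labels[u], new_labels[v]))]
        if rng.random() < p:
            new_edges.append((u, v))
    return new_labels, new_edges


def add_degree_label(labels, edges):
    """Hypothetical latent label: append a log2 degree bucket to each tuple.

    Bucket = degree.bit_length(): 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {v: lab + (degree[v].bit_length(),) for v, lab in labels.items()}


if __name__ == "__main__":
    labels = {0: ("a", "x"), 1: ("a", "y"), 2: ("b", "x"), 3: ("b", "x")}
    edges = [(0, 1), (0, 2), (2, 3)]
    augmented = add_degree_label(labels, edges)
    dist = fit_label_distribution(augmented)
    probs = fit_edge_probabilities(augmented, edges)
    new_labels, new_edges = expand(dist, probs, n=8, seed=0)
    print(new_labels, new_edges)
```

Because the per-pair probabilities stay fixed, expected degrees in this sketch grow roughly in proportion to `n` when the graph is expanded. Whether and how to rescale them is a design choice the abstract does not specify.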
Exploratory License
Eligible for exploratory license
Market Sector
Data Sciences