June 22, 2021
Conference Paper

Multi-Channel Entity Alignment via Name Uniqueness Estimation

Abstract

When searching for adversarial activity across multiple networks, one of the greatest challenges is accurately aligning entities across different channels of information. This task becomes increasingly difficult when little is known about each individual besides a name. In this study, we analyze name rarity and how it can be used to align people across three distinct data channels: Venmo financial transactions, Reddit online discussions, and a bibliographic data source of academic writings. We explore how the uniqueness of a name can be used to decide whether a person on one network is likely the same as a person on another, in the absence of any additional ground truth. While 100 percent confidence cannot be attained, this information clarifies when a possible alignment is more or less likely to refer to the same individual, increasing our confidence in accurately detecting adversarial behavioral patterns. From the data collected, we found that 0.1% of people had the same name across data sets, and that 22.5% of those names are considered rare by our threshold. We also examine the accuracy of our method and show how real names can be extracted from account usernames and compared in a similar manner.
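The abstract does not specify how rarity is estimated or where the threshold sits, so the following is only a minimal sketch of the general idea, assuming rarity is judged by raw name frequency within each channel. The function names, the `max_count` threshold, and the sample names are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def normalize(name):
    # Lowercase and collapse whitespace so trivially different spellings compare equal.
    return " ".join(name.lower().split())


def rare_shared_names(channels, max_count=1):
    """Return names found in every channel that occur at most max_count times in each.

    max_count is an illustrative rarity threshold (an assumption, not the paper's).
    """
    counts = {
        channel: Counter(normalize(n) for n in names)
        for channel, names in channels.items()
    }
    # Names that appear in all channels are the candidate alignments.
    shared = set.intersection(*(set(c) for c in counts.values()))
    # Keep only candidates that are rare in every channel.
    return sorted(
        n for n in shared if all(c[n] <= max_count for c in counts.values())
    )


# Toy data with made-up names; real channels would hold far more records.
channels = {
    "venmo": ["John Smith", "John Smith", "Zelda Quarrington", "Mira Oduya"],
    "reddit": ["john smith", "zelda quarrington", "Tobin Hale"],
    "bibliographic": ["John Smith", "Zelda  Quarrington", "John Smith"],
}

candidates = rare_shared_names(channels)
print(candidates)
```

In this toy input, a common name that appears several times in one channel is excluded, because a cross-channel match on it says little about identity, while a name that is unique in every channel is kept as a more plausible alignment.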

Published: June 22, 2021

Citation

Orren M.J., P.S. Mackey, N.C. Heller, and G. Chin. 2020. Multi-Channel Entity Alignment via Name Uniqueness Estimation. In IEEE International Conference on Big Data (Big Data 2020), December 10-13, 2020, Atlanta, GA, 2546-2552. Piscataway, New Jersey: IEEE. PNNL-SA-156148. doi:10.1109/BigData50022.2020.9377777