In this paper we present a novel approach for identifying fraudulent papers from their titles and abstracts. The premise of the approach is that fraudulent research papers contain holes in the presentation of their approach and findings. Because an abstract is intended to highlight key features of the approach as well as important conclusions, we seek to determine whether these assumed holes can be identified from analysis of abstracts alone. The data set considered is derived from papers sharing a single author, with labels determined by a formal linguistic analysis of the complete documents. To detect these logical and literary holes we use techniques from topological data analysis, which summarizes data based on the presence of multi-dimensional topological holes. We find that topological features derived through a combination of techniques from natural language processing and time-series analysis detect the fraudulent papers better than the natural language processing tools alone. Thus we conclude that the connections and holes present in the abstracts of research cons contribute to an ability to infer the scientific validity of the corresponding work.
Published: April 13, 2022
Citation
Tymochko S.J., J.A. Chaput, T.J. Doster, E. Purvine, J.T. Warley, and T.H. Emerson. 2021. Con Connections: Detecting Fraud from Abstracts using Topological Data Analysis. In IEEE 20th International Conference on Machine Learning Applications (ICMLA 2021), December 12-16, 2021, Pasadena, CA, edited by M. Arif Wani; I. Sethi; W. Shi; G. Qu; D.S. Raicu and R. Jin, 403-408. Piscataway, New Jersey: IEEE. PNNL-SA-163927. doi:10.1109/ICMLA52953.2021.00069