This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 30,521 posts annotated with an inter-annotator agreement (Cohen's kappa) of 0.58 (3 annotations per post). To diversify the dataset, 6,749 posts were pre-selected with an active learning-style strategy. We report baseline classification results and release the best-performing embeddings, trained on 3.2B tokens of Russian VKontakte posts.
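The agreement figure above is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label distribution. Cohen's kappa is defined for two raters, so with three annotations per post one common convention is to average it over all annotator pairs. The sketch below assumes that convention; the abstract does not state how the three annotations were aggregated, and the function names and label strings are illustrative only.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of posts where both raters chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every post.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over all annotator pairs (an assumed aggregation).

    `annotations` is a list of per-annotator label lists, aligned by post.
    """
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical labels from three annotators on four posts.
    ann1 = ["positive", "positive", "negative", "negative"]
    ann2 = ["positive", "positive", "negative", "negative"]
    ann3 = ["positive", "negative", "positive", "negative"]
    print(round(mean_pairwise_kappa([ann1, ann2, ann3]), 3))
```

Averaging pairwise scores keeps the familiar two-rater statistic; Fleiss' kappa is the usual alternative when a single multi-rater coefficient is preferred.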
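The abstract says 6,749 posts were pre-selected with an active learning-style strategy but does not specify the selection criterion. As an illustration of the general technique only, the sketch below uses uncertainty sampling: a classifier trained on already-annotated posts scores unlabeled posts, and the posts whose predicted class distribution has the highest entropy are sent for annotation. The input format and function names are hypothetical, and this is not presented as the criterion used for RuSentiment.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(post_probs, k):
    """Return ids of the k posts with the most uncertain predictions.

    `post_probs` maps a post id to the class probabilities produced by some
    classifier trained on the annotated posts (hypothetical input format).
    """
    ranked = sorted(post_probs, key=lambda pid: entropy(post_probs[pid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical predicted probabilities over three sentiment classes.
    scores = {
        "post_a": [0.90, 0.05, 0.05],  # confident prediction
        "post_b": [0.34, 0.33, 0.33],  # near-uniform, most uncertain
        "post_c": [0.60, 0.20, 0.20],
    }
    print(select_most_uncertain(scores, 2))
```

Other pre-selection criteria, such as favoring posts predicted to belong to rarer classes, fit the same loop by swapping the ranking key.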
Revised: June 28, 2019
Published: August 20, 2018
Citation
Rogers A., A. Romanov, A. Rumshisky, S. Volkova, M. Gronas, and A. Gribov. 2018. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), August 2018, Santa Fe, NM, edited by E.M. Bender, L. Derczynski, and P. Isabelle, 755–763. Stroudsburg, Pennsylvania: Association for Computational Linguistics. PNNL-SA-134041.