The weblog, or blog, has become a popular form of social media in which authors write posts that can in turn generate feedback in the form of user comments. Taken as a whole, a collection of blogs can thus be viewed as an informal repository of mass sentiment and opinion. A natural goal is to mine this collection to gauge public sentiment on the wide variety of topics it contains. However, the sheer size of the so-called blogosphere, combined with the fact that posts can range over a practically limitless number of topics, poses serious challenges for any meaningful analysis. In particular, the fact that nearly anyone with access to the Internet can author a blog raises the issue of credibility: should some blogs be considered more influential than others and, consequently, be weighted more heavily when gauging sentiment with respect to a topic? In addition, because new posts and comments are made almost constantly, any blog analysis algorithm must handle such updates efficiently. In this paper, we give a formalization of the blog model. We give formal methods of quantifying sentiment and influence with respect to a hierarchy of topics, with the specific aim of facilitating the computation of a per-topic, influence-weighted sentiment measure. Finally, as efficiency is a specific end goal, we give upper bounds on the time required to update these values with new posts, showing that our analysis and algorithms are scalable.
Revised: June 6, 2011 | Published: July 25, 2010
Citation
Hui P.S., and M.L. Gregory. 2010. Quantifying Sentiment and Influence in Blogspaces. In SOMA 2010 - Proceedings of the First Workshop on Social Media Analytics in conjunction with KDD '10, The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 25-28, 2010, Washington DC, edited by P Melville, J Leskovec and F Provost, 53-61. New York, New York: Association for Computing Machinery. PNNL-SA-72781. doi:10.1145/1964858.1964866