Journal of Guangxi Teachers Education University (Philosophy and Social Sciences Edition), 2019, Vol. 37, Issue (3): 71-78. doi: 10.16088/j.issn.1001-6600.2019.03.008

Topic Discovery in Microblog Based on BTM and Weighting K-Means

CHEN Feng, MENG Zuqiang*

School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China
Published: 2019-07-12

Abstract: Microblog data have distinctive characteristics, such as short texts, low word frequency, and weak semantic expression. To accommodate these characteristics, improve the accuracy of topic discovery, and help users obtain useful information, this paper proposes a topic discovery method based on the biterm topic model (BTM) and a weighted K-Means algorithm. First, to address data sparsity, a text model is built with BTM to obtain topic words. Second, to overcome shortcomings of the traditional K-Means algorithm, a weighted K-Means algorithm is proposed to obtain microblog topics. Finally, experiments are conducted to validate the proposed method. The experimental results show that combining BTM with weighted K-Means addresses the high dimensionality and sparsity of microblog data and improves the accuracy and effectiveness of topic discovery.
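
The first step, building a BTM text model to obtain topic words, can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch, not the authors' implementation: it assumes the posts are already segmented into words, treats every unordered word pair in a post as a biterm (no co-occurrence window), and uses α = 50/K and β = 0.01 as assumed defaults rather than values reported in this paper. The function names `extract_biterms`, `btm_gibbs`, and `top_words` are hypothetical. Because biterms from all posts share one corpus-level topic distribution, a short post does not need to supply enough words on its own, which is how BTM eases sparsity.

```python
import random
from itertools import combinations


def extract_biterms(doc):
    """All unordered word pairs in one short text (a tokenized post)."""
    return list(combinations(doc, 2))


def btm_gibbs(docs, num_topics, alpha=None, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for BTM over tokenized short texts (sketch).

    Returns (vocab, theta, phi): theta[k] is the corpus-level proportion of
    topic k and phi[k][w] is the probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    K = num_topics
    alpha = 50.0 / K if alpha is None else alpha  # assumed default, not from the paper
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    M = len(vocab)

    # Biterms are pooled over the whole corpus rather than kept per document.
    biterms = [(index[a], index[b]) for doc in docs for a, b in extract_biterms(doc)]

    n_k = [0] * K                       # number of biterms assigned to topic k
    n_wk = [[0] * M for _ in range(K)]  # occurrences of word w in topic-k biterms

    def update(b, k, delta):
        wi, wj = biterms[b]
        n_k[k] += delta
        n_wk[k][wi] += delta
        n_wk[k][wj] += delta

    z = [rng.randrange(K) for _ in biterms]  # random initial topic per biterm
    for b, k in enumerate(z):
        update(b, k, 1)

    for _ in range(iters):
        for b, (wi, wj) in enumerate(biterms):
            update(b, z[b], -1)  # remove the biterm's current assignment
            # P(z=k | rest) ∝ (n_k+α)(n_wi|k+β)(n_wj|k+β) / (Σ_w n_w|k + Mβ)^2,
            # where Σ_w n_w|k = 2*n_k because each biterm holds two words.
            weights = [
                (n_k[k] + alpha) * (n_wk[k][wi] + beta) * (n_wk[k][wj] + beta)
                / (2 * n_k[k] + M * beta) ** 2
                for k in range(K)
            ]
            z[b] = rng.choices(range(K), weights=weights)[0]
            update(b, z[b], 1)

    theta = [(n_k[k] + alpha) / (len(biterms) + K * alpha) for k in range(K)]
    phi = [[(n_wk[k][w] + beta) / (2 * n_k[k] + M * beta) for w in range(M)]
           for k in range(K)]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """The n most probable words per topic, i.e. candidate topic words."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in phi]


if __name__ == "__main__":
    docs = [["apple", "banana", "fruit"], ["apple", "fruit"],
            ["goal", "match", "team"], ["team", "goal"]]
    vocab, theta, phi = btm_gibbs(docs, num_topics=2, iters=100)
    print(theta, top_words(vocab, phi))
```

The per-topic word lists from `top_words` are the kind of topic-word output that a later clustering step could group; how the paper turns them into feature vectors is not stated in the abstract.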

Key words: biterm topic model (BTM), weighting K-Means, microblogging data, topic discovery
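
The abstract does not state what the weights in weighted K-Means are or which defects of the traditional algorithm they target. The sketch below therefore assumes one standard sample-weighted variant: each point (for example, a topic-word or post vector) carries a hypothetical non-negative weight, each centroid is the weighted mean of its members, and seeding uses weighted k-means++ to reduce sensitivity to the initial centers. The helper names `sq_dist`, `assign`, and `weighted_kmeans` are hypothetical. It illustrates the general technique only and should not be read as the authors' algorithm.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def weighted_kmeans(points, weights, k, iters=100, seed=0):
    """Sample-weighted K-Means minimizing sum_i w_i * ||x_i - c_label(i)||^2.

    The weights are an assumption (e.g. a hypothetical importance score per
    topic word or post), not the weighting scheme published in the paper.
    """
    rng = random.Random(seed)
    # Weighted k-means++ seeding: choose each new center with probability
    # proportional to w_i * D(x_i)^2 (distance to the nearest chosen center).
    centers = [list(rng.choices(points, weights=weights)[0])]
    while len(centers) < k:
        d2 = [w * min(sq_dist(p, c) for c in centers) for p, w in zip(points, weights)]
        if sum(d2) == 0:  # fewer distinct (positively weighted) points than k
            break
        centers.append(list(rng.choices(points, weights=d2)[0]))

    dim = len(points[0])
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total == 0:  # empty cluster: keep its previous center
                new_centers.append(old)
            else:  # weighted mean of the cluster's members
                new_centers.append([sum(w * p[d] for p, w in members) / total
                                    for d in range(dim)])
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    weights = [1, 1, 1, 1, 1, 5]
    print(weighted_kmeans(points, weights, k=2))
```

In this toy input the point `[11, 10]` has weight 5, so the centroid of its group is pulled toward it: the weighted mean is (75/7, 71/7) rather than the unweighted (31/3, 31/3).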

CLC Number: TP391