Journal of Computer Applications, 2011, Vol. 31, Issue (07): 1737-1739. DOI: 10.3724/SP.J.1087.2011.01737

• Database technology

Popularity forecast of movies based on data mining in content distributed/delivery network

Zhi-wei ZHOU, Quan ZHENG, Song WANG

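  1. Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China
  • Received: 2011-01-06  Revised: 2011-02-28  Online: 2011-07-01  Published: 2011-07-01
  • Corresponding author: Zhi-wei ZHOU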
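  • About the authors: Zhi-wei ZHOU (1985-), male, born in Chaohu, Anhui, master's student; research interest: media distribution systems. Quan ZHENG (1970-), male, born in Hefei, Anhui, associate professor; research interest: computer networks. Song WANG (1975-), male, born in Lu'an, Anhui, lecturer; research interests: computer networks, media content distribution.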
  • Funding: Support Program project of the Ministry of Science and Technology of China

Abstract: In Content Distribution/Delivery Network (CDN) systems, content popularity is estimated mainly from administrators' experience, which is highly subjective and cannot guarantee Quality of Service (QoS). This paper first preprocesses the data to build an initial knowledge base for movie popularity prediction, then uses data mining techniques to learn from this existing knowledge and predict the popularity of newly added movies, so that they can be deployed appropriately in the CDN system. Popularity prediction with a Bayesian network classifier was compared with prediction with a decision tree model: with the same correct classification rate and the same values of the other classification metrics, the Bayesian network classifier took less time to build its model. The Bayesian network classifier was therefore chosen. It avoids the inaccurate deployment decisions caused by administrators' subjective judgment and improves the efficiency of the CDN system.

Keywords: data mining, Bayesian network, decision tree, popularity, stretched exponential model
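The abstract names the two learners and the two comparison criteria (classification accuracy and model-build time) but not the features, the network structure, or the data. The Python sketch below illustrates that comparison under stated assumptions only: the attributes `genre`, `region` and `star`, the popularity classes `hot`/`warm`/`cold` and the eight training records are all hypothetical; naive Bayes stands in for the paper's Bayesian network classifier (it is the simplest Bayesian-network structure, not necessarily the one the authors used); and a plain ID3 tree stands in for the decision tree model. The helper `evaluate` (also hypothetical) measures accuracy and build time the same way for both learners.

```python
import math
import time
from collections import Counter, defaultdict

# Hypothetical toy records (features, popularity class); NOT data from the paper.
TRAIN = [
    ({"genre": "action", "region": "US", "star": "yes"}, "hot"),
    ({"genre": "action", "region": "CN", "star": "yes"}, "hot"),
    ({"genre": "drama", "region": "CN", "star": "yes"}, "warm"),
    ({"genre": "drama", "region": "US", "star": "no"}, "cold"),
    ({"genre": "comedy", "region": "CN", "star": "no"}, "cold"),
    ({"genre": "comedy", "region": "US", "star": "yes"}, "warm"),
    ({"genre": "action", "region": "US", "star": "no"}, "warm"),
    ({"genre": "drama", "region": "CN", "star": "no"}, "cold"),
]
FEATURES = ["genre", "region", "star"]  # hypothetical attribute names


class NaiveBayes:
    """Simplest Bayesian network: the class node is the only parent of every feature."""

    def fit(self, rows):
        self.n = len(rows)
        self.class_counts = Counter(label for _, label in rows)
        self.counts = Counter()          # (feature, value, label) -> count
        self.values = defaultdict(set)   # feature -> values seen in training
        for x, label in rows:
            for f, v in x.items():
                self.counts[(f, v, label)] += 1
                self.values[f].add(v)
        return self

    def predict(self, x):
        # Pick the class maximising log P(class) + sum of log P(value | class).
        best, best_score = None, -math.inf
        for label, c in self.class_counts.items():
            score = math.log(c / self.n)
            for f, v in x.items():
                # Laplace smoothing keeps unseen (value, class) pairs from zeroing the score.
                score += math.log((self.counts[(f, v, label)] + 1) / (c + len(self.values[f])))
            if score > best_score:
                best, best_score = label, score
        return best


def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def build_tree(rows, features):
    """ID3: split on the feature with the highest information gain until pure."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf

    def gain(f):
        groups = defaultdict(list)
        for x, label in rows:
            groups[x[f]].append(label)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    branches = {v: build_tree([(x, y) for x, y in rows if x[best] == v], rest)
                for v in {x[best] for x, _ in rows}}
    return (best, branches, majority)  # internal node


def tree_predict(node, x):
    while isinstance(node, tuple):
        feature, branches, majority = node
        if x.get(feature) not in branches:
            return majority  # value never seen at this node during training
        node = branches[x[feature]]
    return node


def evaluate(build, predict, train, test):
    """Return (accuracy on test, seconds spent building the model)."""
    start = time.perf_counter()
    model = build(train)
    build_seconds = time.perf_counter() - start
    correct = sum(predict(model, x) == y for x, y in test)
    return correct / len(test), build_seconds


if __name__ == "__main__":
    # Resubstitution accuracy only; use held-out data or cross-validation in practice.
    for name, build, predict in [
        ("naive Bayes", lambda r: NaiveBayes().fit(r), lambda m, x: m.predict(x)),
        ("ID3 tree", lambda r: build_tree(r, FEATURES), tree_predict),
    ]:
        acc, secs = evaluate(build, predict, TRAIN, TRAIN)
        print(f"{name}: accuracy={acc:.2f}, build time={secs * 1e6:.1f} us")
```

On eight records the measured build times are dominated by timer noise, so running this sketch says nothing about which learner is faster; the abstract's finding refers to the authors' own data. In the workflow the abstract describes, the predicted class of a new movie would then inform where it is deployed in the CDN, a step the sketch does not model.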
