Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (05): 1338-1342. DOI: 10.3724/SP.J.1087.2013.01338

• Database Technology •

Taxi gathering area recognition algorithm based on sample weight

JI Bo, YE Yangdong, LU Hongxing

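  1. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China
  • Received: 2012-11-21 Revised: 2012-12-22 Published: 2013-05-01 Online: 2013-05-08
  • Corresponding author: JI Bo
  • About the authors: JI Bo (1973-), male, native of Zhengzhou, Henan, associate professor, Ph.D. candidate; research interests: machine learning, pattern recognition. YE Yangdong (1962-), male, native of Xinyang, Henan, professor, Ph.D. supervisor, Ph.D., CCF member; research interests: knowledge engineering, machine learning, databases. LU Hongxing (1965-), male, native of Anyang, Henan, associate professor, M.S., CCF member; research interests: data mining, pattern recognition.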
  • Funding: National Natural Science Foundation of China (61170223); Henan Talent Training Joint Fund (U1204610)

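Abstract: Clustering can be used to group taxi objects, which are dynamic, random, and asynchronously concurrent. However, existing clustering techniques assume that every taxi sample contributes equally to the clustering and ignore the differing influence of individual samples, which to some extent reduces clustering accuracy. This paper proposes SFTA_IB, a taxi gathering area recognition algorithm based on sample weight, which introduces sample weights to reflect how much each sample contributes. On this basis, the taxis are treated as the original variable X and the taxi coordinate data as the relevant variable Y; the goal is to find a compressed variable T that preserves as much information about the relevant variable as possible. Experiments show that SFTA_IB can accurately identify the taxi gathering areas around a target sample, provide targeted guidance for the cruising route of an individual target taxi, and improve passenger-searching efficiency.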

Key words: information bottleneck, sample weight, pattern recognition, taxi, gathering area
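
The abstract describes SFTA_IB only at the level of its information-bottleneck (IB) setup: X is the set of taxis, Y the coordinate data, and T a compressed representation of X that keeps as much information about Y as possible, i.e. IB trades off minimising I(T;X) against maximising I(T;Y). The sketch below is not the authors' SFTA_IB. It is a generic agglomerative IB (Slonim and Tishby's greedy merge procedure, in its hard-clustering limit where a merge is scored only by the I(T;Y) it loses), under the assumption that sample weights enter as an unnormalised prior p(x). The names `weighted_aib`, `js_weighted` and `kl`, the input `cond[i]` = p(y | x_i) over discretised location cells (for example, the share of taxi i's GPS fixes falling in each grid cell), and the stopping count `k` are all hypothetical.

```python
import math
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_weighted(p: Sequence[float], q: Sequence[float], a: float, b: float) -> float:
    """Jensen-Shannon divergence of p and q with mixture weights a + b = 1."""
    m = [a * pi + b * qi for pi, qi in zip(p, q)]
    return a * kl(p, m) + b * kl(q, m)


def weighted_aib(cond: List[List[float]], weights: List[float], k: int) -> List[List[int]]:
    """Hypothetical weighted agglomerative IB (not the paper's SFTA_IB).

    cond[i]    -- p(y | x_i): distribution of sample (taxi) i over location cells
    weights[i] -- positive sample weight, normalised here into the prior p(x_i)
    k          -- number of clusters to stop at
    Returns the member indices of each cluster, sorted.
    """
    if any(w <= 0 for w in weights) or not 1 <= k <= len(cond):
        raise ValueError("weights must be positive and 1 <= k <= number of samples")
    total = sum(weights)
    # Each cluster t is (p(t), p(y|t), member indices); start with one sample per cluster.
    clusters = [(w / total, list(c), [i]) for i, (w, c) in enumerate(zip(weights, cond))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pi, ci, _ = clusters[i]
                pj, cj, _ = clusters[j]
                s = pi + pj
                # Loss in I(T;Y) caused by merging t_i and t_j.
                cost = s * js_weighted(ci, cj, pi / s, pj / s)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        pi, ci, mi = clusters[i]
        pj, cj, mj = clusters[j]
        s = pi + pj
        # Merged cluster: priors add, conditionals are prior-weighted averages.
        merged = (s, [(pi * a + pj * b) / s for a, b in zip(ci, cj)], mi + mj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(sorted(members) for _, _, members in clusters)


if __name__ == "__main__":
    # Four hypothetical taxis over three location cells.
    cond = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]]
    print(weighted_aib(cond, [1.0, 2.0, 1.0, 1.0], 2))  # [[0, 1], [2, 3]]
```

In this toy input, taxis 0 and 1 sit mostly in cell 0 and taxis 2 and 3 mostly in cell 2, so the two merges that lose the least I(T;Y) produce those two groups. A larger weight raises a sample's share of the merged p(y|t) and of the merge cost, which is one plausible way sample weights could change which clusters form.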
