Journal of Computer Applications, 2018, Vol. 38, Issue (5): 1334-1338. DOI: 10.11772/j.issn.1001-9081.2017102504

Multi-source point of interest fusion algorithm based on distance and category

XU Shuang1, ZHANG Qian2, LI Yan1, LIU Jiayong1   

  1. College of Electronics and Information, Sichuan University, Chengdu Sichuan 610042, China;
  2. Southwest China Research Institute of Electronic Equipment, Chengdu Sichuan 610036, China
  • Received: 2017-10-23 Revised: 2018-01-16 Online: 2018-05-10 Published: 2018-05-24
  • Contact: LIU Jiayong

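  • About the authors: XU Shuang (born 1993), female, from Jining, Shandong, M.S. candidate; research interests: machine learning, data mining, big data visualization. ZHANG Qian (born 1990), male, from Zunyi, Guizhou, Ph.D.; research interests: machine learning, data mining, natural language processing. LI Yan (born 1993), female, from Guiyang, Guizhou, M.S. candidate; research interests: machine learning, data mining. LIU Jiayong (born 1962), male, from Chengdu, Sichuan, professor, Ph.D.; research interests: information security, network information processing.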

Abstract: To integrate and accurately fuse multi-source Point of Interest (POI) data, a Mutually-Nearest Method considering Distance and Category (MNMDC), which combines spatial and non-spatial attributes, was proposed. Firstly, for spatial attributes, a standardized weight algorithm was used to calculate the spatial similarity of the objects to be fused, yielding a candidate fusion set. Secondly, for non-spatial attributes, the Jaro-Winkler algorithm was applied to the fusion set: pairs with consistent categories were excluded against a low similarity threshold, and pairs with inconsistent categories were excluded against a high threshold. Finally, the non-spatial Jaro-Winkler algorithm, subject to a distance constraint, a category consistency constraint and a high threshold, was used to find fusible objects missed by the spatial algorithm. The experimental results show that the average accuracy of MNMDC reaches 93.3%. Compared with the Combined Normal Weight and Title-similarity algorithm (COM-NWT) and the grid correction method on seven groups of data with different degrees of coincidence, MNMDC increases the average accuracy by 2.7 and 1.6 percentage points and the average recall by 2.3 and 1.4 percentage points, respectively. The MNMDC method therefore fuses POI data more accurately in practical fusion.

Key words: Point of Interest (POI), data fusion, spatial attribute, non-spatial attribute, distance, category
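
The three steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain Euclidean distance on planar coordinates stands in for the standardized weight spatial similarity (which the abstract does not specify), the thresholds `LOW`, `HIGH` and `MAX_DIST` are hypothetical values, and the sample POI tuples and the function names `jaro`, `jaro_winkler`, `dist`, `nearest` and `fuse` are invented for the example.

```python
from math import hypot

# Hypothetical thresholds; the abstract does not give the values used.
LOW, HIGH, MAX_DIST = 0.70, 0.90, 100.0

def jaro(s, t):
    """Jaro similarity of two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    m = sum(s_hit)
    if m == 0:
        return 0.0
    s_seq = [c for c, h in zip(s, s_hit) if h]
    t_seq = [c for c, h in zip(t, t_hit) if h]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3

def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a common prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

# A POI is a tuple (name, category, x, y) with planar coordinates in metres.
def dist(a, b):
    return hypot(a[2] - b[2], a[3] - b[3])

def nearest(p, others):
    """Index of the object in `others` closest to p."""
    return min(range(len(others)), key=lambda j: dist(p, others[j]))

def fuse(src_a, src_b):
    """Return index pairs (i, j) judged to describe the same POI."""
    if not src_a or not src_b:
        return []
    matched = []
    # Steps 1 and 2: mutually nearest pairs, filtered by name similarity
    # with a low threshold for equal categories and a high one otherwise.
    for i, a in enumerate(src_a):
        j = nearest(a, src_b)
        if nearest(src_b[j], src_a) != i:
            continue
        threshold = LOW if a[1] == src_b[j][1] else HIGH
        if jaro_winkler(a[0], src_b[j][0]) >= threshold:
            matched.append((i, j))
    # Step 3: recover missed pairs under the distance constraint,
    # the category consistency constraint and the high threshold.
    used_a = {i for i, _ in matched}
    used_b = {j for _, j in matched}
    for i, a in enumerate(src_a):
        if i in used_a:
            continue
        for j, b in enumerate(src_b):
            if j in used_b:
                continue
            if (dist(a, b) <= MAX_DIST and a[1] == b[1]
                    and jaro_winkler(a[0], b[0]) >= HIGH):
                matched.append((i, j))
                used_a.add(i)
                used_b.add(j)
                break
    return matched

# Invented sample data for the two sources.
SRC_A = [("Starbucks Coffee", "cafe", 0, 0),
         ("City Library", "library", 500, 500),
         ("Blue Moon Bar", "bar", 1000, 0)]
SRC_B = [("Starbucks", "cafe", 5, 3),
         ("City Library", "library", 560, 500),
         ("Grand Hotel", "hotel", 990, 5),
         ("Blue Moon Bar", "bar", 1040, 30)]
```

With this sample data, `fuse(SRC_A, SRC_B)` returns `[(0, 0), (1, 1), (2, 3)]`. The first two pairs come from the mutually nearest step. "Blue Moon Bar" is spatially closest to "Grand Hotel", a pair that the high threshold rejects because the categories differ, and its true counterpart is then recovered by the final constrained name search.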

