Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (11): 3159-3165. DOI: 10.11772/j.issn.1001-9081.2020030301

• Artificial Intelligence •

Activity semantic recognition method based on joint features and XGBoost

GUO Maozu1,2, ZHANG Bin1,2, ZHAO Lingling3, ZHANG Yu1,2,4

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;
  2. Beijing Key Laboratory of Intelligent Processing for Building Big Data (Beijing University of Civil Engineering and Architecture), Beijing 100044, China;
  3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China;
  4. State Key Laboratory for Geomechanics and Deep Underground Engineering (China University of Mining and Technology), Beijing 100083, China
  • Received: 2020-03-18 Revised: 2020-06-19 Online: 2020-11-10 Published: 2020-11-24
  • Corresponding author: ZHAO Lingling (born 1980), female, from Qiqihar, Heilongjiang; lecturer, Ph.D.; research interests: machine learning, smart cities, bioinformatics; zhaoll@hit.edu.cn
  • About the authors: GUO Maozu (born 1966), male, from Dezhou, Shandong; professor, doctoral supervisor, Ph.D.; research interests: machine learning, smart cities, bioinformatics. ZHANG Bin (born 1994), male, from Shijiazhuang, Hebei; master's student; research interests: machine learning, smart cities. ZHANG Yu (born 1979), male, from Hohhot, Inner Mongolia; associate professor, Ph.D.; research interests: big data, machine learning, smart cities.
  • Supported by:
    This work is partially supported by the General Program of the National Natural Science Foundation of China (61871020), the Key Project of Science and Technology Plan of Beijing Municipal Commission of Education (KZ201810016019), the Beijing University High-level Innovation Team Building Plan (IDHT20190506), the 2018 Industry-University Cooperative Education Project of the Ministry of Education (201801113001), the Excellent Teachers Development Foundation of Beijing University of Civil Engineering and Architecture (21082718041), the Fundamental Research Funds for Beijing University of Civil Engineering and Architecture (X18018), and the Graduate Innovation Project of Beijing University of Civil Engineering and Architecture in 2020 (PG202005).

Abstract: Previous research on activity semantic recognition extracts only sequence features and periodic features in the time dimension and lacks deep mining of spatial information. To address these problems, an activity semantic recognition method based on joint features and eXtreme Gradient Boosting (XGBoost) is proposed. First, periodic activity features are extracted from the temporal information, and latitude and longitude features are extracted from the spatial information. Then, the latitude and longitude information is used to extract spatial region heat features with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, and these features are combined into feature vectors that characterize user activity semantics. Finally, the activity semantic recognition model is built with XGBoost, an ensemble learning algorithm. On two public FourSquare check-in datasets, the model based on joint features improves recognition accuracy by 28 percentage points over the model based on temporal features alone. Compared with the Context-Aware Hybrid (CAH) method and the Spatial Temporal Activity Preference (STAP) method, the proposed method improves recognition accuracy by 30 percentage points and 5 percentage points, respectively. The experimental results show that the proposed method is more accurate and effective for activity semantic recognition than the comparison methods.

Key words: spatio-temporal data, activity semantic recognition, spatial heat, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), eXtreme Gradient Boosting (XGBoost) algorithm
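
The feature-construction steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The cyclic hour and weekday encoding, the definition of heat as the share of check-ins falling in a point's DBSCAN cluster, the parameter values `eps_km` and `min_pts`, and all function names are assumptions made here for illustration, since the abstract does not specify them.

```python
import math
from datetime import datetime


def temporal_features(ts):
    """Cyclic encoding of hour-of-day and day-of-week (an assumed encoding)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    day_angle = 2 * math.pi * ts.weekday() / 7
    return [math.sin(hour_angle), math.cos(hour_angle),
            math.sin(day_angle), math.cos(day_angle)]


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    # Each point counts as its own neighbor (distance 0).
    neighbors = [[j for j in range(n)
                  if haversine_km(points[i], points[j]) <= eps_km]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


def heat_features(labels):
    """Assumed heat: share of all check-ins in the point's cluster (0 for noise)."""
    counts = {}
    for lab in labels:
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    n = len(labels)
    return [counts[lab] / n if lab != -1 else 0.0 for lab in labels]


def joint_features(checkins, eps_km=0.5, min_pts=3):
    """checkins: list of (datetime, lat, lon). Returns one 7-value vector each."""
    points = [(lat, lon) for _, lat, lon in checkins]
    heat = heat_features(dbscan(points, eps_km, min_pts))
    return [temporal_features(ts) + [lat, lon, h]
            for (ts, lat, lon), h in zip(checkins, heat)]
```

Each resulting vector joins the temporal features, the raw latitude and longitude, and the heat value. In a full pipeline these vectors, paired with activity labels, would be passed to a gradient boosting classifier such as `xgboost.XGBClassifier`; that step is omitted here to keep the sketch dependency-free.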
