Journal of Computer Applications, 2026, Vol. 46, Issue (1): 52-59. DOI: 10.11772/j.issn.1001-9081.2025010114

• Artificial intelligence •

Bimodal fusion method for constructing spatio-temporal knowledge graph in smart home space

Fei WANG¹, Ye TAO¹, Jiawang LIU¹, Wei LI², Xiugong QIN³, Ning ZHANG⁴

  1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China
  2. Institute of Big Data and Artificial Intelligence, China Telecom Research Institute, Beijing 102209, China
  3. Beijing Research Institute of Automation for Machinery Industry Company Limited, Beijing 100120, China
  4. Qingdao HaiShi IoT Technology Company Limited, Qingdao, Shandong 266061, China
  • Received: 2025-02-07 Revised: 2025-04-06 Accepted: 2025-04-08 Online: 2026-01-10 Published: 2026-01-10
  • Contact: Ye TAO
  • About the authors: WANG Fei, born in 1999, native of Weifang, Shandong, M.S. candidate. Her research interests include natural language processing and knowledge graphs.
    LIU Jiawang, born in 1999, native of Jining, Shandong, M.S. candidate. His research interests include speech synthesis, natural language processing, question answering systems, and knowledge graphs.
    LI Wei, born in 1979, native of Nanchong, Sichuan, M.S., senior engineer. His research interests include big data and artificial intelligence.
    QIN Xiugong, born in 1990, native of Tengzhou, Shandong, Ph.D., senior engineer. His research interests include human-computer interaction and service robotics.
    ZHANG Ning, born in 1981, native of Binzhou, Shandong, engineer. His research interests include automation and intelligent manufacturing.
  • Supported by:
    National Key Research and Development Program of China (2023YFF0612102); Key Technology Research and Industrialization Demonstration Project of Qingdao (24-1-2-qljh-19-gx)

Abstract:

The development of the smart home field relies on rich spatio-temporal knowledge graphs to support the design and execution of downstream tasks. However, constructing a spatio-temporal knowledge graph of the smart home space faces challenges such as diverse data sources, low data quality, and limited data scale. Therefore, a bimodal knowledge extraction framework integrating the relative position information of description documents with user behavior logs was proposed to fully mine the multi-modal information in device description documents and user behavior logs, so as to achieve efficient and accurate knowledge extraction and graph construction. The framework consists of two parts. Firstly, a method based on Relative Position Layout Matching (RPLM) was proposed, which uses the relative position characteristics of device description documents to match images with their associated text; at the same time, an ontology model of description documents was designed and combined with a Large Language Model (LLM) to extract structured information and construct the knowledge graph of description documents. Secondly, the Functional Correlation Analysis (FCA) algorithm and the Device Usage Behavior Processing (DUBP) algorithm were designed to extract functionally associated device information from user behavior logs and construct the spatio-temporal knowledge graph of the home space. LayoutLMv3, ERNIE-Layout, GeoLayoutLM, and other models were selected as baselines, and the framework was validated on a self-built Chinese Manual Document Layout Analysis (CMDLA) dataset, a synthesized user behavior log dataset, and three public document analysis datasets.
The results show that the proposed framework outperforms the baseline methods in both accuracy and efficiency of knowledge extraction on the home-domain dataset, achieving an accuracy of 96.39%, which is 0.97 percentage points higher than that of the second-best method, GeoLayoutLM. The framework also demonstrates significant advantages in heterogeneous data fusion and spatio-temporal modeling tasks.
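
To make the idea of matching images with text by relative position more concrete, the following is a minimal illustrative sketch and not the authors' RPLM method, whose details are not given in this abstract. It assumes each layout block is an axis-aligned bounding box and pairs every image with the text block whose center is closest after scaling the offset by the image's width and height. The names `Block`, `relative_offset`, and `match_images_to_text`, as well as the sample coordinates, are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Block:
    """A layout block with a bounding box in page coordinates (hypothetical type)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def relative_offset(image: Block, text: Block) -> tuple[float, float]:
    """Offset of the text center from the image center, scaled by the image size."""
    (ix, iy), (tx, ty) = image.center, text.center
    width = max(image.x1 - image.x0, 1e-9)   # guard against zero-width boxes
    height = max(image.y1 - image.y0, 1e-9)  # guard against zero-height boxes
    return ((tx - ix) / width, (ty - iy) / height)


def match_images_to_text(images: list[Block], texts: list[Block]) -> dict[str, str]:
    """Pair each image with the text block at the smallest scaled distance.

    Assumption: nearest-neighbor matching stands in for whatever matching
    rule the paper actually uses.
    """
    pairs = {}
    for image in images:
        best = min(texts, key=lambda t: hypot(*relative_offset(image, t)))
        pairs[image.label] = best.label
    return pairs


# Made-up page layout: two figures, each with a caption to its right, plus a title.
images = [
    Block("fig_power", 50, 100, 150, 200),
    Block("fig_timer", 50, 400, 150, 500),
]
texts = [
    Block("txt_title", 50, 20, 400, 60),
    Block("txt_power", 160, 110, 300, 190),
    Block("txt_timer", 160, 410, 300, 490),
]
pairs = match_images_to_text(images, texts)
```

Scaling the offset by the image size makes the distance independent of page resolution, which is one plausible reason to work with relative rather than absolute positions; a real system would likely also use reading order and block types.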

Key words: smart home, device description document, behavior log, knowledge graph, multi-modal fusion, knowledge extraction

CLC Number: