Journal of Computer Applications
王菲1,陶冶2,刘家旺1,李伟3,秦修功4,张宁5
Abstract: The development of the smart home field relies on rich spatio-temporal knowledge graphs to support the design and execution of downstream tasks. However, constructing a spatio-temporal knowledge graph of smart home space faces challenges such as diverse data sources, low data quality, and limited scale. Therefore, a dual-modal knowledge extraction framework integrating the relative position information of description documents with user behavior logs was proposed to fully mine the multi-modal information in device description documents and user behavior logs, achieving efficient and accurate knowledge extraction and graph construction. The framework consists of two parts. Firstly, a method based on Relative Position Layout Matching (RPLM) was proposed, which uses the relative position characteristics of device description documents to associate and match the images and text in them. At the same time, an ontology model of description documents was designed and combined with a Large Language Model (LLM) to extract structured information and construct the knowledge graph of description documents. Secondly, the Functional Correlation Analysis (FCA) algorithm and the Device Usage Behavior Processing (DUBP) algorithm were designed to extract functionally associated device information from user behavior logs and construct the spatio-temporal knowledge graph of home space. Finally, LayoutLMv3, ERNIE-Layout, GeoLayoutLM, and other models were selected as baselines, and verification was carried out on a self-built Chinese Manual Document Layout Analysis dataset (CMDLA), a synthesized user behavior log dataset, and three public document analysis datasets. Experimental results show that the framework outperforms the baseline methods in both the accuracy and efficiency of knowledge extraction and exhibits significant advantages in heterogeneous data fusion and spatio-temporal modeling tasks. On the smart home domain dataset, the proposed method achieves an accuracy of 96.39%, 0.97 percentage points higher than the second-best model, GeoLayoutLM.
Key words: smart home, device description document, behavior log, knowledge graph, multimodal fusion, knowledge extraction
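As an illustration of the idea behind matching images and text by relative position, the following minimal sketch pairs each image box with the text box whose center is nearest. This is a generic nearest-box heuristic written for illustration only, not the RPLM method described in the paper; the `Box` type, the function names, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned layout box (page coordinates, origin top-left)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def match_images_to_text(images: list[Box], texts: list[Box]) -> dict[str, str]:
    """Pair each image with the text block whose center is closest.

    Assumption: spatial proximity stands in for semantic association.
    A real layout model would use richer cues than center distance.
    """
    if not texts:
        return {}
    return {
        img.label: min(texts, key=lambda t: center_distance(img, t)).label
        for img in images
    }


# Made-up layout: two figures, each with a caption block to its right.
images = [
    Box("fig_power_button", 50, 100, 150, 200),
    Box("fig_filter", 50, 400, 150, 500),
]
texts = [
    Box("txt_power_on", 170, 110, 500, 190),
    Box("txt_replace_filter", 170, 410, 500, 490),
]
pairs = match_images_to_text(images, texts)
print(pairs)
```

In this toy layout each figure is paired with the text block beside it, since that block's center lies closer than the one in the other row. The matched pairs could then be passed to a downstream extraction step, which is where the abstract places the ontology model and the LLM.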
CN 51-1307/TP
王菲, 陶冶, 刘家旺, 李伟, 秦修功, 张宁. Dual-modal fusion construction method for the spatio-temporal knowledge graph of smart home space[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025010114.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025010114