Journal of Computer Applications, 2020, Vol. 40, Issue (6): 1574-1579. DOI: 10.11772/j.issn.1001-9081.2019101792

• Artificial Intelligence •

Domain ontology-driven method for parsing bidding webpages

MA Dongxue1,2, SONG She3, XIE Zhenping1,2, LIU Yuan1,2

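  1. College of Digital Media, Jiangnan University, Wuxi Jiangsu 214122, China
    2. Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi Jiangsu 214122, China
    3. Inspur Zhuoshu Big Data Industry Development Company Limited, Wuxi Jiangsu 214125, China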
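  • Received: 2019-10-22 Revised: 2019-12-12 Online: 2020-06-10 Published: 2020-06-18
  • Corresponding author: XIE Zhenping (born in 1979)
  • About the authors: MA Dongxue (born in 1995), female, from Dezhou, Shandong, M. S. candidate; main research interest: natural language processing. SONG She (born in 1983), male, from Jinan, Shandong, engineer; main research interests: information systems, big data. XIE Zhenping (born in 1979), male, from Wuxi, Jiangsu, professor, Ph. D., CCF member; main research interests: knowledge representation, cognitive learning. LIU Yuan (born in 1967), male, from Wuxi, Jiangsu, professor, M. S., CCF senior member; main research interests: network traffic monitoring, social networks, digital media.
  • Funding:

    National Natural Science Foundation of China (61872166); Science and Technology Plan Project of Jiangsu Province (BE2018056).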

Domain ontology driven approach for bidding webpage parsing

MA Dongxue1,2, SONG She3, XIE Zhenping1,2, LIU Yuan1,2   

  1. College of Digital Media, Jiangnan University, Wuxi Jiangsu 214122, China
    2. Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi Jiangsu 214122, China
    3. Inspur Zhuoshu Big Data Industry Development Company Limited, Wuxi Jiangsu 214125, China
  • Received: 2019-10-22 Revised: 2019-12-12 Online: 2020-06-10 Published: 2020-06-18
  • Contact: XIE Zhenping, born in 1979, Ph. D., professor. His research interests include knowledge representation and cognitive learning.
  • About author: MA Dongxue, born in 1995, M. S. candidate. Her research interests include natural language processing. SONG She, born in 1983, engineer. His research interests include information systems and big data. XIE Zhenping, born in 1979, Ph. D., professor. His research interests include knowledge representation and cognitive learning. LIU Yuan, born in 1967, M. S., professor. His research interests include network traffic monitoring, social networks, and digital media.
  • Supported by:

    National Natural Science Foundation of China (61872166), the Science and Technology Plan Project of Jiangsu Province (BE2018056).

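Abstract:

To address the low efficiency of parsing bidding webpages with regular expressions, a new automatic webpage parsing method based on a bidding domain ontology is proposed. First, the structural features of bidding webpage text are analyzed. Second, a lightweight domain knowledge model of the bidding ontology is constructed. Finally, an algorithm for semantic matching and extraction of bidding webpage elements is given, which parses bidding webpages automatically. Experimental results show that, through adaptive parsing, the new method reaches an accuracy of 95.33% and a recall of 88.29%, which are 3.98 and 3.81 percentage points higher, respectively, than those of the regular expression method. The proposed method adaptively performs structured parsing and extraction of the semantic information in bidding webpages and meets practical performance requirements well.

Key words: bidding, domain ontology, webpage parsing, meta-parsing model, knowledge graph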

Abstract:

To address the low efficiency of parsing bidding webpages with regular expressions, a new automatic parsing method based on a bidding ontology model was proposed. Firstly, the structural features of bidding webpage texts were analyzed. Secondly, a lightweight domain knowledge model of the bidding ontology was constructed. Finally, a new algorithm for semantic matching and extraction of bidding webpage elements was introduced to realize the automatic parsing of bidding webpages. The experimental results show that, with adaptive parsing, the accuracy and recall of the new method reach 95.33% and 88.29% respectively, which are 3.98 and 3.81 percentage points higher, respectively, than those of the regular expression method. The proposed method can adaptively realize the structured parsing and extraction of semantic information in bidding webpages, and can satisfy the requirements of practical applications.
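
The idea of matching webpage elements against a domain knowledge model, rather than writing one regular expression per page layout, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so the ontology contents, the concept names, and the function `parse_bidding_text` below are all hypothetical, and the semantic matching is reduced to an exact synonym lookup.

```python
import re

# Hypothetical lightweight ontology (an assumption, not from the paper):
# each concept lists surface labels that may introduce it on a bidding page.
BIDDING_ONTOLOGY = {
    "project_name": ["project name", "name of project", "项目名称"],
    "budget": ["budget", "budget amount", "预算金额"],
    "tenderer": ["tenderer", "purchaser", "采购人"],
    "deadline": ["bid deadline", "submission deadline", "投标截止时间"],
}

# Reverse index from a lower-cased label to its concept.
LABEL_TO_CONCEPT = {
    label.lower(): concept
    for concept, labels in BIDDING_ONTOLOGY.items()
    for label in labels
}

# Split a "label: value" line on its first ASCII or full-width colon.
FIELD_PATTERN = re.compile(r"^\s*([^::]+?)\s*[::]\s*(.+?)\s*$")


def parse_bidding_text(text):
    """Return {concept: value} for lines whose label is known to the ontology."""
    record = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match is None:
            continue  # not a "label: value" line
        label, value = match.group(1).lower(), match.group(2)
        concept = LABEL_TO_CONCEPT.get(label)
        # Keep the first value found for each concept; ignore unknown labels.
        if concept is not None and concept not in record:
            record[concept] = value
    return record
```

In this sketch, supporting a new page layout means adding labels to the ontology instead of rewriting extraction patterns; a fuller implementation would replace the exact lookup with a similarity-based match.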

Key words: bidding, domain ontology, webpage parsing, meta-parsing model, knowledge graph

CLC number: