Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1361-1366. DOI: 10.11772/j.issn.1001-9081.2020081203

Special topic: Data Science and Technology

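Patent clustering method based on functional effect

MA Jianhong1, CAO Wenbin1, LIU Yuangang2, XIA Shuang3

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China;
    2. Tianjin Science and Technology Association, Tianjin 300041, China;
    3. Tianjin Science and Technology Museum, Tianjin 300210, China
  • Received: 2020-08-12  Revised: 2020-11-12  Online: 2021-05-10  Published: 2021-05-19
  • Corresponding author: MA Jianhong
  • About the authors: MA Jianhong (1965-), female, born in Baoding, Hebei, Ph. D., professor, doctoral supervisor, CCF member; research interests: software engineering, natural language processing, knowledge graphs, innovation theory and methods. CAO Wenbin (1995-), male, born in Yangquan, Shanxi, M. S. candidate; research interests: natural language processing, software engineering. LIU Yuangang (1978-), male, born in Dezhou, Shandong, Ph. D., senior engineer; research interests: applied chemistry, science and technology management. XIA Shuang (1984-), male, born in Tangshan, Hebei, M. S.; research interests: measurement and control technology, science and technology management.
  • Supported by:
    This work is partially supported by the Innovation Methods Work Special Project of the Ministry of Science and Technology (2019IM020300).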

Abstract: Patents are currently organized by technical domain, whereas clustering them by functional effect makes cross-domain patent clustering possible, which is of great significance for innovative design in enterprises. The key tasks are to extract patent functional effects accurately and to obtain optimal clustering results quickly. To this end, a Functional Effect Information-Joint (FEI-Joint) model combining Enhanced Language Representation with Informative Entities (ERNIE) and a Convolutional Neural Network (CNN) was proposed to extract the functional effects of patent documents, and the Self-Organizing Map (SOM) algorithm was improved with an early rejection strategy and class merging, yielding the Early Reject based Class Merge Self-Organizing Map (ERCM-SOM) for patent clustering based on functional effect. The FEI-Joint model was compared with Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA) and CNN in terms of clustering performance after feature extraction, and the results show that its F-measure is clearly higher than those of the other models. Compared with the K-Means and SOM algorithms, ERCM-SOM achieves a higher F-measure, and its running time is significantly shorter than that of SOM. Compared with patent classification by International Patent Classification (IPC) codes, the clustering method based on functional effect achieves cross-domain patent clustering, laying a foundation for designers to borrow design methods from other domains.

Key words: patent clustering, Enhanced Language Representation with Informative Entities (ERNIE), Convolutional Neural Network (CNN), cross-domain, Self-Organizing Map (SOM)
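The abstract compares models by the F-measure of the resulting clusters but does not define it. The sketch below shows the class-weighted clustering F-measure that is common in the document-clustering literature, assuming every patent has a reference class label (for example, a manually assigned functional-effect category) and a predicted cluster ID. Whether the authors used exactly this variant is an assumption, and the function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-weighted clustering F-measure (assumed variant, not confirmed by the paper).

    For class i and cluster j with n_ij shared items:
        P(i, j) = n_ij / n_j,  R(i, j) = n_ij / n_i,
        F(i, j) = 2 * P * R / (P + R)
    Overall F = sum_i (n_i / n) * max_j F(i, j).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    class_sizes = Counter(labels_true)               # n_i: items per reference class
    cluster_sizes = Counter(labels_pred)             # n_j: items per predicted cluster
    joint = Counter(zip(labels_true, labels_pred))   # n_ij: items shared by class i and cluster j

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        # Match each class to the cluster that gives it the highest F(i, j).
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight the best match by the class's share of all items.
        total += n_i / n * best
    return total


if __name__ == "__main__":
    truth = ["A", "A", "A", "B", "B", "C"]
    clusters = [0, 0, 1, 1, 1, 2]
    print(round(clustering_f_measure(truth, clusters), 4))  # prints 0.8333
```

In this toy example class A is best matched by cluster 0 (F = 0.8), class B by cluster 1 (F = 0.8) and class C by cluster 2 (F = 1.0), so the weighted sum is 3/6 × 0.8 + 2/6 × 0.8 + 1/6 × 1.0 = 5/6. Cluster IDs only need to be consistent within one run, so the score does not depend on how clusters are numbered.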

CLC number: