Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (3): 758-764. DOI: 10.11772/j.issn.1001-9081.2016.03.758

• Computer Software Technology •

Automatic construction of software engineering linked data

ZHANG Yuchen, SHEN Beijun

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2015-08-13 Revised: 2015-10-14 Online: 2016-03-10 Published: 2016-03-17
  • Corresponding author: SHEN Beijun
  • About the authors: ZHANG Yuchen (1990-), male, born in Pingshan, Sichuan, M.S. candidate; research interests: software repository mining, linked data. SHEN Beijun (1969-), female, born in Cixi, Zhejiang, associate professor, Ph.D., CCF member; research interests: empirical software engineering.
  • Supported by:
    This work is partially supported by the National Basic Research Program (973 Program) of China (2015CB352203) and the National Natural Science Foundation of China (61472242).

Abstract: In distributed, heterogeneous, large-scale software development, it is difficult to stay aware of information and to discover knowledge efficiently. To address this problem, the semantic Web is introduced into software engineering to build fine-grained semantic links among multi-source heterogeneous data. An approach to ontology construction, link extraction, and link recovery is proposed that automatically constructs ontology-based software engineering linked data. The approach extracts and merges ontology concepts, resolves instances, and disambiguates properties, so that complete, non-redundant linked data are extracted from the structured data sets in software repositories. It also recovers missing links from software repositories by applying Natural Language Processing (NLP) and Information Retrieval (IR) techniques to three features: synonyms, verb-object phrases, and structural relations. Experimental results show that the approach can automatically construct and merge a software engineering ontology from distributed software engineering data sets, and can effectively recover missing linked data and add them to the ontology. Compared with the Baseline, Phrasing, and O-CSTI methods, it achieves significantly higher recall, precision, and F-measure in linked data recovery.

Key words: software engineering linked data, software engineering ontology, ontology construction, linked data extraction, linked data recovery

CLC Number:
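
The abstract reports link recovery quality in terms of recall, precision, and F-measure. As a minimal sketch of how these three measures are conventionally computed, the Python snippet below compares a set of recovered links against a ground-truth set. The function name `evaluate_links`, the representation of a link as an (issue, commit) pair, and the sample data are all assumptions made for illustration; they are not taken from the paper and do not reproduce its experiments.

```python
def evaluate_links(recovered, ground_truth):
    """Return (precision, recall, f_measure) for a set of recovered links.

    Each link is any hashable value; here a hypothetical (issue, commit) pair.
    """
    recovered, ground_truth = set(recovered), set(ground_truth)
    true_positives = len(recovered & ground_truth)

    # Precision: share of recovered links that are correct.
    precision = true_positives / len(recovered) if recovered else 0.0
    # Recall: share of true links that were recovered.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical sample data, not from the paper.
truth = {("ISSUE-1", "c1"), ("ISSUE-2", "c2"), ("ISSUE-3", "c3"), ("ISSUE-4", "c4")}
found = {("ISSUE-1", "c1"), ("ISSUE-2", "c2"), ("ISSUE-3", "c9")}

precision, recall, f_measure = evaluate_links(found, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

With this sample, two of the three recovered links are correct and two of the four true links are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.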