Journal of Computer Applications, 2021, Vol. 41, Issue (8): 2432-2439. DOI: 10.11772/j.issn.1001-9081.2020101569

Special topic: The 8th CCF Conference on Big Data (CCF Bigdata 2020)

唐诗宋词中的超网络特性分析

王高杰1,2,3, 冶忠林2,3,4, 赵海兴1,2,3,4, 朱宇2,3,4, 孟磊2,3,4   

  1. 青海师范大学 数学与统计学院, 西宁 810008;
  2. 青海省藏文信息处理与机器翻译重点实验室(青海师范大学), 西宁 810008;
  3. 藏文信息处理教育部重点实验室(青海师范大学), 西宁 810008;
  4. 青海师范大学 计算机学院, 西宁 810016
  • Received: 2020-10-12 Revised: 2020-12-09 Online: 2021-08-10 Published: 2021-01-27
  • Corresponding author: ZHAO Haixing
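  • About the authors: WANG Gaojie (1994-), male, born in Wulian, Shandong, M.S. candidate, CCF member, main research interests: complex networks, data mining; YE Zhonglin (1989-), male, born in Minhe, Qinghai, lecturer, Ph.D., CCF member, main research interests: question answering systems, network representation learning, social network data mining; ZHAO Haixing (1969-), male, born in Huangzhong, Qinghai, professor, doctoral supervisor, Ph.D., main research interests: complex networks, network reliability analysis; ZHU Yu (1986-), male, born in Heze, Shandong, lecturer, Ph.D., CCF member, main research interests: complex networks, network representation learning; MENG Lei (1994-), male, born in Xiangcheng, Henan, Ph.D. candidate, CCF member, main research interests: complex networks, data mining.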
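  • Funding:
    National Natural Science Foundation of China (11661069, 61663041); Youth Program of Natural Science Foundation of Qinghai Province (2021-ZJ-946Q); Middle-Youth Research Program of Natural Science Foundation of Qinghai Normal University (2020QZR007).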

Analysis of hypernetwork characteristics in Tang poems and Song lyrics

WANG Gaojie1,2,3, YE Zhonglin2,3,4, ZHAO Haixing1,2,3,4, ZHU Yu2,3,4, MENG Lei2,3,4   

  1. School of Mathematics and Statistics, Qinghai Normal University, Xining Qinghai 810008, China;
  2. Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province (Qinghai Normal University), Xining Qinghai 810008, China;
  3. Key Laboratory of Tibetan Information Processing, Ministry of Education (Qinghai Normal University), Xining Qinghai 810008, China;
  4. Computer College, Qinghai Normal University, Xining Qinghai 810016, China
  • Received: 2020-10-12 Revised: 2020-12-09 Online: 2021-08-10 Published: 2021-01-27
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11661069, 61763041), the Youth Program of Natural Science Foundation of Qinghai Province (2021-ZJ-946Q), and the Middle-Youth Research Program of Natural Science Foundation of Qinghai Normal University (2020QZR007).

Abstract: Tang poems and Song lyrics have been studied extensively from a literary perspective, but few studies have applied hypergraph-based hypernetwork methods to them, and those few are limited to character frequency and word frequency. Analyzing Tang poems and Song lyrics with hypernetwork data analysis methods helps to explore a breadth that the traditional literary perspective cannot reach, and to discover the word-composition patterns and historical background that these works reflect. Therefore, based on two classical text corpora, Quan Tang Shi and Quan Song Ci, a Tang poem hypernetwork and a Song lyric hypernetwork were first established. In constructing each hypernetwork, one Tang poem or one Song lyric was taken as a hyperedge, and the characters in that poem or lyric were taken as the nodes within the hyperedge. Then, the topological indexes and network characteristics of the two hypernetworks, such as node hyperdegree, node hyperdegree distribution, hyperedge node degree, and hyperedge node degree distribution, were analyzed experimentally in order to discover the character use, word use, and aesthetic tendencies of Tang dynasty poets and Song dynasty lyricists. Finally, work-collection hypernetworks were constructed from the collected poems and lyrics of Li Bai, Du Fu, Su Shi, and Xin Qiji, and the relevant network parameters were calculated. The analysis results show that the maximum and minimum hyperdegrees in the Tang poem and Song lyric hypernetworks differ greatly and that the hyperdegree distribution approximately follows a power law, which indicates that the two hypernetworks are scale-free. In addition, the hyperedge node degrees of the two hypernetworks also show clear distribution characteristics. Specifically, the hyperedge node degrees in the Tang poem hypernetwork mostly fall between 20 and 100, and those in the Song lyric hypernetwork mostly fall between 30 and 130. Moreover, the work-collection hypernetworks are found to have a small average path length and a large clustering coefficient, which reflects their small-world property.
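The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: two well-known poems (Li Bai's 静夜思 and Wang Zhihuan's 登鹳雀楼, punctuation removed) stand in for the full corpora, a character repeated within one poem is counted as a single node of that hyperedge, and all function names are hypothetical.

```python
from collections import Counter


def build_hypernetwork(poems):
    """Map each poem title (one hyperedge) to its set of distinct characters (nodes).

    Assumption: repeated characters inside a poem count once.
    """
    return {
        title: {ch for ch in text if not ch.isspace()}
        for title, text in poems.items()
    }


def node_hyperdegrees(hyperedges):
    """Node hyperdegree: the number of hyperedges (poems) that contain the character."""
    counts = Counter()
    for nodes in hyperedges.values():
        counts.update(nodes)
    return counts


def hyperedge_node_degrees(hyperedges):
    """Hyperedge node degree: the number of distinct characters in each poem."""
    return {title: len(nodes) for title, nodes in hyperedges.items()}


def hyperdegree_distribution(hyperdegrees):
    """Fraction of nodes having each hyperdegree value."""
    total = len(hyperdegrees)
    freq = Counter(hyperdegrees.values())
    return {k: v / total for k, v in sorted(freq.items())}


# Toy data only: two short poems instead of the Quan Tang Shi / Quan Song Ci corpora.
poems = {
    "静夜思": "床前明月光疑是地上霜举头望明月低头思故乡",
    "登鹳雀楼": "白日依山尽黄河入海流欲穷千里目更上一层楼",
}

hyperedges = build_hypernetwork(poems)
hyperdegrees = node_hyperdegrees(hyperedges)
edge_degrees = hyperedge_node_degrees(hyperedges)
distribution = hyperdegree_distribution(hyperdegrees)

print(hyperdegrees["上"])  # 上 occurs in both poems
print(edge_degrees)
print(distribution)
```

On a full corpus, plotting `distribution` on log-log axes would be one way to check whether the hyperdegrees look approximately power-law distributed, as the abstract reports.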

Key words: Tang poems and Song lyrics, hypernetwork, hyperdegree, hyperdegree distribution, hyperedge node degree, hyperedge node degree distribution, scale-free, small world

CLC number: