Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3692-3699.DOI: 10.11772/j.issn.1001-9081.2021101768
Special Issue: Artificial intelligence
Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI
Received: 2021-10-14
Revised: 2022-01-07
Accepted: 2022-01-24
Online: 2022-03-04
Published: 2022-12-10
Contact: Jin ZHENG
About author: DU Yuqi, born in 1998 in Nanchong, Sichuan, M. S. candidate. Her research interests include deep learning and natural language processing.
Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI. Text segmentation model based on graph convolutional network[J]. Journal of Computer Applications, 2022, 42(12): 3692-3699.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101768
Statistic | Wikicities | Wikielements |
---|---|---|
Number of documents | 100 | 118 |
Number of paragraphs | 6 670 | 2 810 |
Number of words | 492 402 | 191 762 |
Text segment length | 3.33±3.05 | 5.15±4.57 |
Text segments per document | 6.82±2.57 | 12.2±2.79 |
Tab. 1 Statistics of text segmentation datasets
Model | Learning type | Wikicities | Wikielements |
---|---|---|---|
Random | Unsupervised | 47.14 | 50.08 |
Reference [ | Unsupervised | 22.10 | 20.10 |
GraphSeg | Unsupervised | 39.95 | 49.12 |
WIKI‑727K | Supervised | 19.68 | 41.63 |
TLT‑TS | Supervised | 19.21 | 20.33 |
CATS | Supervised | 16.85 | 18.41 |
TS‑GCN | Supervised | 19.13 | 18.03 |
Tab. 2 Comparison of Pk value among different models for text segmentation task
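The Pk value compared in Tab. 2 is the standard text segmentation error metric introduced by Beeferman et al. (reference 24); lower is better, and the table values appear to be percentages. The following is a minimal sketch of the usual definition, not the authors' evaluation script. The function names `to_labels` and `pk`, the representation of a segmentation as a list of segment lengths, and the default window of half the mean reference segment length are all assumptions made for illustration.

```python
def to_labels(segment_lengths):
    """Expand segment lengths, e.g. [2, 3], into per-unit segment ids [0, 0, 1, 1, 1]."""
    return [seg_id
            for seg_id, length in enumerate(segment_lengths)
            for _ in range(length)]


def pk(reference_lengths, hypothesis_lengths, k=None):
    """Fraction of sliding windows on which reference and hypothesis disagree.

    Both arguments are lists of segment lengths (hypothetical input format)
    covering the same number of units (sentences or paragraphs).
    """
    ref = to_labels(reference_lengths)
    hyp = to_labels(hypothesis_lengths)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of units")
    n = len(ref)
    if k is None:
        # Common convention (assumed here): half the mean reference segment length.
        k = max(1, int(round(n / len(reference_lengths) / 2)))
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < number of units")
    windows = n - k
    # A window is an error when exactly one of the two segmentations
    # places units i and i + k in the same segment.
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows
```

For example, `pk([3, 3], [6], k=2)` compares a reference with one boundary against a hypothesis with none; two of the four windows straddle the missed boundary, so the result is 0.5 (50 when expressed as a percentage).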
Pre-trained word vectors | Wikicities | Wikielements |
---|---|---|
GloVe-300d | 19.62 | 18.60 |
crawl-300d | 19.90 | 18.45 |
wiki-news-300d | 19.13 | 18.03 |
Tab. 3 Segmentation results under different pre-trained word vectors
Attention calculation method | Wikicities | Wikielements |
---|---|---|
No attention | 22.07 | 19.60 |
Euclidean distance attention | 20.45 | 18.60 |
Semantic similarity attention | 19.13 | 18.03 |
Tab. 4 Segmentation results of different attention calculation methods
1 | HEARST M A. TextTiling: segmenting text into multi-paragraph subtopic passages[J]. Computational Linguistics, 1997, 23(1): 33-64. |
2 | QIN B, LIU T, LI S. Survey of multi-document summarization[J]. Journal of Chinese Information Processing, 2005, 19(6): 13-20, 56. (in Chinese) 10.3969/j.issn.1003-0077.2005.06.003 |
3 | ANGHELUTA R, DE BUSSER R, MOENS M F. The use of topic segmentation for automatic summarization[C]// Proceedings of the Association for Computational Linguistics 2002 Post-Conference Workshop on Automatic Summarization. Stroudsburg, PA: Association for Computational Linguistics, 2002: 1421-1426. |
4 | HUANG X J, PENG F C, SCHUURMANS D, et al. Applying machine learning to text segmentation for information retrieval[J]. Information Retrieval, 2003, 6(3/4): 333-362. 10.1023/a:1026028229881 |
5 | SHTEKH G, KAZAKOVA P, NIKITINSKY N, et al. Exploring influence of topic segmentation on information retrieval quality[C]// Proceedings of the 2018 International Conference on Internet Science, LNCS 11193. Cham: Springer, 2018: 131-140. |
6 | MA C L, WANG T. Text sentiment analysis based on correlated topic model and multi-layer knowledge representation[J]. Journal of Zhengzhou University (Natural Science Edition), 2021, 53(4): 30-35. (in Chinese) |
7 | ZIRN C, GLAVAŠ G, NANNI F, et al. Classifying topics and detecting topic shifts in political manifestos[C]// Proceedings of the 2016 International Conference on the Advances in Computational Analysis of Political Text. Zagreb: University of Zagreb, 2016: 88-93. |
8 | MANUVINAKURIKE R, PAETZEL M, QU C, et al. Toward incremental dialogue act segmentation in fast-paced interactive dialogue systems[C]// Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Stroudsburg, PA: Association for Computational Linguistics, 2016: 252-262. 10.18653/v1/w16-3632 |
9 | ZHAO T Y, KAWAHARA T. Joint learning of dialog act segmentation and recognition in spoken dialog using neural networks[C]// Proceedings of the 18th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). [S.l.]: Asian Federation of Natural Language Processing, 2017: 704-712. 10.18653/v1/w18-5021 |
10 | VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. (2018-02-04) [2021-06-20]. |
11 | CHOI F Y Y. Advances in domain independent linear text segmentation[C]// Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2000: 26-33. |
12 | UTIYAMA M, ISAHARA H. A statistical model for domain-independent text segmentation[C]// Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2001: 499-506. 10.3115/1073012.1073076 |
13 | LI J, SUN A X, JOTY S. SegBot: a generic neural text segmentation model with pointer network[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2018: 4166-4172. 10.24963/ijcai.2018/579 |
14 | KOSHOREK O, COHEN A, MOR N, et al. Text segmentation as a supervised learning task[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 469-473. 10.18653/v1/n18-2075 |
15 | ARNOLD S, SCHNEIDER R, CUDRÉ-MAUROUX P, et al. SECTOR: a neural model for coherent topic segmentation and classification[J]. Transactions of the Association for Computational Linguistics, 2019, 7: 169-184. 10.1162/tacl_a_00261 |
16 | BARROW J, JAIN R, MORARIU V, et al. A joint model for document segmentation and segment labeling [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 313-322. 10.18653/v1/2020.acl-main.29 |
17 | LUKASIK M, DADACHEV B, PAPINENI K, et al. Text segmentation by cross segment attention[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 4707-4716. 10.18653/v1/2020.emnlp-main.380 |
18 | XING L Z, HACKINEN B, CARENINI G, et al. Improving context modeling in neural topic segmentation [C]// Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 626-636. |
19 | WU B, WEI B F, LIU J, et al. Faceted text segmentation via multitask learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(9): 3846-3857. 10.1109/tnnls.2020.3015996 |
20 | GLAVAŠ G, NANNI F, PONZETTO S P. Unsupervised text segmentation using semantic relatedness graphs[C]// Proceedings of the 5th Joint Conference on Lexical and Computational Semantics. Stroudsburg, PA: Association for Computational Linguistics, 2016: 125-130. 10.18653/v1/s16-2016 |
21 | YAO L, MAO C S, LUO Y. Graph convolutional networks for text classification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 7370-7377. 10.1609/aaai.v33i01.33017370 |
22 | KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-06-20]. 10.48550/arXiv.1609.02907 |
23 | CHEN H, BRANAVAN S R K, BARZILAY R, et al. Global models of document structure using latent permutations[C]// Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2009: 371-379. 10.3115/1620754.1620808 |
24 | BEEFERMAN D, BERGER A, LAFFERTY J. Statistical models for text segmentation[J]. Machine Learning, 1999, 34(1/2/3): 177-210. 10.1023/a:1007506220214 |
25 | GLAVAŠ G, SOMASUNDARAN S. Two-level transformer and auxiliary coherence modeling for improved text segmentation[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020:7797-7804. 10.1609/aaai.v34i05.6284 |