Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3692-3699. DOI: 10.11772/j.issn.1001-9081.2021101768
Section: Artificial intelligence
Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI
Received: 2021-10-14; Revised: 2022-01-07; Accepted: 2022-01-24; Online: 2022-03-04; Published: 2022-12-10
Contact: Jin ZHENG
About author: DU Yuqi, born in 1998, M. S. candidate. Her research interests include deep learning and natural language processing.
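Author profile (Chinese edition): DU Yuqi (1998-), female, born in Nanchong, Sichuan; M. S. candidate; research interests: deep learning, natural language processing.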
Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI. Text segmentation model based on graph convolutional network[J]. Journal of Computer Applications, 2022, 42(12): 3692-3699.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101768
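The model named in the title builds on graph convolutional networks. For orientation, the layer-wise propagation rule of Kipf and Welling (Ref. [22] in the list below), H' = ReLU(D̃^(-1/2) Ã D̃^(-1/2) H W) with Ã = A + I and D̃ the degree matrix of Ã, can be sketched in plain Python. How TS-GCN actually builds its graph is not described in this excerpt, so the toy sentence graph and every function name below are hypothetical; this is a reference sketch of a standard GCN layer, not the authors' implementation.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product; matrices are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency of Kipf and Welling."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # node degrees, counting the added self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, h, w):
    """One graph convolution, ReLU(A_norm H W).

    adj: n x n non-negative adjacency without self-loops
    h:   n x d node features (e.g. sentence vectors)
    w:   d x d' weight matrix (learned in a real model)
    """
    propagated = matmul(matmul(normalize_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical toy graph: three sentence nodes, edges 0-1 and 1-2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the output easy to inspect
out = gcn_layer(adj, h, w)
print(out)
```

Each node's new representation mixes its own features with its neighbours', scaled by the degrees of both endpoints; stacking such layers widens the context each sentence sees.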
Statistic | Wikicities | Wikielements |
---|---|---|
Documents | 100 | 118 |
Paragraphs | 6 670 | 2 810 |
Words | 492 402 | 191 762 |
Segment length (mean±SD) | 3.33±3.05 | 5.15±4.57 |
Segments per document (mean±SD) | 6.82±2.57 | 12.2±2.79 |
Tab. 1 Statistics of text segmentation datasets
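The "mean±SD" entries in Tab. 1 summarize segment lengths and segment counts across a corpus. A minimal sketch of how such figures can be computed follows. It assumes a hypothetical input format (one list of segment lengths per document) and uses the population standard deviation; the paper does not state which formula or length unit it used.

```python
from statistics import mean, pstdev

def segmentation_stats(docs):
    """docs: one list per document holding the length of each of its segments
    (hypothetical input format; the length unit is whatever the corpus counts)."""
    seg_lengths = [length for doc in docs for length in doc]
    segs_per_doc = [len(doc) for doc in docs]
    return {
        # pstdev is the population standard deviation; whether the paper used
        # the population or the sample formula is an assumption
        "segment_length": (mean(seg_lengths), pstdev(seg_lengths)),
        "segments_per_document": (mean(segs_per_doc), pstdev(segs_per_doc)),
    }

def fmt(pair):
    """Format a (mean, std) pair in the table's 'mean±std' style."""
    m, s = pair
    return f"{m:.2f}±{s:.2f}"

# Two toy documents: one with segments of length 2 and 4, one with three segments of length 3.
stats = segmentation_stats([[2, 4], [3, 3, 3]])
print({key: fmt(value) for key, value in stats.items()})
```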
Model | Learning paradigm | Wikicities | Wikielements |
---|---|---|---|
Random | Unsupervised | 47.14 | 50.08 |
Ref. [ | Unsupervised | 22.10 | 20.10 |
GraphSeg | Unsupervised | 39.95 | 49.12 |
WIKI-727K | Supervised | 19.68 | 41.63 |
TLT-TS | Supervised | 19.21 | 20.33 |
CATS | Supervised | 16.85 | 18.41 |
TS-GCN | Supervised | 19.13 | 18.03 |
Tab. 2 Comparison of Pk values (%) among different models on the text segmentation task (lower is better)
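Pk (Beeferman et al., Ref. [24]), the metric reported in Tab. 2, slides a probe of width k over the document and counts how often the reference and the hypothesis disagree about whether the two probe ends lie in the same segment; lower is better, and the table's values read as Pk × 100. The sketch below assumes each unit carries a segment id that changes exactly at boundaries, and sets k to half the mean reference segment length, a common convention; the paper's exact choice of k is not given in this excerpt. NLTK ships a string-based variant in `nltk.metrics.segmentation`.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error.

    reference, hypothesis: one segment id per unit (sentence or paragraph),
    with ids changing exactly at segment boundaries, e.g. [0, 0, 1, 1, 1].
    Returns the fraction of probe pairs (i, i + k) on which the two
    segmentations disagree about "same segment or not".
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Conventional choice: half the mean reference segment length.
        num_segments = 1 + sum(reference[i] != reference[i + 1] for i in range(n - 1))
        k = max(1, int(round(n / num_segments / 2)))
    probes = n - k
    if probes <= 0:
        raise ValueError("document too short for window size k")
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(probes)
    )
    return errors / probes

ref = [0, 0, 0, 1, 1, 1]
print(100 * pk(ref, ref))      # perfect segmentation
print(100 * pk(ref, [0] * 6))  # one big segment that misses the boundary
```

Averaging this per-document score over a test set gives a corpus-level figure comparable in form to the table entries.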
Pre-trained word vectors | Wikicities | Wikielements |
---|---|---|
GloVe-300d | 19.62 | 18.60 |
crawl-300d | 19.90 | 18.45 |
wiki-news-300d | 19.13 | 18.03 |
Tab. 3 Segmentation results (Pk, %) with different pre-trained word vectors
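All three embeddings in Tab. 3 are distributed as plain text, one word followed by its 300 values per line (fastText's crawl-300d and wiki-news-300d `.vec` files also start with a "count dim" header). A minimal sketch of loading such a file and turning a sentence into a fixed-size vector follows. Mean pooling is an assumption made for illustration; this excerpt does not say how TS-GCN builds sentence representations, and the function names are hypothetical.

```python
import os
import tempfile

def load_vectors(path, dim=300):
    """Read text-format word vectors: 'word v1 ... v_dim' per line.

    Lines without exactly dim + 1 fields are skipped; for dim > 1 this drops
    the 'count dim' header of .vec files and any token containing spaces.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def sentence_vector(tokens, vectors, dim=300):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]

# Tiny .vec-style file (header plus two 3-d vectors) written to a temp path.
with tempfile.NamedTemporaryFile("w", suffix=".vec", delete=False, encoding="utf-8") as tmp:
    tmp.write("2 3\nhello 1 2 3\nworld 3 4 5\n")
vecs = load_vectors(tmp.name, dim=3)
os.remove(tmp.name)
avg = sentence_vector(["hello", "world", "unseen"], vecs, dim=3)
print(avg)
```

Out-of-vocabulary tokens are simply ignored here, so vocabulary coverage is one plausible reason different embeddings could yield different Pk scores.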
Attention computation method | Wikicities | Wikielements |
---|---|---|
No attention | 22.07 | 19.60 |
Euclidean-distance attention | 20.45 | 18.60 |
Semantic-similarity attention | 19.13 | 18.03 |
Tab. 4 Segmentation results (Pk, %) with different attention computation methods
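Tab. 4 shows that adding attention lowers Pk on both datasets, and that semantic-similarity attention beats Euclidean-distance attention. The exact formulas are not given in this excerpt, so the following is only a sketch under stated assumptions: attention scores between sentence vectors are either the negative Euclidean distance or the cosine similarity (standing in for "semantic similarity"), each row is normalized with a softmax as in graph attention networks (Ref. [10]), and the weights average the sentence vectors. All names and data are hypothetical.

```python
import math

def euclidean_scores(h):
    """score[i][j] = -||h_i - h_j||: nearer sentence vectors score higher."""
    return [[-math.dist(a, b) for b in h] for a in h]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cosine_scores(h):
    """score[i][j] = cos(h_i, h_j), used here as the 'semantic similarity'."""
    return [[cosine(a, b) for b in h] for a in h]

def row_softmax(scores):
    """Turn each row of scores into attention weights that sum to 1."""
    weights = []
    for row in scores:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def attend(h, scores_fn):
    """Replace each sentence vector with the attention-weighted average of all vectors."""
    attn = row_softmax(scores_fn(h))
    dims = range(len(h[0]))
    return [[sum(a * v[d] for a, v in zip(row, h)) for d in dims] for row in attn]

# Hypothetical sentence vectors: sentences 0 and 1 share a topic, sentence 2 does not.
h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for name, fn in [("euclidean", euclidean_scores), ("cosine", cosine_scores)]:
    print(name, [round(w, 3) for w in row_softmax(fn(h))[0]])
```

Both scorings rank sentence 1 above sentence 2 as context for sentence 0, but cosine similarity ignores vector length, which is one plausible reason a similarity-based score could separate topics more cleanly than raw distance.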
1 | HEARST M A. TextTiling: segmenting text into multi-paragraph subtopic passages[J]. Computational Linguistics, 1997, 23(1): 33-64. |
2 | QIN B, LIU T, LI S. Survey of multi-document summarization[J]. Journal of Chinese Information Processing, 2005, 19(6): 13-20, 56. (in Chinese) 10.3969/j.issn.1003-0077.2005.06.003 |
3 | ANGHELUTA R, DE BUSSER R, MOENS M F. The use of topic segmentation for automatic summarization[C]// Proceedings of the Association for Computational Linguistics 2002 Post-Conference Workshop on Automatic Summarization. Stroudsburg, PA: Association for Computational Linguistics, 2002: 1421-1426. |
4 | HUANG X J, PENG F C, SCHUURMANS D, et al. Applying machine learning to text segmentation for information retrieval[J]. Information Retrieval, 2003, 6(3/4): 333-362. 10.1023/a:1026028229881 |
5 | SHTEKH G, KAZAKOVA P, NIKITINSKY N, et al. Exploring influence of topic segmentation on information retrieval quality[C]// Proceedings of the 2018 International Conference on Internet Science, LNCS 11193. Cham: Springer, 2018: 131-140. |
6 | MA C L, WANG T. Text sentiment analysis based on correlated topic model and multi-layer knowledge representation[J]. Journal of Zhengzhou University (Natural Science Edition), 2021, 53(4): 30-35. (in Chinese) |
7 | ZIRN C, GLAVAŠ G, NANNI F, et al. Classifying topics and detecting topic shifts in political manifestos[C]// Proceedings of the 2016 International Conference on the Advances in Computational Analysis of Political Text. Zagreb: University of Zagreb, 2016: 88-93. |
8 | MANUVINAKURIKE R, PAETZEL M, QU C, et al. Toward incremental dialogue act segmentation in fast-paced interactive dialogue systems[C]// Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Stroudsburg, PA: Association for Computational Linguistics, 2016: 252-262. 10.18653/v1/w16-3632 |
9 | ZHAO T Y, KAWAHARA T. Joint learning of dialog act segmentation and recognition in spoken dialog using neural networks[C]// Proceedings of the 18th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). [S.l.]: Asian Federation of Natural Language Processing, 2017: 704-712. 10.18653/v1/w18-5021 |
10 | VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. (2018-02-04) [2021-06-20].. |
11 | CHOI F Y Y. Advances in domain independent linear text segmentation[C]// Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2000: 26-33. |
12 | UTIYAMA M, ISAHARA H. A statistical model for domain-independent text segmentation[C]// Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2001: 499-506. 10.3115/1073012.1073076 |
13 | LI J, SUN A X, JOTY S. SegBot: a generic neural text segmentation model with pointer network[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2018: 4166-4172. 10.24963/ijcai.2018/579 |
14 | KOSHOREK O, COHEN A, MOR N, et al. Text segmentation as a supervised learning task[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 469-473. 10.18653/v1/n18-2075 |
15 | ARNOLD S, SCHNEIDER R, CUDRÉ-MAUROUX P, et al. SECTOR: a neural model for coherent topic segmentation and classification[J]. Transactions of the Association for Computational Linguistics, 2019, 7: 169-184. 10.1162/tacl_a_00261 |
16 | BARROW J, JAIN R, MORARIU V, et al. A joint model for document segmentation and segment labeling[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 313-322. 10.18653/v1/2020.acl-main.29 |
17 | LUKASIK M, DADACHEV B, PAPINENI K, et al. Text segmentation by cross segment attention[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 4707-4716. 10.18653/v1/2020.emnlp-main.380 |
18 | XING L Z, HACKINEN B, CARENINI G, et al. Improving context modeling in neural topic segmentation[C]// Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 626-636. |
19 | WU B, WEI B F, LIU J, et al. Faceted text segmentation via multitask learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(9): 3846-3857. 10.1109/tnnls.2020.3015996 |
20 | GLAVAŠ G, NANNI F, PONZETTO S P. Unsupervised text segmentation using semantic relatedness graphs[C]// Proceedings of the 5th Joint Conference on Lexical and Computational Semantics. Stroudsburg, PA: Association for Computational Linguistics, 2016: 125-130. 10.18653/v1/s16-2016 |
21 | YAO L, MAO C S, LUO Y. Graph convolutional networks for text classification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 7370-7377. 10.1609/aaai.v33i01.33017370 |
22 | KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-06-20].. 10.48550/arXiv.1609.02907 |
23 | CHEN H, BRANAVAN S R K, BARZILAY R, et al. Global models of document structure using latent permutations[C]// Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2009: 371-379. 10.3115/1620754.1620808 |
24 | BEEFERMAN D, BERGER A, LAFFERTY J. Statistical models for text segmentation[J]. Machine Learning, 1999, 34(1/2/3): 177-210. 10.1023/a:1007506220214 |
25 | GLAVAŠ G, SOMASUNDARAN S. Two-level transformer and auxiliary coherence modeling for improved text segmentation[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 7797-7804. 10.1609/aaai.v34i05.6284 |