Visually guided word segmentation and part of speech tagging

Journal of Computer Applications, 2025, Vol. 45, Issue (5): 1488-1495. DOI: 10.11772/j.issn.1001-9081.2024050627

Haiyan TIAN, Saihao HUANG, Dong ZHANG, Shoushan LI

School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Received: 2024-05-17; Revised: 2024-10-14; Accepted: 2024-10-24; Online: 2024-11-01; Published: 2025-05-10
Contact: Dong ZHANG
About the authors:
TIAN Haiyan, born in 2000, native of Huai'an, Jiangsu, M. S. candidate. Her research interests include multi-modal analysis.
HUANG Saihao, born in 1999, native of Nantong, Jiangsu, M. S. His research interests include text-to-SQL and multi-modal analysis.
ZHANG Dong, born in 1991, native of Yangzhou, Jiangsu, Ph. D., associate professor. His research interests include sentiment analysis and multi-modal analysis.
LI Shoushan, born in 1980, native of Yangzhou, Jiangsu, Ph. D., professor. His research interests include sentiment analysis and multi-modal analysis.
Supported by: National Natural Science Foundation of China (62206193)

Abstract

Chinese Word Segmentation (WS) and Part-Of-Speech (POS) tagging effectively support downstream tasks such as knowledge graph construction and sentiment analysis. Existing work typically uses only textual information for WS and POS tagging, although the Web also contains a large amount of associated image and video content. This work therefore mines clues from such visual information to aid Chinese WS and POS tagging. First, a set of detailed annotation standards was established, and a multimodal dataset, VG-Weibo, was annotated with WS and POS labels using the text and image content of Weibo posts. Then, two multimodal information fusion methods with different decoding mechanisms, VGTD (Visually Guided Two-stage Decoding model) and VGCD (Visually Guided Collapsed Decoding model), were proposed to perform the joint WS and POS tagging task. VGTD adopts a cross-attention mechanism to fuse textual and visual information and employs a two-stage decoding strategy that first predicts candidate word spans and then predicts the corresponding tags. VGCD also uses cross-attention to fuse textual and visual information, combined with a more appropriate Chinese representation and a collapsed decoding strategy. Experimental results on the VG-Weibo test set show that, on the WS and POS tagging tasks, the F1 scores of VGTD are 0.18 and 0.22 percentage points higher, respectively, than those of the pure-text Two-stage Decoding model (TD), and the F1 scores of VGCD are 0.25 and 0.55 percentage points higher, respectively, than those of the pure-text Collapsed Decoding model (CD). Both VGTD and VGCD can thus make effective use of visual information to improve WS and POS tagging performance.

Key words: Word Segmentation (WS), Part-Of-Speech (POS) tagging, multimodal data, visual information, social media
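
The cross-attention fusion mentioned in the abstract can be illustrated with a minimal sketch in which each character representation attends over a set of image-region representations. This is not the authors' architecture: the single head, the absence of learned projections, the residual addition, and all toy feature values are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_vecs, image_vecs):
    """Single-head cross-attention: text characters attend to image regions.

    Returns (fused, weights). fused[i] is text_vecs[i] plus the
    attention-weighted sum of image_vecs (a residual connection).
    No learned projections are used; this is an illustrative simplification.
    """
    dim = len(text_vecs[0])
    fused, weights = [], []
    for q in text_vecs:
        # Scaled dot-product score between one character and every region.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in image_vecs]
        w = softmax(scores)
        # Visual context vector for this character.
        context = [sum(w[j] * image_vecs[j][d] for j in range(len(image_vecs)))
                   for d in range(dim)]
        fused.append([q[d] + context[d] for d in range(dim)])
        weights.append(w)
    return fused, weights


# Hypothetical toy features: 3 characters, 2 image regions, dimension 4.
text_vecs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
image_vecs = [[2.0, 0.0, 0.0, 1.0], [0.0, 2.0, 1.0, 0.0]]
fused, weights = cross_attention(text_vecs, image_vecs)
```

In a real model the fused character representations would be passed to a decoder (two-stage or collapsed); here they are only computed to show how visual context is mixed into each character position.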

CLC Number: TP391.1

Haiyan TIAN, Saihao HUANG, Dong ZHANG, Shoushan LI. Visually guided word segmentation and part of speech tagging[J]. Journal of Computer Applications, 2025, 45(5): 1488-1495.

References

1	XU N. Chinese word segmentation as character tagging［J］. International Journal of Computational Linguistics and Chinese Language Processing， 2003， 8（1）： 29-48.
2	DUAN S， ZHAO H. Attention is all you need for Chinese word segmentation［C］// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg： ACL， 2020： 3862-3872.
3	ZHENG X， CHEN H， XU T. Deep learning for Chinese word segmentation and POS tagging［C］// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg： ACL， 2013： 647-657.
4	QIAN T， ZHANG Y， ZHANG M， et al. A transition-based model for joint segmentation， POS-tagging and normalization［C］// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg： ACL， 2015： 1837-1846.
5	CHEN X， QIU X， HUANG X. A feature-enriched neural model for joint Chinese word segmentation and part-of-speech tagging［C］// Proceedings of the 26th International Joint Conference on Artificial Intelligence. San Francisco： Morgan Kaufmann Publishers Inc.， 2017： 3960-3966.
6	ZHANG M， YU N， FU G. A simple and effective neural model for joint word segmentation and POS tagging［J］. IEEE/ACM Transactions on Audio， Speech， and Language Processing， 2018， 26（9）： 1528-1538.
7	ZHAO L， ZHANG A， LIU Y， et al. Encoding multi-granularity structural information for joint Chinese word segmentation and POS tagging［J］. Pattern Recognition Letters， 2020， 138： 163-169.
8	NIAN F， LI J， DIAO H， et al. Weibo core user mining and propagation scale predicting［J］. Chaos， Solitons and Fractals， 2022， 156： No.111869.
9	XIAO S， CHEN G， ZHANG C， et al. Complementary or substitutive？ a novel deep learning method to leverage text-image interactions for multimodal review helpfulness prediction［J］. Expert Systems with Applications， 2022， 208： No.118138.
10	ZHANG K， ZHANG B， TENG Z. Leveraging graph to improve lexicon enhanced Chinese sequence labelling［C］// Proceedings of the IEEE 13th International Symposium on Parallel Architectures， Algorithms， and Programming. Piscataway： IEEE， 2022： 1-6.
11	HAN W， CHEN H， GELBUKH A， et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis［C］// Proceedings of the 2021 International Conference on Multimodal Interaction. New York： ACM， 2021： 6-15.
12	ZHU Y H， LIU J， XU Y Q， et al. Chinese word segmentation research based on conditional random field［J］. Computer Engineering and Applications， 2016， 52（15）： 97-100.
13	SHAO Y， HARDMEIER C， TIEDEMANN J， et al. Character-based joint segmentation and POS tagging for Chinese using bidirectional RNN-CRF［C］// Proceedings of the 8th International Joint Conference on Natural Language Processing （Volume 1： Long Papers）. Stroudsburg： ACL， 2017： 173-183.
14	LI Y K， PAN Q， WANG E X. Joint Chinese word segmentation and punctuation prediction based on improved multilayer BLSTM network［J］. Journal of Computer Applications， 2018， 38（5）： 1278-1282， 1314.
15	TIAN Y， SONG Y， XIA F. Joint Chinese word segmentation and part-of-speech tagging via multi-channel attention of character n-grams［C］// Proceedings of the 28th International Conference on Computational Linguistics. ［S.l.］： International Committee on Computational Linguistics， 2020： 2073-2084.
16	KE Z， SHI L， SUN S， et al. Pre-training with meta learning for Chinese word segmentation［C］// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies. Stroudsburg： ACL， 2021： 5514-5523.
17	HE R， CAI S， MING Z， et al. Weighted self distillation for Chinese word segmentation［C］// Findings of the Association for Computational Linguistics： ACL 2022. Stroudsburg： ACL， 2022： 1757-1770.
18	LI D， ZHAO R， TAN F. CWSeg： an efficient and general approach to Chinese word segmentation［C］// Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics （Volume 5： Industry Track）. Stroudsburg： ACL， 2023： 1-10.
19	CHANG B， YUAN Y， LI B， et al. A joint model of automatic word segmentation and part-of-speech tagging for ancient classical texts based on radicals［C］// Proceedings of the 2023 Ancient Language Processing Workshop. Shoumen： INCOMA Ltd.， 2023： 122-132.
20	HUANG K， YU H， LIU J， et al. Lexicon-based graph convolutional network for Chinese word segmentation［C］// Findings of the Association for Computational Linguistics： EMNLP 2021. Stroudsburg： ACL， 2021： 2908-2917.
21	XIA F， CHEN S Q， HUA M， et al. Chinese word segmentation method in electric power domain based on improved BERT［J］. Journal of Computer Applications， 2023， 43（12）： 3711-3718.
22	FENG S， LI P. Ancient Chinese word segmentation and part-of-speech tagging using distant supervision［C］// Proceedings of the 2023 IEEE International Conference on Acoustics， Speech and Signal Processing. Piscataway： IEEE， 2023： 1-5.
23	ZHANG D， HU Z， LI S， et al. More than text： multi-modal Chinese word segmentation［C］// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing （Volume 2： Short Papers）. Stroudsburg： ACL， 2021： 550-557.
24	EMERSON T. The second international Chinese word segmentation bakeoff［C］// Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing. ［S.l.］： Asian Federation of Natural Language Processing， 2005： 123-133.
25	XUE N， XIA F， CHIOU F D， et al. The Penn Chinese TreeBank： phrase structure annotation of a large corpus［J］. Natural Language Engineering， 2005， 11（2）： 207-238.
26	JIN G， CHEN X. The fourth international Chinese language processing bakeoff： Chinese word segmentation， named entity recognition， and Chinese POS tagging［C］// Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing. ［S.l.］： Asian Federation of Natural Language Processing， 2008： 69-81.
27	ZEMAN D， POPEL M， STRAKA M， et al. CoNLL 2017 shared task： multilingual parsing from raw text to universal dependencies［C］// Proceedings of the CoNLL 2017 Shared Task： Multilingual Parsing from Raw Text to Universal Dependencies. Stroudsburg： ACL， 2017： 1-19.
28	YU S W， DUAN H M， ZHU X F， et al. The basic processing of contemporary Chinese corpus at Peking University： SPECIFICATION［J］. Journal of Chinese Information Processing， 2002， 16（5）： 49-64.
29	XIA F. The segmentation guidelines for the Penn Chinese Treebank （3.0）［EB/OL］. ［2024-12-23］..
30	LAI S W， XU L H， CHEN Y B， et al. Chinese word segment based on character representation learning［J］. Journal of Chinese Information Processing， 2013， 27（5）： 8-14.
31	LOU C， YANG S， TU K. Nested named entity recognition as latent lexicalized constituency parsing［C］// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2022： 6183-6198.
32	EISNER J， SATTA G. Efficient parsing for bilexical context-free grammars and head automaton grammars［C］// Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 1999： 457-464.
33	ZHANG Y， LI Z， ZHANG M. Efficient second-order TreeCRF for neural dependency parsing［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2020： 3295-3305.
34	FU Y， TAN C， CHEN M， et al. Nested named entity recognition with partially-observed TreeCRFs［C］// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2021： 12839-12847.
35	SUN Z， LI X， SUN X， et al. ChineseBERT： Chinese pretraining enhanced by glyph and pinyin information［C］// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing （Volume 1： Long Papers）. Stroudsburg： ACL， 2021： 2065-2075.
36	RADFORD A， KIM J W， HALLACY C， et al. Learning transferable visual models from natural language supervision［C］// Proceedings of the 38th International Conference on Machine Learning. New York： JMLR.org， 2021： 8748-8763.
37	POWERS D M W. Evaluation： from precision， recall and F-measure to ROC， informedness， markedness and correlation［J］. Journal of Machine Learning Technologies， 2011， 2（1）： 37-63.

Dataset	Modality	Words/10⁶	Characters/10⁶	Source
VG-Weibo	T+V	0.05	0.12	Sina Weibo
MSRA^［24］	T	2.48	4.23	SIGHAN 2005
PKU^［24］	T	1.21	2.00	SIGHAN 2005
CTB5^［25］	T	0.51	0.83	News, magazines
CTB6^［25］	T	0.78	1.29	News, magazines, broadcast
NCC^［26］	T	0.63	1.00	SIGHAN 2008
UD^［27］	T	0.12	0.20	CoNLL 2017

Split	Samples	Sentences
Training set	1,400	1,884
Validation set	200	264
Test set	400	543

Method	WS_Dev			WS_Test			POS_Dev			POS_Test
Method	P	R	F₁	P	R	F₁	P	R	F₁	P	R	F₁
TMIN^［23］	93.84	93.63	93.73	93.89	94.14	94.02	88.45	88.29	88.37	89.20	89.24	89.22
TD^［31］	93.39	93.48	93.44	93.71	93.86	93.79	88.50	88.19	88.34	89.28	88.99	89.13
TD+Vision	93.59	93.69	93.64	93.68	93.89	93.78	88.78	88.46	88.62	89.43	89.07	89.25
VGTD	93.87	93.73	93.80	94.01	93.93	93.97	88.80	88.43	88.62	89.44	89.26	89.35
CD^［35］	94.02	93.22	93.62	94.34	94.04	94.19	88.85	87.78	88.31	89.31	88.38	88.84
CD+Vision	94.06	92.97	93.51	94.43	93.64	94.03	87.78	86.98	87.38	88.42	87.68	88.05
VGCD	94.19	93.49	93.84	94.66	94.21	94.44^†	89.17	88.22	88.69	89.75	89.03	89.39^†
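
The relationship between the P, R and F₁ columns, and the gains quoted in the abstract, can be checked with a short sketch. It assumes F₁ is the usual harmonic mean of precision and recall; the numbers are copied from the test-set columns of the table above, and the variable names are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# Test-set (P, R, F1) values copied from the table above.
test_scores = {
    "TD":   {"WS": (93.71, 93.86, 93.79), "POS": (89.28, 88.99, 89.13)},
    "VGTD": {"WS": (94.01, 93.93, 93.97), "POS": (89.44, 89.26, 89.35)},
    "VGCD": {"WS": (94.66, 94.21, 94.44), "POS": (89.75, 89.03, 89.39)},
}
# Test-set F1 of the pure-text CD baseline, also from the table.
cd_f1 = {"WS": 94.19, "POS": 88.84}

# P and R are themselves rounded to two decimals, so the recomputed F1
# may deviate slightly from the reported value.
max_dev = max(
    abs(f1(p, r) - reported)
    for tasks in test_scores.values()
    for (p, r, reported) in tasks.values()
)

# F1 gains in percentage points of each visually guided model over its
# pure-text counterpart.
gains = {
    "VGTD vs TD": {t: round(test_scores["VGTD"][t][2] - test_scores["TD"][t][2], 2)
                   for t in ("WS", "POS")},
    "VGCD vs CD": {t: round(test_scores["VGCD"][t][2] - cd_f1[t], 2)
                   for t in ("WS", "POS")},
}
print(max_dev < 0.01, gains)
```

The recomputed F₁ values agree with the reported ones to within rounding, and the differences reproduce the 0.18, 0.22, 0.25 and 0.55 percentage-point gains stated in the abstract.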

Model	WS_Test		POS_Test
Model	F₁	R_OOV	F₁	R_OOV
TMIN^［23］	94.02	81.07	89.22	70.74
TD^［31］	93.79	81.52	89.13	71.29
TD+Vision	93.78	81.34	89.25	71.45
VGTD	93.97	80.48	89.35	72.03
CD^［35］	94.19	82.53	88.84	73.13
CD+Vision	94.03	82.30	88.05	73.66
VGCD	94.44	82.79	89.39	74.51
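
The R_OOV column reports recall on out-of-vocabulary words. A minimal sketch of how such a figure is conventionally computed for WS follows, assuming that an OOV word is a gold word absent from the training vocabulary and that it counts as recalled when its exact character span appears in the prediction. The sentence, the predicted segmentation and the vocabulary below are hypothetical, not taken from VG-Weibo.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def oov_recall(gold_words, pred_words, train_vocab):
    """Fraction of gold OOV words whose exact span is also predicted."""
    pred_spans = to_spans(pred_words)
    start, total, hit = 0, 0, 0
    for w in gold_words:
        span = (start, start + len(w))
        start += len(w)
        if w not in train_vocab:        # gold word unseen in training
            total += 1
            hit += span in pred_spans   # True counts as 1
    return hit / total if total else 0.0


# Hypothetical example: two OOV gold words, one of them recovered.
gold = ["以太坊", "行情", "分析"]
pred = ["以太坊", "行", "情", "分析"]
train_vocab = {"分析"}
r_oov = oov_recall(gold, pred, train_vocab)
```

For the POS columns the same idea would additionally require the predicted tag of the span to match the gold tag.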

Method	Output	以太坊eth行情分析	背影照好可爱	正放大图片赏颜呢
TMIN	Segmentation	以太坊/eth/行情/分析	背影照/好/可爱	正/放大/图片/赏/颜/呢
TMIN	POS tags	NR/NR/NN/NN	NN/AD/VA	AD/VV/NN/VV/NN/SP
CD	Segmentation	以太坊/eth/行情/分析	背影照/好/可爱	正/放/大/图片/赏/颜/呢
CD	POS tags	NR//NN/NN	NN/AD/VA	AD/VV/JJ/NN/VV/NN/SP
CD+Vision	Segmentation	以太坊/eth/行情/分析	背影/照/好/可爱	正/放/大/图片/赏/颜/呢
CD+Vision	POS tags	NR/NN/NN	NN/NN/AD/VA	AD/VV/JJ/NN/VV/NN/SP
VGCD	Segmentation	以太坊/eth/行情/分析	背影/照/好/可爱	正/放大/图片/赏/颜/呢
VGCD	POS tags	NR/NR/NN/NN	NN/NN/AD/VA	AD/VV/NN/VV/NN/SP
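
Differences between two systems' outputs in the table above are easier to see once each slash-separated row is converted into labeled character spans. The sketch below does this for the CD and VGCD outputs on the sentence 正放大图片赏颜呢. The helper name and the span representation are illustrative assumptions, and no claim is made here about which output matches the gold annotation.

```python
def labeled_spans(segmentation, tags):
    """Turn 'w1/w2/...' and 't1/t2/...' into (start, end, tag) triples."""
    words = segmentation.split("/")
    labels = tags.split("/")
    if len(words) != len(labels):
        raise ValueError("number of words and tags differ")
    spans, start = [], 0
    for word, tag in zip(words, labels):
        spans.append((start, start + len(word), tag))
        start += len(word)
    return spans


# Outputs copied from the CD and VGCD rows of the table above.
cd = labeled_spans("正/放/大/图片/赏/颜/呢", "AD/VV/JJ/NN/VV/NN/SP")
vgcd = labeled_spans("正/放大/图片/赏/颜/呢", "AD/VV/NN/VV/NN/SP")

shared = sorted(set(cd) & set(vgcd))      # identical span and tag
only_cd = sorted(set(cd) - set(vgcd))     # produced by CD only
only_vgcd = sorted(set(vgcd) - set(cd))   # produced by VGCD only
print(len(shared), only_cd, only_vgcd)
```

The two systems agree on five labeled spans and differ only on characters 1 to 3: CD splits 放 and 大 into a VV and a JJ, whereas VGCD keeps 放大 as a single VV.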
