Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1488-1495. DOI: 10.11772/j.issn.1001-9081.2024050627

• Artificial Intelligence •

Visually guided word segmentation and part-of-speech tagging

Haiyan TIAN, Saihao HUANG, Dong ZHANG, Shoushan LI

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2024-05-17 Revised: 2024-10-14 Accepted: 2024-10-24 Online: 2024-11-01 Published: 2025-05-10
  • Contact: Dong ZHANG
  • About author: TIAN Haiyan, born in 2000 in Huai'an, Jiangsu, M.S. candidate. Her research interests include multimodal analysis.
    HUANG Saihao, born in 1999 in Nantong, Jiangsu, M.S. His research interests include text-to-SQL and multimodal analysis.
    ZHANG Dong, born in 1991 in Yangzhou, Jiangsu, Ph.D., associate professor. His research interests include sentiment analysis and multimodal analysis.
    LI Shoushan, born in 1980 in Yangzhou, Jiangsu, Ph.D., professor. His research interests include sentiment analysis and multimodal analysis.
  • Supported by:
    National Natural Science Foundation of China (62206193)

Abstract:

Chinese Word Segmentation (WS) and Part-Of-Speech (POS) tagging can effectively assist downstream tasks such as knowledge graph construction and sentiment analysis. However, existing work typically uses only textual information for WS and POS tagging, ignoring the many related images and videos on the Web. Therefore, efforts were made to mine relevant clues from this visual information to aid Chinese WS and POS tagging. Firstly, a set of detailed annotation guidelines was established, and a multimodal dataset, VG-Weibo, was annotated with WS and POS tags based on the text and image content of Weibo posts. Then, two multimodal information fusion methods with different decoding mechanisms, VGTD (Visually Guided Two-stage Decoding model) and VGCD (Visually Guided Collapsed Decoding model), were proposed to accomplish the joint task of WS and POS tagging. In the VGTD method, a cross-attention mechanism was adopted to fuse textual and visual information, and a two-stage decoding strategy was employed to first predict possible word spans and then predict the corresponding tags; in the VGCD method, a cross-attention mechanism was also utilized to fuse textual and visual information, together with a more appropriate Chinese representation and a collapsed decoding strategy. Experimental results on the VG-Weibo test set demonstrate that, on the WS and POS tagging tasks, the F1 scores of the VGTD method are 0.18 and 0.22 percentage points higher, respectively, than those of the traditional text-only Two-stage Decoding model (TD), and the F1 scores of the VGCD method are 0.25 and 0.55 percentage points higher, respectively, than those of the traditional text-only Collapsed Decoding model (CD). These results show that both VGTD and VGCD can exploit visual information effectively to improve the performance of WS and POS tagging.
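
To make the fusion and decoding described above concrete, below is a minimal PyTorch sketch of the collapsed-decoding idea: per-character text features attend over image-region features via cross-attention, and a classifier then assigns each character one collapsed WS+POS label (e.g., B-NN, I-NN). This is an illustrative sketch under assumed shapes and an assumed label inventory (4 segmentation positions × 30 POS tags), not the authors' released implementation; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class VisuallyGuidedCollapsedTagger(nn.Module):
    """Hypothetical sketch: fuse character features with image-region
    features via cross-attention, then emit one collapsed WS+POS label
    (e.g., 'B-NN', 'I-NN') per character."""

    def __init__(self, hidden: int = 768, num_labels: int = 4 * 30):
        super().__init__()
        # Queries come from text characters; keys/values from visual regions.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8,
                                                batch_first=True)
        self.norm = nn.LayerNorm(hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, char_feats: torch.Tensor, img_feats: torch.Tensor):
        # char_feats: (batch, seq_len, hidden), e.g. from a Chinese encoder
        # img_feats:  (batch, regions, hidden), e.g. from a vision encoder
        fused, _ = self.cross_attn(char_feats, img_feats, img_feats)
        fused = self.norm(char_feats + fused)  # residual + layer norm
        return self.classifier(fused)          # per-character label logits

# Toy usage with random tensors standing in for encoder outputs.
model = VisuallyGuidedCollapsedTagger()
logits = model(torch.randn(2, 16, 768), torch.randn(2, 49, 768))
print(logits.shape)  # torch.Size([2, 16, 120])
```

A two-stage decoder in the spirit of VGTD would instead use the same fused features to first score candidate word spans and then classify each selected span with a POS tag, trading one large collapsed label space for two smaller prediction problems.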

Key words: Word Segmentation (WS), Part-Of-Speech (POS) tagging, multimodal data, visual information, social media

CLC number: