《计算机应用》唯一官方网站 ›› 2025, Vol. 45 ›› Issue (5): 1488-1495.DOI: 10.11772/j.issn.1001-9081.2024050627
• 人工智能 • 上一篇
收稿日期:
2024-05-17
修回日期:
2024-10-14
接受日期:
2024-10-24
发布日期:
2024-11-01
出版日期:
2025-05-10
通讯作者:
张栋
作者简介:
田海燕(2000—),女,江苏淮安人,硕士研究生,主要研究方向:多模态分析基金资助:
Haiyan TIAN, Saihao HUANG, Dong ZHANG(), Shoushan LI
Received:
2024-05-17
Revised:
2024-10-14
Accepted:
2024-10-24
Online:
2024-11-01
Published:
2025-05-10
Contact:
Dong ZHANG
About author:
TIAN Haiyan, born in 2000, M. S. candidate. Her research interests include multi-modal analysis.Supported by:
摘要:
中文分词(WS)和词性(POS)标注可以有效帮助其他下游任务,如知识图谱创建和情感分析。但现有工作通常仅利用纯文本信息进行WS和POS标注,忽略了网络中许多与之相关的图片和视频信息。针对这一现状,尝试从这些视觉信息中挖掘相关线索,以帮助进行中文WS和POS标注。首先,制定一系列详细的数据标注规范,并基于微博推文中的文本和图像内容,使用WS和POS标签标注了一个多模态数据集VG-Weibo;其次,提出2种具有不同解码机制的多模态信息融合方法:VGTD(Visually Guided Two-stage Decoding model)和VGCD(Visually Guided Collapsed Decoding model)完成联合WS和POS标注的任务。其中:VGTD方法采用交叉注意力机制融合文本和图像信息,并通过两阶段解码策略,先预测可能的词语跨度,再预测相应的标签;VGCD方法也采用交叉注意力机制融合文本和图像信息,并采用了更适当的中文表示以及合并解码策略。在VG-Weibo测试集上的实验结果表明,在WS和POS标注任务上,VGTD方法的F1得分比传统的纯文本方法的两阶段解码模型(TD)分别提升了0.18和0.22个百分点;VGCD方法的F1得分比传统的纯文本方法的合并解码模型(CD)分别提升了0.25和0.55个百分点。可见,VGTD和VGCD方法都能有效利用视觉信息提升WS和POS标注的性能。
中图分类号:
田海燕, 黄赛豪, 张栋, 李寿山. 视觉指导的分词和词性标注[J]. 计算机应用, 2025, 45(5): 1488-1495.
Haiyan TIAN, Saihao HUANG, Dong ZHANG, Shoushan LI. Visually guided word segmentation and part of speech tagging[J]. Journal of Computer Applications, 2025, 45(5): 1488-1495.
数据集 | 模态 | 词语数/106 | 字符数/106 | 来源 |
---|---|---|---|---|
VG-Weibo | T+V | 0.05 | 0.12 | 新浪微博 |
MSRA[ | T | 2.48 | 4.23 | SIGHAN 2005 |
PKU[ | T | 1.21 | 2.00 | SIGHAN 2005 |
CTB5[ | T | 0.51 | 0.83 | 新闻、杂志 |
CTB6[ | T | 0.78 | 1.29 | 新闻、杂志、广播 |
NCC[ | T | 0.63 | 1.00 | SIGHAN 2008 |
UD[ | T | 0.12 | 0.20 | CoNLL 2017 |
表1 WS和POS标注数据集比较
Tab. 1 Comparison of WS and POS tagging datasets
数据集 | 模态 | 词语数/106 | 字符数/106 | 来源 |
---|---|---|---|---|
VG-Weibo | T+V | 0.05 | 0.12 | 新浪微博 |
MSRA[ | T | 2.48 | 4.23 | SIGHAN 2005 |
PKU[ | T | 1.21 | 2.00 | SIGHAN 2005 |
CTB5[ | T | 0.51 | 0.83 | 新闻、杂志 |
CTB6[ | T | 0.78 | 1.29 | 新闻、杂志、广播 |
NCC[ | T | 0.63 | 1.00 | SIGHAN 2008 |
UD[ | T | 0.12 | 0.20 | CoNLL 2017 |
指标 | 图像数 | 文本长度/字符数 | 句子长度/字符数 | 局部图像特征数 (置信度为0.7) |
---|---|---|---|---|
最小值 | 1 | 1 | 1 | 0 |
最大值 | 18 | 442 | 285 | 15 |
平均值 | 3.78 | 59.05 | 43.88 | 1.05 |
表2 VG-Weibo数据集的统计信息
Tab. 2 Statistics of VG-Weibo dataset
指标 | 图像数 | 文本长度/字符数 | 句子长度/字符数 | 局部图像特征数 (置信度为0.7) |
---|---|---|---|---|
最小值 | 1 | 1 | 1 | 0 |
最大值 | 18 | 442 | 285 | 15 |
平均值 | 3.78 | 59.05 | 43.88 | 1.05 |
数据集 | 样本数 | 句子数 |
---|---|---|
训练集 | 1 400 | 1 884 |
验证集 | 200 | 264 |
测试集 | 400 | 543 |
表3 VG-Weibo数据集划分
Tab. 3 Division of VG-Weibo dataset
数据集 | 样本数 | 句子数 |
---|---|---|
训练集 | 1 400 | 1 884 |
验证集 | 200 | 264 |
测试集 | 400 | 543 |
方法 | WSDev | WSTest | POSDev | POSTest | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | P | R | F1 | P | R | F1 | |
TMIN[ | 93.84 | 93.63 | 93.73 | 93.89 | 94.14 | 94.02 | 88.45 | 88.29 | 88.37 | 89.20 | 89.24 | 89.22 |
TD[ | 93.39 | 93.48 | 93.44 | 93.71 | 93.86 | 93.79 | 88.50 | 88.19 | 88.34 | 89.28 | 88.99 | 89.13 |
TD+Vision | 93.59 | 93.69 | 93.64 | 93.68 | 93.89 | 93.78 | 88.78 | 88.46 | 88.62 | 89.43 | 89.07 | 89.25 |
VGTD | 93.87 | 93.73 | 93.80 | 94.01 | 93.93 | 93.97 | 88.80 | 88.43 | 88.62 | 89.44 | 89.26 | 89.35 |
CD[ | 94.02 | 93.22 | 93.62 | 93.34 | 94.04 | 94.19 | 88.85 | 87.78 | 88.31 | 89.31 | 88.38 | 88.84 |
CD+Vision | 94.06 | 92.97 | 93.51 | 94.43 | 93.64 | 94.03 | 87.78 | 86.98 | 87.38 | 88.42 | 87.68 | 88.05 |
VGCD | 94.19 | 93.49 | 93.84 | 94.66 | 94.21 | 94.44† | 89.17 | 88.22 | 88.69 | 89.75 | 89.03 | 89.39† |
表4 VG-Weibo数据集上WS与POS标注性能对比
Tab. 4 Performance comparison of WS and POS tagging on VG-Weibo dataset
方法 | WSDev | WSTest | POSDev | POSTest | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F1 | P | R | F1 | P | R | F1 | P | R | F1 | |
TMIN[ | 93.84 | 93.63 | 93.73 | 93.89 | 94.14 | 94.02 | 88.45 | 88.29 | 88.37 | 89.20 | 89.24 | 89.22 |
TD[ | 93.39 | 93.48 | 93.44 | 93.71 | 93.86 | 93.79 | 88.50 | 88.19 | 88.34 | 89.28 | 88.99 | 89.13 |
TD+Vision | 93.59 | 93.69 | 93.64 | 93.68 | 93.89 | 93.78 | 88.78 | 88.46 | 88.62 | 89.43 | 89.07 | 89.25 |
VGTD | 93.87 | 93.73 | 93.80 | 94.01 | 93.93 | 93.97 | 88.80 | 88.43 | 88.62 | 89.44 | 89.26 | 89.35 |
CD[ | 94.02 | 93.22 | 93.62 | 93.34 | 94.04 | 94.19 | 88.85 | 87.78 | 88.31 | 89.31 | 88.38 | 88.84 |
CD+Vision | 94.06 | 92.97 | 93.51 | 94.43 | 93.64 | 94.03 | 87.78 | 86.98 | 87.38 | 88.42 | 87.68 | 88.05 |
VGCD | 94.19 | 93.49 | 93.84 | 94.66 | 94.21 | 94.44† | 89.17 | 88.22 | 88.69 | 89.75 | 89.03 | 89.39† |
模型 | WSTest | POSTest | ||
---|---|---|---|---|
F1 | ROOV | F1 | ROOV | |
TMIN[ | 94.02 | 81.07 | 89.22 | 70.74 |
TD[ | 93.79 | 81.52 | 89.13 | 71.29 |
TD+Vision | 93.78 | 81.34 | 89.25 | 71.45 |
VGTD | 93.97 | 80.48 | 89.35 | 72.03 |
CD[ | 94.19 | 82.53 | 88.84 | 73.13 |
CD+Vision | 94.03 | 82.30 | 88.05 | 73.66 |
VGCD | 94.44 | 82.79 | 89.39 | 74.51 |
表5 VG-Weibo测试集上WS和POS标注性能对比
Tab. 5 Performance comparison of WS and POS tagging on VG-Weibo test set
模型 | WSTest | POSTest | ||
---|---|---|---|---|
F1 | ROOV | F1 | ROOV | |
TMIN[ | 94.02 | 81.07 | 89.22 | 70.74 |
TD[ | 93.79 | 81.52 | 89.13 | 71.29 |
TD+Vision | 93.78 | 81.34 | 89.25 | 71.45 |
VGTD | 93.97 | 80.48 | 89.35 | 72.03 |
CD[ | 94.19 | 82.53 | 88.84 | 73.13 |
CD+Vision | 94.03 | 82.30 | 88.05 | 73.66 |
VGCD | 94.44 | 82.79 | 89.39 | 74.51 |
方法 | 以太坊eth行情分析 | 背影照好可爱 | 正放大图片赏颜呢 |
---|---|---|---|
![]() | ![]() | ![]() | |
TMIN | 背影照/好/可爱 | ||
NN/AD/VA | |||
CD | 以太坊/eth/行情/分析 | 背影照/好/可爱 | 正/放/大/图片/赏/颜/呢 |
NR//NN/NN | NN/AD/VA | AD/VV/JJ/NN/VV/NN/SP | |
CD+Vision | 以太坊/eth/行情/分析 | 正/放/大/图片/赏/颜/呢 | |
NR/NN/NN | AD/VV/JJ/NN/VV/NN/SP | ||
VGCD | |||
表6 不同方法预测结果的案例
Tab. 6 Cases of results predicted by different methods
方法 | 以太坊eth行情分析 | 背影照好可爱 | 正放大图片赏颜呢 |
---|---|---|---|
![]() | ![]() | ![]() | |
TMIN | 背影照/好/可爱 | ||
NN/AD/VA | |||
CD | 以太坊/eth/行情/分析 | 背影照/好/可爱 | 正/放/大/图片/赏/颜/呢 |
NR//NN/NN | NN/AD/VA | AD/VV/JJ/NN/VV/NN/SP | |
CD+Vision | 以太坊/eth/行情/分析 | 正/放/大/图片/赏/颜/呢 | |
NR/NN/NN | AD/VV/JJ/NN/VV/NN/SP | ||
VGCD | |||
1 | XU N. Chinese word segmentation as character tagging[J]. International Journal of Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48. |
2 | DUAN S, ZHAO H. Attention is all you need for Chinese word segmentation[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 3862-3872. |
3 | ZHENG X, CHEN H, XU T. Deep learning for Chinese word segmentation and POS tagging[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2013: 647-657. |
4 | QIAN T, ZHANG Y, ZHANG M, et al. A transition-based model for joint segmentation, POS-tagging and normalization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1837-1846. |
5 | CHEN X, QIU X, HUANG X. A feature-enriched neural model for joint Chinese word segmentation and part-of-speech tagging[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2017: 3960-3966. |
6 | ZHANG M, YU N, FU G. A simple and effective neural model for joint word segmentation and POS tagging[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(9): 1528-1538. |
7 | ZHAO L, ZHANG A, LIU Y, et al. Encoding multi-granularity structural information for joint Chinese word segmentation and POS tagging[J]. Pattern Recognition Letters, 2020, 138: 163-169. |
8 | NIAN F, LI J, DIAO H, et al. Weibo core user mining and propagation scale predicting[J]. Chaos, Solitons and Fractals, 2022, 156: No.111869. |
9 | XIAO S, CHEN G, ZHANG C, et al. Complementary or substitutive? a novel deep learning method to leverage text-image interactions for multimodal review helpfulness prediction[J]. Expert Systems with Applications, 2022, 208: No.118138. |
10 | ZHANG K, ZHANG B, TENG Z. Leveraging graph to improve lexicon enhanced Chinese sequence labelling[C]// Proceedings of the IEEE 13th International Symposium on Parallel Architectures, Algorithms, and Programming. Piscataway: IEEE, 2022: 1-6. |
11 | HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]// Proceedings of the 2021 International Conference on Multimodal Interaction. New York: ACM, 2021: 6-15. |
12 | 朱艳辉,刘璟,徐叶强,等. 基于条件随机场的中文领域分词研究[J]. 计算机工程与应用, 2016, 52(15): 97-100. |
ZHU Y H, LIU J, XU Y Q, et al. Chinese word segmentation research based on conditional random field[J]. Computer Engineering and Applications, 2016, 52(5): 97-100. | |
13 | SHAO Y, HARDMEIER C, TIEDEMANN J, et al. Character-based joint segmentation and POS tagging for Chinese using bidirectional RNN-CRF[C]// Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: ACL, 2017: 173-183. |
14 | 李雅昆,潘晴, WANG E X.基于改进的多层BLSTM的中文分词和标点预测[J].计算机应用,2018,38(5):1278-1282, 1314. |
LI Y K, PAN Q, WANG E X. Joint Chinese word segmentation and punctuation prediction based on improved multilayer BLSTM network[J]. Journal of Computer Applications, 2018, 38(5): 1278-1282, 1314. | |
15 | TIAN Y, SONG Y, XIA F. Joint Chinese word segmentation and part-of-speech tagging via multi-channel attention of character n-grams[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 2073-2084. |
16 | KE Z, SHI L, SUN S, et al. Pre-training with meta learning for Chinese word segmentation[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 5514-5523. |
17 | HE R, CAI S, MING Z, et al. Weighted self distillation for Chinese word segmentation[C]// Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg: ACL, 2022: 1757-1770. |
18 | LI D, ZHAO R, TAN F. CWSeg: an efficient and general approach to Chinese word segmentation[C]// Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). Stroudsburg: ACL, 2023: 1-10. |
19 | CHANG B, YUAN Y, LI B, et al. A joint model of automatic word segmentation and part-of-speech tagging for ancient classical texts based on radicals[C]// Proceedings of 2003 Ancient Language Processing Workshop. Shoumen: INCOMA Ltd., 2023: 122-132. |
20 | HUANG K, YU H, LIU J, et al. Lexicon-based graph convolutional network for Chinese word segmentation[C]// Findings of the Association for Computational Linguistics: EMNLP 2021. Stroudsburg: ACL, 2021: 2908-2917. |
21 | 夏飞,陈帅琦,华珉,等. 基于改进BERT的电力领域中文分词方法[J]. 计算机应用, 2023, 43(12): 3711-3718. |
XIA F, CHEN S Q, HUA M, et al. Chinese word segmentation method in electric power domain based on improved BERT[J]. Journal of Computer Applications, 2023, 43(12): 3711-3718. | |
22 | FENG S, LI P. Ancient Chinese word segmentation and part-of-speech tagging using distant supervision[C]// Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-5. |
23 | ZHANG D, HU Z, LI S, et al. More than text: multi-modal Chinese word segmentation[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: ACL, 2021: 550-557. |
24 | EMERSON T. The second international Chinese word segmentation bakeoff[C]// Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing. [S.l.]: Asian Federation of Natural Language Processing, 2005: 123-133. |
25 | XUE N, XIA F, CHIOU F D, et al. The Penn Chinese TreeBank: phrase structure annotation of a large corpus[J]. Natural Language Engineering, 2005, 11(2): 207-238. |
26 | JIN G, CHEN X. The fourth international Chinese language processing bakeoff: Chinese word segmentation, named entity recognition, and Chinese POS tagging[C]// Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing. [S.l.]: Asian Federation of Natural Language Processing, 2008: 69-81. |
27 | ZEMAN D, POPEL M, STRAKA M, et al. CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies[C]// Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Stroudsburg: ACL, 2017: 1-19. |
28 | 俞士汶,段慧明,朱学锋,等.北京大学现代汉语语料库基本加工规范[J].中文信息学报,2002,16(5):49-64. |
YU S W, DUAN H M, ZHU X F, et al. The basic processing of contemporary Chinese corpus at Peking University: SPECIFICATION[J]. Journal of Chinese Information Processing, 2002, 16(5): 49-64. | |
29 | XIA F. The segmentation guidelines for the Penn Chinese Treebank (3.0)[EB/OL]. [2024-12-23].. |
30 | 来斯惟,徐立恒,陈玉博,等. 基于表示学习的中文分词算法探索[J]. 中文信息学报, 2013, 27(5): 8-14. |
LAI S W, XU L H, CHEN Y B, et al. Chinese word segment based on character representation learning[J]. Journal of Chinese Information Processing, 2013, 27(5): 8-14. | |
31 | LOU C, YANG S, TU K. Nested named entity recognition as latent lexicalized constituency parsing[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 6183-6198. |
32 | EISNER J, SATTA G. Efficient parsing for bilexical context-free grammars and head automaton grammars[C]// Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 1999: 457-464. |
33 | ZHANG Y, LI Z, ZHANG M. Efficient second-order TreeCRF for neural dependency parsing[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3295-3305. |
34 | FU Y, TAN C, CHEN M, et al. Nested named entity recognition with partially-observed TreeCRFs[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 12839-12847. |
35 | SUN Z, LI X, SUN X, et al. ChineseBERT: Chinese pretraining enhanced by glyph and pinyin information[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: ACL, 2021: 2065-2075. |
36 | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763. |
37 | POWERS D M W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation[J]. Journal of Machine Learning Technologies, 2011, 2(1): 37-63. |
[1] | 胡文彬, 蔡天翔, 韩天乐, 仲兆满, 马常霞. 融合对比学习与情感分析的多模态反讽检测模型[J]. 《计算机应用》唯一官方网站, 2025, 45(5): 1432-1438. |
[2] | 杨杰, 尼玛扎西, 仁青东主, 祁晋东, 才让东知. 基于预训练模型标记器重构的藏文分词系统[J]. 《计算机应用》唯一官方网站, 2025, 45(4): 1199-1204. |
[3] | 孙焕良, 王思懿, 刘俊岭, 许景科. 社交媒体数据中水灾事件求助信息提取模型[J]. 《计算机应用》唯一官方网站, 2024, 44(8): 2437-2445. |
[4] | 鲍彩倩, 徐建民, 张国防. 基于用户动态交互行为扩展的信念网络推荐模型[J]. 《计算机应用》唯一官方网站, 2023, 43(4): 1115-1121. |
[5] | 孙晓飞, 朱静远, 陈斌, 游恒志. 融合多模态数据的药物合成反应的虚拟筛选[J]. 《计算机应用》唯一官方网站, 2023, 43(2): 622-629. |
[6] | 夏飞, 陈帅琦, 华珉, 蒋碧鸿. 基于改进BERT的电力领域中文分词方法[J]. 《计算机应用》唯一官方网站, 2023, 43(12): 3711-3718. |
[7] | 肖锐, 刘明义, 涂志莹, 王忠杰. 基于社交媒体文本挖掘的个人事件检测方法[J]. 《计算机应用》唯一官方网站, 2022, 42(11): 3513-3519. |
[8] | 孟祥瑞, 杨文忠, 王婷. 基于图文融合的情感分析研究综述[J]. 《计算机应用》唯一官方网站, 2021, 41(2): 307-317. |
[9] | 郭可心, 张宇翔. 基于多层次空间注意力的图文评论情感分析方法[J]. 计算机应用, 2021, 41(10): 2835-2841. |
[10] | 理姗姗, 杨文忠, 王婷, 王丽花. 基于网络社交媒体的子话题检测技术综述[J]. 计算机应用, 2020, 40(6): 1565-1573. |
[11] | 蔡国永, 贺歆灏, 储阳阳. 图像整体与局部区域嵌入的视觉情感分析[J]. 计算机应用, 2019, 39(8): 2181-2185. |
[12] | 李雅昆, 潘晴, Everett X. WANG. 基于改进的多层BLSTM的中文分词和标点预测[J]. 计算机应用, 2018, 38(5): 1278-1282. |
[13] | 蔡国永, 夏彬彬. 基于卷积神经网络的图文融合媒体情感预测[J]. 计算机应用, 2016, 36(2): 428-431. |
[14] | 刘春丽, 李晓戈, 刘睿, 范贤, 杜丽萍. 基于表示学习的中文分词[J]. 计算机应用, 2016, 36(10): 2794-2798. |
[15] | 卢伟胜 郭躬德 陈黎飞. 基于词性标注序列特征提取的微博情感分类[J]. 计算机应用, 2014, 34(10): 2869-2873. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||