Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (4): 1139-1147.DOI: 10.11772/j.issn.1001-9081.2024040536

• Artificial intelligence •

Unsupervised text style transfer based on semantic perception of proximity

Junxiu AN, Linwang YANG, Yuan LIU

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
  • Received: 2024-04-30 Revised: 2024-07-29 Accepted: 2024-08-01 Online: 2025-04-08 Published: 2025-04-10
  • Contact: Linwang YANG
  • About the authors: AN Junxiu, born in 1970, M. S., professor. Her research interests include data mining and intelligent computing.
    YANG Linwang, born in 2000, M. S. candidate. His research interests include natural language processing and data mining.
    LIU Yuan, born in 1999, M. S. candidate. His research interests include data mining and view clustering.
  • Supported by:
    National Social Science Foundation of China (22BXW048); Chengdu Science and Technology Key Research and Development Support Program (2022-YF05-00454-SN)

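Unsupervised text style transfer based on proximity-aware semantic perception

AN Junxiu, YANG Linwang, LIU Yuan

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225
  • Corresponding author: YANG Linwang
  • About the authors: AN Junxiu (1970-), female, born in Linfen, Shanxi, professor, M. S., CCF member. Her main research interests are data mining and intelligent computing.
    YANG Linwang (2000-), male, born in Cangzhou, Hebei, M. S. candidate. His main research interests are natural language processing and data mining.
    LIU Yuan (1999-), male, born in Urumqi, Xinjiang, M. S. candidate. His main research interests are data mining and view clustering.
  • Funding:
    National Social Science Foundation of China (22BXW048); Chengdu Science and Technology Key Research and Development Support Program (2022-YF05-00454-SN)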

Abstract:

To address the problem that discrete word perturbation and embedding perturbation methods do not fully consider the distance boundaries between word vectors in the latent space, a Semantic Proximity-aware Adversarial Auto-Encoders (SPAAE) method was proposed. Firstly, an adversarial auto-encoder was used as the underlying model. Secondly, the standard deviation of the probability distribution of the noise vectors was derived from the proximity distance of the word vectors. Finally, by randomly sampling from this probability distribution, the perturbation parameters were adjusted dynamically so that each word vector's own semantics were blurred as much as possible without affecting the semantics of other word vectors. Experimental results show that, compared with the DAAE (Denoising Adversarial Auto-Encoders) and EPAAE (Embedding Perturbed Adversarial Auto-Encoders) methods, the proposed method improves natural fluency by 14.88% and 15.65%, respectively, on the Yelp dataset; improves Text Style Transfer (TST) accuracy by 11.68% and 6.45%, respectively, on the Scitail dataset; and improves BLEU (BiLingual Evaluation Understudy) by 28.16% and 26.17%, respectively, on the Tenses dataset. The SPAAE method thus provides a theoretically more accurate way of perturbing word vectors and shows significant advantages in different style transfer tasks on 7 public datasets. In particular, the proposed method can be applied to style transfer of emotional text in the guidance of online public opinion.

Key words: Text Style Transfer (TST), semantic perception, unsupervised learning, adversarial learning, embedding perturbation, distance boundary
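
The perturbation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula linking proximity distance to the noise standard deviation, so the rule used here (standard deviation proportional to each word vector's nearest-neighbor distance, divided by the square root of the embedding dimension) is an assumption, as are the function names and the `scale` parameter. It is not the authors' implementation.

```python
import numpy as np


def nearest_neighbor_distances(emb):
    """Euclidean distance from each row of `emb` to its closest other row."""
    sq = np.sum(emb ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # ignore each vector's distance to itself
    return np.sqrt(d2.min(axis=1))


def proximity_aware_noise(emb, token_ids, rng, scale=0.5):
    """Perturb the embeddings of `token_ids` with per-word Gaussian noise.

    ASSUMPTION (not from the paper): sigma_i = scale * d_i / sqrt(dim),
    where d_i is the nearest-neighbor distance of word i. With this choice
    the expected noise norm is roughly scale * d_i, so a word in a crowded
    region is perturbed less than an isolated one.
    """
    nn_dist = nearest_neighbor_distances(emb)
    dim = emb.shape[1]
    sigma = scale * nn_dist / np.sqrt(dim)
    vecs = emb[token_ids]
    noise = rng.normal(size=vecs.shape) * sigma[token_ids, None]
    return vecs + noise, sigma[token_ids]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(size=(1000, 64))   # toy embedding table
    sentence = np.array([3, 17, 256])     # toy token ids
    noisy, sigma = proximity_aware_noise(table, sentence, rng)
    print(noisy.shape, sigma)
```

The sketch computes all pairwise distances at once, which is only practical for small vocabularies; a real system would likely use an approximate nearest-neighbor search instead.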

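Abstract (Chinese version):

To address the problem that discrete word perturbation and embedding perturbation methods do not fully consider the distance boundaries between word vectors in the latent space, a Semantic Proximity-aware Adversarial Auto-Encoders (SPAAE) method is proposed. First, an adversarial auto-encoder is adopted as the underlying model. Second, the standard deviation of the probability distribution of the noise vectors is obtained from the proximity distance of the word vectors. Finally, the probability distribution is randomly sampled to adjust the perturbation parameters dynamically, so that each word vector's own semantics are blurred as much as possible without affecting the semantics of other word vectors. Experimental results show that, compared with the DAAE (Denoising Adversarial Auto-Encoders) and EPAAE (Embedding Perturbed Adversarial Auto-Encoders) methods, the proposed method improves natural fluency by 14.88% and 15.65%, respectively, on the Yelp dataset; improves Text Style Transfer (TST) accuracy by 11.68% and 6.45%, respectively, on the Scitail dataset; and improves the BLEU (BiLingual Evaluation Understudy) score by 28.16% and 26.17%, respectively, on the Tenses dataset. The SPAAE method therefore not only provides a theoretically more precise way of perturbing word vectors, but also shows significant advantages in different style transfer tasks on 7 public datasets. In particular, the proposed method can be used for style transfer of emotional text in the guidance of online public opinion.

Key words: text style transfer, semantic perception, unsupervised learning, adversarial learning, embedding perturbation, distance boundary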

CLC Number: