Journal of Computer Applications, 2026, Vol. 46, Issue (5): 1433-1440. DOI: 10.11772/j.issn.1001-9081.2025050657

• Artificial intelligence •

Adaptive multi-feature fusion detection method for AI-generated text

Jiali ZHENG1,2, Gang ZHOU1,2, Jing CHEN1,2, Shunhang LI1,2

  1. School of Data and Target Engineering, Information Engineering University, Zhengzhou Henan 450001, China
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University), Zhengzhou Henan 450001, China
  • Received: 2025-06-16 Revised: 2025-07-16 Accepted: 2025-07-23 Online: 2025-08-12 Published: 2026-05-10
  • Contact: Gang ZHOU
  • About author: ZHENG Jiali, born in 2002, M. S. candidate. Her research interests include data mining, natural language processing, and knowledge graphs.
    CHEN Jing, born in 1990, Ph. D., lecturer. Her research interests include big data analysis, natural language processing, and data mining.
    LI Shunhang, born in 2000, Ph. D. candidate. His research interests include causal computation, Chinese information processing, and data mining.

基于多特征自适应融合的智能生成文本检测方法

郑嘉丽1,2, 周刚1,2, 陈静1,2, 李顺航1,2

  1. 信息工程大学 数据与目标工程学院,郑州 450001
  2. 数字工程与先进计算国家重点实验室(信息工程大学),郑州 450001
  • 通讯作者: 周刚
  • 作者简介:郑嘉丽(2002—),女,河南郑州人,硕士研究生,主要研究方向:数据挖掘、自然语言处理、知识图谱
    陈静(1990—),女,河南焦作人,讲师,博士,主要研究方向:大数据分析、自然语言处理、数据挖掘
    李顺航(2000—),男,河南信阳人,博士研究生,主要研究方向:因果计算、中文信息处理、数据挖掘。

Abstract:

Rapid advances in Large Language Models (LLMs) have made AI-generated text highly realistic and have degraded the performance of traditional detection methods. To address this problem, an adaptive multi-feature fusion detection method for AI-generated text was proposed. First, a language-style feature set covering text statistical features, language structural features, and language uncertainty features was constructed to capture the differences between real and AI-generated texts. Next, deep semantic features of the text were extracted using an independent encoding technique. On this basis, a dual-path mapping strategy for adaptive feature fusion was designed: the language-style features and the deep semantic features were first combined in a preliminary fusion, and a secondary fusion was then performed with a deep learning method to strengthen the adaptive fusion of features. Experimental results show that the proposed method achieves detection accuracies of 98.1% on the Chinese SocialAI-Detect dataset and 98.5% on the English TuringBench dataset. Compared with J-Guard (Journalism Guided adversarially robust detection of AI-generated news), the best-performing baseline, these are improvements of 2.3 and 2.1 percentage points, respectively, which verifies the effectiveness of the proposed method.
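
The abstract does not specify the individual features or the fusion network, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes three toy language-style features (type-token ratio as a statistical feature, mean sentence length as a structural feature, and unigram entropy as a stand-in for an uncertainty feature) and a generic gated fusion in which each feature type is mapped into a shared space, the two mapped vectors are concatenated, and a learned gate mixes them. The names `style_features`, `gated_fusion`, and the keys of `params` are hypothetical.

```python
import math
import re
from collections import Counter


def style_features(text):
    """Toy language-style features (assumed, not taken from the paper)."""
    # \w+ splits on whitespace and punctuation; unsegmented Chinese text
    # would need a proper word segmenter instead.
    tokens = re.findall(r"\w+", text.lower())
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    counts = Counter(tokens)
    n = len(tokens)
    # Statistical feature: type-token ratio (vocabulary diversity).
    ttr = len(counts) / max(n, 1)
    # Structural feature: mean sentence length in tokens.
    mean_len = n / max(len(sentences), 1)
    # Uncertainty proxy: Shannon entropy (bits) of the unigram distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [ttr, mean_len, entropy]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(style_vec, semantic_vec, params):
    """Generic two-stage fusion (an assumption about what 'dual-path' means)."""
    # Two mapping paths: project each feature type into a shared space.
    s = linear(style_vec, params["W_style"], params["b_style"])
    h = linear(semantic_vec, params["W_sem"], params["b_sem"])
    # Preliminary fusion: concatenate the two mapped vectors.
    concat = s + h
    # Secondary fusion: a per-dimension gate decides how much of each path to keep.
    gate = [sigmoid(z)
            for z in linear(concat, params["W_gate"], params["b_gate"])]
    return [g * si + (1.0 - g) * hi for g, si, hi in zip(gate, s, h)]
```

In practice the semantic vector would come from a pretrained text encoder and the parameters would be trained jointly with a classifier; here they are plain lists so that the data flow is easy to follow.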

Key words: AI-generated text, feature fusion, generated text detection, text classification, Large Language Model (LLM)

摘要:

针对大语言模型(LLM)快速发展导致智能生成文本信息高度拟真、传统检测方法性能下降的问题,提出一种基于多特征自适应融合的智能生成文本检测方法。该方法首先构建涵盖文本统计特征、语言结构性特征及语言不确定性特征的语言风格特征集,捕捉真实文本与生成文本的差异;再利用独立编码技术提取文本的深层语义特征。在此基础上,设计一种双路映射特征自适应融合策略,先将语言风格特征与深层文本语义特征初步融合,再基于深度学习方法进行二次融合,增强特征自适应融合能力。实验结果表明:所提方法在中文SocialAI-Detect数据集与英文TuringBench数据集上的检测准确率分别达到98.1%和98.5%,与基线方法中性能表现最好的J-Guard(Journalism Guided adversarially robust detection of AI-generated news)相比,分别提升了2.3与2.1个百分点,验证了所提方法的有效性。

关键词: 智能生成文本, 特征融合, 生成文本检测, 文本分类, 大语言模型

CLC Number: