Adaptive multi-feature fusion detection method for AI-generated text

doi:10.11772/j.issn.1001-9081.2025050657

Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1433-1440.DOI: 10.11772/j.issn.1001-9081.2025050657

• Artificial intelligence • Previous Articles

Adaptive multi-feature fusion detection method for AI-generated text

Jiali ZHENG¹^,², Gang ZHOU¹^,²(), Jing CHEN¹^,², Shunhang LI¹^,²

^1.School of Data and Target Engineering，Information Engineering University，Zhengzhou Henan 450001，China
^2.State Key Laboratory of Mathematical Engineering and Advanced Computing （Information Engineering University），Zhengzhou Henan 450001，China

Received:2025-06-16 Revised:2025-07-16 Accepted:2025-07-23 Online:2025-08-12 Published:2026-05-10
Contact: Gang ZHOU
About author:ZHENG Jiali， born in 2002， M. S. candidate. Her research interests include data mining， natural language processing， knowledge graph.
CHEN Jing， born in 1990， Ph. D.， lecturer. Her research interests include big data analysis， natural language processing， data mining.
LI Shunhang， born in 2000， Ph. D. candidate. His research interests include causal computation， Chinese information processing， data mining.

基于多特征自适应融合的智能生成文本检测方法

郑嘉丽¹^,², 周刚¹^,²(), 陈静¹^,², 李顺航¹^,²

^1.信息工程大学数据与目标工程学院，郑州 450001
^2.数字工程与先进计算国家重点实验室（信息工程大学），郑州 450001

通讯作者: 周刚
作者简介:郑嘉丽（2002—），女，河南郑州人，硕士研究生，主要研究方向：数据挖掘、自然语言处理、知识图谱
陈静（1990—），女，河南焦作人，讲师，博士，主要研究方向：大数据分析、自然语言处理、数据挖掘
李顺航（2000—），男，河南信阳人，博士研究生，主要研究方向：因果计算、中文信息处理、数据挖掘。

Abstract

Abstract:

To address the problems posed by highly realistic AI-generated text， driven by the rapid development of Large Language Models （LLMs）， and the performance degradation of traditional detection methods， an adaptive multi-feature fusion detection method for AI-generated text was proposed. Firstly， a language style feature set covering text statistical features， language structural features， and language uncertainty features was constructed to capture differences between real and AI-generated texts； then， deep semantic features of texts were extracted using independent encoding technology. Based on these， a dual-path mapping feature-adaptive fusion strategy was designed： language-style features and deep semantic features were first fused at a primary level， and secondary fusion was then performed using deep learning to enhance the capability of adaptive feature fusion. Experimental results demonstrate that the proposed method achieves detection accuracies of 98.1% on the Chinese SocialAI-Detect dataset and 98.5% on the English TuringBench dataset； compared with the best-performing baseline， J-Guard （Journalism Guided adversarially robust detection of AI-generated news）， the improvements are 2.3 and 2.1 percentage points， respectively， verifying the effectiveness of the proposed method.

Key words: AI-generated text, feature fusion, generated text detection, text classification, Large Language Model (LLM)

摘要：

针对大语言模型（LLM）快速发展导致智能生成文本信息高度拟真、传统检测方法性能下降的问题，提出一种基于多特征自适应融合的智能生成文本检测方法。该方法首先构建涵盖文本统计特征、语言结构性特征及语言不确定性特征的语言风格特征集，捕捉真实文本与生成文本的差异；再利用独立编码技术提取文本的深层语义特征。在此基础上，设计一种双路映射特征自适应融合策略，先将语言风格特征与深层文本语义特征初步融合，再基于深度学习方法进行二次融合，增强特征自适应融合能力。实验结果表明：所提方法在中文SocialAI-Detect数据集与英文TuringBench数据集上的检测准确率分别达到98.1%和98.5%，与基线方法中性能表现最好的J-Guard（Journalism Guided adversarially robust detection of AI-generated news）相比，分别提升了2.3与2.1个百分点，验证了所提方法的有效性。

关键词: 智能生成文本, 特征融合, 生成文本检测, 文本分类, 大语言模型

CLC Number:

TP391.1

Jiali ZHENG, Gang ZHOU, Jing CHEN, Shunhang LI. Adaptive multi-feature fusion detection method for AI-generated text[J]. Journal of Computer Applications, 2026, 46(5): 1433-1440.

郑嘉丽, 周刚, 陈静, 李顺航. 基于多特征自适应融合的智能生成文本检测方法[J]. 《计算机应用》唯一官方网站, 2026, 46(5): 1433-1440.

Figures/Tables 13

References 32

[1]	罗文，王厚峰.大语言模型评测综述［J］.中文信息学报，2024，38（1）：1-23.
	LUO W， WANG H F. Evaluating large language models： a survey of research progress［J］. Journal of Chinese Information Processing， 2024， 38（1）： 1-23.
[2]	陈慧敏，刘知远，孙茂松.大语言模型时代的社会机遇与挑战［J］.计算机研究与发展，2024，61（1）：1094-1103.
	CHEN H M， LIU Z Y， SUN M S. The social opportunities and challenges in the era of large language models［J］. Journal of Computer Research and Development， 2024， 61（1）： 1094-1103.
[3]	漆晨航.生成式人工智能的虚假信息风险特征及其治理路径［J］.情报理论与实践，2024，47（3）：112-120.
	QI C H. Research on the risks of disinformation from generative artificial intelligence and its governance paths［J］. Information Studies： Theory and Application， 2024， 47（3）： 112-120.
[4]	CINGILLIOGLU I. Detecting AI-generated essays： the ChatGPT challenge［J］. International Journal of Information and Learning Technology， 2023， 40（3）： 259-268.
[5]	蔺琛皓，沈超，邓静怡，等.虚假数字人脸内容生成与检测技术［J］.计算机学报，2023，46（3）：469-498.
	LIN C H， SHEN C， DENG J Y， et al. Digitally forged face content creation and detection［J］. Chinese Journal of Computers， 2023， 46（3）： 469-498.
[6]	CHEN W， LIU B， GUAN W. ERNIE and multi-feature fusion for news topic classification［J］. Artificial Intelligence and Applications， 2024， 2（2）： 135-140.
[7]	LIU A， PAN L， LU Y， et al. A survey of text watermarking in the era of large language models［J］. ACM Computing Surveys， 2025， 57（2）： No.47.
[8]	WEBER-WULFF D， ANOHINA-NAUMECA A， BJELOBABA S， et al. Testing of detection tools for AI-generated text［J］. International Journal for Educational Integrity， 2023， 19： No.26.
[9]	WANG Y， MANSUROV J， IVANOV P， et al. M4GT-Bench： evaluation benchmark for black-box machine-generated text detection［C］// Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2024： 3964-3992.
[10]	QAZI Z， SHIAO W， PAPALEXAKIS E E. GPT-generated text detection： benchmark dataset and tensor-based detection method［C］// Companion Proceedings of the ACM Web Conference 2024. New York： ACM， 2024： 842-846.
[11]	KUMARAGE T， BHATTACHARJEE A， PADEJSKI D， et al. J‑Guard： journalism guided adversarially robust detection of AI-generated news［C］// Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2023： 484-497.
[12]	YANG Z， FENG Z， HUO R， et al. The Imitation Game revisited： a comprehensive survey on recent advances in AI-generated text detection［J］. Expert Systems with Applications， 2025， 272： No.126694.
[13]	TANG R， CHUANG Y N， HU X. The science of detecting LLM-generated text［J］. Communications of the ACM， 2024， 67（4）： 50-59.
[14]	SU J， ZHUO T， WANG D， et al. DetectLLM： leveraging log rank information for zero-shot detection of machine-generated text［C］// Findings of the Association for Computational Linguistics： EMNLP 2023. Stroudsburg： ACL， 2023： 12395-12412.
[15]	YANG X， CHENG W， WU Y， et al. DNA-GPT： divergent N‑gram analysis for training-free detection of GPT-generated text［EB/OL］. ［2024-12-27］. .
[16]	项慧，薛鋆豪，郝玲昕. 基于语言特征集成学习的大语言模型生成文本检测［J］. 信息网络安全， 2024， 24（7）： 1098-1109.
	XIANG H， XUE Y H， HAO L X. Large language model-generated text detection based on linguistic feature ensemble learning［J］. Netinfo Security， 2024， 24（7）： 1098-1109.
[17]	MITCHELL E， LEE Y， KHAZATSKY A， et al. DetectGPT： zero-shot machine-generated text detection using probability curvature［C］// Proceedings of the 40th International Conference on Machine Learning. New York： JMLR.org， 2023： 24950-24962.
[18]	COTTON D R E， COTTON P A， SHIPWAY J R. Chatting and cheating： ensuring academic integrity in the era of ChatGPT［J］. Innovations in Education and Teaching International， 2024， 61（2）： 228-239.
[19]	KIRCHENBAUER J， GEIPING J， WEN Y， et al. A watermark for large language models［C］// Proceedings of the 40th International Conference on Machine Learning. New York： JMLR.org， 2023： 17061-17084.
[20]	LIU Y， BU Y. Adaptive text watermark for large language models［C］// Proceedings of the 41st International Conference on Machine Learning. New York： JMLR.org， 2024： 30718-30737.
[21]	LI L， WANG P， REN K， et al. Origin tracing and detecting of LLMs［EB/OL］. ［2025-02-09］. .
[22]	Workshop BigScience. BLOOM： a 176B-parameter open-access multilingual language model［EB/OL］. ［2025-03-12］..
[23]	CROTHERS E， JAPKOWICZ N， VIKTOR H， et al. Adversarial robustness of neural-statistical features in detection of generative Transformers［C］// Proceedings of the 2022 International Joint Conference on Neural Networks. Piscataway： IEEE， 2022： 1-8.
[24]	郑嘉丽.面向社交媒体的恶意账号检测技术研究［D］.郑州：信息工程大学，2024：37-42.
	ZHENG J L. Research on malicious account detection technology for social media［D］. Zhengzhou： Information Engineering University， 2024： 37-42.
[25]	范志武，姚金良.基于深度金字塔卷积神经网络的ChatGPT生成文本检测方法［J］.数据分析与知识发现，2024，8（7）：14-22.
	FAN Z W， YAO J L. Detecting ChatGPT generated texts based on deep pyramid convolutional neural network［J］. Data Analysis and Knowledge Discovery， 2024， 8（7）： 14-22.
[26]	刘冬，陈一民.基于Shapley加性解释的ChatGPT生成文本检测模型研究［J］.计算机应用与软件，2024，41（10）：212-220.
	LIU D， CHEN Y M. ChatGPT generated text detection model based on Shapley additive explanations［J］. Computer Applications and Software， 2024， 41（10）： 212-220.
[27]	帅奇，王海瑞，朱贵富.基于双向对比训练的中文故事结尾生成模型［J］.计算机应用，2024，44（9）：2683-2688.
	SHUAI Q， WANG H R， ZHU G F. Chinese story ending generation model based on bidirectional contrastive training［J］. Journal of Computer Applications， 2024， 44（9）： 2683-2688.
[28]	朱君辉，王梦焰，杨尔弘，等.大模型生成回答与人类回答文本的语言特征比较研究［J］.中文信息学报，2024，38（4）：17-27.
	ZHU J H， WANG M Y， YANG E H， et al. A comparative study of language between artificial intelligence and human： a case study of ChatGPT［J］. Journal of Chinese Information Processing， 2024， 38（4）： 17-27.
[29]	王舰，孙宇清.可控文本生成技术研究综述［J］.中文信息学报，2024，38（10）：1-23.
	WANG J， SUN Y Q. Survey on controllable text generation［J］. Journal of Chinese Information Processing， 2024， 38（10）： 1-23.
[30]	GEHRMANN S， STROBELT H， RUSH A M. GLTR： statistical detection and visualization of generated text［C］// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics： System Demonstrations. Stroudsburg： ACL， 2019： 111-116.
[31]	WU K， PANG L， SHEN H， et al. LLMDet： a third party large language models generated text detection tool［C］// Findings of the Association for Computational Linguistics： EMNLP 2023. Stroudsburg： ACL， 2023： 2113-2133.
[32]	LIU S， LIU X， WANG Y， et al. Does DetectGPT fully utilize perturbation？ bridging selective perturbation to fine-tuned contrastive learning detector would be better［C］// Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2024： 1874-1889.

特征	特征描述
字符数	表示文本中字符的总数
重复字符数	反映文本中字符重复程度
不重复字符数	反映文本多样性与词汇丰富度
不重复字符占比	反映文本不重复字符所占比例
原始字符长度	反映文本长度，生成文本可能因算法限制呈现某种规律
字符频率极值、平均值、标准差与极差	描述文本中字符出现频率的分布情况，捕捉文本隐含信息

特征	特征描述
字符数	表示文本中字符的总数
重复字符数	反映文本中字符重复程度
不重复字符数	反映文本多样性与词汇丰富度
不重复字符占比	反映文本不重复字符所占比例
原始字符长度	反映文本长度，生成文本可能因算法限制呈现某种规律
字符频率极值、平均值、标准差与极差	描述文本中字符出现频率的分布情况，捕捉文本隐含信息

主题	Prompt提示语
政治	请你扮演一名［资深政治评论员］［官方发言人］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一篇高度仿真的［政治］主题推文，要求内容仿真度高，语言风格贴近真实用户
军事	请你一名扮演［军事爱好达人］［军事专家学者］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一条高度仿真的［军事］主题推文，要求内容仿真度高，语言风格贴近真实用户
娱乐	请你扮演一名［资深娱乐博主］［娱乐记者］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一条高度仿真的［娱乐］主题推文，要求内容仿真度高，语言风格贴近真实用户

主题	Prompt提示语
政治	请你扮演一名［资深政治评论员］［官方发言人］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一篇高度仿真的［政治］主题推文，要求内容仿真度高，语言风格贴近真实用户
军事	请你一名扮演［军事爱好达人］［军事专家学者］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一条高度仿真的［军事］主题推文，要求内容仿真度高，语言风格贴近真实用户
娱乐	请你扮演一名［资深娱乐博主］［娱乐记者］，结合时事，模仿社交媒体上真实热门的推文风格，撰写一条高度仿真的［娱乐］主题推文，要求内容仿真度高，语言风格贴近真实用户

硬件	配置/值	硬件	配置/值
GPU	RTX 4090 （24 GB）	系统盘	30 GB
内存	90 GB	数据盘	50 GB

Adaptive multi-feature fusion detection method for AI-generated text

基于多特征自适应融合的智能生成文本检测方法

RichHTML

PDF

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 13

References 32

Related Articles 15

Recommended Articles

Metrics

类别	方法	TuringBench		SocialAI-Detect
类别	方法	ACC	AUROC	ACC	AUROC
基于统计的方法	GLTR	0.834	0.849	0.827	0.833
基于统计的方法	LLMDet	0.881	0.893	0.874	0.865
机器学习方法	BERT-LR	0.922	0.921	0.864	0.867
	BERT-SVM	0.936	0.938	0.908	0.893
	BERT-RF	0.873	0.874	0.929	0.912
深度学习方法	TextCNN	0.835	0.868	0.873	0.893
	Bi-LSTM	0.931	0.952	0.939	0.949
	BiGRU	0.929	0.934	0.918	0.941
	DetectGPT	0.967	0.958	0.942	0.928
	PECOLA	0.943	0.937	0.921	0.916
	J-Guard	0.964	0.968	0.958	0.961
本文方法		0.985	0.982	0.981	0.979

特征	TuringBench	SocialAI-Detect
-w/o文本统计特征	0.954 9	0.940 3
-w/o语言不确定性特征	0.927 1	0.934 7
-w/o ESC融合特征集	0.910 3	0.901 8

特征融合策略	TuringBench		SocialAI-Detect
特征融合策略	ACC	AUROC	ACC	AUROC
直接融合策略	0.963	0.962	0.960	0.957
双路特征映射融合策略	0.985	0.982	0.981	0.979

序号	模型架构	AUROC	ACC
1	BERT	0.893 2	0.891 6
2	+隐藏层	0.908 6	0.904 0
3	+AEC+隐藏层	0.945 8	0.940 3
4	+AEC+ normalize +隐藏层	0.932 8	0.933 1
5	+扩维ESC+隐藏层	0.954 9	0.957 3
6	ESC +隐藏层	0.951 0	0.956 7
7	扩维ESC +自注意力+隐藏层	0.900 8	0.914 4
8	扩维ESC +GAT+隐藏层	0.894 8	0.883 3

序号	模型架构	AUROC	ACC
1	BERT	0.896 4	0.893 1
2	+隐藏层	0.905 3	0.902 2
3	+SVD+隐藏层	0.934 1	0.938 2
4	+ normalize_SVD+隐藏层	0.933 6	0.939 0
5	+降维SVD+自注意力+隐藏层	0.889 8	0.875 0
6	+SVD+GAT+隐藏层	0.906 6	0.911 5
7	+降维SVD+隐藏层	0.897 2	0.901 3

[1]	Xiaoyu WANG, Xin LI, Di XUE, Zhangtao JIANG, Wei WANG, Yanjun XIAO. Vulnerability classification framework for video surveillance network security based on large language models [J]. Journal of Computer Applications, 2026, 46(4): 1158-1170.
[2]	Kaizhou SHI, Xuan HE, Guoyi HOU, Gen LI, Shuanggao LI, Xiang HUANG. Airborne product metrological traceability knowledge graph construction method based on large language models [J]. Journal of Computer Applications, 2026, 46(4): 1086-1095.
[3]	Wenhao LI, Yinzhang GUO. Urban traffic flow prediction based on dual-layer multi-scale dynamic graph convolutional network model [J]. Journal of Computer Applications, 2026, 46(4): 1323-1333.
[4]	Huanxian LIU, Hongtao WANG, Xian’ao WANG, Hongmei WANG, Weifeng XU. Multimodal fact verification with cross-modal semantic association [J]. Journal of Computer Applications, 2026, 46(4): 1069-1076.
[5]	Shuai HE, Chunhua DENG. Object detection algorithm with few-shot learning based on YOLO-World [J]. Journal of Computer Applications, 2026, 46(4): 1275-1282.
[6]	Xinyi YAN, Linglong ZHU, Yonghong ZHANG. CDC-DETR： multi-scale real-time human-vehicle detection method for complex traffic scenarios [J]. Journal of Computer Applications, 2026, 46(4): 1283-1291.
[7]	Rilong WANG, Zhenping LI, Xiaosong LI, Qiang GAO, Ya HE, Yong ZHONG, Yingxiao ZHAO. Multi-Agent collaborative knowledge reasoning framework [J]. Journal of Computer Applications, 2026, 46(3): 708-714.
[8]	Haoyang ZHANG, Liping ZHANG, Sheng YAN, Na LI, Xuefei ZHANG. Review of large language model methods for knowledge graph completion [J]. Journal of Computer Applications, 2026, 46(3): 683-695.
[9]	Enkang XI, Jing FAN, Yadong JIN, Hua DONG, Hao YU, Yihang SUN. Review of threats faced by federated learning in privacy and security field [J]. Journal of Computer Applications, 2026, 46(3): 798-808.
[10]	Yiming HUANG, Xihua ZOU, Guo DENG, Di ZHENG. Pre-answering and retrieval filtering： dual-stage optimization method for RAG-based question-answering systems [J]. Journal of Computer Applications, 2026, 46(3): 696-707.
[11]	Bin SHEN, Xiaoning CHEN, Hua CHENG, Yiquan FANG, Huifeng WANG. Intelligent undergraduate teaching evaluation system based on large language models [J]. Journal of Computer Applications, 2026, 46(3): 993-1003.
[12]	Dingjia WU, Zhe CUI. MG-SQL： SQL generation framework with enhanced schema linking and multi-generator collaboration [J]. Journal of Computer Applications, 2026, 46(3): 723-731.
[13]	Hanqing LIU, Guoming SANG, Yijia ZHANG. Remote sensing image captioning model combining dense multi-scale feature fusion and feature knowledge-enhanced Transformer [J]. Journal of Computer Applications, 2026, 46(3): 741-749.
[14]	Jincheng FU, Shiyou YANG. Short-term wind power prediction using hybrid model based on Bayesian optimization and feature fusion [J]. Journal of Computer Applications, 2026, 46(2): 652-658.
[15]	Fei GAO, Dong CHEN, Dixing BIAN, Wenqiang FAN, Qidong LIU, Pei LYU, Chaoyang ZHANG, Mingliang XU. Multistage coupled decision-making framework for researcher redeployment after discipline revocation [J]. Journal of Computer Applications, 2026, 46(2): 416-426.