Generative adversarial network based data uncertainty quantification method

doi:10.11772/j.issn.1001-9081.2022030383

Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (4): 1094-1101. DOI: 10.11772/j.issn.1001-9081.2022030383

Topic: Data science and technology

Hao WANG¹, Zicheng WANG¹, Chao ZHANG¹, Yunsheng MA²

¹ School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, China
² Shandong Chambroad Holding Group Company Limited, Binzhou, Shandong 256500, China

Received: 2022-03-30  Revised: 2022-06-22  Accepted: 2022-06-24  Online: 2023-01-11  Published: 2023-04-10
Corresponding author: Yunsheng MA
About the authors:
WANG Hao, born in 1996 in Fushun, Liaoning, Ph.D. candidate. His research interests include machine learning, reinforcement learning and active learning.
WANG Zicheng, born in 1996 in Anshan, Liaoning, M.S. candidate. His research interests include machine learning and uncertainty quantification.
ZHANG Chao, born in 1981 in Wen'an, Hebei, Ph.D., professor. His research interests include machine learning.
Supported by: National Key R&D Program of China (2020YFB1711104)

Abstract

To address the problem that estimators become unreliable when high-dimensional, high-frequency, noisy real-world data are used directly for data processing, a data uncertainty quantification method based on Generative Adversarial Network (GAN) was proposed. Firstly, the original data distribution was reconstructed by a GAN, building a mapping from the noise space to the original data space. Secondly, samples were drawn by the Markov Chain Monte Carlo (MCMC) method to obtain new samples that follow the original data distribution. Thirdly, confidence intervals for the uncertainty of the samples were defined based on specified functions. Finally, the confidence intervals were used to estimate the uncertainty of the original data, and the data within the confidence intervals were selected as the data used by the estimator. Experimental results show that, compared with using the original data, training the estimator on the data within the confidence intervals requires 50% fewer samples to reach the performance upper limit; in addition, the data within the confidence intervals require 30% fewer samples on average than the original training data to achieve the same test accuracy.

Key words: Generative Adversarial Network (GAN), uncertainty quantification, Markov Chain Monte Carlo (MCMC) method, confidence interval, uncertainty estimation
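
The sampling step in the abstract can be illustrated with a minimal sketch. This page does not say which MCMC variant the authors use, so the random-walk Metropolis sampler below is only an assumed stand-in, and the standard normal target (`toy_log_density`) is a hypothetical placeholder for the distribution learned by the GAN.

```python
import numpy as np

def metropolis_sample(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = float(x0)
    log_p = log_density(x)
    for i in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + step * rng.normal()
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples[i] = x
    return samples

def toy_log_density(x):
    """Hypothetical target: standard normal, up to an additive constant."""
    return -0.5 * x * x

draws = metropolis_sample(toy_log_density, x0=0.0, n_samples=20000)
print(draws.mean(), draws.std())
```

Only an unnormalized log-density is needed, which is the usual reason MCMC is chosen when the target distribution is known only up to a constant.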

CLC number: TP399

Hao WANG, Zicheng WANG, Chao ZHANG, Yunsheng MA. Generative adversarial network based data uncertainty quantification method[J]. Journal of Computer Applications, 2023, 43(4): 1094-1101.

Figures and tables (9)

Fig. 1 Structure of generative adversarial network

Fig. 2 Graphs of the three functions
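
For reference, the three functions named in Tab. 2 (Griewank, Rastrigin and Ackley) can be written down in their commonly used textbook forms. This is a sketch under that assumption: the page does not show the exact parameterization, dimension or input domain used in the paper.

```python
import numpy as np

def griewank(x):
    """Griewank function: 1 + sum(x_i^2) / 4000 - prod(cos(x_i / sqrt(i)))."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

def rastrigin(x):
    """Rastrigin function: 10 n + sum(x_i^2 - 10 cos(2 pi x_i))."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def ackley(x):
    """Ackley function with the usual constants a = 20, b = 0.2, c = 2 pi."""
    x = np.asarray(x, dtype=float)
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(2.0 * np.pi * x)))
    return term1 + term2 + 20.0 + np.e

# All three have their global minimum value 0 at the origin.
origin = np.zeros(5)
print(griewank(origin), rastrigin(origin), ackley(origin))
```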

Tab. 1 Data description of some process parameter data tables

Data table name	Data dimension	Data size
Fuc_chargeinfo	56	209
Fuc_sdd	119	209
Fuc_slabcalcudata	75	39863
Rmc_hmi	31	209
Rmc_hsb	5	4153
Rmc_r1	3	2806
Rmc_r2	4	38212
Scc_fsb	6	209

Tab. 2 Confidence intervals of different functions

Interval	Griewank function	Rastrigin function	Ackley function
Target value range of original data	[0.3503, 0.8358]	[2848.91, 5634.95]	[3.1413, 4.3092]
Confidence interval	[0.3554, 0.5711]	[4794.95, 6105.20]	[3.8614, 4.2867]
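
The selection step described in the abstract (keep only the original data whose function value falls inside the confidence interval) might look like the sketch below. The interval rule is an assumption: a central 95% empirical quantile interval is used here because the page does not state how the authors define the interval. The function `score` and both data arrays are hypothetical stand-ins, not the paper's data.

```python
import numpy as np

def confidence_interval(values, level=0.95):
    """Central empirical interval covering `level` of the values (assumed rule)."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(values, [tail, 1.0 - tail])
    return float(lower), float(upper)

def select_confident(data, targets, interval):
    """Keep the rows of `data` whose target value lies inside the interval."""
    lower, upper = interval
    mask = (targets >= lower) & (targets <= upper)
    return data[mask], targets[mask]

def score(x):
    """Hypothetical stand-in for the specified function (sum of squares)."""
    return np.sum(x ** 2, axis=1)

rng = np.random.default_rng(0)
generated = rng.normal(size=(5000, 3))            # stand-in for GAN/MCMC samples
original = rng.normal(scale=1.5, size=(200, 3))   # stand-in for noisy original data

interval = confidence_interval(score(generated))
kept, kept_scores = select_confident(original, score(original), interval)
print(interval, len(kept), "of", len(original))
```

The interval is estimated from the generated samples only, and the original data are then filtered against it, which mirrors the order of the third and fourth steps in the abstract.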


Fig. 3 Distribution after PCA and t-SNE dimensionality reduction for the Griewank function

Fig. 4 Distribution after PCA and t-SNE dimensionality reduction for the Rastrigin function

Fig. 5 Distribution after PCA and t-SNE dimensionality reduction for the Ackley function

Fig. 6 Performance comparison of SVR and RFR using original data and confidence data

Tab. 3 Comparison of the number of samples required to achieve the same MSE under RFR

MSE	Original-data samples required	Confidence-data samples (Griewank function)	Confidence-data samples (Rastrigin function)	Confidence-data samples (Ackley function)
0.09	10	1	1	1
0.08	13	6	7	6
0.07	36	8	8	8
0.06	175	37	63	37
0.05	224	212	208	210
0.04	687	375	543	343
0.03	687	463	543	418
0.02	950	803	842	904
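
The relative saving in each row of Tab. 3 can be computed as (original − confidence) / original. The sketch below applies that formula to the table values as listed. It is plain arithmetic on the table and makes no claim about how the averaged figure in the abstract was obtained, since the averaging procedure is not given on this page.

```python
# Rows of Tab. 3: (MSE, original, Griewank, Rastrigin, Ackley) sample counts.
table3 = [
    (0.09, 10, 1, 1, 1),
    (0.08, 13, 6, 7, 6),
    (0.07, 36, 8, 8, 8),
    (0.06, 175, 37, 63, 37),
    (0.05, 224, 212, 208, 210),
    (0.04, 687, 375, 543, 343),
    (0.03, 687, 463, 543, 418),
    (0.02, 950, 803, 842, 904),
]

def reduction(original, confident):
    """Fraction of samples saved by using confidence data instead of original data."""
    return (original - confident) / original

for mse, original, *confident in table3:
    saved = [round(100.0 * reduction(original, c), 1) for c in confident]
    print(f"MSE {mse}: saved % (Griewank, Rastrigin, Ackley) = {saved}")
```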

