Cigarette tar index prediction model based on improved Wide&Deep

doi:10.11772/j.issn.1001-9081.2022050736

Journal of Computer Applications, 2023, Vol. 43, Issue (S1): 95-99. DOI: 10.11772/j.issn.1001-9081.2022050736

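Tao ZHOU¹,², Lihua XIE¹, Xiaofei WANG³

1. Shifang Cigarette Factory, China Tobacco Sichuan Industry Limited Liability Company, Shifang, Sichuan 618400, China
2. Information Center, China Tobacco Sichuan Industry Limited Liability Company, Chengdu, Sichuan 610020, China
3. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China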

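Received: 2022-05-23  Revised: 2022-06-15  Accepted: 2022-06-17  Online: 2023-07-04  Published: 2023-06-30
Corresponding author: Lihua XIE
About the authors: Tao ZHOU (born 1974), male, from Shifang, Sichuan, senior engineer. His research interests include big data analysis and intelligent manufacturing.
Lihua XIE (born 1995), male, from Jianyang, Sichuan, assistant engineer. His research interests include information technology and intelligent manufacturing. E-mail: sctobaccoxlh@163.com
Xiaofei WANG (born 1997), male, from Cili, Hunan, M.S. candidate. His research interests include machine learning and recommendation algorithms.
Funding:
Western Young Scholars Project of the Chinese Academy of Sciences (RRJZ2021003)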


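Abstract

Historical cigarette data samples used for cigarette tar index prediction are few in number and high in dimension, which leads to low prediction accuracy. To address this problem, a cigarette tar index prediction model based on an improved Wide&Deep model was proposed. First, multiple machine learning models were used to make predictions on the data samples, and their predictions were taken as new features. Then, these new features were input to the Wide side of the Wide&Deep model, while fusion features were constructed and input to the Deep side. On the Deep side, an attention feature crossing layer was built by introducing second-order features and an attention mechanism, achieving high-order feature combination to improve prediction accuracy. Experimental results show that, compared with the unimproved Wide&Deep model, the proposed model reduces the Mean Absolute Error (MAE) by 23.4% and the Root Mean Square Error (RMSE) by 21.8%; compared with an improved Wide&Deep model that extracts features with a convolutional neural network, it reduces MAE by 15.0% and RMSE by 16.4%. The proposed model thus effectively improves the accuracy of cigarette tar index prediction.

Key words: machine learning, Wide&Deep model, small sample, index prediction, feature crossing, cigarette tar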
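The abstract does not give the exact form of the attention feature crossing layer, so the following is only a minimal sketch of one common way to combine second-order features with attention: every pair of feature embeddings is multiplied element-wise, each pair receives a softmax-normalized attention weight, and the weighted pairs are summed. The function name `attention_cross` and the parameters `W`, `b`, and `h` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_cross(embeddings, W, b, h):
    """Attention-weighted pooling of pairwise (second-order) interactions.

    embeddings: list of n feature vectors, each of length k
    W (t x k), b (length t), h (length t): hypothetical attention parameters
    Returns (pooled vector of length k, one attention weight per pair).
    """
    # Second-order features: element-wise product of every feature pair.
    pairs = [[a * c for a, c in zip(vi, vj)]
             for vi, vj in combinations(embeddings, 2)]

    # Attention score of a pair p: h^T ReLU(W p + b).
    scores = []
    for p in pairs:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, p)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(hv * x for hv, x in zip(h, hidden)))

    # Normalize scores across pairs, then pool the pairs with those weights.
    weights = softmax(scores)
    k = len(embeddings[0])
    pooled = [sum(wt * p[d] for wt, p in zip(weights, pairs))
              for d in range(k)]
    return pooled, weights


# Illustrative values only: three features with 2-dimensional embeddings.
example_embeddings = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
example_W = [[0.1, 0.2], [0.3, -0.1]]
example_b = [0.0, 0.0]
example_h = [1.0, 1.0]
pooled, weights = attention_cross(example_embeddings, example_W,
                                  example_b, example_h)
print(pooled, weights)
```

In a trained model the attention parameters would be learned jointly with the rest of the Deep side, and the pooled vector would be passed on to later layers; here they are fixed constants purely for illustration.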


CLC number: TP391.1

Tao ZHOU, Lihua XIE, Xiaofei WANG. Cigarette tar index prediction model based on improved Wide&Deep[J]. Journal of Computer Applications, 2023, 43(S1): 95-99.

References

1	XU X Y. Research on prediction of sensory smoking evaluation indexes of single-grade tobacco based on data mining [D]. Shenyang: Northeastern University, 2014: 1-3. (in Chinese)
2	SHI Q, WANG H, XU X, et al. The application of tobacco product quality prediction using ensemble learning method [C]// Proceedings of the 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference. Piscataway: IEEE, 2019: 1780-1784. DOI: 10.1109/iaeac47372.2019.8998080
3	ATHEY S, TIBSHIRANI J, WAGER S. Generalized random forests [J]. The Annals of Statistics, 2019, 47(2): 1148-1178. DOI: 10.1214/18-aos1709
4	CHEN T, GUESTRIN C. XGBoost: a scalable tree boosting system [C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 785-794. DOI: 10.1145/2939672.2939785
5	ZHANG Z, TANG J, LUO X, et al. The application of data mining in cigarette sensory quality evaluation: an experimental study [C]// Proceedings of the 26th Chinese Control and Decision Conference. Piscataway: IEEE, 2014: 1328-1332. DOI: 10.1109/ccdc.2014.6852372
6	HSSINA B, MERBOUHA A, EZZIKOURI H, et al. A comparative study of decision tree ID3 and C4.5 [J]. International Journal of Advanced Computer Science and Applications, 2014, 4(2): 13-19. DOI: 10.14569/specialissue.2014.040203
7	SANG Y B. Research on classification algorithms based on K-nearest neighbors [D]. Chongqing: Chongqing University, 2009: 2-4. (in Chinese) DOI: 10.3778/j.issn.1002-8331.2009.11.044
8	YANG N. Research on the application of support vector machines in sensory evaluation [D]. Qingdao: Ocean University of China, 2004: 5-6. (in Chinese)
9	WANG Q, CHEN Y W, LI M J. Prediction of cigarette tar yield by support vector machine [J]. Tobacco Science & Technology, 2007(10): 5-8. (in Chinese) DOI: 10.3969/j.issn.1002-0861.2007.10.001
10	LI H M. Tobacco leaf quality analysis and grade prediction model based on linear regression and SVM [D]. Kunming: Kunming University of Science and Technology, 2013: 11-15. (in Chinese)
11	WANG Q, CHEN Y W, LI M J. Cigarette tar prediction based on support vector machine [J]. Computer Engineering and Applications, 2007, 43(9): 234-236. (in Chinese) DOI: 10.3321/j.issn:1002-8331.2007.09.069
12	LIN H. Application of data mining technology in cigarette formula optimization [D]. Qingdao: Ocean University of China, 2008: 5-8. (in Chinese)
13	DUAN J J, JIANG M H, WANG L, et al. Neural network prediction of tobacco leaf quality based on chemical components [J]. Southwest China Journal of Agricultural Sciences, 2012, 25(1): 48-53. (in Chinese) DOI: 10.3969/j.issn.1001-4829.2012.01.011
14	CHENG H-T, KOC L, HARMSEN J, et al. Wide & deep learning for recommender systems [C]// Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. New York: ACM, 2016: 7-10. DOI: 10.1145/2988450.2988454
15	SUN X Y, NIE X, BAO L, et al. Conversion rate estimation for mobile APPs based on improved Wide&Deep interactive feature extraction [J]. Journal of Zhengzhou University (Engineering Science), 2020, 41(6): 26-32. (in Chinese)
16	FENG Y H. Advertisement conversion rate estimation model based on improved Wide&Deep [D]. Jingzhou: Yangtze University, 2021: 23-25. (in Chinese)
17	ZAR J H. Spearman rank correlation [M]// Encyclopedia of Biostatistics. Hoboken, NJ: John Wiley & Sons, 2005: 5-10. DOI: 10.1002/0470011815.b2a15150
18	NIE X. Conversion rate estimation of mobile APP advertisements based on improved deep learning [D]. Xuzhou: China University of Mining and Technology, 2019: 27-29. (in Chinese)
19	DUAN J L, CAI G M, XU K Y. Memory combined feature classification method based on multiple BP neural networks [J]. Journal of Computer Applications, 2022, 42(1): 178-182. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2021010199
20	MNIH V, HEESS N, GRAVES A. Recurrent models of visual attention [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2014: 2204-2212.
21	GAO G S. Survey of attention mechanisms in deep learning recommendation models [J]. Computer Engineering and Applications, 2022, 58(9): 9-18. (in Chinese) DOI: 10.3778/j.issn.1002-8331.2112-0382
22	SEBER G A F, LEE A J. Linear Regression Analysis [M]. 2nd ed. Hoboken, NJ: John Wiley & Sons, 2012: 2-5.
23	ZHANG M L, ZHOU Z H. ML-KNN: a lazy learning approach to multi-label learning [J]. Pattern Recognition, 2007, 40(7): 2038-2048. DOI: 10.1016/j.patcog.2006.12.019
24	SAFAVIAN S R, LANDGREBE D. A survey of decision tree classifier methodology [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1991, 21(3): 660-674. DOI: 10.1109/21.97458
25	BREIMAN L. Bagging predictors [J]. Machine Learning, 1996, 24(2): 123-140. DOI: 10.1007/bf00058655

Model	RMSE/mg	MAE/mg	R²
Linear regression	0.291722	0.247547	0.7350
K-Neighbors	0.344363	0.276059	0.6963
SVR	0.344117	0.281576	0.7187
DecisionTree	0.391791	0.321896	0.6157
RandomForest	0.302214	0.253849	0.7781
Proposed model	0.259458	0.207809	0.8699
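The table reports RMSE, MAE, and R² without showing how they are computed. The sketch below uses the standard textbook definitions, assuming the paper follows them; the sample values are made up for illustration and are not data from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical tar values in mg, for illustration only.
tar_true = [10.0, 11.0, 12.0, 13.0]
tar_pred = [10.5, 10.5, 12.0, 13.5]
print(mae(tar_true, tar_pred), rmse(tar_true, tar_pred), r2(tar_true, tar_pred))
```

Lower RMSE and MAE and a higher R² indicate a better fit, which is the direction in which the proposed model differs from every baseline in the table.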


Model	RMSE/mg	MAE/mg	R²
Model of Ref. [14]	0.331717	0.271163	0.78140
Model of Ref. [15]	0.310237	0.244550	0.80190
Proposed model	0.259458	0.207809	0.86991
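The percentage improvements quoted in the abstract can be reproduced from this table as relative reductions, (baseline - proposed) / baseline. The sketch below assumes that the Ref. [14] row is the unimproved Wide&Deep model and the Ref. [15] row is the convolutional-neural-network-based variant, which the matching percentages suggest; the dictionary keys are illustrative names only.

```python
def relative_reduction(baseline, proposed):
    """Relative reduction of an error metric, in percent."""
    return (baseline - proposed) / baseline * 100.0


# RMSE and MAE values (mg) copied from the comparison table.
results = {
    "ref14": {"rmse": 0.331717, "mae": 0.271163},
    "ref15": {"rmse": 0.310237, "mae": 0.244550},
    "proposed": {"rmse": 0.259458, "mae": 0.207809},
}

for baseline_name in ("ref14", "ref15"):
    for metric in ("mae", "rmse"):
        pct = relative_reduction(results[baseline_name][metric],
                                 results["proposed"][metric])
        print(f"{metric.upper()} vs {baseline_name}: {pct:.1f}%")
# Prints:
# MAE vs ref14: 23.4%
# RMSE vs ref14: 21.8%
# MAE vs ref15: 15.0%
# RMSE vs ref15: 16.4%
```

These four values agree with the 23.4%, 21.8%, 15.0%, and 16.4% reductions stated in the abstract.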


Algorithm	RMSE/mg	MAE/mg	R²
Experiment group A	0.331717	0.271163	0.78140
Experiment group B	0.275405	0.219896	0.85160
Experiment group C	0.302394	0.248140	0.79890
Proposed model	0.259458	0.207809	0.86991
