Massive data analysis of power utilization based on improved K-means algorithm and cloud computing

doi:10.11772/j.issn.1001-9081.2017071660

Journal of Computer Applications, 2018, Vol. 38, Issue (1): 159-164.


ZHANG Chengchang¹, ZHANG Huayu², LUO Jianchang¹, HE Feng¹

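1. College of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
2. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China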

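Received: 2017-07-04  Revised: 2017-08-21  Online: 2018-01-10  Published: 2018-01-22
Corresponding author: ZHANG Huayu
About the authors: ZHANG Chengchang (1975-), male, from Lichuan, Hubei, associate professor, Ph. D. His research interests include energy Internet, electric power big data, data mining and cyber-physical systems. ZHANG Huayu (1990-), male, from Hefei, Anhui, M. S. candidate. His research interests include data mining. LUO Jianchang (1990-), male, from Jingzhou, Hubei, M. S. candidate. His research interests include cyber-physical systems and big data. HE Feng (1962-), male, from Chongqing, professor. His research interests include big data and communication technology.
Supported by:
This work is supported by the Technology Foundation of China Electric Power Research Institute (XXB51201603155) and the Technology Foundation of State Grid Economic and Technological Research Institute (15JS191).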


Abstract



Abstract: Mining residential electricity consumption data is hampered by the large data volume and low mining efficiency. To address this, a method for analyzing massive power utilization data based on cloud computing and an improved K-means algorithm was studied. Because the initial cluster centers and the value of K are difficult to determine in the traditional K-means algorithm, a density-based improved K-means algorithm was proposed. Firstly, the product of the sample density, the reciprocal of the average distance between samples within a cluster, and the inter-cluster distance was defined as the weight product, and the cluster centers were determined one after another by the maximum weight product method, which improved clustering accuracy. Secondly, the improved algorithm was parallelized on the MapReduce model, which improved clustering efficiency. Finally, a mining experiment on massive power data was carried out using the electricity consumption data of 400 households in a residential community. Taking each household as a unit, the peak-hour consumption rate, the load rate, the valley load coefficient and the percentage of consumption during flat (normal) hours were extracted to build the feature vector for clustering; users with similar consumption patterns were grouped together, and the behavioral characteristics of each user type were analyzed. Experimental results on a Hadoop cluster show that the improved K-means algorithm runs stably and reliably and achieves good clustering performance.
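The abstract names the three factors of the weight product but not their exact formulas, so the following NumPy code is only a minimal sketch of how a maximum-weight-product initialization could work. Every concrete definition in it is an assumption made for illustration: the neighbourhood radius r is the mean pairwise distance, a sample's density is the number of other samples within r, the "average intra-cluster distance" is the mean distance to those neighbours, the inter-cluster distance is the distance to the nearest center already chosen, and the neighbourhood of each selected center is removed from the candidate set. The function name `max_weight_product_centers` is hypothetical, and K is passed in rather than derived.

```python
import numpy as np


def max_weight_product_centers(X, k, eps=1e-12):
    """Choose up to k initial centers with a max-weight-product rule.

    Hypothetical sketch; every definition below is an assumption:
      r     neighbourhood radius = mean pairwise distance
      rho   number of other samples within r (sample density)
      a     mean distance to those neighbours (average intra-cluster distance)
      s     distance to the nearest center chosen so far (inter-cluster distance)
      w     weight product = rho * (1 / a) * s
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full pairwise Euclidean distance matrix (O(n^2) memory, fine for a sketch).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    r = D[np.triu_indices(n, k=1)].mean()
    neighbours = (D <= r) & ~np.eye(n, dtype=bool)
    rho = neighbours.sum(axis=1).astype(float)
    # Isolated samples get a = inf, so their 1/a term (and their weight) is 0.
    a = np.where(rho > 0, (D * neighbours).sum(axis=1) / np.maximum(rho, 1), np.inf)
    inv_a = 1.0 / (a + eps)

    candidates = np.ones(n, dtype=bool)
    centers = []
    for _ in range(k):
        # No center yet: rank by the density terms only (s = 1 for every sample).
        s = D[:, centers].min(axis=1) if centers else np.ones(n)
        w = rho * inv_a * s
        w[~candidates] = -np.inf
        c = int(np.argmax(w))
        centers.append(c)
        # Assumption: samples inside a chosen center's radius cannot become centers.
        candidates &= D[c] > r
        if not candidates.any():
            break
    return X[centers]
```

For example, `max_weight_product_centers(features, 4)` would return at most four rows of a hypothetical household feature matrix `features` to seed an ordinary or parallel K-means run. Fewer than k centers come back when the candidate set runs out; the abstract says the method also addresses the choice of K, but it does not say how, so no stopping rule is assumed here.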

Key words: power utilization data, cloud computing, improved K-means algorithm, MapReduce model, parallelization
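The MapReduce parallelization mentioned in the abstract can be pictured as one job per K-means iteration. The sketch below simulates that data flow in plain Python instead of calling Hadoop: `map_phase` assigns each record of an input split to its nearest center, `combine` pre-aggregates coordinate sums and counts inside a split, and `reduce_phase` divides the totals to obtain new centers. These function names, the use of a combiner and the convergence tolerance are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def map_phase(split, centers):
    """Mapper: emit (cluster id, (point, 1)) for every record of one input split."""
    for p in split:
        yield nearest(p, centers), (p, 1)


def combine(pairs):
    """Combiner: sum coordinates and counts per cluster id inside one split."""
    acc = {}
    for cid, (p, n) in pairs:
        s, c = acc.get(cid, ([0.0] * len(p), 0))
        acc[cid] = ([x + y for x, y in zip(s, p)], c + n)
    return acc.items()


def reduce_phase(grouped):
    """Reducer: new center = total coordinate sum / total count per cluster id."""
    new_centers = {}
    for cid, partials in grouped.items():
        count = sum(c for _, c in partials)
        totals = [sum(col) for col in zip(*(s for s, _ in partials))]
        new_centers[cid] = [t / count for t in totals]
    return new_centers


def kmeans_mapreduce(splits, centers, max_iter=20, tol=1e-6):
    """Driver: one simulated MapReduce job per iteration until centers stop moving."""
    centers = [list(map(float, c)) for c in centers]
    for _ in range(max_iter):
        grouped = defaultdict(list)  # stands in for the shuffle/sort stage
        for split in splits:         # each split would be handled by one map task
            for cid, partial in combine(map_phase(split, centers)):
                grouped[cid].append(partial)
        updated = reduce_phase(grouped)
        # A cluster that received no points keeps its previous center.
        new = [updated.get(j, centers[j]) for j in range(len(centers))]
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers
```

On a real Hadoop cluster the current centers would typically be shipped to every mapper (for example through the distributed cache), and the driver would launch a new job until the centers stop moving. The combiner matters because it shrinks the shuffle from one record per point to one partial sum per cluster per split.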


Cite this article: ZHANG Chengchang, ZHANG Huayu, LUO Jianchang, HE Feng. Massive data analysis of power utilization based on improved K-means algorithm and cloud computing[J]. Journal of Computer Applications, 2018, 38(1): 159-164.

