Challenges and recent progress in big data visualization
CUI Di1,2, GUO Xiaoyan3, CHEN Wei2
1. College of Electronic and Information Engineering, Ningbo University of Technology, Ningbo Zhejiang 315211, China; 2. State Key Laboratory of Computer Aided Design and Computer Graphics (Zhejiang University), Hangzhou Zhejiang 310058, China; 3. College of Information Science and Technology, Gansu Agricultural University, Lanzhou Gansu 730070, China
Abstract: The advent of the big data era highlights the importance of visualization. As an important data analysis method, visual analytics exploits human cognitive abilities and strengths, integrates the capabilities of humans and computers, and gains insight into big data through human-computer interaction. In view of the characteristics of big data (large volume, high dimensionality, multiple sources and multiple forms), visualization methods for large-scale data were discussed first: 1) the divide-and-conquer principle was used to split a big problem into a number of smaller tasks, and parallel processing was used to improve processing speed; 2) aggregation, sampling and multi-resolution representation were used to reduce the data; 3) multiple views were used to present high-dimensional data. Then, the visualization process for streaming data was discussed for its two types, monitoring and superposition. Finally, the visualization of unstructured data and heterogeneous data was described. In summary, visualization can compensate for the shortcomings of automatic computer analysis, integrate computer analysis capability with human perception of information, and effectively uncover the information and wisdom behind big data. However, research results in this area are still very limited, and the field faces the challenges of large scale, dynamic change, high dimensionality and multi-source heterogeneity, which are becoming the focus and direction of future big data visualization research.
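The first two ideas in the abstract, dividing a large problem into smaller tasks and reducing data by aggregation, can be illustrated together with a minimal Python sketch. It is not taken from the paper: the function names `bin_chunk` and `binned_aggregate`, the square grid, and the parameters `cell_size` and `n_chunks` are all assumptions made for illustration. Each chunk of 2D points is binned into grid-cell counts independently, and the partial counts are then merged, so a renderer would draw one mark per cell instead of one per point.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bin_chunk(points, cell_size):
    """Count how many points of one chunk fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def binned_aggregate(points, cell_size=1.0, n_chunks=4):
    """Divide the points into chunks, bin each chunk, then merge the counts.

    Hypothetical helper: threads are used only to keep the sketch
    self-contained; a real system would more likely use processes
    or a cluster for the per-chunk step.
    """
    chunk_len = max(1, -(-len(points) // n_chunks))  # ceiling division
    chunks = [points[i:i + chunk_len] for i in range(0, len(points), chunk_len)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for partial in pool.map(lambda c: bin_chunk(c, cell_size), chunks):
            total.update(partial)  # recombine: add the per-chunk counts
    return total
```

Because the per-cell counts are simple sums, the merged result does not depend on how the points are split into chunks, which is what makes this kind of aggregation easy to parallelize.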
CUI Di, GUO Xiaoyan, CHEN Wei. Challenges and recent progress in big data visualization. Journal of Computer Applications, 2017, 37(7): 2044-2049.