Multi-view clustering has recently become a hot topic in graph data mining. However, owing to limitations of data collection technology or to human factors, multi-view data often suffers from missing views or samples. Reducing the impact of incomplete views on clustering performance is a major challenge currently facing multi-view clustering. To better understand the development of Incomplete Multi-view Clustering (IMC) in recent years, a comprehensive review is of both theoretical and practical value. Firstly, the types of missingness in incomplete multi-view data were summarized and analyzed. Secondly, four types of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared, and their technical characteristics and differences were analyzed. Thirdly, twenty-two public incomplete multi-view datasets were summarized in terms of dataset type, number of views and categories, and application field. Then, the evaluation metrics were outlined, and the performance of existing IMC methods on homogeneous and heterogeneous datasets was evaluated. Finally, the open problems, future research directions, and current application fields of IMC were discussed.
Since existing pruning strategies for Convolutional Neural Network (CNN) models vary and achieve only mediocre results, an Activation-Entropy based Layer-wise Iterative Pruning (AE-LIP) strategy was proposed to reduce the number of model parameters while keeping the accuracy loss within a controllable range. Firstly, a weight evaluation criterion based on activation-entropy was constructed by combining the neuronal activation value with information entropy, and an importance score was calculated for each weight. Secondly, pruning was performed layer by layer: the weights were sorted by importance score, and the weights to be pruned were selected according to each layer's pruning number and set to zero. Finally, the model was fine-tuned, and the above process was repeated until the iterations ended. The experimental results show that the AE-LIP strategy compresses the AlexNet model by 87.5% with an accuracy reduction of 2.12 percentage points; the resulting accuracy is 1.54 percentage points higher than that of the magnitude-based weight pruning strategy and 0.91 percentage points higher than that of the correlation-based weight pruning strategy. The strategy compresses the VGG-16 model by 84.1% with an accuracy reduction of 2.62 percentage points; the resulting accuracy is 0.62 and 0.27 percentage points higher than those of the two strategies above, respectively. The proposed strategy therefore reduces the size of CNN models effectively while preserving accuracy, and helps deploy CNN models on mobile devices with limited storage.
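The score, sort, zero, and fine-tune loop described in this abstract can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not give the activation-entropy formula, so the score used here (absolute weight times mean absolute input activation times the histogram entropy of that input) is an assumption, as are the function names, the fixed recorded activations, and the optional `fine_tune` callback.

```python
import numpy as np

def activation_entropy(acts, bins=10):
    """Shannon entropy (in bits) of each column's activation histogram."""
    ent = np.zeros(acts.shape[1])
    for j in range(acts.shape[1]):
        counts, _ = np.histogram(acts[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()
        ent[j] = -(p * np.log2(p)).sum()
    return ent

def importance_scores(weights, acts, bins=10):
    """ASSUMED criterion: |w_ij| * mean|a_i| * entropy(a_i) for input neuron i.

    weights has shape (n_inputs, n_outputs); acts has shape (n_samples, n_inputs).
    """
    mean_act = np.abs(acts).mean(axis=0)
    ent = activation_entropy(acts, bins)
    return np.abs(weights) * (mean_act * ent)[:, None]

def prune_layer(weights, acts, n_prune):
    """Return a copy with the n_prune lowest-scoring non-zero weights set to zero."""
    scores = importance_scores(weights, acts)
    alive = np.flatnonzero(weights)              # flat indices of unpruned weights
    n_prune = min(n_prune, alive.size)
    order = np.argsort(scores.ravel()[alive], kind="stable")
    pruned = weights.copy()
    pruned.flat[alive[order[:n_prune]]] = 0.0
    return pruned

def iterative_prune(layers, acts_per_layer, n_prune_per_layer, iterations, fine_tune=None):
    """Prune every layer in turn, optionally fine-tune, and repeat."""
    layers = list(layers)
    for _ in range(iterations):
        for k in range(len(layers)):
            layers[k] = prune_layer(layers[k], acts_per_layer[k], n_prune_per_layer[k])
        if fine_tune is not None:
            # Hypothetical hook: retrain the surviving weights between rounds.
            layers = fine_tune(layers)
    return layers
```

In a real pipeline the activations would be recorded again after each fine-tuning round; the sketch reuses fixed activations only to stay short.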
In the era of big data, public opinion on major social security emergencies spreads mainly through the media. Most existing research fails to consider news media as a special group, or the influence of news media in a specific kind of event. To address these problems, a method that evaluates influence by integrating the network structure and the behavioral relationships between users was proposed, and the violent terrorist events in Xinjiang and Paris were taken as examples to calculate the international influence of news media from different countries on such events on the Twitter platform. This evaluation method can better capture the influence of various news media at the event level. The experimental results show that the influence of news media from different countries differs between the Xinjiang and Paris events, which indicates that these two events of the same type have different scopes of influence, and also reflects the differing political positions of different countries.
To address the low efficiency of the serial PageRank algorithm on massive Web data, a parallel PageRank algorithm based on Web link classification was proposed. Firstly, Web pages were classified according to their links, and different weights were assigned to pages from different websites. Secondly, page ranks were computed in parallel on the Hadoop platform using MapReduce, which follows a divide-and-conquer approach. Finally, a three-layer data compression method, covering the data layer, the preprocessing layer, and the computation layer, was adopted to optimize the parallel algorithm. The experimental results show that, compared with the serial PageRank algorithm, the accuracy of the proposed algorithm is improved by 12% and the efficiency by 33% in the best case.
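One way to picture link-weighted PageRank in a map/reduce shape is the single-process Python sketch below. It is not the paper's algorithm: the abstract does not state the classification rule or the weight values, so the internal/external split by host name and the numbers in `LINK_WEIGHTS` are assumptions, and plain functions stand in for Hadoop's map, shuffle, and reduce phases. The sketch also assumes every link target appears as a key in `graph`.

```python
from collections import defaultdict
from urllib.parse import urlparse

# ASSUMPTION: hypothetical link classes and weights, chosen only for illustration.
LINK_WEIGHTS = {"internal": 0.5, "external": 1.0}

def classify_link(src, dst):
    """Label a link by whether source and target share a host name."""
    same_site = urlparse(src).netloc == urlparse(dst).netloc
    return "internal" if same_site else "external"

def map_phase(page, rank, out_links):
    """Emit (target, contribution) pairs, splitting rank in proportion to link weight."""
    weights = [LINK_WEIGHTS[classify_link(page, dst)] for dst in out_links]
    total = sum(weights)
    for dst, w in zip(out_links, weights):
        yield dst, rank * w / total

def reduce_phase(contribs, n_pages, damping=0.85):
    """Combine all contributions received by one page into its new rank."""
    return (1 - damping) / n_pages + damping * sum(contribs)

def pagerank(graph, iterations=20, damping=0.85):
    """graph maps each page URL to the list of URLs it links to."""
    pages = list(graph)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        grouped = defaultdict(list)   # stands in for the shuffle step
        dangling = 0.0                # rank held by pages without out-links
        for page in pages:
            if graph[page]:
                for dst, c in map_phase(page, ranks[page], graph[page]):
                    grouped[dst].append(c)
            else:
                dangling += ranks[page]
        # Dangling rank is spread evenly so the ranks keep summing to 1.
        ranks = {p: reduce_phase(grouped[p] + [dangling / n], n, damping)
                 for p in pages}
    return ranks
```

Because each `map_phase` call depends only on one page and each `reduce_phase` call only on one target, the two loops could be distributed across workers, which is the property a MapReduce implementation relies on.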