To address the challenge of mining complementary information across different feature levels in deep subspace clustering, a Diversity Represented Deep Subspace Clustering (DRDSC) algorithm was proposed. Built on a deep autoencoder, it exploits the complementary information between the low-level and high-level features produced by the encoder. Firstly, a diversity representation measurement model for features at different levels was established based on the Hilbert-Schmidt Independence Criterion (HSIC). Secondly, a feature diversity representation module was introduced into the deep autoencoder network to explore image features that help improve clustering. Furthermore, the loss function was modified to effectively fuse the underlying subspaces of the multi-level representations. Finally, experiments were conducted on commonly used clustering datasets. Experimental results show that on the Extended Yale B, ORL, COIL20 and Umist datasets, the clustering error rates of DRDSC reach 1.23%, 10.50%, 1.74% and 17.71%, respectively, which are 10.41, 16.75, 13.12 and 12.92 percentage points lower than those of Efficient Dense Subspace Clustering (EDSC) and 1.44, 3.50, 3.68 and 9.17 percentage points lower than those of Deep Subspace Clustering (DSC), indicating that DRDSC achieves a better clustering effect.
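As an illustration of the HSIC measure mentioned in the abstract above, the following minimal sketch computes the standard biased empirical estimator tr(KHLH)/(n-1)^2 for two sets of feature vectors. The Gaussian kernel, the bandwidth `sigma` and the function names are assumptions made for this example; the abstract does not state which kernel DRDSC uses or how the HSIC term enters its loss.

```python
import math

def rbf_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of feature vectors (assumed kernel)."""
    def k(a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2.0 * sigma ** 2))
    return [[k(a, b) for b in xs] for a in xs]

def center(K):
    """Double centering: returns H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired samples xs[i], ys[i]."""
    n = len(xs)
    Kc = center(rbf_gram(xs, sigma))
    Lc = center(rbf_gram(ys, sigma))
    # tr(K H L H) equals tr((HKH)(HLH)) because H is idempotent.
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

# Hypothetical "low-level" features and two candidate "high-level" features.
low = [[i / 10] for i in range(20)]
dependent = [[x[0]] for x in low]                        # identical to `low`
shuffled = [[((7 * i) % 20) / 10] for i in range(20)]    # same values, scrambled pairing
print(hsic(low, dependent), hsic(low, shuffled))
```

A smaller HSIC value suggests weaker statistical dependence between the two feature sets, which is presumably the property a diversity term built on HSIC would reward.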
Existing reviewer recommendation algorithms assign reviewers only by affinity score and ignore how well the research directions of reviewers match those of the papers to be reviewed. To address this problem, a reviewer recommendation algorithm based on Affinity and Research Direction Coverage (ARDC) was proposed. Firstly, the order in which papers select reviewers was determined according to the frequencies with which research directions appear in the papers and in the reviewers’ paper groups. Secondly, each reviewer’s comprehensive review score for a paper was calculated from the affinity score between the reviewer and the paper and the reviewer’s research direction coverage score for the paper, and a pre-assigned review team for each paper was obtained by round-robin scheduling. Finally, the final review team recommendation was produced after conflict-of-interest inspection and resolution. Experimental results show that, compared with assignment-based reviewer recommendation algorithms such as Fair matching via Iterative Relaxation (FairIR) and Fair and Accurate reviewer assignment in Peer Review (PR4A), the proposed algorithm increases the research direction coverage score by 38% on average at the cost of a small loss in affinity score, so the recommendation results are more accurate and reasonable.
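The scoring and round-robin steps described in the abstract above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the weighted sum with `alpha`, the coverage definition (fraction of a paper's directions that a reviewer covers), the `max_load` limit, and filtering conflicts during selection rather than in a separate resolution step. The abstract does not give the actual formulas.

```python
def coverage(paper_dirs, reviewer_dirs):
    """Fraction of the paper's research directions covered by the reviewer (assumed definition)."""
    paper_dirs = set(paper_dirs)
    if not paper_dirs:
        return 0.0
    return len(paper_dirs & set(reviewer_dirs)) / len(paper_dirs)

def assign(papers, reviewers, affinity, team_size=2, max_load=2,
           alpha=0.5, conflicts=frozenset()):
    """Round-robin pre-assignment: papers take turns picking their best available reviewer.

    papers / reviewers map a name to a set of research directions; the iteration
    order of `papers` is taken as the (already determined) selection order.
    """
    load = {r: 0 for r in reviewers}
    teams = {p: [] for p in papers}
    for _ in range(team_size):                 # one pick per paper in each round
        for p in papers:
            candidates = [r for r in reviewers
                          if r not in teams[p]
                          and load[r] < max_load
                          and (p, r) not in conflicts]
            if not candidates:
                continue
            # Hypothetical comprehensive score: weighted sum of affinity and coverage.
            best = max(candidates,
                       key=lambda r: alpha * affinity[p][r]
                       + (1 - alpha) * coverage(papers[p], reviewers[r]))
            teams[p].append(best)
            load[best] += 1
    return teams

papers = {"p1": {"clustering", "graphs"}, "p2": {"privacy"}}
reviewers = {"r1": {"clustering"}, "r2": {"graphs", "clustering"}, "r3": {"privacy"}}
affinity = {"p1": {"r1": 0.9, "r2": 0.6, "r3": 0.1},
            "p2": {"r1": 0.2, "r2": 0.3, "r3": 0.8}}
print(assign(papers, reviewers, affinity))
# {'p1': ['r2', 'r1'], 'p2': ['r3', 'r2']}
```

In this toy input, `r2` is picked first for `p1` despite a lower affinity than `r1`, because it covers both of the paper's research directions.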
To address the high computational complexity and poor scalability of mining partial periodic patterns from dynamic time series data, a partial periodic pattern mining algorithm for dynamic time series data combined with multi-scale theory, named MSI-PPPGrowth (Multi-Scale Incremental Partial Periodic Frequent Pattern), was proposed. MSI-PPPGrowth makes full use of the inherent multi-scale characteristics of time series data and introduces multi-scale theory into the mining of partial periodic patterns. Firstly, both the original data after scale division and the incremental time series data were treated as finer-grained benchmark scale datasets and mined independently. Then, the correlation between different scales was used to perform scale transformation, so as to indirectly obtain the global frequent patterns of the dynamically updated dataset. In this way, repeated scanning of the original dataset and constant adjustment of the tree structure were avoided. In addition, a new count estimation model, PJK-EstimateCount, was designed based on the Kriging method with the periodicity of time series taken into account, so as to effectively estimate the support counts of missing frequent items during scale transformation. Experimental results show that MSI-PPPGrowth has good scalability and real-time performance, and that its performance advantage is especially significant on dense datasets.
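To make the idea of mining blocks independently and then combining them more concrete, the sketch below counts the support of a partial periodic pattern (with `*` as a wildcard position) in an original block and in an incremental block. The function name and the toy series are hypothetical, and the sketch covers only plain support counting; it does not reproduce the FP-growth style tree or the Kriging-based PJK-EstimateCount model.

```python
def pattern_support(series, pattern):
    """Count period-length segments of `series` that match `pattern` ('*' matches any symbol)."""
    period = len(pattern)
    segments = [series[i:i + period]
                for i in range(0, len(series) - period + 1, period)]
    return sum(all(p == "*" or p == s for p, s in zip(pattern, seg))
               for seg in segments)

original = "abcabdaccabc"    # four segments of period 3: abc abd acc abc
increment = "abcbbc"         # two newly arrived segments: abc bbc

# When each block's length is a multiple of the period, supports simply add up,
# so the original block does not need to be rescanned after the update.
total = pattern_support(original, "a*c") + pattern_support(increment, "a*c")
assert total == pattern_support(original + increment, "a*c")
print(total)  # 4
```

If a block keeps counts only for the patterns that are frequent within it, the counts of the remaining patterns are unavailable when blocks are merged, which is presumably the gap an estimation model such as PJK-EstimateCount is meant to fill.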
To address the high matching cost of existing complex event matching methods, a complex event matching algorithm named ReCEP was proposed, which performs recursive traversal over event buffers (ordered event lists). Unlike existing methods that run an automaton over the event stream, ReCEP decomposes the constraints in a complex event query pattern into different types and then recursively verifies each type of constraint on the ordered lists. Firstly, the event instances relevant to the query pattern were cached by event type. Secondly, query filtering was applied to the event instances in the ordered lists, and a recursive traversal algorithm was given to determine the initial event instance and obtain candidate sequences. Finally, the attribute constraints of the candidate sequences were verified. Experimental tests and analysis on simulated stock transaction data show that, compared with the mainstream matching methods SASE and Siddhi, ReCEP effectively reduces the processing time of query matching, outperforms both methods overall, and improves query matching efficiency by more than 8.64%. The proposed method can therefore effectively improve the efficiency of complex event processing.
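The buffer-based recursive matching described above might look roughly like the following sketch for a simple sequence query with a time window. The event layout (`type`, `ts`, `price`), the function names and the strict-ordering and window semantics are assumptions for illustration; this is not the ReCEP implementation itself.

```python
from bisect import bisect_right

def build_buffers(stream, types):
    """Cache relevant events by type; each buffer stays ordered by timestamp."""
    buffers = {t: [] for t in types}
    for ev in stream:                       # assumes the stream arrives in timestamp order
        if ev["type"] in buffers:
            buffers[ev["type"]].append(ev)
    return buffers

def match(buffers, types, window, predicate=lambda seq: True):
    """Return all sequences types[0] -> types[1] -> ... that fit within `window`."""
    times = {t: [ev["ts"] for ev in buffers[t]] for t in types}
    results = []

    def extend(prefix, depth):
        if depth == len(types):
            if predicate(prefix):           # attribute constraints are checked last
                results.append(list(prefix))
            return
        t = types[depth]
        # Skip events that do not occur strictly after the previous one.
        start = bisect_right(times[t], prefix[-1]["ts"]) if prefix else 0
        for ev in buffers[t][start:]:
            if prefix and ev["ts"] - prefix[0]["ts"] > window:
                break                       # ordered buffer: all later events are too late
            prefix.append(ev)
            extend(prefix, depth + 1)
            prefix.pop()

    extend([], 0)
    return results

stream = [
    {"type": "A", "ts": 1, "price": 10},
    {"type": "B", "ts": 2, "price": 12},
    {"type": "A", "ts": 3, "price": 11},
    {"type": "B", "ts": 4, "price": 9},
    {"type": "C", "ts": 5, "price": 15},
]
buffers = build_buffers(stream, ["A", "B", "C"])
rising = lambda seq: all(a["price"] < b["price"] for a, b in zip(seq, seq[1:]))
print(len(match(buffers, ["A", "B", "C"], window=4)))          # 3 candidate sequences
print(len(match(buffers, ["A", "B", "C"], window=4, predicate=rising)))  # 1 after attribute check
```

Because each buffer is ordered, the temporal and window constraints prune the search with a binary search and an early `break`, and the attribute predicate only runs on sequences that already satisfy them.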
Privacy Preserving Utility Mining (PPUM) suffers from long sanitization time, high computational complexity, and significant side effects. To solve these problems, a fast sanitization algorithm based on BCU-Tree and Dictionary (BCUTD) for high-utility mining was proposed. In the algorithm, a new tree structure called BCU-Tree was presented to store sensitive item information, and a coding model based on bitwise operators was used to reduce the tree construction time and the search space. A dictionary table was used to store all nodes of the tree, so that only the dictionary table needed to be accessed when a sensitive item was modified. Finally, the sanitization process was completed. In experiments on four different datasets, BCUTD outperforms Hiding High Utility Item First (HHUIF), Maximum Sensitive Utility-MAximum item Utility (MSU-MAU), and Fast Perturbation Using Tree and Table structures (FPUTT) in terms of sanitization time and side effects. Experimental results show that BCUTD can effectively speed up the sanitization process and reduce the side effects and computational complexity.
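For readers unfamiliar with utility mining, the sketch below shows the usual definition of itemset utility (quantity times unit profit, summed over the transactions that contain the whole itemset) and what hiding a sensitive itemset means. The bitmask containment test is only an assumed stand-in for the abstract's bitwise coding model; the BCU-Tree and the dictionary table are not reproduced, and all names and data are hypothetical.

```python
def encode(items, index):
    """Encode a collection of items as an integer bitmask (one bit per item)."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask

def itemset_utility(db, itemset, profit, index):
    """Total utility of `itemset` over the transactions that contain all of its items."""
    target = encode(itemset, index)
    total = 0
    for tx in db:                                  # tx maps item -> purchased quantity
        if encode(tx, index) & target == target:   # bitwise containment test
            total += sum(tx[i] * profit[i] for i in itemset)
    return total

index = {"a": 0, "b": 1, "c": 2}
profit = {"a": 5, "b": 2, "c": 1}
db = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]

min_util = 10
sensitive = ["a", "b"]
print(itemset_utility(db, sensitive, profit, index))   # 16, so the itemset is exposed

# Sanitization modifies the database until the sensitive utility drops below min_util.
del db[0]["b"]
print(itemset_utility(db, sensitive, profit, index))   # 7, so the itemset is now hidden
```

Each such modification can also change the utilities of non-sensitive itemsets, which is the side effect that sanitization algorithms in this area try to keep small.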
Clustering results of the pSCAN (pruned Structural Clustering Algorithm for Network) algorithm are influenced by the density constraint parameter and the similarity threshold parameter. If the clustering results obtained with the user-provided parameters do not meet the user’s requirements, the user can express those requirements through instance clusters. To solve the problem of expressing clustering query requirements with instance clusters, an instance cluster-driven algorithm for calculating structural graph clustering parameters, PART, and its improved version, ImPART, were proposed. Firstly, the influences of the two clustering parameters on the clustering results were analyzed, and the correlation subgraph of the instance cluster was extracted. Secondly, the feasible interval of the density constraint parameter was obtained by analyzing the correlation subgraph, and the nodes in the instance cluster were divided into core nodes and non-core nodes according to the current density constraint parameter and the structural similarity between nodes. Finally, according to the node division result, the optimal similarity threshold parameter corresponding to the current density constraint parameter was calculated, and the obtained parameters were verified and optimized on the correlation subgraph until clustering parameters satisfying the requirements of the instance cluster were obtained. Experimental results on real datasets show that the proposed algorithms return a set of effective parameters for user instance clusters, and that the improved algorithm ImPART is more than 20% faster than the basic algorithm PART, returning the optimal clustering parameters that satisfy the requirements of instance clusters quickly and effectively.
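The way the two parameters shape the result can be seen from the structural similarity and core-node definitions used by SCAN-style algorithms, sketched below. The graph, the parameter names `eps` (similarity threshold) and `mu` (density constraint), and the function names are illustrative; the sketch shows only the definitions, not the parameter search performed by PART or ImPART.

```python
import math

def closed_neighborhood(adj, u):
    """Neighbors of u together with u itself."""
    return adj[u] | {u}

def similarity(adj, u, v):
    """Structural similarity: shared closed neighbors, normalized by the geometric mean of sizes."""
    nu, nv = closed_neighborhood(adj, u), closed_neighborhood(adj, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def is_core(adj, u, eps, mu):
    """u is a core node if at least mu closed neighbors are eps-similar to it."""
    similar = sum(1 for v in closed_neighborhood(adj, u)
                  if similarity(adj, u, v) >= eps)
    return similar >= mu

# Triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print([u for u in adj if is_core(adj, u, eps=0.8, mu=3)])  # ['a', 'b', 'c']
print([u for u in adj if is_core(adj, u, eps=0.9, mu=3)])  # []
```

Raising `eps` from 0.8 to 0.9 removes every core node in this small graph, which illustrates why parameters that reproduce a user's instance cluster have to be searched for rather than guessed.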