Existing two-stage few-shot object detection methods based on fine-tuning are insensitive to the features of new classes, so new classes are often misclassified as highly similar base classes, which degrades detection performance. To address this issue, a few-shot object detection algorithm incorporating multi-scale features and an attention mechanism was proposed, namely MA-FSOD (Few-Shot Object Detection via fusing Multi-scale and Attention mechanism). Firstly, grouped convolutions and large convolution kernels were used in the backbone network to extract more class-discriminative features, and a Convolutional Block Attention Module (CBAM) was added to achieve adaptive feature augmentation. Then, a modified pyramid network was used to achieve multi-scale feature fusion, enabling the Region Proposal Network (RPN) to accurately find Regions of Interest (RoI) and to provide more abundant high-quality positive samples at multiple scales to the classification head. Finally, a cosine classification head was used in the fine-tuning stage to reduce intra-class variance. Compared with the Few-Shot object detection via Contrastive proposal Encoding (FSCE) algorithm on the PASCAL-VOC 2007/2012 datasets, MA-FSOD improved AP50 for new classes by 5.6 percentage points; on the more challenging MSCOCO dataset, compared with Meta-Faster-RCNN, the APs for 10-shot and 30-shot were improved by 0.1 and 1.6 percentage points, respectively. Experimental results show that MA-FSOD alleviates the misclassification problem more effectively and achieves higher accuracy in few-shot object detection than some mainstream few-shot object detection algorithms.
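As a minimal sketch of the cosine classification head mentioned above: each logit is the scaled cosine similarity between an RoI feature and a class weight vector, so the feature magnitude does not affect the score. The function name `cosine_logits` and the scale value 20.0 are illustrative assumptions, not values taken from MA-FSOD.

```python
import math


def cosine_logits(feature, class_weights, scale=20.0):
    """Scaled cosine similarity between one feature vector and each class weight.

    `scale` is a hypothetical temperature; the abstract does not state one.
    """
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    f_norm = norm(feature)
    logits = []
    for w in class_weights:
        dot = sum(a * b for a, b in zip(feature, w))
        logits.append(scale * dot / (f_norm * norm(w)))
    return logits


# Two features pointing in the same direction but with different magnitudes
# receive the same logits, which is what normalization is meant to achieve.
weights = [[2.0, 0.0], [0.0, 3.0]]
small = cosine_logits([1.0, 0.0], weights)
large = cosine_logits([10.0, 0.0], weights)
```

Because both the feature and the weights are normalized, features of the same class with different magnitudes map to the same scores, which is one common motivation for using such a head to reduce intra-class variance.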
Most feature selection algorithms do not fully consider the non-uniform class distribution of data, the correlation between features, or the influence of different parameters on the feature selection results. To address these problems, a feature selection method for imbalanced data based on neighborhood tolerance mutual information and the Whale Optimization Algorithm (WOA) was proposed. Firstly, for binary and multi-class datasets in an incomplete neighborhood decision system, two kinds of feature importance for imbalanced data were defined on the basis of the upper and lower boundary regions. Then, to fully reflect the decision-making ability of features and the correlation between features, the neighborhood tolerance mutual information was developed. Finally, by integrating the feature importance of imbalanced data with the neighborhood tolerance mutual information, a Feature Selection for Imbalanced Data based on Neighborhood tolerance mutual information (FSIDN) algorithm was designed, in which the optimal parameters of the feature selection algorithm were obtained by using WOA, and a nonlinear convergence factor and an adaptive inertia weight were introduced to improve WOA and prevent it from falling into local optima. Experiments on 8 benchmark functions show that the improved WOA has good optimization performance, and feature selection experiments on 13 binary and 4 multi-class imbalanced datasets show that, compared with other related algorithms, the proposed algorithm can effectively select feature subsets with good classification performance.
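The abstract does not give the form of the nonlinear convergence factor, so the following is only an illustrative sketch. It compares the linear schedule of standard WOA, where the factor a falls from 2 to 0 over the iterations, with one possible nonlinear alternative, a cosine schedule. The cosine form and the function names are assumptions chosen for illustration and are not the authors' formula.

```python
import math


def linear_a(t, t_max):
    """Standard WOA convergence factor: decreases linearly from 2 to 0."""
    return 2.0 * (1.0 - t / t_max)


def nonlinear_a(t, t_max):
    """Hypothetical cosine schedule (an assumption, not taken from the paper).

    It also runs from 2 at t = 0 to 0 at t = t_max, but decays slowly at
    first, which keeps the search exploratory for longer.
    """
    return 1.0 + math.cos(math.pi * t / t_max)


# Compare the two schedules at a quarter of the way through the run.
quarter_linear = linear_a(25, 100)
quarter_nonlinear = nonlinear_a(25, 100)
```

Under this assumed schedule the factor stays above the linear one during the first half of the run and below it during the second half, trading later convergence speed for more early exploration.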
Precise seal segmentation benefits the intelligent application of the Republican archives. To address the problems of serious print intrusion and excessive noise, a network for seal segmentation was proposed, namely U-Net for Seal (UNet-S). Based on the encoder-decoder framework and skip connections of U-Net, the proposed network was improved in three aspects. Firstly, a multi-scale residual module was employed to replace the original convolution layer of U-Net. In this way, problems such as network degradation and gradient explosion were avoided, while multi-scale features were extracted effectively by UNet-S. Secondly, Depthwise Separable Convolution (DSConv) was used to replace the ordinary convolution in the multi-scale residual module, thereby greatly reducing the number of network parameters. Thirdly, Binary Cross Entropy Dice Loss (BCEDiceLoss) was used, with weight factors determined by experimental results, to solve the data imbalance problem of the archives of the Republic of China. Experimental results show that, compared with U-Net, DeepLab v2 and other networks, UNet-S achieves the best Dice Similarity Coefficient (DSC), mean Intersection over Union (mIoU) and Mean Pixel Accuracy (MPA), which are increased by up to 17.38%, 32.68% and 0.6%, respectively, while the number of parameters is decreased by up to 76.64%. It can be seen that UNet-S has a good segmentation effect on the dataset of Republican archives.
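A combined BCE and Dice loss of the kind named above can be sketched as a weighted sum of binary cross entropy and one minus the Dice coefficient. The weight `bce_weight=0.5`, the smoothing constant and the function name are illustrative assumptions; the abstract only says that the weight factors were determined experimentally.

```python
import math


def bce_dice_loss(probs, targets, bce_weight=0.5, eps=1e-7, smooth=1.0):
    """Weighted sum of binary cross entropy and Dice loss over flat pixel lists.

    probs:   predicted foreground probabilities in [0, 1]
    targets: ground-truth labels, 0 or 1
    bce_weight, smooth: hypothetical values, not taken from the paper
    """
    n = len(probs)
    # Clamp probabilities before the log so that 0 and 1 stay finite.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)


perfect = bce_dice_loss([1.0, 0.0, 1.0], [1, 0, 1])
inverted = bce_dice_loss([0.0, 1.0, 0.0], [1, 0, 1])
```

The Dice term depends on the overlap with the foreground rather than on per-pixel counts, which is why such a combination is often used when foreground pixels are rare.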
Existing machine learning-based methods for Distributed Denial-of-Service (DDoS) attack detection become increasingly difficult and costly to apply as network traffic grows more complex and data structures keep expanding. To address these issues, a random forest DDoS attack detection method that integrates feature selection was proposed. In this method, the mean impurity algorithm based on the Gini coefficient was used as the feature selection algorithm to reduce the dimensionality of DDoS abnormal traffic samples, thereby reducing training cost and improving training accuracy. Meanwhile, the feature selection algorithm was embedded into each single base learner of the random forest, and the feature subset search range was reduced from all features to the features corresponding to a single base learner, which improved the coupling of the two algorithms and the model accuracy. Experimental results show that, with the number of decision trees and the training sample size limited, the model trained by the proposed method increases recall by 21.8 percentage points and F1-score by 12.0 percentage points compared with the model before improvement, and both metrics are also better than those of the traditional random forest detection scheme.
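The idea of ranking features by Gini impurity decrease can be sketched as follows. For the samples seen by one base learner (for example one bootstrap sample), each feature is scored by the largest Gini impurity decrease that any single threshold split on it achieves, and the top `k` features are kept. This is a simplified illustration under that assumption; the function names and the single-split scoring are not taken from the paper.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gini_gain(values, labels):
    """Largest impurity decrease over all threshold splits on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # The largest value is skipped because it would leave the right side empty.
    for thr in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def select_features(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    gains = [best_gini_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


# Toy traffic samples: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5], [0.2, 3], [0.9, 5], [0.8, 3]]
labels = [0, 0, 1, 1]
selected, gains = select_features(rows, labels, k=1)
```

Running the selection separately on each base learner's sample, rather than once on the full dataset, mirrors the narrowed search range described above.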
The classical Monarch Butterfly Optimization (MBO) algorithm cannot handle continuous data well, and the rough set model cannot sufficiently process large-scale, high-dimensional and complex data. To address these problems, a new feature selection algorithm based on Neighborhood Rough Set (NRS) and MBO was proposed. Firstly, local disturbance, a group division strategy and the MBO algorithm were combined, and a transmission mechanism was constructed to form a Binary MBO (BMBO) algorithm. Secondly, a mutation operator was introduced to enhance the exploration ability of this algorithm, and a BMBO based on Mutation operator (BMBOM) algorithm was proposed. Then, a fitness function was developed based on the neighborhood dependence degree in NRS, and the fitness values of the initialized feature subsets were evaluated and sorted. Finally, the BMBOM algorithm was used to search for the optimal feature subset through continuous iterations, and a meta-heuristic feature selection algorithm was designed. The optimization performance of the BMBOM algorithm was evaluated on benchmark functions, and the classification performance of the proposed feature selection algorithm was evaluated on UCI datasets. Experimental results show that the proposed BMBOM algorithm is significantly better than the MBO and Particle Swarm Optimization (PSO) algorithms in terms of the optimal value, worst value, average value and standard deviation on five benchmark functions. Compared with optimized feature selection algorithms based on rough sets, feature selection algorithms combining rough sets with optimization algorithms, feature selection algorithms combining NRS with optimization algorithms, and feature selection algorithms based on binary grey wolf optimization, the proposed feature selection algorithm performs well on the three indicators of classification accuracy, number of selected features and fitness value on UCI datasets, and can select the optimal feature subset with few features and high classification accuracy.
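The neighborhood dependence degree on which the fitness function is built can be sketched with the usual NRS definition: a sample belongs to the positive region if every sample within distance `delta` of it, measured on the candidate feature subset, has the same decision label, and the dependence degree is the fraction of such samples. The Euclidean distance, the radius value and the function name are illustrative assumptions; the abstract does not give the exact form of the fitness function.

```python
import math


def neighborhood_dependency(rows, labels, subset, delta):
    """Fraction of samples whose delta-neighborhood on `subset` is label-pure."""
    if not subset:
        return 0.0

    def dist(a, b):
        # Euclidean distance restricted to the candidate feature subset.
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in subset))

    consistent = 0
    for i, x in enumerate(rows):
        # The neighborhood always contains the sample itself (distance 0).
        if all(labels[k] == labels[i]
               for k, y in enumerate(rows) if dist(x, y) <= delta):
            consistent += 1
    return consistent / len(rows)


# Feature 0 separates the two classes; feature 1 mixes them.
rows = [[0.1, 0.5], [0.15, 0.9], [0.8, 0.5], [0.85, 0.9]]
labels = [0, 0, 1, 1]
dep_f0 = neighborhood_dependency(rows, labels, subset=[0], delta=0.2)
dep_f1 = neighborhood_dependency(rows, labels, subset=[1], delta=0.2)
```

A binary search algorithm such as BMBOM would evaluate a quantity like this for each candidate subset encoded as a 0/1 vector; how it is combined with the subset size in the actual fitness function is not stated in the abstract.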
To address the high computational cost and low efficiency of the 3-point and 4-point hill climbing algorithms in enhancing the nonlinearity of a Substitution box (S-box), an algorithm named Combination of Hill Climbing (CHC), which can apply multiple swap elements at a time, was proposed. The algorithm defined the operation of swapping 2 output values of an S-box as a swap element, used a weighted prioritizing function to select the swap elements that contribute more to the enhancement of nonlinearity, and then applied multiple selected swap elements simultaneously to enhance the nonlinearity of the S-box. In the experiments, a maximum of 12 output values were swapped at a time by using the CHC algorithm, and the nonlinearity of most random 8-input and 8-output S-boxes surpassed 102, with a maximum of 106. The experimental results show that, compared with the 3-point and 4-point hill climbing algorithms, the proposed CHC algorithm not only reduces the amount of calculation, but also enhances the nonlinearity of random S-boxes more significantly.
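The quantity being optimized can be sketched as follows. The nonlinearity of an n-bit S-box is taken here as 2^(n-1) minus half the largest absolute Walsh coefficient over all nonzero output masks, and a swap element exchanges two output values. The function names are illustrative assumptions, and the weighted prioritizing function used by CHC to choose swap elements is not given in the abstract, so it is not reproduced.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of a Boolean function given as 0/1 values."""
    w = [1 - 2 * bit for bit in truth_table]  # map 0/1 to +1/-1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                a, b = w[j], w[j + h]
                w[j], w[j + h] = a + b, a - b
        h *= 2
    return w


def nonlinearity(sbox, n_bits):
    """Nonlinearity of an S-box with n_bits inputs and n_bits outputs."""
    worst = 0
    for mask in range(1, 1 << n_bits):
        # Component function: parity of the output bits selected by the mask.
        table = [bin(y & mask).count("1") & 1 for y in sbox]
        worst = max(worst, max(abs(v) for v in walsh_spectrum(table)))
    return (1 << (n_bits - 1)) - worst // 2


def apply_swap(sbox, i, j):
    """Return a copy of the S-box with the outputs at inputs i and j exchanged."""
    out = list(sbox)
    out[i], out[j] = out[j], out[i]
    return out


# The identity map is linear, so its nonlinearity is the minimum possible.
identity = list(range(16))
nl_identity = nonlinearity(identity, 4)
# An arbitrary 4-bit permutation used only to exercise the functions.
perm = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
swapped = apply_swap(perm, 0, 1)
```

A hill climbing step would call `nonlinearity` on the candidate produced by one or more swaps and keep it only if the value improves; since swaps exchange outputs, the S-box remains a permutation.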
Three general methods for detecting duplicate Web pages were introduced. A similarity search technique was used to automatically detect duplicate information in an enterprise data warehouse. The results indicate that the similarity search method is suitable for the intelligent preprocessing of enterprise intelligence data.