To address the overfitting caused by the memorization behavior of Deep Neural Networks (DNNs) on image data with noisy labels, a meta label correction method based on the predictions of shallow neural networks was proposed. In this method, under a weakly supervised training scheme, a label reweighting network was set up to reweight the noisy data, meta learning was employed so that the model could adapt dynamically to the noisy data, and the prediction outputs of both the deep and shallow networks were used as pseudo labels to train the model. At the same time, a knowledge distillation algorithm was applied so that the deep network guided the training of the shallow networks. In this way, the memorization behavior of the model was effectively alleviated and the robustness of the model was enhanced. Experiments conducted on the CIFAR10/100 and Clothing1M datasets demonstrate the superiority of the proposed method over the Meta Label Correction (MLC) method: on the CIFAR10 dataset with symmetric noise ratios of 60% and 80%, the accuracy improvements are 3.49 and 1.56 percentage points respectively. Furthermore, in ablation experiments on the CIFAR100 dataset with an asymmetric noise ratio of 40%, the proposed method achieves an accuracy improvement of up to 5.32 percentage points over models trained without the predicted labels, confirming its feasibility and effectiveness.
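The combination of deep and shallow predictions into soft pseudo labels, and the distillation signal from the deep network to a shallow network, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the function names, the mixing weight `alpha` and the temperature `t` are assumptions.

```python
import numpy as np

def softmax(z, t=1.0):
    # Temperature-scaled softmax over the last axis.
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pseudo_labels(deep_logits, shallow_logits, alpha=0.5):
    # Convex combination of the deep and shallow predictions,
    # used as soft pseudo labels for training (alpha is assumed).
    return alpha * softmax(deep_logits) + (1 - alpha) * softmax(shallow_logits)

def distillation_loss(shallow_logits, deep_logits, t=2.0):
    # Hinton-style distillation: KL(teacher || student) with softened
    # distributions, so the deep network guides the shallow one.
    p = softmax(deep_logits, t)       # teacher (deep network)
    q = softmax(shallow_logits, t)    # student (shallow network)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In a training loop, the pseudo labels would replace the noisy labels for reweighted samples, while the distillation loss is added to the shallow network's objective.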
To solve the problems of feature extraction and state prediction for intermittent, non-stationary time series in industrial settings, a new prediction approach based on Ensemble Empirical Mode Decomposition (EEMD), Principal Component Analysis (PCA) and Support Vector Machine (SVM) was proposed. Firstly, the intermittent non-stationary time series was analyzed at multiple time scales and decomposed by the EEMD algorithm into a set of Intrinsic Mode Function (IMF) components of different scales. Then, the noise energy was estimated on the basis of the 3-sigma principle to determine the cumulative contribution rate adaptively, and the PCA algorithm was used to reduce the feature dimension and redundancy and to remove the noise in the IMF components. Finally, after determining the key parameters of the SVM, the principal components were taken as input variables to predict future states. Test results on a real instance show that the Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and Mean Squared Percentage Error (MSPE) were 514.774, 78.216, 12.03% and 1.862% respectively. It is concluded that the SVM prediction of the wind farm output power time series achieves higher accuracy than prediction without PCA, because the mode mixing phenomenon is suppressed, the non-stationarity is reduced and the noise is further eliminated by the EEMD and PCA algorithms.
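The PCA step, which keeps only as many principal components as needed to reach a cumulative contribution rate, can be sketched as follows. This is a minimal NumPy illustration: in the paper the rate is determined adaptively from the 3-sigma noise-energy estimate, whereas here it is passed in as a fixed parameter `cum_rate` (an assumption), and the EEMD decomposition producing the IMF inputs is omitted.

```python
import numpy as np

def pca_reduce(X, cum_rate=0.95):
    # X: samples x features (e.g. IMF components as features).
    # Keep the fewest principal components whose cumulative
    # explained-variance ratio reaches cum_rate.
    Xc = X - X.mean(axis=0)                    # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)                  # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(var), cum_rate)) + 1
    return Xc @ Vt[:k].T, var[:k]              # scores and kept ratios
```

The returned principal-component scores would then serve as the input variables of the SVM predictor.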
Web applications face the malicious-host problem just as native applications do, so how to protect the core algorithms and main business logic of Web applications on the browser side has become a serious problem to be solved. To address the low effectiveness of existing JavaScript code protection methods against dynamic analysis and cumulative attacks, a JavaScript code Protection method based on Temporal Diversity (TDJSP) was proposed. To resist cumulative attacks, the method first makes the JavaScript program diverse at runtime by building a diversity set for the program and obfuscating its branch space. It then detects features of abnormal execution environments, such as debuggers and emulators, to increase the difficulty of dynamic analysis. Theoretical analyses and experimental results show that the method improves the resistance of JavaScript programs to reverse analysis, with a space growth rate of 3.1 (better than that of JScrambler3) and a delay at the millisecond level. Hence, the proposed method can protect Web applications effectively without excessive overhead.
Given multiple sequences, a support threshold and gap constraints, the objective is to discover frequent patterns whose supports in the multiple sequences are no less than the given threshold, where any two successive elements of a pattern fulfill the user-specified gap constraints and any two occurrences of a pattern in a given sequence satisfy the one-off condition. The existing algorithms for this problem only consider the first occurrence of each character of a pattern when computing the support of the pattern in a given sequence, so that many frequent patterns are missed. Therefore, an efficient algorithm for mining multiple sequential patterns with gap constraints, named MMSP, was proposed. Firstly, it stores the candidate positions of a pattern in a two-dimensional table; then it selects positions from the candidates according to a left-most strategy. Experiments were conducted on DNA sequences. The number of frequent patterns mined by MMSP was 3.23 times that mined by the related algorithm M-OneOffMine when the number of sequences was fixed and the sequence length varied, and on average 4.11 times that of M-OneOffMine when the number of sequences varied. The average number of patterns mined by MMSP was 2.21 and 5.24 times those mined by M-OneOffMine and MPP respectively when the number of sequences varied, and the frequent patterns mined by M-OneOffMine were a subset of those mined by MMSP. The experimental results show that MMSP mines more frequent patterns in less time and is more suitable for practical applications.
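The one-off, gap-constrained support computation with a left-most selection strategy can be sketched as follows. This is a simplified illustration, not the MMSP implementation (in particular, the two-dimensional candidate-position table is replaced by a direct greedy scan); the gap between successive matched characters is taken here as the number of intervening positions, constrained to [min_gap, max_gap], which is an assumption about the gap definition.

```python
def one_off_support(seq, pattern, min_gap=0, max_gap=3):
    # Greedy left-most matching under the one-off condition: every
    # sequence position supports at most one occurrence, and successive
    # pattern characters are separated by g intervening positions
    # with min_gap <= g <= max_gap.
    used = [False] * len(seq)
    count, start = 0, 0
    while start < len(seq):
        occ = []
        for ch in pattern:
            # Left-most admissible position for this pattern character.
            lo = occ[-1] + 1 + min_gap if occ else start
            hi = occ[-1] + 1 + max_gap if occ else len(seq) - 1
            found = next((i for i in range(lo, min(hi, len(seq) - 1) + 1)
                          if seq[i] == ch and not used[i]), None)
            if found is None:
                occ = None
                break
            occ.append(found)
        if occ:
            for i in occ:       # mark positions as consumed (one-off)
                used[i] = True
            count += 1
            start = occ[0] + 1
        else:
            start += 1
    return count
```

For example, `one_off_support("ababab", "ab", 0, 2)` counts three non-overlapping occurrences, since each matched position is consumed exactly once.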
For the localization problem in urban areas, where the Global Positioning System (GPS) cannot provide accurate locations because its signal is easily blocked by high-rise buildings, a visual localization method based on vertical building facades and a 2D building boundary map was proposed. Firstly, the vertical line features across two views captured by an onboard camera were matched into pairs. Then, the vertical building facades were reconstructed from the matched vertical line pairs. Finally, a visual localization method utilizing the reconstructed vertical building facades and the 2D building boundary map was designed under the RANdom SAmple Consensus (RANSAC) framework. The proposed method works in real, complex urban scenes. The experimental results show that the average localization error is about 3.6 m, indicating that the method can effectively improve the accuracy and robustness of self-localization for mobile robots in urban environments.
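The RANSAC-based matching of reconstructed facades against the 2D building boundary map can be illustrated with a toy example that estimates a 2D translation from point correspondences, some of which are outliers. This is a hypothetical sketch, not the paper's method: the real method matches facade geometry to map boundaries, and the function name, iteration count and inlier tolerance below are assumptions.

```python
import math
import random

def ransac_translation(src, dst, iters=200, tol=0.5, seed=0):
    # Toy RANSAC: find the 2D translation mapping src points onto dst
    # points, given noisy correspondences that include outliers.
    rng = random.Random(seed)
    best_t, best_inliers = (0.0, 0.0), -1
    for _ in range(iters):
        i = rng.randrange(len(src))            # minimal sample: 1 pair
        tx = dst[i][0] - src[i][0]
        ty = dst[i][1] - src[i][1]
        inliers = sum(                         # consensus score
            1 for (sx, sy), (dx, dy) in zip(src, dst)
            if math.hypot(sx + tx - dx, sy + ty - dy) <= tol)
        if inliers > best_inliers:
            best_inliers, best_t = inliers, (tx, ty)
    return best_t, best_inliers
```

A minimal sample of one correspondence suffices for a pure translation; a full pose estimate (translation plus heading) would need a larger minimal sample.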
Current high-dimensional data mining methods are mostly based on mathematical theory rather than visual intuition. To facilitate visual analysis and evaluation of high-dimensional data, Random Forest (RF) was introduced to visualize high-dimensional data. Firstly, supervised learning with RF was applied to obtain a proximity measure from the source data, and principal coordinate analysis was used for dimension reduction, transforming the high-dimensional data relationships into a low-dimensional space. Then scatter plots were used to visualize the data in the low-dimensional space. Experimental results on high-dimensional gene datasets show that supervised dimension reduction based on RF clearly illustrates the discrimination between class distributions and outperforms traditional unsupervised dimension reduction.
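The principal coordinate analysis step applied to an RF proximity matrix can be sketched as follows, assuming the common convention that the squared dissimilarity between two samples is 1 minus their proximity. The RF training that produces the proximity matrix is omitted, and the function name `pcoa` is an assumption.

```python
import numpy as np

def pcoa(proximity, k=2):
    # Classical multidimensional scaling (principal coordinate analysis)
    # on an RF proximity matrix: squared distance d^2 = 1 - proximity,
    # double-centre, then embed on the top-k eigenvectors.
    P = np.asarray(proximity, dtype=float)
    D2 = 1.0 - P                          # squared-dissimilarity matrix
    n = len(P)
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ D2 @ J                 # Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]         # largest eigenvalues first
    w = np.clip(w[idx], 0.0, None)        # guard tiny negative eigenvalues
    return V[:, idx] * np.sqrt(w)         # k-dimensional coordinates
```

The returned 2D coordinates are what the scatter plots would display, with point colors taken from the class labels.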