As a multi-task meta learning algorithm, Model Agnostic Meta Learning (MAML) can use different models and adapt quickly to different tasks, but it still needs improvement in training speed and accuracy. The principle of MAML was analyzed from the perspective of the Gaussian stochastic process, and a new Model Agnostic Meta Learning algorithm based on a Bayesian Weight function (BW-MAML) was proposed, in which the weights were assigned by Bayesian analysis. In the training process of BW-MAML, each sampled task was regarded as following a Gaussian distribution, the importance of a task was determined by its probability under that distribution, and the weights were then assigned according to importance, thereby improving the utilization of information in each gradient descent step. Few-shot image learning experiments on the Omniglot and Mini-ImageNet datasets show that, after 2 500 training steps with 6 tasks, the accuracy of BW-MAML is at most 1.9 percentage points higher than that of MAML, and the final accuracy of BW-MAML is 0.907 percentage points higher than that of MAML on Mini-ImageNet on average; the accuracy of BW-MAML on Omniglot is also improved by up to 0.199 percentage points on average.
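A minimal sketch of the task-weighting idea described above (our illustration, not the authors' code): each sampled task's loss is treated as a draw from a Gaussian, and tasks are weighted by their density under that distribution before the meta-update. The function name and the fit of the Gaussian's parameters are assumptions.

```python
# Illustrative sketch of Bayesian task weighting for a MAML-style outer loop.
import numpy as np

def bayesian_task_weights(task_losses):
    """Treat each sampled task's loss as drawn from a Gaussian and
    weight tasks by their density under that distribution (assumed form)."""
    losses = np.asarray(task_losses, dtype=float)
    mu, sigma = losses.mean(), losses.std() + 1e-8
    density = np.exp(-0.5 * ((losses - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return density / density.sum()          # normalize to sum to 1

# Outer-loop meta-gradient would be the weighted sum of per-task gradients.
losses = [0.9, 1.1, 0.7, 2.4, 1.0, 0.8]     # losses of 6 sampled tasks (made up)
w = bayesian_task_weights(losses)
print(w)  # low-density (atypical) tasks receive smaller weight
```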
Traditional clustering methods operate in the original data space, and the data to be clustered is often high-dimensional. In order to solve these two problems, a new binary image clustering method, Clustering based on Discrete Hashing (CDH), was proposed. To reduce the dimension of the data, the L2,1-norm was used in this framework to realize adaptive feature selection. At the same time, the data was mapped into the binary Hamming space by the hashing method. Then, the sparse binary matrix was decomposed into a low-rank matrix in the Hamming space to complete fast image clustering. Finally, an optimization scheme with fast convergence was used to solve the objective function. Experimental results on image datasets (Caltech101, Yale, COIL20, ORL) show that this method can effectively improve the efficiency of clustering. Compared with the traditional clustering methods, such as K-means and Spectral Clustering (SC), the time efficiency of CDH was improved by 87 and 98 percentage points respectively in the Gabor view of the Caltech101 dataset when processing high-dimensional data.
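For reference, a minimal sketch of the L2,1-norm that CDH reportedly uses as its feature selection regularizer (illustrative only):

```python
# The L2,1-norm of a matrix W: sum of the L2 norms of its rows.
# Minimizing it promotes row sparsity, i.e., zeroing out entire features.
import numpy as np

def l21_norm(W):
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

W = np.array([[0.0, 0.0], [3.0, 4.0]])
print(l21_norm(W))  # 5.0 -- the first row (feature) contributes nothing
```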
In recent years, action recognition by audio-visual joint learning has received increasing attention. Whether in video (visual modality) or audio (auditory modality), the occurrence of an action is instantaneous, and only the information within the action's time period significantly expresses the action category. How to make better use of the salient information carried by the key frames of the audio-visual modalities is one of the problems to be solved in audio-visual action recognition. To address this problem, a key frame screening network, KFIA-S, was proposed. Through a linear temporal attention mechanism based on the fully connected layer, different weights were given to the audio-visual information at different times, so as to screen the audio-visual features beneficial to video classification, reduce redundant information, suppress background interference, and improve the accuracy of action recognition. The effect of temporal attention of different intensities on action recognition was also studied. Experiments on the ActivityNet dataset show that the KFIA-S network achieves State-Of-The-Art (SOTA) recognition accuracy, which proves the effectiveness of the proposed method.
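A minimal sketch of a linear temporal attention layer of the kind described, built from a single fully connected layer (the exact form used by KFIA-S is our assumption; it is not given in the abstract):

```python
# Illustrative per-frame attention: one FC layer scores each time step,
# softmax normalizes the scores, and features are pooled by the weights.
import torch
import torch.nn as nn

class LinearTemporalAttention(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)     # one score per time step

    def forward(self, x):                    # x: (batch, time, feat_dim)
        scores = self.fc(x).squeeze(-1)      # (batch, time)
        weights = torch.softmax(scores, dim=1)
        return (weights.unsqueeze(-1) * x).sum(dim=1)  # weighted pooling

feats = torch.randn(2, 16, 512)              # 16 frames of fused A/V features
pooled = LinearTemporalAttention(512)(feats)
print(pooled.shape)                          # torch.Size([2, 512])
```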
Ride-sharing application systems can reduce traffic congestion and alleviate parking space shortages by increasing the utilization rate of cars' available seat capacity, thus improving social and environmental benefits. Effective real-time driver-passenger matching and optimization technology is one of the core components of a successful ride-sharing system. Role-Based Collaboration (RBC) is an emerging methodology to facilitate an organizational structure, provide orderly system behavior, and coordinate activities within a system. In order to reduce the dynamic real-time matching time of passengers and drivers and improve matching efficiency, a method combining RBC and the Environment-Class, Agent, Role, Group and Object (E-CARGO) model was proposed to formalize the ride-sharing problem. To improve the utilization rate of available seat capacity, maximize platform revenue, and rationalize resource allocation under the constraints of total resource capacity and given profit, modeling and simulation experiments for the ride-sharing matching method were conducted. The experimental results show that the proposed formal method based on the E-CARGO model can be applied to the modeling of the ride-sharing matching problem, and the optimal matching matrix and time can be obtained by the Kuhn-Munkres (K-M) algorithm and the ILOG software package in Java. The simulation results show that the average time of the K-M algorithm is reduced by at least 21% compared to that of the ILOG software package. When the agent size exceeds a certain value (more than 600), the time consumption of the proposed algorithm increases sharply.
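A minimal sketch of the assignment step (illustrative): driver-passenger matching can be solved as maximum-weight bipartite matching with the Kuhn-Munkres algorithm, here via SciPy's linear_sum_assignment; the benefit matrix is invented for the example.

```python
# Maximum-weight matching of drivers to passengers via the Hungarian / K-M method.
import numpy as np
from scipy.optimize import linear_sum_assignment

# benefit[i][j]: platform revenue if driver i serves passenger j (made-up values)
benefit = np.array([[4.0, 1.0, 3.0],
                    [2.0, 0.0, 5.0],
                    [3.0, 2.0, 2.0]])

rows, cols = linear_sum_assignment(benefit, maximize=True)
print(list(zip(rows, cols)), benefit[rows, cols].sum())  # optimal pairs, total revenue
```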
Since the introduction of MonoDepth2, unsupervised monocular ranging has made great progress in the visible light domain. However, visible light is not applicable in some scenes, such as at night and in low-visibility environments. Infrared thermal imaging can obtain clear target images at night and under low-visibility conditions, so it is necessary to estimate depth from infrared images. However, due to the different characteristics of visible and infrared images, it is unreasonable to migrate existing monocular depth estimation algorithms directly to infrared images. Therefore, an infrared monocular ranging algorithm based on multiscale feature fusion was proposed by improving the MonoDepth2 algorithm. A new edge loss function was designed for the low-texture characteristic of infrared images to reduce pixel mismatch during image reprojection. Previous unsupervised monocular ranging methods simply upsample the four-scale depth maps to the original image resolution to calculate projection errors, ignoring the correlation between scales and the differences in contribution across scales. A weighted Bi-directional Feature Pyramid Network (BiFPN) was applied to the feature fusion of multiscale depth maps to resolve the blurring of depth map edges. In addition, the Residual Network (ResNet) structure was replaced by the Cross Stage Partial Network (CSPNet) to reduce network complexity and increase operation speed. The experimental results show that the edge loss is more suitable for infrared image ranging, resulting in better depth map quality. After adding the BiFPN structure, the edges of the depth maps are clearer. After replacing ResNet with CSPNet, the inference speed is improved by about 20 percentage points. The proposed algorithm can accurately estimate the depth of infrared images, solving the problem of depth estimation in low-light night scenes and some low-visibility scenes, and its application can also reduce the cost of assisted driving to a certain extent.
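A minimal sketch of an edge-aware loss of the kind described (an assumption about the form of the paper's edge loss, modeled on common edge-aware smoothness terms): depth gradients are penalized less where the infrared frame itself has strong edges.

```python
# Illustrative edge-aware loss: depth gradients weighted by image gradients.
import torch

def edge_loss(depth, image):
    """depth: (B,1,H,W) predicted depth; image: (B,1,H,W) grayscale IR frame."""
    def grads(t):
        dx = t[..., :, 1:] - t[..., :, :-1]
        dy = t[..., 1:, :] - t[..., :-1, :]
        return dx, dy
    ddx, ddy = grads(depth)
    idx, idy = grads(image)
    # down-weight depth gradients where the image itself has strong edges
    return (ddx.abs() * torch.exp(-idx.abs())).mean() + \
           (ddy.abs() * torch.exp(-idy.abs())).mean()

d = torch.rand(1, 1, 64, 64, requires_grad=True)
i = torch.rand(1, 1, 64, 64)
print(edge_loss(d, i))
```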
In order to solve the occlusion problem of student expression recognition in complex classroom scenes, and to give full play to the advantages of deep learning in intelligent teaching evaluation, a student expression recognition model and an intelligent teaching evaluation algorithm based on a deep attention network for classroom teaching videos were proposed. A video library, an expression library and a behavior library for classroom teaching were constructed; then multi-channel facial images were generated by cropping and occlusion strategies. A multi-channel deep attention network was built, and a self-attention mechanism was used to assign different weights to the channel networks. The weight distribution of each channel was restricted by a constrained loss function, and the global feature of the facial image was expressed as the attention-weighted average of the channel features: the sum over channels of each feature multiplied by its attention weight, divided by the sum of the attention weights of all channels. Based on the learned global facial feature, student expressions in the classroom were classified, realizing student facial expression recognition under occlusion. An intelligent teaching evaluation algorithm integrating student facial expressions and behavior states in the classroom was proposed, which realized the recognition of student facial expressions and intelligent teaching evaluation in classroom teaching videos. Experimental comparison and analysis on the public dataset FERplus and self-built classroom teaching video datasets verify that the student facial expression recognition model for classroom teaching videos achieves a high accuracy of 87.34%, and the intelligent teaching evaluation algorithm integrating student facial expressions and behavior states achieves excellent performance on the classroom teaching video dataset.
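Written out, the global feature described above is an attention-weighted average (notation is ours: f_k is the feature of channel k, alpha_k its attention weight, K the number of channels):

```latex
F_{\text{global}} = \frac{\sum_{k=1}^{K} \alpha_k f_k}{\sum_{k=1}^{K} \alpha_k}
```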
In order to solve the problems of small targets, heavy noise, and many logo types in vehicle logo recognition on traffic roads, a method combining a deep-learning-based target detection algorithm with a morphology-based template matching algorithm was proposed, and a recognition system with high accuracy and the ability to handle new types of vehicle logos was designed. First, K-Means++ was used to re-cluster the anchor box values, and a residual network was introduced into YOLOv4 for one-step positioning of the vehicle logo. Second, a binary vehicle logo template library was built by preprocessing and segmenting standard vehicle logo images. Then, the positioned vehicle logo was preprocessed by Multi-Scale Retinex with Color Restoration (MSRCR), OTSU binarization, etc. Finally, the Hamming distance between the processed vehicle logo and each standard vehicle logo in the template library was calculated and the best match was found. In the vehicle logo detection experiment, the improved YOLOv4 achieves a higher accuracy of 99.04% than the original YOLOv4, the two-stage positioning method based on license plate position, and the positioning method based on radiator grille background; its speed is slightly lower than that of the original YOLOv4 but higher than those of the other two methods, reaching 50.62 frames per second (fps). In the vehicle logo recognition experiment, the recognition accuracy of morphological template matching reaches 91.04%, higher than those of traditional Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP) and convolutional neural network methods. Experimental results show that the deep-learning-based vehicle logo detection algorithm has higher accuracy and faster speed, and the morphological template matching method maintains high recognition accuracy under illumination changes and noise pollution.
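A minimal sketch of the template matching step (illustrative; the random arrays here are stand-ins for the binary logo template library):

```python
# Hamming-distance matching of a binarized logo against a binary template library.
import numpy as np

def hamming_distance(a, b):
    """Number of differing pixels between two equal-shape binary images."""
    return int(np.count_nonzero(a != b))

def best_match(logo, template_library):
    """Return the library key whose template is closest in Hamming distance."""
    return min(template_library, key=lambda k: hamming_distance(logo, template_library[k]))

rng = np.random.default_rng(0)
library = {name: rng.integers(0, 2, (32, 32)) for name in ("A", "B", "C")}
query = library["B"].copy(); query[0, :4] ^= 1   # a slightly corrupted "B"
print(best_match(query, library))                # -> 'B'
```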
In view of the problems that classroom teaching scenes suffer from severe occlusion and contain numerous students, that current video action recognition algorithms are not suitable for classroom teaching scenes, and that there is no public dataset of student classroom actions, a classroom teaching video library and a student classroom action library were constructed, and a real-time multi-person student classroom action recognition algorithm based on a deep spatiotemporal residual convolutional neural network was proposed. Firstly, real-time object detection and tracking were combined to obtain the real-time picture stream of each student; then the deep spatiotemporal residual convolutional neural network was used to learn the spatiotemporal characteristics of each student's actions, so as to realize real-time recognition of classroom behavior for multiple students in classroom teaching scenes. In addition, an intelligent teaching evaluation model was constructed, and an intelligent teaching evaluation system based on the recognition of students' classroom actions was designed and implemented, which can help improve teaching quality and realize intelligent education. Experimental comparison and analysis on the classroom teaching video dataset verify that the proposed real-time classroom action recognition model for multiple students achieves a high accuracy of 88.5%, and the intelligent teaching evaluation system based on classroom action recognition also achieves good results on the classroom teaching video dataset.
In order to strengthen the monitoring of elderly people and reduce the safety risks caused by falls, a new indoor fall detection algorithm based on Res2Net-YOLACT and fused features was proposed. For video image sequences, the YOLACT network integrated with the Res2Net module was first used to extract the human body contour, and then a two-level judgment method was used to make the fall decision. At the first level, whether an abnormal state occurs was judged roughly through the movement speed feature; at the second level, the human body posture was determined through a model structure combining body shape features and deep features. Finally, when a fall posture was detected and its duration exceeded the threshold, a fall alarm was given. Experimental results show that the proposed fall detection algorithm can extract the human body contour well in complex scenes, has good robustness to illumination, and achieves a real-time performance of up to 28 frames per second (fps). In addition, the classification performance of the algorithm is better after adding handcrafted features: the classification accuracy is 98.65%, which is 1.03 percentage points higher than that of the algorithm using original CNN (Convolutional Neural Network) features alone.
Non-negative Matrix Tri-Factorization (NMTF) is an important part of the latent factor model. Because this algorithm decomposes the original data matrix into three mutually constrained latent factor matrices, it has been widely used in research fields such as recommender systems and transfer learning. However, there has been no research on the interpretability of non-negative matrix tri-factorization. In view of this, by regarding user comment text information as prior knowledge, a Partially Explainable Non-negative Matrix Tri-Factorization (PE-NMTF) algorithm was designed based on prior knowledge. Firstly, sentiment analysis technology was used to extract the emotional polarity preferences from user comment text information. Then, the objective function and updating formulas of the non-negative matrix tri-factorization algorithm were changed to embed the prior knowledge into the algorithm. Finally, extensive experiments were carried out on the Yelp and Amazon datasets for the cold-start task of recommender systems and on the AwA and CUB datasets for the image zero-shot task, comparing the proposed algorithm with non-negative matrix factorization and non-negative matrix tri-factorization algorithms. The experimental results show that the proposed algorithm performs well on Root Mean Square Error (RMSE), Normalized Discounted Cumulative Gain (NDCG), Normalized Mutual Information (NMI), and ACCuracy (ACC), and the feasibility and effectiveness of embedding prior knowledge into non-negative matrix tri-factorization were verified.
In order to reduce the huge labeling cost of person re-identification, a one-shot video-based person re-identification method with multi-loss learning and a joint metric was proposed. Aiming at the problem that the number of labeled samples is small and the resulting model is not robust enough, a Multi-Loss Learning (MLL) strategy was proposed: in each training round, different loss functions were used for different data to optimize and improve the discriminative ability of the model. Secondly, a Joint Distance Metric (JDM) was proposed for label estimation, which combined the sample distance and the nearest-neighbor distance to further improve the accuracy of pseudo-label prediction. JDM alleviated the low accuracy of label estimation for unlabeled data and the training instability caused by unlabeled data not being fully utilized. Experimental results show that, compared with the one-shot Progressive Learning (PL) method, the rank-1 accuracy of the proposed method reaches 65.5% and 76.2% on the MARS and DukeMTMC-VideoReID datasets when the ratio of pseudo-label samples added per iteration is 0.10, improvements of 7.6 and 5.2 percentage points, respectively.
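One possible reading of the joint metric, sketched below (the abstract gives no formula; the weighted combination of the direct sample distance with a distance routed through each labeled sample's nearest peer is our assumption, as is the balancing weight lam):

```python
# Illustrative joint distance for pseudo-label estimation (assumed form).
import numpy as np

def joint_distance(query, labeled_feats, lam=0.5):
    """Blend the direct query-to-sample distance with the distance from the
    query to each labeled sample's nearest labeled neighbor."""
    d = np.linalg.norm(labeled_feats - query, axis=1)        # sample distance
    pairwise = np.linalg.norm(labeled_feats[:, None] - labeled_feats[None], axis=2)
    np.fill_diagonal(pairwise, np.inf)
    nn_idx = pairwise.argmin(axis=1)                         # each sample's nearest peer
    d_nn = np.linalg.norm(labeled_feats[nn_idx] - query, axis=1)
    return lam * d + (1 - lam) * d_nn

feats = np.random.rand(10, 128)                              # labeled gallery features
query = np.random.rand(128)                                  # unlabeled tracklet feature
print(np.argmin(joint_distance(query, feats)))               # pseudo-label source
```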
Most existing network embedding methods only preserve the local structure information of the network while ignoring other potential information. In order to preserve the community information of the network and reflect the multi-granularity characteristics of the network community structure, a network Embedding method based on Multi-Granularity Community information (EMGC) was proposed. Firstly, the network's multi-granularity community structure was obtained, and the node embeddings and community embeddings were initialized. Then, according to the node embeddings at the previous granularity level and the community structure at the current granularity level, the community embeddings were updated and the corresponding node embeddings were adjusted. Finally, the node embeddings under different community granularities were concatenated to obtain the network embedding that fuses community information of different granularities. Experiments were carried out on four real network datasets. Compared with methods that do not consider community information (DeepWalk, node2vec) and methods that consider single-granularity community information (ComE, GEMSEC), EMGC's AUC value on link prediction and F1 score on node classification are generally better than those of the comparison methods. The experimental results show that EMGC can effectively improve the accuracy of subsequent link prediction and node classification.
The Colored Traveling Salesman Problem (CTSP) is a variant of the Multiple Traveling Salesmen Problem (MTSP) and the Traveling Salesman Problem (TSP), which can be applied to engineering problems such as Multi-machine Engineering Systems (MES) with overlapping workspaces. CTSP is an NP-complete problem; although related studies have attempted to solve it with the Genetic Algorithm (GA), the Simulated Annealing (SA) algorithm and other methods, they only solve the problem at a limited scale, with unsatisfactory speed and solution quality. Therefore, a hybrid ITÖ algorithm combining Uniform Design (UD), Ant Colony Optimization (ACO) and the ITÖ algorithm, namely UDHITÖ, was proposed to solve this problem. UD was applied to choose an appropriate parameter combination for the UDHITÖ algorithm, the probabilistic graphical model of ACO was used to generate feasible solutions, and the drift and volatility operators of ITÖ were used to optimize the solutions. Experimental results show that the UDHITÖ algorithm outperforms the traditional GA, ACO and ITÖ algorithms on CTSP instances of various scales in terms of both best and average solutions.
Traditional stock prediction methods are mostly based on time-series models, which ignore the complex relations among stocks, and these relations often go beyond pairwise connections, such as stocks in the same industry or multiple stocks held by the same fund. To solve this problem, a stock trend prediction method based on a temporal HyperGraph Convolutional neural Network (HGCN) was proposed, and a hypergraph model based on financial investment facts was constructed to fit the multiple relations among stocks. The model is composed of two major components: a Gated Recurrent Unit (GRU) network and an HGCN. The GRU network performs time-series modeling on historical data to capture long-term dependencies, while the HGCN models high-order relations among stocks to learn intrinsic relation attributes, introducing the multiple relation information among stocks into traditional time-series modeling for end-to-end trend prediction. Experiments on a real dataset of the China A-share market show that, compared with existing stock prediction methods, the proposed model improves prediction performance: for example, compared with the GRU network, it achieves relative increases in ACC and F1_score of 9.74% and 8.13% respectively, and is more stable. In addition, simulation back-testing results show that the trading strategy based on the proposed model is more profitable, with an annual return of 11.30%, which is 5 percentage points higher than that of the Long Short-Term Memory (LSTM) network.
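A minimal sketch of one hypergraph convolution step in the standard form (e.g. as in Feng et al.'s HGNN; the paper's exact layer may differ), applied to GRU-encoded stock features:

```python
# Illustrative hypergraph convolution over a stock hypergraph whose incidence
# matrix H marks which stocks (rows) join which relations (hyperedges, columns).
import numpy as np

def hypergraph_conv(X, H, Theta):
    Dv = np.diag(H.sum(axis=1))          # node (stock) degrees
    De = np.diag(H.sum(axis=0))          # hyperedge degrees
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    A = Dv_inv_sqrt @ H @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt
    return np.tanh(A @ X @ Theta)        # nonlinearity is an assumption

X = np.random.rand(5, 8)                 # 5 stocks, 8-dim GRU-encoded features
H = np.array([[1, 0], [1, 1], [0, 1],   # two hyperedges, e.g. "same industry"
              [1, 0], [0, 1]], float)   # and "held by the same fund"
print(hypergraph_conv(X, H, np.random.rand(8, 4)).shape)  # (5, 4)
```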
Spam recognition is one of the main tasks in natural language processing. Traditional methods are based on text features or word frequency, and their recognition accuracy mainly depends on the presence of specific keywords. When there are no keywords in the spam or keywords are recognized incorrectly, the traditional methods perform poorly. Therefore, neural network-based methods were proposed, with recognition training and testing conducted on complex spam. Spam samples that could not be recognized by traditional methods were collected, and the same amount of normal information was randomly selected from the spam message, advertisement and spam email datasets to form three new datasets without duplicate data. Three models based on convolutional neural networks and recurrent neural networks were proposed and tested on the three new datasets for spam recognition. The experimental results show that the neural network-based models learned better semantic features from the text and achieved accuracies of more than 98% on all three datasets, significantly higher than those of traditional methods such as Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM). The experimental results also show that different neural networks are suitable for text classification at different lengths: models composed of recurrent neural networks are good at recognizing sentence-length text, models composed of convolutional neural networks are good at recognizing paragraph-length text, and models combining both are good at recognizing chapter-length text.
Concerning the shortcoming that current feature-weighted Fuzzy Support Vector Machines (FSVM) only consider the influence of feature weights on the membership function while ignoring their application to the kernel function calculation during sample training, a new FSVM algorithm that considers the influence of feature weights on both the membership function and the kernel function calculation was proposed, namely Doubly Feature-Weighted FSVM (DFW-FSVM). Firstly, the relative weight of each feature was calculated using Information Gain (IG). Secondly, the weighted Euclidean distance between each sample and the class center was calculated in the original space based on the feature weights, and the membership function was constructed from this weighted Euclidean distance; at the same time, the feature weights were applied to the kernel function calculation in the sample training process. Finally, the DFW-FSVM algorithm was constructed from the weighted membership functions and kernel functions. In this way, DFW-FSVM avoids being dominated by trivially relevant or irrelevant features. Comparative experiments were carried out on eight UCI datasets, and the results show that, compared with the best results of SVM, FSVM, Feature-Weighted SVM (FWSVM), Feature-Weighted FSVM (FWFSVM) and FSVM based on Centered Kernel Alignment (CKA-FSVM), the accuracy and F1 value of the DFW-FSVM algorithm increase by 2.33 and 5.07 percentage points respectively, indicating that the proposed DFW-FSVM has good classification performance.
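A minimal sketch of the doubly feature-weighted idea (illustrative; the membership mapping and the RBF form are our assumptions): the same information-gain-based weights enter both the distance to the class center and the kernel.

```python
# Feature weights w applied twice: in the membership distance and in the kernel.
import numpy as np

def weighted_distance(x, center, w):
    return np.sqrt(np.sum(w * (x - center) ** 2))

def weighted_rbf_kernel(x, z, w, gamma=1.0):
    return np.exp(-gamma * np.sum(w * (x - z) ** 2))

w = np.array([0.7, 0.2, 0.1])            # e.g. information-gain-based weights
x, z, center = np.random.rand(3, 3)      # two samples and a class center
membership = 1.0 / (1.0 + weighted_distance(x, center, w))  # assumed mapping
print(membership, weighted_rbf_kernel(x, z, w))
```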
Traditional research on preference reasoning and preference query mainly focuses on the preference of a single object represented by a relational tuple, and it is a challenge to extend temporal conditional preference queries to sequences extracted from data streams. The main problems are extracting sequences from the data stream and processing them rapidly to obtain the dominant sequences and dominant objects. For preference data streams, the Continuous Query Language (CQL) was first extended and a special query language named StreamSeq was proposed to deal effectively with temporal conditional preferences on data streams, allowing temporal conditional preference specification and reasoning over the sequences extracted from the stream. Then, an algorithm for extracting object sequences from the data stream by temporal index and an algorithm for dominance comparison between sequences were designed, returning the dominant sequences satisfying the preference conditions for the input data stream. Finally, two datasets were used for experimental verification. On the synthetic dataset, when the number of generated attributes, the number of sequences, the time range and the time sliding interval were 10, 8, 20 s and 1 s, the running time speedup ratio of the sequence extraction algorithm over the equivalent CQL algorithm was 13.33; on the real dataset, when the time range and time sliding interval were 40 s and 1 s, the running time speedup ratios of the dominance comparison algorithm over mintopK, partition, and incpartition were 10.77, 6.46 and 5.69, respectively. Experimental results show that, compared with other preference query algorithms, the proposed method needs less running time and obtains results more efficiently.
Concerning the characteristics of breast cancer in Magnetic Resonance Imaging (MRI), such as varied shapes and sizes and fuzzy boundaries, a segmentation algorithm based on a multiscale residual U Network (UNet) with an attention mechanism was proposed to avoid mis-segmentation and improve segmentation accuracy. Firstly, multiscale residual units were used to replace the two adjacent convolution blocks in the down-sampling stage of UNet, so that the network could pay more attention to differences in shape and size. Then, in the up-sampling stage, cross-layer attention was used to guide the network to focus on key regions, avoiding mis-segmentation of healthy tissue. Finally, in order to enhance the representation of lesions, atrous spatial pyramid pooling was introduced as a bridging module into the network. Compared with UNet, the proposed algorithm improved the Dice coefficient, Intersection over Union (IoU), SPecificity (SP) and ACCuracy (ACC) by 2.26, 2.11, 4.16 and 0.05 percentage points, respectively. The experimental results show that the algorithm can improve the segmentation accuracy of lesions and effectively reduce the false positive rate of imaging diagnosis.
Focusing on the imbalance between local and global optimization in the Artificial Fish Swarm Algorithm (AFSA) and its inability to escape local optima, an Adaptive AFSA utilizing Gene Exchange (AAFSA-GE) was proposed. Firstly, an adaptive mechanism for view and step was utilized to enhance search speed and accuracy. Then, chaotic behavior and gene exchange behavior were employed to improve the ability to escape local optima and the search efficiency. Ten classic test functions were used to prove the feasibility and robustness of the proposed algorithm by comparing it with three other modified AFSAs: the Normative Fish Swarm Algorithm (NFSA), the FSA optimized by the PSO algorithm with Extended Memory (PSOEM-FSA), and the Comprehensive Improvement of Artificial Fish Swarm Algorithm (CIAFSA). Experimental results show that AAFSA-GE achieves better local and global search ability than PSOEM-FSA and CIAFSA, and better search efficiency and global search ability than NFSA.
In deep learning, as the depth of a Convolutional Neural Network (CNN) increases, more and more data is required for training, but gene structure variation is a small-sample event in large-scale genetic data, resulting in a severe shortage of image data of variant genes, which seriously affects the training of CNNs and causes poor detection precision and high false positive rates for gene structure variation. In order to increase the number of gene structure variation samples and improve the precision of CNNs in identifying gene structure variation, a gene image data augmentation method based on the Generative Adversarial Network (GAN), namely GeneGAN, was proposed. Firstly, initial gene image data was generated by the Reads stacking method and divided into two datasets: variant gene images and non-variant gene images. Secondly, GeneGAN was used to augment the variant image samples to balance the positive and negative datasets. Finally, a CNN was used to detect the datasets before and after augmentation, with precision, recall and F1 score as measurement indicators. Experimental results show that, compared with traditional augmentation methods, GAN-based augmentation methods and feature extraction methods, the F1 score of GeneGAN is improved by 1.94 to 17.46 percentage points, verifying that GeneGAN can improve the precision of CNNs in identifying gene structure variation.
Nitrogen oxide (NOx) is one of the main pollutants in the regenerated flue gas of the Fluid Catalytic Cracking (FCC) unit, and accurate prediction of NOx emission can effectively help refinery enterprises avoid pollution events. Because of the non-stationarity, nonlinearity and long-memory characteristics of pollutant emission data, a new hybrid model incorporating Ensemble Empirical Mode Decomposition (EEMD) and the Long Short-Term Memory network (LSTM) was proposed to improve the prediction accuracy of pollutant emission concentration. The NOx emission concentration data was first decomposed into several Intrinsic Mode Functions (IMFs) and a residual by the EEMD model. Based on the correlation analysis between the IMF sub-sequences and the original data, IMF sub-sequences with low correlation were eliminated, effectively reducing the noise in the original data. The remaining IMFs were divided into high- and low-frequency sequences, which were trained in LSTM networks of different depths. The final NOx concentration prediction was reconstructed from the predicted results of all sub-sequences. Compared with a single LSTM in the NOx emission prediction of the FCC unit, the EEMD-LSTM model reduced the Mean Square Error (MSE) and Mean Absolute Error (MAE) by 46.7% and 45.9% respectively, and improved the coefficient of determination (R2) by 43%, which means the proposed model achieves higher prediction accuracy.
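A minimal sketch of the decomposition step, assuming the third-party PyEMD package (`pip install EMD-signal`); the paper's own parameters and pipeline are not given in the abstract. Each resulting IMF would then be fed to an LSTM of suitable depth and the forecasts summed for reconstruction.

```python
# EEMD decomposition of a toy NOx-like series into IMFs plus a residue.
import numpy as np
from PyEMD import EEMD

t = np.linspace(0, 1, 500)
nox = np.sin(30 * t) + 0.5 * np.sin(4 * t) + 0.1 * np.random.randn(500)  # toy series

eemd = EEMD(trials=100)            # ensemble size (assumed value)
imfs = eemd.eemd(nox)              # rows: IMFs, ordered high to low frequency
residue = nox - imfs.sum(axis=0)   # what remains after removing all IMFs
print(imfs.shape)                  # e.g. (n_imfs, 500)
```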
In online kernel regression learning, the inverse of the kernel matrix needs to be calculated whenever a new sample arrives, with a computational complexity of at least the square of the number of rounds. The idea of applying the sketching method to hypothesis updating was introduced, and a more efficient online kernel regression algorithm via sketching was proposed. Firstly, with the loss function set to the square loss, a new gradient descent algorithm called FTL-Online Kernel Regression (F-OKR) was proposed, using the Nyström method to approximate the kernel and applying the idea of Follow-The-Leader (FTL). Then, the sketching method was used to accelerate F-OKR, reducing its computational complexity to linear in the number of rounds and the sketch size, and quadratic in the data dimension. Finally, an efficient online kernel regression algorithm called Sketched Online Kernel Regression (SOKR) was designed. Compared to F-OKR, SOKR achieves nearly the same accuracy while reducing the runtime by about 16.7% on some datasets. Sub-linear regret bounds of both algorithms were proved, and experimental results on standard regression datasets also verify that the algorithms outperform the Nyström Online Gradient Descent (NOGD) algorithm, with the average loss over all datasets reduced by about 64%.
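For reference, a minimal sketch of the standard Nyström kernel approximation that the abstract says F-OKR builds on (not the authors' implementation):

```python
# Rank-m Nystrom approximation of an RBF kernel matrix from m landmark points.
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
m = 20                                         # sketch / landmark size
idx = rng.choice(len(X), m, replace=False)
C = rbf(X, X[idx])                             # (n, m) cross-kernel
W = rbf(X[idx], X[idx])                        # (m, m) landmark kernel
K_approx = C @ np.linalg.pinv(W) @ C.T         # rank-m approximation of K
print(np.linalg.norm(rbf(X, X) - K_approx) / np.linalg.norm(rbf(X, X)))
```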
In practical classification tasks such as image annotation and disease diagnosis, there is usually a hierarchical structural relationship between the classes in the label space of data with high feature dimensionality. Many hierarchical feature selection algorithms have been proposed for different practical tasks, but they ignore the unknownness and uncertainty of the feature space. In order to solve these problems, an online streaming feature selection algorithm, OH_ReliefF, based on ReliefF for hierarchical classification learning was presented. Firstly, the hierarchical relationship between classes was incorporated into the ReliefF algorithm to define a new method, HF_ReliefF, for calculating feature weights for hierarchical data. Then, important features were dynamically selected based on the ability of features to classify decision attributes. Finally, dynamic redundancy analysis of features was performed based on the independence between features. Experimental results show that the proposed algorithm achieves better results than five advanced online streaming feature selection algorithms in all evaluation metrics with both the K-Nearest Neighbor (KNN) classifier and the Lagrangian Support Vector Machine (LSVM) classifier, with an accuracy improvement of at least 7 percentage points.
Due to the complex and variable structure of fundus vessels and the low contrast between fundus vessels and the background, fundus vessel segmentation, especially of small vessels, is very difficult. U-Net, based on a deep fully convolutional neural network, can effectively extract the global and local information of fundus vessel images, but its output is a grayscale image binarized by a hard threshold, which causes problems such as loss of vessel area and overly thin vessels. To solve these problems, U-Net and the Pulse Coupled Neural Network (PCNN) were combined, leveraging their respective advantages, to design a fundus vessel segmentation method. First, an iterative U-Net model was used to highlight the vessels: the fusion of the features extracted by the U-Net model with the original image was fed again into the improved U-Net model to enhance the vessel image. Then, the U-Net output was treated as a grayscale image, and a PCNN with adaptive threshold was utilized to perform accurate vessel segmentation. The experimental results show that the AUC (Area Under the Curve) of the proposed method was 0.979 6, 0.980 9 and 0.982 7 on the DRIVE, STARE and CHASE_DB1 datasets, respectively. The method can extract more vessel details, and has strong generalization ability and good application prospects.
Because real-life scenes differ greatly and human emotions vary across scenes, the labels in emotion datasets are unevenly distributed. Furthermore, most traditional methods use model pre-training and feature engineering to enhance the expressive power of expression-related features, but do not consider the complementarity between different feature representations, which limits the generalization and robustness of the model. To address these issues, EE-GAN, an end-to-end deep learning framework including the network integration model Ens-Net, was proposed. It takes features of different depths and regions into consideration, implements the fusion of features of different semantics and levels, and uses network integration to improve the learning ability of the model. Besides, facial images with specific expression labels were generated by a generative adversarial network to balance the distribution of expression labels in data augmentation. Qualitative and quantitative evaluations on the CK+, FER2013 and JAFFE datasets demonstrate the effectiveness of the proposed method. Compared with existing view learning methods, including Locality Preserving Projections (LPP), EE-GAN achieves facial expression recognition accuracies of 82.1%, 84.8% and 91.5% on the three datasets respectively; compared with traditional CNN models such as AlexNet, VGG, and ResNet, it improves accuracy by at least 9 percentage points.