Existing cross-modal hashing algorithms underestimate the importance of semantic differences between class labels and ignore the balance condition of hash vectors, which makes the learned hash codes less discriminative. In addition, some methods use label information to construct the similarity matrix while treating multi-label data as single-label data during modeling, which causes a large semantic loss in multi-label cross-modal retrieval. To preserve the accurate similarity relationships between heterogeneous data and the balance property of hash vectors, a novel supervised hashing algorithm, namely Discriminative Matrix Factorization Hashing (DMFH), was proposed. In this method, Collective Matrix Factorization (CMF) of the kernelized features was used to obtain a shared latent subspace, and the proportion of common labels between samples was used to describe the degree of similarity between heterogeneous data. In addition, a balanced matrix was constructed from the label balance information to generate hash vectors with the balance property and to maximize the inter-class distances among different class labels. Compared with seven advanced cross-modal hashing retrieval methods on two commonly used multi-label datasets, MIRFlickr and NUS-WIDE, DMFH achieves the best mean Average Precision (mAP) on both I2T (Image to Text) and T2I (Text to Image) tasks, with the T2I mAPs being higher, indicating that DMFH exploits the multi-label semantic information of the text modality more effectively. The validity of the constructed balanced matrix and similarity matrix is also analyzed, verifying that DMFH preserves semantic information and similarity relations and is effective for cross-modal hashing retrieval.
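As a rough illustration of two of the ingredients named above, the sketch below builds a label-proportion similarity matrix and performs one alternating update of a collective matrix factorization with a shared latent representation. The anchor-point kernelization, the intersection-over-union reading of "proportion of common labels", and all variable names are assumptions for illustration, not the authors' exact formulation of DMFH.

```python
import numpy as np

def rbf_kernelize(X, anchors, gamma=1.0):
    """Map raw features to kernel features against a set of anchor points
    (one common way to kernelize features; the anchors are an assumption here)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def label_similarity(labels):
    """Similarity as the proportion of shared labels between two samples,
    read here as intersection-over-union of their binary multi-label vectors."""
    inter = labels @ labels.T
    counts = labels.sum(1, keepdims=True)
    union = counts + counts.T - inter
    return inter / np.maximum(union, 1)

def cmf_step(X1, X2, U1, U2, V, lam=1e-2, lr=1e-3):
    """One gradient step of collective matrix factorization:
    both modalities X1 (e.g. image) and X2 (e.g. text) share the latent V."""
    E1, E2 = X1 - V @ U1.T, X2 - V @ U2.T
    U1 += lr * (E1.T @ V - lam * U1)
    U2 += lr * (E2.T @ V - lam * U2)
    V += lr * (E1 @ U1 + E2 @ U2 - lam * V)
    return U1, U2, V
```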
A multi-stage low-illuminance image enhancement network based on an attention mechanism was proposed to solve the problem that details of low-illuminance images are lost during enhancement because of overlapping image contents and large brightness differences across regions. At the first stage, an improved multi-scale fusion module was used to perform preliminary image enhancement. At the second stage, the enhanced image information of the first stage was cascaded with the input of this stage, and the result was used as the input of the multi-scale fusion module of this stage. At the third stage, the enhanced image information of the second stage was cascaded with the input of this stage, and the result was used as the input of the multi-scale fusion module of this stage. In this way, multi-stage fusion both improved the brightness of the image adaptively and retained its details adaptively. Experimental results on the open datasets LOL and SICE show that, compared with algorithms and networks such as the MSR (Multi-Scale Retinex) algorithm, the gray Histogram Equalization (HE) algorithm and RetinexNet (Retina cortex Network), the proposed network increases the Peak Signal-to-Noise Ratio (PSNR) by 11.0% to 28.9% and the Structural SIMilarity (SSIM) by 6.8% to 46.5%. By using the multi-stage method and the attention mechanism to realize low-illuminance image enhancement, the proposed network effectively addresses the problems of overlapping image contents and large brightness differences, and the images it produces are richer in detail, more subjectively recognizable, and have clearer textures.
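The staged wiring described above can be sketched as follows. The FusionStage module here is a simple stand-in (a few convolutions plus channel attention) for the paper's improved multi-scale fusion module, and all channel counts and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionStage(nn.Module):
    """Stand-in for the improved multi-scale fusion module with channel attention."""
    def __init__(self, in_ch, feat=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.att = nn.Sequential(                      # simple channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feat, feat, 1), nn.Sigmoid())
        self.out = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, x):
        f = self.body(x)
        return self.out(f * self.att(f))

class MultiStageEnhancer(nn.Module):
    """Three stages; stages 2 and 3 cascade (concatenate) the previous stage's
    output with the original low-light input, as described in the abstract."""
    def __init__(self):
        super().__init__()
        self.s1 = FusionStage(3)
        self.s2 = FusionStage(6)
        self.s3 = FusionStage(6)

    def forward(self, low):
        y1 = self.s1(low)
        y2 = self.s2(torch.cat([low, y1], dim=1))
        y3 = self.s3(torch.cat([low, y2], dim=1))
        return y3
```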
Integrating cost sensitivity and resampling methods into ensemble algorithms is an effective hybrid strategy for imbalanced data classification. Concerning the problem that, in existing hybrid methods, the misclassification cost calculation and the undersampling process pay little attention to the intra-class and inter-class distributions of samples, an imbalanced data classification algorithm based on ball cluster partitioning and undersampling with density peak optimization was proposed, named Boosting algorithm based on Ball Cluster Partitioning and UnderSampling with Density Peak optimization (DPBCPUSBoost). Firstly, density peak information was used to define the sampling weights of the majority samples, and each majority ball cluster with a “neighbor cluster” was divided into an “easily misclassified area” and a “hardly misclassified area”, with the sampling weights of samples in the “easily misclassified area” increased. Secondly, the majority samples were undersampled according to these sampling weights in the first iteration and according to the sample distribution weights in every subsequent iteration, and a weak classifier was trained on the temporary training set formed by combining the undersampled majority samples with all minority samples. Finally, the density peak information of the samples was combined with their class distribution to define different misclassification costs for all samples, and the weights of samples with higher misclassification costs were increased by the cost adjustment function. Experimental results on 10 KEEL datasets indicate that DPBCPUSBoost achieves the highest performance on more datasets than imbalanced data classification algorithms such as Adaptive Boosting (AdaBoost), Cost-sensitive AdaBoost (AdaCost), Random UnderSampling Boosting (RUSBoost) and UnderSampling and Cost-sensitive Boosting (USCBoost), in terms of evaluation metrics such as Accuracy, F1-score, Geometric Mean (G-mean) and Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). These results verify that the definitions of sample misclassification cost and sampling weight in the proposed DPBCPUSBoost are effective.
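A minimal sketch of density-peak-driven undersampling inside a boosting loop is given below. The density estimate, the weighting rule (favouring low-density boundary samples), and the decision-tree weak learner are simplifying assumptions; the sketch does not reproduce the ball-cluster partitioning or the cost adjustment function of the full DPBCPUSBoost.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def density_peak_scores(X):
    """Local density rho and distance delta to the nearest denser point,
    the two quantities used in density-peak clustering."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dc = np.percentile(D, 2)                                  # cutoff distance (heuristic)
    rho = np.exp(-(D / dc) ** 2).sum(1)
    delta = np.array([D[i, rho > rho[i]].min() if (rho > rho[i]).any() else D[i].max()
                      for i in range(len(X))])
    return rho, delta

def dp_undersample_boost(X, y, rounds=10):
    """Each round undersamples the majority class (label 0) with density-peak-informed
    weights and trains a weak classifier on the balanced temporary training set."""
    maj, mino = X[y == 0], X[y == 1]
    rho, _ = density_peak_scores(maj)
    w = 1.0 / (rho + 1e-9)                                    # boundary samples weighted up
    w /= w.sum()
    ensemble = []
    for _ in range(rounds):
        idx = np.random.choice(len(maj), size=len(mino), replace=False, p=w)
        Xt = np.vstack([maj[idx], mino])
        yt = np.r_[np.zeros(len(idx)), np.ones(len(mino))]
        ensemble.append(DecisionTreeClassifier(max_depth=3).fit(Xt, yt))
    return ensemble
```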
Symbolic music generation remains an unsolved problem in the field of artificial intelligence and faces many challenges. Existing methods for generating polyphonic music fail to meet market requirements in terms of melody, rhythm and harmony, and most of the generated music does not conform to basic music theory. To solve these problems, a new Transformer-based multi-track music Generative Adversarial Network (Transformer-GAN) was proposed to generate music with high musicality under the guidance of music rules. Firstly, the decoder of the Transformer and the Cross-Track Transformer (CT-Transformer), adapted from the Transformer, were used to learn the information within a single track and between multiple tracks respectively. Then, a combination of music rules and cross-entropy loss was employed to guide the training of the generative network, and the carefully designed objective loss function was optimized while training the discriminative network. Finally, multi-track music works with melody, rhythm and harmony were generated. Experimental results show that, compared with other multi-instrument music generation models, on the piano, guitar and bass tracks Transformer-GAN improves Prediction Accuracy (PA) by at least 12%, 11% and 22%, improves Sequence Similarity (SS) by at least 13%, 6% and 10%, and improves the rest index by at least 8%, 4% and 17%. It can be seen that adding the CT-Transformer and the music rule reward module enables Transformer-GAN to effectively improve indicators including PA and SS, leading to a relatively high overall improvement in the generated music.
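The intra-track/cross-track split described above can be sketched as a Transformer block in which each track first attends to its own history and then to the other tracks' states. The module below is a minimal illustration with assumed dimensions and layer layout, not the authors' CT-Transformer.

```python
import torch
import torch.nn as nn

class CrossTrackBlock(nn.Module):
    """One block: intra-track self-attention followed by cross-track attention
    over the concatenated states of the other tracks (sizes are assumptions)."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, track, other_tracks):
        # track: (batch, seq, d_model); other_tracks: states of the remaining tracks
        h = self.n1(track + self.self_att(track, track, track, need_weights=False)[0])
        h = self.n2(h + self.cross_att(h, other_tracks, other_tracks, need_weights=False)[0])
        return self.n3(h + self.ff(h))
```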
Concerning the lack of flexibility in the adversarial training of Deep Convolutional Generative Adversarial Network (DCGAN) and the inflexible optimization and unclear convergence state caused by the Binary Cross-Entropy (BCE) loss function used in DCGAN, an improved Generative Adversarial Network (GAN) algorithm based on an arbitration mechanism was proposed. In this algorithm, the proposed arbitration mechanism was added on the basis of DCGAN. Firstly, the network structure of the improved algorithm was composed of a generator, a discriminator and an arbiter. Secondly, the generator and discriminator conducted adversarial training according to the training plan, strengthening their abilities to generate images and to verify the authenticity of images respectively, based on the characteristics learned from the dataset. Thirdly, the arbiter was formed by the generator and discriminator obtained after the latest round of adversarial training together with a metric score calculation module; it measured the adversarial training results of the generator and the discriminator and fed them back into the training plan. Finally, a winning limit was added to the network structure to improve the stability of model training, and the Circle loss function was used to replace the BCE loss function, making the model optimization process more flexible and the convergence state clearer. Experimental results show that the proposed algorithm has a good generation effect on architectural and face datasets: on the Large-scale Scene UNderstanding (LSUN) dataset, it decreases the Fréchet Inception Distance (FID) by 1.04% compared with the original DCGAN algorithm, and on the CelebA dataset, it increases the Inception Score (IS) by 4.53% compared with the original DCGAN algorithm. The images generated by the proposed algorithm have better diversity and higher quality.
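One way to read the arbitration and winning-limit idea is sketched below: after each round the arbiter's scores decide which network is currently "winning", and a streak of wins longer than the winning limit grants the losing side extra updates in the next round. The scoring callable, the plan dictionary and the threshold are hypothetical names introduced for illustration, not the paper's exact mechanism.

```python
def arbitration_step(arbiter_score, generator, discriminator, plan, win_limit=3):
    """Update the training plan from the arbiter's measurement of the last round.
    `arbiter_score` is an assumed callable returning (generator_score, discriminator_score)."""
    g_score, d_score = arbiter_score(generator, discriminator)
    winner = "G" if g_score > d_score else "D"
    if winner == plan.get("last_winner"):
        plan["streak"] += 1
    else:
        plan["last_winner"], plan["streak"] = winner, 1
    # winning limit: when one side keeps winning, schedule extra steps for the other
    plan["extra_steps_for"] = ({"G": "D", "D": "G"}[winner]
                               if plan["streak"] >= win_limit else None)
    return plan
```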
The particle filter is widely applied in many fields because of its ability to deal with nonlinear and non-Gaussian problems. However, particle filtering suffers from serious problems such as particle degeneracy and sample impoverishment; to address them, an improved resampling algorithm was proposed. The method was based on partial stratified resampling and residual resampling: particles were classified into large-, medium- and small-weight groups, and samples were replicated from the three hierarchical sets with different strategies. In this way, the efficiency of the algorithm was improved while maintaining the diversity of particles. Finally, through comparison with classic sequential importance sampling and resampling and with other partial resampling methods, simulation results on the UNG (Univariate Non-stationary Growth) and BOT (Bearings Only Tracking) models verify the filtering performance and validity of the proposed algorithm.
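A minimal sketch of the three-group idea follows. The weight thresholds and the per-group strategies (residual-style replication for large weights, multinomial resampling over the medium group, discarding small weights) are illustrative assumptions rather than the exact strategies of the proposed algorithm.

```python
import numpy as np

def stratified_partial_resample(particles, weights, hi=None, lo=None):
    """Split particles by weight into large/medium/small groups and resample
    each group differently; particles is an (n, d) array, weights sum to one."""
    n = len(weights)
    hi = hi if hi is not None else 2.0 / n
    lo = lo if lo is not None else 0.5 / n
    large, small = weights >= hi, weights < lo
    medium = ~large & ~small

    # residual-style replication of large-weight particles
    copies = np.floor(weights[large] * n).astype(int)
    kept = np.repeat(particles[large], copies, axis=0)

    # fill the remaining slots by multinomial resampling over the medium group
    n_rest = n - len(kept)
    if n_rest > 0:
        pool = medium if medium.any() else ~small      # fall back if medium is empty
        p = weights[pool] / weights[pool].sum()
        idx = np.random.choice(np.where(pool)[0], size=n_rest, p=p)
        kept = np.vstack([kept, particles[idx]])
    new_particles = kept[:n]
    return new_particles, np.full(len(new_particles), 1.0 / len(new_particles))
```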