To address the problems of blurred texture details and color distortion in low-light image enhancement, an end-to-end lightweight dual-branch network combining spatial and frequency information, named SAFNet, was proposed. Transformer-based spatial and frequency blocks were adopted by SAFNet to process the spatial information and the Fourier-transformed frequency information of the input image in the spatial and frequency branches, respectively. An attention mechanism was also applied in SAFNet to adaptively fuse the features captured by the spatial and frequency branches into the final enhanced image. Furthermore, a frequency-domain loss function was added to the joint loss function, so as to constrain SAFNet in both the spatial and frequency domains. Experiments on the public datasets LOL and LSRW were conducted to evaluate the performance of SAFNet. Experimental results show that SAFNet achieved 0.823 Structural SIMilarity (SSIM) and 0.114 Learned Perceptual Image Patch Similarity (LPIPS) on LOL, and 17.234 dB Peak Signal-to-Noise Ratio (PSNR) and 0.550 SSIM on LSRW. SAFNet achieved better performance than the evaluated mainstream methods, such as LLFormer (Low-Light Transformer), IAT (Illumination Adaptive Transformer), and KinD++ (Kindling the Darkness++), with only 0.07×10⁶ parameters. On the DarkFace dataset, the average precision of face detection was increased from 52.6% to 72.5% by applying SAFNet as a preprocessing step. The above experimental results illustrate that SAFNet can effectively enhance the quality of low-light images and significantly improve the performance of the downstream low-light face detection task.
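The following is a minimal PyTorch sketch of the dual-branch idea: a spatial branch, a frequency branch operating on the Fourier transform of the input, and an attention-weighted fusion. Module structure and channel sizes are illustrative assumptions, not SAFNet's exact design (a plain convolution stands in for the Transformer-based spatial block).

```python
# Minimal sketch of a dual-branch spatial/frequency block, assuming illustrative
# layer choices; not SAFNet's published architecture.
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.spatial = nn.Sequential(            # spatial branch (stand-in for the Transformer block)
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.freq = nn.Conv2d(2 * channels, 2 * channels, 1)   # frequency branch: 1x1 conv on real/imag parts
        self.fuse = nn.Sequential(               # attention-based fusion: per-channel weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        s = self.spatial(x)
        f = torch.fft.rfft2(x, norm="ortho")                   # to the frequency domain
        f = self.freq(torch.cat([f.real, f.imag], dim=1))
        re, im = f.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(re, im), s=x.shape[-2:], norm="ortho")
        a = self.fuse(torch.cat([s, f], dim=1))                # attention weights in [0, 1]
        return a * s + (1 - a) * f                             # adaptive fusion of the two branches

x = torch.rand(1, 16, 64, 64)
print(DualBranchBlock()(x).shape)   # torch.Size([1, 16, 64, 64])
```

A frequency-domain loss in the same spirit would compare `torch.fft.rfft2` of the prediction and the ground truth, which is what constrains the network in both domains.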
At present, image super-resolution networks based on deep learning are mainly implemented by convolution. Compared with the traditional Convolutional Neural Network (CNN), the main advantage of the Transformer in image super-resolution tasks is its ability to model long-distance dependencies. However, most Transformer-based image super-resolution models cannot establish global dependencies with few parameters and few network layers, which limits model performance. In order to establish global dependencies in the super-resolution network, an image Super-Resolution network based on Global Dependency Transformer (GDTSR) was proposed. Its main component was the Residual Square Axial Window Block (RSAWB); in the Transformer residual layer, axial windows and self-attention were used to make each pixel globally dependent on the entire feature map. In addition, since the reconstruction modules of most current image super-resolution models are composed of convolutions, Transformer and convolution were combined to jointly reconstruct super-resolution images and dynamically integrate the extracted feature information. Experimental results show that the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) of GDTSR on five standard test sets, including Set5, Set14, B100, Urban100 and Manga109, are optimal at all three scale factors (×2, ×3, ×4), and the performance improvement is especially obvious on the large-scale datasets Urban100 and Manga109.
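Below is a minimal sketch of the axial-attention idea behind the RSAWB: each pixel attends along its full row and then its full column, so two passes give every pixel a dependency path to the whole feature map. Head count and dimensions are illustrative assumptions, not GDTSR's configuration.

```python
# Minimal sketch of axial self-attention: row pass, then column pass.
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.row = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.permute(0, 2, 3, 1).reshape(b * h, w, c)       # rows as sequences
        t, _ = self.row(t, t, t)
        t = t.reshape(b, h, w, c).permute(0, 2, 1, 3).reshape(b * w, h, c)  # columns
        t, _ = self.col(t, t, t)
        return t.reshape(b, w, h, c).permute(0, 3, 2, 1)     # back to (B, C, H, W)

x = torch.rand(2, 32, 24, 24)
print(AxialAttention()(x).shape)   # torch.Size([2, 32, 24, 24])
```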
Chance-Constrained Multi-Choice Knapsack Problem (CCMCKP) is a class of NP-hard combinatorial optimization problems with important practical applications, but research on solution methods for it is still lacking. To fill this gap, the first solution framework for CCMCKP was proposed, and two solution methods were established on top of it: the dynamic programming-based method RA-DP and the genetic algorithm-based method RA-IGA. RA-DP is an exact method with an optimality guarantee, but within a time budget of 1 hour it can only solve small-scale problem instances. In contrast, RA-IGA is an approximation method with better scalability. Simulation results verify the performance of the proposed methods: on small-scale problem instances, both RA-DP and RA-IGA find the optimal solutions; on medium- and large-scale instances, RA-IGA is significantly more efficient than RA-DP and always obtains feasible solutions within 1 hour. In future research on CCMCKP, RA-DP and RA-IGA can serve as baseline methods, and the benchmark set considered in this work can be used as a standard test set.
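For orientation, here is a minimal dynamic-programming sketch of the deterministic multi-choice knapsack core of the problem: exactly one item is picked from each class to maximize profit under a weight budget. Handling the chance constraint (risk-adjusted weights, as suggested by the "RA" prefix) is abstracted away here as an assumption.

```python
# Minimal DP sketch for the deterministic multi-choice knapsack core of CCMCKP.
def mckp_dp(classes, capacity):
    """classes: list of classes, each a list of (weight, profit); weights are non-negative ints."""
    NEG = float("-inf")
    dp = [NEG] * (capacity + 1)
    dp[0] = 0.0
    for items in classes:                     # process one class at a time
        nxt = [NEG] * (capacity + 1)
        for cap in range(capacity + 1):
            if dp[cap] == NEG:
                continue
            for w, p in items:                # exactly one item per class
                if cap + w <= capacity:
                    nxt[cap + w] = max(nxt[cap + w], dp[cap] + p)
        dp = nxt
    return max(dp)

classes = [[(2, 3.0), (3, 4.5)], [(1, 1.0), (4, 6.0)]]
print(mckp_dp(classes, 6))   # 9.0: take (2, 3.0) from class 1 and (4, 6.0) from class 2
```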
In the training of manipulator path planning algorithms, the huge action and state spaces lead to sparse rewards and low training efficiency, and evaluating the values of so many states and actions is challenging. To address these problems, a manipulator path planning algorithm based on SAC (Soft Actor-Critic) reinforcement learning was proposed. The learning efficiency was improved by incorporating a demonstrated path into the reward function, so that the manipulator imitated the demonstration during reinforcement learning, while the SAC algorithm made the training of the path planning algorithm faster and more stable. The proposed algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm were each used to plan 10 paths; the average distances between the planned paths and the reference paths were 0.8 cm and 1.9 cm, respectively. The experimental results show that the path imitation mechanism improves training efficiency, and that the proposed algorithm explores the environment better and plans more reasonable paths than the DDPG algorithm.
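A minimal sketch of folding a demonstrated path into the reward is shown below: the closer the end-effector stays to the demonstration, the smaller the shaping penalty. The distance metric and the weights are illustrative assumptions, not the paper's exact reward design.

```python
# Minimal sketch of a demonstration-shaped reward for manipulator path planning.
import numpy as np

def shaped_reward(position, goal, demo_path, w_goal=1.0, w_demo=0.5):
    d_goal = np.linalg.norm(position - goal)                        # task term: reach the goal
    d_demo = min(np.linalg.norm(position - p) for p in demo_path)   # imitation term: stay near the demo
    return -w_goal * d_goal - w_demo * d_demo

demo = [np.array([0.1 * i, 0.0, 0.2]) for i in range(10)]           # a demonstrated path
print(shaped_reward(np.array([0.35, 0.02, 0.2]), np.array([0.9, 0.0, 0.2]), demo))
```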
Due to the unique characteristics of underwater creatures, underwater images usually contain many small targets that are hard to detect and often overlap with each other. In addition, light absorption and scattering in the underwater environment cause color offset and blur in underwater images. To overcome these challenges, an underwater target detection algorithm named WCA-YOLOv8 was proposed. Firstly, the Feature Fusion Module (FFM) was designed to strengthen the focus on the spatial dimension and thereby improve the recognition of targets with color offset and blur. Secondly, the FReLU Coordinate Attention (FCA) module was added to enhance feature extraction for overlapped and occluded underwater targets. Thirdly, the Complete Intersection over Union (CIoU) loss function was replaced by the Wise-IoU version 3 (WIoU v3) loss function to strengthen the detection of small targets. Finally, the Downsampling Enhancement Module (DEM) was designed to preserve context information more completely during feature extraction. Experimental results show that WCA-YOLOv8 achieves 75.8% and 88.6% mean Average Precision (mAP0.5) with detection speeds of 60 frame/s and 57 frame/s on the RUOD and URPC datasets, respectively. Compared with other state-of-the-art underwater target detection algorithms, WCA-YOLOv8 achieves higher detection accuracy with faster detection speed.
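The sketch below shows standard coordinate attention, the backbone of the FCA module: features are pooled along height and width separately so the attention weights preserve positional information along each axis. It is a generic sketch with an ordinary ReLU; the FCA module as described additionally uses the FReLU activation, which is omitted here as an assumption.

```python
# Minimal sketch of coordinate attention (the basis of the FCA module).
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, c, r=8):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c // r, 1)
        self.act = nn.ReLU()                     # FCA would use FReLU here instead
        self.conv_h = nn.Conv2d(c // r, c, 1)
        self.conv_w = nn.Conv2d(c // r, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                          # (B, C, H, 1): pool along width
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)      # (B, C, W, 1): pool along height
        t = self.act(self.conv1(torch.cat([ph, pw], dim=2)))      # joint encoding of both directions
        th, tw = t.split([h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(th))                       # (B, C, H, 1) attention
        aw = torch.sigmoid(self.conv_w(tw.permute(0, 1, 3, 2)))   # (B, C, 1, W) attention
        return x * ah * aw                                        # position-aware reweighting

x = torch.rand(1, 32, 40, 40)
print(CoordAttention(32)(x).shape)   # torch.Size([1, 32, 40, 40])
```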
By comprehensively comparing traditional knowledge graph representation learning models, including their advantages, disadvantages and applicable tasks, the analysis shows that traditional single-modal knowledge graphs cannot represent knowledge well. Therefore, how to use multimodal data such as text, images, video and audio for knowledge graph representation learning has become an important research direction. At the same time, commonly used multimodal knowledge graph datasets were analyzed in detail to provide data support for relevant researchers. On this basis, knowledge graph representation learning models fusing text, image, video and audio modalities were further discussed, and the various models were summarized and compared. Finally, the effect of multimodal knowledge graph representation on enhancing classical applications, including knowledge graph completion, question answering systems, multimodal generation and recommendation systems, was summarized, and future research work was prospected.
Aiming at the drawbacks that the Sparrow Search Algorithm (SSA) has relatively low search accuracy and easily falls into local optima, an Enhanced Sparrow Search Algorithm based on Multiple Improvement strategies (EMISSA) was proposed. Firstly, in order to balance the global and local search abilities of the algorithm, fuzzy logic was introduced to dynamically adjust the proportion of sparrow finders. Secondly, a mixed differential mutation operation was performed on sparrow followers to generate mutation subgroups, thereby enhancing the ability of EMISSA to jump out of local optima. Finally, Topological Opposition-Based Learning (TOBL) was used to obtain the topological opposition solutions of sparrow finders, thereby fully mining high-quality position information in the search space. EMISSA, standard SSA and the Chaotic Sparrow Search Optimization Algorithm (CSSOA) were evaluated on 12 test functions from the 2013 Congress on Evolutionary Computation (CEC2013) benchmark suite. Experimental results show that EMISSA achieves 11 first places on the 12 test functions in the 30-dimensional case, and the best results on all test functions in the 80-dimensional case. In the Friedman test, EMISSA ranks first on all test functions. Experimental results of applying EMISSA to Wireless Sensor Network (WSN) node deployment in obstacle environments show that, compared with the other algorithms, EMISSA achieves the highest wireless node coverage with more uniform node distribution and less coverage redundancy.
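The sketch below illustrates generic opposition-based learning, the idea underlying the TOBL step: each finder's opposition solution is generated by reflecting it across the search bounds, and the better of each pair survives. This is plain OBL under stated assumptions; the topological variant used by EMISSA refines how the opposition point is constructed.

```python
# Minimal sketch of an opposition-based learning step for a population optimizer.
import numpy as np

def opposition_step(pop, lb, ub, fitness):
    opp = lb + ub - pop                            # reflect every coordinate across the bounds
    both = np.vstack([pop, opp])
    scores = np.apply_along_axis(fitness, 1, both)
    best = np.argsort(scores)[: len(pop)]          # keep the better half (minimization)
    return both[best]

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(6, 3))
sphere = lambda x: float(np.sum(x ** 2))           # toy objective
print(opposition_step(pop, -5.0, 5.0, sphere))
```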
When the slicing method is used to measure the point cloud volumes of irregular objects, the existing Polygon Splitting and Recombination (PSR) algorithm cannot split closely spaced contours correctly, resulting in low calculation precision. Aiming at this problem, a multi-contour segmentation algorithm, the Improved Nearest Point Search (INPS) algorithm, was proposed. Firstly, multiple contours were segmented under the single-use principle of local points. Then, the Point Inclusion in Polygon (PIP) algorithm was adopted to judge the inclusion relationship between contours, thereby determining the positive or negative sign of each contour area. Finally, each slice area was multiplied by the slice thickness and the results were accumulated to obtain the volume of the irregular object point cloud. Experimental results on two public point cloud datasets and one point cloud dataset of chemical electron density isosurfaces show that the proposed algorithm achieves high-accuracy boundary segmentation and has a certain universality. The average relative error of volume measurement of the proposed algorithm is 0.043 6%, lower than the 0.062 7% of the PSR algorithm.
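The final accumulation step can be sketched directly: per slice, each contour contributes a signed area (outer contours positive, holes negative, as decided by the point-in-polygon test), and slice areas times thickness are summed. Contour extraction and the PIP test themselves are assumed done; orientation encodes the sign here.

```python
# Minimal sketch of the slicing volume computation with signed shoelace areas.
def shoelace_area(contour):
    """Signed area of a closed 2D polygon given as [(x, y), ...]; CCW is positive."""
    a = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        a += x0 * y1 - x1 * y0
    return a / 2.0

def volume(slices, thickness):
    """slices: list of slices; each slice is a list of signed (oriented) contours."""
    return sum(abs(sum(shoelace_area(c) for c in contours)) * thickness
               for contours in slices)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]                 # outer contour, CCW: area +4
hole = [(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]   # inner hole, CW: area -1
print(volume([[square, hole]] * 10, 0.1))                 # 10 slices x 0.1 x 3 = 3.0
```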
To improve the performance of Yin-Yang-Pair Optimization-Simulated Annealing 1 (YYPO-SA1), a Yin-Yang-Pair optimization algorithm based on dynamic D-way splitting and chaotic perturbation, named NYYPO (Newton-Yin-Yang-Pair Optimization), was proposed. Firstly, Newton's law of cooling was adopted to dynamically adjust the probability of D-way splitting. Then, a chaotic perturbation strategy was applied in the splitting stage. The dynamic adjustment mechanism enables NYYPO to use a larger D-way splitting probability in the early stage of the search and a smaller one in the late stage, which enhances the global search ability of the algorithm; meanwhile, the chaotic perturbation strategy enriches the diversity of solutions and improves the ability of the algorithm to jump out of local optima. Finally, NYYPO was applied to the parameter optimization design problem of wind-driven generators. Fifteen test functions, including unimodal, multimodal and composite functions, were selected to evaluate the performance of NYYPO, YYPO-SA1 and six representative single-objective optimization algorithms: Particle Swarm Optimization (PSO), Crow Search Algorithm (CSA), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Flower Pollination Algorithm (FPA), and Sparrow Search Algorithm (SSA). The results show that, compared with YYPO-SA1, NYYPO obtains a 12-orders-of-magnitude improvement on the Sphere function. In the Friedman test, NYYPO achieves average ranks of 2.87, 2.0 and 1.93 in 10, 30 and 50 dimensions respectively, ranking first overall in every case. NYYPO therefore achieves statistically significant performance advantages, and it also obtains better optimization results in the parameter optimization design problem of wind-driven generators.
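A minimal sketch of a Newton's-law-of-cooling schedule for the splitting probability follows: the probability starts high (broad exploration) and decays exponentially toward a floor value late in the search. The parameter values p0, p_min and k are illustrative assumptions.

```python
# Minimal sketch of a cooling-law schedule for the D-way splitting probability.
import math

def split_probability(t, t_max, p0=0.9, p_min=0.1, k=5.0):
    """Probability of D-way splitting at iteration t of t_max (exponential decay)."""
    return p_min + (p0 - p_min) * math.exp(-k * t / t_max)

for t in (0, 250, 500, 1000):
    print(t, round(split_probability(t, 1000), 3))   # 0.9 -> 0.329 -> 0.166 -> 0.105
```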
Aiming at the insufficient representation ability of features extracted by existing vehicle re-identification methods based on Convolutional Neural Networks (CNN), a vehicle re-identification method combining wavelet features and an attention mechanism was proposed. Firstly, a single-layer wavelet module was embedded in the convolution module to replace the pooling layer for subsampling, thereby reducing the loss of fine-grained features. Secondly, a new local attention module named Feature Extraction Module (FEM) was put forward by combining the Channel Attention (CA) and Pixel Attention (PA) mechanisms, and was embedded into the CNN to weight and strengthen key information. Comparison experiments with the benchmark residual networks ResNet-50 and ResNet-101 were conducted on the VeRi dataset. Experimental results show that increasing the number of wavelet decomposition layers in ResNet-50 improves the mean Average Precision (mAP). In the ablation experiment, although ResNet-50+Discrete Wavelet Transform (DWT) has an mAP 0.25 percentage points lower than that of ResNet-101, it has fewer parameters and lower computational complexity than ResNet-101, and higher mAP, Rank-1 and Rank-5 than ResNet-50 without DWT, verifying that the proposed model can effectively improve the accuracy of vehicle retrieval in vehicle re-identification.
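A minimal sketch of replacing a pooling layer with a single-level Haar DWT is shown below: the four sub-bands are computed from the 2×2 neighbours and stacked along the channel axis, so spatial resolution is halved without discarding information. This is the standard Haar decomposition; how the paper routes the sub-bands afterwards is not assumed here.

```python
# Minimal sketch of a Haar DWT downsampling module replacing pooling.
import torch
import torch.nn as nn

class HaarDWT(nn.Module):
    def forward(self, x):                       # x: (B, C, H, W) with even H, W
        a = x[:, :, 0::2, 0::2]                 # top-left of each 2x2 block
        b = x[:, :, 1::2, 0::2]                 # bottom-left
        c = x[:, :, 0::2, 1::2]                 # top-right
        d = x[:, :, 1::2, 1::2]                 # bottom-right
        ll = (a + b + c + d) / 2                # low-frequency approximation
        lh = (-a - c + b + d) / 2               # horizontal detail
        hl = (-a + c - b + d) / 2               # vertical detail
        hh = (a - c - b + d) / 2                # diagonal detail
        return torch.cat([ll, lh, hl, hh], dim=1)   # (B, 4C, H/2, W/2)

x = torch.rand(1, 64, 56, 56)
print(HaarDWT()(x).shape)    # torch.Size([1, 256, 28, 28])
```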
In order to achieve stable and precise control of industrial processes with non-linearity, hysteresis and strong coupling, a new control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. Firstly, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control strategy to greatly reduce overshoot and oscillation in the control process. At the same time, the control strategy of the original controller was used as the local policy for searching, and interactive exploration was used as the learning rule, thereby improving learning efficiency and stability. Finally, a penicillin fermentation process simulation platform was built under the Gym framework and experiments were carried out on it. Simulation results show that, compared with DDPG, LPIE-DDPG improves convergence efficiency by 27.3%; compared with Proportion-Integration-Differentiation (PID) control, LPIE-DDPG exhibits less overshoot and oscillation in temperature control and increases the penicillin yield by 3.8%. In conclusion, the proposed method can effectively improve both the training efficiency and the stability of industrial process control.
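The sketch below illustrates one plausible reading of local-policy interactive exploration: with some probability the agent executes the original controller's action (the local policy) instead of the DDPG actor's, so exploration stays near a known-stable strategy. The PD stand-in controller and the mixing rule are illustrative assumptions, not the paper's exact mechanism.

```python
# Minimal sketch of exploration guided by a local (original-controller) policy.
import numpy as np

def pid_action(error, kp=1.2, kd=0.1, prev_error=0.0):
    return kp * error + kd * (error - prev_error)        # simplified PD local policy

def explore_action(actor, state, error, eps=0.3, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < eps:                               # follow the local policy
        return np.clip(pid_action(error), -1.0, 1.0)
    return actor(state)                                  # follow the DDPG actor

actor = lambda s: np.tanh(0.5 * s.sum())                 # dummy actor for the demo
print(explore_action(actor, np.array([0.2, -0.1]), error=0.4))
```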
In massive Multiple-Input Multiple-Output (MIMO) systems, the Minimum Mean Square Error (MMSE) detection algorithm suffers from poor adaptability, high computational complexity and low efficiency on reconfigurable array structures. Based on the reconfigurable array processor developed by the project team, a parallel mapping method for the MMSE algorithm was proposed. Firstly, based on the relatively simple data dependency of the Gram matrix calculation, a pipeline acceleration scheme with high parallelism in both time and space was designed. Secondly, since the Gram matrix calculation and matched filter calculation modules of the MMSE algorithm are relatively independent, a modular parallel mapping scheme was designed. Finally, the mapping scheme was implemented on a Xilinx Virtex-6 development board, and its performance was measured. Experimental results show that the proposed method achieves speedup ratios of 2.80, 4.04 and 5.57 in the Quadrature Phase Shift Keying (QPSK) uplink with MIMO scales of 128×4, 128×8 and 128×16, respectively, and that the reconfigurable array processor reduces resource consumption by 42.6% compared with dedicated hardware in the 128×16 massive MIMO system.
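For reference, a minimal NumPy sketch of MMSE detection follows: the Gram matrix G = HᴴH and the matched filter y_mf = Hᴴy are computed independently (which is what makes the modular mapping possible), and the symbol estimate solves (G + σ²I)s = y_mf. Sizes and noise level are illustrative.

```python
# Minimal sketch of MMSE detection for a massive MIMO uplink with QPSK symbols.
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx, sigma2 = 128, 8, 0.1
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], n_tx) / np.sqrt(2)   # QPSK symbols
y = H @ s + np.sqrt(sigma2 / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))

G = H.conj().T @ H                 # Gram matrix (module 1, highly parallel)
y_mf = H.conj().T @ y              # matched filter (module 2, independent of module 1)
s_hat = np.linalg.solve(G + sigma2 * np.eye(n_tx), y_mf)   # MMSE estimate
print(np.round(s_hat, 2))
```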
Rate-Distortion (R-D) optimization is a crucial technique in video encoders. However, the widely used independent R-D optimization is far from globally optimal. In order to further improve the compression performance of High Efficiency Video Coding (HEVC), a two-pass encoding algorithm that exploits both R-D dependency and R-D characteristics was proposed. Firstly, the current frame was encoded with the original HEVC method, and the number of bits consumed by the frame and the R-D model parameters of each Coding Tree Unit (CTU) were obtained. Then, combined with temporally dependent rate-distortion optimization, the optimal Lagrange multiplier and quantization parameter of each CTU were determined according to the current frame bit budget and the R-D model parameters. Finally, the current frame was re-encoded, with each CTU having its own optimization goal according to its Lagrange multiplier. Experimental results show that the proposed algorithm achieves significant rate-distortion performance improvement: it saves 3.5% and 3.8% bitrate at the same coding quality compared with the original HEVC encoder under the low-delay B and low-delay P coding configurations, respectively.
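In the re-encoding pass, each CTU's quantization parameter would be derived from its own Lagrange multiplier. The sketch below uses the λ-to-QP relation commonly used in HEVC rate control (QP = 4.2005·ln λ + 13.7122); whether the paper uses exactly this mapping is an assumption.

```python
# Minimal sketch of deriving a per-CTU QP from its Lagrange multiplier,
# using the lambda-QP relation common in HEVC rate control.
import math

def qp_from_lambda(lam):
    return int(round(4.2005 * math.log(lam) + 13.7122))

for lam in (10.0, 50.0, 200.0):       # larger lambda -> coarser quantization
    print(lam, qp_from_lambda(lam))   # 10.0 -> 23, 50.0 -> 30, 200.0 -> 36
```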
Under the background of emphasizing data ownership confirmation and privacy protection, federated learning, as a new machine learning paradigm, can solve the problems of data islands and privacy protection without exposing the data of any participant. Since modeling methods based on federated learning have become mainstream and achieved good results, it is worthwhile to summarize and analyze the concepts, technologies, applications and challenges of federated learning. Firstly, the development of machine learning and the inevitability of the appearance of federated learning were elaborated, and the definition and classification of federated learning were given. Secondly, the three kinds of federated learning currently recognized by the community, namely horizontal federated learning, vertical federated learning and federated transfer learning, were introduced and analyzed. Thirdly, concerning the privacy protection of federated learning, the existing common privacy protection technologies were summarized. In addition, recent mainstream open-source frameworks were introduced and compared, and application scenarios of federated learning were given. Finally, the challenges and future research directions of federated learning were prospected.
A robust speech recognition model training algorithm based on self-supervised knowledge transfer was proposed to address the increasingly high cost of labeling neural network training data and the noise interference that hinders the performance improvement of speech recognition systems. Firstly, three artificial features of the original speech samples were extracted in the pre-processing stage. Then, in the training stage, the advanced features generated by the feature extraction network were fitted to these artificial features through three shallow networks. At the same time, the feature extraction front-end and the speech recognition back-end were cross-trained with their loss functions integrated. Finally, through gradient backpropagation, the feature extraction network learned to extract advanced features more conducive to denoised speech recognition, thereby realizing artificial knowledge transfer and denoising while using the training data efficiently. In the application scenario of military equipment control, the word error rate of the proposed method was reduced to 0.12 in tests on three open-source Chinese speech recognition datasets, THCHS-30 (TsingHua Continuous Chinese Speech), Aishell-1 and ST-CMDS (Surfing Technology Commands), as well as a military equipment control command dataset. Experimental results show that the proposed method can not only train robust speech recognition models, but also improve the utilization of training samples through self-supervised knowledge transfer, and can complete equipment control tasks.
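A minimal sketch of the integrated loss is shown below: three shallow heads fit the front-end's advanced features to the three artificial features, and their losses are added to the recognition loss so the front-end and back-end are trained jointly. The weighting, the use of MSE, and the layer sizes are illustrative assumptions.

```python
# Minimal sketch of an integrated recognition + knowledge-transfer loss.
import torch
import torch.nn.functional as F

def joint_loss(asr_logits, targets, adv_feat, heads, artificial_feats, alpha=0.1):
    loss = F.cross_entropy(asr_logits, targets)            # recognition back-end loss
    for head, feat in zip(heads, artificial_feats):        # three transfer terms
        loss = loss + alpha * F.mse_loss(head(adv_feat), feat)
    return loss

heads = [torch.nn.Linear(256, 40) for _ in range(3)]       # shallow fitting networks
adv = torch.randn(8, 256)                                  # advanced features (batch of 8)
arts = [torch.randn(8, 40) for _ in range(3)]              # pre-extracted artificial features
print(joint_loss(torch.randn(8, 100), torch.randint(0, 100, (8,)), adv, heads, arts))
```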
Traditional social collaborative filtering algorithms based on rating prediction have the inherent deficiency that the predicted values do not match the true ranking, while social collaborative ranking algorithms based on ranking prediction are more suitable for practical application scenarios. However, most existing social collaborative ranking algorithms focus on either explicit feedback data or implicit feedback data alone, and do not make full use of the information in the dataset. In order to fully exploit both the explicit and implicit scoring information of users' social networks and recommendation objects, and to overcome the inherent deficiency of traditional rating-prediction-based social collaborative filtering, a new social collaborative ranking model based on the newest xCLiMF and TrustSVD models, namely SPR_SVD++, was proposed. In this model, the explicit and implicit information of the user rating matrix and the social network matrix were exploited simultaneously, and the learning-to-rank evaluation metric Expected Reciprocal Rank (ERR) was optimized. Experimental results on real datasets show that SPR_SVD++ outperforms the state-of-the-art algorithms TrustSVD, MERR_SVD++ and SVD++ on two evaluation metrics, Normalized Discounted Cumulative Gain (NDCG) and ERR. With its good performance and high extensibility, SPR_SVD++ has a good application prospect in the field of Internet information recommendation.
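For reference, the ERR metric that the model optimizes can be computed as below: under the cascade model over graded relevance, each rank r contributes 1/r times the probability that the user is first satisfied there. The grade scale g_max is an illustrative assumption.

```python
# Minimal sketch of Expected Reciprocal Rank (ERR) over graded relevance.
def err(grades, g_max=5):
    score, p_continue = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** g_max)     # satisfaction probability at rank r
        score += p_continue * p_stop / r         # reciprocal-rank credit if user stops here
        p_continue *= 1.0 - p_stop               # probability the user continues past r
    return score

print(round(err([5, 3, 0, 4]), 4))   # ERR of a ranked list with graded relevance
```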
With existing deep learning algorithms, it is difficult to restore the highly blurred solar speckle images taken by Yunnan Observatories and to reconstruct their high-frequency information. To solve these problems, a deblurring method for solar speckle images based on Generative Adversarial Network (GAN) and gradient information was proposed. The method consisted of one generator and two discriminators. Firstly, multi-scale image features were obtained by a generator built on the Feature Pyramid Network (FPN) framework, and these features were fed hierarchically into a gradient branch to capture finer details in the form of a gradient map; the solar speckle image with high-frequency information was then reconstructed by combining the results of the gradient branch and the FPN. Secondly, in addition to the conventional adversarial discriminator, another discriminator was added to make the gradient map generated by the gradient branch more realistic. Finally, a joint training loss including pixel content loss, perceptual loss and adversarial loss was introduced to guide the model in the high-resolution reconstruction of solar speckle images. Experimental results show that, compared with existing deep learning deblurring methods, the proposed method with image preprocessing has a stronger ability to recover high-frequency information and significantly improves the Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM) indicators, reaching 27.801 0 dB and 0.851 0 respectively. The proposed method can meet the needs of high-resolution reconstruction of solar observation images.
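A minimal sketch of computing the kind of gradient map the gradient branch targets follows: Sobel filters give horizontal and vertical derivatives, and their magnitude highlights the high-frequency detail to be recovered. Whether the paper uses Sobel specifically is an assumption.

```python
# Minimal sketch of a gradient-map extraction with Sobel filters.
import torch
import torch.nn.functional as F

def gradient_map(img):                          # img: (B, 1, H, W) grayscale
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                     # vertical Sobel kernel
    gx = F.conv2d(img, kx, padding=1)           # horizontal gradient
    gy = F.conv2d(img, ky, padding=1)           # vertical gradient
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8) # gradient magnitude map

img = torch.rand(1, 1, 64, 64)
print(gradient_map(img).shape)    # torch.Size([1, 1, 64, 64])
```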
Since a large number of irregular task sets have low resource requirements and high parallelism, using the Graphics Processing Unit (GPU) to accelerate their processing is the current mainstream. However, existing irregular task scheduling strategies either use a GPU exclusively or use traditional optimization methods to map tasks to GPU devices; the former leaves GPU resources idle, and the latter cannot make maximum use of GPU computing resources. Based on the analysis of these problems, a multi-knapsack optimization idea was adopted to enable more irregular tasks to share GPU devices in the best way. Firstly, according to the characteristics of GPU clusters, a distributed GPU job scheduling framework consisting of schedulers and executors was given. Then, with GPU memory as the cost, an Extended-grained Greedy Scheduling (EGS) algorithm based on GPU computing resources was designed, in which as many irregular tasks as possible are scheduled onto multiple available GPUs, maximizing the use of GPU computing resources and solving the problem of idle GPU resources. Finally, a target task set randomly generated from actual benchmark programs was used to verify the effectiveness of the proposed scheduling strategy. Experimental results show that, compared with the traditional greedy algorithm, the Minimum Completion Time (MCT) algorithm and the Min-min algorithm, when the number of tasks equals 1 000, the execution time of the EGS algorithm is reduced to 58%, 64% and 80% of the originals on average, respectively, and the proposed algorithm effectively improves GPU resource utilization.
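The sketch below illustrates the multi-knapsack scheduling idea with GPU memory as the cost: tasks are placed greedily onto whichever GPU still fits them, so several irregular tasks share one device. The ordering heuristic and tie-breaking are illustrative assumptions, not the exact EGS algorithm.

```python
# Minimal sketch of greedy multi-knapsack GPU scheduling with memory as the cost.
def schedule(tasks, gpu_mem):
    """tasks: list of (task_id, mem_needed); gpu_mem: list of per-GPU free memory."""
    free = list(gpu_mem)
    placement, pending = {}, []
    for tid, need in sorted(tasks, key=lambda t: -t[1]):   # big tasks first
        fits = [g for g, f in enumerate(free) if f >= need]
        if fits:
            g = max(fits, key=lambda g: free[g])           # GPU with most free memory
            free[g] -= need
            placement[tid] = g
        else:
            pending.append(tid)                            # wait until memory is released
    return placement, pending

tasks = [("t1", 6), ("t2", 3), ("t3", 3), ("t4", 2), ("t5", 5)]
print(schedule(tasks, gpu_mem=[8, 8]))   # several tasks share each GPU
```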
The Multi-Class Support Vector Machine (MSVM) has defects such as strong sensitivity to noise, instability to resampled data and low generalization performance. To solve these problems, the pinball loss function, sample fuzzy membership degrees and sample structural information were introduced into the Simplified Multi-Class Support Vector Machine (SimMSVM) algorithm, and a structure-fuzzy multi-class support vector machine algorithm based on the pinball loss, namely Pin-SFSimMSVM, was proposed. Experimental results on synthetic datasets, UCI datasets and UCI datasets with different proportions of added noise show that the accuracy of Pin-SFSimMSVM is 0 to 5.25 percentage points higher than that of SimMSVM. The results also show that the proposed algorithm not only avoids indivisible regions of multi-class data and computes quickly, but is also insensitive to noise and stable to resampled data. At the same time, the algorithm considers the fact that different data samples play different roles in classification as well as the important prior knowledge contained in the data, making classifier training more accurate.
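The pinball loss that replaces the hinge loss can be sketched directly: it also penalizes correctly classified points slightly (slope -τ), which is what makes the classifier less sensitive to noise around the decision boundary; τ = 0 recovers the hinge loss.

```python
# Minimal sketch of the pinball loss used in place of the hinge loss.
import numpy as np

def pinball_loss(u, tau=0.5):
    """u = 1 - y * f(x), the hinge argument; tau in [0, 1]."""
    return np.where(u >= 0, u, -tau * u)   # hinge part for u >= 0, small slope for u < 0

u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(pinball_loss(u))   # [1.   0.25 0.   0.5  2.  ]
```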
To achieve efficient software testing in cloud computing environments, a method for automatically generating parallel test cases for functional testing of Web application systems was proposed. First, parallel test paths were obtained by depth-first traversal of the scene flow graph. Then, parallel test scripts were assembled from the test scripts referenced by these paths, and Search-Based Software Testing (SBST) was used to generate parameterized valid test data sets that traverse the target test paths and replace the test data in the scripts; a large number of distributable parallel test cases were then generated automatically by feeding the test data into the parallel test scripts. Finally, a prototype system for automatic testing in a cloud computing environment was built to validate the method. The experimental results show that the method can rapidly generate a large number of valid test cases for testing in cloud computing environments and improve testing efficiency.
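A minimal sketch of the first step follows: depth-first traversal of a scene flow graph, where every path from the entry scene to an exit scene becomes one test path. The example graph is an illustrative assumption.

```python
# Minimal sketch of extracting test paths from a scene flow graph by DFS.
def test_paths(graph, node, path=None):
    path = (path or []) + [node]
    if not graph.get(node):                    # exit scene reached: emit a test path
        yield path
        return
    for nxt in graph[node]:
        yield from test_paths(graph, nxt, path)

scene_flow = {"login": ["search", "profile"], "search": ["checkout"],
              "profile": [], "checkout": []}
for p in test_paths(scene_flow, "login"):
    print(" -> ".join(p))       # two parallel test paths
```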
Considering the influence of earliness and reworking penalties, the production order acceptance problem of hot-rolled bars was studied, and a mathematical model with the objective of maximizing the gross profit of orders was proposed. A hybrid algorithm combining an improved NEH (Nawaz-Enscore-Ham) algorithm and a Modified Harmony Search (MHS) algorithm was proposed for this model. Taking the constraints of the model into account, an initial solution was generated by the improved NEH algorithm and further optimized by the MHS algorithm. Furthermore, the idea of Teaching-Learning-Based Optimization (TLBO) was introduced into the selection and updating of harmony vectors to control the acceptance of new solutions. Meanwhile, in order to balance the breadth and depth of the algorithm's search, the parameters were adjusted dynamically to improve the global optimization ability. Simulation experiments with practical production data show that the proposed algorithm can effectively improve the total profit and acceptance rate, validating the feasibility and effectiveness of the model and the algorithm.
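For context, one harmony-search improvisation step is sketched below: each decision variable is drawn from harmony memory with probability HMCR, pitch-adjusted with probability PAR, and randomized otherwise. The parameter values are illustrative; the MHS variant adjusts them dynamically, as described above.

```python
# Minimal sketch of one harmony-search improvisation step.
import random

def improvise(memory, lb, ub, hmcr=0.9, par=0.3, bw=0.05):
    dim = len(memory[0])
    new = []
    for j in range(dim):
        if random.random() < hmcr:                       # memory consideration
            v = random.choice(memory)[j]
            if random.random() < par:                    # pitch adjustment
                v += random.uniform(-bw, bw) * (ub - lb)
        else:                                            # random selection
            v = random.uniform(lb, ub)
        new.append(min(max(v, lb), ub))                  # keep within bounds
    return new

memory = [[random.uniform(0, 1) for _ in range(4)] for _ in range(5)]  # harmony memory
print(improvise(memory, 0.0, 1.0))
```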
Because the traditional multivariate linear regression method suffers from long processing times and limited memory, a parallel multivariate linear regression forecasting model based on MapReduce was designed for time-series sample data. The model was composed of three MapReduce processes, which were used respectively to solve the eigenvectors and orthonormal vectors of the cross-product matrix composed of historical data, to forecast the future eigenvalue and eigenvector matrices, and to estimate the regression parameters at the next moment. Experiments were designed and implemented to validate the effectiveness of the proposed parallel multivariate linear regression forecasting model. The experimental results show that the MapReduce-based multivariate linear regression forecasting model has good speedup and scaleup, and is suitable for the analysis and forecasting of large-scale data.
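The sketch below illustrates the general MapReduce pattern such a model relies on, using the simplest case (normal equations) rather than the paper's eigen-decomposition pipeline: mappers emit the partial sums XᵢᵀXᵢ and Xᵢᵀyᵢ of their data blocks, a reducer adds them, and the driver solves for the coefficients. Plain functions stand in for a real MapReduce runtime.

```python
# Minimal sketch of the map/reduce accumulation pattern for linear regression.
import numpy as np

def mapper(X_block, y_block):                       # runs on each data split
    return X_block.T @ X_block, X_block.T @ y_block

def reducer(partials):                              # adds the partial sums, then solves
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)                # regression coefficients

rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 2))])
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(100)
blocks = zip(np.array_split(X, 4), np.array_split(y, 4))   # four "splits"
print(reducer([mapper(xb, yb) for xb, yb in blocks]))      # ~ [2.0, -1.0, 0.5]
```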
To improve the robustness of pre-processing and extract sufficient features from Synthetic Aperture Radar (SAR) images, an automatic target recognition algorithm for SAR images based on Deep Belief Network (DBN) was proposed. Firstly, a non-local means despeckling algorithm based on the Dual-Tree Complex Wavelet Transform (DT-CWT) was proposed; then, combined with estimation of the target azimuth, robust pre-processing of the original data was achieved; finally, a multi-layer DBN was applied to extract deeply abstract visual information as features to complete target recognition. The experiments were conducted on three Moving and Stationary Target Acquisition and Recognition (MSTAR) databases. The results show that the algorithm performs efficiently with high accuracy and robustness.
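A minimal sketch of DBN-style feature extraction with scikit-learn follows: two stacked BernoulliRBM layers learn increasingly abstract features, and a logistic-regression head performs the final classification. Layer sizes, training settings and the random data are illustrative assumptions standing in for despeckled SAR patches.

```python
# Minimal sketch of stacked-RBM (DBN-style) feature extraction plus a classifier head.
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 64))                    # stand-in for despeckled image patches in [0, 1]
y = rng.integers(0, 3, 200)                  # three target classes

dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=500)),
])
dbn.fit(X, y)                                # layer-wise RBM pretraining + supervised head
print(dbn.score(X, y))
```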