To address the problem that existing Wireless Capsule Endoscopy (WCE) image classification models target only a single disease or a specific organ and are therefore difficult to adapt to clinical needs, a WCE image classification model based on an improved ConvNeXt-T (ConvNeXt Tiny) was proposed. Firstly, a Simple parameter-free Attention Module (SimAM) was introduced into the model's feature extraction process so that the model focuses on the key areas of WCE images and accurately captures detailed features such as the boundaries and textures of lesion areas. Secondly, a Global Context Multi-scale Feature Fusion (GC-MFF) module was designed: the global context modeling capability of the model was first strengthened through a Global Context Block (GC Block), and then shallow and deep multi-scale features were fused to obtain WCE image features with stronger representation ability. Finally, the Cross Entropy (CE) loss function was optimized to address the large intra-class differences among WCE images. Experimental results on a WCE dataset show that the proposed model improves accuracy and F1 score by 2.96 and 3.16 percentage points, respectively, over the original ConvNeXt-T; compared with the Swin-B (Swin Transformer Base) model, the best-performing mainstream classification model, the proposed model reduces the number of parameters by 67.4% while increasing accuracy and F1 score by 0.51 and 0.67 percentage points, respectively. These results indicate that the proposed model has better classification performance and can effectively assist doctors in making accurate diagnoses of digestive tract diseases.
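Because SimAM is parameter-free, it can be inserted between ConvNeXt-T stages without increasing the parameter count. Below is a minimal sketch of the module as commonly implemented; the stabilizer value is an assumption, since the abstract does not give the paper's setting.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: re-weights each activation by an
    energy-based saliency computed per channel."""
    def __init__(self, eps: float = 1e-4):
        super().__init__()
        self.eps = eps  # assumed stabilizer; the paper's value is not given

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); n = number of other neurons in each channel map
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n      # per-channel variance
        e_inv = d / (4 * (v + self.eps)) + 0.5       # inverse energy per neuron
        return x * torch.sigmoid(e_inv)              # low energy -> high weight
```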
In view of the challenges of non-Independent and Identically Distributed (non-IID) data and heterogeneous computing power faced by Federated Learning (FL) in edge computing applications, the concept of a local drift variable was introduced to avoid the significant deviation in client model updates caused by non-IID data, thereby preventing unstable model convergence. By correcting the local model parameters, the local training process was decoupled from the global aggregation process, optimizing FL performance when training on non-IID data. Furthermore, considering the diversity of edge server computing power, a new strategy was proposed: a simplified neural network sub-model was split from the global model for deployment on resource-constrained edge servers, while high-capacity servers used the complete global model. Parameters trained by the low-capacity servers were uploaded to the cloud server, with partial parameter freezing to accelerate model convergence. Integrating these two methods, a Federated learning optimization algorithm based on Local drift and Diversity computing power (FedLD) was proposed to address the heterogeneity challenges caused by non-IID data and diverse computing power in FL for edge computing. Experimental results show that FedLD converges faster and achieves higher accuracy than the FedAvg, SCAFFOLD, and FedProx algorithms. Compared with FedProx, when 50 clients participate in training, FedLD improves model accuracy by 0.39%, 3.68% and 15.24% on the MNIST, CIFAR-10 and CIFAR-100 datasets, respectively. Comparative analysis with the recent FedProc algorithm shows that FedLD has lower communication overhead. Additional experiments incorporating the K-Nearest Neighbors (KNN) algorithm, the Long Short-Term Memory (LSTM) model, and the bidirectional Gated Recurrent Unit (GRU) model demonstrate accuracy improvements of approximately 1% across all three models when integrated with FedLD.
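The abstract does not spell out FedLD's correction rule, so the sketch below uses a SCAFFOLD-style drift correction as a stand-in: each client keeps a drift variable, steers its local gradients with it, and refreshes it against the global model after training. All names and the exact update form are assumptions.

```python
import torch

def local_update_with_drift(model, loader, loss_fn, global_params,
                            c_local, c_global, lr=0.01, epochs=1):
    """One client's local training with drift correction (SCAFFOLD-style
    sketch; FedLD's exact rule is not given in the abstract)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    steps = 0
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # steer local gradients toward the global update direction
            for p, ci, cg in zip(model.parameters(), c_local, c_global):
                p.grad.add_(cg - ci)
            opt.step()
            steps += 1
    # refresh local drift: c_i <- c_i - c + (x_global - x_local) / (steps * lr)
    with torch.no_grad():
        c_new = [ci - cg + (g - p) / (steps * lr)
                 for ci, cg, g, p in zip(c_local, c_global, global_params,
                                         model.parameters())]
    return [p.detach().clone() for p in model.parameters()], c_new
```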
To improve data transmission efficiency in edge computing application scenarios and manage concurrent data traffic effectively, a relay control model for concurrent data flows in edge computing was designed. Firstly, based on Data Plane Development Kit (DPDK) features such as kernel bypass, multi-core processing, and sending and receiving packets on multiple network ports, concurrent receiving and forwarding of data flows was realized. Secondly, by establishing a system model with Model Predictive Control (MPC) at its core, state prediction was used to optimize control inputs and provide timely feedback and adjustment, thereby achieving control of data traffic. Finally, a Weighted Round-Robin (WRR) algorithm was proposed that allocates weights according to buffer size and recent usage time in order to achieve load balancing of data flows. Experimental results show that the proposed model can control the real-time data flow rate effectively in an edge network environment, with a control error between -1% and 2%. Compared with traditional Linux kernel forwarding, the proposed model improves the data flow sending bit rate of edge nodes in real application scenarios, with corresponding improvements in transmission quality and packet delay. Hence, the proposed model can meet the demands for low latency and high bandwidth in edge clusters and Internet of Things data centers, and can optimize critical computing resources while reducing peak loads.
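As one plausible reading of the weighting rule, the sketch below blends buffer headroom and idle time into per-port WRR weights; the `alpha` blend and both scoring functions are assumptions, since the abstract names only the two criteria.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    buffer_free: int                      # free buffer space, in bytes
    last_used: float = field(default_factory=time.monotonic)

def wrr_weights(ports, alpha=0.7):
    """Blend buffer headroom with idle time into WRR weights (illustrative)."""
    now = time.monotonic()
    total_buf = sum(p.buffer_free for p in ports) or 1
    max_idle = max(now - p.last_used for p in ports) or 1.0
    return {p.name: alpha * (p.buffer_free / total_buf)
                    + (1 - alpha) * ((now - p.last_used) / max_idle)
            for p in ports}
```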
To address the low recognition precision and recognition difficulty of existing one-stage anchor-free detectors in general object detection scenarios, a high-precision object detection algorithm based on an improved VarifocalNet (VFNet) was proposed. Firstly, the ResNet backbone used for feature extraction in VFNet was replaced by the Recurrent Layer Aggregation Network (RLANet), whose recurrent residual connections feed the features of the previous layer into subsequent network layers to improve feature representation ability. Next, the original feature fusion network was replaced by a Feature Pyramid Network (FPN) with a feature alignment convolution operation, so that deformable convolution could be used to align features between the upper and lower FPN layers during fusion and improve feature quality. Finally, the Focal-Global Distillation (FGD) algorithm was used to further improve the detection performance of the small-scale model. Evaluation results on the COCO (Common Objects in Context) 2017 dataset show that, under the same training conditions, the improved algorithm with RLANet-50 as the backbone achieves a mean Average Precision (mAP) of 45.9%, which is 4.3 percentage points higher than that of the VFNet algorithm, with 36.67×10⁶ parameters, only 4×10⁶ more than the VFNet algorithm. The improved VFNet algorithm improves detection accuracy with only a slight increase in parameters, indicating that it can meet the lightweight and high-precision requirements of object detection.
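The alignment idea can be pictured as predicting per-pixel offsets from the pair of maps being fused and applying a deformable convolution to the upsampled top-down map before addition. The sketch below follows common feature-alignment designs and is an assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class AlignedFusion(nn.Module):
    """Feature-aligned FPN fusion (sketch): predict offsets from the two maps,
    then align the upsampled top-down map with a deformable conv before adding."""
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.offset = nn.Conv2d(channels * 2, 2 * kernel * kernel, 3, padding=1)
        self.align = DeformConv2d(channels, channels, kernel, padding=kernel // 2)

    def forward(self, top_down, lateral):
        top_up = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        offsets = self.offset(torch.cat([top_up, lateral], dim=1))
        return lateral + self.align(top_up, offsets)

fuse = AlignedFusion(256)
p5, p4 = torch.randn(1, 256, 16, 16), torch.randn(1, 256, 32, 32)
out = fuse(p5, p4)  # (1, 256, 32, 32)
```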
In the stock market, investors can predict future stock returns by capturing the potential trading patterns in historical data, so the key issue in predicting stock returns is how to identify these trading patterns accurately. However, they are generally difficult to capture due to uncertain factors such as corporate performance, financial policies, and national economic growth. To solve this problem, a Multi-Scale Kernel Adaptive Filtering (MSKAF) method was proposed to capture multi-scale trading patterns from past market data. In this method, to describe the multi-scale features of stocks, Stationary Wavelet Transform (SWT) was employed to obtain data components at different scales; the different trading patterns hidden in stock price fluctuations are contained in these components. Then, Kernel Adaptive Filtering (KAF) was used to capture the trading patterns at different scales to predict future stock returns. Experimental results show that, compared with the prediction model based on Two-Stage KAF (TSKAF), the proposed method reduces the Mean Absolute Error (MAE) by 10% and increases the Sharpe Ratio (SR) by 8.79%, verifying that the proposed method achieves better stock return prediction performance.
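A minimal sketch of the pipeline: decompose the return series with SWT (via PyWavelets), run one kernel adaptive filter per scale on lagged inputs, and sum the per-scale predictions. KLMS is chosen here as the KAF variant, and the wavelet, decomposition level, and hyperparameters are all assumptions.

```python
import numpy as np
import pywt

def klms(X, y, step=0.5, sigma=1.0):
    """Kernel Least Mean Squares: sequential prediction on (X[t], y[t]) pairs."""
    centers, coeffs, preds = [], [], []
    for xi, yi in zip(X, y):
        if centers:
            k = np.exp(-np.sum((np.asarray(centers) - xi) ** 2, axis=1)
                       / (2 * sigma ** 2))
            pred = float(np.dot(coeffs, k))
        else:
            pred = 0.0
        preds.append(pred)
        centers.append(xi)                 # grow the kernel dictionary
        coeffs.append(step * (yi - pred))  # store the scaled error
    return np.array(preds)

def lagged(series, m=4):
    """Embed a series into (inputs of m past values, next-value targets)."""
    X = np.stack([series[i:len(series) - m + i] for i in range(m)], axis=1)
    return X, series[m:]

returns = np.random.randn(256) * 0.01             # placeholder return series
(cA2, cD2), (cA1, cD1) = pywt.swt(returns, "db2", level=2)
pred = np.zeros(len(returns) - 4)
for comp in (cA2, cD2, cD1):                      # coarse trend + two detail scales
    X, y = lagged(comp)
    pred += klms(X, y)                            # sum per-scale predictions
```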
In recent years, the Grid-based distributed Xin'anjiang hydrological Model (GXM) has played an important role in flood forecasting. However, when simulating the flooding process, due to the vast amount of data and computation involved, the computing time of GXM increases exponentially with the length of the model warm-up period, which seriously affects its computational efficiency. Therefore, a parallel computing algorithm for GXM based on grid flow direction division and dynamic-priority Directed Acyclic Graph (DAG) scheduling was proposed. Firstly, the model parameters, components, and calculation process were analyzed. Secondly, a parallel algorithm for GXM based on grid flow direction division was proposed from the perspective of spatial parallelism to improve the computational efficiency of the model. Finally, a DAG task scheduling algorithm based on dynamic priority was proposed to reduce data skew in model calculation, by constructing the DAG of grid computing nodes and dynamically updating the priorities of computing nodes to schedule tasks during GXM computation. Experimental results on the Dali River basin of Shaanxi Province and the Tunxi basin of Anhui Province show that, compared with the traditional serial computing method, the maximum speedup ratio of the proposed algorithm reaches 4.03 and 4.11, respectively, and the computing speed and resource utilization of GXM are effectively improved when the warm-up period is 30 days and the data resolution is 1 km.
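One plausible reading of the dynamic-priority scheduling is sketched below: a grid node becomes ready once all upstream cells finish, and its priority is its own cost plus the cost of everything downstream of it, so cells feeding long drainage chains run first. The priority definition is an assumption.

```python
import heapq
from collections import defaultdict

def dag_schedule(tasks, deps, cost):
    """Priority-driven DAG scheduling sketch for grid cells: ready when all
    upstream cells are done; priority = own cost + (possibly overlapping)
    downstream cost, used only as a ranking heuristic."""
    children, indeg = defaultdict(list), {t: 0 for t in tasks}
    for u, v in deps:                         # u drains into v: u before v
        children[u].append(v)
        indeg[v] += 1
    memo = {}
    def downstream(t):
        if t not in memo:
            memo[t] = cost[t] + sum(downstream(c) for c in children[t])
        return memo[t]
    ready = [(-downstream(t), t) for t in tasks if indeg[t] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, t = heapq.heappop(ready)           # highest remaining work first
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                heapq.heappush(ready, (-downstream(c), c))
    return order

# toy 4-cell flow network: a and b drain into c, c drains into the outlet d
print(dag_schedule(["a", "b", "c", "d"],
                   [("a", "c"), ("b", "c"), ("c", "d")],
                   {"a": 2, "b": 1, "c": 3, "d": 1}))  # -> ['a', 'b', 'c', 'd']
```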
To address problems such as the insufficient search ability and low search efficiency of the Heap-Based Optimizer (HBO) on complex problems, a Differential disturbed HBO (DDHBO) was proposed. Firstly, a random differential disturbance strategy was proposed to update the best individual's position, solving the low search efficiency caused by HBO never updating this individual. Secondly, a best-worst differential disturbance strategy was used to update the worst individual's position and strengthen its search ability. Thirdly, the positions of ordinary individuals were updated by a multi-level differential disturbance strategy to strengthen information exchange among individuals across multiple levels and improve search ability. Finally, a dimension-based differential disturbance strategy was proposed for the other individuals to improve the probability of obtaining effective solutions in the initial stage of the original update model. Experimental results on a large number of complex functions from CEC2017 show that, compared with HBO, DDHBO has better optimization performance on 96.67% of the functions with less average running time (3.4450 s), and that DDHBO also has significant advantages over other state-of-the-art algorithms such as Worst opposition learning and Random-scaled differential mutation Biogeography-Based Optimization (WRBBO), Differential Evolution and Biogeography-Based Optimization (DEBBO), and Hybrid Particle Swarm Optimization and Grey Wolf Optimizer (HGWOP).
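The first strategy can be sketched as a DE-style mutation of the current best individual: perturb it with a scaled difference of two random individuals and keep the trial only if it improves. The scale factor, bounds handling, and greedy acceptance are assumptions.

```python
import numpy as np

def disturb_best(pop, fitness, objective, lo, hi, f=0.5):
    """Random differential disturbance of the best individual (sketch):
    best' = best + F * (x_r1 - x_r2), accepted greedily on improvement."""
    b = int(np.argmin(fitness))                      # minimization assumed
    r1, r2 = np.random.choice(len(pop), 2, replace=False)
    trial = np.clip(pop[b] + f * (pop[r1] - pop[r2]), lo, hi)
    ft = objective(trial)
    if ft < fitness[b]:                              # keep only if better
        pop[b], fitness[b] = trial, ft
    return pop, fitness

# toy run on the sphere function
sphere = lambda x: float(np.sum(x ** 2))
pop = np.random.uniform(-5, 5, size=(20, 10))
fit = np.array([sphere(x) for x in pop])
pop, fit = disturb_best(pop, fit, sphere, -5, 5)
```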
Handwritten text recognition technology can transcribe handwritten documents into editable digital documents. However, due to diverse writing styles, ever-changing document structures, and the low accuracy of character segmentation, handwritten English text recognition based on neural networks still faces many challenges. To solve these problems, a handwritten English text recognition model based on Convolutional Neural Network (CNN) and Transformer was proposed. Firstly, a CNN was used to extract features from the input image. Then, the features were fed into the Transformer encoder to obtain a prediction for each frame of the feature sequence. Finally, a Connectionist Temporal Classification (CTC) decoder was used to obtain the final prediction. Extensive experiments were conducted on the public IAM (Institut für Informatik und Angewandte Mathematik) handwritten English word dataset. Experimental results show that this model obtains a Character Error Rate (CER) of 3.60% and a Word Error Rate (WER) of 12.70%, verifying the feasibility of the proposed model.
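A minimal PyTorch sketch of this pipeline; the two-block CNN and all layer sizes are assumptions, since the abstract does not specify the architecture. Training would pair the per-frame log-probabilities with nn.CTCLoss.

```python
import torch
import torch.nn as nn

class CNNTransformerCTC(nn.Module):
    """Sketch of the CNN + Transformer-encoder + CTC recognition pipeline."""
    def __init__(self, n_classes, d_model=256, nhead=4, nlayers=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, d_model, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, nlayers)
        self.fc = nn.Linear(d_model, n_classes + 1)    # +1 for the CTC blank

    def forward(self, x):                  # x: (B, 1, H, W) grayscale word image
        f = self.cnn(x)                    # (B, C, H', W')
        f = f.mean(dim=2).transpose(1, 2)  # collapse height -> (B, W', C) frames
        f = self.encoder(f)                # per-frame contextual features
        # per-frame log-probs; transpose to (T, B, C) for nn.CTCLoss
        return self.fc(f).log_softmax(-1)
```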
English characters, English being the international language, appear on many public occasions, as do Chinese pinyin characters in Chinese environments. When these characters appear in an image, especially an image with a complex style, it is difficult to edit or modify them directly. To solve this problem, an image character editing method based on an improved character generation network, Font Adaptive Neural network (FANnet), was proposed. Firstly, the saliency detection algorithm based on Histogram Contrast (HC) was used to improve the Character Adaptive Detection (CAD) model and accurately extract the image characters selected by the user. Secondly, a binary image of the target character, almost consistent with the font of the source character, was generated using FANnet. Then, the colors of the source characters were transferred effectively to the target characters by the proposed Colors Distribute-based Local (CDL) transfer model, which is based on color complexity discrimination. Finally, target editable characters highly consistent with the font structure and color variation of the source characters were generated, thus achieving character editing. Experimental results show that, on the MSRA-TD500, COCO-Text and ICDAR datasets, the average Structural SIMilarity (SSIM), Peak Signal-to-Noise Ratio (PSNR) and Normalized Root Mean Square Error (NRMSE) of the proposed method are 0.7765, 18.3211 dB and 0.4358, respectively; compared with the Scene Text Editor using Font Adaptive Neural Network (STEFANN) algorithm, the first two are increased by 18.59% and 14.02% and the last is decreased by 2.97%, and compared with the multi-content few-shot font style transfer model Multi-Content GAN (MC-GAN, with 1 input character), they are increased by 30.24% and 23.92% and decreased by 4.68%, respectively. For image characters with complex font structures and color gradient distributions in real scenes, the editing effect of the proposed method is also good. The proposed method can be applied to image reuse, automatic correction of image characters, and restoration of textual information in images.
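The HC saliency step can be sketched as follows: quantize the image colors, then score each color by its frequency-weighted distance to all other colors, so high-contrast characters stand out against the background. This is a simplified histogram-contrast computation; the bin count and Lab input are assumptions.

```python
import numpy as np

def hc_saliency(lab, bins=12):
    """Histogram-Contrast saliency (simplified): a pixel is salient when its
    quantized color is far, on average, from the other colors in the image."""
    q = np.clip((lab.astype(float) / 256.0 * bins).astype(int), 0, bins - 1)
    flat = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]  # color index
    labels, counts = np.unique(flat, return_counts=True)
    centers = np.stack([labels // (bins * bins),
                        (labels // bins) % bins,
                        labels % bins], axis=1).astype(float)
    freq = counts / counts.sum()
    dist = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    score = (dist * freq[None]).sum(axis=1)          # contrast of each color
    lut = np.zeros(labels.max() + 1)
    lut[labels] = score
    sal = lut[flat]
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)  # normalize to [0, 1]
```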
Most current Chinese question-answer matching techniques require word segmentation first, and for Chinese medical text, word segmentation requires maintaining medical dictionaries to reduce the impact of segmentation errors on subsequent tasks. However, maintaining such dictionaries requires considerable manpower and domain knowledge, making word segmentation a persistent challenge. At the same time, existing Chinese medical question-answer matching methods model the questions and the answers separately, without considering the relationship between the keywords in the questions and those in the answers. Therefore, an Attention mechanism based Stack Convolutional Neural Network (Att-StackCNN) model was proposed to solve the Chinese medical question-answer matching problem. Firstly, character embedding was used to encode the questions and answers to obtain their respective character embedding matrices. Then, the respective attention feature maps were obtained by constructing an attention matrix from the character embedding matrices of the questions and answers. After that, a Stack Convolutional Neural Network (Stack-CNN) model was used to perform convolution on both kinds of matrices simultaneously to obtain the semantic representations of the questions and answers. Finally, the similarity was computed, and a max-margin loss based on it was used to update the network parameters. On the cMedQA dataset, the Top-1 accuracy of the proposed model was about 1 percentage point higher than that of the Stack-CNN model and about 0.5 percentage points higher than that of the Multi-CNNs model. Experimental results show that the Att-StackCNN model can improve the matching of Chinese medical questions and answers.
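A sketch of the attention front end: score every question character against every answer character, then row- and column-normalize to produce each side's attention feature map. Dot-product scoring is an assumption; the abstract does not specify the scoring function.

```python
import torch
import torch.nn.functional as F

def attention_feature_maps(q_emb, a_emb):
    """Build the question-answer attention matrix and both attention maps.
    q_emb: (B, Lq, D) question character embeddings; a_emb: (B, La, D)."""
    att = torch.einsum("bqd,bad->bqa", q_emb, a_emb)       # (B, Lq, La) scores
    q_map = F.softmax(att, dim=2) @ a_emb                  # answer-aware question map
    a_map = F.softmax(att, dim=1).transpose(1, 2) @ q_emb  # question-aware answer map
    return q_map, a_map

q = torch.randn(2, 30, 128)   # batch of 2 questions, 30 chars, 128-d embeddings
a = torch.randn(2, 80, 128)   # corresponding answers, 80 chars
q_map, a_map = attention_feature_maps(q, a)  # (2, 30, 128), (2, 80, 128)
```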
For the positioning of mobile robots in indoor environments, the emerging auxiliary positioning technology based on Visual Inertial Odometry (VIO) is heavily limited by lighting conditions and cannot work in dark environments, while Ultra-Wide Band (UWB)-based positioning methods are easily affected by Non-Line-Of-Sight (NLOS) errors. To solve these problems, an indoor mobile robot positioning algorithm combining UWB and VIO was proposed. Firstly, the S-MSCKF (Stereo Multi-State Constraint Kalman Filter) algorithm was used to obtain the VIO position output, while the DS-TWR (Double-Sided Two-Way Ranging) algorithm and trilateration were used to obtain the UWB positioning result. Then, the motion equation and observation equation of the position measurement system were established. Finally, the optimal position estimate of the robot was obtained by fusing the data with an Error-State Extended Kalman Filter (ES-EKF) algorithm. A mobile positioning platform was built to verify the combined positioning method in different indoor environments. Experimental results show that, in indoor environments with obstacles, the proposed algorithm reduces the maximum error of overall positioning by about 4.4% and the mean square error by about 6.3% compared with positioning using UWB alone, and reduces the maximum error by about 31.5% and the mean square error by about 60.3% compared with positioning using VIO alone. The proposed algorithm can therefore provide real-time, accurate and robust positioning results for mobile robots in indoor environments.
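The fusion step can be illustrated with a standard Kalman update that folds UWB and VIO position fixes into one state. This is a plain linear update rather than the paper's full error-state formulation, and the noise covariances below are assumptions.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One Kalman measurement update: fuse a position fix z into state x."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# state: [px, py, vx, vy]; fuse UWB and VIO position fixes in turn (illustrative)
x, P = np.zeros(4), np.eye(4)
H = np.hstack([np.eye(2), np.zeros((2, 2))])         # we observe position only
R_uwb, R_vio = np.eye(2) * 0.09, np.eye(2) * 0.01    # assumed noise levels
x, P = kalman_update(x, P, np.array([1.0, 2.0]), H, R_uwb)
x, P = kalman_update(x, P, np.array([1.1, 2.05]), H, R_vio)
```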
The mixed noise formed by a large number of spikes, speckles, and multi-directional stripe errors in Shuttle Radar Topography Mission (SRTM) data causes serious interference in subsequent applications. To solve this problem, a Low-Rank Group Sparsity_Total Variation (LRGS_TV) algorithm was proposed. Firstly, the uniqueness of the data in the local-range low-rank direction was used to regularize the global multi-directional stripe error structure, and the variational idea was used to impose unidirectional constraints. Secondly, the non-local self-similarity of the weighted nuclear norm was used to eliminate random noise, combined with Total Variation (TV) regularization to constrain the data gradient and reduce differences in local-range variations. Finally, the low-rank group sparse model was solved by alternating direction multiplier optimization to ensure model convergence. Quantitative evaluation shows that, compared with the TV, Unidirectional Total Variation (UTV), Low-Rank-based Single-Image Decomposition (LRSID), and Low-Rank Group Sparsity (LRGS) algorithms, the proposed LRGS_TV achieves a Peak Signal-to-Noise Ratio (PSNR) of 38.53 dB and a Structural SIMilarity (SSIM) of 0.97, both better than those of the comparison algorithms. At the same time, the slope and aspect results show that subsequent applications of the data can be significantly improved after LRGS_TV processing. The experimental results show that the proposed LRGS_TV can repair the original data while keeping the terrain contour features essentially unchanged, and can provide important support for improving the reliability of SRTM data and for its subsequent applications.
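To ground the TV term, the sketch below runs a toy subgradient descent on the anisotropic TV objective 0.5*||u - f||^2 + lam*||grad u||_1 over a DEM tile. The paper solves its full low-rank group sparse model with ADMM, so this is only a stand-in for the gradient-constraint idea; the step sizes are assumptions and boundaries wrap for brevity.

```python
import numpy as np

def tv_smooth(dem, lam=0.1, step=0.2, iters=100):
    """Toy anisotropic-TV smoothing of a DEM tile by subgradient descent on
    0.5*||u - f||^2 + lam*||grad u||_1 (stand-in for the paper's ADMM solver)."""
    u = dem.astype(float).copy()
    for _ in range(iters):
        sx = np.sign(np.diff(u, axis=1, append=u[:, -1:]))  # sign of x-gradient
        sy = np.sign(np.diff(u, axis=0, append=u[-1:, :]))  # sign of y-gradient
        div = (sx - np.roll(sx, 1, axis=1)) + (sy - np.roll(sy, 1, axis=0))
        u -= step * ((u - dem) - lam * div)                 # descend the objective
    return u

noisy = np.random.randn(64, 64) + np.linspace(0, 5, 64)    # ramp + random noise
smooth = tv_smooth(noisy)
```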
For localization and static semantic mapping in dynamic scenes, a Simultaneous Localization And Mapping (SLAM) algorithm based on semantic and optical flow constraints was proposed to reduce the impact of moving objects on localization and mapping. Firstly, for each input frame, the masks of the objects in the frame were obtained by semantic segmentation, and feature points that do not satisfy the epipolar constraint were filtered out geometrically. Secondly, the dynamic probability of each object was computed by combining the object masks with optical flow, the feature points were filtered by these dynamic probabilities to obtain static feature points, and the static feature points were used for subsequent camera pose estimation. Then, a static point cloud was created from the RGB-D images and object dynamic probabilities, and a semantic octree map was built by incorporating the semantic segmentation. Finally, a sparse semantic map was created from the static point cloud and the semantic segmentation. Test results on the public TUM dataset show that, in highly dynamic scenes, the proposed algorithm improves both the absolute trajectory error and the relative pose error by more than 95% compared with ORB-SLAM2, and reduces the absolute trajectory error by 41% and 11% compared with DS-SLAM and DynaSLAM, respectively, verifying that the proposed algorithm has better localization accuracy and robustness in highly dynamic scenes. The mapping results show that the proposed algorithm creates a static semantic map whose sparse form requires 99% less storage space than a point cloud map.
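One way to picture the dynamic-probability step: compare the observed optical flow inside each object mask with the flow predicted by camera ego-motion, and rate the object dynamic in proportion to how much of its mask moves independently. The residual-flow threshold and this combination rule are assumptions.

```python
import numpy as np

def object_dynamic_prob(flow, ego_flow, mask, thresh=1.0):
    """Dynamic probability of one object (sketch): fraction of its mask whose
    residual flow (observed minus ego-motion flow, in pixels) exceeds a
    threshold. `ego_flow` would come from the estimated pose; given here."""
    residual = np.linalg.norm(flow - ego_flow, axis=-1)   # (H, W) magnitudes
    pix = residual[mask > 0]
    return float((pix > thresh).mean()) if pix.size else 0.0

def filter_static_points(points, masks, probs, p_dyn=0.5):
    """Keep feature points that land on objects with low dynamic probability."""
    keep = []
    for (u, v) in points:                                 # pixel coordinates
        obj = next((i for i, m in enumerate(masks) if m[v, u] > 0), None)
        if obj is None or probs[obj] < p_dyn:             # background or static
            keep.append((u, v))
    return keep
```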