Speaker Identification (SI) in novels aims to determine the speaker of a quotation from its context. This task is of great help in assigning appropriate voices to different characters in the production of audiobooks. However, the existing methods mainly use fixed window values to select the context of quotations, which is not flexible enough and may produce redundant segments, making it difficult for the model to capture useful information. Besides, due to the significant differences in the number of quotations and writing styles among different novels, a small number of labeled samples cannot enable the model to generalize fully, and the labeling of datasets is expensive. To solve the above problems, a novel speaker identification framework that integrates narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a suitable length of context, so that the model focuses on the segment most relevant to quotation attribution. Secondly, a Speaker Scoring Network (SSN) was constructed with the generated context as input. In addition, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to compensate for the lack of labeled samples to some extent and screen out more reliable, higher-quality pseudo-label samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) containing 11 Chinese novels was built and labeled. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework is superior to methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.
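The abstract does not spell out the RPLS selection criterion; as a hypothetical illustration of the general idea of screening reliable pseudo-labels during self-training, the sketch below keeps a pseudo-label only when the margin between the top two candidate-speaker scores is large (all names and the threshold are assumptions, not the paper's method):

```python
# Toy sketch of margin-based pseudo-label filtering for self-training.
# Reliability is approximated by the gap between the top-2 candidate scores;
# the actual RPLS algorithm may use a different criterion.

def select_reliable_pseudo_labels(score_table, margin_threshold=0.3):
    """score_table: {quote_id: {candidate: score}} from a scoring network.
    Returns {quote_id: best_candidate} only for quotes whose top-1/top-2
    score margin exceeds the threshold (a proxy for label reliability)."""
    selected = {}
    for quote_id, scores in score_table.items():
        ranked = sorted(scores.values(), reverse=True)
        if len(ranked) < 2 or ranked[0] - ranked[1] >= margin_threshold:
            selected[quote_id] = max(scores, key=scores.get)
    return selected

demo = {
    "q1": {"Alice": 0.9, "Bob": 0.2},    # large margin -> kept
    "q2": {"Alice": 0.51, "Bob": 0.49},  # ambiguous -> discarded
}
print(select_reliable_pseudo_labels(demo))  # {'q1': 'Alice'}
```

The kept pairs would then be added to the labeled pool for the next self-training round.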
Traditional knowledge reasoning methods based on representation learning can only be used for closed-world knowledge reasoning. How to conduct open-world knowledge reasoning effectively is currently a hot issue. Therefore, a knowledge reasoning model based on path and enhanced triplet text, named PEOR (Path and Enhanced triplet text for Open world knowledge Reasoning), was proposed. First, multiple paths generated by structures between entity pairs and enhanced triplets generated by individual entity neighborhood structures were utilized. Among them, the path text was obtained by concatenating the text of the triplets in the path, and the enhanced triplet text was obtained by concatenating the text of the head entity neighborhood, the relation, and the tail entity neighborhood. Then, BERT (Bidirectional Encoder Representations from Transformers) was employed to encode the path text and the enhanced triplet text separately. Finally, semantic matching attention was computed between path vectors and triplet vectors, and the semantic information from multiple paths was aggregated using this attention. Comparison experiment results on three open-world knowledge graph datasets, WN18RR, FB15k-237 and NELL-995, show that compared with the suboptimal model BERTRL (BERT-based Relational Learning), the proposed model has the Hits@10 (Hit ratio) metric improved by 2.6, 2.3 and 8.5 percentage points, respectively, validating the effectiveness of the proposed model.
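The aggregation step described above can be sketched in a few lines: each path vector is scored against the enhanced-triplet vector, and the paths are combined by the resulting softmax weights. Dimensions, scaling, and the dot-product scoring are illustrative assumptions, not PEOR's exact formulation:

```python
import numpy as np

# Hedged sketch of semantic-matching attention over path vectors.

def aggregate_paths(path_vecs, triplet_vec):
    # path_vecs: (n_paths, d); triplet_vec: (d,)
    scores = path_vecs @ triplet_vec / np.sqrt(triplet_vec.size)  # (n_paths,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                      # softmax
    return weights @ path_vecs                                    # (d,) aggregate

paths = np.array([[1.0, 0.0], [0.0, 1.0]])  # two (toy) BERT path encodings
triplet = np.array([1.0, 0.0])              # (toy) enhanced-triplet encoding
agg = aggregate_paths(paths, triplet)       # leans toward the matching path
```

The path aligned with the triplet vector receives the larger weight, so the aggregate is dominated by semantically matching paths.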
To address the issues of insufficient focus on tumor regions and the loss of spatial contextual information in brain tumor image segmentation models, which affect the accuracy of tumor segmentation, a TransUNet-based brain tumor segmentation network integrating a Coordinate Enhanced Learning mechanism (CEL) and multi-source sampling was proposed. Firstly, a CEL mechanism was proposed and combined with ResNetv2 as the shallow feature extraction network of the model, so as to enhance attention to brain tumor regions. Secondly, a deep blended sampling feature extractor was designed, in which deformable attention and self-attention mechanisms were used to perform multi-source sampling on both global and local information of brain tumors. Finally, an Interactive Level Fusion (ILF) module was designed between the encoder and the decoder, thereby realizing interaction between deep and shallow feature information while minimizing the parameter computational cost. Experimental results on the BraTS2018 and BraTS2019 datasets indicate that compared to the benchmark TransUNet, the proposed model has the mean Dice coefficient (mDice), the mean Intersection over Union (mIoU), the mean Average Precision (mAP) and the mean Recall (mRecall) improved by 4.84, 7.21, 3.83 and 3.15 percentage points, respectively, and the model size reduced by 16.9 MB.
Utilizing contextual information of scene graphs can help models understand the correlation effect among targets. However, a large number of unrelated targets may introduce additional noise, affecting information interaction and causing prediction biases. In noisy and diverse scenes, even a few simple associated targets are sufficient to infer the environmental information of a target and eliminate ambiguous information of other targets. In addition, Scene Graph Generation (SGG) faces challenges when dealing with long-tailed biased data in real-world scenarios. To address the problems of contextual information optimization and prediction biases, an association Information Enhancement and Relationship Balance based SGG (IERB) method was proposed. In the IERB method, a secondary reasoning structure was employed on biased scene graph prediction results to reconstruct association information from different prediction viewpoints and balance the prediction biases. Firstly, strongly correlated targets from different viewpoints were focused on to construct the contextual association information. Secondly, the prediction capability for tail relationships was enhanced using a tree-structure balancing strategy. Finally, a prediction-guided approach was used to optimize predictions based on the existing scene graph. Experimental results on the Visual Genome dataset show that compared with three baseline models, Visual Translation Embedding network (VTransE), Motif, and Visual Context Tree (VCTree), the proposed method improves the mean Recall mR@100 in the Predicate Classification (PredCls) task by 11.66, 13.77 and 13.62 percentage points, respectively, demonstrating the effectiveness of the proposed method.
To address active network attacks such as tampering on industrial cloud storage system data, to achieve the goal of secure sharing of industrial data in cloud storage, and to ensure the confidentiality, integrity, and availability of industrial data during transmission and storage, a data tamper-proof batch auditing scheme based on industrial cloud storage systems was proposed. In this scheme, a homomorphic digital signature algorithm based on bilinear pairing mapping was proposed, enabling a third-party auditor to achieve batch tamper-proof integrity detection of industrial cloud storage system data and to feed the auditing results back to engineering service end users in a timely manner. Besides, the computational burden on engineering service end users was reduced by adding auditors, while ensuring the integrity of industrial encrypted data during transmission and storage. Security analysis and performance comparison results demonstrate that the proposed scheme reduces the third-party auditor's computational cost significantly, from O(n) bilinear pairing operations to O(1) constant-level bilinear pairing operations, through the design of tamper-proof detection vectors. It can be seen that the proposed scheme is suitable for lightweight batch auditing scenarios that require tamper-proof detection of a large number of core data files of industrial cloud storage systems.
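The actual scheme relies on bilinear pairings, which need a pairing library; purely as a toy illustration of *why* batch auditing collapses n per-block checks into one aggregate check, the sketch below uses an additively homomorphic tag over a prime field (keys, modulus, and tag form are all illustrative assumptions, not the paper's construction):

```python
# Toy additively homomorphic tag: tag(i, m) = alpha*m + beta*i (mod p).
# Because tags add homomorphically, one aggregate equation can audit a
# whole batch of blocks at once, mirroring the O(n) -> O(1) idea.

P = 2**61 - 1                    # prime modulus (illustrative)
ALPHA, BETA = 123457, 987654321  # secret keys (illustrative)

def tag(index, block):
    return (ALPHA * block + BETA * index) % P

def batch_verify(indexed_blocks, tags):
    agg_tag = sum(tags) % P                                  # one aggregate value
    expected = sum(tag(i, b) for i, b in indexed_blocks) % P
    return agg_tag == expected                               # single check

blocks = [(1, 42), (2, 7), (3, 1000)]
tags = [tag(i, b) for i, b in blocks]
print(batch_verify(blocks, tags))        # True
tampered = [(1, 42), (2, 8), (3, 1000)]  # block 2 modified
print(batch_verify(tampered, tags))      # False
```

A real pairing-based scheme additionally lets the auditor verify without the secret keys; this toy only conveys the aggregation structure.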
Visual localization of surveillance images is an important technology in industrial intelligence. The existing visual localization algorithms lack protection of the privacy information in images and may lead to leakage of sensitive content during data transmission. To address this problem, a localization method for surveillance images based on Large Vision Models (LVMs) was proposed. Firstly, an architecture for LVM privacy-preserving visual localization was designed to transfer the style of input images by using a few prompts and reference images. Then, a feature matching algorithm for style-transferred images was designed to estimate the camera pose. Experimental results on public datasets show that the localization error of the proposed algorithm is relatively small, demonstrating that the algorithm reduces privacy leakage significantly while ensuring localization accuracy.
While previous anomaly detection methods have achieved high-precision detection in specific scenarios, their applicability is constrained by a lack of generalizability and automation. Thus, a Vision Foundation Model (VFM)-driven pixel-level image anomaly detection method, namely SSMOD-Net (State Space Model driven-Omni Dimensional Net), was proposed with the aim of achieving more accurate industrial defect detection. Unlike the existing methods, SSMOD-Net achieved automated prompting of SAM (Segment Anything Model) without the need for fine-tuning SAM, making it particularly suitable for scenarios that require processing large-scale industrial visual data. The core of SSMOD-Net is a novel prompt encoder driven by a state space model, which generates prompts dynamically based on the input image of SAM. With this design, the model was allowed to introduce additional guidance information through the prompt encoder while preserving SAM's architecture, thereby enhancing detection accuracy. A residual multi-scale module was integrated in the prompt encoder; this module was constructed based on the state space model and was able to use multi-scale and global information comprehensively. Through iterative search, the module found optimal prompts in the prompt space and provided the prompts to SAM as high-dimensional tensors, thereby strengthening the model's ability to recognize industrial anomalies. Moreover, the proposed method did not require any modifications to SAM, thereby avoiding complex fine-tuning of training schedules. Experimental results on several datasets show that the proposed method has excellent performance, and achieves better results in mE (mean E-measure), Mean Absolute Error (MAE), Dice, and Intersection over Union (IoU) compared to methods such as AutoSAM and SAM-EG (SAM with Edge Guidance framework for efficient polyp segmentation).
Panoptic Scene Graph Generation (PSGG) aims to identify all objects within an image and capture the intricate semantic associations among them automatically. Semantic association modeling depends on feature descriptions of target objects and subject-object pairs. However, current methods have several limitations: object features extracted through bounding boxes are ambiguous; the methods only focus on the semantic and spatial position features of objects, while ignoring the semantic joint features and relative position features of subject-object pairs, which are equally essential for accurate relation prediction; and current methods fail to extract features of different types of subject-object pairs (e.g., foreground-foreground, foreground-background, background-background) differentially, ignoring their inherent differences. To address these challenges, a PSGG method based on Relation Feature Enhancement (RFE) was proposed. Firstly, by introducing pixel-level mask regional features, the detailed information of object features was enriched, and the joint visual features, semantic joint features, and relative position features of subject-object pairs were integrated effectively. Secondly, depending on the specific type of subject-object pair, the most suitable feature extraction method was selected adaptively. Finally, more accurate enhanced relation features were obtained for relation prediction. Experimental results on the PSG dataset demonstrate that with VCTree (Visual Contexts Tree), Motifs, IMP (Iterative Message Passing), and GPSNet as baseline methods, and ResNet-101 as the backbone network, RFE achieves increases of 4.37, 3.68, 2.08, and 1.80 percentage points, respectively, in the R@20 index for challenging SGGen tasks. These results validate the effectiveness of the proposed method in PSGG.
When detecting small targets in multi-scale remote sensing images, target detection algorithms based on deep learning are prone to false detection and missed detection. One reason is that the feature extraction module carries out multiple down-sampling operations; another is the failure to pay attention to the contextual information required by targets of different categories and scales. To solve these problems, a small object detection algorithm for remote sensing images integrating attention and contextual information, ACM-YOLO (Attention-Context-Multiscale YOLO), was proposed. Firstly, to reduce the loss of small target feature information, fine-grained query-aware sparse attention was applied, thereby avoiding missed detection. Secondly, to pay more attention to the contextual information required by different categories of remote sensing targets, the Local Contextual Enhancement (LCE) function was designed, thereby avoiding false detection. Finally, to strengthen the multi-scale feature fusion capability of the feature fusion module on small targets in remote sensing images, the weighted Bi-directional Feature Pyramid Network (BiFPN) was adopted, thereby improving the detection effect of the algorithm. Comparison experiments and ablation experiments were performed on the DOTA dataset and the NWPU VHR-10 dataset to verify the effectiveness and generalization of the proposed algorithm. Experimental results show that on the two datasets, the proposed algorithm achieves mean Average Precision (mAP) of 77.33% and 96.12%, respectively, and the Recall increases by 10.00 and 7.50 percentage points, respectively, compared with the YOLOv5 algorithm. It can be seen that the proposed algorithm improves mAP and recall effectively, thereby reducing false detection and missed detection.
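The weighted fusion at the heart of BiFPN can be sketched briefly: non-negative learnable weights are normalized by their sum plus a small epsilon ("fast normalized fusion") before features of different scales, already resized to a common shape, are blended. The shapes and weight values below are illustrative:

```python
import numpy as np

# Sketch of BiFPN-style fast normalized fusion:
# out = sum_i (w_i / (sum_j w_j + eps)) * F_i, with w_i kept non-negative.

def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)  # ReLU keeps w >= 0
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

f_deep = np.ones((4, 4))          # upsampled deep feature (illustrative)
f_shallow = np.full((4, 4), 3.0)  # shallow feature at the same resolution
fused = fast_normalized_fusion([f_deep, f_shallow], [1.0, 1.0])  # ~elementwise mean
```

Compared with softmax-based attention fusion, this normalization avoids the exponential and is cheaper, which is why BiFPN adopts it.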
In recent years, Deformable Convolutional Network (DCN) has been widely applied in fields such as image recognition and classification. However, research on the interpretability of this model is relatively limited, and its applicability lacks sufficient theoretical support. To address these issues, an interpretability study of DCN and its application to a butterfly species recognition model was conducted. Firstly, deformable convolution was introduced to improve the VGG16, ResNet50, and DenseNet121 (Dense Convolutional Network121) classification models. Secondly, visualization methods such as deconvolution and Class Activation Mapping (CAM) were used to compare the feature extraction capabilities of deformable convolution and standard convolution. The results of ablation experiments show that deformable convolution performs better when used in the lower layers of the neural network and not continuously. Thirdly, the Saliency Removal (SR) method was proposed to uniformly evaluate the performance of CAM and the importance of activation features. By setting different removal thresholds and evaluating from multiple perspectives, the objectivity of the evaluation was improved. Finally, based on the evaluation results, the FullGrad (Full Gradient-weighted) explanation model was used as the basis for the recognition judgment. Experimental results show that on the Archive_80 dataset, the accuracy of the proposed D_v2-DenseNet121 reaches 97.03%, which is 2.82 percentage points higher than that of the DenseNet121 classification model. It can be seen that the introduction of deformable convolution endows the neural network model with the ability to extract invariant features and improves the accuracy of the classification model.
Both the Damped Least Squares (DLS) method and the Genetic Algorithm (GA) are applicable to the automatic design of optical systems. Although DLS has high search efficiency, it is susceptible to falling into local optima. Conversely, GA has strong global search capability in the parameter space of optical structures but weak local search capability. To address these challenges, a Correctable Reinforced Search GA (CRSGA) was proposed. Firstly, DLS was introduced after the GA crossover operation to enhance local search capability. Additionally, a correction strategy was introduced to roll back individuals with deteriorated fitness values before the next iteration, thereby correcting the evolutionary results. These two improvements to GA enhance its strengths and compensate for its weaknesses. Three typical optical system design experiments, including Double Gaussian (DG), Reversed Telephoto (RT), and Finite Conjugate Distance Imaging (FCDI), were conducted to validate the effectiveness of CRSGA. CRSGA outperforms both DLS and GA, and its optimization outcomes are about 8.92%, 12.19%, and 9.39% better, respectively, than those of the commercial optical design software Zemax DLS. In particular, compared with the Zemax HAMMER algorithm, the optimization outcomes achieve significant improvements of 99.98%, 94.33%, and 88.45%, respectively. In conclusion, it is shown that the proposed algorithm is effective for optical system optimization and can be used for automatic optical system design.
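The correction strategy above (roll back any individual whose fitness deteriorated) can be illustrated on a toy 1-D minimization problem; the objective, step size, and loop are all illustrative assumptions, not the optical merit function used in the paper:

```python
import random

# Toy sketch of the CRSGA correction idea: after a reinforced local step,
# an individual whose fitness worsened is rolled back to its previous state,
# so the population's fitness never deteriorates between iterations.

def fitness(x):
    return (x - 3.0) ** 2   # illustrative objective, optimum at x = 3

def corrected_step(population, step=0.5, rng=random.Random(4)):
    new_pop = []
    for x in population:
        candidate = x + rng.uniform(-step, step)  # local search move
        # correction: keep the candidate only if fitness did not worsen
        new_pop.append(candidate if fitness(candidate) <= fitness(x) else x)
    return new_pop

pop = [0.0, 6.0, 2.5]
for _ in range(200):
    pop = corrected_step(pop)
best = min(pop, key=fitness)   # converges near the optimum x = 3
```

In CRSGA the local step is a DLS update after crossover rather than a random perturbation, but the rollback rule plays the same monotonicity-preserving role.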
Sensors are the basis for unmanned systems to perform intelligent actions. The fusion of multi-sensor data can enhance the intelligent perception and autonomous decision-making capabilities of unmanned systems, and improve the reliability and robustness of these systems. Data fusion in unmanned systems encounters many challenges, such as diverse sensor types, heterogeneous data formats, real-time requirements of data fusion and analysis, and complex, fast-evolving algorithm models. Traditional methods that develop fusion models through front-end customization, as well as approaches based on fusion platforms running on the back end, are difficult to apply in these cases. Therefore, a pipeline platform for data fusion was proposed. This platform supports automatic data transformation, flexible algorithm combination, dynamic model configuration, and rapid iteration of functions, so as to achieve dynamic and quick construction of data fusion models and provide information services for different tasks. Based on an analysis of the data fusion process and techniques, the pipeline framework and its key functions and components were characterized, the key technologies that urgently need breakthroughs were analyzed, the operation mode and a practical case of the framework were given, and research directions for future development were pointed out.
With the emergence of large-scale pre-trained language models, text generation technology has made breakthrough progress. However, in the field of open text generation, the generated content lacks anthropomorphic emotional features, making it difficult for the generated text to resonate and connect emotionally. Controllable text generation is of great significance in compensating for the shortcomings of current text generation technology. Firstly, the extension of theme and emotional attributes was completed on the basis of the ChnSentiCorp dataset. At the same time, in order to construct a multivariate controllable text generation model that can generate fluent text with rich emotion, a diffusion sequence based controllable text generation model, DiffuSeq-PT, was proposed based on a diffusion model architecture. Theme and emotion attributes together with text data were used to perform the diffusion process on the sequences without classifier guidance. The encoding and decoding capabilities of the pre-trained model ERNIE 3.0 (Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation) were used to fit the noising and denoising processes of the diffusion model, and ultimately, target text matching the relevant theme and multiple sentiment granularities was generated. Compared with the benchmark model DiffuSeq, the proposed model achieved improvements of 0.13 and 0.01 in BERTScore on two publicly available real datasets (ChnSentiCorp and the Debate dataset), and decreased the perplexity by 14.318 and 9.46, respectively.
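The forward (noising) half of the diffusion process mentioned above follows the standard closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε; the sketch below shows it on a toy embedding sequence (the linear beta schedule, shapes, and step count are illustrative, not DiffuSeq-PT's settings):

```python
import numpy as np

# Minimal sketch of the forward diffusion (noising) step on embeddings.

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)      # illustrative noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product \bar{alpha}_t

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = rng.standard_normal((8, 16))   # 8 tokens, 16-dim embeddings (toy)
x_late = q_sample(x0, T - 1)        # nearly pure noise at the last step
```

The trained denoiser (ERNIE 3.0 in the paper) learns to invert this process step by step, conditioned on theme and emotion attributes.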
A reliability enhancement algorithm for Physical Unclonable Function (PUF) was proposed to address the instability of PUF responses caused by external and internal factors. The proposed algorithm is based on the Non-Orthogonal Discrete (NOD) transform. Firstly, a reorder mixer was designed to iteratively process the random seed vector and the PUF response, producing the inner product of the non-orthogonal confusion matrix and the response confusion matrix, upon which the NOD spectrum was established. The algorithm effectively resolves the key bias caused by insufficient uniformity of the PUF. Then, a partition encoding and decoding strategy enabled the NOD spectrum to tolerate certain errors, significantly improving the reliability of the final response by confining the impact of unstable responses to a limited range. Compared to traditional error correcting code-based methods, the proposed algorithm requires less auxiliary data. Experimental results on the SRAM-XMC dataset show that, over 101 repeated experiments with 2 949 120 sets of 64-bit responses, the average reliability of the proposed algorithm reaches 99.97%, the uniqueness achieves 49.92%, and the uniformity reaches 50.61%. The experimental results demonstrate that the proposed algorithm can effectively improve reliability while ensuring the uniformity and uniqueness of PUF responses.
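The NOD-spectrum partition encoding is not specified in the abstract; purely as a toy illustration of how partition-based decoding confines unstable bits to a limited range, the sketch below encodes each key bit into an odd-sized partition and decodes by majority vote (the repetition structure is an assumption for illustration, not the paper's encoding):

```python
# Toy partition-based error tolerance for a noisy PUF response:
# each key bit occupies one odd-sized partition, and decoding takes a
# majority vote, so a few flipped bits per partition are absorbed.

def encode(key_bits, reps=5):
    return [b for b in key_bits for _ in range(reps)]

def decode(noisy_bits, reps=5):
    out = []
    for i in range(0, len(noisy_bits), reps):
        chunk = noisy_bits[i:i + reps]
        out.append(1 if sum(chunk) > reps // 2 else 0)
    return out

key = [1, 0, 1, 1]
enc = encode(key)
enc[2] ^= 1   # flip one bit inside the first partition
enc[7] ^= 1   # flip one bit inside the second partition
print(decode(enc) == key)  # True: single flips per partition are corrected
```

Errors only corrupt the key when more than half the bits of one partition flip, which is the sense in which unstable responses are confined.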
Aiming at the problems that the detection accuracy of small objects such as cyclists and pedestrians in Three-Dimensional (3D) object detection is low, and that it is difficult to adapt to complex urban road conditions, a 3D object detection network based on a self-attention mechanism and graph convolution was proposed. Firstly, in order to obtain more discriminative small object features, a self-attention mechanism was introduced into the backbone network to make the network more sensitive to small object features and improve its feature extraction ability. Secondly, a feature fusion module was constructed based on the self-attention mechanism to further enrich the information of the shallow network and enhance the feature expression ability of the deep network. Finally, dynamic graph convolution was used to predict the bounding box of the object, improving the accuracy of object prediction. The proposed network was tested on the KITTI dataset and compared with eight major networks such as TANet (Triple Attention Network) and IA-SSD (Instance-Aware Single-Stage Detector). The experimental results show that the pedestrian detection accuracy of the proposed network is increased by 12.12, 13.82 and 11.03 percentage points compared with TANet, which achieves the second-best pedestrian detection accuracy, under the three difficulty levels of simple, medium and difficult; the cyclist detection accuracy of the proposed network is 3.06 and 5.34 percentage points higher than that of IA-SSD under the medium and difficult levels. In summary, the proposed network can be better applied to small object detection tasks.
Object detection in autonomous driving scenes is one of the important research directions in computer vision. Research in this area focuses on ensuring real-time and accurate object detection by autonomous vehicles. In recent years, rapid development of deep learning technology has been witnessed, and its wide application in the field of autonomous driving has prompted substantial progress in this field. An analysis was conducted on the research status of object detection by YOLO (You Only Look Once) algorithms in the field of autonomous driving from the following four aspects. Firstly, the ideas and improvement methods of the single-stage YOLO series of detection algorithms were summarized, and the advantages and disadvantages of the YOLO series of algorithms were analyzed. Secondly, YOLO algorithm-based object detection applications in autonomous driving scenes were introduced, and the research status and applications for the detection and recognition of traffic vehicles, pedestrians, and traffic signals were expounded and summarized respectively. Additionally, the commonly used evaluation indicators in object detection, as well as the object detection datasets and autonomous driving scene datasets, were summarized. Lastly, the problems and future development directions of object detection were discussed.
The automatic segmentation of brain lesions provides a reliable basis for the timely diagnosis and treatment of stroke patients and the formulation of diagnosis and treatment plans, but obtaining large-scale labeled data is expensive and time-consuming. Semi-Supervised Learning (SSL) methods alleviate this problem by utilizing a large number of unlabeled images and a limited number of labeled images. Aiming at the two problems of pseudo-label noise in SSL and the limited ability of existing Three-Dimensional (3D) networks to focus on smaller objects, a semi-supervised method for stroke lesion segmentation, RPE-CPS (Rectified Cross Pseudo Supervision with Project & Excite modules), was proposed. First, the data were input into two 3D U-Net segmentation networks with the same structure but different initializations, and the obtained pseudo-segmentation maps were used for cross-supervised training of the segmentation networks, making full use of the pseudo-label data to expand the training set and encouraging high similarity between the predictions of differently initialized networks for the same input image. Second, a correction strategy for the cross pseudo supervision approach, based on uncertainty estimation, was designed to reduce the impact of noise in the pseudo-labels. Finally, in the 3D U-Net segmentation network, in order to improve the segmentation performance on small object classes, Project & Excite (PE) modules were added behind each encoder module, each decoder module and the bottleneck module. In order to verify the effectiveness of the proposed method, evaluation experiments were carried out on the Acute Ischemic Stroke (AIS) dataset of the cooperative hospital and the Ischemic Stroke Lesion Segmentation Challenge (ISLES2022) dataset.
The experimental results showed that when only using 20% of the labeled data in the training set, the Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95), and Average Surface Distance (ASD) on the public ISLES2022 dataset reached 73.87%, 6.08 mm and 1.31 mm, respectively; on the AIS dataset, DSC, HD95, and ASD reached 67.74%, 15.38 mm and 1.05 mm, respectively. Compared with the state-of-the-art semi-supervised method Uncertainty Rectified Pyramid Consistency (URPC), DSC improved by 2.19 and 3.43 percentage points, respectively. The proposed method can effectively utilize unlabeled data to improve segmentation accuracy, outperforms other semi-supervised methods, and shows good robustness.
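The rectification idea behind cross pseudo supervision can be sketched compactly: pseudo-labels from the two differently initialized networks supervise each other only where their predictions agree confidently, and uncertain voxels are masked out of the loss. The agreement-plus-confidence criterion and the threshold below are illustrative assumptions, not necessarily RPE-CPS's exact uncertainty estimate:

```python
import numpy as np

# Sketch of an uncertainty-based reliability mask for cross pseudo supervision.

def reliable_mask(prob_a, prob_b, conf_thresh=0.8):
    """prob_a, prob_b: (classes, H, W) softmax outputs of the two networks.
    Returns a boolean mask of pixels whose pseudo-labels are trusted."""
    label_a, label_b = prob_a.argmax(axis=0), prob_b.argmax(axis=0)
    conf_a, conf_b = prob_a.max(axis=0), prob_b.max(axis=0)
    agree = label_a == label_b
    confident = (conf_a > conf_thresh) & (conf_b > conf_thresh)
    return agree & confident

prob_a = np.array([[[0.9, 0.6], [0.1, 0.5]],
                   [[0.1, 0.4], [0.9, 0.5]]])   # (2 classes, 2x2 image)
prob_b = np.array([[[0.95, 0.3], [0.2, 0.45]],
                   [[0.05, 0.7], [0.8, 0.55]]])
mask = reliable_mask(prob_a, prob_b)   # only the agreeing, confident pixel survives
```

The cross-supervision loss is then averaged over masked pixels only, which suppresses pseudo-label noise.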
The quality of low-light images is poor, and Low-Light Image Enhancement (LLIE) aims to improve their visual quality. Most LLIE algorithms focus on enhancing luminance and contrast while neglecting details. To solve this issue, a Progressive Enhancement algorithm for low-light images based on Layer Guidance (PELG) was proposed, which enhances images to a suitable illumination level and reconstructs clear details. First, to reduce task complexity and improve efficiency, the image was decomposed into several frequency components by Laplace Pyramid (LP) decomposition. Secondly, since different frequency components exhibit correlation, a Transformer-based fusion model and a lightweight fusion model were proposed respectively for layer guidance: the Transformer-based model was applied between the low-frequency component and the lowest high-frequency component, while the lightweight model was applied between two neighbouring high-frequency components. In this way, the components were enhanced in a coarse-to-fine manner. Finally, the LP was used to reconstruct the image with uniform brightness and clear details. Experimental results show that the proposed algorithm achieves a Peak Signal-to-Noise Ratio (PSNR) 2.3 dB higher than DSLR (Deep Stacked Laplacian Restorer) on LOL (LOw-Light dataset)-v1 and 0.55 dB higher than UNIE (Unsupervised Night Image Enhancement) on LOL-v2. Compared with other state-of-the-art LLIE algorithms, the proposed algorithm has shorter runtime and achieves significant improvements in objective and subjective quality, making it more suitable for real scenes.
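The LP decomposition and reconstruction used above have a simple round-trip structure: each level stores the high-frequency residual between the image and the upsampled low-frequency base, so reconstruction is exact. The box down/upsampling filters below are illustrative stand-ins for the usual Gaussian filters:

```python
import numpy as np

# Minimal Laplacian-pyramid round trip with box down/upsampling.

def down(img):
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    return img.repeat(2, axis=0).repeat(2, axis=1)

def lp_decompose(img, levels=2):
    highs, cur = [], img
    for _ in range(levels):
        low = down(cur)
        highs.append(cur - up(low))   # high-frequency residual at this scale
        cur = low
    return highs, cur                 # residuals + low-frequency base

def lp_reconstruct(highs, base):
    cur = base
    for high in reversed(highs):
        cur = up(cur) + high
    return cur

img = np.arange(64, dtype=float).reshape(8, 8)
highs, base = lp_decompose(img)
assert np.allclose(lp_reconstruct(highs, base), img)  # exact round trip
```

PELG enhances each component before the reconstruction step, so the low-frequency base carries illumination while the residuals carry the details to be sharpened.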
Current cache side-channel attack detection technology mainly targets a single attack mode; detection methods covering two or three attacks are limited and cannot cover all of them. In addition, although the detection accuracy for a single attack is high, the accuracy decreases and false positives arise easily as the number of attacks increases. To detect cache side-channel attacks effectively, a machine learning based multi-object cache side-channel attack detection model was proposed, in which Hardware Performance Counters (HPCs) were utilized to collect the features of various cache side-channel attacks. Firstly, relevant feature analysis was conducted on each cache side-channel attack mode, key features were selected, and datasets were collected. Then, a detection model was trained independently for each attack mode. Finally, during detection, test data were input into multiple models in parallel, and the detection results of the multiple models were employed to ascertain the presence of any cache side-channel attack. Experimental results show that the proposed model reaches high accuracies of 99.91%, 98.69% and 99.54%, respectively, when detecting three cache side-channel attacks: Flush+Reload, Flush+Flush and Prime+Probe. Even when multiple attacks exist at the same time, the various attack modes can be identified accurately.
Sleep disorders are receiving more and more attention, and the accuracy and generalization of automated sleep stage classification are facing more and more challenges. However, because publicly available human sleep data are very limited, the sleep stage classification task is in effect a few-shot scenario. Moreover, due to widespread individual differences in sleep features, it is difficult for existing machine learning models to guarantee accurate classification of data from new subjects who did not participate in training. To achieve accurate sleep stage classification for new subjects' data, existing studies usually require additional collection and labeling of large amounts of data from the new subjects and personalized fine-tuning of the model. Therefore, a new sleep stage classification model, Meta Transfer Sleep Learner (MTSL), was proposed. Inspired by the Scale & Shift based weight transfer strategy in transfer learning, a new meta transfer learning framework was designed. The training phase includes two steps, pre-training and meta transfer training, and many meta-tasks were used in meta transfer training. In the test phase, the model can be easily adapted to the feature distribution of new subjects by fine-tuning with only a few new subjects' data, which greatly reduces the cost of accurate sleep stage classification for new subjects. Experimental results on two public sleep datasets show that MTSL can achieve higher accuracy and F1-score under both single-dataset and cross-dataset conditions. This indicates that MTSL is more suitable for sleep stage classification tasks in few-shot scenarios.
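The Scale & Shift weight transfer strategy mentioned above has a compact form: the pre-trained weights stay frozen, and adaptation learns only a per-output scale s and shift b, i.e. W' = s ⊙ W + b. The sketch below shows this parameterization on a toy linear layer (shapes and the hand-set update are illustrative):

```python
import numpy as np

# Sketch of Scale & Shift based weight transfer: only s and b are trainable
# during adaptation to a new subject; the pre-trained W is frozen.

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))      # frozen pre-trained weights
s = np.ones(4)                       # learnable scale, initialized to 1
b = np.zeros(4)                      # learnable shift, initialized to 0

def adapted_forward(x):
    W_adapted = s[:, None] * W + b[:, None]   # row-wise scale and shift
    return W_adapted @ x

x = rng.standard_normal(8)
y_before = adapted_forward(x)        # identical to the frozen model at init
s[:] = 1.1                           # a (hand-set) toy adaptation step
y_after = adapted_forward(x)
```

Because only s and b (a few parameters per layer) are updated, a handful of labeled epochs from a new subject suffices for fine-tuning, which is the cost reduction the abstract refers to.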
In multi-agent systems, there are multiple cooperative tasks that change over time and multiple conflicting optimization objective functions, so dynamic multiobjective multi-agent cooperative scheduling becomes one of the critical problems in building a multi-agent system. To solve this problem, a probability-driven dynamic prediction strategy was proposed, which utilizes the probability distributions in historical environments to predict those in new environments, thus generating new solutions and realizing fast response to environmental changes. In detail, an element-based representation of probability distributions was designed to represent the adaptability of elements in dynamic environments, and the probability distributions were updated gradually towards the real distributions according to the best solutions found by the optimization algorithm in each iteration. Taking into account the continuity and relevance of environmental changes, a fusion-based prediction mechanism was built to predict the probability distributions and provide a priori knowledge of new environments by fusing historical probability distributions when the environment changes. A heuristic-based sampling mechanism was also proposed, which combines probability distributions and heuristic information to generate new solutions for updating out-of-date populations. The proposed probability-driven dynamic prediction strategy can be inserted into any multiobjective evolutionary algorithm, resulting in probability-driven dynamic multiobjective evolutionary algorithms. Experimental results on 10 dynamic multiobjective multi-agent cooperative scheduling problem instances show that the proposed algorithms outperform the competing algorithms in terms of solution optimality and diversity, and the proposed strategy can improve the performance of multiobjective evolutionary algorithms in dynamic environments.
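The update-and-sample loop described above can be sketched for a binary element representation: the element-wise distribution is pulled toward the element frequencies of the current best solutions, and new solutions are sampled from it. The learning rate, sizes, and the plain frequency estimate are illustrative assumptions:

```python
import numpy as np

# Sketch of the probability-driven idea: update an element-wise selection
# distribution toward the frequencies observed in elite solutions, then
# sample a fresh population from it.

rng = np.random.default_rng(2)
n_elements = 5
prob = np.full(n_elements, 0.5)             # P(element is selected)

def update(prob, best_solutions, lr=0.3):
    freq = np.mean(best_solutions, axis=0)  # empirical element frequencies
    return (1.0 - lr) * prob + lr * freq    # move gradually toward them

def sample(prob, n):
    return (rng.random((n, prob.size)) < prob).astype(int)

best = np.array([[1, 1, 0, 0, 1],
                 [1, 0, 0, 0, 1]])          # elite solutions this iteration
prob = update(prob, best)                   # -> [0.65, 0.5, 0.35, 0.35, 0.65]
new_pop = sample(prob, 4)                   # replacement solutions
```

On an environment change, the fusion-based prediction would blend several such historical `prob` vectors instead of starting from a uniform one.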
Aiming at the problems of detail information loss and low segmentation accuracy in the segmentation of day and night ground-based cloud images, a segmentation network called CloudResNet-UNetwork (CloudRes-UNet) for day and night ground-based cloud images based on an improved Res-UNet (Residual network-UNetwork) was proposed, in which the overall encoder-decoder network structure was adopted. Firstly, ResNet50 was used in the encoder to extract features, enhancing the feature extraction ability. Secondly, a Multi-Stage feature extraction module was designed, which combined three techniques, group convolution, dilated convolution and channel shuffle, to obtain high-intensity semantic information. Thirdly, an Efficient Channel Attention Network (ECA-Net) module was added to focus on the important information in the channel dimension, strengthen the attention on the cloud regions in ground-based cloud images, and improve the segmentation accuracy. Finally, bilinear interpolation was used in the decoder to upsample the features, which improved the clarity of the segmented images and reduced the loss of object and position information. Experimental results show that, compared with the state-of-the-art deep-learning-based ground-based cloud image segmentation network Cloud-UNetwork (Cloud-UNet), the segmentation accuracy of CloudRes-UNet on the day and night ground-based cloud image segmentation dataset is increased by 1.5 percentage points, and the Mean Intersection over Union (MIoU) is increased by 1.4 percentage points, which indicates that CloudRes-UNet obtains cloud information more accurately. This is of positive significance for weather forecasting, climate research, photovoltaic power generation and so on.
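The ECA idea used above can be sketched as follows (a simplified stand-in: the real module learns its 1D convolution weights, whereas they are fixed here for illustration): global average pooling yields a channel descriptor, a small 1D convolution captures local cross-channel interaction, and a sigmoid gate rescales each channel.

```python
import numpy as np

def eca(feature_map, k=3):
    """Minimal sketch of an Efficient Channel Attention (ECA) block on a
    (C, H, W) feature map: global average pooling, a 1D convolution of
    kernel size k across channels, a sigmoid gate, then channel-wise
    rescaling. The convolution weights are fixed for illustration."""
    c = feature_map.shape[0]
    gap = feature_map.mean(axis=(1, 2))               # (C,) channel descriptor
    kernel = np.full(k, 1.0 / k)                      # hypothetical 1D conv weights
    padded = np.pad(gap, k // 2, mode="edge")
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))                # sigmoid attention weights
    return feature_map * gate[:, None, None]

x = np.ones((4, 2, 2))                                # toy feature map
y = eca(x)
```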
Gliomas are the most common primary intracranial tumors, arising from cancerous changes in the glia of the brain and spinal cord, with a high proportion of malignant cases and a significant mortality rate. Quantitative segmentation and grading of gliomas based on Magnetic Resonance Imaging (MRI) images is the main method for glioma diagnosis and treatment. To improve the segmentation accuracy and speed for gliomas, a 3D-Ghost Convolutional Neural Network (CNN) based MRI image segmentation algorithm for glioma, called 3D-GA-Unet, was proposed. 3D-GA-Unet was built on 3D U-Net (3D U-shaped Network). A 3D-Ghost CNN block was designed to increase the useful output and reduce the redundant features of traditional CNNs by using linear operations, and a Coordinate Attention (CA) block was added to help obtain more image information favorable to segmentation accuracy. The model was trained and validated on the publicly available glioma dataset BraTS2018. Experimental results show that 3D-GA-Unet achieves average Dice Similarity Coefficients (DSCs) of 0.8632, 0.8473 and 0.8036, and average sensitivities of 0.8676, 0.9492 and 0.8315, for the Whole Tumor (WT), Tumor Core (TC) and Enhanced Tumor (ET) regions in glioma segmentation. It is verified that 3D-GA-Unet can accurately segment glioma images and further improve the segmentation efficiency, which is of positive significance for the clinical diagnosis of gliomas.
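The Ghost idea, a dense primary operation plus cheap linear operations that generate extra "ghost" features, can be sketched on plain vectors (an illustration with assumed weights, not the paper's 3D block):

```python
import numpy as np

def ghost_block(x, primary_w, cheap_w):
    """Sketch of a Ghost convolution on channel vectors: a primary
    (dense) linear map produces a few intrinsic features, then cheap
    per-feature linear operations generate the remaining 'ghost'
    features; the two are concatenated. This halves the dense work
    needed to produce the same number of output features."""
    intrinsic = primary_w @ x            # stand-in for the ordinary convolution
    ghost = cheap_w * intrinsic          # cheap element-wise linear operations
    return np.concatenate([intrinsic, ghost])

x = np.array([1.0, 2.0])                         # toy input features
primary = np.array([[1.0, 0.0], [0.0, 1.0]])     # hypothetical primary weights
cheap = np.array([0.5, 2.0])                     # hypothetical cheap-op weights
out = ghost_block(x, primary, cheap)
```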
Current Video Super-Resolution (VSR) algorithms cannot fully utilize inter-frame information at different distances when processing complex scenes with large motion amplitude, resulting in difficulty in accurately recovering occlusions, boundaries, and multi-detail regions. A VSR model based on frame-straddling optical flow was proposed to solve these problems. Firstly, shallow features of Low-Resolution (LR) frames were extracted through Residual Dense Blocks (RDBs). Then, motion estimation and compensation were performed on video frames using a Spatial Pyramid Network (SPyNet) with straddling optical flows of different time lengths, and deep feature extraction and correction were performed on inter-frame information through multiple layers of connected RDBs. Finally, the shallow and deep features were fused, and High-Resolution (HR) frames were obtained through up-sampling. Experimental results on the REDS4 public dataset show that, compared with the deep Video Super-Resolution network using Dynamic Upsampling Filters without explicit motion compensation (DUF-VSR), the proposed model improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) by 1.07 dB and 0.06, respectively, demonstrating that the proposed model can effectively improve the quality of video image reconstruction.
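The frame-straddling notion, pairing frames at several temporal strides around the target frame so that optical flow spans different distances, can be sketched as follows (the stride set is an illustrative choice):

```python
def straddling_pairs(t, n_frames, strides=(1, 2)):
    """For target frame index t in a clip of n_frames, list (past, future)
    frame index pairs at several temporal strides; flow estimated across
    each pair captures inter-frame information at different distances."""
    pairs = []
    for s in strides:
        if t - s >= 0 and t + s < n_frames:   # skip strides that leave the clip
            pairs.append((t - s, t + s))
    return pairs
```

For example, for the middle frame of a 7-frame clip, strides 1 and 2 give the pairs (2, 4) and (1, 5).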
Concerning the trade-off between convergence and diversity in solving the multi-trip pickup and delivery Vehicle Routing Problem (VRP), a hybrid Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ) combining the Adaptive Large Neighborhood Search (ALNS) algorithm and Adaptive Neighborhood Selection (ANS), called NSGA-Ⅱ-ALNS-ANS, was proposed. Firstly, considering the influence of the initial population on the convergence speed of the algorithm, an improved regret insertion method was employed to obtain a high-quality initial population. Secondly, to improve the global and local search capabilities of the algorithm, various destroy-repair operators and neighborhood structures were designed according to the characteristics of the pickup and delivery problem. Finally, a Best Fit Decreasing (BFD) algorithm based on random sampling and an efficient feasible-solution evaluation criterion were proposed to generate vehicle routing schemes. Simulation experiments were conducted on public benchmark instances of different scales; in the comparison with MA (Memetic Algorithm), the optimal solution quality of the proposed algorithm was improved by 27%. The experimental results show that the proposed algorithm can rapidly generate high-quality vehicle routing schemes satisfying multiple constraints, and outperforms the existing algorithms in terms of both convergence and diversity.
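The regret insertion used to seed the initial population can be illustrated with a minimal regret-2 rule (insertion costs below are hypothetical; the paper's improved variant differs in details): the customer whose best and second-best insertion costs differ most is inserted first, since delaying it would be most expensive.

```python
def regret2_select(insertion_costs):
    """Regret-2 selection: for each unrouted customer, compute the gap
    between its best and second-best insertion cost, and return the
    customer with the largest gap together with its best position."""
    best_customer, best_regret, best_pos = None, -1.0, None
    for cust, costs in insertion_costs.items():
        ordered = sorted(costs)
        # A customer with a single feasible position has infinite regret.
        regret = (ordered[1] - ordered[0]) if len(ordered) > 1 else float("inf")
        if regret > best_regret:
            best_customer, best_regret = cust, regret
            best_pos = costs.index(ordered[0])
    return best_customer, best_pos

# Hypothetical insertion costs of three unrouted customers at candidate positions
costs = {"a": [5.0, 6.0], "b": [3.0, 9.0], "c": [4.0, 4.5]}
chosen, pos = regret2_select(costs)
```

Here customer "b" is chosen (regret 6.0) and inserted at its cheapest position.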
Due to the ambiguity of text and the lack of location information in training data, current state-of-the-art diffusion models cannot accurately control the locations of generated objects in the image under text-prompt conditions. To address this issue, a spatial condition specifying the object's location range was introduced, and an attention-guided method was proposed based on the strong correlation between the cross-attention maps in U-Net and the spatial layout of the image, so as to control the generation of the attention maps and thus the locations of the generated objects. Specifically, based on the Stable Diffusion (SD) model, in the early stage of generating the cross-attention maps in the U-Net layers, a loss was introduced to stimulate high attention values within the corresponding location range and reduce the average attention value outside the range. The noise vector in the latent space was optimized step by step in each denoising step to control the generation of the attention maps. Experimental results show that the proposed method can effectively control the locations of one or more objects in the generated image, and when generating multiple objects, it can reduce object omission, redundant object generation, and object fusion.
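The attention-guidance objective can be sketched in simplified form (the weighting and exact normalization are assumptions, not the paper's formula): reward attention mass inside the allowed location range and penalize the average attention outside it; the gradient of this loss then updates the latent noise at each denoising step.

```python
import numpy as np

def location_loss(attn_map, mask, lam=1.0):
    """Simplified attention-guidance loss: the first term grows when
    attention mass falls outside the allowed range (mask == 1), and the
    second term penalizes the average attention outside it. `lam` is an
    assumed weighting factor."""
    inside = attn_map[mask == 1]
    outside = attn_map[mask == 0]
    return (1.0 - inside.sum() / attn_map.sum()) + lam * outside.mean()

attn = np.array([[0.4, 0.1],
                 [0.1, 0.4]])     # toy cross-attention map for one token
mask = np.array([[1, 0],
                 [0, 1]])         # allowed location range of the object
loss = location_loss(attn, mask)
```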
Aspect-level sentiment analysis aims to predict the sentiment polarity of a specific target in a given text. Aiming at the problems that the syntactic relationship between aspect words and context is ignored and that average pooling blurs attention differences, an aspect-level sentiment analysis model based on an Alternating-Attention (AA) mechanism and Graph Convolutional Network, named AA-GCN, was proposed. Firstly, a Bidirectional Long Short-Term Memory (Bi-LSTM) network was used to semantically model the context and aspect words. Secondly, a GCN based on the syntactic dependency tree was used to learn location information and dependencies, and the AA mechanism was used for multi-level interactive learning to adaptively adjust the attention on the target words. Finally, the final classification basis was obtained by concatenating the corrected aspect features and context features. Compared with the Target-Dependent Graph Attention Network (TD-GAT), the proposed model increases the accuracies on four public datasets by 1.13%-2.67%, and the F1 values on five public datasets by 0.98%-4.89%, indicating the effectiveness of using syntactic relationships and increasing keyword attention.
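A single graph convolution over the syntactic dependency graph can be sketched as follows (the adjacency, features, and weights are illustrative; in the model the node features would come from the Bi-LSTM):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution over a sentence's dependency graph:
    H' = ReLU(D^-1 (A + I) H W), where A is the dependency adjacency
    matrix, I adds self-loops, D normalizes by node degree, and H holds
    the word states."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # degree normalization
    return np.maximum(0.0, D_inv @ A_hat @ H @ W)

A = np.array([[0, 1], [1, 0]], dtype=float)     # two words, one dependency edge
H = np.array([[1.0, 0.0], [0.0, 1.0]])          # toy word states
W = np.eye(2)                                   # identity weights for clarity
H2 = gcn_layer(A, H, W)
```

Each word's new state mixes its own features with those of its syntactic neighbors, which is how aspect words receive context along dependency edges.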
To solve the problems of uncertain influence factors and difficult indicator quantification in the risk assessment of industrial control networks, a method based on fuzzy theory and attack trees was proposed, and the proposed method was tested and verified on the Chinese Train Control System (CTCS). First, an attack tree model for CTCS was constructed based on network security threats and system vulnerabilities, and the α-cut Triangular Fuzzy Number (TFN) was used to calculate the interval probabilities of leaf nodes and attack paths. Then, the Analytic Hierarchy Process (AHP) was adopted to establish a mathematical model of security event losses and obtain the final risk assessment result. Finally, experimental results demonstrate that the proposed method implements system risk assessment effectively, predicts the attack paths successfully, and reduces the influence of subjective factors. With the proposed method, the risk assessment result is more realistic and provides a reference and basis for the selection of security protection strategies.
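The α-cut of a triangular fuzzy number, used above to turn expert probability estimates for leaf nodes into interval probabilities, follows the standard TFN formula (the example values are hypothetical):

```python
def tfn_alpha_cut(a, b, c, alpha):
    """α-cut interval of a triangular fuzzy number (a, b, c) with
    a <= b <= c: [a + α(b - a), c - α(c - b)]. At α = 1 the interval
    collapses to the most likely value b; at α = 0 it is [a, c]."""
    return (a + alpha * (b - a), c - alpha * (c - b))

# Hypothetical expert estimate of a leaf-node attack probability
low, high = tfn_alpha_cut(0.2, 0.5, 0.8, alpha=0.5)
```

The interval probability of an attack path can then be obtained by combining the leaf-node intervals along that path.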
Exact cover problems are NP-complete problems in combinatorial optimization, and it is difficult to solve them in polynomial time with classical algorithms. To solve this problem, on the open source quantum computing framework qiskit, a quantum circuit solution based on the Quantum Approximate Optimization Algorithm (QAOA) was proposed, and the Constrained Optimization BY Linear Approximation (COBYLA) algorithm, based on the simplex method, was used to optimize the parameters of the quantum logic gates. Firstly, the classical Ising model was established from the mathematical model of the exact cover problem. Secondly, the classical Ising model was quantized by using the rotation variables in quantum theory, and the Pauli rotation operators were then used to replace the rotation variables to obtain the quantum Ising model and the problem Hamiltonian, which improved the speed of QAOA in finding the optimal solution. Finally, the expected value expression of the problem Hamiltonian was obtained by accumulating the products of the unitary transformation generated by the mixing Hamiltonian and the unitary transformation generated by the problem Hamiltonian, and the corresponding quantum circuit was designed. In addition, a classical processor was used to optimize the parameters of the two unitary transformations to adjust the expected value of the problem Hamiltonian, thereby increasing the probability of obtaining the solution. The circuit was simulated on qiskit. Experimental results show that the proposed scheme can obtain the solution of the problem in polynomial time with a probability of 95.6%, which proves that the proposed quantum circuit can find a solution to the exact cover problem with a higher probability.
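As a purely classical illustration (not the qiskit circuit itself), an exact cover instance can be encoded as a penalty Hamiltonian whose ground state is an exact cover: each element contributes (number of chosen subsets covering it − 1)². QAOA then searches for the bit string minimizing the expectation of this energy; below, brute force stands in for the quantum search on a small hypothetical instance.

```python
from itertools import product

def exact_cover_energy(x, subsets, universe):
    """Problem-Hamiltonian energy of a bit string x over the subsets:
    zero exactly when every element of the universe is covered by
    precisely one chosen subset (an exact cover)."""
    return sum((sum(x[i] for i, s in enumerate(subsets) if e in s) - 1) ** 2
               for e in universe)

universe = {1, 2, 3, 4}
subsets = [{1, 2}, {3, 4}, {2, 3}, {1, 4}]     # hypothetical instance
# Classical brute force over all bit strings stands in for the QAOA search.
best = min(product([0, 1], repeat=len(subsets)),
           key=lambda x: exact_cover_energy(x, subsets, universe))
```

Selecting {1, 2} and {3, 4} (or {2, 3} and {1, 4}) covers every element exactly once and has energy zero.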
To address the problems of poor effects, easily falling into suboptimal solutions, and inefficiency in neural network hyperparameter optimization, an Improved Real Coding Genetic Algorithm (IRCGA) based hyperparameter optimization algorithm for neural networks, named IRCGA-DNN (IRCGA for Deep Neural Network), was proposed. Firstly, a real-coded form was used to represent the values of hyperparameters, which made the search space of hyperparameters more flexible. Then, a hierarchical proportional selection operator was introduced to enhance the diversity of the solution set. Finally, improved single-point crossover and mutation operators were designed to explore the hyperparameter space more thoroughly and to improve the efficiency and quality of the optimization, respectively. Two simulation datasets were used to demonstrate IRCGA-DNN's performance in damage effectiveness prediction and convergence efficiency. Experimental results on the two datasets indicate that, compared with GA-DNN (Genetic Algorithm for Deep Neural Network), the proposed algorithm reduces the convergence iterations by 8.7% and 13.6% respectively, with little difference in MSE (Mean Square Error); compared with IGA-DNN (Improved Genetic Algorithm for Deep Neural Network), IRCGA-DNN achieves reductions of 22.2% and 13.6% in convergence iterations respectively. The results show that the proposed algorithm is better in both convergence speed and prediction performance, and is suitable for hyperparameter optimization of neural networks.
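The real-coded representation and operators can be sketched minimally (arithmetic crossover and Gaussian mutation stand in for the paper's improved operators; the bounds and values are assumptions): each individual is a real vector of hyperparameter values, so no binary encoding or decoding is needed.

```python
import random

def clip(v, lo, hi):
    """Keep a hyperparameter value inside its allowed range."""
    return max(lo, min(hi, v))

def crossover(p1, p2, alpha=0.5):
    """Arithmetic (blend) crossover on real-coded hyperparameter vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

def mutate(ind, bounds, rate=0.5, sigma=0.1, rng=None):
    """Gaussian mutation, clipped back into each hyperparameter's range."""
    rng = rng or random.Random(0)
    return [clip(v + rng.gauss(0, sigma), lo, hi) if rng.random() < rate else v
            for v, (lo, hi) in zip(ind, bounds)]

bounds = [(1e-4, 1e-1), (16, 256)]      # e.g. learning rate, hidden units
p1, p2 = [1e-3, 64.0], [1e-2, 128.0]    # two parent hyperparameter vectors
child = mutate(crossover(p1, p2), bounds, rng=random.Random(42))
```

A full run would evaluate each individual by training the network, then apply selection, crossover, and mutation over generations.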