In response to the problems of the narrow parameter ranges of existing chaotic systems and the poor diffusion effect of existing encryption algorithms, a new cascaded chaotic system and a filter diffusion model were designed, and a color image encryption algorithm with unlimited image size was proposed and implemented. Firstly, a chaotic system named 2D-SIHC (Two-Dimensional Sine-Iterative-Henon Chaotic system) was proposed with Henon mapping, Sine mapping and Iterative mapping as seed mappings, and a linear function was incorporated to extend the parameter range. In the two-dimensional sequence generated by this system, one dimension was employed to scramble pixel positions, and the other dimension was applied to update the filter templates and perform diffusion operations. Secondly, to avoid key reuse that would reduce the algorithm's security, the SHA-512 algorithm was combined with the plaintext image to generate the key through the calculation of flag bits and weights. Thirdly, to enhance the algorithm's diffusion effect, a two-dimensional filter diffusion model was designed. Different from traditional filter diffusion, which alters image pixel values by traversing with a fixed template, the new diffusion model modifies pixel values dynamically by introducing chaotic sequences to update the filter template values continuously. Finally, encryption was realized. Experimental results show that, taking the Airplane image as an example, the proposed algorithm achieves a Number of Pixels Change Rate (NPCR) of 99.605 9% and a Unified Average Change Intensity (UACI) of 33.397 1%, which are very close to the ideal values. In addition, the algorithm can resist noise interference with an intensity of 0.2 and a cropping attack with 50% of the image missing, and it has high encryption efficiency.
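A minimal Python sketch of the dynamic filter diffusion idea on a single color channel, assuming a 3×3 template that is rebuilt from the chaotic sequence before every pixel is diffused; the logistic map here is only an illustrative stand-in for the 2D-SIHC system, whose equations and template-update rule are not reproduced from the paper:

```python
import numpy as np

def chaotic_sequence(x0, r, n):
    """Illustrative stand-in for one dimension of the chaotic output:
    a plain logistic map, NOT the cascaded 2D-SIHC system."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def dynamic_filter_diffusion(channel, chaos):
    """Traverse the channel; before diffusing each pixel, rebuild the 3x3
    template from the next chunk of the chaotic sequence, then mix the
    template response with the previous ciphertext pixel (mod 256)."""
    h, w = channel.shape
    padded = np.pad(channel.astype(np.int64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.int64)
    idx, prev = 0, 0
    for i in range(h):
        for j in range(w):
            template = np.floor(chaos[idx:idx + 9] * 256).astype(np.int64).reshape(3, 3)
            idx += 9
            response = int((template * padded[i:i + 3, j:j + 3]).sum())
            out[i, j] = (int(channel[i, j]) + response + prev) % 256
            prev = out[i, j]
    return out.astype(np.uint8)

img = np.arange(64, dtype=np.uint8).reshape(8, 8)       # toy 8x8 channel
chaos = chaotic_sequence(0.37, 3.99, 9 * img.size)
cipher = dynamic_filter_diffusion(img, chaos)
```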
Graph Neural Network (GNN) is an effective graph representation learning method for processing graph-structured data. However, the performance of GNNs in practical applications is limited by missing information. On the one hand, the graph structure is usually sparse, making it difficult for the model to learn node features adequately. On the other hand, model training is constrained because supervised learning relies on sparse label data, making it difficult to obtain robust node representations. To address these problems, a Subgraph-aware Contrastive Learning with Data Augmentation (SCLDA) model was proposed. Firstly, relationship scores among nodes were obtained by learning the original graph through link prediction, and the edges with the highest scores were added to the original graph to generate an enhanced graph. Secondly, local subgraphs of the original and enhanced graphs were sampled around the target nodes, and these subgraphs were input into a shared GNN encoder to generate subgraph-level embeddings of the target nodes. Finally, the mutual information between similar instances was maximized through contrastive learning of the target nodes across the subgraphs of the two views. Experimental results of node classification on six public datasets, Cora, Citeseer, Pubmed, Cora_ML, DBLP, and Photo, show that the SCLDA model improves the accuracy over the traditional GCN model by about 4.4%, 6.3%, 4.5%, 7.0%, 13.2% and 9.3%, respectively.
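A minimal numpy sketch of the cross-view contrastive objective, assuming z_orig[i] and z_aug[i] are the subgraph-level embeddings of target node i obtained from the original and the enhanced graph; the link predictor, subgraph sampler, and GNN encoder are omitted, and the InfoNCE form and temperature are illustrative assumptions:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss: embeddings of the same target node from the two
    views are positives, all other cross-view pairs are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                       # (n, n) cross-view similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # positives lie on the diagonal

rng = np.random.default_rng(0)
z_orig = rng.normal(size=(8, 16))   # target-node embeddings, original-graph view
z_aug = rng.normal(size=(8, 16))    # same target nodes, enhanced-graph view
print(info_nce(z_orig, z_aug))
```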
Visualization reconstruction technology aims to transform graphics into data forms that can be parsed and operated by machines, providing the basic information necessary for large-scale analysis, reuse and retrieval of visualizations. However, existing reconstruction methods focus mainly on recovering visual information while ignoring the key role of interaction information in data analysis and understanding. To address this problem, a visual interaction information reconstruction method for machine understanding was proposed. Firstly, interactions were defined formally to divide visual elements into different visual groups, and automated tools were used to extract the interaction information of visual graphics. Secondly, the associations among interactions and visual elements were decoupled, and the interactions were split into independent experimental variables to build an interaction entity library. Thirdly, a standardized declarative language was formulated to realize querying of the interaction information. Finally, migration rules were designed to achieve migration and adaptation of interactions among different visualizations based on visual element matching and adaptive adjustment mechanisms. The experimental cases focused on downstream tasks for machine understanding, such as visual question answering, querying, and migration. The results show that adding interaction information enables machines to understand the semantics of visual interaction, thereby expanding the application scope of the above tasks. These results verify that the proposed method achieves structural integrity of the reconstructed visual graphics by integrating dynamic interaction information.
The Environmental, Social, and Governance (ESG) indicator is critical for assessing the sustainability of enterprises. Existing ESG assessment systems face challenges such as narrow coverage, strong subjectivity, and poor timeliness, so there is an urgent need for prediction models that can forecast the ESG indicator accurately from enterprise data. To address the issue of inconsistent information richness among ESG-related features in enterprise data, a prediction model named RCT (Richness Coordination Transformer) was proposed for enterprise ESG indicator prediction based on richness coordination technology. In this model, an auto-encoder was used in the upstream richness coordination module to coordinate features with heterogeneous information richness, thereby enhancing the ESG indicator prediction performance of the downstream module. Experimental results on real datasets demonstrate that, on various prediction metrics, the RCT model outperforms multiple models including Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) network, Self-Attention Model (Transformer), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The above verifies the effectiveness and superiority of the RCT model in ESG indicator prediction.
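A minimal PyTorch-style sketch of the upstream richness coordination idea, assuming one encoder per feature group maps inputs of heterogeneous richness into a shared latent space and a reconstruction loss coordinates them; the layer sizes and the linear head standing in for the downstream module are illustrative, not the actual RCT configuration:

```python
import torch
from torch import nn

class RichnessCoordination(nn.Module):
    """Illustrative upstream module: each feature group gets its own encoder
    into a common latent size; a per-group decoder provides the reconstruction
    signal that coordinates groups of heterogeneous information richness."""
    def __init__(self, group_dims, latent_dim=16):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, latent_dim) for d in group_dims])
        self.decoders = nn.ModuleList([nn.Linear(latent_dim, d) for d in group_dims])

    def forward(self, groups):
        latents = [torch.relu(enc(x)) for enc, x in zip(self.encoders, groups)]
        recons = [dec(z) for dec, z in zip(self.decoders, latents)]
        recon_loss = sum(nn.functional.mse_loss(r, x) for r, x in zip(recons, groups))
        return torch.cat(latents, dim=-1), recon_loss

coord = RichnessCoordination(group_dims=[32, 8, 4])
head = nn.Linear(16 * 3, 1)                  # stand-in for the downstream predictor
groups = [torch.randn(5, d) for d in (32, 8, 4)]
latent, recon_loss = coord(groups)
pred = head(latent)                          # illustrative ESG indicator prediction
```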
The proliferation of multimodal harmful content on social media severely harms public interests and disrupts social order, highlighting the urgent need for effective detection methods for such content. Existing studies rely on pre-trained models to extract and fuse multimodal features, but often neglect the limitations of general semantics in harmful content detection tasks and fail to consider the complex, dynamic combinations of harmful content. Therefore, a multimodal harmful content detection method based on weakly Supervised modality semantic enhancement (weak-S) was proposed. In the proposed method, weakly supervised modality information was introduced to facilitate the harmful semantic alignment of multimodal features, and a low-rank bilinear pooling-based multimodal gated integration mechanism was designed to differentiate the contributions of different information. Experimental results show that the proposed method achieves F1 improvements of 2.2 and 3.2 percentage points on the Harm-P and MultiOFF datasets, respectively, outperforming SOTA (State-Of-The-Art) models and validating the significance of weakly supervised modality semantics in multimodal harmful content detection. In addition, the proposed method shows improved generalization on multimodal exaggeration detection tasks.
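A minimal PyTorch-style sketch of a low-rank bilinear pooling-based gated fusion, assuming text and image feature vectors as inputs; the rank, dimensions, and gate form are illustrative assumptions rather than the exact weak-S design:

```python
import torch
from torch import nn

class LowRankBilinearGate(nn.Module):
    """Illustrative fusion: low-rank bilinear pooling of text and image
    features produces a joint code, which drives a gate that weights the
    contribution of each modality in the fused representation."""
    def __init__(self, dim_t, dim_v, rank=32, dim_out=64):
        super().__init__()
        self.u = nn.Linear(dim_t, rank, bias=False)
        self.v = nn.Linear(dim_v, rank, bias=False)
        self.p = nn.Linear(rank, dim_out, bias=False)
        self.proj_t = nn.Linear(dim_t, dim_out)
        self.proj_v = nn.Linear(dim_v, dim_out)
        self.gate = nn.Linear(dim_out, dim_out)

    def forward(self, t, v):
        joint = self.p(torch.tanh(self.u(t)) * torch.tanh(self.v(v)))  # low-rank bilinear pooling
        g = torch.sigmoid(self.gate(joint))                            # per-dimension gate
        return g * self.proj_t(t) + (1.0 - g) * self.proj_v(v)

fusion = LowRankBilinearGate(dim_t=128, dim_v=256)
fused = fusion(torch.randn(4, 128), torch.randn(4, 256))
```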
Affective computing can improve teaching effectiveness and the learning experience in intelligent education. Current research on affective computing in the classroom domain still suffers from limited adaptability and weak perception in complex scenarios. To address these challenges, a hybrid architecture named SC-ACNet was proposed, aiming at accurate affective computing for students in the classroom. The architecture includes a multi-scale student face detection module capable of adapting to small targets, an affective computing module with an adaptive spatial structure that adapts to different facial postures to recognize five emotions (calm, confused, jolly, sleepy, and surprised) of students in the classroom, and a self-attention module that visualizes the regions of the model contributing most to the results. In addition, a new student classroom dataset, SC-ACD, was constructed to alleviate the lack of facial emotion image datasets in the classroom. Experimental results on the SC-ACD dataset show that SC-ACNet improves the mean Average Precision (mAP) by 4.2 percentage points and the accuracy of affective computing by 9.1 percentage points compared with the baseline method YOLOv7. Furthermore, SC-ACNet achieves accuracies of 0.972 and 0.994 on the common sentiment datasets KDEF and RaFD, validating the viability of the proposed method as a promising solution for elevating the quality of teaching and learning in the intelligent classroom.
Technology terms are used to communicate information accurately in the field of science and technology. Automatically recognizing technology terms from text can help experts and the public discover, recognize, and apply new technologies, which is of great value, but unsupervised technology term recognition methods still have limitations such as complex rules and poor adaptability. To enhance the ability to recognize technology terms from text, an unsupervised technology term recognition method was proposed. Firstly, a syntactic structure tree was constructed through constituency parsing. Then, candidate technology terms were extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information were combined to determine the most appropriate technology terms. In addition, a technology term dataset was constructed to validate the effectiveness of the proposed method. Experimental results on this dataset show that the proposed method with top-down extraction improves the F1 score by 4.55 percentage points compared to the dependency-based method. Meanwhile, the analysis of a case study in the field of 3D printing shows that the technology terms recognized by the proposed method are in line with the development of the field, and can be used to trace the development process of technology and depict its evolution path, so as to provide references for understanding, discovering, and exploring the future technologies of the field.
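A minimal sketch of the top-down candidate extraction step, assuming noun phrases (NP) of a constituency tree are taken as candidate technology terms; the parse is hand-written rather than produced by a parser, and the bottom-up pass and the frequency/semantic scoring are omitted:

```python
import nltk

# A hand-written constituency parse (bracketed string) stands in for the
# output of a real constituency parser.
tree = nltk.Tree.fromstring(
    "(S (NP (JJ additive) (NN manufacturing) (NN technology)) "
    "(VP (VBZ enables) (NP (JJ rapid) (NN prototyping))))")

def candidates_top_down(tree):
    """Top-down pass: collect the largest noun phrases first, then descend
    into them so that nested noun phrases are also kept as candidates."""
    found = []
    def visit(t):
        if isinstance(t, nltk.Tree):
            if t.label() == "NP":
                found.append(" ".join(t.leaves()))
            for child in t:
                visit(child)
    visit(tree)
    return found

print(candidates_top_down(tree))
# ['additive manufacturing technology', 'rapid prototyping']
```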
Multi-modal abstractive summarization is commonly based on the Sequence-to-Sequence (Seq2Seq) framework, whose objective function optimizes the model at the character level: it searches for locally optimal results to generate words and ignores the global semantic information of summary samples. This may cause semantic deviation between the summary and the multimodal information, resulting in factual errors. To solve these problems, a multi-modal summarization model based on semantic relevance analysis was proposed. Firstly, a summary generator based on the Seq2Seq framework was trained to generate semantically diverse candidate summaries. Secondly, a summary evaluator based on semantic relevance analysis was applied to learn the semantic differences among candidate summaries and the evaluation mode of ROUGE (Recall-Oriented Understudy for Gisting Evaluation) from a global perspective, so that the model could be optimized at the level of summary samples. Finally, the summary evaluator was used to carry out reference-free evaluation of the candidate summaries, making the finally selected summary as similar as possible to the source text in the semantic space. Experiments on the benchmark dataset MMSS show that the proposed model improves the evaluation indexes ROUGE-1, ROUGE-2 and ROUGE-L by 3.17, 1.21 and 2.24 percentage points respectively compared with the current optimal MPMSE (Multimodal Pointer-generator via Multimodal Selective Encoding) model.
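A minimal numpy sketch of the reference-free selection step, with the trained summary evaluator reduced to cosine similarity between candidate embeddings and a joint source representation for illustration:

```python
import numpy as np

def select_summary(candidate_embs, source_emb):
    """Reference-free selection: pick the candidate summary whose embedding
    is closest (by cosine similarity) to the source representation."""
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    s = source_emb / np.linalg.norm(source_emb)
    scores = c @ s
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
cands = rng.normal(size=(4, 32))   # embeddings of 4 candidate summaries
source = rng.normal(size=32)       # joint text-image source representation
best, scores = select_summary(cands, source)
```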
Focusing on the issue that current classification models are generally effective only on texts of a single length, while long and short texts occur together in real scenarios, a General Long and Short Text Classification Model based on Hybrid Neural Network (GLSTCM-HNN) was proposed. Firstly, BERT (Bidirectional Encoder Representations from Transformers) was applied to encode texts dynamically. Then, convolution operations were used to extract local semantic information, and a Dual Channel ATTention mechanism (DCATT) was built to enhance key text regions. Meanwhile, a Recurrent Neural Network (RNN) was utilized to capture global semantic information, and a Long Text Cropping Mechanism (LTCM) was established to filter critical texts. Finally, the extracted local and global features were fused and input into the Softmax function to obtain the output category. In comparison experiments on four public datasets, compared with the baseline model BERT-TextCNN and the best performing comparison model BERT, GLSTCM-HNN has its F1 score increased by up to 3.87 and 5.86 percentage points respectively. In two generality experiments on mixed texts, compared with CBLGA, a CNN-BiLSTM/BiGRU hybrid text classification model based on Attention proposed in existing research, GLSTCM-HNN has its F1 score increased by 6.63 and 37.22 percentage points respectively. Experimental results show that the proposed model improves the accuracy of text classification effectively, and generalizes both to texts whose lengths differ from the training data and to mixed long and short texts.
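A minimal PyTorch-style sketch of the final fusion step, assuming token embeddings (standing in for BERT outputs) are fed to a convolutional branch for local features and a GRU branch for global features before the Softmax classifier; DCATT and LTCM are omitted and all sizes are illustrative:

```python
import torch
from torch import nn

class LocalGlobalFusion(nn.Module):
    """Illustrative fusion head: a 1-D convolution extracts local features,
    a GRU summarizes global context, and the two are concatenated before
    the softmax classifier."""
    def __init__(self, emb_dim=64, num_classes=4):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 64, kernel_size=3, padding=1)
        self.gru = nn.GRU(emb_dim, 64, batch_first=True)
        self.fc = nn.Linear(64 + 64, num_classes)

    def forward(self, x):                        # x: (batch, seq_len, emb_dim)
        local = torch.relu(self.conv(x.transpose(1, 2))).max(dim=2).values
        _, h = self.gru(x)                       # h: (1, batch, 64)
        fused = torch.cat([local, h.squeeze(0)], dim=-1)
        return torch.softmax(self.fc(fused), dim=-1)

model = LocalGlobalFusion()
probs = model(torch.randn(2, 50, 64))            # stand-in for BERT token embeddings
```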
In the Capacitated Vehicle Routing Problem (CVRP), uncertain factors such as traffic congestion, resource supply and customer demand can easily make a single optimal solution infeasible or suboptimal. To solve this problem, a Multimodal Differential Evolution (MDE) algorithm was proposed to obtain multiple alternative vehicle routing schemes with similar objective values. Firstly, combined with the characteristics of CVRP, an efficient individual encoding and decoding strategy was constructed, and individual quality was improved using a repair mechanism. Secondly, within the framework of the Differential Evolution (DE) algorithm, a dynamic-radius niche generation method was introduced from the perspective of multimodal optimization, and the Jaccard coefficient was used to measure the similarity between solution individuals, thereby realizing the distance calculation between them. Finally, the neighborhood search strategy was modified, and a multimodal optimal solution set was obtained using an elite archiving and updating strategy. Simulation and analysis results on typical datasets show that the average number of optimal solutions obtained by the proposed MDE algorithm reaches 1.743 4, and the deviation between the average optimal solution obtained by the MDE algorithm and the known optimal solution is 0.03%, better than the 0.848 6 and 0.63% obtained by the DE algorithm. It can be seen that the proposed algorithm is highly effective and stable in solving CVRP, and can obtain multiple optimal solutions for CVRP simultaneously.
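A minimal Python sketch of the Jaccard-based similarity between solution individuals, assuming a solution is encoded as a list of routes that start and end at the depot (node 0) and is compared through its set of undirected edges; the corresponding distance for niching could be one minus this similarity:

```python
def route_edges(solution):
    """Represent a CVRP solution (list of routes, depot = 0) as the set of
    undirected edges it uses, so solutions can be compared structurally."""
    edges = set()
    for route in solution:
        path = [0] + route + [0]
        for a, b in zip(path, path[1:]):
            edges.add((min(a, b), max(a, b)))
    return edges

def jaccard_similarity(sol_a, sol_b):
    """Jaccard coefficient of the two edge sets: |A & B| / |A | B|."""
    ea, eb = route_edges(sol_a), route_edges(sol_b)
    return len(ea & eb) / len(ea | eb)

s1 = [[1, 2, 3], [4, 5]]
s2 = [[1, 2, 3], [5, 4]]
print(jaccard_similarity(s1, s2))   # 1.0: same undirected edges
s3 = [[1, 3, 2], [4, 5]]
print(jaccard_similarity(s1, s3))   # < 1.0: some edges differ
```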
Integrating cost sensitivity and resampling methods into ensemble algorithms is an effective hybrid strategy for imbalanced data classification. Concerning the problem that the misclassification cost calculation and the undersampling process in existing hybrid methods seldom consider the intra-class and inter-class distributions of samples, a Boosting algorithm based on Ball Cluster Partitioning and UnderSampling with Density Peak optimization (DPBCPUSBoost) was proposed. Firstly, density peak information was used to define the sampling weights of majority samples, each majority ball cluster with a "neighbor cluster" was divided into an "easily misclassified area" and a "hardly misclassified area", and the sampling weights of samples in the "easily misclassified area" were increased. Secondly, the majority samples were undersampled based on these sampling weights in the first iteration and based on the sample distribution weights in every subsequent iteration, and the weak classifier was trained on a temporary training set combining the undersampled majority samples with all minority samples. Finally, the density peak information of samples was combined with their class distribution to define different misclassification costs for all samples, and the weights of samples with higher misclassification costs were increased by the cost adjustment function. Experimental results on 10 KEEL datasets indicate that DPBCPUSBoost achieves the highest performance on more datasets than imbalanced data classification algorithms such as Adaptive Boosting (AdaBoost), Cost-sensitive AdaBoost (AdaCost), Random UnderSampling Boosting (RUSBoost) and UnderSampling and Cost-sensitive Boosting (USCBoost), in terms of evaluation metrics such as Accuracy, F1-Score, Geometric Mean (G-mean) and Area Under Curve (AUC) of Receiver Operating Characteristic (ROC). These results verify that the definitions of sample misclassification cost and sampling weight in the proposed DPBCPUSBoost are effective.
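A minimal numpy sketch of the density peak quantities underlying the sampling weights, computing the standard local density rho and the distance delta to the nearest higher-density point for majority samples; how DPBCPUSBoost maps these quantities to sampling weights, ball cluster partitions, and misclassification costs is not reproduced here:

```python
import numpy as np

def density_peak_info(points, d_c):
    """Standard density-peak quantities: rho is the number of neighbors within
    the cutoff distance d_c, delta is the distance to the nearest point of
    higher density (or the maximum distance for the densest point)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    rho = (dist < d_c).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i].max() if len(higher) == 0 else dist[i, higher].min()
    return rho, delta

rng = np.random.default_rng(2)
majority = rng.normal(size=(30, 2))             # toy majority-class samples
rho, delta = density_peak_info(majority, d_c=0.5)
```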
The subgraph isomorphism problem is a Non-deterministic Polynomial (NP)-complete problem, and pivoted subgraph isomorphism is a special case of it. Many efficient subgraph isomorphism algorithms exist, but there is currently no GPU-based search algorithm for the pivoted subgraph isomorphism problem, and solving pivoted subgraph matching with existing subgraph isomorphism algorithms generates a large number of unnecessary intermediate results. Therefore, a GPU-based pivoted subgraph isomorphism algorithm was proposed. Firstly, through a novel coding tree method, nodes were encoded by combining node labels, degrees and the structural features of node neighbors, and the query graph nodes were pruned on the GPU in parallel, so that the size of the search space tree generated by the candidate data graph nodes was significantly reduced. Then, the candidate nodes of each query graph node were visited level by level, and the unsatisfactory nodes were filtered out. Finally, it was verified whether the obtained subgraph was isomorphic to the query graph, so that the pivoted subgraph isomorphism search was realized efficiently. Experimental results show that, compared with the GPU-friendly Subgraph Matching (GpSM) algorithm, the proposed algorithm reduces the execution time by half and performs pivoted subgraph isomorphism search efficiently and with scalability. The proposed algorithm reduces the time required to solve the pivoted subgraph isomorphism problem while reducing GPU memory consumption and improving overall performance.
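A minimal Python sketch of the candidate filtering idea, assuming a node code built from the node label, degree, and multiset of neighbor labels as a stand-in for the paper's coding tree signature; the GPU parallelization, level-by-level search, and final verification are omitted:

```python
from collections import Counter

def encode(node, labels, adj):
    """Illustrative node code: own label, degree, and multiset of neighbor labels."""
    neighbors = adj[node]
    return labels[node], len(neighbors), Counter(labels[n] for n in neighbors)

def is_candidate(q_node, d_node, q_labels, q_adj, d_labels, d_adj):
    """A data-graph node can match a query node only if its label agrees,
    its degree is at least as large, and it has at least as many neighbors
    of every label required by the query node."""
    ql, qd, qn = encode(q_node, q_labels, q_adj)
    dl, dd, dn = encode(d_node, d_labels, d_adj)
    return ql == dl and dd >= qd and all(dn[lab] >= cnt for lab, cnt in qn.items())

# Toy query and data graphs: labels per node, adjacency lists.
q_labels = {0: "A", 1: "B"}
q_adj = {0: [1], 1: [0]}
d_labels = {0: "A", 1: "B", 2: "B"}
d_adj = {0: [1, 2], 1: [0], 2: [0]}
print([v for v in d_adj if is_candidate(0, v, q_labels, q_adj, d_labels, d_adj)])  # [0]
```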
A large number of duplicate images in a database not only affects the performance of the learner, but also consumes a lot of storage space. For massive image deduplication, a duplicate detection algorithm for massive images was proposed based on pHash (perceptual Hashing). Firstly, the pHash values of all images were generated. Secondly, each pHash value was divided into several parts of equal length; if two images had identical values in any one of the parts, they were regarded as possible duplicates. Finally, the transitivity of image duplication was discussed, and corresponding algorithms were proposed for the transitive and non-transitive cases. Experimental results show that the proposed algorithms are effective in processing massive images. When the similarity threshold is 13, detecting duplicates among nearly 300 000 images with the proposed transitive algorithm takes only about two minutes, with an accuracy of around 53%.
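A minimal Python sketch of the transitive case, assuming each pHash is a 64-bit hex string split into four equal parts, candidate pairs share at least one identical part, and duplicate groups are the connected components obtained with union-find; the Hamming distance check against the similarity threshold is omitted:

```python
from collections import defaultdict

def band_keys(phash_hex, n_parts=4):
    """Split a pHash (hex string) into fixed-length parts; two images sharing
    any part at the same position become candidate duplicates."""
    step = len(phash_hex) // n_parts
    return [(i, phash_hex[i * step:(i + 1) * step]) for i in range(n_parts)]

def find(parent, x):                       # union-find with path compression
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_duplicates(hashes, n_parts=4):
    """Transitive case: candidate pairs are merged with union-find, so
    duplicate groups are the connected components of the candidate graph."""
    parent = list(range(len(hashes)))
    buckets = defaultdict(list)
    for idx, h in enumerate(hashes):
        for key in band_keys(h, n_parts):
            buckets[key].append(idx)
    for members in buckets.values():
        for other in members[1:]:
            parent[find(parent, members[0])] = find(parent, other)
    groups = defaultdict(list)
    for idx in range(len(hashes)):
        groups[find(parent, idx)].append(idx)
    return list(groups.values())

hashes = ["89abcdef01234567", "89ab0000ffff1111", "deadbeefdeadbeef"]
print(group_duplicates(hashes))            # [[0, 1], [2]]: images 0 and 1 share the first part
```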
To overcome the limitations of traditional fuzzy rules, a Genetic Algorithm (GA)-based calculation method for fuzzy control rules containing weight coefficients was presented, in which the GA was used to find the best weight coefficients for calculating the fuzzy rules. In this method, different weight coefficients could be provided according to different input levels, and the correlation and symmetry of the weight coefficients could be used to assess all the fuzzy rules and thereby reduce the influence of invalid rules. Performance comparison experiments show that the system built from these fuzzy rules has small overshoot and short adjustment time, and is practical for fuzzy control applications. Experiments with different stimulus signals show that the system does not depend on the stimulus signal, and it has a good tracking effect and strong robustness.