Journal of Computer Applications

Two-stage data selection method for classifier with low energy consumption and high performance

Shuangshuang CUI, Hongzhi WANG, Jiahao ZHU, Hao WU

2025, 45(6): 1703-1711. DOI: 10.11772/j.issn.1001-9081.2024060883

To address the large training sets, long training times, and high carbon emissions involved in building classification models from massive data, a two-stage data selection method, TSDS (Two-Stage Data Selection), was proposed to achieve low energy consumption and high classifier performance. Firstly, the clustering centers were determined using a modified cosine similarity, and the sample data were split and hierarchically clustered on the basis of dissimilar points. Then, the clustering results were sampled adaptively according to the data distribution to obtain a high-quality subset. Finally, the subset was used to train the classification model, which accelerated training and improved model accuracy at the same time. Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) classification models were constructed on six datasets, including Spambase, Bupa and Phoneme, to verify the performance of TSDS. Experimental results show that when the sample data compression ratio reaches 85.00%, TSDS improves classification accuracy by 3 to 10 percentage points and accelerates model training, reducing the energy consumption of SVM classifiers by 93.76% on average and that of MLP classifiers by 75.41% on average. TSDS can therefore shorten training time, reduce energy consumption, and improve classifier performance in classification tasks in big data scenarios, thereby helping to achieve the “carbon peaking and carbon neutrality” goal.
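
The sampling stage described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain (unmodified) cosine similarity, assumes the clusters are already given, and assumes that an 85% compression ratio means keeping about 15% of each cluster. The names `cosine_similarity`, `sample_clusters`, and `keep_fraction` are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def sample_clusters(clusters, keep_fraction, seed=0):
    """Keep roughly keep_fraction of every cluster (at least one sample each).

    Sampling each cluster in proportion to its size preserves the cluster
    proportions of the full data; this is an assumed, simplified stand-in
    for the adaptive sampling named in the abstract.
    """
    rng = random.Random(seed)
    subset = []
    for members in clusters:
        if not members:
            continue
        k = max(1, round(len(members) * keep_fraction))
        subset.extend(rng.sample(members, k))
    return subset


# Three toy clusters of sample indices (sizes 100, 40 and 20).
clusters = [list(range(0, 100)), list(range(100, 140)), list(range(140, 160))]
subset = sample_clusters(clusters, keep_fraction=0.15)
```

A classifier would then be trained on `subset` only, which is where the training-time and energy savings reported above would come from.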

Subspace Gaussian mixture model clustering ensemble algorithm based on maximum mean discrepancy

Yulin HE, Xu LI, Yingting HE, Laizhong CUI, Zhexue HUANG

2025, 45(6): 1712-1723. DOI: 10.11772/j.issn.1001-9081.2024070943

To address the limited capability and parameter sensitivity of Gaussian Mixture Model (GMM) clustering algorithms on large-scale high-dimensional data, a Subspace GMM Clustering Ensemble (SGMM-CE) algorithm based on Maximum Mean Discrepancy (MMD) was proposed. Firstly, Random Sample Partition (RSP) was applied to the original large-scale high-dimensional dataset to obtain multiple data subsets, thereby reducing the size of the clustering problem in terms of sample size. Secondly, subspace learning was performed in the high-dimensional feature space of each data subset by considering the influence of features on the optimal number of GMM components, so that multiple low-dimensional feature subspaces were obtained for each high-dimensional feature space, and GMM clustering was then conducted on each subspace to obtain a series of heterogeneous GMMs. Thirdly, the GMM clustering results of different subspaces from the same data subset were relabeled and merged on the basis of the proposed Average Shared Affiliation Probability (ASAP). Finally, the extended Subspace MMD (SubMMD) was used as a criterion to measure the distributional consistency between two clusters in the clustering results of different data subsets, so that the clustering results of these subsets were relabeled and merged, yielding the final clustering ensemble result for the original dataset. Extensive experiments were conducted to validate the effectiveness of the SGMM-CE algorithm. Experimental results show that compared with the best-performing comparison algorithm, Meta-CLustering Algorithm (MCLA), the SGMM-CE algorithm improves the Normalized Mutual Information (NMI), Clustering Accuracy (CA) and Adjusted Rand Index (ARI) values by 19%, 20%, and 52%, respectively, on the given clustering datasets. In addition, the feasibility and rationality experiments show that the SGMM-CE algorithm exhibits parameter convergence and time efficiency, demonstrating that it can handle large-scale high-dimensional data clustering problems effectively.
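
For readers unfamiliar with MMD, the following minimal Python sketch computes the standard biased estimate of squared MMD between two samples with a Gaussian (RBF) kernel. It illustrates only the generic MMD criterion, not the SubMMD extension proposed in the paper; the kernel choice, the `gamma` bandwidth, and the function names are assumptions made here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for two equal-length points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Two toy clusters: one near the origin, one far from it.
cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
cluster_b = [(3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
same = mmd_squared(cluster_a, cluster_a)
different = mmd_squared(cluster_a, cluster_b)
```

A value near zero suggests that two clusters follow the same distribution and are candidates for merging, while a larger value suggests they should stay separate.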

Aspect-based sentiment analysis model integrating syntax and sentiment knowledge

Ziliang LI, Guangli ZHU, Yulei ZHANG, Jiajia LIU, Yixuan JIAO, Shunxiang ZHANG

2025, 45(6): 1724-1731. DOI: 10.11772/j.issn.1001-9081.2024060903

Aspect-Based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task that aims to determine the sentiment polarity of specific aspect words in a given text. Existing ABSA methods use Graph Convolutional Networks (GCNs) to process syntactic and semantic information, but they treat all syntactic dependencies of aspect words equally and ignore the impact of distant unrelated words on target aspect words, which results in inappropriate weight allocation between target aspect words and viewpoint words and in insufficient extraction of semantic information. To address these issues, an ABSA model integrating syntax and sentiment knowledge was proposed. Firstly, a reachability matrix was constructed according to syntactic information, and on this basis a syntactic enhancement graph was constructed by weighting the central position through the aspect words. Secondly, a semantic enhancement graph was constructed from external sentiment knowledge and aspect enhancement, and graph convolution was used to model the syntactic enhancement graph and the semantic enhancement graph separately, so as to form different feature channels. Thirdly, biaffine attention was used to integrate syntactic and semantic information effectively. Finally, average-pooling and concatenation operations were used to obtain the final feature vectors corresponding to the aspect words. Experimental results indicate that compared with the deep dependency aware graph convolutional network model DA-GCN-BERT (deep Dependency Aware GCN+BERT (Bidirectional Encoder Representations from Transformers)), the proposed model improves accuracy by 1.71, 1.41, 1.27, 0.17, and 0.43 percentage points on five publicly available datasets, respectively. The proposed model therefore has strong applicability in the ABSA field.

Multi-label text classification method based on contrastive learning enhanced dual-attention mechanism

Mingfeng YU, Yongbin QIN, Ruizhang HUANG, Yanping CHEN, Chuan LIN

2025, 45(6): 1732-1740. DOI: 10.11772/j.issn.1001-9081.2024070909

To address the problem that existing attention-based methods struggle to capture complex dependencies among texts, a multi-label text classification method based on a contrastive learning enhanced dual-attention mechanism was proposed. Firstly, text representations based on self-attention and on label attention were learned separately, and the two were fused to obtain a more comprehensive text representation that captures the structural features of the text and the semantic associations between the text and the labels. Then, a multi-label contrastive learning objective was introduced to supervise the learning of text representations through label-guided text similarity, thereby capturing complex dependencies among texts at the topic, content, and structural levels. Finally, a feedforward neural network was used as the classifier for text classification. Experimental results demonstrate that compared with LDGN (Label-specific Dual Graph neural Network), the proposed method improves the normalized Discounted Cumulative Gain at top-5 (nDCG@5) by 1.81 and 0.86 percentage points on the EUR-Lex (European Union Law Document) dataset and the Reuters-21578 dataset, respectively, and achieves competitive results on the AAPD (Arxiv Academic Paper Dataset) and RCV1 (Reuters Corpus Volume I) datasets. The method can therefore capture complex dependencies among texts at the topic, content, and structural levels effectively, resulting in good performance in multi-label text classification tasks.
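
The nDCG@5 metric reported in this abstract can be sketched in a few lines of Python. This is the common textbook definition with a log2 discount, not code from the paper; it assumes `relevances` lists the true relevance of every label (1 for a correct label, 0 otherwise) in the order the model ranked them. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant label at all
    return dcg_at_k(relevances, k) / ideal


# The only correct label was ranked second out of five.
score = ndcg_at_k([0, 1, 0, 0, 0], k=5)
```

Because the score is normalized by the ideal ranking, a difference of one percentage point means the same thing on datasets with very different numbers of labels per document.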

Legal case retrieval method integrating temporal behavior chain and event type

Lilin ZHAN, Yongbin QIN, Ruizhang HUANG, Hua WANG, Yanping CHEN

2025, 45(6): 1741-1747. DOI: 10.11772/j.issn.1001-9081.2024070917

To address the problem that existing Legal Case Retrieval (LCR) methods make poor use of case elements and are easily misled by similar semantic structure in case content, an LCR method integrating temporal behavior chain and event type was proposed. Firstly, a sequence labeling method was adopted to identify the legal event types in the case description, and the temporal behavior chain was constructed from the behavioral elements in the case text, thereby highlighting the key elements of the case so that the model focused on its core content and was less easily misled by similar semantic structure. Secondly, a similarity vector representation matrix of the temporal behavior chain was constructed by segmented encoding to enhance the semantic interaction of behavioral elements between cases. Finally, an aggregation scorer measured the relevance of cases from three perspectives (temporal behavior chain, legal event type, and crime type), making the case matching score more reasonable. Experimental results show that on LeCaRD (Legal Case Retrieval Dataset), compared with the SAILER (Structure-Aware pre-traIned language model for LEgal case Retrieval) method, the proposed method improves P@5 by 4 percentage points, P@10 by 3 percentage points, MAP by 4 percentage points, and NDCG@30 by 0.8 percentage points. The method thus uses case elements effectively to avoid interference from similar semantic structure in case content, and can provide a reliable basis for LCR.
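
The P@k and MAP figures quoted in this abstract follow standard retrieval definitions, sketched below in Python. This is an illustration, not the paper's evaluation script: each ranked list is given as 0/1 relevance flags in rank order, and average precision is normalized by the number of relevant cases that appear in the list, which assumes every relevant case was retrieved. The function names are hypothetical.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top-k retrieved cases that are relevant."""
    return sum(relevant_flags[:k]) / k


def average_precision(relevant_flags):
    """Mean of the precision values taken at each relevant position."""
    hits = 0
    total = 0.0
    for rank, flag in enumerate(relevant_flags, start=1):
        if flag:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# One query whose relevant cases were ranked first and third.
query = [1, 0, 1, 0, 0]
p5 = precision_at_k(query, 5)
ap = average_precision(query)
```

P@k only counts how many relevant cases reach the top k, whereas MAP also rewards placing them earlier in the list, which is why the two are usually reported together.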

Visual interaction information reconstruction method for machine understanding

Xin LI, Wen LIU, Jixiu LIAO, Zongchi YANG

2025, 45(6): 1748-1755. DOI: 10.11772/j.issn.1001-9081.2024060904

Visualization reconstruction technology aims to transform graphics into data forms that can be parsed and operated on by machines, providing the basic information needed for large-scale analysis, reuse and retrieval of visualizations. However, existing reconstruction methods focus mainly on recovering visual information and ignore the key role of interaction information in data analysis and understanding. To address this problem, a visual interaction information reconstruction method for machine understanding was proposed. Firstly, interactions were defined formally to divide the visual elements into different visual groups, and automated tools were used to extract the interaction information of the visual graphics. Secondly, the associations between interactions and visual elements were decoupled, and the interactions were split into independent experimental variables to build an interaction entity library. Thirdly, a standardized declarative language was formulated for querying the interaction information. Finally, migration rules were designed to migrate and adapt interactions across different visualizations based on visual element matching and adaptive adjustment mechanisms. The experimental cases focused on downstream tasks for machine understanding, such as visual question answering, querying, and migration. The results show that adding interaction information enables machines to understand the semantics of visual interaction, thereby expanding the application scope of these tasks, and verify that the proposed method achieves structural integrity of the reconstructed visual graphics by integrating dynamic interaction information.

Digital content copyright protection and fair tracking scheme based on blockchain

Li’e WANG, Caiyi LIN, Yongdong LI, Xingcheng FU, Xianxian LI

2025, 45(6): 1756-1765. DOI: 10.11772/j.issn.1001-9081.2024060901

To solve the problems that copyright owners can maliciously frame purchasers and that purchasers, knowing their own watermarks, can easily remove them during digital content copyright protection and tracking, a digital content copyright protection and fair tracking scheme based on blockchain was proposed. Firstly, the Paillier homomorphic encryption algorithm and a key distribution smart contract were used to change the purchaser’s watermark in the ciphertext state, and the watermark was embedded in the encrypted digital content. Secondly, the key distribution smart contract and the arbitration smart contract were called by the verification node in the blockchain, which solved the single point of failure problem of traditional copyright protection solutions. Finally, experiments were conducted to verify the performance of the proposed scheme. The results show that when the digital content size is 1024×1024, compared with the blockchain-enabled accountability mechanism against information leakage in vertical industry services, the proposed scheme reduces the total execution time of encryption and watermark embedding by 94.92% and the total decryption execution time by 79.72%. The proposed scheme therefore has low total time and operating costs with good efficiency, and can be widely used in the field of digital content copyright protection.
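
The property that makes Paillier encryption suitable for operating on a watermark in the ciphertext state is its additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The toy Python sketch below demonstrates this with textbook Paillier and deliberately tiny primes. It is insecure, is not the scheme from the paper, and how the paper combines watermark and content is not stated in the abstract; the primes and function names are assumptions chosen for illustration.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy key generation with g = n + 1 (primes far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """c = (n + 1)^m * r^n mod n^2 with a random r coprime to n."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
pixel_ct = encrypt(n, 5)       # e.g. an encrypted content value
watermark_ct = encrypt(n, 7)   # e.g. an encrypted watermark value
combined = add_encrypted(n, pixel_ct, watermark_ct)
recovered = decrypt(n, private_key, combined)
```

In this sketch the party performing `add_encrypted` never sees either plaintext, which is the kind of separation a fair tracking scheme needs between seller and purchaser.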

Vehicular edge computing scheme with task offloading and resource optimization

Tianyu XUE, Aiping LI, Liguo DUAN

2025, 45(6): 1766-1775. DOI: 10.11772/j.issn.1001-9081.2024060905

In view of the increasing demand for user experience quality, the difficulty of obtaining link status caused by highly mobile vehicles, and the time-varying nature of the resources that heterogeneous edge nodes provide to vehicles in Vehicular Edge Computing (VEC), a VEC scheme based on Joint Task Offloading and Resource Optimization (JTO-RO) was developed. Firstly, without loss of generality, a Vehicle-to-Infrastructure (V2I) transmission model was proposed that considers intra-edge and inter-edge interference comprehensively. In the model, by introducing Non-Orthogonal Multiple Access (NOMA) technology, edge nodes did not rely on link status information and improved the channel capacity at the same time. Secondly, to enhance the performance and efficiency of the system, a Multi-Agent Twin Delayed Deep Deterministic policy gradient (MATD3) algorithm was designed to formulate task offloading strategies, which could be adjusted dynamically through interactive learning with the environment. Thirdly, the synergies of the two strategies were considered jointly, and an optimization scheme was formulated with the goal of maximizing the task service ratio to meet the increasing user experience quality requirements. Finally, simulation was carried out on a real vehicle trajectory dataset. The results show that compared with three current representative schemes (those using the Random Offloading (RO) algorithm, the D4PG (Distributed Distributional Deep Deterministic Policy Gradient) algorithm, and the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm as the task offloading algorithm, respectively), the proposed scheme improves the average service ratio by more than 20%, 10%, and 29%, respectively, in three scenarios (normal, task-intensive, and delay-sensitive), verifying the advantages and effectiveness of the scheme.

Multi-scale information fusion time series long-term forecasting model based on neural network

Lanhao LI, Haojun YAN, Haoyi ZHOU, Qingyun SUN, Jianxin LI

2025, 45(6): 1776-1783. DOI: 10.11772/j.issn.1001-9081.2024070930

Time series data come from a wide range of fields, from meteorology to finance and medicine. Accurate long-term prediction is a key issue in time series data analysis, processing, and research. To exploit the correlations at different scales in time series data, a multi-scale information fusion time series long-term forecasting model based on neural networks, ScaleNN, was proposed to better handle the multi-scale problem in time series data and achieve more accurate long-term forecasts. Firstly, a fully connected neural network and a convolutional neural network were combined to extract both global and local information effectively, and the two were aggregated for prediction. Then, by introducing a compression mechanism in the global information representation module, longer input sequences were accepted with a lighter structure, which increased the perceptual range of the model and improved its performance. Extensive experimental results demonstrate that ScaleNN outperforms PatchTST (Patch Time Series Transformer), a current strong model in this field, on multiple real-world datasets. Specifically, the running time is shortened by 35% with only 19% of the parameters required. ScaleNN can therefore be applied widely to time series prediction problems in various fields, providing a foundation for forecasting in areas such as traffic flow prediction and weather forecasting.

Readmission prediction model based on graph contrastive learning

Chaoying JIANG, Qian LI, Ning LIU, Lei LIU, Lizhen CUI

2025, 45(6): 1784-1792. DOI: 10.11772/j.issn.1001-9081.2024060902

To solve the problems of insufficient mining of the relationship between inter-disease joint effects and readmission and of the weak generalization ability of related models, a readmission prediction model based on graph contrastive learning, called HealthGraph, was proposed. Firstly, the disease co-occurrence information in the dataset was used to construct a disease code map, so that the correlation information among diseases was fully explored. Then, guided by the idea of graph contrastive learning, a patient data augmentation method was proposed in which the task-related topology was captured adaptively by a graph sampler and a new view was constructed to improve data richness, thereby improving the generalization performance of the model. Finally, readmission prediction was carried out by combining the initial disease code map embedding with the new view embedding. Respiratory and circulatory system disease datasets were constructed from the real dataset MIMIC-III, and extensive experiments were conducted. The results show that compared with the REverse Time AttentIoN model (RETAIN) and the Stage-aware neural Network model (StageNet), the proposed model improves the accuracy and F1 indicators by about 1 percentage point. In addition, the results of two groups of ablation experiments verify the effectiveness of the proposed model in improving the accuracy and generalization of readmission prediction.

Multi-view entity alignment combining triples and text attributes

Sheping ZHAI, Yan HUANG, Qing YANG, Rui YANG

2025, 45(6): 1793-1800. DOI: 10.11772/j.issn.1001-9081.2024050703

Entity Alignment (EA) aims to identify entities that refer to the same thing in Knowledge Graphs (KGs) from different sources. Most existing EA models focus on the characteristics of the entities themselves, and some introduce entity relationship and attribute information to assist alignment. However, these models ignore the potential neighborhood information and semantic information of the entities. To solve these problems, a Multi-view EA model combining triples and text attributes (MultiEA) was proposed. In the proposed model, entity information was divided into multiple views to achieve alignment. To address the lack of neighborhood information, a Graph Convolutional Network (GCN) and a translation model were used in parallel to learn the relationship information embedded in entities. To address the lack of semantic information, word embeddings and a pre-trained language model were adopted to learn the semantic information of attribute text. Experimental results show that on the three sub-datasets of DBP15K, compared with EPEA (Entity-Pair Embedding Approach for KG alignment), the baseline model that yields the best results, the proposed model increases Hits@1 by 2.18, 1.36 and 0.96 percentage points, respectively, and improves the Mean Reciprocal Rank (MRR) by 2.4, 0.9 and 0.5 percentage points, respectively, indicating the effectiveness of the proposed model.
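
Hits@1 and MRR, the two metrics reported in this abstract, are both computed from the rank at which the correct counterpart entity appears for each test entity. The short Python sketch below uses the usual definitions and made-up ranks; it is not the paper's evaluation code, and the function names are hypothetical.

```python
def hits_at_k(ranks, k):
    """Share of test entities whose correct match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank of the correct match (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Illustrative ranks of the correct counterpart for three source entities.
ranks = [1, 2, 4]
hits1 = hits_at_k(ranks, 1)
mrr = mean_reciprocal_rank(ranks)
```

Hits@1 only credits exact top-ranked matches, while MRR also gives partial credit to near misses, so a model can improve one more than the other, as the differing gains above suggest.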

Relation extraction method combining semantic enhancement and perception attention

Dawei YANG, Xihai XU, Wei SONG

2025, 45(6): 1801-1808. DOI: 10.11772/j.issn.1001-9081.2024060776

To address the issues that text feature extraction neglects the contextual discriminative features of sentences and fails to fully utilize the association information between instances and relation labels, a method combining Semantic enhancement and Perception attention for Relation Extraction (SPRE) was proposed. Firstly, in the sentence feature encoding phase, a Semantic Enhancement Mechanism (SEM) was constructed to extract salient semantic features of sentences, and a sentence representation enhanced with salient information was obtained through entity-aware word embeddings and Salient Feature Perception (SFP). Then, a Perception Attention Mechanism (PAM) was designed to integrate sentence features. In this mechanism, the matching degree between sentences and relation labels was evaluated by perceiving the semantic information between sentences and relation labels, the consistency between the entity types of sentences and the corresponding relations, and the similarity among sentences, so as to fully utilize the dependencies between instances and relation labels in a bag and further enhance the noise reduction capability of the method. Finally, after relation prediction by a classifier, the network parameters were adjusted according to the cross-entropy between the predicted and actual results. Experimental results on the NYT-10 (New York Times 10) and GDS (Google Distant Supervision) datasets show that on NYT-10, compared with the BERT (Bidirectional Encoder Representations from Transformers)-based relation extraction method PARE (Passage-Attended Relation Extraction), the proposed method increases the Area Under Curve (AUC) by 2.1 percentage points and P@M, the mean of Precision@N (P@N) over the top 100, 200, and 300 data entries ranked in descending order of confidence, by 2.4 percentage points; on GDS, the AUC and P@M of the proposed method are 90.5% and 97.8%, respectively. The proposed method significantly outperforms mainstream distant supervision relation extraction methods on both datasets, verifying its effectiveness and showing that it can enhance the model’s ability to learn data features in distant supervision relation extraction tasks.

Document-level relation extraction based on entity representation enhancement

Haijie WANG, Guangxin ZHANG, Hai SHI, Shu CHEN

2025, 45(6): 1809-1816. DOI: 10.11772/j.issn.1001-9081.2024050682

To address the problems that existing entity representation learning for Document-level Relation Extraction (DocRE) tasks ignores differences among entity mentions and lacks a paradigm for computing the complexity of entity-pair relation extraction, a DocRE model based on Entity Representation Enhancement (DREERE) was proposed. Firstly, an attention mechanism was used to evaluate how differently entity mentions contribute to determining different entity-pair relations, so as to obtain more flexible entity representations. Secondly, the entity-pair sentence importance distribution computed by the encoder was used to evaluate the complexity of entity-pair relation extraction, and the two-hop information between entity pairs was used selectively to enhance entity-pair representations. Experiments were carried out on the popular datasets DocRED, Re-DocRED and DWIE. The results show that compared with the best baseline models such as ATLOP (Adaptive Thresholding and Localized cOntext Pooling) and E2GRE (Entity and Evidence Guided Relation Extraction), the DREERE model improves the F1 value by 0.06, 0.14, and 0.23 percentage points, respectively, and the ign-F1 value (the F1 score calculated by ignoring the triples that appear in the training set) by 0.07, 0.09 and 0.12 percentage points, respectively, indicating that the DREERE model is able to acquire the semantic information of entities in documents effectively.

Hyper-relational knowledge graph completion method fusing noise filtering

Shuang LIU, Daqing LIU, Jiana MENG, Di ZHAO

2025, 45(6): 1817-1826. DOI: 10.11772/j.issn.1001-9081.2024060792

To address the problem that qualifiers in a hyper-relational knowledge graph introduce irrelevant noise into the main triple, a Hyper-Relational knowledge graph completion method fusing Noise Filtering (HRNF) was proposed. Firstly, a feature enhancement module was constructed to enhance the hyper-relational facts effectively. In this module, a Convolutional Neural Network (CNN) was utilized to extract the ordinary triple features, and the complex relational features in the hyper-relational fact were captured by a Heterogeneous Graph Neural Network (HGNN). Secondly, these two kinds of features were fused to enhance the information of the main triple in the hyper-relational fact by exploiting the stability and reliability of the ordinary triple, so as to reduce the effect of the noise introduced by qualifiers. Thirdly, a relevance-aware module was constructed to fuse the feature representations more accurately, in which Graph ATtention network version Two (GATv2) was utilized to update the enhanced feature representation by learning the weights among different nodes dynamically. Fourthly, a semantic enhancement module was constructed to capture complex semantic information. Finally, a Transformer model was utilized to generate the final predicted sequence by capturing the dependency between any two elements in the sequence through the self-attention mechanism. To validate the effectiveness of HRNF, extensive experiments were conducted on two commonly used datasets, Wikipeople and JF17K. The results show that when predicting main triple entities, compared with GRAN (GRAph-based N-ary relational learning), the best of the baseline methods, HRNF improves the Mean Reciprocal Rank (MRR), Hits@1, and Hits@10 by 0.6, 1.1, and 1.8 percentage points, respectively, on the Wikipeople dataset, and by 0.5, 0.7, and 2.9 percentage points, respectively, on the JF17K dataset. These improvements show that HRNF can effectively reduce the noise introduced by qualifiers in hyper-relational knowledge graph completion tasks.

Aspect-based sentiment analysis method based on code generation

Jian SHUAI, Zhongqing WANG, Jiali CHEN

2025, 45(6): 1827-1832. DOI: 10.11772/j.issn.1001-9081.2024060885

Aspect-Based Sentiment Analysis (ABSA) tasks are receiving increasing attention. To address the limitation that current mainstream ABSA methods cannot fully utilize semantic relationships or learn the connections among the various sentiment elements, an ABSA method based on code generation was proposed. Firstly, each sentiment element was mapped to the Programming Language (PL). Secondly, following this correspondence, the experimental dataset was converted into the data patterns of a code generation task, which can better express the relationships among the various sentiment elements. Finally, the strong capabilities of current Large Language Models (LLMs) and the excellent performance of code generation methods in event extraction tasks were leveraged to obtain more accurate results. To verify the effectiveness of the proposed method, comparison experiments were conducted against the Paraphrase, Seq2Path, and Opinion Tree Generation (OTG) methods. Experimental results show that the proposed method improves the F1 score by 2.82 percentage points over the OTG method on the restaurant dataset in ABSA tasks.

Robust unsupervised multi-task anomaly detection method for defect diagnosis of urban drainage pipe network

Longbo YAN, Wentao MAO, Zhihong ZHONG, Lilin FAN

2025, 45(6): 1833-1840. DOI: 10.11772/j.issn.1001-9081.2024050739

Applying machine learning techniques to the detection of drainage pipe defects such as pipe leakage has become a focus of urban intelligent management. However, the monitoring data collected from drainage pipe networks in real-world scenarios contain much noise, particularly sudden water level fluctuations caused by rainfall, which significantly reduces the accuracy and reliability of detection results. To address these problems, a robust unsupervised multi-task anomaly detection method was proposed for drainage network defect diagnosis. Firstly, by integrating the spatial-temporal information from multiple physical monitoring stations, a deep multi-task Support Vector Data Description (SVDD) model was established, in which an individual hypersphere-based one-class classifier was built for each station to extract anomaly detection rules, thus constructing a rule adaptation mechanism to obtain the common feature representation of multiple stations. Secondly, based on the obtained feature representations, sliding windows were introduced into each station’s SVDD model to continuously identify abnormal fluctuations in the pipeline monitoring data, thereby determining the noise points caused by common interference factors in the monitoring data sequences. These noise points were corrected by polynomial interpolation to exclude the irregular noise interference caused by rainfall. Finally, the corrected monitoring sequences were employed to detect pipe network leakage based on AutoEncoder (AE) reconstruction errors. Experimental results on the real-world monitoring data collected from 2017 to 2018 by the Qingtan Water Management System in Changzhou City demonstrate that the results of the proposed method are consistent with the manual maintenance records, and that the proposed method has higher detection accuracy and a lower false alarm rate than statistical methods and traditional machine learning approaches. Taking the Qingtan East area as an example, the false detection rate of the proposed method when dealing with rainfall interference is 5.47 percentage points lower than that of the second-best method, USAD (UnSupervised Anomaly Detection), significantly improving the robustness of the model in strong noise scenarios and further verifying the accuracy and practicability of the proposed method.

Network traffic classification model integrating variational autoencoder and AdaBoost-CNN

Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU

2025, 45(6): 1841-1848. DOI: 10.11772/j.issn.1001-9081.2024060840

Network traffic classification has remained a challenge as network communication has developed， and many solutions have been proposed. At present， most network traffic classification methods focus on balanced datasets to facilitate experiment and calculation. To solve the problem that most real network datasets are unbalanced， a network traffic classification model VAE-ABC （Variational AutoEncoder-Adaptive Boosting-Convolutional neural network） was proposed by integrating Variational AutoEncoder （VAE） and Adaptive Boosting Convolutional Neural Network （AdaBoost-CNN）. Firstly， at the data level， VAE was used to partially augment the unbalanced dataset， and the learning time was shortened by exploiting VAE’s ability to learn the latent distribution of data. Then， to improve classification effect at the algorithm level， following the idea of ensemble learning， AdaBoost-CNN algorithm was designed on the basis of Adaptive Boosting （AdaBoost） algorithm with an improved Convolutional Neural Network （CNN） as the weak classifier， thereby improving the accuracy of learning and training. Finally， the fully connected layer was used to complete feature mapping， and the final classification results were obtained through the Sigmoid activation function. Experimental results from multiple comparisons show that the proposed model achieves an accuracy of 94.31% on the unbalanced sub-dataset of the partitioned classification dataset ISCX VPN-nonVPN. Compared with AdaBoost-SVM， which uses Support Vector Machine （SVM） as the weak classifier； SMOTE-SVM， which combines SMOTE （Synthetic Minority Oversampling TEchnique） and SVM； and SMOTE-AB-D-T， which uses Decision Tree （D-T） as the weak classifier and is combined with SMOTE algorithm， the proposed model has the accuracy increased by 1.34， 0.63 and 0.24 percentage points， respectively. It can be seen that the classification effect of this model is better than those of other models on this dataset.

Cognitive load EEG recognition model integrating variational graph autoencoder and local-global graph network

Tiantong ZHOU, Yanqi ZHENG, Tao WEI, Yakang DAI, Ling ZOU

2025, 45(6): 1849-1857. DOI: 10.11772/j.issn.1001-9081.2024060794

To address the issues in cognitive load recognition models such as excessive reliance on manual feature extraction， neglect of spatial information in ElectroEncephaloGram （EEG） signals， and inability to learn graph structure data effectively， a cognitive load EEG recognition model with Variational Graph AutoEncoder （VGAE） and Local-Global Graph Network （VLGGNet） was proposed. The model consisted of two parts： a temporal learning module and a graph learning module. Firstly， the temporal learning module was employed to capture dynamic frequency representation of EEG signals using multi-scale temporal convolution， and features extracted by the multi-scale convolution were fused through a cascading module of Spatial and Channel reconstruction Convolution （SCConv） and 1×1 convolutional kernel. Then， the graph learning module was employed to define the EEG data as a local-global graph： in the local graph feature extraction layer， the node attributes were aggregated into a low-dimensional vector， and in the global graph feature extraction layer， the graph structure was reconstructed using VGAE. Finally， lightweight graph convolution operations were performed on the global graph and the node feature vector， and the prediction results were output by the fully connected layer. Experimental results of nested cross-validation show that VLGGNet has the mean Accuracy （mAcc） and mean F1 score （mF1） improved by 4.07 and 3.86 percentage points， respectively， compared with the sub-optimal Local-Global Graph Network （LGGNet） on Mental Arithmetic Task （MAT） dataset； compared with the best-performing multi-scale Temporal-Spatial convolutional neural network （TSception） on Simultaneous Task EEG Workload （STEW） dataset， VLGGNet has the same mAcc and an mF1 only 0.01 percentage points lower. It can be seen that VLGGNet improves performance of cognitive load classification， and it is verified that prefrontal and frontal regions are closely related to cognitive load status.

Multimodal fusion recommendation algorithm based on joint self-supervised learning

Zonghang WU, Dong ZHANG, Guanyu LI

2025, 45(6): 1858-1868. DOI: 10.11772/j.issn.1001-9081.2024060824

To address the data sparsity problem in multimodal recommendation algorithms and the problem that the existing Self-Supervised Learning （SSL） algorithms often focus on a single feature in a dataset while ignoring the possibility of joint learning of multiple features， a multimodal fusion recommendation algorithm based on joint self-supervised learning， called SFELMMR （SelF-supErvised Learning for MultiModal Recommendation）， was proposed. Firstly， the existing SSL strategies were integrated and optimized to enhance data representation capabilities significantly by learning data features from different modalities jointly， thereby alleviating the data sparsity issue. Secondly， a method to construct multimodal latent semantic graph was designed by integrating deep item relationships from a global perspective with direct interactions from a local perspective， enabling the algorithm to capture complex relationships among items more accurately. Finally， experiments were carried out on three datasets. The results demonstrate that the proposed algorithm achieves significant improvements in multiple recommendation performance metrics compared to the existing best-performing multimodal recommendation algorithms. Specifically， on the three datasets， the proposed algorithm has the Recall@10 improved by 5.49%， 2.56%， and 2.99%， respectively， the NDCG@10 improved by 1.17%， 1.98%， and 3.52%， respectively， the Precision@10 improved by 4.69%， 2.74%， and 1.22%， respectively， and the Map@10 improved by 0.81%， 1.59%， and 3.11%， respectively. Besides， ablation experiments verify the effectiveness of the proposed algorithm.
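
For readers unfamiliar with the reported metrics, the following is a minimal sketch of Recall@K and NDCG@K for one user under binary relevance. The binary-relevance assumption and the function names are illustrative; the paper's exact evaluation protocol is not given in the abstract.

```python
import math

def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the relevant items that appear in the top-k ranking."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(relevant_items)
    # Gain 1 for each relevant item, discounted by log2(position + 1).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Dataset-level scores are usually obtained by averaging these per-user values over all test users.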

Recipe recommendation model based on hierarchical learning of flavor embedding heterogeneous graph

Wenjing YAN, Ruidong WANG, Min ZUO, Qingchuan ZHANG

2025, 45(6): 1869-1878. DOI: 10.11772/j.issn.1001-9081.2024060859

Aiming at the problems of incomplete information dimension， sparse interaction data and redundant interaction information in recipe recommendation tasks， a Recipe recommendation model based on hierarchical learning of Flavor embedding heterogeneous graph （RecipeFlavor） was proposed. Firstly， the flavor molecule dimension was introduced， and a heterogeneous graph was constructed on the basis of users， foods， ingredients and flavor substances of ingredients to represent the connections among the four kinds of nodes effectively. Then， a hierarchical learning module based on heterogeneous graph was constructed on the basis of information transmission mechanism， and combined with Squeeze Attention （SA） mechanism， different node relationships were regarded as different information channels， so that key interaction information between nodes was extracted and noise was suppressed. Finally， a Contrastive Learning （CL） module was constructed on the basis of feature-aware noise， and positive and negative sample discrimination tasks were introduced in model learning， thereby enhancing the information associations between user and recipe nodes and improving the model’s learning ability for features. Experimental results show that compared with HGAT （Hierarchical Graph ATtention network for recipe recommendation） model on Recipe 1M+ large dataset， RecipeFlavor has the Area Under the ROC Curve （AUC） increased by 1.44 percentage points， and the Precision （Pre）， Hit Rate （HR）， Mean Average Precision （MAP）， and Normalized Discounted Cumulative Gain （NDCG） of Top-10 increased by 0.76， 6.11， 2.68， and 3.05 percentage points， respectively. It can be seen that the introduction of flavor molecule information expands the learning dimension of recipe recommendation， and that RecipeFlavor can extract key information in heterogeneous graph effectively and enhance correlation between users and recipes， thus improving the precision of recipe recommendations.

Paper recommendation method with mixed information enhancement

Panpan GUO, Gang ZHOU, Jicang LU, Zhufeng LI, Taojie ZHU

2025, 45(6): 1879-1887. DOI: 10.11772/j.issn.1001-9081.2024050708

To address the problems of data sparsity and cold start in traditional Collaborative Filtering （CF）， as well as errors caused by various transformations in the process of generating result matrices using matrix factorization methods， a Low-rank and Sparse Matrix Factorization （LSMF） paper recommendation method with mixed information enhancement was proposed. Firstly， SPECTER （Scientific Paper Embeddings using Citation-informed TransformERs）， a pre-trained citation-aware Transformer for document-level representation learning， was used to learn the representation of papers， and then the similarity matrix among papers was calculated and constructed. Secondly， the similarity matrix and citation matrix were added together to form a mixed information matrix， and then the content similarity information and citation information were integrated into the paper-author matrix through matrix multiplication. Finally， the recommendation list was obtained by using LSMF model to decompose the paper-author matrix. Experimental results on ACL Anthology Network （AAN） and DBLP datasets show that the proposed method achieves better recommendation performance， and that the way of introducing content information and citation information in the proposed method is equally applicable to other matrix factorization models. For Non-negative Matrix Factorization （NMF）， Singular Value Decomposition （SVD）， Low-rank and Sparse Matrix Completion （LSMC）， and Go Decomposition （GoDec）， the Recall values of the top 30 recommended results （R@30） of these models with mixed information are increased by 18.72， 7.43， 11.53， 14.62 percentage points and 20.58， 2.11， 7.91， 5.01 percentage points， respectively， on the two datasets compared with those of the original models.
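
The construction of the mixed information matrix can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed for the paper-paper similarity, the paper embeddings are taken as given (no SPECTER call is made), the multiplication order `mixed @ paper_author` is a guess at the orientation, and all function names are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def enhance_paper_author(embeddings, citation, paper_author):
    """Inject content similarity and citation links into the paper-author matrix.

    embeddings:   (n_papers, dim) paper representations
    citation:     (n_papers, n_papers) citation matrix
    paper_author: (n_papers, n_authors) interaction matrix
    """
    mixed = cosine_similarity_matrix(embeddings) + np.asarray(citation, dtype=float)
    return mixed @ np.asarray(paper_author, dtype=float)
```

In this sketch a paper with no author interactions still receives non-zero scores through papers it resembles or cites, which is one way to read the abstract's claim about alleviating cold start before factorization is applied.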

Comparability assessment and comparative citation generation method for scientific papers

Xiangyu LI, Jingqiang CHEN

2025, 45(6): 1888-1894. DOI: 10.11772/j.issn.1001-9081.2024060898

To address the two major challenges in comparative citation generation， namely determining the comparability between papers accurately and generating comparative sentences， a Comparability Assessment （CA） and comparative citation generation method for scientific papers， named SciCACG （Scientific Comparability Assessment and Citation Generation）， was proposed. Three core modules were constructed in the proposed method： a CA module， which was used to determine whether two papers were comparable； a Comparison object Extraction （CE） module， which was employed to extract specific comparison objects from the papers and references； and a comparative citation generation module， which was responsible for generating the corresponding comparative citation sentences. Firstly， the SciBERT （Scientific BERT） model was used to process the two input papers， and the comparability was assessed through the CA module. Then， for papers determined to be comparable， the CE module was used to identify and extract key comparison objects. Finally， the comparative citation generation module was utilized to generate comparative citations containing these objects. Experimental results show that in the CA stage， the proposed method achieves 0.532 in Mean Reciprocal Rank （MRR） and 0.731 in Recall@10 （R@10）， and outperforms the previous SciBERT-FNN （Scientific Bidirectional Encoder Representations from Transformers-Feedforward Neural Network） method on all the datasets； in the comparative citation generation stage， compared to the suboptimal BART-Large （Bidirectional and Auto-Regressive Transformers-Large） method， the F1 scores of ROUGE （Recall-Oriented Understudy for Gisting Evaluation）-1， ROUGE-2， and ROUGE-L of the proposed method are increased by 1.90， 1.29， and 2.55 percentage points， respectively. Additionally， the results confirm that the technologies of automated comparison and analysis of scientific literature are crucial for citation sentence generation tasks； in particular， these technologies demonstrate substantial practical value in enhancing the traceability of comparative information and ensuring the comprehensiveness of citation sentences.
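
As a reminder of how the MRR figure reported for the CA stage is typically computed, here is a minimal sketch. It assumes each query paper has a ranked candidate list and a set of truly comparable papers; the function name and input layout are illustrative assumptions.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 if none is found)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```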

Correlation power analysis method of advanced encryption standard algorithm based on uniform manifold approximation and projection

Runlian ZHANG, Ruifeng TANG, Hao WANG, Xiaonian WU

2025, 45(6): 1895-1901. DOI: 10.11772/j.issn.1001-9081.2024060867

The efficiency of Side Channel Attack （SCA） and the accuracy of key recovery are greatly reduced by the high noise and high dimensionality of energy trace data collected in SCA. To solve these problems， a Correlation Power Analysis （CPA） method of Advanced Encryption Standard （AES） algorithm based on Uniform Manifold Approximation and Projection （UMAP） was proposed. In the proposed method， Euclidean distance was used as a basis to calculate the set of proximate points of energy traces. Firstly， in order to capture position relationships of the energy trace data and preserve local structural features of the data， a weighted adjacency matrix was obtained by constructing an adjacency graph and calculating the similarity among proximate nodes. Then， structure relationships of the adjacency graph were described using the Laplacian matrix， and the eigenvectors with small eigenvalues were extracted by eigendecomposition as the initial low-dimensional data. Meanwhile， in order to preserve global structural features of the data， the binary cross-entropy was used as the optimization function to adjust positions of the data in the low-dimensional space. Furthermore， in order to improve the computational efficiency， the force-directed graph layout algorithm was adopted in the gradient descent process. Finally， correlation power attacks were performed on the dimension-reduced data to recover the key. Experimental results show that UMAP method can preserve local and global structural features of the original energy trace data effectively； the proposed method can improve the correlation between energy trace data and assumed power leakage models， and reduce the number of energy traces required for key recovery. Specifically， with the proposed method， the number of energy traces required to recover a single key byte is 180， and the number of energy traces required to recover all 16 key bytes is 700； compared to the ISOmetric MAPping （ISOMAP） dimension reduction method， the proposed method reduces the number of energy traces required to recover all key bytes by 36.4%.
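
The graph-Laplacian initialization step can be sketched as below. This is a simplified illustration, not the paper's procedure: it assumes a k-nearest-neighbour graph with an exponential distance kernel, an unnormalized Laplacian, and a connected graph (so that exactly one trivial eigenvector is skipped). The function name and parameters are hypothetical, and the dense distance computation is only suitable for small trace sets.

```python
import numpy as np

def spectral_init(traces, n_neighbors=5, n_components=2):
    """Low-dimensional initial coordinates from the graph Laplacian of a kNN graph."""
    traces = np.asarray(traces, dtype=float)
    n = traces.shape[0]
    # Pairwise Euclidean distances (dense; fine for a small number of traces).
    diff = traces[:, None, :] - traces[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # k nearest neighbours of each trace, excluding the trace itself.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    knn = np.argsort(masked, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = knn.ravel()
    # Similarity weights on kNN edges (assumed kernel), then symmetrize.
    sigma = dist[rows, cols].mean() + 1e-12
    weights = np.zeros((n, n))
    weights[rows, cols] = np.exp(-dist[rows, cols] / sigma)
    weights = np.maximum(weights, weights.T)
    # Unnormalized Laplacian L = D - W and its eigendecomposition.
    laplacian = np.diag(weights.sum(axis=1)) - weights
    _, eigvecs = np.linalg.eigh(laplacian)
    # eigh sorts eigenvalues ascending; skip the constant eigenvector.
    return eigvecs[:, 1:n_components + 1]
```

The returned coordinates would then serve as the starting point that the cross-entropy optimization described in the abstract refines.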

Vehicular digital evidence preservation and access control based on consortium blockchain

Xin SHAO, Zigang CHEN, Xingchun YANG, Haihua ZHU, Wenjun LUO, Long CHEN, Yousheng ZHOU

2025, 45(6): 1902-1910. DOI: 10.11772/j.issn.1001-9081.2024030263

In today’s society， frequent vehicle traffic accidents remain a serious practical problem. In order to ensure the trusted preservation and legal use of vehicle digital evidence， it is necessary to adopt advanced security technologies and strict access control mechanisms. Aiming at the preservation and sharing requirements of digital evidence on vehicle devices， an evidence preservation and access control scheme based on consortium blockchain was proposed. Firstly， based on consortium blockchain technology and InterPlanetary File System （IPFS）， on-chain and off-chain storage of the digital evidence was realized， while confidentiality of the evidence was guaranteed by symmetric key and integrity of the evidence was verified by hash value. Secondly， in the process of uploading， managing and downloading the digital evidence， an access control mechanism combining attributes and roles was introduced to realize fine-grained and dynamic access control management， thereby ensuring legal access and sharing of the evidence. Finally， comparison and performance analysis of the schemes were conducted. Experimental results show that the proposed scheme provides confidentiality， integrity and non-repudiation， and remains stable under a large number of concurrent requests.
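
The hash-based integrity check mentioned above can be illustrated with a minimal sketch. SHA-256 is an assumption (the abstract does not name the hash function), the function names are hypothetical, and neither the blockchain nor IPFS interaction is modelled: the digest simply stands in for the value recorded on-chain.

```python
import hashlib
import hmac

def evidence_digest(evidence_bytes):
    """Digest to be recorded on-chain when the evidence is uploaded."""
    return hashlib.sha256(evidence_bytes).hexdigest()

def verify_evidence(retrieved_bytes, on_chain_digest):
    """Check that evidence fetched from off-chain storage matches the on-chain digest."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(evidence_digest(retrieved_bytes), on_chain_digest)
```

Any modification of the stored evidence, even a single byte, changes the digest and causes verification to fail.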

Portable executable malware static detection model based on shallow artificial neural network

Tianchen HUA, Xiaoning MA, Hui ZHI

2025, 45(6): 1911-1921. DOI: 10.11772/j.issn.1001-9081.2024060857

In order to address the imbalance or incompleteness issues of the datasets in Portable Executable （PE） malware detection methods based on deep learning， as well as the increased computing resource overhead and time consumption caused by overly deep neural network structures or large feature sets， a PE malware static detection model based on Shallow Artificial Neural Network （SANN） was proposed. Firstly， LIEF （Library to Instrument Executable Formats） library was used to create a PE feature extractor to extract PE file samples from EMBER dataset， and a feature combination was proposed. This feature set contained fewer PE features， which reduced the feature space and parameters while improving performance of the deep learning model. Secondly， after generating feature vectors， the unlabeled samples were removed through data cleaning. Thirdly， different feature values in the feature set were normalized. Finally， the feature vectors were input into SANN for training and testing. Experimental results show that SANN can achieve a recall of 95.64% and an accuracy of 95.24%. Compared to the MalConv model and LightGBM model， the accuracy of SANN is increased by 1.19 and 1.57 percentage points， respectively. The total working time of SANN is about half that of LightGBM， the fastest of the comparison models. Besides， when facing unknown attacks， SANN is flexible and can still maintain a high level of detection.

Review of checkpoint technology for multiple computing scenarios

Xiaolin CHEN, Yaqiang ZHANG, Hongzhi SHI

2025, 45(6): 1922-1933. DOI: 10.11772/j.issn.1001-9081.2024050697

Checkpoint technology is a method of saving the current computing task and system state in a computing system in order to roll back the system to the previously saved state when needed. It is commonly used in multiple scenarios such as system failure recovery， job migration， and job preemption. With the development of technology， there are more computing scenarios， larger computing scales， more complex structural hierarchy of computing systems， and more variable computing environments， which increase the probability of failure occurrence. At the same time， the Mean Time Between Failures （MTBF） is reduced from the range of ［6.50 h， 40.00 h］ to 1.25 h. Therefore， checkpoint technology is becoming increasingly critical as a commonly used fault-tolerant method. Firstly， the development overview of checkpoint technology was introduced， and the existing checkpoint technologies were classified based on their technical characteristics. Then， the latest research progress was reviewed in four directions： incremental checkpoint， multi-level asynchronous checkpoint， optimal checkpoint interval， and fault perception-based checkpoint. After that， the current dynamic， intelligent， and proactive trends in checkpoint technology， as well as the challenges faced by this technology， were summarized. Finally， main ideas and latest methods of optimizing checkpoint strategies were sorted out to help researchers grasp checkpoint technology’s current development status and future development trends quickly.
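
To make the "optimal checkpoint interval" direction concrete, the sketch below uses Young's classical first-order approximation, interval ≈ sqrt(2 × checkpoint cost × MTBF). This formula is a well-known baseline and is not claimed to be one of the methods surveyed in the review; the function name and the one-minute checkpoint cost in the usage note are assumptions.

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """First-order optimal time between checkpoints (same unit as the inputs).

    checkpoint_cost: time needed to write one checkpoint
    mtbf:            mean time between failures
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

With the 1.25 h MTBF quoted above and a hypothetical checkpoint cost of one minute, `young_interval(1 / 60, 1.25)` gives roughly 0.2 h, i.e. a checkpoint about every 12 minutes. The approximation assumes the checkpoint cost is small relative to the MTBF.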

Online matching algorithm for supporting periodic tasks in spatial crowdsourcing

Junling LIU, Meng SUN, Huanliang SUN, Jingke XU

2025, 45(6): 1934-1944. DOI: 10.11772/j.issn.1001-9081.2024060747

For a type of repetitive periodic tasks with fixed demand in spatial crowdsourcing， the existing matching algorithms ignore the familiarity required for periodic tasks. Therefore， an online matching algorithm for supporting periodic tasks in spatial crowdsourcing was proposed. Firstly， the online matching problem was regarded as a multiplayer game， with tasks considered as independent participants in the game， and utility functions of the players were determined based on tasks’ preference for workers with high familiarity and workers’ preference for tasks with high rewards and short distances， and were then analyzed using Game Theory （GT）. Then， a Simulated Annealing （SA） strategy was introduced into the updating strategy of GT， resulting in the design of GT algorithm based on SA strategy. Finally， a matching with greater total utility was achieved upon reaching a Nash equilibrium. Experimental results on real datasets demonstrate that the matching of the proposed GT algorithm has the highest total utility compared with the existing related algorithms. It can be seen that the proposed GT algorithm achieves matching results that better meet the needs of periodic tasks and workers， which can enhance user satisfaction on online spatial crowdsourcing platforms.

Port operation scheduling algorithm based on enhanced NSGA-Ⅱ under goals of carbon peaking and carbon neutrality

Shudong LIU, Hao WU, Jia CONG, Boyu GU

2025, 45(6): 1945-1953. DOI: 10.11772/j.issn.1001-9081.2024060757

With the increasingly serious problem of global climate change， the goals of carbon peaking and carbon neutrality have been established in China. As logistics hubs and cargo distribution centers， ports face a prominent carbon emission problem. Aiming at optimization problem of port operation scheduling， considering the key factors such as ship arrival time， cargo handling demand， quay crane operation capacity， and carbon emission cost， an optimization model of port operation scheduling was constructed for minimizing both carbon emission cost and terminal operating expense， and a port operation scheduling algorithm based on Enhanced NSGA-Ⅱ （Non-dominated Sorting Genetic Algorithm Ⅱ）（E-NSGA-Ⅱ） under the goals of carbon peaking and carbon neutrality was proposed. Firstly， the coding strategy， population initialization method and crossover and mutation operations of the algorithm were adjusted. Secondly， gene repair operators for infeasible solutions were designed， and adaptive crossover and mutation probability mechanisms were introduced. Experimental results show that compared with FCFS （First Come First Served） scheduling algorithm， the proposed algorithm reduces the total cost of the model by 7.9%， the carbon emission cost by 19.7%， and the terminal operating expense by 6.5%. The above research results enrich the multi-objective optimization algorithm and port operation scheduling theory， and provide strong support for port enterprises to achieve green scheduling， reduce operating cost， and improve economic benefits.
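
The bi-objective setting (carbon emission cost and terminal operating expense， both minimized) rests on Pareto dominance， the core notion behind non-dominated sorting. The sketch below extracts only the first non-dominated front with a simple quadratic scan; it is not the enhanced algorithm from the paper， and the function names and example cost values are hypothetical.

```python
def dominates(a, b):
    """True if cost vector `a` is no worse than `b` everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_front(costs):
    """Indices of schedules whose cost vectors are not dominated by any other."""
    return [
        i
        for i, candidate in enumerate(costs)
        if not any(
            dominates(other, candidate)
            for j, other in enumerate(costs)
            if j != i
        )
    ]
```

For example， with `(carbon cost, operating expense)` pairs `[(10, 5), (8, 7), (9, 9), (12, 4)]`， the third schedule is dominated by the second， so the front consists of the other three.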

Enhanced evolutionary algorithm for multi-factor flexible job shop green scheduling

Jianhua WANG, Chuanyu WU, Liping XU

2025, 45(6): 1954-1962. DOI: 10.11772/j.issn.1001-9081.2024050727

For Multi-factor Flexible Job shop Green Scheduling Problem with Setup and Transportation time constraints and Variable machine processing Speed （MFJGSP-STVS）， a mathematical model with completion time and energy consumption as optimization objectives was constructed， and an Enhanced Multi-objective Evolutionary Algorithm （EMoEA） was proposed to solve the problem. In the algorithm， a three-layer integer encoding method was adopted， Machine Idle time Preference （MIP） rule and Turning On/Off strategy （TOF） were applied in the decoding to optimize the objectives， and heuristic rules such as Global Search （GS） were employed to generate the initial population； a cluster crossover approach was designed on the basis of non-dominated hierarchy idea， so as to accelerate the algorithm’s convergence； to prevent the algorithm from converging prematurely and falling into the local optimum， a derivation strategy was proposed to diffuse the non-dominated solution set， and an adaptive local search strategy based on critical path was designed to further enhance the exploration capability of the algorithm in solution space. Simulation results show that each design in EMoEA has better Hypervolume （HV） and Inverted Generational Distance （IGD） metrics compared to the original multi-objective evolutionary algorithm， and compared to Non-dominated Sorting Genetic Algorithm Ⅱ （NSGA-Ⅱ） and Hybrid Jaya （HJaya） algorithm， EMoEA achieves advantages in both HV and IGD metrics with faster convergence and the optimal objective value on most instances. It can be seen that EMoEA has better performance， and EMoEA can solve MFJGSP-STVS effectively， providing high-quality scheduling schemes for enterprises.
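
Of the two quality indicators named above， IGD is the simpler to state: it averages， over the points of a reference Pareto front， the distance to the nearest obtained solution (lower is better). A minimal sketch follows; the function name is illustrative， and how the reference front is built is not specified in the abstract.

```python
import numpy as np

def igd(reference_front, obtained_front):
    """Inverted Generational Distance between a reference front and an obtained front."""
    ref = np.asarray(reference_front, dtype=float)
    obt = np.asarray(obtained_front, dtype=float)
    # Distance from every reference point to every obtained point.
    dists = np.linalg.norm(ref[:, None, :] - obt[None, :, :], axis=-1)
    # Nearest obtained point per reference point, averaged over the reference front.
    return float(dists.min(axis=1).mean())
```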

3D domain parameterization method based on high-dimensional quasi-conformal mapping

Yuanyuan SONG, Maodong PAN

2025, 45(6): 1963-1970. DOI: 10.11772/j.issn.1001-9081.2024060850

Aiming at the high-quality parameterization problem of constructing a complex Three-Dimensional （3D） computational domain with given boundary conditions in isogeometric analysis， a 3D domain parameterization method based on high-dimensional quasi-conformal mapping was proposed. The core of the proposed method is to establish a nonlinear optimization model that describes the bijectivity， angular distortion， and volume distortion of the mapping simultaneously. Firstly， the high-dimensional quasi-conformal mapping theory was used to derive a new formula for measuring angular distortion in 3D space. Then， exponential variable and volume constant were introduced into the optimization model， and geometrical meaning of the Jacobi matrix was exploited to achieve the goal of adding volume distortion while preserving mapping bijectivity. Finally， Alternating Direction Method of Multipliers （ADMM） framework was combined with L-BFGS （Limited-memory Broyden-Fletcher-Goldfarb-Shanno） method to decompose the original problem into tractable subproblems， which were solved alternately. Experimental results show that the proposed method guarantees global bijectivity on the experimental model； the proposed method has the orthogonality increased by about 5.8% compared to ADMM-LRP （ADMM algorithm for Low-Rank Parameterization）， and has the volume uniformity improved by about 34.4% compared to TTS （Tet-To-Spline optimization strategy）. It can be seen that the proposed method achieves high-quality parameterization， ensures bijectivity of mapping， and reduces angular distortion as well as volume distortion.

Low-light image enhancement network combining signal-to-noise ratio guided dual-branch structure and histogram equalization

Ying HUANG, Shengmei GAO, Guang CHEN, Su LIU

2025, 45(6): 1971-1979. DOI: 10.11772/j.issn.1001-9081.2024060762

Aiming at the problem that deep learning-based Low-Light Image Enhancement （LLIE） techniques generally rely on paired datasets for training， considering the difficulty of acquiring paired datasets in practical applications and its possible limitation of network generalization ability， an LLIE network combining Signal-to-Noise Ratio （SNR） guided dual-branch structure and Histogram Equalization （HE） was proposed to get rid of the dependence on paired datasets. Firstly， based on the Generative Adversarial Network （GAN） framework， a dual-branch structure of Convolutional Neural Network （CNN） and Transformer was introduced， and SNR images were used to guide the network to enhance different regions of the image adaptively， thereby obtaining a balance between image enhancement and noise suppression effectively. Then， HE-processed low-light images were adopted to constrain the generation results， thereby enhancing texture details in the generated images significantly. Finally， in the discriminator part， global and local discriminators were combined to ensure distributional consistency between the generated and reference images， thereby further improving visual quality of the image. To test the effectiveness of the proposed network， evaluations were conducted on LOL and LSRW test sets， and the proposed network was compared with 10 state-of-the-art methods， including both supervised and unsupervised methods. Experimental results show that on LOL dataset， the proposed network ranks second in both Peak Signal-to-Noise Ratio （PSNR）（19.15 dB） and Structural Similarity Index （SSIM）（0.7051）； on LSRW dataset， the proposed network ranks first in PSNR （17.28 dB） and second in SSIM （0.4857）. In particular， the PSNR on LSRW dataset of the proposed network is improved by 15.7% and 9.6%， respectively， compared to those of KinD （Kindling the Darkness） and EnlightenGAN （deep light Enhancement without paired supervision Generative Adversarial Network） methods. It can be seen that the proposed network outperforms unsupervised methods and some supervised methods， and improves quality of the generated images significantly.
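
The PSNR values quoted above follow the standard definition， 10·log10(MAX² / MSE)， computed between the enhanced image and its reference. A minimal sketch is given below; the 8-bit peak value of 255 and the function name are assumptions， and the paper's exact evaluation settings (colour space， cropping) are not stated in the abstract.

```python
import numpy as np

def psnr(reference, enhanced, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of identical shape."""
    reference = np.asarray(reference, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    mse = np.mean((reference - enhanced) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```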

Point cloud classification and segmentation method based on adaptive dynamic graph convolution and parameter-free attention

Weigang LI, Xinyi LI, Yongqiang WANG, Yuntao ZHAO

2025, 45(6): 1980-1986. DOI: 10.11772/j.issn.1001-9081.2024060878

To address the challenges of traditional convolution in extracting neighborhood feature information accurately and integrating contextual information effectively in point cloud processing， a point cloud classification and segmentation method based on adaptive dynamic graph convolution and parameter-free attention was proposed. Firstly， the Adaptive Dynamic Graph Convolution module （ADGC） was used to learn feature information of different neighborhoods， generate the adaptive convolution kernels， and update the edge features， thereby extracting local neighborhood features of the point cloud accurately. Then， a residual structure was designed to learn spatial position information of the point cloud， so as to capture geometric structure between the point pairs accurately， and better retain and extract the detailed features. Finally， in order to better attend to and extract the local geometric features， the Parameter-Free Attention module （PFA） was combined with convolution operation to enhance connection among the neighbors and improve context-aware ability of the model. Experimental results show that compared to PointNet， the proposed method has significant advantages across various tasks. Specifically， the proposed method has an increase of 4.6 percentage points in Overall Accuracy （OA） for classification tasks， an increase of 2.3 percentage points in mean Intersection over Union （mIoU） for part segmentation tasks， and an increase of 24.6 percentage points in mIoU for semantic segmentation tasks. It can be seen that the proposed method further improves the understanding and representation abilities of complex geometries， resulting in more accurate feature extraction and better experimental performance in a variety of tasks.
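
For reference， mIoU over per-point labels can be computed as in the sketch below. It averages the IoU of every class that appears in the prediction or the ground truth over a flat label array; benchmark protocols (for example， per-shape averaging in part segmentation) differ in detail， and the function name is an illustrative assumption.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both; skip rather than count as 0 or 1
        ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```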

Tiny defect detection algorithm for bearing surface based on RT-DETR

Dehui ZHOU, Jun ZHAO, Jinfeng CHENG

2025, 45(6): 1987-1997. DOI: 10.11772/j.issn.1001-9081.2024050691

Surface defects of bearings have a significant impact on the performance and stability of electromechanical equipment. Aiming at the low recognition accuracy for small targets and the low detection speed of current bearing surface defect detection, a tiny defect detection algorithm for bearing surfaces based on RT-DETR (Real-Time DEtection TRansformer), named FECS-DETR (Faster Expand and Cross hierarchical-scaled feature Screening DETR), was proposed. Firstly, a lightweight FasterNet-T1 was employed to reconstruct the backbone network of RT-DETR and reduce computational overhead. Secondly, an Attention-embedded Expand Residual Fusion (AERF) module was designed for deep feature extraction, thereby enhancing the description capability for small-scale abstract features. Thirdly, Cascaded Group Attention (CGA) was applied to further reduce computational redundancy and improve the operational efficiency of the model. Fourthly, a Cross hierarchical-scaled Information Screening Feature Pyramid Network (CIS-FPN) was proposed to address information loss during feature fusion and enhance feature integration capability. Finally, a joint regression loss optimization strategy combining Normalized Wasserstein Distance (NWD) and an improved Inner-MPDIoU was employed to accelerate convergence and improve model accuracy for small-scale targets. Experimental results show that, on the bearing surface tiny defect dataset and compared with the original RT-DETR algorithm, FECS-DETR improves the mean Average Precision (mAP) by 2.5 percentage points, reduces the computational complexity by 28.8%, and increases the detection speed by 20.8%. These results indicate that the proposed algorithm balances accuracy and real-time performance and satisfies the requirements for detecting tiny bearing surface defects in industrial environments.
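
As a rough illustration of why NWD helps with small targets, the sketch below computes the Normalized Wasserstein Distance between two boxes as it is usually defined in the tiny-object detection literature: each box (cx, cy, w, h) is modeled as a 2D Gaussian, and the squared 2-Wasserstein distance reduces to a squared Euclidean distance between (cx, cy, w/2, h/2) vectors. The constant `c`, the function names, and the `1 - NWD` loss form are assumptions; the abstract does not say how FECS-DETR weights NWD against the improved Inner-MPDIoU.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    c is a dataset-dependent normalizing constant; 12.8 is only a placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """A common (assumed) regression loss form: 1 - NWD."""
    return 1.0 - nwd(box_a, box_b, c)


# Two tiny, non-overlapping boxes: IoU would be 0, NWD still varies smoothly.
print(nwd((10, 10, 4, 4), (15, 10, 4, 4)))
```

Unlike IoU, this similarity stays nonzero and changes smoothly when two small boxes do not overlap, which is the property that makes it attractive for tiny defects.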

Information bottleneck-guided intracranial hemorrhage segmentation method

Jie JIANG, Gongning LUO, Suyu DONG, Fanding LI, Xiangyu LI, Qince LI, Yongfeng YUAN, Kuanquan WANG

2025, 45(6): 1998-2006. DOI: 10.11772/j.issn.1001-9081.2024060855

In the field of computer-aided diagnosis, accurate segmentation of IntraCranial Hemorrhages (ICHs) in Computed Tomography (CT) images is crucial for subsequent treatment and prognosis. To address the challenge of segmenting small hemorrhage regions, an information bottleneck-guided ICH segmentation method was proposed, and an Information Bottleneck-Guided Segmentation Network (IBGS-Net) was built on it. Firstly, the U-Net architecture was used as the base, and an information bottleneck layer was introduced to enhance the recognition of key features related to ICH segmentation. Then, through the designed Residual SPatially-ADaptivE normalization (ResSPADE) module, the Information Activation Map (IAM) was integrated effectively into the segmentation process, thereby improving the ability of the network to identify and locate hemorrhage regions. Finally, an Interactive Guided Loss (IGL) function was introduced to optimize the handling of difficult-to-segment regions, thereby further enhancing the generalization performance of the model. Evaluation results on the internal dataset indicate that the proposed method achieves 78.1% in Dice Similarity Coefficient (DSC), 90.1% in Normalized Surface Dice (NSD), and 11.5% in Relative Volume Difference (RVD). On the public dataset INSTANCE 2022, comparison with other segmentation methods shows that, relative to the second-best results, the proposed method increases DSC and NSD by 1.9 and 2.4 percentage points, respectively, and decreases RVD by 3.2 percentage points. These results validate the effectiveness and superiority of the proposed method in ICH segmentation tasks and indicate that it is suitable for assisting clinicians in segmenting ICHs.
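
For readers unfamiliar with the reported metrics, here is a minimal sketch of DSC and RVD on binary masks flattened to lists of 0/1 values. The function names are hypothetical, and RVD is taken here as the absolute volume difference divided by the ground-truth volume, which is one common convention; the paper may use a signed variant.

```python
def dice(pred, target):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * intersection / total if total else 1.0  # two empty masks agree


def relative_volume_difference(pred, target):
    """|V_pred - V_gt| / V_gt, with volume counted in foreground voxels."""
    gt_volume = sum(target)
    if gt_volume == 0:
        raise ValueError("ground-truth mask is empty; RVD is undefined")
    return abs(sum(pred) - gt_volume) / gt_volume


# Toy five-voxel masks (illustrative values only).
pred = [1, 1, 1, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice(pred, target), relative_volume_difference(pred, target))
```

Higher DSC is better while lower RVD is better, which is why the abstract reports an increase for the former and a decrease for the latter.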

Segmentation network of coronary artery structure from CT angiography images based on multi-scale spatial features

Yingtao CHEN, Kangkang FANG, Jin’ao ZHANG, Haoran LIANG, Huanbin GUO, Zhaowen QIU

2025, 45(6): 2007-2015. DOI: 10.11772/j.issn.1001-9081.2024060853

Owing to the complex morphological structure of the coronary artery and the variations in acquisition conditions of Computed Tomography (CT) Angiography (CTA) images, image quality issues such as uneven gray-scale distribution, motion artifacts and noise result in missed judgments and misjudgments in the segmentation of coronary artery structure. Therefore, a segmentation network of coronary artery structure from CTA images based on multi-scale spatial features, named Three-Dimensional (3D) Multi-Scale Parallel Net (MSP-Net), was proposed. Firstly, in view of the large spatial span and small local proportion of the coronary artery, a multi-scale parallel fusion network was used to extract global features and local features from coronary artery CTA images separately and then fuse them, ensuring complete extraction of coronary artery structure features. Secondly, a coarse-to-fine idea was adopted in coronary artery reconstruction to enhance the redundancy of image features and thereby ensure clear coronary artery boundaries; the coronary artery structure was then reconstructed by fusing features of different scales to improve the accuracy of the segmentation results, thereby reducing missed judgments and misjudgments. Finally, to accelerate network training, a deep supervision strategy was adopted, applying supervision signals at different network depths to improve training efficiency. Experimental results show that, in the automatic coronary artery segmentation task, the average Dice Similarity Coefficient (DSC) of the proposed network reaches 87.16%, which is 4.04 and 2.31 percentage points higher than those of nnU-Net and Swin UNETR (Swin UNEt TRansformers), respectively, and the average 95% Hausdorff Distance (HD95) of the proposed network reaches 3.69 mm, which is 14.43 mm and 13.75 mm lower than those of nnU-Net and Swin UNETR, respectively. These results indicate that the proposed network can improve the segmentation accuracy of coronary artery structure effectively and help clinicians understand the coronary artery structure of patients more accurately, so as to evaluate the disease more effectively.
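
The HD95 metric quoted above can be sketched as follows. This is a minimal illustration that assumes the surface points of the predicted and reference segmentations have already been extracted as coordinate tuples in millimetres. It takes the larger of the two directed 95th-percentile distances, which is one common convention; toolkits differ in how they pool the two directions and interpolate the percentile, so the function names and these choices are assumptions.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def directed_distances(points_a, points_b):
    """For each point in A, the distance to its nearest point in B."""
    return [min(math.dist(p, q) for q in points_b) for p in points_a]


def hd95(points_a, points_b):
    """95% Hausdorff Distance between two non-empty point sets."""
    return max(
        percentile(directed_distances(points_a, points_b), 95),
        percentile(directed_distances(points_b, points_a), 95),
    )


# Toy 2D point sets (illustrative values only).
print(hd95([(0, 0), (1, 0), (2, 0)], [(0, 0)]))
```

Using the 95th percentile instead of the maximum makes the metric less sensitive to a few outlier surface points, which matters for thin, branching structures such as coronary arteries.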

Wireless capsule endoscopy image classification model based on improved ConvNeXt

Xiang WANG, Qianqian CUI, Xiaoming ZHANG, Jianchao WANG, Zhenzhou WANG, Jialin SONG

2025, 45(6): 2016-2024. DOI: 10.11772/j.issn.1001-9081.2024060806

Aiming at the problem that Wireless Capsule Endoscopy (WCE) image classification models target only a single disease or are limited to a specific organ, and are therefore difficult to adapt to clinical needs, a WCE image classification model based on improved ConvNeXt-T (ConvNeXt Tiny) was proposed. Firstly, a Simple parameter-free Attention Module (SimAM) was introduced into the feature extraction process to make the model focus on the key areas of WCE images, so as to capture detailed features such as the boundaries and textures of lesion areas accurately. Secondly, a Global Context Multi-scale Feature Fusion (GC-MFF) module was designed. In this module, the global context modeling capability of the model was first optimized through a Global Context Block (GC Block), and then the shallow and deep multi-scale features were fused to obtain WCE image features with stronger representation ability. Finally, the Cross Entropy (CE) loss function was optimized to address the problem of large intra-class differences among WCE images. Experimental results on a WCE dataset show that the proposed model increases the accuracy and F1 value by 2.96 and 3.16 percentage points, respectively, compared with the original ConvNeXt-T model; compared with the Swin-B (Swin Transformer Base) model, which has the best performance among mainstream classification models, the proposed model reduces the number of parameters by 67.4% and increases the accuracy and F1 value by 0.51 and 0.67 percentage points, respectively. These results indicate that the proposed model has better classification performance and can effectively assist doctors in making accurate diagnoses of digestive tract diseases.
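
To show what "parameter-free" means for SimAM, the sketch below applies the standard SimAM weighting to a single feature-map channel stored as a nested list. It follows the commonly published formulation (inverse energy from the squared deviation of each activation, then a sigmoid gate); the helper names and the toy input are assumptions, and how the paper places SimAM inside ConvNeXt-T is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simam_weights(channel, lam=1e-4):
    """Per-position attention weights for one H x W channel (needs >= 2 values)."""
    flat = [v for row in channel for v in row]
    n = len(flat) - 1
    mean = sum(flat) / len(flat)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    # Inverse energy: activations far from the channel mean score higher.
    return [[sigmoid(d / (4 * (variance + lam)) + 0.5) for d in row] for row in sq_dev]


def simam(channel, lam=1e-4):
    """Reweight each activation by its SimAM weight; no learnable parameters."""
    weights = simam_weights(channel, lam)
    return [[v * w for v, w in zip(rv, rw)] for rv, rw in zip(channel, weights)]


# The outlier activation (4.0) receives the largest weight.
print(simam_weights([[0.0, 0.0], [0.0, 4.0]]))
```

The only hyperparameter is the regularizer `lam`, so the module adds attention without adding trainable weights.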

Single-channel speech separation model based on auditory modulation Siamese network

Yuan SONG, Xin CHEN, Yarong LI, Yongwei LI, Yang LIU, Zhen ZHAO

2025, 45(6): 2025-2033. DOI: 10.11772/j.issn.1001-9081.2024050724

To address the poor separation performance caused by overlapping time-frequency points among different speakers in single-channel speech separation methods that take spectrogram features as input, a single-channel speech separation model based on an auditory modulation Siamese network was proposed. Firstly, the modulation signals were computed through frequency band division and envelope demodulation, and the modulation amplitude spectrum was extracted using the Fourier transform. Secondly, the mapping relationship between the modulation amplitude spectrum features and speech segments was obtained using a mutation point detection and matching method to achieve effective segmentation of speech segments. Thirdly, a Fusion of Co-attention Mechanisms in Siamese Neural Network (FCMSNN) was designed to extract discriminative features of the speech segments of different speakers. Fourthly, a Neighborhood-based Self-Organizing Map (N-SOM) network was proposed to perform feature clustering without pre-specifying the number of speakers by defining a dynamic neighborhood range, so as to obtain mask matrices for different speakers. Finally, to avoid artifacts in signals reconstructed in the modulation domain, a time-domain filter was designed to convert modulation-domain masks into time-domain masks and reconstruct speech signals by combining phase information. The experimental results show that the proposed model outperforms the Double-Density Dual-Tree Complex Wavelet Transform (DDDTCWT) method in terms of Perceptual Evaluation of Speech Quality (PESQ), Signal-to-Distortion Ratio improvement (SDRi) and Scale-Invariant Signal-to-Distortion Ratio improvement (SI-SDRi): on the WSJ0-2mix dataset, PESQ, SDRi and SI-SDRi are improved by 3.47%, 6.91% and 7.79%, respectively, and on the WSJ0-3mix dataset by 3.08%, 6.71% and 7.51%, respectively.
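
The SI-SDRi metric named above can be sketched as follows. This is a minimal illustration using the usual definition of scale-invariant SDR on equal-length, zero-mean signals given as plain lists; the function names and the toy signals are assumptions and not taken from the paper.

```python
import math


def si_sdr(estimate, reference):
    """Scale-Invariant Signal-to-Distortion Ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy
    target = [scale * r for r in reference]          # projection onto the reference
    noise = [e - t for e, t in zip(estimate, target)]  # everything else is distortion
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0:
        return math.inf
    return 10 * math.log10(target_energy / noise_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


reference = [1.0, 0.0, -1.0, 0.0]
mixture = [1.0, 2.0, -1.0, 2.0]
estimate = [1.0, 1.0, -1.0, 1.0]
print(si_sdr_improvement(estimate, mixture, reference))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is what "scale-invariant" refers to.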

Automatic speech segmentation algorithm based on syllable type recognition

Linjia SUN, Lei QIN, Meijin KANG, Yinglin WANG

2025, 45(6): 2034-2042. DOI: 10.11772/j.issn.1001-9081.2024060748

Methods based on boundary detection segment speech data into syllable units by exploiting abrupt changes in the time and frequency domains rather than linguistic knowledge. Moreover, these methods achieve satisfactory segmentation results only when various parameters are set carefully, so they suffer from poor stability, difficult parameter tuning, and weak generalization in cross-language environments with large amounts of data. To address these issues, an automatic speech segmentation algorithm based on syllable type recognition was proposed. The distinguishing characteristic of the proposed algorithm is that it recognizes the syllable type in speech data rather than the specific syllable content. Firstly, the common syllable types of different languages under natural pronunciation were obtained from linguistic research findings and syllable composition patterns. Then, an acoustic model for each syllable type was established using the traditional Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). Moreover, to better describe syllable attributes, a feature extraction channel based on multi-band analysis and significant information fusion was proposed. Finally, based on the sequences of recognized syllable types, the Viterbi algorithm was used to determine the speech frames corresponding to the start and end points of syllables. In the experiments, the acoustic models of syllable types were trained on speech data from three common languages, and recognition experiments were then conducted on six languages and dialects. The experimental results show that the average recognition accuracy is over 91.93%; compared with using Mel Frequency Cepstral Coefficient (MFCC) features, the proposed features increase the average recognition accuracy by at least 27.16 percentage points; when the tolerance threshold is 20 ms, an average segmentation accuracy of over 90.70% can still be achieved in six languages and dialects; compared with four representative algorithms from recent years, the proposed algorithm improves the average segmentation accuracy by at least 5.73 percentage points. These results demonstrate that the proposed algorithm has stronger generalization ability, better stability and higher accuracy.
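
The final step, locating syllable boundaries with the Viterbi algorithm, can be illustrated with a toy sketch. It assumes the per-frame log-likelihoods under each state have already been computed (in the paper they would come from the GMM-HMM acoustic models), and it treats each state as one syllable type. The two-state model, the probabilities, and the function names are hypothetical.

```python
import math


def viterbi(frame_loglik, log_trans, log_init):
    """Most likely state sequence given frame_loglik[t][s], log_trans[p][s], log_init[s]."""
    n_states = len(log_init)
    score = [log_init[s] + frame_loglik[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(frame_loglik)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame_loglik[t][s])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which the decoded state changes (candidate boundaries)."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
frames = [[log(0.9), log(0.1)]] * 3 + [[log(0.1), log(0.9)]] * 3
path = viterbi(frames, trans, init)
print(path, boundaries(path))
```

Multiplying a boundary frame index by the frame shift gives its time position, which can then be compared with a reference boundary under a tolerance such as the 20 ms threshold mentioned in the abstract.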
