
Table of Contents

    10 August 2025, Volume 45 Issue 8
    National Open Distributed and Parallel Computing Conference 2024 (DPCS 2024)
    Review of research on efficiency of federated learning
    Lina GE, Mingyu WANG, Lei TIAN
    2025, 45(8):  2387-2398.  DOI: 10.11772/j.issn.1001-9081.2024081119

    Federated learning is a distributed machine learning framework that effectively addresses the data silo problem and is crucial for ensuring privacy protection for individuals and organizations. However, enhancing the efficiency of federated learning remains a pressing issue due to the high costs caused by the inherent characteristics of this learning paradigm. Therefore, a comprehensive summary and investigation of current mainstream research on improving the efficiency of federated learning was provided. Firstly, the background of efficient federated learning, including its origins and core ideas, was reviewed, and the concepts and classification of federated learning were explained. Secondly, the efficiency challenges arising in federated learning were discussed and categorized into heterogeneity problems, personalization problems, and communication cost issues. Thirdly, on this basis, solutions to these efficiency problems were analyzed and discussed in detail, and the research on efficiency of federated learning was categorized into two areas, model compression optimization methods and communication optimization methods, which were then investigated. Fourthly, through comparative analysis, the advantages and disadvantages of each federated learning method were summarized, and the challenges that still exist in efficient federated learning were expounded. Finally, future research directions in the field of efficient federated learning were given.

    Detection and defense scheme for backdoor attacks in federated learning
    Jintao SU, Lina GE, Liguang XIAO, Jing ZOU, Zhe WANG
    2025, 45(8):  2399-2408.  DOI: 10.11772/j.issn.1001-9081.2024081120

    Aiming at the malicious backdoor attacks commonly existing in Federated Learning (FL) systems, and the difficulty for the existing defense schemes to achieve a balance between privacy protection and high model training accuracy, the backdoor attacks and their defense methods in FL were explored, and a safe and efficient integrated scheme called GKFL (Generative Knowledge-based Federated Learning) was proposed to detect backdoor attacks and repair damaged models. In this scheme, there was no need to access the original private data of the participants: detection data were generated by the central server to detect whether the aggregation model in federated learning was backdoor attacked, and knowledge distillation technology was used to repair the damaged models, thereby ensuring the integrity and accuracy of the models. Experimental results on the MNIST and Fashion-MNIST datasets show that the overall performance of GKFL is better than that of classic schemes such as FoolsGold, GeoMed, and RFA (Robust Aggregation Algorithm), and that GKFL protects data privacy better than FoolsGold. It can be seen that the GKFL scheme is able to detect backdoor attacks and repair the damaged models, and significantly outperforms the comparison schemes in terms of model poisoning accuracy and main task accuracy.

    Evaluation of training efficiency and training performance of graph neural network models based on distributed environment
    Yinchuan TU, Yong GUO, Heng MAO, Yi REN, Jianfeng ZHANG, Bao LI
    2025, 45(8):  2409-2420.  DOI: 10.11772/j.issn.1001-9081.2024081140

    With the rapid growth of graph data sizes, Graph Neural Network (GNN) faces computational and storage challenges in processing large-scale graph-structured data. Traditional stand-alone training methods are no longer sufficient to cope with increasingly large datasets and complex GNN models. Distributed training is an effective way to address these problems due to its parallel computing power and scalability. However, on the one hand, the existing distributed GNN training evaluations mainly focus on performance metrics represented by model accuracy and efficiency metrics represented by training time, while paying less attention to metrics of data processing efficiency and computational resource utilization; on the other hand, algorithm efficiency evaluation mainly targets single-machine single-card or single-machine multi-card scenarios, and the existing evaluation methods remain relatively simple for distributed environments. To address these shortcomings, an evaluation method for model training in distributed scenarios was proposed, which covers three aspects: evaluation metrics, datasets, and models. Three representative GNN models were selected according to the evaluation method, and distributed training experiments were conducted on four large open graph datasets with different data characteristics to collect and analyze the obtained evaluation metrics. Experimental results show that model complexity, training time, computing node throughput, and computing Node Average Throughput Ratio (NATR) are all influenced by model architecture and data structure characteristics in distributed training; sample processing and data copying take up a large proportion of training time, and the time a computing node spends waiting for other computing nodes cannot be ignored either; compared with stand-alone training, distributed training reduces computing node throughput significantly, and further optimization of resource utilization for distributed systems is needed. It can be seen that the proposed evaluation method provides a reference for optimizing the performance of GNN model training in a distributed environment, and establishes an experimental foundation for further model optimization and algorithm improvement.
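    As an illustration of the kind of efficiency metrics discussed above, the following minimal Python sketch shows one plausible way to compute per-node throughput and a node-average throughput ratio from per-node timing logs; the NATR formula, field names, and log structure are assumptions for illustration, since the abstract does not define them.

        # Illustrative sketch only: per-node throughput and a node-average throughput
        # ratio derived from distributed-training timing logs. The NATR definition below
        # (node throughput divided by the mean throughput across nodes) is an assumption.
        from dataclasses import dataclass

        @dataclass
        class NodeLog:
            node_id: int
            samples_processed: int   # mini-batch samples handled by this node
            compute_time: float      # seconds spent in forward/backward computation
            data_copy_time: float    # seconds spent copying samples to the device
            wait_time: float         # seconds spent waiting for other nodes (e.g., all-reduce)

        def throughput(log):
            """Samples per second over the node's total wall time."""
            total = log.compute_time + log.data_copy_time + log.wait_time
            return log.samples_processed / total if total > 0 else 0.0

        def natr(logs):
            """Assumed NATR: each node's throughput relative to the cluster average."""
            tps = {log.node_id: throughput(log) for log in logs}
            avg = sum(tps.values()) / len(tps)
            return {nid: tp / avg for nid, tp in tps.items()}

        logs = [NodeLog(0, 120_000, 50.0, 12.0, 8.0), NodeLog(1, 118_000, 55.0, 15.0, 4.0)]
        print(natr(logs))   # ratios close to 1.0 indicate balanced nodes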

    Blockchain-based data notarization model for autonomous driving simulation testing
    Haiyang PENG, Weixing JI, Fawang LIU
    2025, 45(8):  2421-2427.  DOI: 10.11772/j.issn.1001-9081.2024091280

    To solve the security problems caused by multi-party data sharing in autonomous driving simulation testing, a blockchain-based data notarization model for autonomous driving simulation testing was proposed to ensure secure storage and traceability of the data, thereby providing reliable support for auditing work. Firstly, the semi-public characteristics of consortium blockchain were utilized to ensure that on-chain data were only visible to authorized organizations, while a permission verification mechanism based on the RBAC (Role-Based Access Control) model was employed to implement access control for these organizations. Secondly, a smart contract template was defined to standardize the data access process, and process extension points were opened to support customized functions, for example, allowing associated smart contracts to be extended to ensure automatic execution of simulation resource trading activities. Finally, optimization strategies, including on-chain and off-chain hybrid storage based on the InterPlanetary File System (IPFS), data batch processing, and resource data caching, were proposed to address the limitations of blockchain storage resources and processing capabilities. Tests were conducted to evaluate the efficiency of data notarization for the data of 500 simulation scenarios generated by large language models. Experimental results show that compared to the direct access method, the notarization process applying the batch processing strategy reduces the total number of transactions by 72.00%, decreasing the performance consumption caused by smart contract calls significantly, and reduces the average time for writing and reading all data by 85.36% and 52.67%, respectively. It can be seen that the proposed model provides reliable technical support for the data security of multi-party data sharing in autonomous driving simulation testing, while the proposed optimization strategies improve data access efficiency significantly.

    Dynamic detection method of eclipse attacks for blockchain node analysis
    Shuo ZHANG, Guokai SUN, Yuan ZHUANG, Xiaoyu FENG, Jingzhi WANG
    2025, 45(8):  2428-2436.  DOI: 10.11772/j.issn.1001-9081.2024081101

    Eclipse attacks, as a significant threat to the blockchain network layer, can isolate the attacked node from the entire network by controlling its network connections, thus affecting its ability to receive block and transaction information. On this basis, attackers can further launch double-spending and other attacks, causing substantial damage to the blockchain system. To address this issue, a dynamic detection method of eclipse attacks for blockchain node analysis was proposed by incorporating deep learning models. Firstly, the Node Comprehensive Resilience Index (NCRI) was utilized to represent multidimensional attribute features of the nodes, and Graph ATtention network (GAT) was introduced to update the node features of the network topology dynamically. Secondly, Convolutional Neural Network (CNN) was employed to fuse the multidimensional features of the nodes. Finally, a Multi-Layer Perceptron (MLP) was used to predict the vulnerability of the entire network. Experimental results indicate that the method achieves an accuracy of up to 89.80% under varying intensities of eclipse attacks, and maintains stable performance in continuously changing blockchain networks.
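    The following minimal PyTorch sketch illustrates the described pipeline (GAT to update node features over the topology, a CNN to fuse the multidimensional node features, and an MLP to predict network vulnerability); the layer sizes, pooling choices, and readout are illustrative assumptions, not the paper's configuration, and torch_geometric is assumed to be available.

        # Illustrative sketch of the GAT -> CNN -> MLP detection pipeline described above.
        import torch
        import torch.nn as nn
        from torch_geometric.nn import GATConv

        class EclipseDetector(nn.Module):
            def __init__(self, in_dim=16, hid_dim=32):
                super().__init__()
                self.gat = GATConv(in_dim, hid_dim, heads=2, concat=False)  # dynamic node feature update
                self.cnn = nn.Sequential(                                    # fuse per-node features
                    nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveMaxPool1d(hid_dim // 2),
                )
                self.mlp = nn.Sequential(                                    # graph-level prediction
                    nn.Linear(8 * (hid_dim // 2), 32), nn.ReLU(), nn.Linear(32, 1),
                )

            def forward(self, x, edge_index):
                h = torch.relu(self.gat(x, edge_index))        # [num_nodes, hid_dim]
                h = self.cnn(h.unsqueeze(1)).flatten(1)        # per-node fused features
                g = h.mean(dim=0, keepdim=True)                # simple mean readout over nodes
                return torch.sigmoid(self.mlp(g))              # predicted vulnerability score

        # x: node attribute features (e.g., an NCRI-style vector), edge_index: network topology
        x = torch.randn(50, 16)
        edge_index = torch.randint(0, 50, (2, 200))
        print(EclipseDetector()(x, edge_index))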

    Certificateless ring signature scheme based on SM2
    Yu WANG, Minghui ZHENG, Jingyi YANG, Shicheng HUANG
    2025, 45(8):  2437-2441.  DOI: 10.11772/j.issn.1001-9081.2024081108

    The existing SM2-based ring signature schemes have the risk of private key leakage, and a dishonest Key Generation Center (KGC) has the ability to monitor and forge communication between entities. To overcome this shortcoming, a CertificateLess Ring Signature scheme based on SM2 (CLRS-SM) was proposed. In this scheme, the user's private key consists of two independent parts: one part is calculated by the KGC based on the user's identity and the system master key, and the other part is a secret value selected by the user randomly. Therefore, even if a malicious KGC leaks the partial private key, an attacker cannot obtain the entire private key of the user. The security of the scheme is based on the discrete logarithm problem, and the scheme is proved to be unforgeable and unconditionally anonymous under the random oracle model. Experimental results show that compared with the existing SM2-based ring signature schemes, the proposed scheme resists the malicious KGC attack with only 0.18% more computation, and has higher security.

    The 21st CCF Conference on Web Information Systems and Applications (WISA 2024)
    Dual-stage prompt tuning method for automated preference alignment
    Tao FENG, Chen LIU
    2025, 45(8):  2442-2447.  DOI: 10.11772/j.issn.1001-9081.2024081083

    Because user prompts often lack professionalism and proper use of terminology in specific fields, it is difficult for Large Language Models (LLM) to understand user intentions accurately and generate information that meets the requirements of the field. Therefore, an Automated Preference Alignment Dual-Stage Prompt Tuning (APADPT) method was proposed to solve the preference alignment problem faced by LLM when applied in vertical fields. In APADPT, refined adjustment of input prompts was realized by constructing a supervised fine-tuning dataset containing human preferences and using LLM for semantic analysis and evaluation of pairwise replies. After dual-stage training, the model mastered the prompt optimization rules of the general field and performed specialized adjustments based on the characteristics of the vertical fields. Experimental results in the medical field show that APADPT improves the preference alignment consistency of API-based LLM and open-source LLM significantly, with the average winning rate increased by 9.5% to 20.5% under the same model parameter count. In addition, this method shows good robustness and generalization ability on open-source models with different parameter scales, providing a new optimization strategy for the application of LLM in vertical specialized fields, and contributing to improving model performance while maintaining the generalization and adaptability of the model.

    Cross-modal information fusion for video-text retrieval
    Yimeng XI, Zhen DENG, Qian LIU, Libo LIU
    2025, 45(8):  2448-2456.  DOI: 10.11772/j.issn.1001-9081.2024081082

    The existing Video-Text Retrieval (VTR) methods usually assume a strong semantic association between text descriptions and videos, but ignore the weakly related video-text pairs that widely exist in datasets, so that the models are good at recognizing common general concepts but unable to fully mine the potential information of weak semantic descriptions, thus affecting the retrieval performance of the models. To address the above problems, a VTR model based on cross-modal information fusion was proposed, in which relevant external knowledge was utilized in a cross-modal way to improve the retrieval performance of the model. Firstly, two external knowledge retrieval modules were constructed to implement the retrieval of videos against external knowledge and the retrieval of texts against external knowledge respectively, so that the original video and text feature representations could subsequently be strengthened with the help of external knowledge. Secondly, a cross-modal information fusion module with adaptive cross-attention was designed to remove redundant information in the videos and texts and to conduct feature fusion by using complementary information between different modalities, thereby learning more discriminative feature representations. Finally, inter-modal and intra-modal similarity loss functions were introduced to ensure the integrity of information representation of the data in the fusion feature space, video feature space, and text feature space, so as to achieve accurate retrieval between cross-modal data. Experimental results show that compared with the MuLTI model, the proposed model improves the recall R@1 on public datasets MSR-VTT (Microsoft Research Video to Text) and DiDeMo (Distinct Describable Moments) by 2.0 and 1.9 percentage points respectively; compared with the CLIP-ViP model, the proposed model improves R@1 on the public dataset LSMDC (Large Scale Movie Description Challenge) by 2.9 percentage points. It can be seen that the proposed model can solve the problem of weakly related data pairs in VTR tasks effectively, thereby improving the retrieval accuracy of the model.
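    A minimal sketch of a cross-attention fusion step between video and text features follows, assuming standard multi-head cross-attention plus a learned gate as one way to realize the adaptive suppression of redundant information; the dimensions and gating form are assumptions, not the paper's exact module.

        # Illustrative cross-modal fusion: text queries attend over video frames, and a
        # learned gate adaptively mixes the attended features with the original ones.
        import torch
        import torch.nn as nn

        class CrossModalFusion(nn.Module):
            def __init__(self, dim=256, heads=4):
                super().__init__()
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

            def forward(self, video_feats, text_feats):
                # The symmetric direction (video attending over text) is analogous.
                attended, _ = self.attn(query=text_feats, key=video_feats, value=video_feats)
                g = self.gate(torch.cat([text_feats, attended], dim=-1))   # adaptive weighting
                return g * attended + (1 - g) * text_feats                 # fused text representation

        video = torch.randn(2, 12, 256)   # batch of 2 videos, 12 frame features each
        text = torch.randn(2, 8, 256)     # batch of 2 captions, 8 token features each
        print(CrossModalFusion()(video, text).shape)   # torch.Size([2, 8, 256])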

    Deep variational text clustering model based on distribution augmentation
    Ao SHEN, Ruizhang HUANG, Jingjing XUE, Yanping CHEN, Yongbin QIN
    2025, 45(8):  2457-2463.  DOI: 10.11772/j.issn.1001-9081.2024081100

    To address the issues of missing distribution information and distribution collapse encountered by deep variational text clustering models in practical applications, a Deep Variational text Clustering Model based on Distribution augmentation (DVCMD) was proposed. In this model, the enhanced latent semantic distributions were integrated into the original latent semantic distribution by enhancing distribution information, so as to improve information completeness and accuracy of the latent distribution. At the same time, a distribution consistency constraint strategy was employed to promote the learning of consistent semantic representations by the model, thereby enhancing the model’s ability to express true information of the data through learned semantic distributions, and thus improving clustering performance. Experimental results show that compared with existing deep clustering models and structural semantic-enhanced clustering models, DVCMD has the Normalized Mutual Information (NMI) metric improved by at least 0.16, 9.01, 2.30, and 2.72 percentage points on the four real-world datasets: Abstract, BBC, Reuters-10k, and BBCSports, respectively, validating the effectiveness of the model.
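    As an illustration of a distribution consistency constraint, the following sketch computes a symmetric KL divergence between the original latent semantic distribution and an augmented one, both modeled as diagonal Gaussians as is common in variational models; the exact constraint used in DVCMD is not specified in the abstract, so this form is an assumption.

        # Sketch of a distribution-consistency term between the original and augmented
        # latent semantic distributions (diagonal Gaussians). The symmetric KL form is
        # an assumed instantiation, not necessarily the one used in DVCMD.
        import torch

        def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
            """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal Gaussians, summed over dims."""
            var_p, var_q = logvar_p.exp(), logvar_q.exp()
            kl = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
            return kl.sum(dim=-1)

        def consistency_loss(mu_orig, logvar_orig, mu_aug, logvar_aug):
            """Symmetric KL encouraging the augmented latent distribution to agree with the original."""
            return 0.5 * (gaussian_kl(mu_orig, logvar_orig, mu_aug, logvar_aug)
                          + gaussian_kl(mu_aug, logvar_aug, mu_orig, logvar_orig)).mean()

        mu1, lv1 = torch.randn(4, 64), torch.zeros(4, 64)
        mu2, lv2 = mu1 + 0.1 * torch.randn(4, 64), torch.zeros(4, 64)
        print(consistency_loss(mu1, lv1, mu2, lv2))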

    Diversity semantic query on resource description framework graphs based on multi-level neighborhood predicate label tree encoding index
    Jiantao JIANG, Baoyan SONG, Xiaohuan SHAN
    2025, 45(8):  2464-2469.  DOI: 10.11772/j.issn.1001-9081.2024081164

    A knowledge graph is a semantic network that reveals the relationships between entities, and it is often expressed in the form of Resource Description Framework (RDF). Faced with the explosive growth of information, the existing semantic query algorithms on RDF graphs ignore diversity semantic query requirements. Therefore, considering the rich semantic information of RDF graphs, a Diversity Semantic Query method with distributed processing based on a multi-level Neighborhood Predicate Label Tree Encoding index (NPLTE) on RDF graphs (DSQ-NPLTE) was proposed. Firstly, to avoid wasting storage space and to assist subsequent parallel queries, a frequency-based predicate encoding and mapping strategy was designed to map predicates represented by long strings to unique natural number representations. Secondly, after partitioning the RDF graph, the obtained vertices were classified according to their adjacent edge properties, and the corresponding storage modes were given. Thirdly, a multi-level NPLTE was proposed to filter invalid vertices and edges by using predicate feature information. Finally, for diversity semantic queries with known predicate, known subject (object), and mixed known elements, the corresponding matching strategies were given, and an optimal join based on common vertices was proposed to reduce the number of Cartesian products and thereby decrease the join cost. Experimental results show that compared with the method without preprocessing, the query efficiency of the proposed method can be improved by 5 to 9 times through using the constructed index for pruning optimization; compared with the FAST method on three LUBM standard synthetic datasets of different sizes, the proposed method improves the query efficiency by 43% on average. It can be seen that the proposed index and query strategies can deal with diversity semantic queries on large-scale RDF graphs effectively.
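    The frequency-based predicate encoding and mapping strategy can be illustrated with the short Python sketch below, which replaces long predicate strings with compact natural-number codes; assigning smaller codes to more frequent predicates is an assumed ordering rule, as the abstract only states that the mapping is frequency-based and unique.

        # Illustrative frequency-based predicate encoding for RDF triples.
        from collections import Counter

        def encode_predicates(triples):
            """triples: iterable of (subject, predicate, object) strings."""
            freq = Counter(p for _, p, _ in triples)
            # Assumption: the most frequent predicate gets code 0; ties broken lexicographically.
            ordering = sorted(freq, key=lambda p: (-freq[p], p))
            pred2id = {p: i for i, p in enumerate(ordering)}
            encoded = [(s, pred2id[p], o) for s, p, o in triples]
            return encoded, pred2id

        triples = [
            ("ub:Student1", "ub:takesCourse", "ub:Course1"),
            ("ub:Student1", "ub:memberOf", "ub:Dept0"),
            ("ub:Student2", "ub:takesCourse", "ub:Course2"),
        ]
        encoded, pred2id = encode_predicates(triples)
        print(pred2id)   # e.g., {'ub:takesCourse': 0, 'ub:memberOf': 1}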

    Heterogeneous graph attention network for relation extraction based on feature combination
    Jiaxin YAN, Yanping CHEN, Weizhe YANG, Ruizhang HUANG, Yongbin QIN
    2025, 45(8):  2470-2476.  DOI: 10.11772/j.issn.1001-9081.2024081076

    Relation extraction aims to identify predefined semantic relationships between two entities within a sentence. Traditional graph neural network-based relation extraction methods generally use dependency trees to construct a graphical representation structure of the sentence. However, the graph structure constructed from the dependency tree has limited expressive ability and cannot fully capture the rich syntactic structure information of the target entities. To address these issues, a relation extraction method using a Heterogeneous Graph ATtention network (HGAT) based on feature combination was proposed. Firstly, atomic features were extracted from the sentence, and composite features were obtained by combining these atomic features. Secondly, the composite features and relation labels were represented as two types of nodes on a heterogeneous graph to construct a “feature-relation bipartite graph”. Finally, a graph attention network was used to update the nodes dynamically to perform relation extraction. In this method, the composite features and the syntactic structure information in the sentence were utilized effectively, thereby enhancing the performance of relation extraction. Experimental results on the ACE05 English dataset and the SemEval-2010 task 8 dataset show that this method achieves F1-scores of 84.11% and 90.67%, respectively, demonstrating the effectiveness of the proposed method.

    Multi-objective optimization of steel logistics vehicle-cargo matching under multiple constraints
    Kaile YU, Jiajun LIAO, Jiali MAO, Xiaopeng HUANG
    2025, 45(8):  2477-2483.  DOI: 10.11772/j.issn.1001-9081.2024081125

    Steel logistics platforms often need to split steel products into multiple waybills for transportation when handling customer orders. Less-Than-Truckload (LTL) cargo, which fails to meet the minimum load requirement of a truck, needs to be consolidated with goods from other customer orders to optimize transportation efficiency. Although previous studies have proposed some solutions for consolidation decision-making, none of them simultaneously considered detour distance and the prioritization of high-priority cargo in consolidated shipments. Therefore, a multi-objective optimization framework for steel cargo consolidation under multiple constraints was proposed. Globally optimal cargo consolidation decisions were achieved by the framework through a hierarchical decision network and a representation enhancement module. Specifically, a hierarchical decision network based on Proximal Policy Optimization (PPO) was used to first determine the priorities of the optimization objectives, and then the LTL cargo was consolidated and selected on the basis of these priorities. Meanwhile, a representation enhancement module based on Graph ATtention network (GAT) was employed to represent cargo and LTL cargo information dynamically, which was then input into the decision network to maximize long-term multi-objective gains. Experimental results on a large-scale real-world cargo dataset show that compared to other online methods, the proposed method achieves a 17.3% increase in the proportion of high-priority cargo weight and a 7.8% reduction in average detour distance, while reducing the total shipping weight by 6.75% compared to the LTL cargo consolidation method that only maximizes cargo capacity. This effectively enhances the efficiency of consolidated transportation.

    Artificial intelligence
    Chinese spelling correction model ReLM enhanced with deep semantic features
    Wei ZHANG, Jiaxiang NIU, Jichao MA, Qiongxia SHEN
    2025, 45(8):  2484-2490.  DOI: 10.11772/j.issn.1001-9081.2024071015

    As a current leading Chinese Spelling Correction (CSC) model, ReLM (Rephrasing Language Model) has insufficient feature representation in complex semantic scenarios. To address this issue, an ReLM enhanced with deep semantic features, namely FeReLM (Feature-enhanced Rephrasing Language Model), was proposed. In the model, Depthwise Separable Convolution (DSC) technique was used to integrate deep semantic features generated by feature extraction model BGE (BAAI General Embedding) with global features generated by ReLM, thereby enhancing the model’s ability to parse complex contexts and effectively improving the precision in recognizing and correcting spelling errors. Initially, FeReLM was trained on Wang271K dataset, enabling the model to learn deep semantics and complex expressions within sentences continuously. Subsequently, the trained weights were transferred, so that the knowledge learned by the model was applied to new datasets for fine-tuning. Experimental results show that FeReLM outperforms models such as ReLM, MCRSpell (Metric learning of Correct Representation for Chinese Spelling Correction), and RSpell (Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check) on ECSpell and MCSC datasets in key metrics such as precision, recall, and F1 score, with improvements ranging from 0.6 to 28.7 percentage points. The effectiveness of the proposed method is confirmed through ablation experiments.
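    The following sketch illustrates fusing deep semantic features (e.g., from BGE) with the correction model's global features through a Depthwise Separable Convolution, i.e., a depthwise convolution followed by a pointwise 1x1 convolution; the feature dimensions and concatenation layout are illustrative assumptions, not taken from the paper.

        # Illustrative DSC-based fusion of two token-level feature streams.
        import torch
        import torch.nn as nn

        class DSCFusion(nn.Module):
            def __init__(self, dim=768):
                super().__init__()
                in_ch = 2 * dim  # assumed layout: BGE features and ReLM features concatenated channel-wise
                self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
                self.pointwise = nn.Conv1d(in_ch, dim, kernel_size=1)

            def forward(self, relm_feats, bge_feats):
                # Both inputs: [batch, seq_len, dim]; Conv1d expects [batch, channels, seq_len].
                x = torch.cat([relm_feats, bge_feats], dim=-1).transpose(1, 2)
                x = self.pointwise(self.depthwise(x))
                return x.transpose(1, 2)   # fused features, [batch, seq_len, dim]

        relm = torch.randn(2, 32, 768)
        bge = torch.randn(2, 32, 768)
        print(DSCFusion()(relm, bge).shape)   # torch.Size([2, 32, 768])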

    Metaphor detection for improving representation in linguistic rules
    Qing YANG, Yan ZHU
    2025, 45(8):  2491-2496.  DOI: 10.11772/j.issn.1001-9081.2024071037

    Most existing research work on the metaphor detection task adopts deep learning techniques and does not make in-depth use of linguistic rules, which is mainly manifested in the defective representation of the semantic and basic meanings of the target words involved in the rules. As a result, the related models are unable to focus on the differences between the target words and the more relevant contextual words, and the boundaries between the basic meanings and the contextual meanings remain fuzzy. To address the above problems, a Metaphor Detection model to improve Representation in Linguistic rules (MeRL) was proposed. Firstly, the semantic representation of the target words involved in both the Selectional Preference Violation (SPV) and Metaphor Identification Procedure (MIP) rules was enhanced. Secondly, the basic meanings of the target words in the MIP rule were represented. Finally, the rule-based SPV and MIP modules were fused to identify metaphors jointly. Experimental results show that compared with MelBERT (Metaphor-aware late interaction over BERT) and other baseline models, the proposed model improves the F1-score by at least 0.6, 0.9, and 1.2 percentage points on benchmark datasets VUA-18, VUA Verb, and MOH-X, respectively, indicating that the proposed model achieves more accurate detection of metaphors. The proposed model also improves the F1-score by at least 0.7 percentage points when performing zero-shot transfer learning on the TroFi dataset, indicating that the generalization ability of the proposed model is stronger.

    Integrating internal and external data for out-of-distribution detection training and testing
    Zhiyuan WANG, Tao PENG, Jie YANG
    2025, 45(8):  2497-2506.  DOI: 10.11772/j.issn.1001-9081.2024081141

    Out-Of-Distribution (OOD) detection aims to identify foreign samples deviating from the training data distribution, so as to prevent erroneous predictions by the model in anomalous scenarios. Due to the uncertainty of real OOD data, current OOD detection methods based on Pre-trained Language Models (PLM) do not evaluate the impact of OOD data on detection performance during both the training and testing stages simultaneously. In response to this issue, a framework integrating Internal and External Data for OOD Detection Training and Testing (IEDOD-TT) was proposed. In this framework, different data integration strategies were adopted in different stages: during training, a Masked Language Model (MLM) was employed to generate pseudo OOD datasets on the original training set, and contrastive learning was introduced to enhance the feature disparities between internal and external data; during testing, a comprehensive OOD detection scoring metric was designed by combining density estimation of internal and external data. Experimental results show that on CLINC150, NEWS-TOP5, SST2, and YELP datasets, compared to the optimal baseline method doSCL-cMaha, the proposed method increases the average Area Under the Receiver Operating Characteristic curve (AUROC) by 1.56 percentage points and decreases the average False Positive Rate at 95% true positive rate (FPR95) by 2.83 percentage points; compared to the best variant of the proposed method, IS/IEDOD-TT (ID Single/IEDOD-TT), the proposed method improves the average AUROC by 1.61 percentage points and reduces the average FPR95 by 2.71 percentage points. The above results validate the effectiveness of IEDOD-TT in handling text classification tasks with different data distribution shifts, and confirm the additional performance gains from considering both internal and external data distributions comprehensively.

    Dependency type and distance enhanced aspect based sentiment analysis model
    Biao ZHAO, Yuhua QIN, Rongkun TIAN, Yuehang HU, Fangrui CHEN
    2025, 45(8):  2507-2514.  DOI: 10.11772/j.issn.1001-9081.2024081088

    Aspect-Based Sentiment Analysis (ABSA) tasks aim to determine the sentiment polarity of specific aspect words in comments. In the field of ABSA, dual-channel models that extract both grammar and semantic information have achieved certain results. However, the existing models fail to comprehensively consider the different degrees of importance among grammar nodes, the additional noise introduced by the global attention mechanism, and the correlations between similar features. To address these issues, a dual-channel graph convolutional model with dependency type and distance enhancements was proposed. Firstly, dependency types were introduced in the grammar module to measure the importance of neighborhood nodes. Secondly, mask matrices based on dependency tree distance were constructed to filter out grammar-unrelated noise. Finally, a supervised contrastive loss was introduced to help the model learn the correlations between similar features. Experimental results show that on the SemEval-2014 Restaurant, SemEval-2014 Laptop, and Twitter datasets, compared to the second-best model DGNN (Dual Graph Neural Network), the proposed model achieves accuracy improvements of 0.11, 0.94, and 1.01 percentage points, respectively, and Macro-F1 improvements of 0.63, 1.66, and 0.83 percentage points, respectively, verifying the effectiveness of the proposed model.
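    The dependency-distance mask can be illustrated with the sketch below, which marks token pairs whose shortest-path distance on the dependency tree exceeds a threshold as grammar-unrelated noise; the threshold value and the binary masking rule are assumptions for illustration.

        # Illustrative construction of a dependency-tree-distance mask matrix.
        from collections import deque

        def tree_distance_mask(heads, max_dist=2):
            """heads[i] is the dependency head index of token i (-1 for the root)."""
            n = len(heads)
            adj = [[] for _ in range(n)]
            for child, head in enumerate(heads):
                if head >= 0:
                    adj[child].append(head)
                    adj[head].append(child)
            mask = [[0] * n for _ in range(n)]
            for src in range(n):                     # BFS from each token over the tree
                dist = {src: 0}
                queue = deque([src])
                while queue:
                    u = queue.popleft()
                    for v in adj[u]:
                        if v not in dist:
                            dist[v] = dist[u] + 1
                            queue.append(v)
                for tgt, d in dist.items():
                    mask[src][tgt] = 1 if d <= max_dist else 0
            return mask

        # Example: "The food is great", with "great" (index 3) as the root.
        print(tree_distance_mask([1, 3, 3, -1], max_dist=1))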

    Speech emotion recognition method based on hybrid Siamese network with CNN and bidirectional GRU
    Peng PENG, Ziting CAI, Wenling LIU, Caihua CHEN, Wei ZENG, Baolai HUANG
    2025, 45(8):  2515-2521.  DOI: 10.11772/j.issn.1001-9081.2024081142

    In order to solve the problems of low accuracy and poor generalization ability in the existing Speech Emotion Recognition (SER) models, a hybrid Siamese Multi-scale CNN-BiGRU network was proposed. In this network, a Multi-Scale Feature Extractor (MSFE) and a Multi-Dimensional Attention (MDA) module were introduced to construct a Siamese network, and the training data were increased by utilizing sample pairs, thereby improving the model’s recognition accuracy and enabling it to better adapt to complex real-world application scenarios. Experimental results on IEMOCAP and EMO-DB public datasets show that the recognition accuracy of the proposed model is enhanced by 8.28 and 7.79 percentage points, respectively, compared to that of CNN-BiGRU model. Furthermore, a customer service speech emotion dataset was constructed by collecting real customer service conversation recordings. Experimental results on this dataset show that the recognition accuracy of the proposed model can reach 87.85%, indicating that the proposed model has good generalization ability.

    Sequence labeling optimization method combined with entity boundary offset
    Jing YU, Yanping CHEN, Ying HU, Ruizhang HUANG, Yongbin QIN
    2025, 45(8):  2522-2529.  DOI: 10.11772/j.issn.1001-9081.2024071036

    To address the issue of positional deviation between the predicted entity boundaries and the true entity boundaries in sequence labeling models in Named Entity Recognition (NER), a sequence labeling optimization method combined with entity boundary offset was proposed. Firstly, the concept of boundary offset was introduced to quantify the positional relationship between each word and entity boundaries, and the relative offset between each word and the nearest entity boundary was calculated, and these offsets were used to generate candidate spans for the entity boundaries. Secondly, Intersection-over-Union (IoU) was used as a filtering criterion to filter out low-quality candidate spans, thereby retaining those spans most likely to represent the entity boundary. Finally, the boundary adjustment module was used to update positions of the entity boundaries in the label sequence based on the candidate spans, thereby optimizing the entity boundaries in the entire label sequence and improving the performance of entity recognition. Experimental results show that the proposed method achieves the F1-scores of 80.48%, 96.42%, and 94.80% on CLUENER2020, Resume-zh, and MSRA datasets, respectively, validating its effectiveness in NER task.
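    The two key steps, offset computation and IoU-based filtering, can be illustrated with the following Python sketch; the candidate-generation and filtering details (what the IoU is computed against, and the threshold) are assumptions, since the abstract does not spell them out.

        # Illustrative sketch: (1) offset of each token to the nearest entity boundary,
        # (2) span-level IoU filtering of candidate spans.

        def boundary_offsets(n_tokens, entities):
            """entities: list of (start, end) spans, end inclusive. Returns the signed
            offset to the nearest entity boundary for every token position."""
            boundaries = sorted({b for s, e in entities for b in (s, e)})
            return [min((b - i for b in boundaries), key=abs) for i in range(n_tokens)]

        def span_iou(a, b):
            inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
            union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
            return inter / union

        def filter_candidates(predicted, candidates, iou_threshold=0.6):
            """Keep only candidate spans that overlap a predicted span strongly enough."""
            return [c for c in candidates if any(span_iou(c, p) >= iou_threshold for p in predicted)]

        entities = [(2, 4)]                       # gold span covering tokens 2..4
        print(boundary_offsets(8, entities))      # offsets to the nearest boundary (2 or 4)
        print(filter_candidates([(2, 5)], [(2, 4), (1, 5), (6, 7)]))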

    Partial label regression algorithm integrating feature attention and residual connection
    Haifeng WU, Liqing TAO, Yusheng CHENG
    2025, 45(8):  2530-2536.  DOI: 10.11772/j.issn.1001-9081.2024071012

    Partial Label Regression (PLR) addresses the current situation in which Partial Label Learning (PLL) only focuses on classification tasks. To address the problem that the existing PLR algorithms ignore the differences in characteristics between instance features, a Partial Label Regression algorithm integrating Feature Attention and Residual Connection (PLR-FARC) was proposed. Firstly, the labels of real datasets were expanded into a set of real-valued candidate labels by a label enhancement technique. Secondly, the attention mechanism was employed to generate the contribution of individual features to the labels automatically. Thirdly, residual connections were introduced to reduce information loss and maintain feature integrity during feature transmission. Finally, the prediction loss was calculated based on IDent (IDentification method) and PIDent (Progressive IDentification method), respectively. Experimental results on the Abalone, Airfoil, Concrete, Cpu-act, Housing, and Power-plant datasets show that compared to IDent and PIDent, PLR-FARC reduces the Mean Absolute Error (MAE) by 2.15%, 38.38%, 8.86%, 4.19%, 15.71%, and 15.55% on average, respectively, and the Mean Squared Error (MSE) by 9.35%, 71.32%, 23.10%, 20.17%, 27.22%, and 9.46% on average, respectively. It can be seen that the proposed algorithm is feasible and effective.

    3D object detection algorithm based on multi-scale network and axial attention
    Chengzhi YAN, Ying CHEN, Kai ZHONG, Han GAO
    2025, 45(8):  2537-2545.  DOI: 10.11772/j.issn.1001-9081.2024071058

    In 3D object detection, the detection accuracy of small targets such as pedestrians and cyclists remains low, presenting a challenging issue for the perception systems of autonomous vehicles. To estimate the state of the surrounding environment accurately and enhance driving safety, a 3D object detection algorithm based on a multi-scale network and axial attention was proposed by improving the Voxel R-CNN (Voxel Region-based Convolutional Neural Network) algorithm. Firstly, a multi-scale network and a Pixel-level Fusion Module (PFM) were constructed in the backbone network to obtain richer and more precise feature representations, thereby enhancing the robustness and generalization of the algorithm in complex scenarios. Secondly, an axial attention mechanism tailored for 3D spatial features was designed and applied to Region of Interest (RoI) multi-scale pooling features, so as to capture both local and global features effectively while preserving essential information of the 3D spatial structure, thereby improving the accuracy and efficiency of object detection and classification. Finally, a Rotation-Decoupled Intersection over Union (RDIoU) method was introduced into the regression and classification branches, enabling the network to learn more precise bounding boxes and addressing the alignment issue between classification and regression. Experimental results on the KITTI public dataset show that the proposed algorithm achieves mean Average Precision (mAP) of 62.25% for pedestrians and 79.36% for cyclists, which are improvements of 4.02 and 3.15 percentage points, respectively, over the baseline algorithm Voxel R-CNN, demonstrating the effectiveness of the improved algorithm in detecting hard-to-perceive objects.

    Semi-supervised object detection framework guided by self-paced learning
    Binhong XIE, Yingkun LA, Yingjun ZHANG, Rui ZHANG
    2025, 45(8):  2546-2554.  DOI: 10.11772/j.issn.1001-9081.2024081096

    To improve the quality of pseudo-labels and solve the problem of confirmation bias in Semi-Supervised Object Detection (SSOD), an SSOD framework based on dynamic parameters under the guidance of Self-Paced Learning (SPL) was proposed. In the framework, a dynamic self-paced parameter and a continuous weight variable were designed to optimize the effect of SSOD. Specifically, the dynamic self-paced parameter was used to evaluate the difficulty of the samples based on the real-time performance of the model during training, and the continuous weight variable was used to evaluate the importance and reliability of each sample in training accurately by comparing the sample loss with the dynamic self-paced parameter, and to refine the weight design of each object in the samples. In addition, a single model was used in the framework for iterative training, and a consistency regularization strategy was introduced to evaluate the consistency of the model predictions. This design provided more targeted weight information for the model, and the model optimized the training process adaptively through dynamic adjustment of the weight information. Extensive comparison experimental results on the PASCAL VOC and MS-COCO datasets show that the proposed framework improves the detection accuracy of the model significantly, and verify the good generality and efficient convergence performance of the framework. In particular, on the PASCAL VOC dataset, the proposed framework improves the detection precision by 0.65, 4.84, and 0.28 percentage points, respectively, compared with LabelMatch, Unbiased Teacher V2, and MixTeacher.
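    The self-paced weighting idea can be sketched as follows: a dynamic threshold is updated from the model's real-time losses, and each pseudo-labeled sample receives a continuous weight by comparing its loss with that threshold; the quantile-based update and the linear soft-weight form below are assumptions borrowed from standard self-paced learning, not the paper's exact rule.

        # Illustrative self-paced weighting of pseudo-labeled samples.
        import torch

        def update_lambda(losses, quantile=0.7):
            """Dynamic self-paced parameter, here assumed to be a quantile of current losses."""
            return torch.quantile(losses, quantile)

        def sample_weights(losses, lam):
            """Continuous weights in [0, 1]: easy samples (loss << lambda) get weight near 1,
            samples harder than lambda are down-weighted toward 0."""
            return torch.clamp(1.0 - losses / lam, min=0.0, max=1.0)

        losses = torch.tensor([0.1, 0.4, 0.9, 2.0])
        lam = update_lambda(losses)
        w = sample_weights(losses, lam)
        weighted_loss = (w * losses).sum() / w.sum().clamp(min=1e-8)
        print(lam.item(), w.tolist(), weighted_loss.item())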

    Multi-target detection algorithm for traffic intersection images based on YOLOv9
    Yanhua LIAO, Yuanxia YAN, Wenlin PAN
    2025, 45(8):  2555-2565.  DOI: 10.11772/j.issn.1001-9081.2024071020

    Aiming at the problems in traffic intersection images of complex scenes, difficulty in detecting small targets, and frequent occlusion between targets, as well as the color distortion, noise, and blurring caused by changes in weather and lighting, a multi-target detection algorithm for traffic intersection images based on YOLOv9 (You Only Look Once version 9), named ITD-YOLOv9 (Intersection Target Detection-YOLOv9), was proposed. Firstly, the CoT-CAFRNet (Chain-of-Thought prompted Content-Aware Feature Reassembly Network) image enhancement network was designed to improve image quality and optimize input features. Secondly, the iterative Channel Adaptive Feature Fusion (iCAFF) module was added to enhance feature extraction for small targets as well as overlapped and occluded targets. Thirdly, the feature fusion pyramid structure BiHS-FPN (Bi-directional High-level Screening Feature Pyramid Network) was proposed to enhance the multi-scale feature fusion capability. Finally, the IF-MPDIoU (Inner-Focaler-Minimum Point Distance based Intersection over Union) loss function was designed to focus on key samples and enhance generalization ability by adjusting variable factors. Experimental results show that on the self-made dataset and the SODA10M dataset, the ITD-YOLOv9 algorithm achieves detection accuracies of 83.8% and 56.3% and detection speeds of 64.8 frame/s and 57.4 frame/s, respectively; compared with the YOLOv9 algorithm, the detection accuracies are improved by 3.9 and 2.7 percentage points respectively. It can be seen that the proposed algorithm realizes multi-target detection at traffic intersections effectively.

    Combining preprocessing methods and adversarial learning for fair link prediction
    Yifeng PENG, Yan ZHU
    2025, 45(8):  2566-2571.  DOI: 10.11772/j.issn.1001-9081.2024081117

    Link prediction is a crucial task in network analysis that explores interactions between entities and forecasts new potential relationships in evolving networks. However, link prediction may generate biases, especially concerning links between entities with sensitive attributes; for example, the “filter bubble” issue amplifies the isolation of publicly accessible information and reduces diversity for online users. To address these challenges, the ALFLP (Adding Link and Adversarial Learning for Fair Link Prediction) method, which combines preprocessing-stage methods and in-processing-stage methods, was proposed to address the “filter bubble” issue from the perspective of algorithmic fairness. In the preprocessing stage, links were added to disadvantaged link groups to reduce the difference in link density between different groups. In the processing stage, the output of the preprocessing stage was fed into the adversarial learning-based method, where the generator and discriminator played against each other to promote more inter-group links, thereby alleviating the “filter bubble” situation. Experimental results on real datasets pokec_n and pokec_z show that compared with baseline methods such as Jaccard, the ALFLP method improves the AUC index by about 12 and 10 percentage points respectively, and improves the modred index by about 0.14 and 0.10 respectively. It can be seen that the ALFLP method can achieve a good balance between fairness and prediction accuracy.

    Data science and technology
    Non-redundant and statistically significant discriminative high utility pattern mining algorithm
    Jun WU, Aijia OUYANG, Ya WANG
    2025, 45(8):  2572-2581.  DOI: 10.11772/j.issn.1001-9081.2024071063

    Aiming at the problems of false positive patterns and redundant patterns in discriminative high utility pattern mining tasks, a discriminative high utility pattern mining algorithm based on unlimited testing and the independent growth rate technique, UTDHU (Unlimited Testing for Discriminative High Utility pattern mining), was designed. Firstly, the discriminative high utility patterns that meet the utility and difference thresholds were mined from a target transaction set. Then, the redundant patterns were screened out by the independent growth rates of the patterns, which were calculated by constructing a shared tree of prefix-items. Finally, the statistical significance measure p-value of each remaining pattern was calculated by unlimited testing, and the false positive discriminative high utility patterns were filtered out according to the family-wise error rate. Experimental results on four benchmark transaction sets and two synthetic transaction sets show that compared with Hamm, YBHU (Yekutieli-Benjamini resampling for High Utility pattern mining), and other algorithms, the proposed algorithm outputs the fewest patterns, with more than 97.8% of the tested patterns removed. In terms of pattern quality, the proportions of false positive discriminative high utility patterns of the proposed algorithm are less than 5.2%, and the classification accuracies of the features constructed by the proposed algorithm are at least 1.5 percentage points higher than those of the compared algorithms. Additionally, in terms of running time, although the proposed algorithm is slower than the Hamm algorithm, it is faster than the other three algorithms based on statistical significance testing. It can be seen that the proposed algorithm can effectively eliminate a certain number of false positive and redundant discriminative high utility patterns, exhibits superior mining performance, and achieves higher operational efficiency.

    Incremental missing value imputation algorithm for time series based on diffusion model
    Xingjie FENG, Xingpeng BIAN, Xiaorong FENG, Xinglong WANG
    2025, 45(8):  2582-2591.  DOI: 10.11772/j.issn.1001-9081.2024071046

    Missing data are a common issue in time series and complicate subsequent time series analysis. Effective missing value imputation is crucial for improving data quality and mining data value. However, the existing imputation algorithms often reuse attention modules designed for complete data in time series prediction tasks, which are insufficient for extracting spatio-temporal features from time series with missing values. Additionally, the existing imputation algorithms rarely study imputation patterns in depth and underestimate the intermediate values generated during the imputation process, so there is still room for improvement in imputation accuracy. In view of the above problems, an Incremental missing value Imputation algorithm for Time series based on Diffusion Model (I2TDM) was proposed. In I2TDM, to enhance the feature extraction capability for time series with missing values, a temporal attention module was incorporated into the traditional diffusion model. At the same time, to improve the stability and accuracy of the imputation algorithm, a novel incremental imputation algorithm was proposed, in which an incremental selection module is used to retain partial intermediate imputation values. Imputation experimental results on three datasets, Air Quality Index (AQI), Electricity Transformer Temperature (ETT), and Weather, show that compared with baseline models such as CSDI, SAITS, and PriSTI, I2TDM achieves a reduction of at least 2.92% in the Mean Absolute Error (MAE) metric and at least 3.49% in the Root Mean Square Error (RMSE) metric, which demonstrates the effectiveness of I2TDM in improving the missing value imputation accuracy of time series.

    Multi-task social item recommendation method based on dynamic adaptive generation of item graph
    Yi WANG, Yinglong MA
    2025, 45(8):  2592-2599.  DOI: 10.11772/j.issn.1001-9081.2024071038

    Besides considering the social relationships between users, mining the implicit relationship features between items also plays a crucial role in enhancing the representation learning ability for users and items. The static item graph construction process in current social item recommendation systems can hardly capture the latent relationships between items accurately, and the subsequent graph fusion process lacks deep interaction, thereby limiting the related models' ability to understand complex and multi-level relationships among multi-graph features. To this end, a Multi-Task social item recommendation method based on Dynamic Adaptive Generation of the item graph (MTDAG) was proposed. Firstly, in the joint training process based on Multi-Task Learning (MTL), the item graph dynamic generation module was used to adjust the item graph structure adaptively by combining feedback information from downstream recommendation tasks. Secondly, the social item recommendation module was used to propagate the feature representations of users and items among the input graphs iteratively through a deep multi-graph feature fusion method. Finally, extensive experiments comparing MTDAG with six baseline methods such as ECGN (Efficient Complementary Graph convolutional Network) and MGL (Meta Graph Learning framework) were carried out on two public datasets, Yelp and Ciao. Experimental results show that MTDAG improves Hit Rate (HR), Recall, and Normalized Discounted Cumulative Gain (NDCG) by at least 3%, and the robustness of MTDAG is fully verified in evaluation experiments for cold user and cold item recommendation, indicating that MTDAG can solve the recommendation problem of cold users and cold items with sparse interactions to some extent.

    Cyber security
    Secure and efficient frequency estimation method based on shuffled differential privacy
    Yan YAN, Feifei LI, Yaqin LYU, Tao FENG
    2025, 45(8):  2600-2611.  DOI: 10.11772/j.issn.1001-9081.2024070911

    Shuffled Differential Privacy (SDP) models can balance the degree of privacy protection at the user side and the usability of published results at the server side, so they are more suitable for privacy-preserving big data collection and statistical publishing scenarios. Aiming at the problems of low shuffling efficiency and insufficient security of the shuffling process in the existing SDP frequency estimation methods, the following work was performed. Firstly, an SDP Blind Signature Algorithm (SDPBSA) was designed on the basis of an optimized elliptic curve to discriminate tampered or forged information, thereby improving the security of the shuffling process. Then, a Matrix Column Rearrangement Transposition (MCRT) shuffling method was proposed to realize data shuffling through random matrix column rearrangement and matrix transposition operations, thereby improving the efficiency of the shuffling process. Finally, the above methods were combined to construct a complete SDP frequency estimation privacy protection framework, SM-SDP (SDP based on blind Signature and Matrix column rearrangement transposition), and its privacy and error levels were analyzed theoretically. Experimental results on datasets such as Normal, Zipf, and IPUMS (Integrated Public Use Microdata Series) demonstrate that the MCRT shuffling method improves the shuffling efficiency by about 1 to 2 orders of magnitude compared to shuffling methods such as Fisher-Yates, ORShuffle (Oblivious Recursive Shuffling), and MRS (Message Random Shuffling), and that the SM-SDP framework reduces the Mean Squared Error (MSE) by 2 to 11 orders of magnitude in the presence of different proportions of malicious data compared to frequency estimation methods such as mixDUMP, PSDP (Personalized Differential Privacy in Shuffle model), and HP-SDP (Histogram Publication with SDP).
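    A single-round illustration of shuffling by matrix column rearrangement and transposition is sketched below: reports are packed row-major into a matrix, the columns are permuted at random, the matrix is transposed, and the entries are read out row-major again; the padding handling and the number of rounds required by the actual SM-SDP framework are assumptions.

        # Illustrative single round of matrix-column-rearrangement-and-transposition shuffling.
        import numpy as np

        def mcrt_shuffle_round(messages, n_rows, rng):
            n_cols = -(-len(messages) // n_rows)          # ceil division
            padded = messages + [None] * (n_rows * n_cols - len(messages))
            mat = np.array(padded, dtype=object).reshape(n_rows, n_cols)
            mat = mat[:, rng.permutation(n_cols)]         # random column rearrangement
            mat = mat.T                                    # transposition
            return [m for m in mat.reshape(-1) if m is not None]

        rng = np.random.default_rng(7)
        reports = [f"report_{i}" for i in range(10)]
        print(mcrt_shuffle_round(reports, n_rows=3, rng=rng))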

    Dynamic searchable encryption scheme based on puncturable pseudorandom function
    Yundong LIU, Xueming WANG
    2025, 45(8):  2612-2621.  DOI: 10.11772/j.issn.1001-9081.2024071011

    Dynamic searchable encryption has attracted wide attention due to its ability to add, delete, and search data on cloud servers. The existing dynamic searchable encryption schemes are usually constructed with highly secure cryptographic primitives, and multiple bilinear pairing operations need to be performed during searching. In view of the large computational overhead of dynamic searchable encryption schemes when searching on servers, a Puncturable PseudoRandom Function (PPRF) was introduced into dynamic searchable encryption, and a dynamic searchable encryption scheme based on PPRF was designed and proposed. In this scheme, file identifiers did not need to be encrypted with symmetric encryption algorithms, ciphertexts did not need to be decrypted to obtain file identifiers during server-side searches, and the client and the server were able to complete a data search with only one interaction. At the same time, in the scheme, the key was marked when deleting keywords, the marked key was used to calculate the PPRF during searching, and backward security was implemented on top of a forward-secure scheme, thereby ensuring security while improving search efficiency. The security of the scheme was verified according to the security model of dynamic searchable encryption schemes. Simulation results show that compared with the ROSE (RObust Searchable Encryption) scheme built on Key-Updatable Pseudorandom Function (KUPRF), the Janus++ scheme built on Symmetric Puncturable Encryption (SPE), and the Aura scheme built on Symmetric Revocable Encryption (SRE), the proposed scheme reduces the average search time of each keyword by 17%, 65%, and 58%, respectively. It can be seen that the proposed scheme is effective and feasible: it reduces the search cost of the server effectively, improves the search efficiency, and increases the practicality of the scheme.

    Anonymous and traceable authentication key agreement protocol in intelligent vehicle networking systems
    Xiaojun ZHANG, Zhouyang WANG, Lei LI, Haoyu TANG, Jingting XUE, Xinpeng ZHANG
    2025, 45(8):  2622-2629.  DOI: 10.11772/j.issn.1001-9081.2024081137

    Intelligent vehicle networking systems are core components of intelligent modern urban transportation systems, and are crucial for traffic information sharing and security management. Privacy-preserving authentication is the main approach to maintaining the security of intelligent vehicle networking systems, in which protecting identity privacy and tracing malicious nodes are particularly important. Most existing protocols protect users' privacy through anonymous identities but do not trace anonymous identities, so malicious users may evade traffic accident accountability by forging or tampering with anonymous identity information. To address these security threats, an efficient anonymous and traceable authentication key agreement protocol based on elliptic curves for intelligent vehicle networking systems was proposed. In particular, when the roadside base station unit receives an authentication request, it performs security verification on the intelligent vehicle's signature and anonymous identity, and finally achieves bidirectional authentication with key agreement. The vehicle maintains anonymous authentication authority until it is revoked by the intelligent vehicle networking system. The protocol was designed on the basis of an elliptic curve identity-based cryptosystem, thereby avoiding expensive bilinear pairing operations. Experimental results show that compared with the Public Key Infrastructure (PKI) authentication protocol, the protocol based on pseudo-identity and Hash Message Authentication Code (HMAC), the protocol based on Physical Unclonable Function (PUF), the distributed intelligent vehicle networking system protocol, and the protocol based on bilinear pairing, the proposed protocol has the lowest communication cost, while its computational cost is roughly equivalent to that of the distributed intelligent vehicle networking system protocol, which has the lowest computational cost among the compared protocols. Security analysis and performance evaluation show that the proposed protocol protects users' privacy in intelligent vehicle networking systems, has efficient computational performance in the anonymous authentication process, and thus can be deployed in intelligent transportation systems effectively.

    P-Dledger: blockchain edge node security architecture
    Di WANG
    2025, 45(8):  2630-2636.  DOI: 10.11772/j.issn.1001-9081.2024111579
    Abstract ( )   HTML ( )   PDF (3808KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    In response to the open deployment environments, weak security measures, vulnerability to attacks, and insufficient computing and network resources of blockchain edge nodes, a blockchain security architecture based on a Trusted Execution Environment (TEE), named P-Dledger, was proposed. In this architecture, a two-stage trust chain was constructed to ensure the trustworthiness of loaded components while still allowing convenient software iteration; a trustworthy execution framework for smart contracts and a trustworthy data store based on Serial Peripheral Interface NOR Flash (SPI NOR Flash) were constructed to guarantee the trustworthy computation of smart contracts and the secure storage of data. Additionally, a monotonically increasing unique identifier was assigned to each consensus proposal to restrict the behavior of Byzantine nodes. Experimental and analytical results demonstrate that this architecture ensures the security and trustworthiness of loaded entities, ledger data, and execution processes. When the network latency exceeds 60 ms or the number of nodes is greater than 8, P-Dledger achieves higher throughput than blockchain systems employing the Practical Byzantine Fault Tolerance (PBFT) algorithm, and its performance remains more stable as network latency and the number of nodes increase.
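
    The idea of restricting Byzantine behavior with monotonically increasing proposal identifiers can be illustrated by the minimal Python sketch below. The class and field names are hypothetical; in a P-Dledger-style design the counter would be maintained inside the TEE so that a compromised node cannot reuse or roll back identifiers.

        class ProposalLog:
            """Accept consensus proposals only if their identifier increases strictly monotonically."""
            def __init__(self):
                self.last_seen = {}                    # proposer -> highest identifier accepted

            def accept(self, proposer: str, proposal_id: int, payload: bytes) -> bool:
                if proposal_id <= self.last_seen.get(proposer, -1):
                    return False                       # replayed or reordered proposal: reject
                self.last_seen[proposer] = proposal_id
                # ... hand the payload to the ordinary consensus / execution pipeline ...
                return True

        log = ProposalLog()
        assert log.accept("node-1", 0, b"block-0")
        assert log.accept("node-1", 1, b"block-1")
        assert not log.accept("node-1", 1, b"replayed block-1")   # replay attempt rejected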

    Advanced computing
    Design and implementation of FPGA hardware structure optimization based on R22FFT algorithm
    Hailin XIAO, Yudong YANG, Ziyi YANG, Hailong LIU, Yu WANG, Zhongshan ZHANG, Xiaoming DAI
    2025, 45(8):  2637-2645.  DOI: 10.11772/j.issn.1001-9081.2024071010
    Abstract ( )   HTML ( )   PDF (3291KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    A design and implementation method for Field Programmable Gate Array (FPGA) hardware structure optimization based on the Radix-2^2 Fast Fourier Transform (R22FFT) algorithm was proposed to address the problem that the Fast Fourier Transform (FFT) algorithm requires a lot of resources and time to process large-scale data, which leads to a low operation speed. Firstly, using the R22FFT algorithm, a Y-shaped dual parallel array structure combining a sequence conversion function with a pipeline structure was constructed, which reduced the number of hardware multipliers used and increased the throughput of the hardware structure, thereby improving the operation speed of the FFT algorithm on FPGA. Secondly, the correlation characteristics of the twiddle factors were exploited in the single-stage operations of the R22FFT pipeline to optimize on-chip storage resource consumption, reducing storage space by about 50.00%. Finally, the scalability of the hardware structure was further improved by realizing expansion operations such as 2N-point and 4N-point transforms on the basis of the optimized N-point R22FFT structure. The hardware design and simulation were implemented with Verilog HDL and Modelsim, respectively; the proposed design was then synthesized and placed-and-routed with Vivado 2018.3, and its performance was analyzed. Experimental results show that compared with four improved FFT hardware implementation methods, the proposed method reduces the operation time by 75.10%, 95.34%, 38.49%, and 49.20%, respectively, showing a significant improvement in operation speed. At the same time, the proposed method has reasonable and proportionally low resource consumption, low operation power consumption, and strong scalability.
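
    A software golden model is commonly used to check such a pipeline against known-good results. The following Python sketch (illustrative only, not the paper's Verilog implementation) is a radix-2 decimation-in-frequency FFT whose outputs match numpy.fft; a radix-2^2 pipeline computes the same values while turning many stage twiddle multiplications into trivial multiplications by ±1 or ±j, which is where the multiplier savings come from.

        import cmath
        import numpy as np

        def fft_dif(x):
            """Iterative radix-2 DIF FFT; a DIF pipeline naturally produces bit-reversed output."""
            a = list(map(complex, x))
            n = len(a)
            span = n // 2
            while span >= 1:
                for start in range(0, n, 2 * span):
                    for k in range(span):
                        w = cmath.exp(-2j * cmath.pi * k / (2 * span))   # stage twiddle factor
                        u, v = a[start + k], a[start + k + span]
                        a[start + k] = u + v
                        a[start + k + span] = (u - v) * w
                span //= 2
            bits = n.bit_length() - 1
            # undo the bit-reversed ordering to return the spectrum in natural order
            return [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]

        x = np.random.rand(1024) + 1j * np.random.rand(1024)
        assert np.allclose(fft_dif(x), np.fft.fft(x))    # golden vectors for a hardware testbench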

    Improved multi-layer perceptron and attention model-based power consumption prediction algorithm
    Chao JING, Yutao QUAN, Yan CHEN
    2025, 45(8):  2646-2655.  DOI: 10.11772/j.issn.1001-9081.2024081092
    Abstract ( )   HTML ( )   PDF (3117KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Although heterogeneous computing systems can accelerate the processing of neural network parameters, they also increase system power consumption significantly. Good power consumption prediction methods are fundamental to optimizing power consumption and handling multi-type workloads in heterogeneous systems. Therefore, by improving the multi-layer perceptron and attention model, a power consumption prediction algorithm was proposed for CPU/GPU heterogeneous computing systems with multi-type workloads. Firstly, considering server power consumption and system features, a feature-based workload power consumption model was established. Then, to address the inability of existing power consumption prediction algorithms to capture the long-range dependence between system features and system power consumption, an improved prediction algorithm based on a multi-layer perceptron-attention model, named Prophet, was proposed. In this algorithm, the multi-layer perceptron was modified to extract system features at different moments, and the attention mechanism was employed to synthesize these features, so that the long-range dependence between system features and power consumption was captured effectively. Finally, experiments were conducted on real heterogeneous systems, and the proposed algorithm was compared with power consumption prediction algorithms such as MLSTM_PM (Power consumption Model based on Multi-layer Long Short-Term Memory) and ENN_PM (Power consumption Model based on Elman Neural Network). Experimental results show that Prophet achieves higher prediction accuracy, reducing the Mean Relative Error (MRE) for the workloads blk, memtest, and busspd by 1.22, 1.01, and 0.93 percentage points, respectively, compared to MLSTM_PM, and has low complexity, indicating the algorithm's effectiveness and feasibility.
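
    The general structure, per-time-step feature extraction with a multi-layer perceptron followed by attention across the time window, can be sketched in a few lines of PyTorch. The layer sizes, module names, and single-step regression head below are illustrative assumptions, not Prophet's actual architecture.

        import torch
        from torch import nn

        class MLPAttentionPredictor(nn.Module):
            """Per-time-step MLP feature extraction + attention across the window,
            regressing the power reading at the next step. Dimensions are illustrative."""
            def __init__(self, n_features: int, hidden: int = 64):
                super().__init__()
                self.mlp = nn.Sequential(                       # extracts features at each moment
                    nn.Linear(n_features, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU())
                self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
                self.head = nn.Linear(hidden, 1)

            def forward(self, x):                               # x: (batch, window, n_features)
                h = self.mlp(x)                                 # (batch, window, hidden)
                ctx, _ = self.attn(h, h, h)                     # long-range dependence across time
                return self.head(ctx[:, -1])                    # predicted power at the next step

        model = MLPAttentionPredictor(n_features=12)
        window = torch.randn(8, 32, 12)                         # 8 samples, 32 time steps, 12 system features
        print(model(window).shape)                              # torch.Size([8, 1])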

    Dual-population dual-stage evolutionary algorithm for complex constrained multi-objective optimization problems
    Zhichao YUAN, Lei YANG, Jinglin TIAN, Xiaowei WEI, Kangshun LI
    2025, 45(8):  2656-2665.  DOI: 10.11772/j.issn.1001-9081.2024081130
    Abstract ( )   HTML ( )   PDF (2608KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    For Constrained Multi-Objective Optimization Problems (CMOPs) with complex constraints, effectively balancing the algorithm's convergence and diversity while ensuring strict constraint satisfaction is a significant challenge. Therefore, a Dual-Population Dual-Stage Evolutionary Algorithm (DPDSEA) was proposed. In this algorithm, two independently evolving populations, a main population and a secondary population, were introduced and updated with the feasibility rules and an improved epsilon constraint handling method, respectively; a minimal sketch of these two update rules follows the abstract. In the first stage, the main and secondary populations were employed to explore the Constrained Pareto Front (CPF) and the Unconstrained Pareto Front (UPF), respectively, to obtain positional information about both fronts. In the second stage, a classification method was designed to classify CMOPs according to the relative positions of the UPF and the CPF, so that specific evolutionary strategies were executed for different types of CMOPs. Additionally, a random perturbation strategy was proposed to perturb the secondary population that had evolved near the CPF, generating individuals on the CPF and thereby promoting the convergence and distribution of the main population on the CPF. Finally, experiments were conducted on the LIRCMOP and DASCMOP test suites to compare the proposed algorithm with six representative algorithms: CMOES (Constrained Multi-Objective Optimization based on Even Search), dp-ACS (dual-population evolutionary algorithm based on Adaptive Constraint Strength), c-DPEA (Dual-population based Evolutionary Algorithm for constrained multi-objective optimization), CAEAD (Constrained Evolutionary Algorithm based on Alternative Evolution and Degeneration), BiCo (evolutionary algorithm with Bidirectional Coevolution), and DDCMOEA (Dual-stage Dual-Population Evolutionary Algorithm for Constrained Multiobjective Optimization). The results show that DPDSEA achieves the best Inverted Generational Distance (IGD) values on 15 of the 23 problems and the best Hypervolume (HV) values on 12 of them, demonstrating significant performance advantages of DPDSEA in handling complex CMOPs.
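
    The two update rules assigned to the two populations can be illustrated with the minimal Python sketch below. The numbers are made up, the tie-breaking is arbitrary, and the paper's improved epsilon method additionally adapts the epsilon threshold over the run.

        import numpy as np

        def dominates(f_a, f_b):
            f_a, f_b = np.asarray(f_a), np.asarray(f_b)
            return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

        def feasibility_rule(cv_a, f_a, cv_b, f_b):
            """Deb-style rule for the main population: feasible beats infeasible, smaller
            constraint violation beats larger, otherwise compare by Pareto dominance."""
            if (cv_a == 0) != (cv_b == 0):
                return 0 if cv_a == 0 else 1
            if cv_a != cv_b:
                return 0 if cv_a < cv_b else 1
            return 0 if dominates(f_a, f_b) else 1          # ties broken arbitrarily

        def epsilon_rule(cv_a, f_a, cv_b, f_b, eps):
            """Relaxed rule for the secondary population: violations below eps count as feasible,
            so the population can cross infeasible barriers while approaching the UPF."""
            if cv_a <= eps and cv_b <= eps:
                return 0 if dominates(f_a, f_b) else 1
            return feasibility_rule(cv_a, f_a, cv_b, f_b)

        # a slightly infeasible but much better solution wins only under the epsilon rule
        print(feasibility_rule(0.0, [2.0, 2.0], 0.05, [0.5, 0.5]))            # 0: feasible one wins
        print(epsilon_rule(0.0, [2.0, 2.0], 0.05, [0.5, 0.5], eps=0.1))       # 1: better objectives win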

    Lagrangian particle flow simulation by equivariant graph neural network
    Quan JIANG, Wenqing HUANG, Zhiyong GOU
    2025, 45(8):  2666-2671.  DOI: 10.11772/j.issn.1001-9081.2024111601
    Abstract ( )   HTML ( )   PDF (2667KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Graph Neural Network (GNN) is increasingly applied to the prediction of complex fluid systems due to its superior capability in handling structured grids and its strong combinatorial generalization. However, from a Lagrangian mesh-free perspective, the outputs of a GNN vary unpredictably when the input fluid particle information undergoes translation, rotation, or reflection transformations. To address this problem, an Equivariant Graph Neural Network-based Simulator (EGNS) method was proposed. Geometric vectors were first converted into relative equivariant quantities. Then, equivariant message passing was employed at each step to ensure the equivariance of the entire network, keeping the spatial transformations of the outputs consistent with those of the inputs. Finally, the optimized EGNS model was obtained by training on particle trajectories simulated with the Smoothed Particle Hydrodynamics (SPH) method. Experimental results on public fluid simulation datasets demonstrate that EGNS has superior predictive performance compared with the Graph Neural Network-based Simulator (GNS); specifically, EGNS achieves better accuracy in predicting fluid particle movement, velocity, and typical details, decreasing the Mean Squared Error (MSE) of predicted particle positions by about 16%.
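
    A single E(n)-equivariant message-passing step can be sketched as below: messages are built only from invariants (features and squared distances), and positions are updated along relative vectors, so rotating or translating the input particles transforms the output identically. This is an illustrative PyTorch sketch in the general style of equivariant GNNs, not the exact EGNS architecture.

        import torch
        from torch import nn

        class EquivariantLayer(nn.Module):
            """One E(n)-equivariant message-passing step over a fully connected particle graph."""
            def __init__(self, dim: int = 16):
                super().__init__()
                self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(), nn.Linear(dim, dim))
                self.coord = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))
                self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))

            def forward(self, h, x):                        # h: (n, dim) features, x: (n, 3) positions
                diff = x[:, None, :] - x[None, :, :]        # relative vectors (equivariant)
                dist2 = (diff ** 2).sum(-1, keepdim=True)   # squared distances (invariant)
                hij = torch.cat([h[:, None].expand(-1, h.size(0), -1),
                                 h[None].expand(h.size(0), -1, -1), dist2], dim=-1)
                m = self.msg(hij)                           # pairwise messages from invariants only
                x_new = x + (diff * self.coord(m)).mean(1)  # move particles along relative vectors
                h_new = self.upd(torch.cat([h, m.sum(1)], dim=-1))
                return h_new, x_new

        layer = EquivariantLayer()
        h, x = torch.randn(5, 16), torch.randn(5, 3)
        R, t = torch.linalg.qr(torch.randn(3, 3))[0], torch.randn(3)   # random orthogonal map + shift
        _, x1 = layer(h, x)
        _, x2 = layer(h, x @ R.T + t)
        print(torch.allclose(x2, x1 @ R.T + t, atol=1e-4))             # True: positions are equivariant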

    Network and communications
    Modulation recognition network for complex electromagnetic environments
    Jin ZHOU, Yuzhi LI, Xu ZHANG, Shuo GAO, Li ZHANG, Jiachuan SHENG
    2025, 45(8):  2672-2682.  DOI: 10.11772/j.issn.1001-9081.2025010117
    Abstract ( )   HTML ( )   PDF (5665KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Automated Modulation Recognition (AMR) plays a critical role in wireless communications. A Denoising & Dual-modal Attention CNN-Transformer (D-DmACT) network was proposed to address the poor transferability of AMR networks in complex electromagnetic environments and their insufficient ability to distinguish noise from modulation signal features. Firstly, a generator that iteratively generates complex electromagnetic interference and a discriminator that counters the interference were introduced to enhance the generalization ability of the network in complex electromagnetic environments. Secondly, a complex attention-based Transformer module was designed to capture the time-domain features of In-phase and Quadrature (IQ) signals, a coordinate attention module based on time-frequency information was proposed to acquire features of time-frequency images, and the two kinds of features were cross-fused. Thirdly, the temporal complex-valued signals and the time-frequency images obtained by the generator were fed into the dual-modal attention fusion model. Finally, lightweight classification and recognition were implemented. Experiments were conducted on the RadioML2016.10A and RadioML2018.01a datasets under white Gaussian noise and a complex electromagnetic environment, respectively. Experimental results with impulsive noise show that compared with CLDNN (Convolutional Long short-term Deep Neural Network), Residual Network (ResNet), and LSTM (Long Short-Term Memory) network, the proposed network improves the average recognition accuracy by 53.98%, 28.82%, and 24.64%, respectively; compared with the Multi-Modal approach toward AMR (MM-Net), the Threshold Autoencoder Denoiser Convolutional Neural Network (TADCNN), and the Generative Adversarial Network & Multi-modal Attention mechanism CNN-LSTM (GAN-MnACL), the proposed network enhances the average recognition accuracy by 19.74%, 13.55%, and 11.17%, respectively. In terms of computational complexity, the deployability of the proposed network is validated through metrics such as the number of parameters and FLoating point OPerations (FLOPs).
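
    The dual-modal fusion idea, letting IQ-sequence tokens attend to time-frequency-image tokens before a lightweight classifier, can be sketched in PyTorch as follows. Module names, encoder choices, and sizes are illustrative assumptions, not the paper's exact attention modules.

        import torch
        from torch import nn

        class DualModalFusion(nn.Module):
            """Cross-attend IQ-sequence features and time-frequency-image features,
            then pool the fused tokens for a lightweight classifier."""
            def __init__(self, dim: int = 64, n_classes: int = 11):
                super().__init__()
                self.iq_encoder = nn.Conv1d(2, dim, kernel_size=7, padding=3)      # raw I/Q -> tokens
                self.tf_encoder = nn.Conv2d(1, dim, kernel_size=7, stride=4)       # spectrogram -> tokens
                self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                self.cls = nn.Linear(dim, n_classes)

            def forward(self, iq, tf):                     # iq: (B, 2, T), tf: (B, 1, H, W)
                q = self.iq_encoder(iq).transpose(1, 2)    # (B, T, dim) queries from the time domain
                kv = self.tf_encoder(tf).flatten(2).transpose(1, 2)   # (B, HW', dim) keys/values
                fused, _ = self.cross(q, kv, kv)           # time tokens attend to time-frequency tokens
                return self.cls(fused.mean(dim=1))         # average-pool, then classify the modulation

        model = DualModalFusion()
        logits = model(torch.randn(4, 2, 128), torch.randn(4, 1, 64, 64))
        print(logits.shape)                                # torch.Size([4, 11])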

    Computer software technology
    DTOps: integrated development and operations method for digital twin systems
    Ronghua MIAO, Yicheng SUN, Sen WANG, Yanting WU, Ming DU, Jinsong BAO
    2025, 45(8):  2683-2693.  DOI: 10.11772/j.issn.1001-9081.2024071051
    Abstract ( )   HTML ( )   PDF (6775KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To reduce the iteration and maintenance time as well as the evolution cost of Digital Twin (DT) systems, the potential of integrating the Development and Operations (DevOps) methodology into DT systems was explored, and an innovative DevOps practice for DT systems, DT Operations (DTOps), was proposed. A service-oriented system architecture was designed for the specific needs and characteristics of DT systems, thereby enhancing system scalability and agility, and the infrastructure of DTOps as well as implementation methods for Continuous Integration (CI) and Continuous Delivery (CD) were provided. In a case study of a gear production line, the various stages of DTOps were validated using open-source tools, demonstrating the feasibility and convenience of DTOps. Experimental results show that DTOps improves evolution efficiency by 29.7% and 26.9% compared to monolithic and microservice architectures, respectively. This effect is particularly significant in highly integrated and data-intensive environments, confirming the effectiveness of DTOps in engineering applications.

    Multimedia computing and computer simulation
    Layered solving method for virtual maintenance posture in narrow aircraft space
    Zhexu LIU, Aobing ZHANG, Zhiyong FAN
    2025, 45(8):  2694-2703.  DOI: 10.11772/j.issn.1001-9081.2024071068
    Abstract ( )   HTML ( )   PDF (4570KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Virtual maintainability simulation is an important tool for aircraft structure and system design, in which the rapid generation of appropriate virtual human maintenance postures is crucial to the efficiency and feasibility of maintainability analysis. Aiming at the low efficiency and limited applicability of current methods for solving virtual human maintenance postures, a layered solving method for virtual maintenance postures in the narrow space of aircraft was proposed. In this method, with the waist as the demarcation point, the maintenance posture was decomposed into an upper-body part and a lower-body part, and the Elitist Nondominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ) was used to optimize and solve each part to obtain the overall maintenance posture. Firstly, a maintenance posture optimization criterion was constructed by comprehensively considering space limitations and human body structure constraints. Secondly, multi-objective optimization models of the upper-body and lower-body postures were built on the basis of this criterion by analyzing geometric and inverse kinematics principles, and NSGA-Ⅱ was applied to solve them sequentially. Through analysis of the cases of disassembling the aircraft cockpit Air Data Module (ADM) and lifting cargo in the cargo hold, it is verified that the proposed method can generate virtual maintenance postures in the narrow space of aircraft effectively, and has good applicability and feasibility.
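
    One way to pose such a sub-problem and hand it to NSGA-Ⅱ is shown below, assuming the pymoo package (version 0.6 or later) is available. The decision variables, joint limits, objectives, and clearance constraint are placeholders, not the paper's actual posture optimization criterion.

        import numpy as np
        from pymoo.core.problem import ElementwiseProblem
        from pymoo.algorithms.moo.nsga2 import NSGA2
        from pymoo.optimize import minimize

        class UpperBodyPosture(ElementwiseProblem):
            """Toy stand-in for the upper-body sub-problem: decision variables are joint angles (rad),
            objectives trade joint deviation from a neutral posture against a reach error,
            with a clearance constraint standing in for the narrow-space limit."""
            def __init__(self):
                super().__init__(n_var=4, n_obj=2, n_ieq_constr=1,
                                 xl=np.array([-1.0, -0.5, -2.0, -2.0]),
                                 xu=np.array([1.0, 1.5, 2.0, 2.0]))

            def _evaluate(self, theta, out, *args, **kwargs):
                comfort = np.sum(theta ** 2)                         # deviation from neutral posture
                hand = np.array([np.sum(np.cos(np.cumsum(theta))),   # crude planar forward kinematics
                                 np.sum(np.sin(np.cumsum(theta)))])
                reach_error = np.linalg.norm(hand - np.array([2.0, 1.0]))
                clearance = 0.3 - hand[1]                            # hand must stay above y = 0.3
                out["F"] = [comfort, reach_error]
                out["G"] = [clearance]                               # <= 0 means the constraint is met

        res = minimize(UpperBodyPosture(), NSGA2(pop_size=60), ("n_gen", 100), seed=1, verbose=False)
        print(res.F[:3])                                             # a few points on the posture Pareto front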

    Research and implementation of large-scale unmanned aerial vehicle swarm simulation engine based on container
    Hengxian TANG, Yuan YAO, Haoxiang KANG
    2025, 45(8):  2704-2711.  DOI: 10.11772/j.issn.1001-9081.2024081090
    Abstract ( )   HTML ( )   PDF (3780KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    The simulation engine is critical to the operation of a simulation platform. Aiming at the low parallelism, insufficient computing resources, and poor scalability of existing Unmanned Aerial Vehicle (UAV) simulation platforms, a UAV Swarm Containerized Parallel Simulation Engine (USCPSE) with a distributed framework and a container mechanism was designed and implemented. In the proposed engine, containers were used as the running carriers of UAV virtual entities and were deployed to multiple parallel simulation nodes to realize large-scale UAV swarm simulation. Besides, based on container live-migration technology, a container scheduling strategy integrating communication and computing load was proposed, which was able to migrate containers dynamically according to the communication relationships between swarms and the computational load changes of simulation nodes, thereby improving the overall performance of large-scale UAV swarm simulation. Experimental results show that for clusters with 100, 150, and 200 nodes, compared with a Message Passing Interface (MPI)-based parallel simulation architecture, USCPSE increases the speed-up ratio by 22.4%, 59.8%, and 101.9%, respectively, and decreases communication traffic by 51.8% on average.
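
    The flavor of a scheduling strategy that weighs communication affinity against computing load can be conveyed by the greedy Python sketch below. The cost model, weights, and data are made up for illustration and omit live-migration overhead; they are not the engine's actual strategy.

        def placement_cost(placement, traffic, demand, nodes, alpha=0.5):
            """Cost of a UAV-container placement: cross-node traffic plus compute imbalance."""
            comm = sum(v for (a, b), v in traffic.items() if placement[a] != placement[b])
            loads = {n: 0.0 for n in nodes}
            for c, n in placement.items():
                loads[n] += demand[c]
            imbalance = max(loads.values()) - min(loads.values())
            return alpha * comm + (1 - alpha) * imbalance

        def best_migration(placement, traffic, demand, nodes):
            """Greedy step: try every single-container move and keep the best cost improvement,
            which would then be realized by container live migration."""
            base, best = placement_cost(placement, traffic, demand, nodes), None
            for c, cur in placement.items():
                for n in nodes:
                    if n != cur:
                        trial = dict(placement, **{c: n})
                        cost = placement_cost(trial, traffic, demand, nodes)
                        if cost < base:
                            base, best = cost, (c, cur, n)
            return best

        placement = {"uav-1": "node-A", "uav-2": "node-B", "uav-3": "node-B"}
        traffic = {("uav-1", "uav-2"): 12.0, ("uav-2", "uav-3"): 0.5, ("uav-1", "uav-3"): 0.2}
        demand = {"uav-1": 0.5, "uav-2": 0.5, "uav-3": 0.5}
        print(best_migration(placement, traffic, demand, ["node-A", "node-B"]))
        # ('uav-2', 'node-B', 'node-A'): co-locates the heavily communicating pair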

    Thoracic disease classification method based on cross-scale attention network
    Jinhao LIN, Chuan LUO, Tianrui LI, Hongmei CHEN
    2025, 45(8):  2712-2719.  DOI: 10.11772/j.issn.1001-9081.2024071019
    Abstract ( )   HTML ( )   PDF (1831KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Automatic identification of thoracic diseases from chest X-rays is a significant area of research in computer-aided diagnosis. However, many existing thoracic disease classification methods struggle to handle differences in lesion area sizes and often fail to identify and localize the lesion areas of different diseases accurately. To address these problems, a thoracic disease classification method based on a Cross-scale Attention Network (CANet) was proposed. In this method, DenseNet-121 was employed as the feature extraction network, and three main modules were integrated: the Self Aware Attention (SAA), Upward Focus Attention (UFA), and Downward Guidance Attention (DGA) modules. In the SAA module, spatial pathological features were refined and interference from irrelevant areas was reduced by extracting channel and abnormal-area information relevant to thoracic diseases. To achieve cross-scale interaction of spatial context information at different scales, image features were calibrated with the UFA and DGA modules. Additionally, a Spatial Attention Pyramid Pooling (SAPP) module was proposed to fuse multi-scale features from different feature maps, thereby enhancing detection performance for thoracic diseases. Experimental results on the ChestX-ray14 and DR-Pneumonia datasets show that the proposed method achieves average Area Under Curve (AUC) values of 83.4% and 82.6%, respectively, outperforming the DualCheXNet, A3Net, and CheXGAT methods. Specifically, compared with the CheXGAT method, the proposed method improves the average AUC values by 0.7 and 0.1 percentage points, respectively. It can be seen that the proposed method identifies critical information in chest X-rays effectively, improving thoracic disease classification performance significantly.
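
    A multi-scale pooling module with spatial attention, in the spirit of SAPP, might be sketched in PyTorch as follows. The channel counts, pooling scales, and fusion step are illustrative assumptions, not the paper's exact module.

        import torch
        from torch import nn
        import torch.nn.functional as F

        class SpatialAttentionPyramidPooling(nn.Module):
            """Pool the feature map at several grid sizes, re-weight each pooled map with a
            lightweight spatial attention, and fuse the results into one descriptor."""
            def __init__(self, channels: int, scales=(1, 2, 4)):
                super().__init__()
                self.scales = scales
                self.attn = nn.ModuleList(nn.Conv2d(channels, 1, kernel_size=1) for _ in scales)
                self.fuse = nn.Linear(channels * len(scales), channels)

            def forward(self, x):                                   # x: (B, C, H, W)
                pooled = []
                for s, conv in zip(self.scales, self.attn):
                    p = F.adaptive_avg_pool2d(x, s)                 # (B, C, s, s)
                    w = torch.sigmoid(conv(p))                      # spatial attention over the s x s grid
                    pooled.append((p * w).flatten(2).mean(-1))      # attention-weighted pooling -> (B, C)
                return self.fuse(torch.cat(pooled, dim=1))          # fused multi-scale descriptor

        feat = torch.randn(2, 256, 14, 14)                          # e.g. a mid-level DenseNet-121 feature map
        print(SpatialAttentionPyramidPooling(256)(feat).shape)      # torch.Size([2, 256])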

    Few-shot skin image classification model based on spatial transformer network and feature distribution calibration
    Jing WANG, Jiaxing LIU, Wanying SONG, Jiaxing XUE, Wenxin DING
    2025, 45(8):  2720-2726.  DOI: 10.11772/j.issn.1001-9081.2024071039
    Abstract ( )   HTML ( )   PDF (1746KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Deep learning-based image classification methods typically require large amounts of labeled data. However, in the classification of skin lesions in the medical field, collecting large amounts of image data faces numerous challenges. To classify few-shot skin diseases accurately, a few-shot classification model based on a Spatial Transformer Network (STN) and feature distribution calibration was proposed. Firstly, transfer learning and meta-learning were integrated to address the overfitting issue in cross-domain few-shot transfer. Secondly, a rotation angle prediction task was inserted before the pre-training classification task to better adapt the model to the high complexity of medical image data. Thirdly, after downsampling the images, an STN was introduced to perform explicit affine transformations on the input images, thereby enhancing feature extraction and recognition capabilities. Finally, feature distribution calibration was used to constrain the features of new classes, and the nearest centroid algorithm was introduced for classification decisions, thereby reducing algorithm complexity while improving classification accuracy significantly. Experimental results on the ISIC2018 skin lesion dataset show that compared with the current mainstream few-shot model Meta-Baseline, the proposed model improves accuracy by 11.80 and 10.82 percentage points in the 2-way and 3-way classification tasks, respectively; compared with the MetaMed model, the proposed model improves average accuracy by 6.65 and 9.58 percentage points in the 2-way 3-shot and 3-way 3-shot classification tasks, respectively. It can be seen that the proposed model improves the classification accuracy of few-shot skin diseases effectively and can better assist doctors in enhancing clinical diagnosis accuracy.
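
    Feature distribution calibration followed by nearest-centroid classification can be sketched with NumPy as below. The Tukey exponent, the number of borrowed base classes, and the calibration formula are illustrative; full distribution-calibration methods typically also transfer covariances and sample virtual features, and the paper's exact procedure may differ.

        import numpy as np

        def calibrate_and_classify(support, support_y, query, base_means, k=2, lam=0.5):
            """Calibrate novel-class means with statistics of nearby base classes,
            then assign each query to the nearest calibrated centroid."""
            def tukey(z):                                        # power transform to reduce feature skew
                return np.power(np.maximum(z, 1e-6), lam)
            support, query = tukey(support), tukey(query)

            classes = np.unique(support_y)
            centroids = []
            for c in classes:
                mu = support[support_y == c].mean(axis=0)
                # borrow the means of the k base classes closest to this novel class
                idx = np.argsort(np.linalg.norm(base_means - mu, axis=1))[:k]
                centroids.append((mu + base_means[idx].sum(axis=0)) / (k + 1))
            centroids = np.stack(centroids)

            d = np.linalg.norm(query[:, None, :] - centroids[None, :, :], axis=-1)
            return classes[np.argmin(d, axis=1)]                 # nearest calibrated centroid

        rng = np.random.default_rng(0)
        base_means = rng.random((10, 64))                        # statistics from abundant base classes
        support = rng.random((6, 64)); support_y = np.array([0, 0, 0, 1, 1, 1])
        query = rng.random((4, 64))
        print(calibrate_and_classify(support, support_y, query, base_means))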

2025 Vol.45 No.8

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address: No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803, 028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn