
Table of Contents

    10 November 2025, Volume 45, Issue 11
    The 7th CCF China Conference on Blockchain Technology
    Review of operational mechanisms for blockchain-based decentralized science systems
    Yuqi PENG, Jiaolong CHEN, Jiaqi YAN
    2025, 45(11):  3407-3415.  DOI: 10.11772/j.issn.1001-9081.2024121847

    With the rapid development and widespread application of emerging technologies such as blockchain, the Decentralized Science (DeSci) movement has emerged, aiming to improve the traditional centralized scientific system and build an open and inclusive autonomous scientific ecosystem. To address challenges in the centralized scientific system, including issues in information flow, resource allocation, and evaluation mechanisms, the operational mechanisms of blockchain-based DeSci systems were reviewed. Firstly, the concept of DeSci and its current applications were introduced. Secondly, through a literature review of DeSci, the operational mechanisms of scientific systems were classified into four aspects — asset management, token economics, organizational governance, and trust construction — from both economic and social perspectives. The current state of the scientific system and the advantages of blockchain technology in addressing existing limitations were summarized according to these categories. Next, the operational mechanisms of decentralized scientific systems based on blockchain and related technologies were analyzed in detail. Finally, the future decentralized scientific ecosystem was explored from the perspective of the entire scientific research process, and the challenges and opportunities faced by decentralized scientific system practices were discussed. The review suggests that blockchain-based DeSci systems have the potential to enhance the existing scientific framework, and their operational mechanisms can provide both a theoretical foundation and practical directions for developing an autonomous scientific ecosystem.

    Survey of DDoS protection research based on blockchain
    Mei TANG, Wunan WAN, Shibin ZHANG, Jinquan ZHANG
    2025, 45(11):  3416-3423.  DOI: 10.11772/j.issn.1001-9081.2024121850

    With the escalating severity of cybersecurity threats, Distributed Denial of Service (DDoS) attacks remain a persistent challenge in network security research. Traditional DDoS protection solutions usually rely on centralized architectures, which suffer from single points of failure, data tampering, and other problems, and struggle to cope with complex and diverse attack scenarios. Blockchain technology provides a new solution for DDoS protection with its characteristics of decentralization, immutability, and transparency. In view of the technical challenges in DDoS protection, the progress of blockchain-based DDoS protection was summarized. Firstly, the basic concepts of DDoS attacks and their threats to environments such as traditional networks, Internet of Things (IoT), and Software Defined Networking (SDN) were introduced, and the necessity and potential advantages of introducing blockchain technology were analyzed. Secondly, the existing DDoS protection mechanisms were reviewed and compared from the aspects of blockchain combined with smart contracts, deep learning, cross-domain collaboration, and so on. Finally, considering the technical difficulties in blockchain performance optimization, multi-domain collaboration, and real-time response, the future development directions of blockchain-based DDoS protection technology were prospected, providing theoretical references for researchers in the field of cybersecurity and further promoting the practical applications of blockchain in DDoS protection.

    Adaptive online blockchain sharding algorithm based on trusted execution environments
    Fei WANG, Hengdi WANG, Konglin ZHU, Lin ZHANG
    2025, 45(11):  3424-3431.  DOI: 10.11772/j.issn.1001-9081.2024121839

    Aiming at the performance bottleneck caused by multi-round inter-shard communication in cross-shard transaction protocols, an adaptive online blockchain sharding algorithm based on Trusted Execution Environments (TEE) was proposed. The algorithm optimizes the execution process of cross-shard transactions, reducing communication overhead and improving system throughput. Firstly, an adaptive online sharding algorithm was designed, which delayed the assignment of transactions to shards, allowing related transactions to be clustered together, thereby reducing the number of cross-shard transactions and minimizing communication overhead. Secondly, by combining TEE technology, off-chain cross-shard transactions were executed securely and efficiently, eliminating the need for the multi-round inter-shard communication of traditional schemes. Finally, a one-sided feedback optimization algorithm was introduced to dynamically adapt to changes in transaction patterns based on the current system status and transaction demands, optimizing the sharding strategy in real time. Experimental results showed that the proposed algorithm increased throughput by 35% compared with the random sharding algorithm. By reducing unnecessary communication and computational overhead, the proposed algorithm significantly improves overall system performance while ensuring the security of cross-shard transactions. It is suitable for blockchain systems requiring high throughput and low latency, and has considerable application value.

    Multi-domain access control scheme in blockchain based on SM2 homomorphic encryption
    Bimang SUN, Wunan WAN, Shibin ZHANG, Jinquan ZHANG
    2025, 45(11):  3432-3439.  DOI: 10.11772/j.issn.1001-9081.2024121849

    Addressing the issues of attribute privacy leakage and insufficient scalability in existing blockchain multi-domain access control models, a Cross-Chain based Multi-Domain Access Control Model (CC-MDACM) was proposed. Firstly, based on Attribute-Based Access Control (ABAC) and relay chain technology, a cross-blockchain multi-domain access control model was constructed, enabling autonomous authorization within domains and fine-grained access control across heterogeneous blockchains through the relay chain between domains. Secondly, by combining a threshold homomorphic encryption algorithm based on SM2 with zero-knowledge proof technology, a scalable cross-blockchain multi-domain access control scheme with dual concealment of attributes and policies was proposed. This scheme allowed data to be verified and decrypted by distributed nodes on the relay chain and facilitated access control decisions in the ciphertext state. Attributes and policies were protected through dual concealment, and access control policies could be dynamically extended. Additionally, Raft consensus was adopted to ensure the reliability of decryption. Finally, the proposed scheme was analyzed through theoretical security analysis and simulation experiments. The results demonstrate that, while ensuring dual concealment of attributes and policies and supporting dynamic expansion of access policies, the proposed scheme effectively resolves the multi-domain access control problem across heterogeneous blockchains. Compared to the Distributed Two-trapdoor Public-Key Cryptosystem (DT-PKC), the encryption and decryption efficiencies of the proposed scheme were improved by 34.4% and 44.9%, respectively.

    Semi-centralized cross-chain bridge design based on hash time lock and incentive mechanism
    Tingda SHEN, Konglin ZHU, Lin ZHANG
    2025, 45(11):  3440-3445.  DOI: 10.11772/j.issn.1001-9081.2024121848

    Cross-chain bridge technologies in blockchain ecosystems are categorized into centralized and decentralized types. Centralized bridges, due to asset concentration, are vulnerable to attacks with potentially significant losses, while decentralized bridges align with the trustless principle but face challenges such as high resource demands, prolonged implementation cycles, and poor scalability. Limited research has addressed balancing single points of failure against operational efficiency in decentralized designs. To address these challenges, a semi-centralized cross-chain bridge architecture that integrates the centralized and decentralized models was proposed. In this architecture, initial node services were provided by a central server, while decentralized services, including routing and staking, were facilitated by smart contracts on the blockchain. Firstly, participation of blockchain nodes was incentivized through a reward mechanism, and trust was established via remote attestation using code signing and hash value verification. Secondly, the verified nodes were incorporated into a decentralized routing table to participate in cross-chain transaction verification and auditing. Finally, cross-chain transactions were validated using Hash Time Locked Contracts (HTLC). Experimental results on transaction costs and latency demonstrate that the proposed architecture reduces transaction latency to between 24 and 36 seconds with a per-transaction cost of 403 299 gas, comparable to that of centralized cross-chain bridges; a security analysis identifies three typical cross-chain bridge attacks and corresponding solutions. The proposed semi-centralized cross-chain bridge architecture achieves performance comparable to that of centralized cross-chain bridges while maintaining the security of decentralized cross-chain bridges, balancing security, efficiency, and transaction costs in cross-chain operations.
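
    For readers unfamiliar with HTLCs, the hashlock-plus-timelock logic that the validation step relies on can be illustrated with a minimal Python sketch (the in-memory contract object and names are hypothetical, not the authors' contract code):

```python
import hashlib
import time

class HTLC:
    """Toy hash time locked contract: the receiver claims with the preimage
    before the deadline, or the sender refunds after it."""

    def __init__(self, sender, receiver, amount, hashlock, timelock_s):
        self.sender = sender
        self.receiver = receiver
        self.amount = amount
        self.hashlock = hashlock              # sha256 digest of a secret
        self.deadline = time.time() + timelock_s
        self.settled = False

    def claim(self, preimage: bytes) -> bool:
        """Receiver redeems by revealing the preimage in time."""
        if self.settled or time.time() >= self.deadline:
            return False
        if hashlib.sha256(preimage).hexdigest() != self.hashlock:
            return False
        self.settled = True
        return True

    def refund(self) -> bool:
        """Sender reclaims the funds once the timelock expires."""
        if self.settled or time.time() < self.deadline:
            return False
        self.settled = True
        return True

secret = b"cross-chain-secret"
htlc = HTLC("alice", "bob", 10,
            hashlib.sha256(secret).hexdigest(), timelock_s=3600)
assert htlc.claim(secret)       # correct preimage, before the deadline
assert not htlc.refund()        # already settled, so no refund
```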

    Blockchain storage optimization strategy based on consistent hashing
    Minghao LIU, Jianlei HONG, Chengxiang WANG, Jindong ZHAO
    2025, 45(11):  3446-3452.  DOI: 10.11772/j.issn.1001-9081.2024121836

    To solve the storage challenges caused by the rapid growth of blockchain data, and to address the uneven storage load and data skew of nodes in Hyperledger Fabric for enterprise-level applications, an improved consistent hashing algorithm for blockchain storage expansion was proposed, namely the Hash algorithm based on Virtual Node allocation and Dynamic Weight Strategy (VNDWS). Firstly, a virtual node allocation mechanism was adopted to dynamically assign multiple virtual nodes to each physical node, ensuring an even distribution of data across the hash ring and reducing load imbalance. Secondly, a dynamic weight mechanism was applied to adjust weights in real time based on performance indicators such as node storage capacity and network latency, enabling high-performance nodes to handle larger data loads, thereby optimizing data distribution and storage efficiency. Simulation results showed that compared to the traditional blockchain Fabric network and the conventional consistent hashing algorithm, VNDWS reduced node storage consumption by 48.31 and 6.39 percentage points, respectively, while improving data query efficiency by 96.25% and 21.95%. VNDWS effectively reduces node storage consumption and enhances query efficiency in terms of storage expansion.
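
    The ring mechanics that VNDWS builds on, a consistent hash ring with weight-proportional virtual nodes, can be sketched as follows (a minimal illustration assuming SHA-256 ring placement; the real-time weight updates from storage capacity and latency are not shown):

```python
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class WeightedConsistentHash:
    """Consistent hash ring where node weight scales the virtual-node count."""

    def __init__(self, base_replicas: int = 100):
        self.base = base_replicas
        self.ring = []     # sorted hashes of virtual nodes
        self.owner = {}    # virtual-node hash -> physical node

    def add_node(self, node: str, weight: float = 1.0):
        # A higher-weight (higher-capacity) node gets more virtual nodes
        # and therefore owns a proportionally larger arc of the ring.
        for i in range(int(self.base * weight)):
            vh = _h(f"{node}#vn{i}")
            bisect.insort(self.ring, vh)
            self.owner[vh] = node

    def locate(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key.
        idx = bisect.bisect(self.ring, _h(key)) % len(self.ring)
        return self.owner[self.ring[idx]]

ring = WeightedConsistentHash()
ring.add_node("peer-A", weight=2.0)   # e.g. large storage, low latency
ring.add_node("peer-B", weight=1.0)
print(ring.locate("block-0001"))      # the peer that stores this block
```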

    Blockchain-based deduplication and data integrity audit scheme
    Tingting GAO, Zhongyuan YAO, Miao JIA, Xueming SI, Huanming TAN, Yufeng ZHAN
    2025, 45(11):  3453-3462.  DOI: 10.11772/j.issn.1001-9081.2024121835

    To address the issues of data redundancy and data integrity protection in current cloud storage systems, a blockchain-based cloud storage data deduplication and integrity audit scheme was proposed. This approach combined deduplication technology with a blockchain distributed auditing mechanism, ensuring data confidentiality and integrity while achieving deduplicated storage. Firstly, the Message-Locked Encryption (MLE) method was utilized to generate identical encrypted ciphertexts for the same data from different users. For duplicate data uploaders, ownership verification was conducted using a Verkle Tree (VT)-based Proof of Ownership (PoW) mechanism, thus enabling secure ciphertext deduplication. Secondly, based on the immutability of blockchain, an efficient data integrity auditing mechanism was designed, allowing transparent auditing and verification without compromising user data privacy. This mechanism can defend against various threats, including external attackers, malicious behaviors from Cloud Service Providers (CSPs), and key management risks. Finally, functional analysis and security analyses, covering key security and resistance to collusion attacks, demonstrate that the proposed scheme has high security and practicality. Performance analysis results show that the integrity verification time is proportional to the number of challenge blocks; when the number of challenge blocks is 1 000, the integrity verification time is approximately 85 ms. Experimental analysis results show that the scheme can provide reliable deduplication and data integrity auditing services for cloud storage systems.
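
    The MLE step can be illustrated with a minimal convergent-encryption sketch (standard library only; a SHA-256 keystream stands in for a real cipher here, a simplification rather than the paper's exact construction):

```python
import hashlib

def mle_key(message: bytes) -> bytes:
    """Message-locked key: derived deterministically from the plaintext."""
    return hashlib.sha256(message).digest()

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream from SHA-256(key || counter); encrypt == decrypt."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def mle_encrypt(message: bytes):
    key = mle_key(message)
    ciphertext = keystream_xor(key, message)
    tag = hashlib.sha256(ciphertext).hexdigest()   # deduplication index
    return key, ciphertext, tag

# Two users encrypting identical data produce identical ciphertexts,
# so the cloud can deduplicate without ever seeing the plaintext.
k1, c1, t1 = mle_encrypt(b"same file contents")
k2, c2, t2 = mle_encrypt(b"same file contents")
assert c1 == c2 and t1 == t2
assert keystream_xor(k1, c1) == b"same file contents"
```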

    Blockchain-based disaster recovery backup scheme for power grid status monitoring data in cloud-edge collaboration environments
    Lihua ZHANG, Wenbiao WANG, Yi YANG, Jiali LUO
    2025, 45(11):  3463-3469.  DOI: 10.11772/j.issn.1001-9081.2024121846

    To address issues in data disaster recovery backup solutions such as low data security, susceptibility to data loss and damage, and excessive reliance on third-party services, a blockchain-based data disaster recovery backup scheme for cloud-edge collaborative grid status monitoring was proposed. Firstly, a cloud-edge collaboration approach was adopted to enhance the efficiency of monitoring data backup. Secondly, the Paillier algorithm and threshold secret sharing were combined to encrypt monitoring data, achieving distributed backup. Finally, an improved HotStuff consensus mechanism was applied to the blockchain to optimize its consensus efficiency. Security analysis results demonstrate that the proposed scheme ensures privacy, integrity, unforgeability, and correctness; performance analysis results show that the scheme reduces backup and recovery latency, shortens backup and recovery time, lowers computational overhead, and increases throughput.
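
    The threshold secret sharing component can be illustrated with a textbook (t, n) Shamir scheme (a sketch only; the Paillier encryption layer that the paper combines it with is omitted):

```python
import random

P = 2**127 - 1  # prime modulus for polynomial arithmetic

def split(secret: int, n: int, t: int):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

# Back up one monitoring record across 5 edge nodes; any 3 can restore it.
shares = split(secret=123456789, n=5, t=3)
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[2:]) == 123456789
```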

    Trust management scheme for internet of vehicles based on blockchain and multi-attribute decision making
    Xinyang LUO, Wunan WAN, Shibin ZHANG, Jinquan ZHANG
    2025, 45(11):  3470-3476.  DOI: 10.11772/j.issn.1001-9081.2024121865

    Aiming at the problems of conducting reasonable trust evaluations for vehicles and ensuring timely updates of consistent trust values among multiple RoadSide Units (RSUs) in the Internet of Vehicles (IoV), a trust management scheme for IoV based on blockchain and multi-attribute decision making, named BCIoVTrust (BlockChain IoV Trust), was proposed on the basis of existing IoV trust management schemes. Firstly, the comprehensive trust value and the malicious probability indicator of a vehicle were calculated from attribute values and dynamic attribute weights. Secondly, a reward and punishment mechanism was introduced to reduce the time that malicious vehicles stay in the IoV. Finally, a hybrid consensus mechanism was used to dynamically adjust the block generation difficulty of miner nodes, taking the sum of the absolute values of the vehicles' trust values as the stake. Experimental results show that the scheme calculates vehicle trust values more comprehensively and accurately, identifies and removes malicious vehicles, and updates the trust values stored on blocks faster; it also effectively solves the cold-start problem, dynamically adjusts the rate of trust decay, reasonably selects the optimal recommendation nodes, and prevents malicious vehicles from conspiring and colluding.
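
    The comprehensive trust value can be pictured as a weighted aggregation over attribute scores, as in the sketch below (the attribute names, score ranges, and normalization are illustrative assumptions, not the paper's exact formulas):

```python
def comprehensive_trust(attributes: dict, weights: dict) -> float:
    """Weighted average of per-attribute trust scores in [0, 1]."""
    total = sum(weights[a] for a in attributes)
    return sum(attributes[a] * weights[a] for a in attributes) / total

# Hypothetical attributes observed by an RSU for one vehicle.
attrs = {"msg_accuracy": 0.9, "interaction_freq": 0.6, "recommendation": 0.8}
# Dynamic weights, e.g. re-derived each round from attribute dispersion.
weights = {"msg_accuracy": 0.5, "interaction_freq": 0.2, "recommendation": 0.3}
print(f"trust = {comprehensive_trust(attrs, weights):.3f}")   # trust = 0.810
```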

    Blockchain-based identity authentication scheme for cross-departmental collaboration in e-government
    Rui WANG, Heng PAN, Kun LIU, Xueming SI, Bowei ZHANG, Kunyang LI
    2025, 45(11):  3477-3485.  DOI: 10.11772/j.issn.1001-9081.2024121851

    Concerning the challenges of complicated credential verification processes, limited credential sharing, and users' repeatedly applying for credentials during cross-department collaborative identity authentication in the digital transformation of government services, a blockchain-based cross-department collaborative identity authentication scheme for e-government was proposed. Firstly, a Verifiable Credential (VC) and its Proof of Existence (VC Proof) mechanism were designed to store credential hash values and proof information on the blockchain to enable efficient multi-departmental credential verification. Secondly, an authorized credential mechanism was constructed to facilitate credential interactions between verifiers and relevant departments, thereby reducing the burden on users of repeatedly applying for credentials. Meanwhile, a smart contract-based non-interactive zero-knowledge proof technique was introduced to complete identity authentication while preserving the privacy of VCs. Experimental results show that the proposed scheme has low verification gas consumption, stabilizing at around 500 gas, while the gas consumption for contract deployment increases linearly with the scale of the contract. When the verification gas consumption is 140.55 Gwei, the throughput peaks at about 7×10⁴ TPS (Transactions Per Second), and when the verification gas consumption increases to 562.562 Gwei, the throughput drops to approximately 2×10⁴ TPS. In addition, compared to experimental results on Ethereum, the proposed scheme demonstrates better performance under the same concurrency conditions, with the average response time reduced by approximately 0.32 seconds.
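
    The VC Proof idea, anchoring only a credential's hash on chain and re-hashing the off-chain credential at verification time, can be sketched as follows (a toy in-memory ledger stands in for the blockchain, and the zero-knowledge layer is omitted):

```python
import hashlib
import json

ledger = {}  # toy stand-in for on-chain storage: vc_id -> hash

def _digest(credential: dict) -> str:
    return hashlib.sha256(
        json.dumps(credential, sort_keys=True).encode()).hexdigest()

def issue_vc(vc_id: str, credential: dict):
    """Issuer anchors the credential hash on chain; the VC stays off-chain."""
    ledger[vc_id] = _digest(credential)

def verify_vc(vc_id: str, presented: dict) -> bool:
    """Any department re-hashes the presented VC and checks the anchor."""
    return ledger.get(vc_id) == _digest(presented)

vc = {"holder": "did:example:123", "claim": "business licence valid"}
issue_vc("vc-001", vc)
assert verify_vc("vc-001", vc)                        # untampered: passes
assert not verify_vc("vc-001", {**vc, "claim": "x"})  # tampered: fails
```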

    Improved coin mixing protocol based on verifiable shuffle
    Wenjun LI, Xian XU, Mingchao WAN, Chunmiao LI
    2025, 45(11):  3486-3492.  DOI: 10.11772/j.issn.1001-9081.2024121838

    To address the bottlenecks and limitations in performance and security of existing coin mixing protocols on Ethereum, an improved coin mixing protocol based on verifiable shuffle, named EncMix, was developed. Firstly, a verifiable shuffle mechanism was incorporated to ensure that the fund flows of all participants remained completely untraceable. Next, ElGamal encryption combined with Chaum-Pedersen zero-knowledge proofs was used to significantly strengthen anonymity. Finally, the smart contract logic was optimized to eliminate unnecessary computational steps, thereby reducing the gas consumption required for the coin mixing process. Experimental results showed that, compared to the existing MixEth protocol, EncMix reduced costs by at least 200 000 gas per full coin mixing operation. Furthermore, under the random oracle model, the EncMix protocol was proven to possess anonymity, availability, and theft-proof characteristics. In conclusion, EncMix enhances performance and reduces costs while ensuring the security of blockchain transactions, providing more robust technical support for decentralized finance applications and generating considerable economic value.

    Deep learning-based vulnerability detection tool for C/C++ smart contracts at function-body slice level
    Yushu LI, Ying XING, Siqi LU, Heng PAN, Senchun CHAI, Xueming SI
    2025, 45(11):  3493-3501.  DOI: 10.11772/j.issn.1001-9081.2024121858

    With the frequent occurrence of security incidents caused by smart contract vulnerabilities, existing detection tools lack sufficient support for multiple programming languages, particularly in their inability to detect vulnerabilities at the source code level in C/C++ smart contracts. To address this issue, a deep learning-based vulnerability detection method for C/C++ smart contracts was proposed and a function-body slice-level detection tool, CDFSentry, was designed. Starting from the perspective of source code, the concept of target regions from deep learning applications in image processing was applied to smart contract vulnerability detection. The implementation of the tool involved four steps: first, extracting function-body slices related to vulnerabilities to obtain complete function-body information; second, annotating the extracted slices; third, encoding the slices into vectors to convert them into input formats suitable for deep learning; and fourth, completing vector labeling and model training. In addition, by analyzing the causes of vulnerabilities in C/C++ smart contracts, five types of vulnerabilities were defined: integer overflow, permission control, token transfer, memory management, and transaction delay, and a dataset containing 5 024 source codes was constructed to address the shortage of open-source datasets and the inconsistent definitions of vulnerability types in this field. Experimental results on this dataset demonstrate that while the comparable deep learning tool GNNSCVulDetector can only detect one type of vulnerability, CDFSentry detects five types with 12.68 percentage points higher accuracy. By leveraging deep learning to detect vulnerabilities in C/C++ smart contract source code, CDFSentry reduces reliance on experts while offering higher detection accuracy and broader coverage than similar tools. Moreover, through continuous learning and training, its detection ability can be improved further.

    Open chain-network integration architecture with semantic-rich non-fungible data element
    Jingjing WANG, Mengze CHEN, Ziwei YAN, Jiaxun WANG, Zan LUO, Yakun REN, Kai LEI
    2025, 45(11):  3502-3509.  DOI: 10.11772/j.issn.1001-9081.2024121845

    To address the limitations of cross-chain openness, security issues, network congestion, and increased transaction costs in the circulation mechanism of existing Non-Fungible Tokens (NFT), an open semantic-rich Non-Fungible Data Element (NFDE) chain-network integration architecture named IEN NFDE (Intelligent Eco Networking Non-Fungible Data Element) was designed. Firstly, a transmission protocol for the overlay network was designed, encompassing naming rules, the data structure and encoding rules of packets, as well as a data forwarding and verification mechanism based on named addressing. Then, the semantic-rich NFDE was defined by encapsulating metadata and the digital assets themselves into an independent digital entity with semantic attributes, enabling free circulation of on-chain and off-chain data, decoupling the binding limitations between blockchain systems and data elements, and improving the flexibility of data element circulation. Finally, a prototype system of the IEN NFDE architecture was designed and implemented, and its performance was tested. Experimental results showed that when the data transmission frequency exceeded 600 transmissions, compared to NFT trading platforms based on the Transmission Control Protocol/Internet Protocol (TCP/IP) architecture, the prototype system of the IEN NFDE architecture reduced the average data transmission delay by at least 46.34%, the total data transmission delay by at least 52.43%, and the network communication overhead by at least 36.98%, while increasing throughput by at least 135.71%. These metrics indicate that the prototype system of the IEN NFDE architecture significantly improves the efficiency of cross-chain data circulation while markedly reducing the consumption of network resources. The IEN NFDE architecture not only provides a new solution for the efficient circulation of NFDE, but also effectively alleviates network congestion, reduces transaction costs in practical applications, and offers technical support for the healthy development of the NFT market.

    New federated learning scheme balancing efficiency and security
    Jian YUN, Xinru GAO, Tao LIU, Wenjie BI
    2025, 45(11):  3510-3518.  DOI: 10.11772/j.issn.1001-9081.2024121834

    To address the problem that implementing privacy-preserving mechanisms in federated learning exacerbates the system's communication burden, while attempts to improve communication efficiency often sacrifice model accuracy, a new federated learning scheme balancing efficiency and security — FedPSR (Federated Parameter Sparsification with Secure aggregation and Reconstruction) — was proposed. It aims to balance model communication efficiency (comprising time complexity and communication overhead) with privacy security. Firstly, the parameter sparsification strategy of the Sparse Ternary Compression (STC) algorithm was used to compress the model parameters to be uploaded into triples, thereby reducing the amount of transmitted data. Secondly, to compensate for the information loss caused by parameter compression, an error feedback mechanism accumulated the compression error of the previous round into the local gradient computed in the subsequent round. Finally, Paillier homomorphic encryption was applied to ensure the privacy of parameter transmission and aggregation under the premise of efficient model communication. FedPSR was compared with current cutting-edge schemes on multiple public datasets under both Independent and Identically Distributed (IID) and Non-Independent and Identically Distributed (Non-IID) data scenarios. Experimental results show that FedPSR resolves the long-standing difficulty of balancing time complexity, communication overhead, and privacy protection, and improves accuracy, convergence, and robustness on three mainstream datasets under both IID and Non-IID conditions.
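
    The STC-with-error-feedback pipeline can be sketched with NumPy (a simplification covering top-k magnitude selection, ternarization to ±mean, and the residual buffer; the Paillier encryption of the resulting payload is not shown):

```python
import numpy as np

def stc_compress(grad: np.ndarray, sparsity: float = 0.01):
    """Sparse ternary compression: keep top-k magnitudes as {-mu, +mu}."""
    k = max(1, int(sparsity * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    mu = np.abs(grad[idx]).mean()
    return idx, np.sign(grad[idx]) * mu

residual = np.zeros(10_000)   # error-feedback accumulator

def compress_round(local_grad: np.ndarray):
    """One upload round: re-inject last round's loss, compress, keep the rest."""
    global residual
    corrected = local_grad + residual
    idx, values = stc_compress(corrected)
    sparse = np.zeros_like(corrected)
    sparse[idx] = values
    residual = corrected - sparse       # remember what was dropped
    return idx, values                  # the triple-style payload to upload

idx, vals = compress_round(np.random.randn(10_000) * 0.01)
print(f"transmitted {len(idx)} of 10000 gradient entries")
```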

    Artificial intelligence
    Recommendation method based on multi-perspective relation-enhanced knowledge graph
    Ke GAN, Xiaofei ZHU, Jiawei CHENG
    2025, 45(11):  3519-3528.  DOI: 10.11772/j.issn.1001-9081.2024111665

    Knowledge graph-based recommendation methods learn representations of user and item nodes by integrating relational connections from both the item-attribute graph and the user-interaction graph to recommend suitable items. However, since knowledge graphs contain both noisy and high-quality relations, the primary challenge lies in circumventing noisy relations while effectively mining valuable relations. Existing methods typically rely on global reconstruction strategies to optimize knowledge graph relations through a single manner of pruning noisy relations or mining high-quality relations, thereby learning user and item representations. However, global perspectives struggle to sufficiently capture detailed local information and tend to overlook the potential complementarity between local and global information. Furthermore, solely employing pruning or supplementation strategies fails to simultaneously mitigate the interference of noisy relations and comprehensively exploit high-quality relations. To address these issues, a Recommendation method based on Multi-Perspective Relation-Enhanced Knowledge Graph (RMPREKG) was proposed. This method alleviated noise impacts in both item-attribute graphs and user-interaction graphs through an item hybrid relation alignment module and an interaction hybrid relation enhancement module, while deeply exploring valuable high-order relations. The item hybrid relation alignment module extracted local and global relations separately via an importance pruning strategy and a high-order relation mining approach, followed by a knowledge alignment method to synergize both types of information for effectively refining high-quality item auxiliary information. The interaction hybrid relation enhancement module constructed a local hybrid pruning relation graph and a global hybrid supplementation relation graph, with cross-channel and cross-layer contrastive learning employed to enhance their informational complementarity, thereby comprehensively learning user and item representations. Finally, a hierarchical gated adaptive fusion method was adopted to integrate multiple sets of user and item embeddings for recommendation. When the recommendation length is 20, compared to VRKG4Rec (Virtual Relational Knowledge Graphs for Recommendation), RMPREKG achieves a 10.17% improvement in Normalized Discounted Cumulative Gain (NDCG) on the Last.FM dataset and a 1.13% improvement in NDCG on the MovieLens-1M dataset.

    Multiscale information diffusion prediction model based on hypergraph neural network
    Jinghua ZHAO, Zhu ZHANG, Xiting LYU, Huidan LIN
    2025, 45(11):  3529-3539.  DOI: 10.11772/j.issn.1001-9081.2024111657

    To address the limitations of existing multiscale information diffusion prediction models, which ignore the dynamic characteristic of cascade propagation and exhibit limited performance in independent microscopic information prediction, a Multiscale Information Diffusion prediction model based on HyperGraph Neural Network (MIDHGNN) was proposed. Firstly, Graph Convolutional Network (GCN) was used to extract user social relationship features from the social network graphs, while HyperGraph Neural Network (HGNN) was used to extract global user preference features from propagation cascade graphs. These two types of features were fused to enable microscopic information diffusion prediction. Secondly, Gated Recurrent Unit (GRU) was employed to sequentially predict potential spreaders until reaching virtual users. The cumulative number of predicted users at each step was regarded as the determined cascade size for macroscopic propagation forecasting. Finally, a Reinforcement Learning (RL) framework using policy gradient to optimize parameters significantly enhanced macroscopic information diffusion prediction performance. For microscopic information diffusion prediction, compared to the suboptimal model, MIDHGNN achieves average improvements of 12.01%, 11.64%, and 9.74% in Hits@k on Twitter, Douban, and Android datasets, respectively, and average improvements of 31.31%, 14.85%, and 13.24% in mAP@k. For macroscopic prediction, MIDHGNN reduces the Mean Squared Logarithmic Error (MSLE) by at least 8.10%, 12.61%, and 3.24% on these three datasets, respectively, with all metrics significantly outperforming the comparison models, validating its effectiveness.

    Multimodal knowledge graph link prediction method based on fusing image and textual information
    Huilin GUI, Kun YUE, Liang DUAN
    2025, 45(11):  3540-3546.  DOI: 10.11772/j.issn.1001-9081.2024111561

    The introduction of multimodal information to enhance knowledge graph link prediction has become a recent hotspot. However, most existing methods rely on simple concatenation or attention mechanisms for multimodal feature fusion, ignoring the correlation and semantic inconsistency between different modalities, which may fail to preserve modality-specific information and inadequately exploit the complementary information between modalities. To address these issues, a multimodal knowledge graph link prediction model based on a cross-modal attention mechanism and contrastive learning was proposed, namely FITILP (Fusing Image and Textual Information for Link Prediction). Firstly, pretrained models, such as BERT (Bidirectional Encoder Representation of Transformer) and ResNet (Residual Network), were used to extract textual and visual features of entities. Then, a Contrastive Learning (CL) approach was applied to reduce semantic inconsistencies across modalities, and a cross-modal attention module was designed to refine text feature attention parameters using image features, thereby enhancing the cross-modal correlations between text and image features. Translation models, such as TransE (Translating Embeddings) and TransH (Translation on Hyperplanes), were employed to generate graph structural, visual, and textual features. Finally, the three types of features were fused to perform link prediction between entities. Experimental results on the DB15K dataset show that the FITILP model improves Mean Reciprocal Rank (MRR) by 6.6 percentage points compared to the single-modal baseline TransE, and achieves improvements of 3.95, 11.37, and 14.01 percentage points in Hits@1, Hits@10, and Hits@100, respectively. The results indicate that the proposed method outperforms the comparative baseline methods, demonstrating its effectiveness in leveraging multimodal information to enhance prediction performance.

    Nested named entity recognition method for multi-directional gradient feature extraction
    Xiaoman WANG, Yanping CHEN, Caiwei YANG, Ruizhang HUANG, Yongbin QIN
    2025, 45(11):  3547-3554.  DOI: 10.11772/j.issn.1001-9081.2024111606

    Nested Named Entity Recognition (NER) is a fundamental task in natural language processing. Span-based methods treat entity recognition as a span classification task, effectively handling nested entities. Existing methods organize sentence spans into a two-dimensional plane, where each unit represents a span, similar to pixels in an image. Edge detection techniques from image processing, based on gradient operators, are then employed to enhance and extract the semantic edge features of entities in the planarized sentence representation. However, existing gradient operator-based approaches neglect the multi-directional edge features between adjacent spans. To address this limitation, a nested NER method with multi-directional gradient feature extraction was proposed. This method treated entity positions as pixels in an image. Leveraging the gradient properties of edges, an eight-direction Sobel operator was employed to extract more comprehensive and discriminative semantic edge features in the planarized sentence representation. The proposed method achieves F1 scores of 88.01% and 81.23% on the ACE 2005 Chinese dataset and the GENIA English dataset, respectively, demonstrating its effectiveness for nested NER tasks. Additionally, it achieves an F1 score of 92.52% on the CoNLL2003 English flat dataset, validating its scalability.
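
    The eight-direction Sobel pass can be illustrated on a toy 2D map (the axis-aligned kernels are the standard Sobel pair, the diagonals their 45-degree rotations, and the remaining four directions their negatives; a random plane stands in for the model's span representation):

```python
import numpy as np
from scipy.signal import convolve2d

K0 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])       # 0 degrees
K45 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]])      # 45 degrees
K90 = K0.T                                                # 90 degrees
K135 = np.array([[0, -1, -2], [1, 0, -1], [2, 1, 0]])     # 135 degrees
KERNELS = [K0, K45, K90, K135, -K0, -K45, -K90, -K135]    # all 8 directions

def eight_direction_sobel(plane: np.ndarray) -> np.ndarray:
    """Stack directional gradient responses into (8, H, W) edge features."""
    return np.stack(
        [convolve2d(plane, k, mode="same", boundary="symm") for k in KERNELS])

span_plane = np.random.rand(16, 16)   # toy stand-in for the 2D span plane
features = eight_direction_sobel(span_plane)
print(features.shape)                 # (8, 16, 16)
```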

    Zero-shot relation extraction model based on dual contrastive learning
    Bingjie QIU, Chaoqun ZHANG, Weidong TANG, Bicheng LIANG, Danyang CUI, Haisheng LUO, Qiming CHEN
    2025, 45(11):  3555-3563.  DOI: 10.11772/j.issn.1001-9081.2024111587

    To address the issues of overlapping relation representations and incorrect relation predictions in Zero-Shot Relation Extraction (ZSRE) caused by similar entities or relations, a Dual Contrastive Learning-based Zero-Shot Relation Extraction (DCL-ZSRE) model was proposed. Firstly, both instances and relation descriptions were encoded using pre-trained encoders to obtain their vector representations. Secondly, a dual contrastive learning scheme was designed to enhance the distinguishability of relation representations: Instance-level Contrastive Learning (ICL) was used to learn mutual information between instances, after which the representations of instances and relation descriptions were concatenated; Matching-level Contrastive Learning (MCL) was then applied to learn the associations between instances and relation descriptions, thereby resolving the problem of overlapping relation representations. Finally, the representations learned through contrastive learning were utilized in the classification module to predict unseen relations. Experimental results on the FewRel and Wiki-ZSL datasets demonstrate that the DCL-ZSRE model significantly outperforms eight state-of-the-art models in terms of precision, recall, and F1-score, especially when the number of unseen relation categories is large. With 15 unseen relations, DCL-ZSRE achieves improvements of 4.76, 4.63, and 4.69 percentage points in the three indicators over the EMMA (Efficient Multi-grained Matching Approach) model on the FewRel dataset, and improvements of 1.32, 2.20, and 1.76 percentage points on the Wiki-ZSL dataset. These results confirm that the DCL-ZSRE model effectively distinguishes overlapping relation representations, establishing an efficient and robust approach for ZSRE.
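
    Both contrastive levels ultimately optimize an InfoNCE-style objective; a minimal single-anchor sketch follows (NumPy, with random vectors standing in for encoder outputs):

```python
import numpy as np

def info_nce(anchor: np.ndarray, candidates: np.ndarray, tau: float = 0.1):
    """InfoNCE loss: pull the anchor toward candidates[0] (its positive)
    and push it away from the remaining rows (negatives)."""
    a = anchor / np.linalg.norm(anchor)
    C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = C @ a / tau                       # cosine similarities / tau
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
instance = rng.normal(size=64)                   # an instance representation
descs = rng.normal(size=(8, 64))                 # 8 relation descriptions
descs[0] = instance + 0.1 * rng.normal(size=64)  # the matching description
print(f"loss = {info_nce(instance, descs):.3f}")
```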

    Joint extraction model of entities and relations based on memory enhancement and span screening
    Shuang LIU, Guijun LUO, Jiana MENG
    2025, 45(11):  3564-3572.  DOI: 10.11772/j.issn.1001-9081.2024111567

    Entity and Relation Extraction (ERE) is typically handled in a pipeline manner. However, such an approach relies only on the output of the preceding task, resulting in limited information interaction between named entity recognition and relation extraction, and is susceptible to error propagation. To address these challenges, a Memory-Enhanced model for Entity and Relation Extraction (MEERE) was proposed. This model introduced a memory-like mechanism, allowing each task not only to utilize the output of the preceding task but also to influence it in reverse, thereby capturing complex interactions between entities and relations. To further mitigate error propagation, an entity span screening mechanism was incorporated. This mechanism dynamically screened and verified entity spans in the joint module, ensuring that only high-quality entities were used for relation extraction, thus enhancing the robustness and accuracy of the model. A table decoding method was finally employed to handle relation overlap. Experimental results on three widely used benchmark datasets (ACE05, SciERC, and CoNLL04) demonstrated significant advantages of MEERE in ERE tasks. Specifically, on the CoNLL04 dataset, MEERE outperformed the Tab-Seq model in both named entity recognition and relation extraction, with a 0.5 percentage point increase in named entity recognition F1-score and a 3.0 percentage point improvement in strict relation evaluation F1-score; compared to the PURE-F model, MEERE achieved at least a ninefold speed-up with improved relation extraction performance. These findings confirm the effectiveness of the proposed memory enhancement model in exploring interactions between entities and relations.

    Neighborhood-attention and topology-aware graph convolution method for robust point cloud registration
    Mengnan XU, Hailiang YE, Feilong CAO
    2025, 45(11):  3573-3582.  DOI: 10.11772/j.issn.1001-9081.2024111625

    Most existing point cloud registration methods ignore the relationships between neighboring nodes in the neighborhood, resulting in insufficient feature extraction of local geometric structures. To solve this problem, a Neighborhood-Attention and Topology-Aware graph convolution (NATA) method for robust point cloud registration was proposed to capture deeper semantic features and richer geometric information. Firstly, a cascade geometry-aware module was designed, which used a self-attention-based local neighborhood update graph convolution module to focus on the intrinsic geometric structure of local graphs, thereby obtaining more accurate local topological information. Secondly, a cascade structure combined various levels of local topological information to produce a more discriminative collection of local descriptors. Finally, a feature interaction-graph update module was proposed, which created an attention mechanism in the point clouds to capture their implicit relationships and perceive the shape features of the point clouds. Experimental results on a challenging 3D point cloud benchmark show that the proposed method achieves Mean Absolute Errors (MAE) of 0.157 2 and 0.154 4 for partial noisy point cloud registration under unknown shapes and unknown categories, respectively.

    Data science and technology
    Multi-view clustering algorithm based on bipartite graph and consensus graph learning
    Shunyong LI, Kun LIU, Lina CAO, Xingwang ZHAO
    2025, 45(11):  3583-3592.  DOI: 10.11772/j.issn.1001-9081.2024111593

    Most existing multi-view clustering algorithms suffer from issues such as incomplete fusion mechanisms, insufficient exploration of multi-view collaborative relationships, and weak robustness. These limitations result in low consistency in clustering results and unstable performance under noise and redundant information. To address these issues, a Multi-View Clustering algorithm based on Bipartite Graph and Consensus graph learning (BGC-MVC) was developed to enhance clustering consistency and complementarity by integrating information from multiple views. Specifically, BGC-MVC constructed a bipartite graph to capture neighborhood relationships across different views, and then learned a consensus graph to strengthen inter-view similarity. It integrated embeddings of the original multi-view data into a unified framework that combined graph learning with clustering process, thereby improving the overall clustering performance. Experimental results demonstrate that BGC-MVC achieves significant improvements in accuracy, F-score, Normalized Mutual Information (NMI) and purity under convergence conditions. Notably, on the MSRC_v1 dataset, BGC-MVC outperforms Large-scale Multi-View Subspace Clustering (LMVSC) by increasing the F-score by 19.48 percentage points and exhibits enhanced robustness and accuracy.

    Probabilistic generative graph attention network method for multi-dimensional time series root cause analysis
    Qiuyan YAN, Hui JIANG, Zhujun JIANG, Boxue LI
    2025, 45(11):  3593-3600.  DOI: 10.11772/j.issn.1001-9081.2024111574

    Root Cause Analysis (RCA) is of critical importance in aiding rapid system recovery, accurately assessing risks, and ensuring production safety. Addressing the limitations of current RCA methods, which struggle to adequately characterize dependencies among different sensors and fail to capture stochastic fluctuations in time series, a new approach called GPRCA for multi-dimensional time series root cause analysis using a probabilistic generative graph attention network was proposed. This method regarded dimension feature embeddings as Gaussian distribution vectors to characterize the latent features of different sensors, effectively capturing stochastic fluctuations in multi-dimensional time series and enhancing the model's robustness against noise. Simultaneously, a deep probabilistic generative graph attention network was constructed to learn the nonlinear dependencies between dimensions, thereby effectively modeling dependencies within sensor networks. Finally, root cause analysis was conducted by integrating the network topology causal score and the individual node causal scores. Experimental results on two public datasets (SWaT and WADI) and one private dataset (Mine) showed that GPRCA achieved the optimal values on certain metrics. Specifically, on the SWaT dataset, GPRCA improved by 2.2%, 6.3%, and 11.6% on P@5 (Precision), mAP@5 (mean Average Precision), and Mean Reciprocal Rank (MRR), respectively, compared to the sub-optimal method; on the WADI dataset, GPRCA improved by 8.1%, 7.0%, and 11.0% on P@5, mAP@5, and MRR, respectively; and on the Mine dataset, GPRCA improved by 3.6% and 1.8% on mAP@3 and MRR, respectively. These results demonstrate the effectiveness of GPRCA and its superiority over the baseline methods.

    Interactive visualization method for multi-category urban spatiotemporal big data based on point aggregation
    Shijiao LI, Boyang HAN, Chuishi MENG, Xiaolong ZHANG, Tianrui LI, Yu ZHENG
    2025, 45(11):  3601-3608.  DOI: 10.11772/j.issn.1001-9081.2024111590

    To address the challenges of difficult visual management and inefficient localization in large-scale, multi-category urban spatiotemporal data, an interactive visualization method for multi-category urban spatiotemporal big data based on point aggregation was proposed. Firstly, two efficient aggregation methods were introduced, geographic location-based aggregation and geographic hierarchy-based aggregation, to meet government personnel's needs for efficient visual management across diverse scenarios. Secondly, on the basis of efficient point aggregation, a conditional parsing algorithm was proposed to enable real-time parsing and conversion of spatiotemporal conditions and category visibility, improving data localization efficiency. Finally, 270 000 pieces of urban entity data from Beijing were used to conduct experiments on the geographic-hierarchy parsing algorithm and aggregation interaction. Experimental results demonstrate that the average processing time of the two proposed aggregation methods was reduced by approximately 69.66% and 63.15%, respectively, compared to K-means in different scenarios, confirming the efficiency and stability of the system's data processing and aggregation services. The aggregation service has also been successfully deployed in a city governance project, supporting million-scale urban data aggregation with proven applicability.
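
    Geographic location-based aggregation can be sketched as grid-cell bucketing, a common web-map technique in which the cell size shrinks as the user zooms in (the cell size and point schema below are illustrative assumptions):

```python
from collections import defaultdict

def aggregate_points(points, cell_deg: float = 0.01):
    """Bucket (lon, lat, category) points into grid cells of cell_deg width,
    returning one aggregate marker per (cell, category)."""
    cells = defaultdict(lambda: [0, 0.0, 0.0])   # count, lon sum, lat sum
    for lon, lat, cat in points:
        c = cells[(int(lon / cell_deg), int(lat / cell_deg), cat)]
        c[0] += 1
        c[1] += lon
        c[2] += lat
    return [{"category": k[2], "count": n, "lon": slon / n, "lat": slat / n}
            for k, (n, slon, slat) in cells.items()]

pts = [(116.400, 39.900, "hydrant"), (116.401, 39.901, "hydrant"),
       (116.450, 39.950, "camera")]
for marker in aggregate_points(pts):
    print(marker)   # the two nearby hydrants collapse into a single marker
```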

    Cyber security
    Review of JPEG recompression forensics for color images
    Hao WANG, Jinwei WANG, Xin CHENG, Jiawei ZHANG, Hao WU, Xiangyang LUO, Bin MA
    2025, 45(11):  3609-3620.  DOI: 10.11772/j.issn.1001-9081.2024111614

    The JPEG (Joint Photographic Experts Group) compression is one of the most widely used image compression standards and is involved in various forensic scenarios and security models such as image operation chain forensics, image source forensics, steganography, steganalysis, and JPEG anti-forensics. Researchers have conducted extensive studies on JPEG recompression forensics based on the characteristics of JPEG images, discovering that it not only provides prior knowledge for image forensics but also can be directly applied in forensic scenarios. Therefore, JPEG recompression forensics for color images was reviewed. Firstly, the research background of recompression forensics was introduced, and recompression forensics were classified into three types: non-aligned, aligned asynchronous, and aligned synchronous. Secondly, the basic knowledge required for recompression forensics was detailed, including the JPEG compression process, convergence error, error images, and algorithm evaluation metrics. Furthermore, existing methods for each type of problem were thoroughly reviewed and systematized. Additionally, since fields such as image steganography and adversarial examples involved robustness studies related to JPEG recompression, the applications of JPEG recompression features in these domains were highlighted, and common algorithms were compared to summarize their advantages and disadvantages. Finally, open challenges and future research directions in JPEG recompression forensics were discussed.

    Active protection method for deep neural network model based on four-dimensional Chen chaotic system
    Xintao DUAN, Mengru BAO, Yinhang WU, Chuan QIN
    2025, 45(11):  3621-3631.  DOI: 10.11772/j.issn.1001-9081.2024111583

    Deep Neural Network (DNN)-based models have been widely applied due to their superior performance. However, training a powerful DNN model requires extensive datasets, expertise, computational resources, specialized hardware, and significant time investment, and unauthorized exploitation of such models can cause substantial losses to model owners. Aiming at the security and intellectual property issues of DNN models, an active protection method was proposed. The method employed a new comprehensive weight selection strategy to precisely identify critical weights within the model. Combined with the structural characteristics of the convolutional layers in DNN models, the four-dimensional Chen chaotic system was introduced for the first time, extending the three-dimensional chaotic system, to scramble and encrypt a small number of weights in the convolutional layers. Meanwhile, to address the problem that authorized users cannot decrypt even with the key, an Elliptic Curve Cryptography (ECC)-based digital signature scheme was integrated for encrypted models. After encryption, the weight positions and the initial values of the chaotic sequence were combined to form the encryption key. Authorized users can use the key to correctly decrypt the DNN model, while unauthorized attackers cannot functionally use intercepted models even if acquired. Experimental results show that scrambling a minimal fraction of weight positions significantly degrades classification accuracy, and the decrypted model can be restored without any loss. The method is also resistant to fine-tuning and pruning attacks, and the resulting key has strong sensitivity and resists brute-force attacks. Furthermore, the experiments verify the method's transferability: it is effective for image classification models and can also protect deep image steganography models and object detection models.
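
    The scramble-by-chaotic-sequence step can be sketched as follows; for brevity, a one-dimensional logistic map stands in for the four-dimensional Chen system, and the key reduces to the (seed, positions) pair:

```python
import numpy as np

def chaotic_permutation(n: int, seed: float) -> np.ndarray:
    """Derive a permutation by ranking a chaotic orbit (logistic map here,
    standing in for the paper's 4D Chen system)."""
    x, orbit = seed, []
    for _ in range(n):
        x = 3.99 * x * (1 - x)   # logistic map in its chaotic regime
        orbit.append(x)
    return np.argsort(orbit)

def encrypt_weights(w: np.ndarray, positions: np.ndarray, seed: float):
    """Scramble only the selected critical weights; the rest stay intact."""
    perm = chaotic_permutation(len(positions), seed)
    enc = w.copy()
    enc[positions] = w[positions][perm]
    return enc, perm

def decrypt_weights(enc: np.ndarray, positions: np.ndarray, perm: np.ndarray):
    """Invert the permutation to restore the original weights losslessly."""
    restored = np.empty(len(positions))
    restored[perm] = enc[positions]
    dec = enc.copy()
    dec[positions] = restored
    return dec

w = np.random.randn(100)
pos = np.array([3, 17, 42, 58, 71])           # hypothetical critical weights
enc, perm = encrypt_weights(w, pos, seed=0.3141)
assert np.allclose(decrypt_weights(enc, pos, perm), w)
```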

    Advanced computing
    Highly reliable matching method based on multi-dimensional resource measurement and rescheduling in computing power network
    Lin WEI, Jinyang LI, Yajie WANG, Mengyang HE
    2025, 45(11):  3632-3641.  DOI: 10.11772/j.issn.1001-9081.2024111653

    Computing Power Network (CPN) is a new network paradigm that addresses the contradiction between computing power supply and demand, network transmission problems, and the issue of universal access to computing resources. According to the supply capacity of computing power resource providers and the dynamic resource requirements of application demanders, the computing, storage, network, and other multi-dimensional resources of the underlying computing power infrastructure in a region are integrated to provide users with personalized computing power services and realize efficient management and on-demand allocation of computing resources. To enhance the utilization and reliability of CPN resource matching and scheduling, a highly reliable matching method was proposed, namely the Resource Measurement and Rescheduling Matching Method (RMRMM). To achieve high-utilization resource scheduling, RMRMM adopts a resource measurement matching scheme based on the entropy-weighted Technique for Order Preference by Similarity to Ideal Solution (entropy-weighted TOPSIS) method and Deep Reinforcement Learning (DRL), which comprehensively measures the Structural Feature Value (SFV), computing power, storage capacity, and network communication capacity of each node, and narrows the resource matching range to improve matching accuracy and resource utilization. Additionally, RMRMM considers node failures caused by attacks and includes a rescheduling module based on the Adaptive Large Neighborhood Search (ALNS) algorithm: when matches fail, nodes and tasks are rescheduled to improve the task acceptance rate and enhance overall reliability. Simulation results on the OMNeT++ platform demonstrate that the average BandWidth (BW) utilization, average Random Access Memory (RAM) utilization, average storage utilization, and task request acceptance rate of RMRMM reach 69.7%, 66.4%, 68.5%, and 75.5%, respectively. Both the resource utilization and the request acceptance rate of RMRMM outperform those of other matching strategies, demonstrating its efficiency and reliability.
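
    The entropy-weighted TOPSIS scoring used to measure and rank candidate nodes can be sketched with NumPy (a textbook version that treats every column as a benefit criterion; the metric columns are illustrative):

```python
import numpy as np

def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Objective weights: columns with more dispersion carry more weight."""
    P = X / X.sum(axis=0)
    P = np.where(P == 0, 1e-12, P)                  # guard against log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(X))
    d = 1 - e                                       # degree of divergence
    return d / d.sum()

def topsis(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Closeness of each row to the ideal solution (benefit criteria only)."""
    V = w * X / np.linalg.norm(X, axis=0)
    best, worst = V.max(axis=0), V.min(axis=0)
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Rows: candidate nodes; columns: e.g. SFV, compute, storage, bandwidth.
X = np.array([[0.8, 16.0, 512.0, 100.0],
              [0.6, 32.0, 256.0, 200.0],
              [0.9, 8.0, 1024.0, 50.0]])
scores = topsis(X, entropy_weights(X))
print("best candidate:", scores.argmax(), scores.round(3))
```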

    Capacitated vehicle routing problem solving method based on improved MAML and GVAE
    Yanpeng ZHANG, Yuqian ZHAO, Fan ZHANG, Tenghai QIU, Gui GUI, Lingli YU
    2025, 45(11):  3642-3648.  DOI: 10.11772/j.issn.1001-9081.2024111589

    Deep Reinforcement Learning (DRL)-based vehicle routing planning methods have garnered significant attention for their rapid solving speed and end-to-end processing capabilities. However, most existing methods are limited to scenarios with uniformly distributed nodes and fixed node numbers, and their performance degrades when handling unevenly distributed nodes or varying numbers of nodes. To address this issue, a meta-learning framework based on improved Model-Agnostic Meta-Learning (MAML) and a Graph Variational AutoEncoder (GVAE) was proposed to obtain a well-initialized model through meta-training and perform quick fine-tuning for out-of-distribution tasks, improving the model's generalization performance. In addition, a GVAE was employed to initialize the parameters of the meta-learning framework to further enhance the effect of meta-learning. Experimental results show that the proposed method can handle Vehicle Routing Problems (VRPs) with different node distributions and performs well when dealing with varying numbers of nodes; the average gap across five tasks was reduced by 0.45 percentage points compared to the method without meta-learning. The proposed meta-learning framework thus enhances the effect of reinforcement learning, achieving solution quality comparable to state-of-the-art solvers while significantly shortening computation time.
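
    The MAML inner/outer loop can be sketched in its first-order form (FOMAML) on a toy regression problem; this is a deliberate simplification of the improved MAML that the paper applies to a DRL routing model:

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Squared loss of a linear model y ~ X @ w, with its gradient."""
    err = X @ w - y
    return (err ** 2).mean(), 2 * X.T @ err / len(y)

def maml_step(w, tasks, inner_lr=0.01, outer_lr=0.005):
    """One first-order MAML meta-update over a batch of tasks."""
    meta_grad = np.zeros_like(w)
    for X_tr, y_tr, X_val, y_val in tasks:
        _, g = loss_and_grad(w, X_tr, y_tr)
        w_task = w - inner_lr * g                # inner-loop adaptation
        _, g_val = loss_and_grad(w_task, X_val, y_val)
        meta_grad += g_val                       # FOMAML drops 2nd-order terms
    return w - outer_lr * meta_grad / len(tasks)

rng = np.random.default_rng(1)

def make_task(slope):
    """A task family differing in slope, like VRPs differing in distribution."""
    X = rng.normal(size=(20, 1))
    y = slope * X[:, 0]
    return X[:10], y[:10], X[10:], y[10:]

w = np.zeros(1)
for _ in range(500):
    w = maml_step(w, [make_task(s) for s in (0.5, 1.0, 1.5)])
print(w)   # a well-initialized slope that fine-tunes quickly per task
```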

    Wavelet decomposition-based enhanced time delay awareness for traffic flow prediction
    Lihu PAN, Menglin ZHANG, Guangrui FAN, Linliang ZHANG, Rui ZHANG
    2025, 45(11):  3649-3657.  DOI: 10.11772/j.issn.1001-9081.2024111602

    Traditional traffic flow prediction models often fail to effectively account for temporal delays across regions and time periods, and struggle to capture both short-term fluctuations and long-term trends in traffic flow. To address these limitations, a Wavelet Transform and Attention-based Latency-Aware long short Graph Neural Network (WTA-LAGNN) was proposed. Firstly, wavelet decomposition was applied to separate traffic flow data into long-term trend and short-term fluctuation patterns. Key features in the short-term fluctuation pattern were enhanced using a feature enhancement module, improving the model's sensitivity to short-term variations. For the long-term trend, a sequence-enhanced multi-head self-attention was designed to capture sustained changes in flow. To address temporal delay effects, a time delay-aware layer was designed to optimize spatio-temporal dependencies in traffic flow propagation between regions. Finally, the fusion layer outputted the prediction results. Experiments were conducted on real highway traffic datasets including PeMS03, PeMS04, PeMS07 and PeMS08 for 60-minute flow prediction. The results showed that compared to Spatio-Temporal Graph Neural Controlled Differential Equation (STG-NCDE), WTA-LAGNN reduced the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) by 5.14% and 2.69%, as well as 5.80% and 2.69% on the PeMS03 and PeMS07 datasets, respectively; compared to Traffic Flow Matrix-based Graph Convolutional Attention Mechanism (TFM-GCAM), WTA-LAGNN reduced the MAE and RMSE by 9.28% and 3.32% on PeMS08; compared to Spatio-Temporal Fusion Graph Convolutional Network (STFGCN), WTA-LAGNN reduced the MAE and RMSE by 3.53% and 2.72% on PeMS04, respectively. These results demonstrate that WTA-LAGNN outperforms comparison baseline models, and can effectively capture spatio-temporal dependencies, thereby improving traffic flow prediction precision.
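
    The trend/fluctuation split can be illustrated with PyWavelets (the wavelet family and decomposition level here are assumptions for illustration, not the paper's stated settings):

```python
import numpy as np
import pywt

def split_trend_fluctuation(series, wavelet="haar", level=2):
    """Decompose one series into a long-term trend and short-term fluctuation
    by zeroing the detail bands and reconstructing."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    trend_coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    trend = pywt.waverec(trend_coeffs, wavelet)[:len(series)]
    return trend, series - trend

t = np.arange(256)
flow = 50 + 10 * np.sin(t / 24) + 2 * np.random.randn(256)  # toy traffic flow
trend, fluct = split_trend_fluctuation(flow)
print(trend[:4].round(2), fluct[:4].round(2))
```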

    Prediction of drug-target interactions based on sequence and multi-view networks
    Jiahao ZHANG, Qi WANG, Mingming LIU, Xiaofeng WANG, Biao HUANG, Pan LIU, Zhi YE
    2025, 45(11):  3658-3665.  DOI: 10.11772/j.issn.1001-9081.2024111664

    Identifying Drug-Target Interactions (DTI) is a crucial step in drug repurposing and novel drug discovery. Currently, many sequence-based computational methods are widely used for DTI prediction. However, previous sequence-based studies typically focus solely on the sequence itself for feature extraction, neglecting heterogeneous information networks such as drug-drug interaction networks and drug-target interaction networks. Therefore, a novel method for DTI prediction based on sequence and multi-view networks, namely SMN-DTI (prediction of Drug-Target Interactions based on Sequence and Multi-view Networks), was proposed. In this method, a Variational AutoEncoder (VAE) was used to learn the embedding matrices of drug SMILES (Simplified Molecular-Input Line-Entry System) strings and target amino acid sequences. Subsequently, a Heterogeneous graph Attention Network (HAN) with a two-level attention mechanism was used to aggregate information from different neighbors of drugs or targets in the networks from both node and semantic perspectives, obtaining the final embeddings. Two benchmark datasets widely used for DTI prediction, Hetero-seq-A and Hetero-seq-B, were used to evaluate SMN-DTI and the baseline methods. The results show that SMN-DTI achieves the best performance in both Area Under the receiver operating Characteristic curve (AUC) and Area Under the Precision-Recall curve (AUPR) under three different positive-to-negative sample ratios. It can be seen that SMN-DTI outperforms current mainstream prediction methods.
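
    The semantic-level half of the two-level attention can be sketched as follows: per-view (per-metapath) node embeddings are scored by a small network, softmax-normalized, and fused. This mirrors the standard HAN formulation; the dimensions and scoring network are assumptions, not the paper's exact design.

        import torch
        import torch.nn as nn

        class SemanticAttention(nn.Module):
            def __init__(self, dim, hidden=128):
                super().__init__()
                self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                           nn.Linear(hidden, 1, bias=False))
            def forward(self, z):                # z: (num_views, num_nodes, dim)
                w = self.score(z).mean(dim=1)    # one importance score per view
                beta = torch.softmax(w, dim=0)   # normalized view weights
                return (beta.unsqueeze(1) * z).sum(dim=0)   # (num_nodes, dim)

        z = torch.randn(3, 100, 64)              # 3 views, 100 drug nodes, 64-d each
        fused = SemanticAttention(64)(z)         # final node embeddings: (100, 64)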

    Spatial-temporal Transformer-based hybrid return implicit Q-learning for crowd navigation
    Shuai ZHOU, Hao FU, Wei LIU
    2025, 45(11):  3666-3673.  DOI: 10.11772/j.issn.1001-9081.2024111654

    In crowded environments, robots typically rely on online reinforcement learning algorithms to perform crowd navigation tasks. However, the complex and dynamic characteristics of pedestrian movements significantly reduce the sample efficiency of online reinforcement learning. To address this issue, a Spatial-temporal Transformer-based Hybrid Return Implicit Q-Learning (STHRIQL) algorithm within the Offline Reinforcement Learning (ORL) framework was proposed. Firstly, a Monte Carlo (MC) return mechanism was incorporated into the Implicit Q-Learning (IQL) algorithm to enhance the convergence of the learning process. Then, a spatial-temporal Transformer model was integrated into the Actor-Critic framework to effectively capture and analyze the highly dynamic and complex interactions between robots and pedestrians in offline crowd navigation datasets, thereby optimizing the training process and efficiency of the algorithm. Finally, simulation experiments were conducted to compare the STHRIQL algorithm with existing online reinforcement learning-based crowd navigation algorithms, followed by quantitative and qualitative analyses based on evaluation metrics. Experimental results show that the STHRIQL algorithm achieves superior performance in crowd navigation tasks and improves sample efficiency by 30.5%-55.8% compared to existing online crowd navigation algorithms. This indicates that the STHRIQL algorithm provides a new approach for enhancing robot navigation capabilities in complex crowd environments.
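
    The two ingredients named above can be sketched in a few lines: the asymmetric expectile loss at the core of IQL, and a hybrid target that blends the bootstrapped TD target with an MC return. The blending weight lam is an assumed illustration of a "hybrid return", not necessarily the paper's exact formulation.

        import torch

        def expectile_loss(q, v, tau=0.7):
            u = q - v                                # advantage-like residual
            w = torch.where(u > 0, tau, 1 - tau)     # asymmetric expectile weight
            return (w * u ** 2).mean()

        def hybrid_target(td_target, mc_return, lam=0.5):
            # blend the bootstrapped TD target with the Monte Carlo return
            return lam * td_target + (1 - lam) * mc_return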

    Improved Q-learning-based algorithm for safe braking of intelligent vehicles in multiple scenarios
    Xianwen ZHOU, Xiao LONG, Xinlei YU, Yilian ZHANG
    2025, 45(11):  3674-3681.  DOI: 10.11772/j.issn.1001-9081.2024111569

    To address the safety issues of intelligent vehicles driving in mixed traffic flows, an improved Q-learning-based algorithm for safe braking of intelligent vehicles in multiple scenarios was proposed. Firstly, a three-vehicle model was established according to road conditions and vehicle parameters, and braking, car-following, and lane-changing scenarios were simulated respectively. Secondly, linear programming was applied to the training data to ensure that safe braking remained feasible for the intelligent vehicle, and a reward function was designed to guide the agent to brake safely while striving to equalize the distances between the middle vehicle and both the preceding and following vehicles. Finally, an interval-block method was incorporated to handle the continuous state space. Simulation and comparison experiments were conducted in braking, car-following, and lane-changing scenarios. The results show that compared with the traditional Q-learning algorithm, the proposed algorithm increases the safety rate from 76.02% to 100.00% and reduces the total training time to 69% of that of the traditional algorithm. It can be seen that the proposed algorithm offers better safety and higher training efficiency, and can ensure safety while striving to equalize the inter-vehicle distances in braking, car-following, and lane-changing scenarios.
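
    The interval-block idea can be sketched as binning a continuous quantity (here, the gap distance) into discrete intervals so that tabular Q-learning applies; the bin edges, action set, reward, and learning rates below are illustrative assumptions.

        import numpy as np

        bins = np.array([5, 10, 20, 40, 80])          # gap-distance interval edges (m)
        n_states, n_actions = len(bins) + 1, 3        # e.g. brake hard / soft / hold
        Q = np.zeros((n_states, n_actions))

        def to_state(gap_m):
            return int(np.digitize(gap_m, bins))      # continuous gap -> interval index

        def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

        q_update(to_state(12.3), 1, -0.2, to_state(9.8))   # one illustrative transition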

    Network and communications
    Joint beamforming and power allocation for multi-IRS-assisted NOMA-SWIPT cooperative transmission model
    Mengjia GE, Jing DAI, Yuchen LI, Lili PAN, Xiaorong JING
    2025, 45(11):  3682-3691.  DOI: 10.11772/j.issn.1001-9081.2024111643

    To address the challenges arising from the rapid development of the Fifth-Generation (5G) mobile Internet of Things (IoT), including spectrum shortages, coverage blind zones, and power supply deficiencies, a multi-IRS-assisted NOMA-SWIPT cooperative transmission model was proposed by integrating Intelligent Reflecting Surface (IRS), Simultaneous Wireless Information and Power Transfer (SWIPT), and Non-Orthogonal Multiple Access (NOMA) technologies. Based on this model, a non-convex optimization problem was formulated with the goal of maximizing the system sum rate, and a two-stage joint beamforming (including active beamforming at the base station and passive beamforming at the IRSs) and power allocation optimization algorithm was proposed to solve it. During the solution process, to decouple the optimization variables, a manifold optimization method was first used to solve for the passive beamforming in the cooperative transmission phase. Then, the Block Coordinate Descent (BCD) algorithm was applied to decompose the non-convex problem into four sub-problems through alternating iteration, which were solved using methods such as quadratic transformation and Successive Convex Approximation (SCA). Simulation results show that compared with the no-SWIPT and Orthogonal Multiple Access (OMA) schemes, the proposed cooperative transmission model and its optimization algorithm improve the system sum rate by approximately 0.5 bit·s⁻¹·Hz⁻¹ and 1.5 bit·s⁻¹·Hz⁻¹, respectively, and exhibit more robust convergence characteristics.
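
    The BCD pattern used above can be sketched on a toy problem: each block is optimized with the others held fixed, and the blocks are visited cyclically until convergence. The closed-form block solvers for a toy convex quadratic below stand in for the paper's four sub-problems (which are instead handled by quadratic transformation and SCA).

        def bcd(block_solvers, x0, iters=50):
            x = list(x0)
            for _ in range(iters):
                for i, solve_i in enumerate(block_solvers):
                    x[i] = solve_i(x)            # best response for block i, others fixed
            return x

        # Toy: minimize (a-1)^2 + (b-2)^2 + a*b with closed-form block updates
        solve_a = lambda x: (2 - x[1]) / 2       # argmin over a with b fixed
        solve_b = lambda x: (4 - x[0]) / 2       # argmin over b with a fixed
        print(bcd([solve_a, solve_b], [0.0, 0.0]))   # converges to the minimizer (0, 2)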

    Throughput maximization of UAV relaying system with multiple user pairs
    Lili GUO, Xiaodong JI, Shibing ZHANG
    2025, 45(11):  3692-3697.  DOI: 10.11772/j.issn.1001-9081.2024110663

    Aiming at maximizing the throughput of an Unmanned Aerial Vehicle (UAV) relaying system with multiple user pairs, a UAV-assisted amplify-and-forward relaying system was investigated, in which a Full-Duplex (FD) fixed-wing UAV employed a Time-Division Multiple Access (TDMA) scheduling protocol to provide relay services for multiple source-destination user pairs. With the goal of maximizing the system throughput, a joint optimization problem over the communication scheduling, transmit power, and trajectory of the UAV was formulated. Since the optimization variables were coupled, the problem was non-convex and hence hard to solve directly. To this end, the initial problem was decomposed into three subproblems corresponding to the optimization of the communication scheduling, the transmit power, and the trajectory of the UAV, respectively. On the basis of solving the three subproblems, an iterative algorithm based on the block coordinate descent technique was proposed to tackle the joint optimization problem by optimizing the three variable blocks alternately. Simulation results demonstrate that the throughput of the joint optimization scheme is at least 6.8% and 79.4% higher than those of the two benchmark optimization schemes, respectively.

    Multimedia computing and computer simulation
    Binocular vision object localization algorithm for robot arm grasping
    Changjiang JIANG, Jie XIANG, Xuying HE
    2025, 45(11):  3698-3706.  DOI: 10.11772/j.issn.1001-9081.2024111599

    Recognizing an object and locating its spatial coordinates using machine vision algorithms is crucial for achieving visual grasping with robotic arms. Aiming at the problems of low localization accuracy and inefficiency in binocular vision-based object recognition and localization, a BDS-YOLO (Binocular Detect and Stereo YOLO)-based binocular vision object localization algorithm for robotic arm grasping was proposed, which jointly performs object detection and stereo depth estimation. The algorithm leveraged attention mechanisms for cross-view feature interaction to enhance feature representation, enabling the network to obtain high-quality disparity maps through deep feature matching. After being further refined through a self-attention mechanism, the disparity maps were converted into depth information using the triangulation principle. The BDS-YOLO network adopted multi-task learning to jointly train the object detection and stereo depth estimation branches on both synthetic and real-world data. To overcome the difficulty of annotating dense depth for real data, self-supervised learning was applied to optimize image reconstruction from disparities, improving the generalization ability of the BDS-YOLO network in real-world scenarios. Experimental results show that the BDS-YOLO network achieves a 6.5 percentage points higher Average Precision (AP) in object detection than YOLOv8l on a real-world dataset, outperforms specialized stereo depth estimation algorithms in disparity prediction and depth conversion, achieves an inference speed of over 20 frame/s, and surpasses the comparison methods in both object recognition and localization. It can be seen that the BDS-YOLO network meets the requirements of real-time object detection and localization.
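
    The triangulation step mentioned above is a one-liner for a rectified stereo rig: depth Z = f·B/d, with focal length f in pixels, baseline B in meters, and disparity d in pixels. The calibration values and random disparity map below are assumptions for illustration.

        import numpy as np

        f_px, baseline_m = 1000.0, 0.12           # assumed calibration values
        disparity = np.random.uniform(5.0, 60.0, (480, 640)).astype(np.float32)

        depth = np.where(disparity > 0, f_px * baseline_m / disparity, 0.0)  # meters
        # A pixel (u, v) then back-projects to X = (u - cx) * Z / f, Y = (v - cy) * Z / f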

    Multi-attention contrastive learning for infrared small target detection
    Xiaoyong BIAN, Qiren HU
    2025, 45(11):  3707-3712.  DOI: 10.11772/j.issn.1001-9081.2024101554

    InfRared Small Target Detection (IRSTD) is a research hotspot and a recognized difficulty in the field of target detection. Because small infrared targets occupy few pixels, have low contrast, and lack texture, it is hard to learn accurate feature representations from their limited and distorted information, so IRSTD methods still face many challenges. To address this issue, a multi-attention contrastive learning-based IRSTD method was proposed. Firstly, with U-Net adopted as the fundamental framework, a Context Mixer Block (CMB) integrating Frequency Attention (FA) and spatial attention was introduced in the encoding phase to produce a preliminary attention feature map. Then, in the decoding phase, a Multi-Kernel Central Difference Convolution (MKCDC) was designed to extract the core information of small targets while remaining stable across different scales. Finally, the small target detection network was trained by combining binary cross-entropy loss and contrastive loss functions, so that the feature representation ability for small targets was enhanced and a discriminative small target detection model was obtained. Experimental results show that the Probability of detection (Pd) of the proposed method on the IRSTD-1k and NUAA-SIRST datasets reaches 96.63% and 100.00% respectively, which is improved by 4.71 and 1.90 percentage points, respectively, compared with the Dense Nested Attention Network (DNA-Net). It can be seen that the proposed method effectively improves IRSTD performance.
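
    A single central difference convolution branch — the building block that MKCDC extends to multiple kernel sizes — can be sketched with the standard reformulation y = conv(x, w) - theta * x * sum(w); theta and the channel sizes are assumed values, not the paper's.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class CDConv(nn.Module):
            def __init__(self, cin, cout, k=3, theta=0.7):
                super().__init__()
                self.conv = nn.Conv2d(cin, cout, k, padding=k // 2, bias=False)
                self.theta = theta
            def forward(self, x):
                out = self.conv(x)                        # vanilla convolution
                w_sum = self.conv.weight.sum(dim=(2, 3))  # (cout, cin) summed kernels
                center = F.conv2d(x, w_sum[..., None, None])  # 1x1 conv on the center
                return out - self.theta * center          # emphasize local differences

        y = CDConv(16, 32)(torch.randn(1, 16, 64, 64))    # same spatial size output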

    Human-object interaction detection algorithm by fusing local feature enhanced perception
    Junyi LIN, Mingxuan CHEN, Yongbin GAO
    2025, 45(11):  3713-3720.  DOI: 10.11772/j.issn.1001-9081.2024111662

    The core of Human-Object Interaction (HOI) detection is to identify the humans and objects in an image and accurately classify their interactions, which is crucial for deepening scene understanding. However, existing algorithms struggle with complex interactions due to insufficient local information, leading to erroneous associations and difficulty in distinguishing fine-grained operations. To address this limitation, a Local Feature-enhanced Perceptual Module (LFPM) was designed to strengthen the model's capability of capturing local feature information through the integration of local and non-local feature interactions. This module comprised three key components: a Downsampling Aggregation branch Module (DAM), which acquired low-frequency features through downsampling and aggregated non-local structural information; a Fine-Grained Feature Branch (FGFB) module, which performed parallel convolution operations to supplement the DAM's local information extraction; and a Multi-Scale Wavelet Convolution (MSWC) module, which further refined the output features in the spatial and channel dimensions for more precise and comprehensive feature representations. Additionally, to address the limitations of the Transformer in mining local spatial and channel features, a spatial and channel Squeeze and Excitation (scSE) module was introduced; it allocated attention across the spatial and channel dimensions, enhancing the model's sensitivity to locally salient regions and effectively improving HOI detection accuracy. Finally, the LFPM, scSE, and Transformer architectures were integrated to form the Local Feature Enhancement Perception (LFEP) framework. Experimental results show that compared with the SQA (Strong guidance Query with self-selected Attention) algorithm, the LFEP framework achieves an improvement of 1.1 percentage points in Average Precision on the V-COCO dataset and 0.49 percentage points in mean Average Precision (mAP) on the HICO-DET dataset. Ablation results further validate the effectiveness of each module of LFEP.
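
    The scSE gate introduced above is commonly formulated as two parallel excitations — a channel gate and a spatial gate — whose re-weighted outputs are summed. A minimal sketch, with an assumed reduction ratio:

        import torch
        import torch.nn as nn

        class SCSE(nn.Module):
            def __init__(self, ch, r=16):
                super().__init__()
                self.cse = nn.Sequential(                 # channel gate: which channels
                    nn.AdaptiveAvgPool2d(1),
                    nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
                    nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
                self.sse = nn.Sequential(                 # spatial gate: which locations
                    nn.Conv2d(ch, 1, 1), nn.Sigmoid())
            def forward(self, x):
                return x * self.cse(x) + x * self.sse(x)  # sum of the two gated paths

        y = SCSE(64)(torch.randn(2, 64, 32, 32))          # output shape equals input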

    Unsupervised industrial anomaly detection based on denoising reverse distillation
    Yuxuan LI, Bin CHEN, Weizhi XIAN
    2025, 45(11):  3721-3729.  DOI: 10.11772/j.issn.1001-9081.2024111594

    Anomaly detection with localization capabilities is an important application of computer vision in industrial manufacturing. Recently, anomaly detection algorithms based on Reverse Distillation (RD) have demonstrated good performance on this task. However, previous RD-based approaches only apply constraints to normal data, failing to ensure the student network's feature reconstruction capability when dealing with anomalies. In addition, RD-based anomaly detection algorithms fuse the network's multi-level difference information based solely on experience, leading to suboptimal anomaly localization. To further enhance performance, an unsupervised industrial anomaly detection algorithm based on denoising RD, called DeRD, was proposed. It consists of an RD network with a memory bank, a Multi-Scale Feature Denoising (MSFD) module, and a segmentation network. Firstly, to strengthen the constraints on anomalous data, an MSFD module based on contrastive learning and multi-task learning was designed; combined with the memory bank mechanism, it enabled the student network to learn more discriminative feature representations. Secondly, a self-supervised segmentation network trained on synthetic anomalies was used to adaptively integrate the feature difference information between the multi-level teacher and student networks, thereby significantly improving anomaly localization performance. Experimental results on industrial detection benchmark datasets demonstrate the advanced performance of DeRD: it achieves an image-level AUC (Area Under the receiver operating characteristic Curve) of 98.8%, a pixel-level AUC of 98.48%, and a pixel-level Average Precision (AP) of 73.5%, surpassing the comparison algorithms.
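
    For contrast with DeRD's learned fusion, the conventional experience-based RD scoring it replaces can be sketched as follows: per-level teacher-student discrepancies (1 - cosine similarity) are upsampled and summed by hand; the feature shapes below are assumptions.

        import torch
        import torch.nn.functional as F

        def anomaly_map(teacher_feats, student_feats, size=(256, 256)):
            amap = torch.zeros(teacher_feats[0].shape[0], 1, *size)
            for t, s in zip(teacher_feats, student_feats):
                d = 1 - F.cosine_similarity(t, s, dim=1).unsqueeze(1)   # (B,1,H,W)
                amap += F.interpolate(d, size=size, mode='bilinear',
                                      align_corners=False)
            return amap                       # higher value = more anomalous pixel

        t = [torch.randn(2, c, r, r) for c, r in [(256, 64), (512, 32), (1024, 16)]]
        s = [torch.randn_like(f) for f in t]
        print(anomaly_map(t, s).shape)        # torch.Size([2, 1, 256, 256])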

    Frontier and comprehensive applications
    Frequency domain attention-based method for structural seismic response prediction
    Maozu GUO, Zheng CUI, Lingling ZHAO, Qingyu ZHANG
    2025, 45(11):  3730-3738.  DOI: 10.11772/j.issn.1001-9081.2024111612

    Existing methods struggle to accurately predict the structural response of buildings to dynamic loads such as earthquakes, facing challenges such as the inability to effectively learn the cyclic variation of seismic waves and insufficient feature fusion. To address these challenges, a deep learning model for structural response prediction based on a frequency-domain attention mechanism was proposed. By combining a frequency-domain augmented attention mechanism with Gated Recurrent Units (GRUs), the sparsity of seismic wave time series in the frequency domain was exploited to mine feature information deeply, while the efficiency of GRUs on time-series tasks was retained, thereby enabling efficient encoding of latent seismic wave features. Furthermore, a pyramid network structure with weight stacking was introduced to ease the training of deep networks by providing shortcuts across layers. Additionally, an autoregressive prediction framework was proposed to enrich the feature space and enhance the prediction accuracy of the network by using historical structural responses as auxiliary features. Experimental results on three case studies demonstrate that the proposed model outperforms existing approaches such as the Residual Long Short-Term Memory (ResLSTM) network and the Physics-informed LSTM (PhyLSTM) network.
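
    The frequency-domain sparsity exploited above can be sketched by keeping only the k dominant Fourier modes of a sequence, in the spirit of frequency-enhanced attention blocks; k and the synthetic signal are assumptions, and this is not the paper's exact attention design.

        import torch

        def topk_freq_filter(x, k=8):                 # x: (batch, seq_len)
            spec = torch.fft.rfft(x, dim=-1)          # one-sided spectrum
            idx = spec.abs().topk(k, dim=-1).indices  # k dominant modes per sequence
            mask = torch.zeros(spec.shape, dtype=torch.bool).scatter_(-1, idx, True)
            return torch.fft.irfft(spec * mask.to(spec.dtype), n=x.shape[-1], dim=-1)

        t = torch.linspace(0, 50, 400)
        x = torch.sin(t).repeat(2, 1) + 0.1 * torch.randn(2, 400)  # noisy periodic input
        print(topk_freq_filter(x).shape)              # torch.Size([2, 400])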

    Design of a longitudinal trajectory tracking control law for somersault maneuver flight
    Donghong ZHAO, Chuangxin ZHAO, Hewei NIE, Quhui ZHANG, Jiaqi YANG
    2025, 45(11):  3739-3746.  DOI: 10.11772/j.issn.1001-9081.2024111592

    To enhance the combat capability of Unmanned Aerial Vehicles (UAVs) in the military field and address challenges such as significant aerodynamic parameter perturbations and strong nonlinear coupling during maneuvering flight, a longitudinal trajectory tracking control law was designed for a transonic/supersonic UAV by integrating a Robust Servomechanism Linear Quadratic Regulator (RSLQR) with Model Reference Adaptive Control (MRAC). Specifically, a mathematical model of the UAV was established, and the pitch-channel and engine-channel control laws were designed within the RSLQR-MRAC framework. A numerical simulation environment was then constructed in Matlab to analyze the Kulbit (somersault) maneuver performance of the proposed control law under nominal, wind disturbance, and parameter perturbation conditions. Simulation results show that the designed control law ensures robust performance throughout the somersault maneuver, improves disturbance rejection, compensates online for system uncertainties, and improves the response quality of the system, indicating its suitability for engineering applications.
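
    The LQR core that an RSLQR design builds on (the servomechanism part additionally augments the state with the integral of the tracking error) reduces to solving an algebraic Riccati equation. A sketch on a toy short-period pitch model — the matrices are illustrative, not the paper's UAV model:

        import numpy as np
        from scipy.linalg import solve_continuous_are

        A = np.array([[-0.6, 1.0], [-4.0, -1.2]])   # toy short-period dynamics
        B = np.array([[0.0], [-6.0]])               # elevator control effectiveness
        Q = np.diag([10.0, 1.0])                    # state weighting
        R = np.array([[1.0]])                       # control weighting

        P = solve_continuous_are(A, B, Q, R)        # Riccati solution
        K = np.linalg.solve(R, B.T @ P)             # optimal gain: u = -K x
        print(K)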

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn