Power work order classification in substation area based on MiniRBT-LSTM-GAT and label smoothing
Jiaxin LI, Site MO
Journal of Computer Applications    2025, 45 (4): 1356-1362.   DOI: 10.11772/j.issn.1001-9081.2024040533

Records of power work orders in a substation area reflect the substation's operational conditions and user requirements, and are an important basis for establishing the substation's electricity safety management system and meeting users' electricity demands. To address the difficulty of classifying power work orders in substation areas, caused by the high complexity and strong professionalism of the orders, a power work order classification model for substation areas, Mini RoBERTa-Long Short-Term Memory-Graph Attention neTwork (MiniRBT-LSTM-GAT), was proposed. The model integrated Label Smoothing (LS) with a pre-trained language model. Firstly, a pre-trained model was utilized to compute character-level feature vector representations of the power work order text. Secondly, a Bidirectional Long Short-Term Memory (BiLSTM) network was employed to capture the dependencies within the power text sequence. Thirdly, a Graph Attention neTwork (GAT) was applied to emphasize the feature information that contributes most to text classification. Finally, LS was used to modify the loss function so as to improve the classification accuracy of the model. The proposed model was compared with mainstream text classification algorithms on the Power Work Order dataset of Rural power Station areas (RSPWO), the 95598 Power Work Order dataset of ZheJiang province (ZJPWO), and the THUCNews (TsingHua University Chinese News) dataset. Experimental results show that compared with the Bidirectional Encoder Representations from Transformers (BERT) model for Electric Power Audit Text classification (EPAT-BERT), the proposed model improves precision by 2.76 percentage points and F1 score by 2.02 percentage points on RSPWO, and improves precision by 1.77 percentage points and F1 score by 1.40 percentage points on ZJPWO. Compared with the capsule network based on BERT and dependency syntax (BRsyn-caps), the proposed model improves precision by 0.76 percentage points and accuracy by 0.71 percentage points on the THUCNews dataset. These results confirm the effectiveness of the proposed model in enhancing the classification performance of power work orders in substation areas, and its good performance on THUCNews verifies the generality of the model.
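The label smoothing step described in the abstract admits a compact illustration. The sketch below is not the authors' implementation; it only shows the general LS technique of replacing the one-hot target with a softened distribution before computing cross-entropy (the function name and the smoothing factor value are illustrative).

```python
import math

def smoothed_cross_entropy(probs, target, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    probs  : predicted class probabilities (must sum to 1)
    target : index of the ground-truth class
    eps    : smoothing factor; eps=0 recovers standard cross-entropy
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Smoothed target: (1 - eps) on the true class, eps spread uniformly
        q = (1.0 - eps) + eps / k if i == target else eps / k
        loss -= q * math.log(p)
    return loss
```

With eps = 0 the function reduces to the plain cross-entropy; a positive eps penalizes over-confident predictions, which is the regularizing effect LS contributes to the model above.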

Stability analysis of nonlinear time-delay system based on memory-based saturation controller
Chao GE, Shuiqing YE, Hong WANG, Zheng YAO
Journal of Computer Applications    2025, 45 (4): 1349-1355.   DOI: 10.11772/j.issn.1001-9081.2024030406

The exponential stability of nonlinear systems with time delay under a memory-based saturation controller was studied. Firstly, system parameter uncertainty was taken into account. Secondly, the polytopic method with a distributed time-delay auxiliary feedback term was used to handle the saturation nonlinearity. At the same time, an augmented Lyapunov-Krasovskii functional was established, and the integral terms were bounded by an improved integral inequality, from which stability criteria based on Linear Matrix Inequalities (LMI) were derived. In addition, a less conservative attraction domain optimization scheme was developed to enlarge the estimate of the attraction domain. Finally, a simulation example was given to demonstrate the effectiveness and practicability of the proposed scheme. Experimental results show that compared with the existing attraction domain optimization scheme without a memory-based controller, the proposed scheme with a memory-based controller is less conservative.

Construction and application of knowledge graph for epidemiological investigation
Zixin XU, Xiuwen YI, Jie BAO, Tianrui LI, Junbo ZHANG, Yu ZHENG
Journal of Computer Applications    2025, 45 (4): 1340-1348.   DOI: 10.11772/j.issn.1001-9081.2024040479

Major sudden infectious diseases are often characterized by high infectivity, rapid mutation, and significant risk, posing substantial threats to human life and economic development. Epidemiological investigation is a crucial step in curbing the spread of infectious diseases and a prerequisite for implementing precise, full-chain infection prevention and control measures. Existing epidemiological investigation systems have many shortcomings, such as inefficient manual work, poor data quality, and a lack of specialized knowledge. To address these defects, a set of technological application schemes based on the existing digitization combined with knowledge graph was proposed to assist epidemiological investigation. Firstly, a knowledge graph was constructed on the basis of five categories of entities (people, locations, events, items, and organizations) as well as their relationships and attributes. Secondly, following the idea of identifying risk points from cases and tracing close contacts, cases were used as starting points and risk points as focuses to help determine at-risk populations and risk points. Finally, through visual analysis of epidemiological investigation data, several applications were implemented, including information placement in epidemiological investigation, tracing of the spread chain, and epidemic situation awareness, so as to support the prevention and control of major sudden infectious diseases. Within the same error range, the accuracy of the graph enhancement-based trajectory placement method is significantly higher than that of the traditional manual inquiry-based method, with the placement accuracy within one kilometer reaching 85.15%; the graph enhancement-based method for determining risk points and populations improves efficiency significantly, reducing the average report generation time to within 1 h. Experimental results demonstrate that the proposed scheme effectively integrates the technical advantages of knowledge graph, improves the scientific soundness and effectiveness of precise epidemic prevention and control strategy formulation, and provides an important reference for practical exploration in the field of infectious disease prevention.
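The case-centered tracing idea (case, then risk points, then at-risk people) can be illustrated with a toy entity graph and a breadth-first expansion. The entity types and relations below are illustrative stand-ins, not the paper's actual graph schema.

```python
from collections import deque

# Toy epidemiological graph: nodes are (type, name) pairs and edges are
# relations such as "visited" (person -> location) or "attended" (person -> event).
edges = {
    ("person", "case_A"): [("location", "market"), ("event", "banquet")],
    ("location", "market"): [("person", "P1"), ("person", "P2")],
    ("event", "banquet"): [("person", "P3")],
    ("person", "P1"): [("location", "school")],
}

def trace_from_case(case, max_hops=2):
    """Breadth-first expansion: case -> risk points -> co-present people."""
    start = ("person", case)
    seen, queue = {start}, deque([(start, 0)])
    risk_points, contacts = [], []
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for nxt in edges.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt[0] in ("location", "event"):
                risk_points.append(nxt[1])
            else:
                contacts.append(nxt[1])
            queue.append((nxt, hops + 1))
    return risk_points, contacts
```

Starting from `case_A`, the two-hop expansion surfaces the visited risk points and the people co-present at them, mirroring the "case as starting point, points as focuses" workflow described above.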

Cervical cell nucleus image segmentation based on multi-scale guided filtering
Xinyao LINGHU, Yan CHEN, Pengcheng ZHANG, Yi LIU, Zhiguo GUI, Wei ZHAO, Zhanhao DONG
Journal of Computer Applications    2025, 45 (4): 1333-1339.   DOI: 10.11772/j.issn.1001-9081.2024040546

To address problems such as the lack of contextual information connection and inaccurate, low-precision segmentation of cervical cell nucleus images, a cervical cell nucleus segmentation network named DGU-Net (Dense-Guided-UNet) was proposed on the basis of an improved U-net combined with dense blocks and a U-shaped convolutional multi-scale guided filtering module, enabling more complete and accurate segmentation of cervical cell nucleus images. Firstly, the U-net model with an encoder-decoder structure was used as the backbone of the network to extract image features. Secondly, the dense block module was introduced to connect features between different layers, realizing the transmission of contextual information and enhancing the feature extraction ability of the model. Meanwhile, the multi-scale guided filtering module was introduced after each downsampling and before each upsampling to bring the salient edge details of the grayscale guidance image into the network, enhancing image details and edge information. Finally, a side output layer was added to each decoder path to fuse and average all output feature information, thereby combining features of different scales and levels to increase the accuracy and completeness of the results. Experiments were conducted on the Herlev dataset, and the proposed network was compared with three deep learning models: U-net, Progressive Growing of U-net+ (PGU-net+), and Lightweight Feature Attention Network (LFANet). Results show that compared with PGU-net+, DGU-Net increases the accuracy by 70.06%; compared with LFANet, DGU-Net increases the Intersection-over-Union (IoU) by 6.75%. It can be seen that DGU-Net processes edge detail information more accurately and generally outperforms the comparison models on segmentation metrics.
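The guided filtering principle behind the multi-scale module can be sketched in one dimension: within each local window the output is constrained to be an affine function of the guidance signal, so edges present in the guide are preserved while noise in the source is smoothed. The code below is a simplified single-pass illustration of that principle, not the paper's module.

```python
def guided_filter_1d(guide, src, radius=2, eps=1e-2):
    """Simplified 1-D guided filter: smooth `src` while following edges in `guide`.

    Per window: a = cov(guide, src) / (var(guide) + eps),
                b = mean(src) - a * mean(guide),
    and the output at the window centre is a * guide[i] + b.
    """
    n = len(guide)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        g, s = guide[lo:hi], src[lo:hi]
        m = len(g)
        mg = sum(g) / m
        ms = sum(s) / m
        var_g = sum((x - mg) ** 2 for x in g) / m
        cov = sum((x - mg) * (y - ms) for x, y in zip(g, s)) / m
        a = cov / (var_g + eps)
        b = ms - a * mg
        out.append(a * guide[i] + b)
    return out
```

Running it on a noisy step signal with a clean step as guide shows the edge surviving the smoothing, which is exactly the behavior the guided filtering module exploits for nucleus boundaries.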

Multi-scale 2D-Adaboost microscopic image recognition algorithm of Chinese medicinal materials powder
Yiding WANG, Zehao WANG, Yaoli LI, Shaoqing CAI, Yuan YUAN
Journal of Computer Applications    2025, 45 (4): 1325-1332.   DOI: 10.11772/j.issn.1001-9081.2024040438

A multi-scale 2D-Adaboost algorithm was proposed to address the fact that microscopic images of Chinese medicinal material powders contain a large number of fine features and background interference factors, which leads to large variation within the same medicinal material (large intra-class differences) and overly similar features among different medicinal materials (small inter-class differences). Firstly, a global-local feature fusion backbone network architecture was constructed to better extract multi-scale features. By combining the advantages of Transformer and Convolutional Neural Network (CNN), this architecture extracts and fuses global and local features at various scales effectively, thereby improving the feature capture capability of the backbone significantly. Secondly, the single-scale output of Adaboost was extended to multi-scale output, and a background suppression module based on the 2D-Adaboost structure was constructed. With this module, the output feature maps at each scale of the backbone were divided into foreground and background, suppressing the feature values of background regions effectively and strengthening the discriminative features. Finally, an extra classifier was added at each scale of the 2D-Adaboost structure to build a feature refinement module, which coordinated collaborative learning among the classifiers by controlling temperature parameters, thereby refining the feature maps of different scales gradually, helping the network learn more appropriate feature scales, and enriching the detailed feature representation. Experimental results show that the recognition accuracy of the proposed algorithm reaches 96.85%, which is 7.56, 5.26, 3.79, and 2.60 percentage points higher than those of the ConvNeXt-L, ViT-L, Swin-L, and Conformer-L models, respectively. The high accuracy and stability of the classification validate the effectiveness of the proposed algorithm in classification tasks of Chinese medicinal material powder microscopic images.
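Temperature control of the kind mentioned for coordinating the classifiers typically works through a temperature-scaled softmax: a higher temperature produces a softer distribution that is easier for peer classifiers to learn from. A minimal sketch of that mechanism (the exact role of the temperature in the paper may differ):

```python
import math

def softmax_with_temperature(logits, t=1.0):
    """Softmax with temperature t: t > 1 softens the distribution, t < 1 sharpens it."""
    scaled = [z / t for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```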

YOLOv5s-MRD: efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s
Yang HOU, Qiong ZHANG, Zixuan ZHAO, Zhengyu ZHU, Xiaobo ZHANG
Journal of Computer Applications    2025, 45 (4): 1317-1324.   DOI: 10.11772/j.issn.1001-9081.2024040527

Current fire and smoke detection mainly relies on on-site inspection by staff, which is inefficient and has poor real-time performance. Therefore, an efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s, called YOLOv5s-MRD (YOLOv5s-MPDIoU-RevCol-Dyhead), was proposed. Firstly, the MPDIoU (Maximized Position-Dependent Intersection over Union) method was employed to modify the bounding-box loss function, enhancing the accuracy and efficiency of Bounding Box Regression (BBR) by adapting to BBR in both overlapping and non-overlapping scenarios. Secondly, the RevCol (Reversible Column) network design was applied to reconstruct the backbone of YOLOv5s into a multi-column architecture; at the same time, reversible connections were incorporated across various levels of the model so that the retention of feature information was maximized, thereby improving the network's feature extraction capability. Finally, by integrating the Dynamic Head detection head, scale awareness, spatial awareness, and task awareness were unified, improving the detection head's accuracy and effectiveness significantly without additional computational cost. Experimental results demonstrate that on the DFS (Data of Fire and Smoke) dataset, compared with the original YOLOv5s algorithm, the proposed algorithm achieves a 9.3% increase in mAP@0.5 (mean Average Precision), a 6.6% improvement in precision, and a 13.8% increase in recall. It can be seen that the proposed algorithm can meet the requirements of current fire and smoke detection application scenarios.
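As commonly formulated, MPDIoU augments IoU with penalties on the distances between matching corner points of the predicted and ground-truth boxes, which keeps the loss informative even when the boxes do not overlap. A sketch under that formulation (not the authors' code; box format and normalization follow the published MPDIoU idea):

```python
def mpdiou_loss(box_a, box_b, img_w, img_h):
    """MPDIoU-style bounding-box loss (sketch): IoU penalized by the squared
    distances between the two boxes' top-left and bottom-right corners,
    normalized by the squared image diagonal. Boxes are (x1, y1, x2, y2).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances, normalized by the squared image diagonal
    diag2 = img_w ** 2 + img_h ** 2
    d1 = ((ax1 - bx1) ** 2 + (ay1 - by1) ** 2) / diag2
    d2 = ((ax2 - bx2) ** 2 + (ay2 - by2) ** 2) / diag2
    return 1.0 - (iou - d1 - d2)
```

For identical boxes the loss is 0; for disjoint boxes the corner-distance terms still provide a gradient, which is the property the abstract credits for handling non-overlapping BBR cases.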

Remote sensing image building extraction network based on dual promotion of semantic and detailed features
Yang ZHOU, Hui LI
Journal of Computer Applications    2025, 45 (4): 1310-1316.   DOI: 10.11772/j.issn.1001-9081.2024030387

Accurate edge information extraction is crucial for building segmentation. Current approaches often simply fuse multi-scale detailed features with semantic features, or design complex loss functions to guide the network's focus toward edge information, ignoring the mutual promotion between semantic and detailed features. To address these issues, a remote sensing image building extraction network based on the dual promotion of semantic and detailed features was developed. The proposed network adopts a U-Net-like structure: shallow high-resolution detailed feature maps were extracted in the encoder, and the deep Semantic and Detail Feature dual Facilitation module (SDFF) was embedded in the backbone of the decoder, enabling the network to extract both semantic and detailed features well. After that, channel fusion was performed on the semantic and detailed features, and combined with edge loss supervision at multiple image resolutions, the ability to extract building details and the generalization of the network were enhanced. Experimental results demonstrate that compared with mainstream methods such as U-Net and Dual-Stream Detail-Concerned Network (DSDCNet), the proposed network achieves superior semantic segmentation results on the WHU and Massachusetts buildings datasets, preserving building edge features better and effectively improving building segmentation accuracy in remote sensing images.

Video anomaly detection for moving foreground regions
Lihu PAN, Shouxin PENG, Rui ZHANG, Zhiyang XUE, Xuzhen MAO
Journal of Computer Applications    2025, 45 (4): 1300-1309.   DOI: 10.11772/j.issn.1001-9081.2024040519

The imbalance in data distribution between static background information and moving foreground objects often leads to insufficient learning of abnormal foreground regions, thereby affecting the accuracy of Video Anomaly Detection (VAD). To address this issue, a Nested U-shaped Frame Predictive Generative Adversarial Network (NUFP-GAN) was proposed for VAD. In the proposed method, a nested U-shaped frame prediction network, capable of highlighting salient targets in video frames, was used as the frame prediction module. In the discrimination phase, a self-attention patch discriminator was designed to extract the more important appearance and motion features from video frames using receptive fields of different sizes, thereby enhancing the accuracy of anomaly detection. Additionally, to keep the multi-scale features of predicted and real frames consistent in high-level semantics, a multi-scale consistency loss was introduced to further improve anomaly detection performance. Experimental results show that the proposed method achieves Area Under Curve (AUC) values of 87.6%, 85.2%, 96.0%, and 73.3% on the CUHK Avenue, UCSD Ped1, UCSD Ped2, and ShanghaiTech datasets, respectively; on ShanghaiTech, the AUC of the proposed method is 1.8 percentage points higher than that of the MAMC (Memory-enhanced Appearance-Motion Consistency) method. It can be seen that the proposed method effectively addresses the challenges brought by data distribution imbalance in VAD.

3D hand pose estimation combining attention mechanism and multi-scale feature fusion
Shiyue GUO, Jianwu DANG, Yangping WANG, Jiu YONG
Journal of Computer Applications    2025, 45 (4): 1293-1299.   DOI: 10.11772/j.issn.1001-9081.2024040507

To address inaccurate 3D hand pose estimation from a single RGB image caused by occlusion and self-similarity, a 3D hand pose estimation network combining an attention mechanism and multi-scale feature fusion was proposed. Firstly, a Sensory Enhancement Module (SEM) combining dilated convolution and the CBAM (Convolutional Block Attention Module) attention mechanism was proposed to replace the Basicblock of HourGlass Network (HGNet), expanding the receptive field and enhancing sensitivity to spatial information so as to improve hand feature extraction. Secondly, a multi-scale information fusion module, SS-MIFM (SPCNet and Soft-attention-Multi-scale Information Fusion Module), combining SPCNet (Spatial Preserve and Content-aware Network) and soft-attention enhancement, was designed to aggregate multi-level features effectively and improve the accuracy of 2D hand keypoint detection significantly while fully considering the spatial content awareness mechanism. Finally, a 2.5D pose conversion module was proposed to convert 2D poses into 3D poses, avoiding the spatial information loss caused by regressing 3D pose information directly from 2D keypoint coordinates. Experimental results show that on the InterHand2.6M dataset, the two-hand Mean Per Joint Position Error (MPJPE), the single-hand MPJPE, and the Mean Relative-Root Position Error (MRRPE) of the proposed algorithm reach 12.32, 9.96, and 29.57 mm, respectively; on RHD (Rendered Hand pose Dataset), compared with the InterNet and QMCG-Net algorithms, the proposed algorithm reduces the End-Point Error (EPE) by 2.68 and 0.38 mm, respectively. These results demonstrate that the proposed algorithm estimates hand pose more accurately and is more robust in two-hand interaction and occlusion scenarios.
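A 2.5D-to-3D conversion of the kind described typically relies on standard pinhole back-projection: once a keypoint's 2D image location and depth are known, its camera-space coordinates follow from the camera intrinsics. A minimal sketch with illustrative intrinsics (not the paper's exact module):

```python
def lift_2d_to_3d(u, v, z, fx, fy, cx, cy):
    """Back-project a 2D keypoint (u, v) with estimated depth z to camera-space
    3D coordinates using the pinhole camera model.

    fx, fy : focal lengths in pixels; cx, cy : principal point.
    """
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```

A keypoint at the principal point maps to the optical axis (x = y = 0), and offsets in pixels scale with depth over focal length.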

Gait recognition method based on dilated reparameterization and atrous convolution architecture
Lina HUO, Leren XUE, Yujun DAI, Xinyu ZHAO, Shihang WANG, Wei WANG
Journal of Computer Applications    2025, 45 (4): 1285-1292.   DOI: 10.11772/j.issn.1001-9081.2024050566

Gait recognition aims to identify people by their walking postures. To solve the problem of poor matching between the Effective Receptive Field (ERF) and the human silhouette region, a gait recognition method based on atrous convolution, named DilatedGait, was proposed. Firstly, atrous convolution was employed to expand the neurons' receptive fields, alleviating the resolution degradation caused by downsampling and model deepening and thus enhancing the recognizability of the silhouette structure. Secondly, a Dilated Reparameterization Module (DRM) was proposed to optimize the ERF focus range by fusing multi-scale convolution kernel parameters through reparameterization, enabling the model to capture more global contextual information. Finally, discriminative gait features were extracted via feature mapping. Experiments were conducted on the outdoor datasets Gait3D and GREW. The results show that compared with the existing state-of-the-art method GaitBase, DilatedGait improves Rank-1 and mean Inverse Negative Penalty (mINP) by 9.0 and 14.2 percentage points, respectively, on Gait3D, and improves Rank-1 and Rank-5 by 11.6 and 8.8 percentage points, respectively, on GREW. It can be seen that DilatedGait overcomes the adverse effects of complex covariates and further enhances the accuracy of gait recognition in outdoor scenes.
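The receptive-field expansion from atrous (dilated) convolution follows a simple rule: a k-tap kernel with dilation rate d covers k + (k-1)(d-1) input positions, so stacking dilated layers grows the receptive field without downsampling. The helper below (illustrative, not from the paper) computes the effective kernel size and the receptive field of a layer stack:

```python
def effective_kernel_size(k, d):
    """Effective spatial extent of a k-tap convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of stacked (kernel, dilation, stride) conv layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel_size(k, d) - 1) * jump
        jump *= s
    return rf
```

Two stacked 3x3 layers with dilation 2 already see 9 input positions per axis, versus 5 for their undilated counterparts, which is the mechanism DilatedGait uses to better cover the silhouette region.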

Moving pedestrian detection neural network with invariant global sparse contour point representation
Qingqing ZHAO, Bin HU
Journal of Computer Applications    2025, 45 (4): 1271-1284.   DOI: 10.11772/j.issn.1001-9081.2024040561

As pedestrians are non-rigid objects, effective invariant representation of their visual features is key to improving recognition performance. In natural visual scenes, moving pedestrians often undergo changes in scale, background, and pose, which hinders existing techniques from extracting these irregular features. To address this issue, the problem of invariant recognition of moving pedestrians was explored on the basis of the neural structural characteristics of mammalian retinas, and a Moving Pedestrian Detection Neural Network (MPDNN) was proposed for visual scenes. MPDNN is composed of two neural modules: a presynaptic network and a postsynaptic network. The presynaptic network perceives low-level visual motion cues representing the moving object and extracts the object's binarized visual information. The postsynaptic network exploits the sparse invariant response properties of the biological visual system and the invariant relationship between large concave and convex regions of the object's contour under continuous shape changes, encoding stably changing visual features from low-level motion cues to build invariant representations of pedestrians. Experimental results show that MPDNN achieves 96.96% cross-domain detection accuracy on the public CUHK Avenue and EPFL datasets, which is 4.52 percentage points higher than the SOTA (State of the Art) model; MPDNN also demonstrates good robustness on scale and motion posture variation datasets, with accuracies of 89.48% and 91.45%, respectively. These results validate the effectiveness of the biological invariant object recognition mechanism in moving pedestrian detection.
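The concave/convex contour decomposition can be illustrated on a polygonal contour: the sign of the cross product of adjacent edges tells whether each vertex is convex or concave, assuming counter-clockwise vertex order. This is a toy sketch of the geometric idea, not the paper's encoding scheme.

```python
def classify_contour_vertices(poly):
    """Label each vertex of a simple polygon (counter-clockwise order) as
    'convex' or 'concave' using the sign of the cross product of its edges."""
    n = len(poly)
    labels = []
    for i in range(n):
        x0, y0 = poly[i - 1]
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        labels.append("convex" if cross > 0 else "concave")
    return labels
```

A notched polygon shows the behavior: the vertex at the bottom of the notch turns left-handed relative to the traversal and is labeled concave.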

Protocol conversion method based on semantic similarity
Dingmu YANG, Longqiang NI, Jing LIANG, Zhaoyuan QIU, Yongzhen ZHANG, Zhiqiang QI
Journal of Computer Applications    2025, 45 (4): 1263-1270.   DOI: 10.11772/j.issn.1001-9081.2024040534

Protocol conversion is usually used to solve the problem of data interaction between different protocols; its essence is finding the mapping relationships between fields of different protocols. Traditional protocol conversion methods have several drawbacks: they are mainly designed for specific protocols, making them static, inflexible, and unsuitable for multi-protocol conversion environments; moreover, whenever a protocol changes, its structure and semantic fields must be re-analyzed to reconstruct the field mappings, leading to an exponential increase in workload and a decrease in conversion efficiency. Therefore, a general protocol conversion method based on semantic similarity was proposed to enhance conversion efficiency by discovering the relationships between fields intelligently. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) model was employed to classify the protocol fields and eliminate field pairs that should not be mapped. Secondly, the semantic similarities between fields were computed to infer the mapping relationships between them, forming a field mapping table. Finally, a general framework for protocol conversion based on semantic similarity was introduced, and related protocols were defined for validation. Simulation results show that the field classification precision of the proposed method reaches 94.44%, and its mapping relationship identification precision reaches 90.70%, which is 13.93% higher than that of the knowledge extraction-based method. These results verify that the proposed method is feasible, can identify the mapping relationships between different protocol fields quickly, and is suitable for multi-protocol conversion scenarios in unmanned collaboration.
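The field-mapping step can be sketched as nearest-neighbour matching on embedding similarity: each source-protocol field maps to the most similar destination field whose cosine similarity clears a threshold. The toy vectors below stand in for the BERT embeddings the paper uses; the field names and threshold are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two (non-zero) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_field_mapping(src_fields, dst_fields, threshold=0.8):
    """Map each source field to its most similar destination field,
    keeping only pairs whose similarity reaches the threshold."""
    mapping = {}
    for name_a, vec_a in src_fields.items():
        best, best_sim = None, threshold
        for name_b, vec_b in dst_fields.items():
            sim = cosine(vec_a, vec_b)
            if sim >= best_sim:
                best, best_sim = name_b, sim
        if best is not None:
            mapping[name_a] = best
    return mapping
```

Fields with no sufficiently similar counterpart simply stay unmapped, which mirrors the paper's elimination of pairs that should not have a mapping relationship.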

Joint beamforming and power allocation in RIS-assisted multi-cluster NOMA-DFRC system
Yuchen LI, Junyi WU, Mengjia GE, Lili PAN, Xiaorong JING
Journal of Computer Applications    2025, 45 (4): 1256-1262.   DOI: 10.11772/j.issn.1001-9081.2024040530

To meet the higher communication and sensing demands of upcoming Dual-Function Radar Communication (DFRC) systems, a DFRC system model combining multi-cluster Non-Orthogonal Multiple Access (NOMA) and Reconfigurable Intelligent Surface (RIS) was proposed. In this model, the superimposed multi-cluster NOMA signals were utilized by the DFRC base station for target sensing, and the virtual line-of-sight links established by RIS reflection were used to enhance the communication performance of the multi-cluster NOMA users. Based on the model, with the goal of maximizing the weighted sum of the system sum rate and the sensing power, a non-convex objective function with multiple constraints and coupled variables was constructed. To solve it, a joint beamforming and power allocation optimization scheme was proposed: the original problem was first decomposed into three subproblems; then methods such as Successive Convex Approximation (SCA) and SemiDefinite Relaxation (SDR) were employed to transform the non-convex subproblems into convex ones; finally, the Alternating Optimization (AO) method was applied to solve the subproblems, achieving joint beamforming (both active and passive) and intra-cluster power allocation coefficient optimization. Simulation results indicate that the proposed scheme delivers good communication and sensing performance; compared with the Orthogonal Multiple Access (OMA) scheme, it improves the system sum rate by about 1 bit/(s·Hz) while maintaining high target sensing performance, achieving a good trade-off between communication and sensing.

Post-quantum certificateless public audit scheme based on lattice
Haifeng MA, Jiewei CAI, Qingshui XUE, Jiahai YANG, Jing HAN, Zixuan LU
Journal of Computer Applications    2025, 45 (4): 1249-1255.   DOI: 10.11772/j.issn.1001-9081.2024050605

Periodic auditing of data stored on cloud servers is a core strategy for ensuring the security and integrity of cloud-stored data, as it can identify and address the risks of data tampering or loss effectively. However, traditional public audit schemes suffer from issues such as certificate management or key escrow, leading to privacy leakage during data querying and dynamic modification. Furthermore, with the continuous development of quantum computing, public audit schemes based on traditional public key systems face serious threats of being broken by quantum computers. To address these issues, a post-quantum certificateless public audit scheme based on lattices was proposed. Firstly, a certificateless public key cryptosystem was used to solve the certificate management and key escrow problems of traditional public audit schemes. Secondly, during data querying and dynamic modification, Data Owners (DO) were not required to provide specific data block information, thereby preserving the privacy of the DO. Finally, lattice-based cryptography was employed to resist attacks from quantum computers. Theoretical analysis and experimental comparison results demonstrate that the proposed scheme can resist malicious attacks while ensuring the privacy of DO operations, and achieves higher efficiency in label generation.

Secure cluster control of UAVs under DoS attacks based on APF and DDPG algorithm
Bingquan LIN, Lei LIU, Huafeng LI, Chen LIU
Journal of Computer Applications    2025, 45 (4): 1241-1248.   DOI: 10.11772/j.issn.1001-9081.2024040464

To address the communication obstruction and unpredictable motion trajectories of Unmanned Aerial Vehicles (UAVs) under Denial of Service (DoS) attacks, a secure cluster control strategy for multiple UAVs under DoS attacks was studied within a framework integrating the Artificial Potential Field (APF) and Deep Deterministic Policy Gradient (DDPG) algorithms. Firstly, Hping3 was utilized to detect DoS attacks on all UAVs, determining the network environment of the UAV cluster in real time. Secondly, when no attack was detected, the traditional APF was employed for cluster flight; once attacks were detected, the targeted UAVs were marked as dynamic obstacles while the other UAVs switched to control strategies generated by the DDPG algorithm. Finally, within the proposed framework, the cooperation and complementary advantages of APF and DDPG were realized, and the effectiveness of the DDPG algorithm was validated through simulation in Gazebo. Simulation results indicate that Hping3 can detect the UAVs under attack in real time, and the remaining normal UAVs can avoid obstacles stably after switching to the DDPG algorithm, ensuring cluster security; the success rate of the switching obstacle avoidance strategy under DoS attacks is 72.50%, significantly higher than that of the traditional APF (31.25%), and the switching strategy converges gradually, demonstrating good stability; the trained DDPG obstacle avoidance strategy also exhibits a degree of generalization, completing tasks stably with one or two unknown obstacles appearing in the environment.
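The APF control used during normal flight combines an attractive force toward the goal with repulsive forces from nearby obstacles. Below is a minimal 2-D sketch of one common APF formulation (the gains and influence radius are illustrative, not the paper's values):

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0):
    """Resultant artificial-potential-field force on a UAV at `pos`:
    linear attraction toward `goal`, plus repulsion from each obstacle
    inside the influence radius d0."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < d0:
            # Repulsive gradient magnitude, pointing away from the obstacle
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d ** 2)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

With no obstacle the force points straight at the goal; an obstacle sitting on the path flips the force away from it, which is the mechanism that makes attacked UAVs, once marked as dynamic obstacles, repel the rest of the cluster.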

Framework and implementation of network data security protection based on zero trust
Zuoguang WANG, Chao LI, Li ZHAO
Journal of Computer Applications    2025, 45 (4): 1232-1240.   DOI: 10.11772/j.issn.1001-9081.2024040526

To address the failure of boundary protection measures caused by the increasingly complex, dynamic, and fragmented evolution of network architectures, and to cope with the challenges to network data security posed by continually emerging vulnerabilities in non-autonomously-controllable systems, software, hardware, and cryptographic algorithms, the following work was performed. Firstly, a zero trust network architecture implementation model was designed on the basis of the zero trust concept. Secondly, a zero trust network security protection framework was proposed, which integrated the zero trust security concept, the Chinese cryptographic algorithm system, and trusted computing technology into links such as identity management and authentication, authorization and access, and data processing and transmission; framework processes such as Chinese cryptographic certificate application and issuance as well as secure processing and transmission of business data were designed, and functional components such as an identity and access management module and a terminal trusted network access proxy device were designed and implemented. Finally, a network platform based on the security protection framework was built, providing new frameworks, technologies, and tools for network data security protection and zero trust security practices. Security analysis and performance test results show that on the proposed platform, the SM2 signing and signature verification performance reaches 1 118.72 and 441.43 operations per second respectively, the SM4 encryption and decryption performance reaches 10.05 MB/s and 9.96 MB/s respectively, and the secure data access/response performance reaches 7.23 MB/s, demonstrating that the proposed framework can provide stable support for data security.

Multi-behavior recommendation based on cascading residual graph convolutional network
Weichao DANG, Chujun SONG, Gaimei GAO, Chunxia LIU
Journal of Computer Applications    2025, 45 (4): 1223-1231.   DOI: 10.11772/j.issn.1001-9081.2024040461

A Multi-Behavior Recommendation model based on Cascading Residual graph convolutional network (CRMBR) was proposed to address the problems of data sparsity and the neglect of complex connections among multiple behaviors in multi-behavior recommendation research. Firstly, the global embeddings of users and items were learned from a unified homogeneous graph constructed from the interactions of all behaviors and used as initialization embeddings. Secondly, the embeddings of different types of behaviors were refined continuously to capture user preferences, with the connections among different behaviors captured through cascading residual blocks. Finally, user and item embeddings were aggregated through two different aggregation strategies, respectively, and optimized using Multi-Task Learning (MTL). Experimental results on several real datasets show that the recommendation performance of the CRMBR model is better than that of the current mainstream models. Compared with the advanced benchmark model, Multi-Behavior Hierarchical Graph Convolutional Network (MB-HGCN), the proposed model improves the Hit Rate (HR@20) and Normalized Discounted Cumulative Gain (NDCG@20) by 3.1% and 3.9% on Tmall dataset, by 15.8% and 16.9% on Beibei dataset, and by 1.0% and 3.3% on Jdata dataset, respectively, which validates the effectiveness of the proposed model.
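The cascading residual refinement described above can be sketched as follows. This is an illustrative NumPy toy under assumed shapes, not the CRMBR implementation: each behavior-specific block adds a transformation of the previous behavior's embedding, so later behaviors refine rather than replace earlier preferences.

```python
import numpy as np

def cascade_refine(base_emb: np.ndarray, blocks: list) -> list:
    """Refine an initial embedding through a cascade of residual blocks.

    Each block is a weight matrix W_k; the behavior-k embedding is
    e_k = e_{k-1} + tanh(e_{k-1} @ W_k), a residual update that
    preserves what earlier behaviors already learned.
    """
    embs = []
    e = base_emb
    for W in blocks:
        e = e + np.tanh(e @ W)
        embs.append(e)
    return embs

rng = np.random.default_rng(0)
user = rng.normal(size=(1, 8))  # global embedding from the unified graph
blocks = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]  # e.g. view, cart, buy
refined = cascade_refine(user, blocks)
print(len(refined), refined[-1].shape)
```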

Developer recommendation for open-source projects based on collaborative contribution network
Lan YOU, Yuang ZHANG, Yuan LIU, Zhijun CHEN, Wei WANG, Xing ZENG, Zhangwei HE
Journal of Computer Applications    2025, 45 (4): 1213-1222.   DOI: 10.11772/j.issn.1001-9081.2024040454

Recommending developers for open-source projects is of great significance to the construction of the open-source ecosystem. Unlike traditional software development, the developers, projects, organizations and their correlations in the open-source field reflect the characteristics of openly collaborative projects, and the semantics embedded in them help to recommend developers for open-source projects accurately. Therefore, a Developer Recommendation method based on Collaborative Contribution Network (DRCCN) was proposed. Firstly, a CCN was constructed from the contribution relationships among Open-Source Software (OSS) developers, OSS projects and OSS organizations. Then, based on the CCN, a three-layer deep heterogeneous GraphSAGE (Graph SAmple and aggreGatE) Graph Neural Network (GNN) model was built to predict the links between developer nodes and open-source project nodes and to generate the corresponding embedding pairs. Finally, according to the prediction results, the K-Nearest Neighbor (KNN) algorithm was adopted to complete the developer recommendation. The proposed model was trained and tested on a GitHub dataset, and the experimental results show that compared with the contrastive learning model for sequential recommendation CL4SRec (Contrastive Learning for Sequential Recommendation), DRCCN improves the precision, recall, and F1 score by approximately 10.7%, 2.6%, and 4.2%, respectively. It can be seen that the proposed model can provide an important reference for developer recommendation in open-source community projects.
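The final KNN step above can be sketched minimally: recommend the developers whose embeddings lie closest to a project embedding. The embeddings here are random stand-ins for DRCCN's GraphSAGE outputs, and the names are hypothetical.

```python
import numpy as np

def knn_recommend(project_emb, dev_embs, dev_names, k=2):
    """Return the k developers whose embeddings are closest (Euclidean)."""
    dists = np.linalg.norm(dev_embs - project_emb, axis=1)
    order = np.argsort(dists)[:k]
    return [dev_names[i] for i in order]

devs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
names = ["alice", "bob", "carol"]
print(knn_recommend(np.array([0.9, 1.1]), devs, names, k=2))  # ['bob', 'alice']
```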

Group recommendation model by graph neural network based on multi-perspective learning
Cong WANG, Yancui SHI
Journal of Computer Applications    2025, 45 (4): 1205-1212.   DOI: 10.11772/j.issn.1001-9081.2024030337

Focusing on the problem that the existing group recommendation models based on Graph Neural Networks (GNNs) find it difficult to fully utilize explicit and implicit interaction information, a Group Recommendation by GNN based on Multi-perspective learning (GRGM) model was proposed. Firstly, hypergraphs, bipartite graphs, and hypergraph projections were constructed according to the group interaction data, and a GNN suited to the characteristics of each graph was adopted to extract its node features, thereby fully expressing the explicit and implicit relationships among users, groups, and items. Then, a multi-perspective information fusion strategy was proposed to obtain the final group and item representations. Experimental results on Mafengwo, CAMRa2011, and Weeplaces datasets show that compared with the baseline model ConsRec, the GRGM model improves the Hit Ratio (HR@5, HR@10) and Normalized Discounted Cumulative Gain (NDCG@5, NDCG@10) by 3.38%, 1.96% and 3.67%, 3.84% on Mafengwo dataset, by 2.87%, 1.18% and 0.96%, 1.62% on CAMRa2011 dataset, and by 2.41%, 1.69% and 4.35%, 2.60% on Weeplaces dataset, respectively. It can be seen that the GRGM model has better recommendation performance than the baseline models.
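The ranking metrics quoted throughout these abstracts, HR@K and NDCG@K, have simple closed forms when each test case has a single ground-truth item; the following sketch computes both over a model's top-K ranked list (item names are placeholders).

```python
import math

def hit_ratio_at_k(ranked: list, target, k: int) -> float:
    """1.0 if the target item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked: list, target, k: int) -> float:
    """NDCG@k with one relevant item: 1/log2(rank + 1) if hit, else 0."""
    if target in ranked[:k]:
        rank = ranked.index(target) + 1  # 1-based position
        return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = ["i3", "i7", "i1", "i9"]
print(hit_ratio_at_k(ranked, "i7", 2), ndcg_at_k(ranked, "i7", 2))
```

Averaging these per-user scores over the test set yields the reported HR@K and NDCG@K values.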

Tibetan word segmentation system based on pre-trained model tokenization reconstruction
Jie YANG, Tashi NYIMA, Dongrub RINCHEN, Jindong QI, Dondrub TSHERING
Journal of Computer Applications    2025, 45 (4): 1199-1204.   DOI: 10.11772/j.issn.1001-9081.2024040442

To address the poor performance of existing pre-trained models in Tibetan word segmentation tasks, a method was proposed that establishes a tokenization reconstruction standard to regularize and constrain the text, and then reconstructs the tokenization of a Tibetan pre-trained model to perform Tibetan word segmentation. Firstly, a standardization operation was performed on the original text to resolve incorrect cuts caused by language mixing and similar issues. Secondly, the tokenization of the pre-trained model was reconstructed at syllable granularity so that the segmentation units were aligned with the labeled units. Finally, after resolving adhesive (conjoined) segments using the improved sliding window restoration method, the Re-TiBERT-BiLSTM-CRF model was established using the "Begin, Middle, End and Single" (BMES) four-element annotation scheme, so as to obtain the Tibetan word segmentation system. Experimental results show that the pre-trained model with reconstructed tokenization is significantly better than the original pre-trained model in segmentation tasks. The obtained system achieves high Tibetan word segmentation precision, with an F1 value of up to 97.15%, so it can complete Tibetan word segmentation tasks well.
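The BMES scheme named above tags each syllable of a word as Begin, Middle, End, or Single; words are then recovered by cutting after every E or S tag. A sketch of encoding and decoding (Latin strings stand in for Tibetan syllables):

```python
def bmes_encode(words: list) -> list:
    """Tag each unit of each word with B/M/E/S."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def bmes_decode(units: list, tags: list) -> list:
    """Rebuild words from units and their BMES tags."""
    words, cur = [], []
    for u, t in zip(units, tags):
        cur.append(u)
        if t in ("E", "S"):
            words.append(cur)
            cur = []
    if cur:  # tolerate a truncated tag sequence
        words.append(cur)
    return words

words = [["ka"], ["kha", "ga"], ["nga", "ca", "cha"]]
tags = bmes_encode(words)
print(tags)  # ['S', 'B', 'E', 'B', 'M', 'E']
units = [u for w in words for u in w]
assert bmes_decode(units, tags) == words
```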

Novel speaker identification framework based on narrative unit and reliable label
Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU
Journal of Computer Applications    2025, 45 (4): 1190-1198.   DOI: 10.11772/j.issn.1001-9081.2024030331

Speaker Identification (SI) in novels aims to determine the speaker of a quotation from its context. This task is of great help in assigning appropriate voices to different characters in the production of audiobooks. However, the existing methods mainly use fixed window values when selecting the context of a quotation, which is not flexible enough and may produce redundant segments, making it difficult for the model to capture useful information. Besides, due to the significant differences in the number of quotations and in writing styles across novels, a small number of labeled samples cannot enable the model to generalize fully, and the labeling of datasets is expensive. To solve the above problems, a novel speaker identification framework that integrates narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a context of suitable length, so that the model focuses on the segment closest to the quotation's attribution. Secondly, a Speaker Scoring Network (SSN) was constructed with the generated context as input. In addition, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to compensate for the lack of labeled samples to some extent and to screen out higher-quality, more reliable pseudo-labeled samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) containing 11 Chinese novels was built and labeled. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework is superior to methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.
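The reliable pseudo-label idea can be sketched as a confidence filter: keep only unlabeled samples whose predicted class probability clears a threshold. This toy is an assumption-level illustration of the self-training step, not the paper's RPLS algorithm, which scores reliability in more detail.

```python
def select_reliable(samples, probs, threshold=0.9):
    """Keep (sample, pseudo_label) pairs whose top class probability is at
    least `threshold`; raising the threshold trades recall for label purity."""
    selected = []
    for sample, dist in zip(samples, probs):
        label = max(range(len(dist)), key=dist.__getitem__)
        if dist[label] >= threshold:
            selected.append((sample, label))
    return selected

samples = ["q1", "q2", "q3"]
probs = [[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]]
print(select_reliable(samples, probs))  # [('q1', 0), ('q3', 1)]
```

The selected pairs would then be added to the training set for the next self-training round.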

Fact verification of semantic fusion collaborative reasoning based on graph embedding
Malei SHEN, Zhicai SHI, Yongbin GAO, Jianyang HU
Journal of Computer Applications    2025, 45 (4): 1184-1189.   DOI: 10.11772/j.issn.1001-9081.2024040436

As a critical task in the field of natural language processing, fact verification requires retrieving relevant evidence from a large amount of plain text based on a given claim and using this evidence to reason about and verify the claim. Previous studies usually use the concatenation of evidence sentences or a graph structure to represent the relationships among the evidence, but cannot clearly represent the internal relevance among the pieces of evidence. Therefore, a collaborative reasoning network model based on graph and text fusion, CNGT (Co-attention Network with Graph and Text fusion), was designed, in which the semantic fusion of evidence sentences was achieved by constructing an evidence knowledge graph. Firstly, the evidence knowledge graph was constructed from the evidence sentences, and its graph representation was learned by a graph transformer encoder. Then, the BERT (Bidirectional Encoder Representations from Transformers) model was used to encode the claim and evidence sentences. Finally, the reasoning graph information and text features were fused effectively through a double-layer collaborative reasoning network. Experimental results show that the proposed model outperforms the advanced model KGAT (Knowledge Graph Attention neTwork) on FEVER (Fact Extraction and VERification) dataset, with Label Accuracy (LA) increased by 0.84 percentage points and the FEVER score increased by 1.51 percentage points. It can be seen that the model pays more attention to the relationships among evidence sentences, and the evidence graph makes its treatment of these relationships interpretable.

Open-world knowledge reasoning model based on path and enhanced triplet text
Liqin WANG, Zhilei GENG, Yingshuang LI, Yongfeng DONG, Meng BIAN
Journal of Computer Applications    2025, 45 (4): 1177-1183.   DOI: 10.11772/j.issn.1001-9081.2024030265

Traditional knowledge reasoning methods based on representation learning can only be used for closed-world knowledge reasoning; how to conduct open-world knowledge reasoning effectively is currently a hot issue. Therefore, a knowledge reasoning model based on paths and enhanced triplet text, named PEOR (Path and Enhanced triplet text for Open world knowledge Reasoning), was proposed. Firstly, multiple paths generated by the structures between entity pairs and enhanced triplets generated by the neighborhood structure of each entity were utilized. Among them, the path text was obtained by concatenating the text of the triplets in a path, and the enhanced triplet text was obtained by concatenating the text of the head entity neighborhood, the relation, and the tail entity neighborhood. Then, BERT (Bidirectional Encoder Representations from Transformers) was employed to encode the path text and the enhanced triplet text separately. Finally, semantic matching attention was computed between path vectors and triplet vectors, and this attention was used to aggregate the semantic information from multiple paths. Comparison experimental results on three open-world knowledge graph datasets, WN18RR, FB15k-237, and NELL-995, show that compared with the suboptimal model BERTRL (BERT-based Relational Learning), the proposed model has the Hits@10 (Hit ratio) metric improved by 2.6, 2.3 and 8.5 percentage points, respectively, validating the effectiveness of the proposed model.
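Concatenating triple texts along a path into one encoder input, as described above, can be sketched very simply; the entity and relation names and the `[SEP]` joining convention here are illustrative assumptions, not the paper's exact serialization.

```python
def path_to_text(path: list) -> str:
    """Join the textual form of each (head, relation, tail) triple
    along a path into one sequence for a text encoder such as BERT."""
    return " [SEP] ".join(f"{h} {r} {t}" for h, r, t in path)

path = [("Paris", "capital_of", "France"),
        ("France", "member_of", "EU")]
print(path_to_text(path))
# Paris capital_of France [SEP] France member_of EU
```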

Tender information extraction method based on prompt tuning of knowledge
Yiheng SUN, Maofu LIU
Journal of Computer Applications    2025, 45 (4): 1169-1176.   DOI: 10.11772/j.issn.1001-9081.2024030336

Current information extraction tasks mainly rely on Large Language Models (LLMs). However, the frequent occurrence of domain terms in tender information and the models' lack of relevant prior knowledge result in low fine-tuning efficiency and poor extraction performance. Additionally, the extraction and generalization performance of the models depends to a great extent on the quality of the prompt information and on how the prompt templates are constructed. To address these issues, a Tender Information Extraction method based on Prompt Learning (TIEPL) was proposed. Firstly, a prompt learning method for generative information extraction was utilized to inject domain knowledge into the LLM, thereby achieving unified optimization of the pre-training and fine-tuning stages. Secondly, with the LoRA (Low-Rank Adaptation) fine-tuning method as the framework, a separate prompt training bypass was designed, and a keyword-based prompt template for tender scenarios was designed, thereby enhancing the bidirectional association between model information extraction and prompts. Experimental results on a self-built tender inviting and winning dataset indicate that TIEPL improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation) and BLEU-4 (BiLingual Evaluation Understudy) by 1.05 and 4.71 percentage points, respectively, compared with the suboptimal method UIE (Universal Information Extraction), and that TIEPL generates extraction results more accurately and completely. This demonstrates the effectiveness of the proposed method in improving the accuracy and generalization of tender information extraction.
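LoRA, the fine-tuning framework named above, freezes a weight matrix W and learns a low-rank update BA on top of it. A minimal NumPy sketch of the forward pass follows; shapes, rank, and scaling are illustrative assumptions, not TIEPL's configuration.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ (W + alpha * B @ A).T: W is frozen, only A and B train.

    A has shape (r, d_in) and B has shape (d_out, r) with rank r small,
    so the trainable parameter count drops from d_in*d_out to r*(d_in + d_out).
    """
    return x @ (W + alpha * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))  # common init: B zeroed so the update starts as a no-op
x = rng.normal(size=(1, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)  # no-op before training
print(lora_forward(x, W, A, B).shape)  # (1, 8)
```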

Consultation recommendation method based on knowledge graph and dialogue structure
Chun XU, Shuangyan JI, Huan MA, Enwei SUN, Mengmeng WANG, Mingyu SU
Journal of Computer Applications    2025, 45 (4): 1157-1168.   DOI: 10.11772/j.issn.1001-9081.2024050573

Aiming at the problems that existing consultation recommendation methods neither make full use of the rich dialogue information between doctors and patients nor capture patients' real-time health needs and preferences, a consultation recommendation method based on Knowledge Graph and Dialogue Structure (KGDS) was proposed. Firstly, a medical Knowledge Graph (KG) incorporating comment sentiment analysis and professional medical knowledge was constructed to improve the fine-grained feature representations of doctors and patients. Secondly, in the patient representation learning part, a patient query encoder was designed to extract key features of the query text at both word and sentence levels, and an attention mechanism was used to strengthen higher-level feature interactions between doctor and patient vectors. Thirdly, the diagnosis dialogues were modeled to make full use of the rich dialogue information between doctors and patients and to enhance the doctor-patient feature representations. Finally, a dialogue simulator based on contrastive learning was designed to capture the dynamic needs and real-time preferences of patients, and the simulated dialogue representation was used to support recommendation score prediction. Experimental results on a real dataset show that compared with the optimal baseline method, KGDS increases AUC (Area Under the Curve), MRR@15 (Mean Reciprocal Rank), Diversity@15, F1@15, HR@15 (Hit Ratio) and NDCG@15 (Normalized Discounted Cumulative Gain) by 1.82, 1.78, 3.85, 3.06, 10.02 and 4.51 percentage points, respectively, which verifies the effectiveness of the proposed consultation recommendation method. It can also be seen that adding sentiment analysis and the KG improves the interpretability of the recommendation results.

Knowledge graph completion using hierarchical attention fusing directed relationships and relational paths
Sheping ZHAI, Qing YANG, Yan HUANG, Rui YANG
Journal of Computer Applications    2025, 45 (4): 1148-1156.   DOI: 10.11772/j.issn.1001-9081.2024030321

Most of the existing Knowledge Graph Completion (KGC) methods do not fully exploit the relational paths in the triple structure and only consider the graph structure information; meanwhile, the existing models focus on neighborhood information during entity aggregation, while their learning of relations is relatively simple. To address the above problems, a graph attention model that integrates directed relations and relational paths, DRPGAT, was proposed. Firstly, the regular triples were converted into triples based on directed relations, and an attention mechanism was introduced to give different weights to different directed relations, so as to realize entity information aggregation. At the same time, a relational path model was established, with relational positions embedded into the path information to distinguish relations at different positions, and irrelevant paths were filtered out to obtain useful path information. Secondly, the attention mechanism was used for deep path information learning to realize the aggregation of relations. Finally, the entities and relations were fed into the decoder and trained to obtain the final completion results. Link prediction experiments were conducted on two real datasets to verify the effectiveness of the proposed model. Experimental results show that compared with the optimal results of the baseline models, on FB15k-237 dataset, DRPGAT has the Mean Rank (MR) reduced by 13, and the Mean Reciprocal Rank (MRR), Hits@1, Hits@3, and Hits@10 improved by 1.9, 1.2, 2.3, and 1.6 percentage points, respectively; on WN18RR dataset, DRPGAT has the MR reduced by 125, and the MRR, Hits@1, Hits@3, and Hits@10 improved by 1.1, 0.4, 1.2, and 0.6 percentage points, respectively, indicating the effectiveness of the proposed model.
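The attention-weighted entity aggregation in the first step can be sketched GAT-style: score each neighbor reached through a directed relation, softmax the scores, and take the weighted sum. This NumPy toy illustrates the mechanism only; DRPGAT's actual scoring function and parameterization may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_entity(entity_emb, neighbor_embs, a):
    """Weight each directed-relation neighbor by attention and sum.

    `a` is an attention parameter vector scoring the concatenation of
    the entity embedding with each neighbor embedding.
    """
    scores = np.array([a @ np.concatenate([entity_emb, n]) for n in neighbor_embs])
    alpha = softmax(scores)  # attention weights over neighbors, summing to 1
    return (alpha[:, None] * neighbor_embs).sum(axis=0)

rng = np.random.default_rng(1)
e = rng.normal(size=4)
neighbors = rng.normal(size=(3, 4))  # embeddings reached along directed relations
a = rng.normal(size=8)
out = aggregate_entity(e, neighbors, a)
print(out.shape)  # (4,)
```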

Unsupervised text style transfer based on semantic perception of proximity
Junxiu AN, Linwang YANG, Yuan LIU
Journal of Computer Applications    2025, 45 (4): 1139-1147.   DOI: 10.11772/j.issn.1001-9081.2024040536

Aiming at the problem that discrete word perturbation and embedding perturbation methods do not fully consider the distance boundaries between word vectors in the latent space, a Semantic Proximity-aware Adversarial Auto-Encoders (SPAAE) method was proposed. Firstly, adversarial auto-encoders were used as the underlying model. Secondly, the standard deviation of the probability distribution of the noise vector was derived from the proximity distance of each word vector. Finally, by randomly sampling from this probability distribution, the perturbation parameters were adjusted dynamically to blur a word vector's own semantics as much as possible without affecting the semantics of other word vectors. Experimental results show that compared with the DAAE (Denoising Adversarial Auto-Encoders) and EPAAE (Embedding Perturbed Adversarial Auto-Encoders) methods, the proposed method improves natural fluency by 14.88% and 15.65%, respectively, on Yelp dataset, improves Text Style Transfer (TST) accuracy by 11.68% and 6.45%, respectively, on Scitail dataset, and improves BLEU (BiLingual Evaluation Understudy) by 28.16% and 26.17%, respectively, on Tenses dataset. It can be seen that the SPAAE method provides a theoretically more accurate way of perturbing word vectors and demonstrates significant advantages in different style transfer tasks on 7 public datasets, especially for style transfer of emotional text in the guidance of online public opinion.
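The core idea above, scaling perturbation noise by each word vector's distance to its nearest neighbor so the noise blurs the word without crossing into another word's region, can be sketched as follows. The proportionality constant and Gaussian form are illustrative assumptions rather than SPAAE's exact formulation.

```python
import numpy as np

def proximity_noise(embs: np.ndarray, scale: float = 0.5, seed: int = 0) -> np.ndarray:
    """Perturb each embedding with Gaussian noise whose standard deviation
    is proportional to the distance to its nearest neighboring embedding."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances, with self-distance masked out
    d = np.linalg.norm(embs[:, None, :] - embs[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = d.min(axis=1)  # distance to the closest other vector
    sigma = scale * nearest[:, None]
    return embs + rng.normal(size=embs.shape) * sigma

embs = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0]])
noisy = proximity_noise(embs)
# the isolated vector (row 2) receives a much larger perturbation
print(np.abs(noisy - embs).max(axis=1))
```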

Domain adaptation integrating environment label smoothing and nuclear norm discrepancy
Meirong DING, Jinxin ZHUO, Yuwu LU, Qinglong LIU, Jicong LANG
Journal of Computer Applications    2025, 45 (4): 1130-1138.   DOI: 10.11772/j.issn.1001-9081.2024040417

The existing domain adaptation methods focus overly on fine-grained feature learning in the source domain, which hinders effective extension to the target domain, makes them prone to overfitting in specific environments, and leaves them lacking robustness to complex environments. To address the above issues, a domain adaptation model that integrates Environment Label Smoothing and Nuclear norm Discrepancy (ELSND) was proposed. In the proposed model, through the environment label smoothing module, the probability of true labels was reduced and the probability of non-true labels was increased to enhance the model's adaptability to different scenarios. At the same time, the nuclear norm discrepancy module was employed to measure the distribution difference between the source and target domains, thereby improving classification certainty at decision boundaries. Extensive experiments were conducted on three domain adaptation benchmark datasets: Office-31, Office-Home and MiniDomainNet. Compared with the state-of-the-art baseline model DomainAdaptor-Aug (DomainAdaptor with generalized entropy minimization-Augmentation) on MiniDomainNet dataset, the ELSND model achieves a 1.23-percentage-point increase in accuracy on image classification domain adaptation tasks. Therefore, the proposed model has higher precision and generalization in image classification.
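Both ingredients named in the title have simple closed forms. A hedged NumPy sketch of standard label smoothing (moving probability mass from the true class to the others, as the module above does) and of the nuclear norm (the sum of a matrix's singular values); ELSND's exact smoothing schedule and discrepancy term may differ.

```python
import numpy as np

def smooth_labels(onehot: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Move probability eps from the true class to the others, uniformly."""
    k = onehot.shape[-1]
    return onehot * (1.0 - eps) + (1.0 - onehot) * eps / (k - 1)

def nuclear_norm(P: np.ndarray) -> float:
    """Sum of singular values; a convex surrogate for matrix rank."""
    return float(np.linalg.svd(P, compute_uv=False).sum())

y = np.eye(3)[[0, 2]]  # one-hot labels for classes 0 and 2
print(smooth_labels(y, 0.1))  # rows still sum to 1
print(round(nuclear_norm(np.eye(3)), 2))  # 3.0 for the identity
```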

Boundary-cross supervised semantic segmentation network with decoupled residual self-attention
Kunyuan JIANG, Xiaoxia LI, Li WANG, Yaodan CAO, Xiaoqiang ZHANG, Nan DING, Yingyue ZHOU
Journal of Computer Applications    2025, 45 (4): 1120-1129.   DOI: 10.11772/j.issn.1001-9081.2024040415

Focusing on the challenges of edge information loss and incomplete segmentation of large lesions in endoscopic semantic segmentation networks, a Boundary-Cross Supervised semantic Segmentation Network (BCS-SegNet) with Decoupled Residual Self-Attention (DRA) was proposed. Firstly, DRA was introduced to enhance the network's ability to learn correlations among distant lesions. Secondly, a Cross Level Fusion (CLF) module was constructed to combine multi-level feature maps within the encoding structure in a pairwise way, so as to fuse image details and semantic information at low computational cost. Finally, a multi-directional, multi-scale 2D Gabor transform was utilized to extract edge information, and spatial attention was used to weight the edge features in the feature maps, so as to supervise the decoding process of the segmentation network and provide more accurate pixel-level intra-class segmentation consistency. Experimental results demonstrate that on ISIC2018 dermoscopy and Kvasir-SEG/CVC-ClinicDB colonoscopy datasets, BCS-SegNet achieves mIoU (mean Intersection over Union) and Dice coefficients of 84.27%, 90.68% and 79.24%, 87.91%, respectively; on the self-built esophageal endoscopy dataset, BCS-SegNet achieves an mIoU of 82.73% and a Dice coefficient of 90.84%, with the mIoU being 3.30% higher than that of U-Net and 4.97% higher than that of UCTransNet. It can be seen that the proposed network achieves more complete segmentation regions and clearer edge details.
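The mIoU and Dice figures above are standard overlap metrics between a predicted mask and a ground-truth mask; for binary masks they can be sketched as:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(round(iou(pred, gt), 3), round(dice(pred, gt), 3))  # 0.5 0.667
```

mIoU averages the per-class IoU over all classes (here, lesion and background).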

Data augmentation technique incorporating label confusion for Chinese text classification
Haitao SUN, Jiayu LIN, Zuhong LIANG, Jie GUO
Journal of Computer Applications    2025, 45 (4): 1113-1119.   DOI: 10.11772/j.issn.1001-9081.2024040550

Traditional data augmentation techniques, such as synonym substitution, random insertion, and random deletion, may change the original semantics of a text and even cause the loss of critical information. Moreover, data in text classification tasks typically have both a textual part and a label part, yet traditional data augmentation methods only focus on the textual part. To address these issues, a Label Confusion incorporated Data Augmentation (LCDA) technique was proposed to enhance data comprehensively from both the textual and label aspects. In terms of text, the text was enhanced by randomly inserting and replacing punctuation marks and completing end-of-sentence punctuation, which increased textual diversity while preserving all textual information and its order. In terms of labels, a simulated label distribution was generated using a label confusion approach and used to replace the traditional one-hot label distribution, so as to better reflect the relationships among instances and labels as well as among labels. In experiments conducted on few-shot datasets constructed from THUCNews (TsingHua University Chinese News) and Toutiao Chinese news datasets, the proposed technique was combined with the TextCNN, TextRNN, BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa-CNN (Robustly optimized BERT approach Convolutional Neural Network) text classification models. The experimental results indicate that all models demonstrate significant performance improvements over their unaugmented counterparts. Specifically, on 50-THU, a dataset constructed from THUCNews dataset, the accuracies of the four models combined with LCDA are improved by 1.19, 6.87, 3.21, and 2.89 percentage points, respectively, over those without augmentation, and by 0.78, 7.62, 1.75, and 1.28 percentage points, respectively, over the same models combined with the softEDA (Easy Data Augmentation with soft labels) method. These results show that by processing both text and labels, the LCDA technique significantly enhances model accuracy, particularly in application scenarios with limited data.
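The punctuation-based text augmentation described above can be illustrated with a toy: punctuation marks are inserted between tokens at random and the end-of-sentence mark is completed, while every original token and its position in the sequence are preserved. This sketch is an assumption-level illustration, not the paper's implementation.

```python
import random

PUNCT = ["，", "。", "、", "？", "！"]

def punct_augment(tokens: list, p: float = 0.2, seed: int = 42) -> list:
    """Insert random punctuation after tokens with probability p and ensure
    the sentence ends with a punctuation mark; the original tokens and
    their order are fully preserved."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        out.append(tok)
        if rng.random() < p:
            out.append(rng.choice(PUNCT))
    if out[-1] not in PUNCT:  # complete the end-of-sentence mark
        out.append("。")
    return out

tokens = ["新闻", "分类", "任务", "示例"]
aug = punct_augment(tokens)
print("".join(aug))
# removing the inserted punctuation recovers the original token sequence
assert [t for t in aug if t not in PUNCT] == tokens
```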
