Journal of Computer Applications

Select

Survey and prospect of large language models

Xiaolin QIN, Xu GU, Dicheng LI, Haiwen XU

Journal of Computer Applications 2025, 45 (3): 685-696. DOI: 10.11772/j.issn.1001-9081.2025010128

Abstract （690）

HTML （58）

PDF （2035KB）（1026）

Save

Large Language Models （LLMs） are a class of language models composed of artificial neural networks with a vast number of parameters （typically billions of weights or more）. They are trained on a large amount of unlabeled text using self-supervised or semi-supervised learning and are the core of current generative Artificial Intelligence （AI） technologies. Compared to traditional language models， LLMs demonstrate stronger language understanding and generation capabilities， supported by substantial computational power， extensive parameters， and large-scale data. They are widely applied in tasks such as machine translation， question answering systems， and dialogue generation with good performance. Most of the existing surveys focus on the theoretical construction and training techniques of LLMs， while systematic exploration of LLMs’ industry-level application practices and evolution of the technological ecosystem remains insufficient. Therefore， based on introducing the foundational architecture， training techniques， and development history of LLMs， the current general key technologies in LLMs and advanced integration technologies with LLMs bases were analyzed. Then， by summarizing the existing research， challenges faced by LLMs in practical applications were further elaborated， including problems such as data bias， model hallucination， and computational resource consumption， and an outlook was provided on the ongoing development trends of LLMs.

Table and Figures | Reference | Related Articles | Metrics

Select

Survey on hypergraph application methods： issues， advances， and challenges

Li ZENG, Jingru YANG, Gang HUANG, Xiang JING, Chaoran LUO

Journal of Computer Applications 2024, 44 (11): 3315-3326. DOI: 10.11772/j.issn.1001-9081.2023111629

Abstract （640）

HTML （25）

PDF （795KB）（417）

Save

Hypergraph is the generalization of graph， which has significant advantages in representing higher-order features of complex relationships compared with ordinary graph. As a relatively new data structure， hypergraph is playing a crucial role in various application fields increasingly. By appropriately using hypergraph models and algorithms， specific problems in real world were modeled and solved with higher efficiency and quality. Existing surveys of hypergraph mainly focus on the theory and techniques of hypergraph itself， and lack of a summary of modeling and solving methods in specific scenarios. To this end， after summarizing and introducing some fundamental concepts of hypergraph， the application methods， techniques， common issues， and solutions of hypergraph in various application scenarios were analyzed； by summarizing the existing work， some problems and obstacles that still exist in the applications of hypergraph to real-world problems were elaborated. Finally， the future research directions of hypergraph applications were prospected.

Table and Figures | Reference | Related Articles | Metrics

Select

Overview of research and application of knowledge graph in equipment fault diagnosis

Jie WU, Ansi ZHANG, Maodong WU, Yizong ZHANG, Congbao WANG

Journal of Computer Applications 2024, 44 (9): 2651-2659. DOI: 10.11772/j.issn.1001-9081.2023091280

Abstract （586）

HTML （53）

PDF （2858KB）（1543）

Save

Useful knowledge can be extracted from equipment fault diagnosis data for construction of a knowledge graph， which can effectively manage complex equipment fault diagnosis information in the form of triples （entity， relationship， entity）. This enables the rapid diagnosis of equipment faults. Firstly， the related concepts of knowledge graph for equipment fault diagnosis were introduced， and the framework of knowledge graph for equipment fault diagnosis domain was analyzed. Secondly， the research status at home and abroad about several key technologies， such as knowledge extraction， knowledge fusion and knowledge reasoning for equipment fault diagnosis knowledge graph， was summarized. Finally， the applications of knowledge graph in equipment fault diagnosis were summarized， some shortcomings and challenges in the construction of knowledge graph in this field were proposed， and some new ideas were provided for the field of equipment fault diagnosis in the future.

Table and Figures | Reference | Related Articles | Metrics

Select

Underwater target detection algorithm based on improved YOLOv8

Dahai LI, Bingtao LI, Zhendong WANG

Journal of Computer Applications 2024, 44 (11): 3610-3616. DOI: 10.11772/j.issn.1001-9081.2023111550

Abstract （544）

HTML （19）

PDF （1637KB）（493）

Save

Due to the unique characteristics of underwater creatures， underwater images usually exit many small targets being hard to detect and often overlapping with each other. In addition， light absorption and scattering in underwater environment can cause underwater images' color offset and blur. To overcome those challenges， an underwater target detection algorithm， namely WCA-YOLOv8， was proposed. Firstly， the Feature Fusion Module （FFM） was designed to improve the focus on spatial dimension in order to improve the recognition ability for targets with color offset and blur. Secondly， the FReLU Coordinate Attention （FCA） module was added to enhance the feature extraction ability for overlapped and occluded underwater targets. Thirdly， Complete Intersection over Union （CIoU） loss function was replaced by Wise-IoU version 3 （WIoU v3） loss function to strengthen the detection performance for small size targets. Finally， the Downsampling Enhancement Module （DEM） was designed to preserve context information during feature extraction more completely. Experimental results show that WCA-YOLOv8 achieves 75.8% and 88.6% mean Average Precision （mAP_0.5） and 60 frame/s and 57 frame/s detection speeds on RUOD and URPC datasets， respectively. Compared with other state-of-the-art underwater target detection algorithms， WCA-YOLOv8 can achieve higher detection accuracy with faster detection speed.

Table and Figures | Reference | Related Articles | Metrics

Select

Development， technologies and applications of blockchain 3.0

Peng FANG, Fan ZHAO, Baoquan WANG, Yi WANG, Tonghai JIANG

Journal of Computer Applications 2024, 44 (12): 3647-3657. DOI: 10.11772/j.issn.1001-9081.2023121826

Abstract （535）

HTML （39）

PDF （2294KB）（342）

Save

Blockchain 3.0 is the third stage of the development of blockchain technology and the core of building a value Internet. Its innovations in sharding， cross-chain and privacy protection have given it a wide range of application scenarios and research value. It is highly valued by relevant people in academia and industry. For the development， technologies and applications of blockchain 3.0， the relevant literature on blockchain 3.0 at home and abroad in the past five years were surveyed and reviewed. Firstly， the basic theory and technical characteristics of blockchain were introduced， laying the foundation for an in-depth understanding of the research progress of blockchain. Subsequently， based on the evolution trend of blockchain technology over time， the development process and various key development time nodes of blockchain 3.0， as well as the reasons of the division of different stages of development of blockchain using sharding and side-chain technologies as benchmarks， were given. Then， the current research status of key technologies of blockchain 3.0 was analyzed in detail， and typical applications of blockchain 3.0 in six major fields such as internet of things， medical care， and agriculture were summarized. Finally， the key challenges and future development opportunities faced by blockchain 3.0 in its development process were summed up.

Table and Figures | Reference | Related Articles | Metrics

Select

Logo detection algorithm based on improved YOLOv5

Yeheng LI, Guangsheng LUO, Qianmin SU

Journal of Computer Applications 2024, 44 (8): 2580-2587. DOI: 10.11772/j.issn.1001-9081.2023081113

Abstract （519）

HTML （10）

PDF （4682KB）（439）

Save

To address the challenges posed by complex background and varying size of logo images， an improved detection algorithm based on YOLOv5 was proposed. Firstly， in combination with the Channel Block Attention Module （CBAM）， compression was applied in both image channels and spatial dimensions to extract critical information and significant regions within the image. Subsequently， the Switchable Atrous Convolution （SAC） was employed to allow the network to adaptively adjust the receptive field size in feature maps at different scales， improving the detection effects of objects across multiple scales. Finally， the Normalized Wasserstein Distance （NWD） was embedded into the loss function. The bounding boxes were modeled as 2D Gaussian distributions， the similarity between corresponding Gaussian distributions was calculated to better measure the similarity among objects， thereby enhancing the detection performance for small objects， and improving model robustness and stability. Compared to the original YOLOv5 algorithm： in small dataset FlickrLogos?32， the improved algorithm achieved a mean of Average Precision （mAP@0.5） of 90.6%， with an increase of 1 percentage point； in large dataset QMULOpenLogo， the improved algorithm achieved an mAP@0.5 of 62.7%， with an increase of 2.3 percentage points； in LogoDet3K for three types of logos， the improved algorithm increased the mAP@0.5 by 1.2， 1.4， and 1.4 percentage points respectively. Experimental results demonstrate that the improved algorithm has better small object detection ability of logo images.

Table and Figures | Reference | Related Articles | Metrics

Select

Small object detection algorithm from drone perspective based on improved YOLOv8n

Tao LIU, Shihong JU, Yimeng GAO

Journal of Computer Applications 2024, 44 (11): 3603-3609. DOI: 10.11772/j.issn.1001-9081.2023111644

Abstract （499）

HTML （20）

PDF （1561KB）（279）

Save

In view of the low accuracy of object detection algorithms in small object detection from drone perspective， a new small object detection algorithm named SFM-YOLOv8 was proposed by improving the backbone network and attention mechanism of YOLOv8. Firstly， the SPace-to-Depth Convolution （SPDConv） suitable for low-resolution images and small object detection was integrated into the backbone network to retain discriminative feature information and improve the perception ability to small objects. Secondly， a multi-branch attention named MCA （Multiple Coordinate Attention） was introduced to enhance the spatial and channel information on the feature layer. Then， a convolution FE-C2f fusing FasterNet and Efficient Multi-scale Attention （EMA） was constructed to reduce the computational cost and lightweight the model. Besides， a Minimum Point Distance based Intersection over Union （MPDIoU） loss function was introduced to improve the accuracy of the algorithm. Finally， a small object detection layer was added to the network structure of YOLOv8n to retain more location information and detailed features of small objects. Experimental results show that compared with YOLOv8n， SFM-YOLOv8 achieves a 4.37 percentage point increase in mAP₅₀ （mean Average Precision） with a 5.98% reduction in parameters on VisDrone-DET2019 dataset. Compared to the related mainstream models， SFM-YOLOv8 achieves higher accuracy and meets real-time detection requirements.

Table and Figures | Reference | Related Articles | Metrics

Select

Correlation filtering based target tracking with nonlinear temporal consistency

Wentao JIANG, Wanxuan LI, Shengchong ZHANG

Journal of Computer Applications 2024, 44 (8): 2558-2570. DOI: 10.11772/j.issn.1001-9081.2023081121

Abstract （464）

HTML （3）

PDF （7942KB）（82）

Save

Concerning the problem that existing target tracking algorithms mainly use the linear constraint mechanism LADCF （Learning Adaptive Discriminative Correlation Filters）， which easily causes model drift， a correlation filtering based target tracking algorithm with nonlinear temporal consistency was proposed. First， a nonlinear temporal consistency term was proposed based on Stevens’ Law， which aligned closely with the characteristics of human visual perception. The nonlinear temporal consistency term allowed the model to track the target relatively smoothly， thus ensuring tracking continuity and preventing model drift. Next， the Alternating Direction Method of Multipliers （ADMM） was employed to compute the optimal function value， ensuring real-time tracking of the algorithm. Lastly， Stevens’ Law was used for nonlinear filter updating， enabling the filter update factor to enhance and suppress the filter according to the change of the target， thereby adapting to target changes and preventing filter degradation. Comparison experiments with mainstream correlation filtering and deep learning algorithms were performed on four standard datasets. Compared with the baseline algorithm LADCF， the tracking precision and success rate of the proposed algorithm were improved by 2.4 and 3.8 percentage points on OTB100 dataset， and 1.5 and 2.5 percentage points on UAV123 dataset. The experimental results show that the proposed algorithm effectively avoids tracking model drift， reduces the likelihood of filter degradation， has higher tracking precision and success rate， and stronger robustness in complicated situations such as occlusion and illumination changes.

Table and Figures | Reference | Related Articles | Metrics

Select

Class-imbalanced traffic abnormal detection based on 1D-CNN and BiGRU

Hong CHEN, Bing QI, Haibo JIN, Cong WU, Li’ang ZHANG

Journal of Computer Applications 2024, 44 (8): 2493-2499. DOI: 10.11772/j.issn.1001-9081.2023081112

Abstract （424）

HTML （4）

PDF （1194KB）（1264）

Save

Network traffic anomaly detection is a network security defense method that involves analyzing and determining network traffic to identify potential attacks. A new approach was proposed to address the issue of low detection accuracy and high false positive rate caused by imbalanced high-dimensional network traffic data and different attack categories. One Dimensional Convolutional Neural Network（1D-CNN） and Bidirectional Gated Recurrent Unit （BiGRU） were combined to construct a model for traffic anomaly detection. For class-imbalanced data， balanced processing was performed by using an improved Synthetic Minority Oversampling TEchnique （SMOTE）， namely Borderline-SMOTE， and an undersampling clustering technique based on Gaussian Mixture Model （GMM）. Subsequently， a one-dimensional CNN was utilized to extract local features in the data， and BiGRU was used to better extract the time series features in the data. Finally， the proposed model was evaluated on the UNSW-NB15 dataset， achieving an accuracy of 98.12% and a false positive rate of 1.28%. The experimental results demonstrate that the proposed model outperforms other classic machine learning and deep learning models， it improves the recognition rate for minority attacks and achieves higher detection accuracy.

Table and Figures | Reference | Related Articles | Metrics

Select

Review on bimodal emotion recognition based on speech and text

Lingmin HAN, Xianhong CHEN, Wenmeng XIONG

Journal of Computer Applications 2025, 45 (4): 1025-1034. DOI: 10.11772/j.issn.1001-9081.2024030319

Abstract （418）

HTML （62）

PDF （1625KB）（1593）

Save

Emotion recognition is a technology that allows computers to recognize and understand human emotions. It plays an important role in many fields and is an important development direction in the field of artificial intelligence. Therefore， the research status of bimodal emotion recognition based on speech and text was summarized. Firstly， the representation space of emotion was classified and elaborated. Secondly， the emotion databases were classified according to their emotion representation space， and the common multi-modal emotion databases were summed up. Thirdly， the methods of bimodal emotion recognition based on speech and text were introduced， including feature extraction， modal fusion， and decision classification. Specifically， the modal fusion methods were highlighted and divided into four categories， namely feature level fusion， decision level fusion， model level fusion and multi-level fusion. In addition， results of a series of bimodal emotion recognition methods based on speech and text were compared and analyzed. Finally， the application scenarios， challenges， and future development directions of emotion recognition were introduced. The above aims to analyze and review the work of multi-modal emotion recognition， especially bimodal emotion recognition based on speech and text， providing valuable information for emotion recognition.

Table and Figures | Reference | Related Articles | Metrics

Select

Personalized federated learning method based on dual stream neural network

Zheyuan SHEN, Keke YANG, Jing LI

Journal of Computer Applications 2024, 44 (8): 2319-2325. DOI: 10.11772/j.issn.1001-9081.2023081207

Abstract （414）

HTML （54）

PDF （2185KB）（542）

Save

Classic Federated Learning （FL） algorithms are difficult to achieve good results in scenarios where data is highly heterogeneous. In Personalized FL （PFL）， a new solution was proposed aiming at the problem of data heterogeneity in federated learning， which is to “tailor” a dedicated model for each client. In this way， the models had good performance. However， it brought the difficulty in extending federated learning to new clients at the same time. Focusing on the challenges of performance and scalability in PFL， FedDual， a FL model with dual stream neural network structure， was proposed. By adding an encoder for analyzing the personalized characteristics of clients， this model was not only able to have the performance of personalized models， but also able to be extended to new clients easily. Experimental results show that compared to the classic Federated Averaging （FedAvg） algorithm on datasets such as MNIST and FashionMNIST， FedDual obviously improves the accuracy； on CIFAR10 dataset， FedDual improves the accuracy by more than 10 percentage points， FedDual achieves “plug and play” for new clients without decrease of the accuracy， solving the problem of difficult scalability for new clients.

Table and Figures | Reference | Related Articles | Metrics

Select

Distributed UAV cluster pursuit decision-making based on trajectory prediction and MADDPG

Yu WANG, Zhihui GUAN, Yuanpeng LI

Journal of Computer Applications 2024, 44 (11): 3623-3628. DOI: 10.11772/j.issn.1001-9081.2023101538

Abstract （392）

HTML （5）

PDF （918KB）（127）

Save

A Trajectory Prediction based Distributed Multi-Agent Deep Deterministic Policy Gradient （TP-DMADDPG） algorithm was proposed to address the problems of insufficient flexibility and poor generalization ability of Unmanned Aerial Vehicle （UAV） cluster pursuit decision-making algorithms in complex mission environments. Firstly， to enhance the realism of the pursuit mission， an intelligent escape strategy was designed for the target. Secondly， considering the conditions such as missing information of target due to communication interruption and other reasons， a Long Short-Term Memory （LSTM） network was used to predict the position information of target in real time， and the state space of the decision-making model was constructed on the basis of the prediction information. Finally， TP-DMADDPG was designed based on the distributed framework and Multi-Agent Deep Deterministic Policy Gradient （MADDPG） algorithm， which enhanced the flexibility and generalization ability of pursuit decision-making in the process of complex air combat. Simulation results show that compared with Deep Deterministic Policy Gradient （DDPG）， Twin Delayed Deep Deterministic policy gradient （TD3） and MADDPG algorithms， the TP-DMADDPG algorithm increases the success rate of collaborative decision-making by more than 15 percentage points， and can solve the problem of pursuing intelligent escaping target with incomplete information.

Table and Figures | Reference | Related Articles | Metrics

Select

Proximal policy optimization algorithm based on clipping optimization and policy guidance

Yi ZHOU, Hua GAO, Yongshen TIAN

Journal of Computer Applications 2024, 44 (8): 2334-2341. DOI: 10.11772/j.issn.1001-9081.2023081079

Abstract （391）

HTML （15）

PDF （3877KB）（692）

Save

Addressing the two issues in the Proximal Policy Optimization （PPO） algorithm， the difficulty in strictly constraining the difference between old and new policies and the relatively low efficiency in exploration and utilization， a PPO based on Clipping Optimization And Policy Guidance （COAPG-PPO） algorithm was proposed. Firstly， by analyzing the clipping mechanism of PPO， a trust-region clipping approach based on the Wasserstein distance was devised， strengthening the constraint on the difference between old and new policies. Secondly， within the policy updating process， ideas from simulated annealing and greedy algorithms were incorporated， improving the exploration efficiency and learning speed of algorithm. To validate the effectiveness of COAPG-PPO algorithm， comparative experiments were conducted using the MuJoCo testing benchmarks between PPO based on Clipping Optimization （CO-PPO）， PPO with Covariance Matrix Adaptation （PPO-CMA）， Trust Region-based PPO with RollBack （TR-PPO-RB）， and PPO algorithm. The experimental results indicate that COAPG-PPO algorithm demonstrates stricter constraint capabilities， higher exploration and exploitation efficiencies， and higher reward values in most environments.

Table and Figures | Reference | Related Articles | Metrics

Select

Multi-robot path following and formation based on deep reinforcement learning

Haodong HE, Hao FU, Qiang WANG, Shuai ZHOU, Wei LIU

Journal of Computer Applications 2024, 44 (8): 2626-2633. DOI: 10.11772/j.issn.1001-9081.2023081120

Abstract （376）

HTML （9）

PDF （3411KB）（584）

Save

Aiming at the obstacle avoidance and trajectory smoothness problem of multi-robot path following and formation in crowd environment， a multi-robot path following and formation algorithm based on deep reinforcement learning was proposed. Firstly， a pedestrian danger priority mechanism was established， which was combined with reinforcement learning to design a danger awareness network to enhance the safety of multi-robot formation. Subsequently， a virtual robot was introduced as the reference target for multiple robots， thus transforming path following into tracking control of the virtual robot by the multiple robots， with the purpose of enhancing the smoothness of the robot trajectories. Finally， quantitative and qualitative analysis was conducted through simulation experiments to compare the proposed algorithm with existing ones. The experimental results show that compared with the existing point-to-point path following algorithms， the proposed algorithm has excellent obstacle avoidance performance in crowd environments， which ensures the smoothness of multi-robot motion trajectories.

Table and Figures | Reference | Related Articles | Metrics

Select

Semi-supervised object detection framework guided by curriculum learning

Yingjun ZHANG, Niuniu LI, Binhong XIE, Rui ZHANG, Wangdong LU

Journal of Computer Applications 2024, 44 (8): 2326-2333. DOI: 10.11772/j.issn.1001-9081.2023081062

Abstract （363）

HTML （26）

PDF （2042KB）（542）

Save

In order to enhance the quality of pseudo labels， address the issue of confirmation bias in Semi-Supervised Object Detection （SSOD）， and tackle the challenge of ignoring complexities in unlabeled data leading to erroneous pseudo labels in existing algorithms， an SSOD framework guided by Curriculum Learning （CL） was proposed. The framework consisted of two modules： the ICSD （IoU-Confidence-Standard-Deviation） difficulty measurer and the BP （Batch-Package） training scheduler. The ICSD difficulty measurer comprehensively considered information such as IoU （Intersection over Union） between pseudo-bounding boxes， confidence， class label， etc.，and the C_IOU （Checkpoint_IOU） method was introduced to evaluate the reliability of unlabeled data. The BP training scheduler designed two efficient scheduling strategies， starting from the perspectives of Batch and Package respectively， giving priority to unlabeled data with high reliability indicators to achieve full utilization of the entire unlabeled data set in the form of course learning. Extensive comparative experimental results on the Pascal VOC and MS-COCO datasets demonstrate that the proposed framework applies to existing SSOD algorithms and exhibits significant improvements in detection accuracy and stability.

Table and Figures | Reference | Related Articles | Metrics

Select

Hybrid internet of vehicles intrusion detection system for zero-day attacks

Jiepo FANG, Chongben TAO

Journal of Computer Applications 2024, 44 (9): 2763-2769. DOI: 10.11772/j.issn.1001-9081.2023091328

Abstract （361）

HTML （14）

PDF （2618KB）（1650）

Save

Existing machine learning methods suffer from over-reliance on sample data and insensitivity to anomalous data when confronted with zero-day attack detection， thus making it difficult for Intrusion Detection System （IDS） to effectively defend against zero-day attacks. Therefore， a hybrid internet of vehicles intrusion detection system based on Transformer and ANFIS （Adaptive-Network-based Fuzzy Inference System） was proposed. Firstly， a data enhancement algorithm was designed and the problem of unbalanced data samples was solved by denoising first and then generating. Secondly， a feature engineering module was designed by introducing non-linear feature interactions into complex feature combinations. Finally， the self-attention mechanism of Transformer and the adaptive learning method of ANFIS were combined， which enhanced the ability of feature representation and reduced the dependence on sample data. The proposed system was compared with other SOTA （State-Of-The-Art） algorithms such as Dual-IDS on CICIDS-2017 and UNSW-NB15 intrusion datasets. Experimental results show that for zero-day attacks， the proposed system achieves 98.64% detection accuracy and 98.31% F1 value on CICIDS-2017 intrusion dataset， and 93.07% detection accuracy and 92.43% F1 value on UNSW-NB15 intrusion dataset， which validates high accuracy and strong generalization ability of the proposed algorithm for zero-day attack detection.

Table and Figures | Reference | Related Articles | Metrics

Select

Small target detection model in overlooking scenes on tower cranes based on improved real-time detection Transformer

Yudong PANG, Zhixing LI, Weijie LIU, Tianhao LI, Ningning WANG

Journal of Computer Applications 2024, 44 (12): 3922-3929. DOI: 10.11772/j.issn.1001-9081.2023121796

Abstract （360）

HTML （10）

PDF （3128KB）（262）

Save

In view of a series of problems of security guarantee of construction site personnel such as casualties led by falling objects and tower crane collapse caused by mutual collision of tower hooks， a small target detection model in overlooking scenes on tower cranes based on improved Real-Time DEtection TRansformer （RT-DETR） was proposed. Firstly， the multiple training and single inference structures designed by applying the idea of model reparameterization were added to the original model to improve the detection speed. Secondly， the convolution module in FasterNet Block was redesigned to replace BasicBlock in the original BackBone to improve performance of the detection model. Thirdly， the new loss function Inner-SIoU （Inner-Structured Intersection over Union） was utilized to further improve precision and convergence speed of the model. Finally， the ablation and comparison experiments were conducted to verify the model performance. The results show that， in detection of the small target images in overlooking scenes on tower cranes， the proposed model achieves the precision of 94.7%， which is higher than that of the original RT-DETR model by 6.1 percentage points. At the same time， the Frames Per Second （FPS） of the proposed model reaches 59.7， and the detection speed is improved by 21% compared with the original model. The Average Precision （AP） of the proposed model on the public dataset COCO 2017 is 2.4， 1.5， and 1.3 percentage points higher than those of YOLOv5， YOLOv7， and YOLOv8， respectively. It can be seen that the proposed model meets the precision and speed requirements for small target detection in overlooking scenes on tower cranes.

Table and Figures | Reference | Related Articles | Metrics

Select

Lightweight safety helmet wearing detection algorithm based on YOLO v8

Yong FENG, Sizhuo YANG, Hongyan XU

Journal of Computer Applications 0, (): 251-256. DOI: 10.11772/j.issn.1001-9081.2024010020

Abstract （354）

HTML （0）

PDF （2164KB）（808）

PDF（mobile）（1480KB）（22）

Save

Construction， mining， exploration and other industries have mandatory regulations on helmet wearing in production. The helmet wearing detection algorithms have been widely used in the above industries， but the existing algorithms have problems such as too many parameters， high complexity and poor real-time performance. Therefore， a lightweight safety helmet wearing detection algorithm YOLO v8-s-LE was proposed on the basis of YOLO v8 （You Only Look Once v8）. Firstly， the LAD （Light Adaptive-weight Downsampling） method was designed，so that compared with the original YOLO v8 algorithm， the proposed algorithm reduced floating-point computation significantly. Then， the efficient multi-scale convolution C2f_EMC （C2f_Efficient Multi-Scale Conv） method was used to extract multi-scale feature information， which increased the depth of the network effectively， made the neural network take into account both shallow and deep semantic information， and further improved the expression ability of the algorithm for feature information. Experimental results show that compared with YOLO v8-s algorithm on the public dataset SHWD （Safety Helmet Wearing Dataset）， the proposed algorithm has the parameters reduced by 77%， the floating-point computation reduced by 73%， and the precision reached 92.6%， verifying that the algorithm takes into account the requirements of accuracy and real-time performance， and is more suitable for the deployment and application in actual production environments.

Table and Figures | Reference | Related Articles | Metrics

Select

Multivariate time series prediction model based on decoupled attention mechanism

Liting LI, Bei HUA, Ruozhou HE, Kuang XU

Journal of Computer Applications 2024, 44 (9): 2732-2738. DOI: 10.11772/j.issn.1001-9081.2023091301

Abstract （353）

HTML （11）

PDF （1545KB）（1423）

Save

Aiming at the problem that it is difficult to fully utilize the sequence contextual semantic information and the implicit correlation information among variables in multivariate time-series prediction， a model based on decoupled attention mechanism — Decformer was proposed for multivariate time-series prediction. Firstly， a novel decoupled attention mechanism was proposed to fully utilize the embedded semantic information， thereby improving the accuracy of attention weight allocation. Secondly， a pattern correlation mining method without relying on explicit variable relationships was proposed to mine and utilize implicit pattern correlation information among variables. On three different types of real datasets （TTV， ECL and PeMS-Bay）， including traffic volume of call， electricity consumption and traffic， Decformer achieves the highest prediction accuracy over all prediction time lengths compared with excellent open-source multivariate time-series prediction models such as Long- and Short-term Time-series Network （LSTNet）， Transformer and FEDformer. Compared with LSTNet， Decformer has the Mean Absolute Error （MAE） reduced by 17.73%-27.32%， 10.89%-17.01%， and 13.03%-19.64% on TTV， ECL and PeMS-Bay datasets， respectively， and the Mean Squared Error （MSE） reduced by 23.53%-58.96%， 16.36%-23.56% and 15.91%-26.30% on TTV， ECL and PeMS-Bay datasets， respectively. Experimental results indicate that Decformer can enhance the accuracy of multivariate time series prediction significantly.

Table and Figures | Reference | Related Articles | Metrics

Select

Graph data generation approach for graph neural network model extraction attacks

Ying YANG, Xiaoyan HAO, Dan YU, Yao MA, Yongle CHEN

Journal of Computer Applications 2024, 44 (8): 2483-2492. DOI: 10.11772/j.issn.1001-9081.2023081110

Abstract （347）

HTML （3）

PDF （3213KB）（339）

Save

Data-free model extraction attacks are a class of machine learning security problems based on the fact that the attacker has no knowledge of the training data information required to carry out the attack. Aiming at the research gap of data-free model extraction attacks in the field of Graphical Neural Network （GNN）， a GNN model extraction attack method was proposed. The graph node feature information and edge information were optimized with the graph neural network interpretability method GNNExplainer and the graph data enhancement method GAUG-M， respectively， so as to generate the required graph data and achieve the final GNN model extraction. Firstly， the GNNExplainer method was used to obtain the important graph node feature information from the interpretable analysis of the response results of the target model. Secondly， the overall optimization of the graph node feature information was achieved by up weighting the important graph node features and downweighting the non-important graph node features. Then， the graph autoencoder was used as the edge information prediction module， which obtained the connection probability information between nodes according to the optimized graph node features. Finally， the edge information was optimized by adding or deleting the corresponding edges according to the probability. Three GNN model architectures trained on five graph datasets were experimented as the target models for extraction attacks， and the obtained alternative models achieve 73% to 87% accuracy in node classification task and 76% to 89% fidelity with the target model performance， which verifies the effectiveness of the proposed method.

Table and Figures | Reference | Related Articles | Metrics

Select

Unsupervised cross-domain transfer network for 3D/2D registration in surgical navigation

Xiyuan WANG, Zhancheng ZHANG, Shaokang XU, Baocheng ZHANG, Xiaoqing LUO, Fuyuan HU

Journal of Computer Applications 2024, 44 (9): 2911-2918. DOI: 10.11772/j.issn.1001-9081.2023091332

Abstract （346）

HTML （3）

PDF （2025KB）（1119）

Save

3D/2D registration is a key technique for intraoperative guidance. In existing deep learning based registration methods， image features were extracted through the network to regress the corresponding pose transformation parameters. This kind of method relies on real samples and their corresponding 3D labels for training， however， this part of expert-annotated medical data is scarce. In the alternative solution， the network was trained with Digital Reconstructed Radiography （DRR） images， which struggled to keep the original accuracy on Xray images due to the differences of image features across domains. For the above problems， an Unsupervised Cross-Domain Transfer Network （UCDTN） based on self-attention was designed. Without relying on Xray images and their 3D spatial labels as the training samples， the correspondence between the image features captured in the source domain and spatial transformations were migrated to the target domain. The public features were used to reduce the disparity of features between domains to minimize the negative impact of cross-domain. Experimental results show that the mTRE （mean Registration Target Error） of the result predicted by UCDTN is 2.66 mm， with a 70.61% reduction compared to the model without cross-domain transfer training， indicating the effectiveness of UCDTN in cross-domain registration tasks.

Table and Figures | Reference | Related Articles | Metrics

Select

Incomplete multi-view clustering algorithm based on self-attention fusion

Shunyong LI, Shiyi LI, Rui XU, Xingwang ZHAO

Journal of Computer Applications 2024, 44 (9): 2696-2703. DOI: 10.11772/j.issn.1001-9081.2023091253

Abstract （341）

HTML （9）

PDF （2806KB）（1187）

Save

Multi-view clustering task based on incomplete data has become one of the research hotspots in the field of unsupervised learning. However， most multi-view clustering algorithms based on “shallow” models often find it difficult to extract and characterize potential feature structures within views when dealing with large-scale high-dimensional data. At the same time， the stacking or averaging methods of multi-view information fusion ignore the differences between views and does not fully consider the different contributions of each view to building a common consensus representation. To address the above issues， an Incomplete Multi-View Clustering algorithm based on Self-Attention Fusion （IMVCSAF） was proposed. Firstly， the potential features of each view were extracted on the basis of a deep autoencoder， and the consistency information among views was maximized by using contrastive learning. Secondly， a self-attention mechanism was adopted to recode and fuse the potential representations of each view， and the inherent causality as well as feature complementarity between different views was considered and mined comprehensively. Thirdly， based on the common consensus representation， the potential representation of missing instance was predicted and recovered， thereby fully implementing the process of multi-view clustering. Experimental results on Scene-15， LandUse-21， Caltech101-20 and Noisy-MNIST datasets show that， the accuracy of IMVCSAF is higher than those of other comparison algorithms while meeting the convergence requirements. On Noisy-MNIST dataset with 50% miss rate， the accuracy of IMVCSAF is 6.58 percentage points higher than that of the second best algorithm — COMPETER （inCOMPlete muLti-view clustEring via conTrastivE pRediction）.

Table and Figures | Reference | Related Articles | Metrics

Select

Multi-domain fake news detection model enhanced by APK-CNN and Transformer

Jinjin LI, Guoming SANG, Yijia ZHANG

Journal of Computer Applications 2024, 44 (9): 2674-2682. DOI: 10.11772/j.issn.1001-9081.2023091359

Abstract （338）

HTML （17）

PDF （1378KB）（1520）

Save

In order to solve the problems of domain shifting and incomplete domain labeling in social media news， as well as to explore more efficient multi-domain news feature extraction and fusion networks， a multi-domain fake news detection model based on enhancement by APK-CNN （Adaptive Pooling Kernel Convolutional Neural Network） and Transformer was proposed， namely Transm3. Firstly， a three-channel network was designed for feature extraction and representation of semantic， emotional， and stylistic information of the text and view combination of these features using a multi-granularity cross-domain interactor. Secondly， the news domain labels were refined by optimized soft-shared memory networking and domain adapters. Then， Transformer was combined with a multi-granularity cross-domain interactor to dynamically and weighty aggregate the interaction features of different domains. Finally， the fused features were fed into the classifier for true/false news discrimination. Experimental results show that compared with M³FEND （Memory-guided Multi-view Multi-domain FakE News Detection） and EANN （Event Adversarial Neural Networks for multi-modal fake news detection）， Transm3 improves the comprehensive F1 value by 3.68% and 6.46% on Chinese dataset， and 6.75% and 11.93% on English dataset； and the F1 values on sub-domains are also significantly improved. The effectiveness of Transm3 for multi-domain fake news detection is fully validated.

Table and Figures | Reference | Related Articles | Metrics

Select

Industrial defect detection method with improved masked autoencoder

Kaili DENG, Weibo WEI, Zhenkuan PAN

Journal of Computer Applications 2024, 44 (8): 2595-2603. DOI: 10.11772/j.issn.1001-9081.2023081122

Abstract （335）

HTML （10）

PDF （4261KB）（31）

Save

Considering the problem of missed detection or over detection in the existing defect detection methods that only need normal samples， an method that combined an improved masked autoencoder with an improved Unet was constructed to achieve pixel-level defect detection. Firstly， a defect fitting module was used to generate the defect mask image and the defect image corresponding to the normal image. Secondly， the defect image was randomly masked to remove most of the defect information from the defect image. The autoencoder with Transformer structure was stimulated to learn the representations from unmasked normal regions and to repair the defect image based on context. In order to improve the model’s ability to repair details of the image， a new loss function was designed. Finally， in order to achieve pixel-level defect detection， the defect image and the repaired image were concatenated and input into the Unet with the channel cross-fusion Transformer structure. Experimental results on MVTec AD dataset show that the average image-based and pixel-based Area Under the Receiver Operating Characteristic Curve （ROC AUC） of the proposed method reached 0.984 and 0.982 respectively； compared with DRAEM （Discriminatively trained Reconstruction Anomaly Embedding Model）， it was increased by 2.9 and 3.2 percentage points； compared with CFLOW-AD （Anomaly Detection via Conditional normalizing FLOWs）， it was increased by 3.1 and 0.8 percentage points. It verifies that the proposed method has high recognition rate and detection accuracy.

Table and Figures | Reference | Related Articles | Metrics

Select

Improved U-Net algorithm based on attention mechanism and multi-scale fusion

Song WU, Xin LAN, Jingyang SHAN, Haiwen XU

Journal of Computer Applications 0, (): 24-28. DOI: 10.11772/j.issn.1001-9081.2022121844

Abstract （333）

HTML （6）

PDF （2163KB）（130）

Save

Aiming at the problems of computational redundancy and difficulty in segmenting fine structures of the original U-Net in medical image segmentation tasks， an improved U-Net algorithm based on attention mechanism and multi-scale fusion was proposed. Firstly， by integrating channel attention mechanism into the skip connections， the channels containing more important information were focused by the network， thereby reducing computational resource cost and improving computational efficiency. Secondly， the feature fusion strategy was added to increase the contextual information for the feature maps passed to the decoder， which realized the complementary and multiple utilization among the features. Finally， the joint optimization was performed by using Dice loss and binary cross entropy loss， so as to handle with the problem of dramatic oscillations of loss function that may occur in fine structure segmentation. Experimental validation results on Kvasir_seg and DRIVE datasets show that compared with the original U-Net algorithm， the proposed improved algorithm has the Dice coefficient increased by 1.82 and 0.82 percentage points， the SEnsitivity （SE） improved by 1.94 and 3.53 percentage points， and the Accuracy （Acc） increased by 1.62 and 0.04 percentage points， respectively. It can be seen that the proposed improved algorithm can enhance performance of the original U-Net for fine structure segmentation.

Table and Figures | Reference | Related Articles | Metrics

Select

Time series causal inference method based on adaptive threshold learning

Qinzhuang ZHAO, Hongye TAN

Journal of Computer Applications 2024, 44 (9): 2660-2666. DOI: 10.11772/j.issn.1001-9081.2023091278

Abstract （331）

HTML （27）

PDF （1142KB）（195）

Save

Time-series data exhibits recency characteristic， i.e.， variable values are generally dependent on recent historical information. Existing time-series causal inference methods do not fully consider the recency characteristic， which use a uniform threshold when inferring causal relationships with different delays through hypothesis testing， so that it is difficult to effectively infer weaker causal relationships. To address the aforementioned issue， a method for time-series causal inference based on adaptive threshold learning was proposed. Firstly， data characteristics were extracted. Then， based on the data characteristics at different delays， a combination of thresholds used in the hypothesis testing process was automatically learned. Finally， this threshold combination was applied to the hypothesis testing processes of the PC （Peter-Clark） algorithm， PCMCI （Peter-Clark and Momentary Conditional Independence） algorithm， and VAR-LINGAM （Vector AutoRegressive LINear non-Gaussian Acyclic Model） algorithm to obtain more accurate causal relationship structures. Experimental results on the simulation dataset show that the F1 values of adaptive PC algorithm， adaptive PCMCI algorithm， and adaptive VAR-LINGAM algorithm using the proposed method are all improved.

Table and Figures | Reference | Related Articles | Metrics

Select

Intermittent demand forecasting method based on adaptive matching of demand patterns

Lilin FAN, Fukang CAO, Wanting WANG, Kai YANG, Zhaoyu SONG

Journal of Computer Applications 2024, 44 (9): 2747-2755. DOI: 10.11772/j.issn.1001-9081.2023091372

Abstract （314）

HTML （4）

PDF （1955KB）（504）

Save

The demand for after-sales parts in large manufacturing enterprises is characterized by sparse distribution and high volatility， with high uncertainty in both demand frequency and demand quantity， and the demand sequences present typical intermittent characteristic. However， in actual operation and maintenance， the demand for parts fluctuates greatly in terms of frequency and quantity， resulting in various demand patterns. The existing intermittent demand prediction mainly uses single model or static combination of fixed prediction models， which is difficult to fully explore the evolution laws of demand sequences under different demand patterns， and the prediction accuracy and stability are hard to guarantee. To solve the above problems， an intermittent demand forecasting method based on adaptive matching of demand patterns was proposed， in which demand patterns were adaptively matched， and the prediction effect of intermittent sequences was improved by dynamically identifying and matching demand patterns. The method included two stages. In the model training stage， firstly， according to the intermittent characteristics of the historical demand data of parts， it was divided into demand sequences and interval sequences， and the two types of sequences were clustered separately to capture the different demand and interval patterns corresponding to each type of sequence. Secondly， a prediction model library containing statistical analysis models， shallow machine learning models， and deep learning models was established， and the prediction effects of different models on each demand pattern were tested to identify and mark the optimal prediction model for each type of demand pattern. In the prediction stage， the sequence to be predicted was divided into demand sequences and interval sequences， the demand pattern was identified and matched with the optimal prediction model， and the predicted values of demand and interval were combined to form the final prediction result. The experimental validation was carried out on the intermittent parts demand datasets of the American Automobile Company and the Royal Air Force， and the results showed that the proposed method could be applied to the historical data of parts with different demand patterns， and effectively improved the prediction accuracy by adaptively matching the demand pattern and the optimal prediction model.

Table and Figures | Reference | Related Articles | Metrics

Select

Dynamic ciphertext sorting and retrieval scheme based on blockchain

Xiaoling SUN, Danhui WANG, Shanshan LI

Journal of Computer Applications 2024, 44 (8): 2500-2505. DOI: 10.11772/j.issn.1001-9081.2023081114

Abstract （312）

HTML （6）

PDF （1741KB）（107）

Save

To address the untrusted issue of cloud storage servers， a Dynamic ciphertext sorting and retrieval scheme based on blockchain was proposed. A balanced binary tree was utilized as the index tree to achieve sublinear search efficiency. A vector space model was employed to reduce text complexity. The sorting of search results for multiple keywords was achieved through the TF-IDF （Term Frequency-Inverse Document Frequency） weighted statistical algorithm. By employing a separate index tree for newly added files and maintaining a revocation list for deleted files， dynamic updating was enabled for the blockchain-based searchable encryption solution. Through leakage function， it is proven that the proposed scheme is secure against adaptive chosen keyword attacks. Performance testing analysis demonstrates that compared to the ｛key， value｝ index structure， the tree index structure adopted in the proposed scheme reduces index tree generation time by 98%， file search time by 7% and dynamic updating time by 99% averagely， with significant efficiency improvements on each step.

Table and Figures | Reference | Related Articles | Metrics

Select

Session-based recommendation based on graph co-occurrence enhanced multi-layer perceptron

Tingjie TANG, Jiajin HUANG, Jin QIN, Hui LU

Journal of Computer Applications 2024, 44 (8): 2357-2364. DOI: 10.11772/j.issn.1001-9081.2023081063

Abstract （305）

HTML （11）

PDF （1743KB）（170）

Save

Aiming at the problem that the Multi-Layer Perceptron （MLP） architecture can not capture the co-occurrence relationship in the context of session sequence， a session-based recommendation model based on Graph Co-occurrence Enhanced MLP （GCE-MLP） was proposed. Firstly， the sequential dependency of the session sequence was captured by the MLP architecture， and at the same time， the co-occurrence relationship in the sequence context was obtained through the co-occurrence relationship learning layer， and the session representation was obtained through the information fusion module. Secondly， a specific feature selection layer was designed to amplify the diversity of input features of different relation learning layers. Finally， the representation learning of sessional interest was further enhanced by maximizing the mutual information between two relational representations via a noise contrastive task. Experimental results on multiple real datasets show that the recommendation performance of the GCE-MLP is better than those of the current mainstream models， which verifies the effectiveness of GCE-MLP. Compared with the optimal MLP architecture model FMLP-Rec（Filter-enhanced MLP for Recommendation）， GCE-MLP achieves the P@20 of 54.08% and the MRR@20 of 18.87% for Diginetica dataset， which are respectively increased by about 2.14 and 1.43 percentage points； GCE-MLP achieves the P@20 of 71.77% and the MRR@20 of 31.78% for Yoochoose dataset， which are respectively increased by about 0.48 and 1.77 percentage points.

Table and Figures | Reference | Related Articles | Metrics

Select

Formation obstacle-avoidance and reconfiguration method for multiple UAVs

Lingxia MU, Zhengjun ZHOU, Ban WANG, Youmin ZHANG, Xianghong XUE, Kaikai NING

Journal of Computer Applications 2024, 44 (9): 2938-2946. DOI: 10.11772/j.issn.1001-9081.2023091342

Abstract （304）

HTML （9）

PDF （5293KB）（385）

Save

Aiming at the problems of obstacles and failures in multi-UAV （Unmanned Aerial Vehicle） formation flight， a dynamic formation switching and reconfiguration method was proposed. For a certain UAV in the formation， the obstacles and other UAVs in formation were regarded as dynamic threats. By adaptively adjusting score weights considering different flying scenarios， the ability of the UAV formation to avoid obstacles in dynamic environment was improved. When one UAV in the formation occurs faults， the remaining UAVs were formation reconfigured. By changing the positions of the followers relative to the leader in the objective function of the dynamic window approach， a new formation without the fault UAV was reconfigured. By this means， fault-tolerant formation could be obtained. Simulation results show that the proposed formation obstacle avoidance and reconfiguration algorithm can realize dynamic obstacle avoidance and fault-tolerant formation flight in the case of one UAV fault or lack of power. At the same time， compared with the traditional method， the distance error between the UAVs in the formation is lower.

Table and Figures | Reference | Related Articles | Metrics

Select

Relation extraction between discipline knowledge entities based on improved piecewise convolutional neural network and knowledge distillation

Yubo ZHAO, Liping ZHANG, Sheng YAN, Min HOU, Mao GAO

Journal of Computer Applications 2024, 44 (8): 2421-2429. DOI: 10.11772/j.issn.1001-9081.2023081065

Abstract （303）

HTML （6）

PDF （2292KB）（1014）

Save

Relational extraction is an important means of sorting out discipline knowledge as well as an important step in the construction of educational knowledge graph. In the current research， most of the pre-trained language models based on the Transformer architecture， such as the Bidirectional Encoder Representations from Transformers （BERT）， suffer from large number of parameters and excessive complexity， which make them difficult to be deployed on end devices and limite their applications in real educational scenarios. In addition， most traditional lightweight relation extraction models do not model the data through text structure， which are easy to ignore the structural information between entities， and the generated word embedding vectors are difficult to capture the contextual features of the text， have poor ability to solve the problem of multiple meanings of words， and are difficult to fit the unstructured nature of discipline knowledge texts and the high proportion of proper nouns， which is not conducive to high-quality relation extraction. In order to solve the above problems， a relation extraction method between discipline knowledge entities based on improved Piecewise Convolutional Neural Network （PCNN） and Knowledge Distillation （KD） was proposed. Firstly， BERT was used to generate high-quality domain text word vectors to improve the input layer of the PCNN model， so as to effectively capture the text context features and solve the problem of multiple meanings of words to a certain extent. Then， convolution and piecewise max pooling operations were utilized to deeply mine inter-entity structural information， constructing the BERT-PCNN model， and achieving high-quality relation extraction. Lastly， by taking into account the demands for efficient and lightweight models in educational scenarios， the knowledge of the output layer and middle layer of the BERT-PCNN model was distilled for guiding the PCNN model to complete the construction of the KD-PCNN model. The experimental results show that， the weighted-average F1 of the BERT-PCNN model reaches 94%， which is improved by 1 and 2 percentage points compared with the R-BERT and EC_BERT models； the weighted-average F1 of the KD-PCNN model reaches 92%， which is the same as the EC_BERT model， and the parameter quantity of the KD-PCNN model decreased by 3 orders of magnitude compared with the BERT-PCNN and KD-RB-l models. It can be seen that the proposed method can achieve a better trade-off between the performance evaluation index and the network parameter quantity， which is conducive to the improvement of the automated construction level of educational knowledge graph and the development and deployment of new educational applications.

Table and Figures | Reference | Related Articles | Metrics

Select

Recommendation model combining self-features and contrastive learning

Xingyao YANG, Yu CHEN, Jiong YU, Zulian ZHANG, Jiaying CHEN, Dongxiao WANG

Journal of Computer Applications 2024, 44 (9): 2704-2710. DOI: 10.11772/j.issn.1001-9081.2023091264

Abstract （302）

HTML （12）

PDF （1737KB）（536）

Save

Aiming at the over-smoothing and noise problems in the embedding representation in the message passing process of graph convolution based on graph neural network recommendation， a Recommendation model combining Self-features and Contrastive Learning （SfCLRec） was proposed. The model was trained using a pre-training-formal training architecture. Firstly， the embedding representations of users and items were pre-trained to maintain the feature uniqueness of the nodes themselves by fusing the node self-features and a hierarchical contrastive learning task was introduced to mitigate the noisy information from the higher-order neighboring nodes. Then， the collaborative graph adjacency matrix was reconstructed according to the scoring mechanism in the formal training stage. Finally， the predicted score was obtained based on the final embedding. Compared with existing graph neural network recommendation models such as LightGCN and Simple Graph Contrastive Learning （SimGCL）， SfCLRec achieves the better recall and NDCG （Normalized Discounted Cumulative Gain） in three public datasets ML-latest-small， Last.FM and Yelp， validating the effectiveness of SfCLRec.

Table and Figures | Reference | Related Articles | Metrics

Select

Adaptive hybrid network for affective computing in student classroom

Yan RONG, Jiawen LIU, Xinlei LI

Journal of Computer Applications 2024, 44 (9): 2919-2930. DOI: 10.11772/j.issn.1001-9081.2023091303

Abstract （302）

HTML （5）

PDF （4730KB）（1114）

Save

Affective computing can provide a better teaching effectiveness and learning experience for intelligent education. Current research on affective computing in classroom domain still suffers from limited adaptability and weak perception on complex scenarios. To address these challenges， a novel hybrid architecture was proposed， namely SC-ACNet， aiming at accurate affective computing for students in classroom. In the architecture， the followings were included： a multi-scale student face detection module capable of adapting to small targets， an affective computing module with an adaptive spatial structure that can adapt to different facial postures to recognize five emotions （calm， confused， jolly， sleepy， and surprised） of students in classroom， and a self-attention module that visualized the regions of the model contributing most to the results. In addition， a new student classroom dataset， SC-ACD， was constructed to alleviate the lack of face emotion image datasets in classroom. Experimental results on SC-ACD dataset show that SC-ACNet improves the mean Average Precision （mAP） by 4.2 percentage points and the accuracy of affective computing by 9.1 percentage points compared with the baseline method YOLOv7. Furthermore， SC-ACNet has the accuracies of 0.972 and 0.994 on common sentiment datasets， namely KDEF and RaFD， validating the viability of the proposed method as a promising solution to elevate the quality of teaching and learning in intelligent classroom.

Table and Figures | Reference | Related Articles | Metrics

Select

Survey of neural architecture search

Renke SUN, Zhiyu HUANGFU, Hu CHEN, Zhongnian LI, Xinzheng XU

Journal of Computer Applications 2024, 44 (10): 2983-2994. DOI: 10.11772/j.issn.1001-9081.2023101374

Abstract （299）

HTML （32）

PDF （3686KB）（562）

Save

In recent years， deep learning has made breakthroughs in many fields due to its powerful representation capability， and the architecture of neural network is crucial to the final performance. However， the design of high-performance neural network architecture heavily relies on the priori knowledge and experience of the researchers. Because there are a lot of parameters for neural networks， it is difficult to design optimal neural network architecture. Therefore， automated Neural Architecture Search （NAS） gains significant attention. NAS is a technique that uses machine learning to automatically search for optimal network architecture without the need for a lot of human effort， and is an important means of future neural network design. NAS is essentially a search optimization problem， by designing search space， search strategy and performance evaluation strategy， NAS can automatically search the optimal network structure. Detailed and comprehensive analysis， comparison and summary for the latest research progress of NAS were provided from three aspects： search space， search strategy， and performance evaluation strategy， which facilitates readers to quickly understand the development process of NAS. And the future research directions of NAS were proposed.

Table and Figures | Reference | Related Articles | Metrics

Select

Help-seeking information extraction model for flood event in social media data

Huanliang SUN, Siyi WANG, Junling LIU, Jingke XU

Journal of Computer Applications 2024, 44 (8): 2437-2445. DOI: 10.11772/j.issn.1001-9081.2023081080

Abstract （298）

HTML （4）

PDF （2708KB）（329）

Save

Because of data inconsistency and different information importance， how to extract desired information from the social media precisely and automatically becomes a challenging task. To solve the above problem， through Formal Concept Analysis （FCA）， word co-occurrence relationship and contextual semantics， the knowledge system of flood event was built up. Using the constructed knowledge system， a type of fine-tuned Large Language Model （LLM）， ChatFlowFlood， an information extraction model based on the TencentPretrain framework， was developed. The in-live disaster information such as locations and material shortage could be extracted only with few mannual annotations. Based on the information extraction model， Fuzzy Analytic Hierarchy Process （FAHP） and CRITIC （CRiteria Importance Through Intercriteria Correlation） methods were combined to evaluate the rescue priority of help-seeking information subjectively and objectively， which helped decision makers understand the emergency degree of the disaster. The experimental results show that on Chinese social media data， compared with the ChatFlow-7B model， the F_BERT index of the ChatFlowFlood model is improved by 73.09%.

Table and Figures | Reference | Related Articles | Metrics

Select

Multivariate time series anomaly detection based on multi-domain feature extraction

Pei ZHAO, Yan QIAO, Rongyao HU, Xinyu YUAN, Minyue LI, Benchu ZHANG

Journal of Computer Applications 2024, 44 (11): 3419-3426. DOI: 10.11772/j.issn.1001-9081.2023111636

Abstract （297）

HTML （4）

PDF （754KB）（1877）

PDF（mobile）（1807KB）（24）

Save

Due to the high dimensionality and the complex variable distribution of Multivariate Time Series （MTS） data， the existing anomaly detection models generally suffer from high error rates and training difficulties when dealing with MTS datasets. Moreover， most models only consider the spatial-temporal features of time series samples， which are not sufficient to learn the features of time series. To solve the above problems， a multivariate Time Series anomaly detection model based on Multi-domain Feature Extraction （MFE-TS） was proposed. Firstly， starting from the original data domain， the Long Short-Term Memory （LSTM） network and the Convolutional Neural Network （CNN） were used to extract the temporal correlation and spatial correlation features of the MTS respectively. Secondly， Fourier transform was used to convert the original time series into frequency domain space， and Transformer was used to learn the amplitude and phase features of the data in frequency domain space. Multi-domain feature learning was able to model time series features more comprehensively， thereby improving anomaly detection performance of the model to MTS. In addition， the masking strategy was introduced to further enhance the feature learning ability of the model and make the model have a certain degree of noise resistance. Experimental results show that MFE-TS has superior performance on multiple real MTS datasets， while it still maintain good detection accuracy on datasets with noise.

Table and Figures | Reference | Related Articles | Metrics

Select

Pixel-level unsupervised industrial anomaly detection based on multi-scale memory bank

Yongjiang LIU, Bin CHEN

Journal of Computer Applications 2024, 44 (11): 3587-3594. DOI: 10.11772/j.issn.1001-9081.2023111690

Abstract （293）

HTML （6）

PDF （1125KB）（116）

Save

Unsupervised anomaly detection methods based on feature embedding often use patch-level features to localize anomalies. Patch-level features are competitive in image-level anomaly detection tasks， but suffer from insufficient accuracy in pixel-level localization. To address this issue， MemAD， a pixel-level anomaly detection method composed of a multi-scale memory bank and a segmentation network， was proposed. Firstly， a pre-trained feature extraction network was used to extract features from normal samples in the training set， thereby constructing a normal sample feature memory bank at three scales. Then， during the training of the segmentation network， difference features between simulated pseudo-anomaly sample features and the nearest normal sample features in the memory bank were calculated， thereby further guiding the segmentation network to learn how to locate anomalous pixels. Experimental results show that MemAD achieves image-level and pixel-level AUC （Area Under the Receiver Operating Characteristic curve） of 0.980 and 0.974 respectively on MVTec AD （MVTec Anomaly Detection） dataset， outperforming most existing methods and confirming the accuracy of the proposed method in pixel-level anomaly localization.

Table and Figures | Reference | Related Articles | Metrics

Select

Safe reinforcement learning method for decision making of autonomous lane changing based on trajectory prediction

Hailin XIAO, Tianyi HUANG, Qiuxiang DAI, Yuejun ZHANG, Zhongshan ZHANG

Journal of Computer Applications 2024, 44 (9): 2958-2963. DOI: 10.11772/j.issn.1001-9081.2023091266

Abstract （288）

HTML （5）

PDF （1903KB）（617）

Save

Deep reinforcement learning easily leads to unsafe actions in the training process due to its trial-and-error learning characteristics in decision-making problem of autonomous lane changing. Therefore， a safe reinforcement learning method for decision making of autonomous lane changing based on trajectory prediction was proposed. Firstly， the future trajectories of the vehicles were predicted through probabilistic modeling of maximum likelihood estimation. Secondly， driving risk assessment was performed by using the obtained trajectory prediction and safety distance. And the safe actions were constrained according to the driving risk assessment results， which means that the action space was cut into the safe action space and the intelligent vehicle was guided to avoid dangerous actions. The proposed method was tested and compared with Deep Q-Network （DQN） and its improved methods in the freeway scene of simulation platform. Experimental results show that the proposed method can reduce the number of collisions by 47%-57% compared to other methods while ensuring fast convergence during intelligent vehicle training process， and thus improves the safety during training process effectively.

Table and Figures | Reference | Related Articles | Metrics

Select

Port traffic flow prediction based on knowledge graph and spatio-temporal diffusion graph convolutional network

Guixiang XUE, Hui WANG, Weifeng ZHOU, Yu LIU, Yan LI

Journal of Computer Applications 2024, 44 (9): 2952-2957. DOI: 10.11772/j.issn.1001-9081.2023081100

Abstract （286）

HTML （1）

PDF （1614KB）（789）

Save

Accurate prediction of port traffic flow is a challenging task due to its stochastic uncertainty and time-unsteady characteristics. In order to improve the accuracy of port traffic flow prediction， a port traffic flow prediction model based on knowledge graph and spatio-temporal diffusion graph convolution network， named KG-DGCN-GRU， was proposed， taking into account the external disturbances such as meteorological conditions and the opening and closing status of the port-adjacent highway. The factors related to the port traffic network were represented by the knowledge graph， and the semantic information of various external factors were learned from the port knowledge graph by using the knowledge representation method， and Diffusion Graph Convolutional Network （DGCN） and Gated Recurrent Unit （GRU） were used to effectively extract the spatio-temporal dependency features of the port traffic flow. The experimental results based on the Tianjin Port traffic dataset show that KG-DGCN-GRU can effectively improve the prediction accuracy through knowledge graph and diffusion graph convolutional network， the Root Mean Squared Error （RMSE） is reduced by 4.85% and 7.04% and the Mean Absolute Error （MAE） is reduced by 5.80% and 8.17%， compared with Temporal Graph Convolutional Network （T-GCN） and Diffusion Convolutional Recurrent Neural Network （DCRNN） under single step prediction （15 min）.

Table and Figures | Reference | Related Articles | Metrics

Select

Text-to-SQL model based on semantic enhanced schema linking

Xianglan WU, Yang XIAO, Mengying LIU, Mingming LIU

Journal of Computer Applications 2024, 44 (9): 2689-2695. DOI: 10.11772/j.issn.1001-9081.2023091360

Abstract （286）

HTML （25）

PDF （739KB）（1117）

Save

To optimize Text-to-SQL generation performance based on heterogeneous graph encoder， SELSQL model was proposed. Firstly， an end-to-end learning framework was employed by the model， and the Poincaré distance metric in hyperbolic space was used instead of the Euclidean distance metric to optimize semantically enhanced schema linking graph constructed by the pre-trained language model using probe technology. Secondly， K-head weighted cosine similarity and graph regularization method were used to learn the similarity metric graph so that the initial schema linking graph was iteratively optimized during training. Finally， the improved Relational Graph ATtention network （RGAT） graph encoder and multi-head attention mechanism were used to encode the joint semantic schema linking graphs of the two modules， and Structured Query Language （SQL） statement decoding was solved using a grammar-based neural semantic decoder and a predefined structured language. Experimental results on Spider dataset show that when using ELECTRA-large pre-training model， the accuracy of SELSQL model is increased by 2.5 percentage points compared with the best baseline model， which has a great improvement effect on the generation of complex SQL statements.

Table and Figures | Reference | Related Articles | Metrics

Most Read articles