Journal of Computer Applications

Review of online education learner knowledge tracing

Yajuan ZHAO, Fanjun MENG, Xingjian XU

Journal of Computer Applications 2024, 44 (6): 1683-1698. DOI: 10.11772/j.issn.1001-9081.2023060852

Abstract （324）

HTML （21）

PDF （2932KB）（4029）

Knowledge Tracing （KT） is a fundamental and challenging task in online education. It builds a model of a learner’s knowledge state from the learning history， so that learners can better understand their own knowledge states and teachers can better understand how learners are progressing. KT research for online education learners was reviewed. Firstly， the main tasks and historical progress of KT were introduced. Subsequently， traditional KT models and deep learning KT models were explained. Furthermore， relevant datasets and evaluation metrics were summarized， together with the applications of KT. Finally， the current status of KT was summarized， and its limitations and future prospects were discussed.
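As an illustration of what a traditional KT model computes, the sketch below implements one update step of Bayesian Knowledge Tracing, a classic model of this kind. The abstract does not say which models the review covers, so the choice of model, the function names and the parameter values (`p_slip`, `p_guess`, `p_learn`, `p_init`) are all assumptions made for the example.

```python
def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step (parameter values are illustrative)."""
    if correct:
        # P(knew the skill | answered correctly)
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        # P(knew the skill | answered incorrectly)
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    # The learner may also acquire the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return the estimated mastery probability after each response."""
    p = p_init
    history = []
    for r in responses:
        p = bkt_update(p, r, **params)
        history.append(p)
    return history
```

A correct answer raises the mastery estimate and an incorrect one lowers it; the estimate never leaves the interval [0, 1].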


Review of application analysis and research progress of deep learning in weather forecasting

Runting DONG, Li WU, Xiaoying WANG, Tengfei CAO, Jianqiang HUANG, Qin GUAN, Jiexia WU

Journal of Computer Applications 2023, 43 (6): 1958-1968. DOI: 10.11772/j.issn.1001-9081.2022050745

Abstract （1606）

HTML （134）

PDF （1570KB）（3970）

With the advancement of technologies such as sensor networks and global positioning systems， the volume of meteorological data with both temporal and spatial characteristics has exploded， and research on deep learning models for Spatiotemporal Sequence Forecasting （STSF） has developed rapidly. The traditional machine learning methods long applied to weather forecasting perform unsatisfactorily in extracting the temporal correlations and spatial dependences of data， whereas deep learning methods can extract features automatically through artificial neural networks， effectively improve the accuracy of weather forecasting， and perform very well in encoding and modeling long-term spatial information. At the same time， deep learning models driven by observational data are combined with Numerical Weather Prediction （NWP） models based on physical theories to build hybrid models with higher prediction accuracy and longer prediction time. On this basis， the applications and research progress of deep learning in weather forecasting were reviewed. Firstly， deep learning problems in weather forecasting were compared with classical deep learning problems in three aspects： data format， problem model and evaluation metrics. Then， the development history and application status of deep learning in weather forecasting were reviewed， and the latest progress in combining deep learning technologies with NWP was summarized and analyzed. Finally， future development directions and research focuses were discussed to provide a reference for future deep learning research in weather forecasting.


Multimodal knowledge graph representation learning： a review

Chunlei WANG, Xiao WANG, Kai LIU

Journal of Computer Applications 2024, 44 (1): 1-15. DOI: 10.11772/j.issn.1001-9081.2023050583

Abstract （1274）

HTML （118）

PDF （3449KB）（3043）

A comprehensive comparison of traditional knowledge graph representation learning models， covering their advantages， disadvantages and applicable tasks， shows that the traditional single-modal knowledge graph cannot represent knowledge well. Therefore， how to use multimodal data such as text， image， video， and audio for knowledge graph representation learning has become an important research direction. The commonly used multimodal knowledge graph datasets were analyzed in detail to provide data support for relevant researchers. On this basis， the knowledge graph representation learning models under multimodal fusion of text， image， video， and audio were further discussed， and various models were summarized and compared. Finally， the effect of multimodal knowledge graph representation on enhancing classical applications， including knowledge graph completion， question answering systems， multimodal generation and recommendation systems， was summarized， and future research work was discussed.


Embedded road crack detection algorithm based on improved YOLOv8

Huantong GENG, Zhenyu LIU, Jun JIANG, Zichen FAN, Jiaxing LI

Journal of Computer Applications 2024, 44 (5): 1613-1618. DOI: 10.11772/j.issn.1001-9081.2023050635

Abstract （1733）

HTML （68）

PDF （2002KB）（2446）

Deploying the YOLOv8L model on edge devices for road crack detection can achieve high accuracy， but it is difficult to guarantee real-time detection. To solve this problem， a target detection algorithm based on an improved YOLOv8 model that can be deployed on the edge computing device Jetson AGX Xavier was proposed. First， a Faster Block structure was designed using partial convolution to replace the Bottleneck structure in the YOLOv8 C2f module， and the improved C2f module was denoted as C2f-Faster. Second， an SE （Squeeze-and-Excitation） channel attention layer was connected after each C2f-Faster module in the YOLOv8 backbone network to further improve the detection accuracy. Experimental results on the open source road damage dataset RDD20 （Road Damage Detection 20） show that the average F1 score of the proposed method is 0.573， the detection speed is 47 Frames Per Second （FPS）， and the model size is 55.5 MB. Compared with the SOTA （State-Of-The-Art） model of GRDDC2020 （Global Road Damage Detection Challenge 2020）， the F1 score is increased by 0.8 percentage points， the FPS is increased by 291.7%， and the model size is reduced by 41.8%， which realizes real-time and accurate detection of road cracks on edge devices.
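To make the SE channel attention step concrete, here is a minimal pure-Python sketch of a Squeeze-and-Excitation block applied to a small feature map. It is not the authors' implementation: the weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters, biases are omitted, and nested lists replace tensors so the example has no dependencies.

```python
import math


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map.

    x  : list of C channels, each a list of rows (H x W)
    w1 : hypothetical reduction weights, shape (C // r) x C
    w2 : hypothetical expansion weights, shape C x (C // r)
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gate = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]
    # Scale: reweight every channel by its gate value.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gate)]
```

With all-zero weights every gate equals 0.5, so each channel is simply halved; trained weights would instead emphasize informative channels and suppress the rest.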


Review of multi-modal medical image segmentation based on deep learning

Meng DOU, Zhebin CHEN, Xin WANG, Jitao ZHOU, Yu YAO

Journal of Computer Applications 2023, 43 (11): 3385-3395. DOI: 10.11772/j.issn.1001-9081.2022101636

Abstract （1884）

HTML （93）

PDF （3904KB）（2386）

Multi-modal medical images can provide clinicians with rich information about target areas （such as tumors， organs or tissues）. However， effective fusion and segmentation of multi-modal images is still a challenging problem due to the independence and complementarity of multi-modal images. Traditional image fusion methods have difficulty addressing this problem， which has led to widespread research on deep learning-based multi-modal medical image segmentation algorithms. The multi-modal medical image segmentation task based on deep learning was reviewed in terms of principles， techniques， problems， and prospects. Firstly， the general theory of deep learning and multi-modal medical image segmentation was introduced， including the basic principles and development processes of deep learning and Convolutional Neural Network （CNN）， as well as the importance of the multi-modal medical image segmentation task. Secondly， the key concepts of multi-modal medical image segmentation were described， including data dimension， preprocessing， data enhancement， loss function， and post-processing. Thirdly， different multi-modal segmentation networks based on different fusion strategies were summarized and analyzed. Finally， several common problems in medical image segmentation were discussed， and a summary and prospects for future research were given.


Technology application prospects and risk challenges of large language models

Yuemei XU, Ling HU, Jiayi ZHAO, Wanze DU, Wenqing WANG

Journal of Computer Applications 2024, 44 (6): 1655-1662. DOI: 10.11772/j.issn.1001-9081.2023060885

Abstract （1291）

HTML （103）

PDF （1142KB）（2308）

In view of the rapid development of Large Language Model （LLM） technology， a comprehensive analysis of its technical application prospects and risk challenges was conducted， which is of great reference value for the development and governance of Artificial General Intelligence （AGI）. Firstly， with representative language models such as Multi-BERT （Multilingual Bidirectional Encoder Representations from Transformers）， GPT （Generative Pre-trained Transformer） and ChatGPT （Chat Generative Pre-trained Transformer） as examples， the development process， key technologies and evaluation systems of LLM were reviewed. Then， a detailed analysis of the technical limitations and security risks of LLM was conducted. Finally， suggestions were put forward for technical improvement and policy follow-up of LLM. The analysis indicates that current LLMs， still at a developing stage， produce non-truthful and biased output， lack real-time autonomous learning ability， require huge computing power， rely heavily on data quality and quantity， and tend towards a monotonous language style. They also carry security risks related to data privacy， information security， ethics， and other aspects. Technically， their future development can continue to improve from “large-scale” to “lightweight”， from “single-modal” to “multi-modal”， and from “general-purpose” to “vertical”； in terms of policy， their applications and development should be followed up in real time and regulated by targeted regulatory measures.


Research status and prospect of CT image ring artifact removal methods

Yaoyao TANG, Yechen ZHU, Yangchuan LIU, Xin GAO

Journal of Computer Applications 2024, 44 (3): 890-900. DOI: 10.11772/j.issn.1001-9081.2023030305

Abstract （396）

HTML （20）

PDF （1994KB）（2263）

Ring artifacts are among the most common artifacts in various types of CT （Computed Tomography） images， and are usually caused by the inconsistent response of detector pixels to X-rays. Effective removal of ring artifacts， a necessary step in CT image reconstruction， greatly improves the quality of CT images and enhances the accuracy of later diagnosis and analysis. Therefore， the methods of ring artifact removal （also known as ring artifact correction） were systematically reviewed. Firstly， the manifestations and causes of ring artifacts were introduced， and commonly used datasets and algorithm libraries were given. Secondly， ring artifact removal methods were introduced in three categories. The first category is based on detector calibration. The second category is based on analytical and iterative solutions， including projection data preprocessing， CT image reconstruction and CT image post-processing. The last category is based on deep learning methods such as convolutional neural networks and generative adversarial networks. The principle， development process， advantages and limitations of each method were analyzed. Finally， the technical bottlenecks of existing ring artifact removal methods in terms of robustness， dataset diversity and model construction were summarized， and possible solutions were discussed.


Federated learning survey： concepts， technologies， applications and challenges

Tiankai LIANG, Bi ZENG, Guang CHEN

Journal of Computer Applications 2022, 42 (12): 3651-3662. DOI: 10.11772/j.issn.1001-9081.2021101821

Abstract （3000）

HTML （205）

PDF （2464KB）（2216）

Against the background of growing emphasis on data right confirmation and privacy protection， federated learning， as a new machine learning paradigm， can solve the problems of data islands and privacy protection without exposing the data of any participant. Since modeling methods based on federated learning have become mainstream and achieved good results， it is worthwhile to summarize and analyze the concepts， technologies， applications and challenges of federated learning. Firstly， the development process of machine learning and the inevitability of the emergence of federated learning were elaborated， and the definition and classification of federated learning were given. Secondly， three federated learning methods currently recognized by the industry （horizontal federated learning， vertical federated learning and federated transfer learning） were introduced and analyzed. Thirdly， concerning the privacy protection issue of federated learning， the existing common privacy protection technologies were summarized. In addition， the recent mainstream open-source frameworks were introduced and compared， and the application scenarios of federated learning were given. Finally， the challenges and future research directions of federated learning were discussed.


Application review of deep models in medical image segmentation: from U-Net to Transformer

Journal of Computer Applications DOI: 10.11772/j.issn.1001-9081.2023071059
Online available: 26 October 2023

Survey of subgroup optimization strategies for intelligent algorithms

Xiaoxin DU, Wei ZHOU, Hao WANG, Tianru HAO, Zhenfei WANG, Mei JIN, Jianfei ZHANG

Journal of Computer Applications 2024, 44 (3): 819-830. DOI: 10.11772/j.issn.1001-9081.2023030380

Abstract （354）

HTML （11）

PDF （2404KB）（1799）

Optimizing swarm intelligence algorithms is a main way to improve them. As swarm intelligence algorithms are used more and more widely in model optimization， production scheduling， path planning and other problems， the performance demands on intelligent algorithms are also rising. As an important means of optimizing swarm intelligence algorithms， subgroup strategies can flexibly balance global exploration ability and local exploitation ability， and have become one of the research hotspots of swarm intelligence algorithms. In order to promote the development and application of subgroup strategies， the dynamic subgroup strategy， the subgroup strategy based on master-slave paradigm， and the subgroup strategy based on network structure were investigated in detail. The structural characteristics， improvement methods and application scenarios of various subgroup strategies were expounded. Finally， the current problems， future research trends and development directions of subgroup strategies were summarized.


Review of recommendation system

Meng YU, Wentao HE, Xuchuan ZHOU, Mengtian CUI, Keqi WU, Wenjie ZHOU

Journal of Computer Applications 2022, 42 (6): 1898-1913. DOI: 10.11772/j.issn.1001-9081.2021040607

Abstract （2175）

HTML （202）

PDF （3152KB）（1755）

With the continuous development of network applications， network resources are growing exponentially and information overload is becoming increasingly serious， so how to efficiently obtain the resources that meet user needs has become one of the problems that trouble people. A recommendation system can effectively filter massive information and recommend the resources that meet users’ needs. The research status of recommendation systems was introduced in detail， including three traditional recommendation methods （content-based recommendation， collaborative filtering recommendation and hybrid recommendation）， and the research progress of four common deep learning recommendation models based on Convolutional Neural Network （CNN）， Deep Neural Network （DNN）， Recurrent Neural Network （RNN） and Graph Neural Network （GNN） was analyzed with emphasis. The commonly used datasets in the recommendation field were summarized， and the differences between the traditional recommendation algorithms and the deep learning-based recommendation algorithms were analyzed and compared. Finally， the representative recommendation models in practical applications were summarized， and the challenges and future research directions of recommendation systems were discussed.
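Of the traditional methods named above， collaborative filtering is the easiest to show in a few lines. The following is a minimal sketch of user-based collaborative filtering with cosine similarity; the data layout (a dict of per-user rating dicts) and the function names are assumptions for illustration and are not taken from the reviewed paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    # None signals that no similar user has rated the item.
    return num / den if den else None
```

Users whose past ratings resemble the target user's contribute more to the prediction， which is the core idea that the deep learning models in the review replace with learned representations.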


Overview of cryptocurrency regulatory technologies research

Jiaxin WANG, Jiaqi YAN, Qian’ang MAO

Journal of Computer Applications 2023, 43 (10): 2983-2995. DOI: 10.11772/j.issn.1001-9081.2022111694

Abstract （520）

HTML （67）

PDF （911KB）（1743）

With the help of blockchain and other emerging technologies， cryptocurrencies are decentralized， autonomous and cross-border. Research on cryptocurrency regulatory technologies not only helps fight criminal activities based on cryptocurrencies， but also provides feasible supervision schemes for the expansion of blockchain technologies into other fields. Firstly， based on the application characteristics of cryptocurrency， the Generation， Exchange and Circulation （GEC） cycle theory of cryptocurrency was defined and elaborated. Then， the frequent international and domestic crimes based on cryptocurrencies were analyzed in detail， and the research status of cryptocurrency security supervision technologies in all three stages was surveyed as the key focus. Finally， the cryptocurrency regulatory platform ecosystems and the current challenges faced by the regulatory technologies were summarized， and future research directions of cryptocurrency regulatory technologies were discussed to provide a reference for subsequent research.


Review of evolutionary multitasking from the perspective of optimization scenarios

Jiawei ZHAO, Xuefeng CHEN, Liang FENG, Yaqing HOU, Zexuan ZHU, Yew‑Soon Ong

Journal of Computer Applications 2024, 44 (5): 1325-1337. DOI: 10.11772/j.issn.1001-9081.2024020208

Abstract （502）

HTML （74）

PDF （1383KB）（1596）

Due to the escalating complexity of optimization problems， traditional evolutionary algorithms increasingly struggle with high computational costs and limited adaptability. Evolutionary MultiTasking Optimization （EMTO） algorithms have emerged as a novel solution， leveraging knowledge transfer to tackle multiple optimization problems concurrently， thereby enhancing evolutionary algorithms’ efficiency in complex scenarios. The current progression of evolutionary multitasking optimization research was summarized， and different research perspectives were explored by reviewing existing literature and highlighting the notable absence of optimization scenario analysis. By focusing on the application scenarios of optimization problems， the scenarios suitable for evolutionary multitasking optimization and their fundamental solution strategies were systematically outlined. This study could thus aid researchers in selecting the appropriate methods based on specific application needs. Moreover， an in-depth discussion on the current challenges and future directions of EMTO was also presented to provide guidance and insights for advancing research in this field.


Survey on privacy-preserving technology for blockchain transaction

Qingqing XIE, Nianmin YANG, Xia FENG

Journal of Computer Applications 2023, 43 (10): 2996-3007. DOI: 10.11772/j.issn.1001-9081.2022101555

Abstract （484）

HTML （48）

PDF （2911KB）（1586）

The blockchain ledger is open and transparent， so attackers can obtain sensitive information by analyzing the ledger data， which poses a great threat to the privacy of users’ transactions. In view of the importance of blockchain transaction privacy preservation， the causes of transaction privacy leakage were analyzed first， and transaction privacy was divided into two types： the identity privacy of transaction participants and transaction data privacy. Then， from the perspectives of these two types of transaction privacy， the existing privacy-preserving technologies for blockchain transactions were presented. Next， in view of the conflict between transaction identity privacy preservation and supervision， transaction identity privacy preservation schemes that take supervision into account were introduced. Finally， future research directions of privacy-preserving technologies for blockchain transactions were summarized and discussed.


Survey of code similarity detection technology

Xiangjie SUN, Qiang WEI, Yisen WANG, Jiang DU

Journal of Computer Applications 2024, 44 (4): 1248-1258. DOI: 10.11772/j.issn.1001-9081.2023040551

Abstract （392）

HTML （19）

PDF （1868KB）（1466）

Code reuse not only brings convenience to software development， but also introduces security risks， such as accelerated vulnerability propagation and malicious code plagiarism. Code similarity detection technology calculates code similarity by analyzing the lexical， syntactic， semantic and other information of code. It is one of the most effective technologies for identifying code reuse， and it is also a program security analysis technology that has developed rapidly in recent years. First， the latest technical progress of code similarity detection was systematically reviewed， and the current code similarity detection technologies were classified. According to whether the target code is open source， they were divided into source code similarity detection and binary code similarity detection， and were then further subdivided according to programming language and instruction set. Then， the ideas and research results of each technology were summarized， successful cases of machine learning in the field of code similarity detection were analyzed， and the advantages and disadvantages of existing technologies were discussed. Finally， the development trends of code similarity detection technology were given to provide a reference for relevant researchers.
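The simplest form of lexical similarity can be sketched as a Jaccard index over token n-grams. This is only one illustrative baseline， not a method attributed to the survey: the tokenizer regex， the n-gram length `n` and the function names are assumptions， and real detectors typically also normalize identifiers or compare syntax trees and semantics.

```python
import re


def tokens(code):
    """Split source text into identifiers, numbers and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def shingles(code, n=3):
    """Set of token n-grams; short inputs yield a single shorter tuple."""
    t = tokens(code)
    return {tuple(t[i:i + n]) for i in range(max(len(t) - n + 1, 1))}


def jaccard(a, b, n=3):
    """Jaccard similarity of the two snippets' token n-gram sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa | sb)
```

Because comparison happens on tokens， differences in whitespace do not affect the score， while renamed variables do; handling the latter is where syntactic and semantic methods come in.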


Unsupervised log anomaly detection model based on CNN and Bi-LSTM

Chunyong YIN, Yangchun ZHANG

Journal of Computer Applications 2023, 43 (11): 3510-3516. DOI: 10.11772/j.issn.1001-9081.2022111738

Abstract （356）

HTML （11）

PDF （1759KB）（1464）

Logs record the specific status of a system during operation， and automated log anomaly detection is critical to network security. To address the low accuracy of anomaly detection caused by the evolution of log statements over time， an unsupervised log anomaly detection model named LogCL was proposed. Firstly， a log parsing technique was used to convert semi-structured log data into structured log templates. Secondly， sessions and fixed windows were employed to divide log events into log sequences. Thirdly， quantitative features of the log sequences were extracted， natural language processing techniques were used to extract semantic features of the log templates， and the Term Frequency-Inverse Word Frequency （TF-IWF） algorithm was utilized to generate weighted sentence embedding vectors. Finally， the feature vectors were input into a parallel model based on Convolutional Neural Network （CNN） and Bi-directional Long Short-Term Memory （Bi-LSTM） network for detection. Experimental results on two public real-world datasets show that the proposed model improves the anomaly detection F1-score by 3.6 and 2.3 percentage points respectively compared with the baseline model LogAnomaly. Therefore， LogCL performs effectively in log anomaly detection.
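The TF-IWF weighting step can be sketched as follows. The abstract does not give the exact formula， so this uses one common formulation as an assumption: term frequency within a template multiplied by the logarithm of the total word count in the corpus over the corpus-wide count of that word. The word vectors passed to `sentence_vector` are hypothetical placeholders for whatever embeddings the model uses.

```python
import math
from collections import Counter


def tf_iwf(templates):
    """Return one {word: weight} dict per tokenized log template."""
    corpus = Counter(w for t in templates for w in t)
    total = sum(corpus.values())
    weights = []
    for t in templates:
        counts = Counter(t)
        weights.append({
            # TF within the template times inverse word frequency in the corpus
            w: (c / len(t)) * math.log(total / corpus[w])
            for w, c in counts.items()
        })
    return weights


def sentence_vector(word_weights, vectors):
    """Weighted average of word vectors (vectors: hypothetical embeddings)."""
    dim = len(next(iter(vectors.values())))
    acc, norm = [0.0] * dim, 0.0
    for w, weight in word_weights.items():
        if w in vectors:
            acc = [a + weight * x for a, x in zip(acc, vectors[w])]
            norm += weight
    return [a / norm for a in acc] if norm else acc
```

Words that are rare across the whole corpus receive larger weights， so they dominate the resulting template embedding.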


Few-shot object detection algorithm based on Siamese network

Junjian JIANG, Dawei LIU, Yifan LIU, Yougui REN, Zhibin ZHAO

Journal of Computer Applications 2023, 43 (8): 2325-2329. DOI: 10.11772/j.issn.1001-9081.2022121865

Abstract （634）

HTML （51）

PDF （1472KB）（1413）

Deep learning based algorithms such as YOLO （You Only Look Once） and Faster Region-based Convolutional Neural Network （Faster R-CNN） require a huge amount of training data to ensure model precision， but in many scenarios data is difficult to obtain and labeling is costly. Moreover， without massive training data， the detection range is limited. To address these problems， a few-shot object Detection algorithm based on Siamese Network， namely SiamDet， was proposed， with the purpose of training an object detection model with a certain generalization ability from a few annotated images. Firstly， a Siamese network based on depthwise separable convolution was proposed， and a feature extraction network ResNet-DW was designed to solve the overfitting problem caused by insufficient samples. Secondly， an object detection algorithm SiamDet was proposed based on the Siamese network， and based on ResNet-DW， Region Proposal Network （RPN） was introduced to locate the objects of interest. Thirdly， binary cross entropy loss was introduced for training， and a contrast training strategy was used to increase the distinction among categories. Experimental results show that SiamDet has good object detection ability for few-shot objects， and SiamDet improves AP₅₀ by 4.1% on MS-COCO 20-way 2-shot and 2.6% on PASCAL VOC 5-way 5-shot compared with the second-best algorithm DeFRCN （Decoupled Faster R-CNN）.
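For readers unfamiliar with the AP₅₀ metric reported above: it is average precision computed with a detection counted as correct when its Intersection over Union （IoU） with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion； the box format `(x1, y1, x2, y2)` and the function names are assumptions， and the full AP computation （ranking by confidence， precision-recall integration） is omitted.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return box_iou(pred, gt) >= threshold
```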


Survey of visual object tracking methods based on Transformer

Ziwen SUN, Lizhi QIAN, Chuandong YANG, Yibo GAO, Qingyang LU, Guanglin YUAN

Journal of Computer Applications 2024, 44 (5): 1644-1654. DOI: 10.11772/j.issn.1001-9081.2023060796

Abstract （566）

HTML （22）

PDF （1615KB）（1413）

Visual object tracking is one of the important tasks in computer vision. To achieve high-performance object tracking， a large number of object tracking methods have been proposed in recent years. Among them， Transformer-based object tracking methods have become a hot topic in the field of visual object tracking due to their ability to perform global modeling and capture contextual information. Firstly， existing Transformer-based visual object tracking methods were classified based on their network structures， the underlying principles and key techniques for model improvement were expounded， and the advantages and disadvantages of different network structures were summarized. Then， the experimental results of the Transformer-based visual object tracking methods on public datasets were compared to analyze the impact of network structure on performance. Among them， MixViT-L （ConvMAE） achieved tracking success rates of 73.3% and 86.1% on LaSOT and TrackingNet， respectively， indicating that object tracking methods based on the pure Transformer two-stage architecture have better performance and broader development prospects. Finally， the limitations of these methods， such as complex network structure， large number of parameters， high training requirements， and difficulty in deploying on edge devices， were summarized， and future research focuses were discussed： by combining model compression， self-supervised learning， and Transformer interpretability analysis， more feasible solutions for Transformer-based visual object tracking could be presented.


Privacy-preserving federated learning algorithm based on blockchain in edge computing

Wanzhen CHEN, En ZHANG, Leiyong QIN, Shuangxi HONG

Journal of Computer Applications 2023, 43 (7): 2209-2216. DOI: 10.11772/j.issn.1001-9081.2022060909

Abstract （360）

HTML （24）

PDF （1974KB）（1407）

To address three problems of federated learning in edge computing scenarios， namely the leakage of model parameters， untrusted servers that may return wrong aggregation results， and participating users who may upload wrong or low-quality model parameters， a privacy-preserving federated learning algorithm based on blockchain in edge computing was proposed. In the training process， firstly， each user trained the global model parameters on its local dataset and uploaded the resulting model parameters to neighboring edge nodes through secret sharing， thereby protecting the users’ local model parameters. Secondly， the edge nodes computed the Euclidean distances between the shares of model parameters they received and uploaded the results to the blockchain. Finally， the blockchain reconstructed the Euclidean distances between model parameters， and the global model parameters were aggregated after removing the poisoned updates. The security analysis proves the security of the proposed algorithm： even in the case of collusion of a part of edge nodes， the users’ local model parameter information will not be leaked. At the same time， the experimental results show the high accuracy of this algorithm： the accuracy of the proposed algorithm is 94.2% when the proportion of poisoned samples is 30%， which is close to the accuracy of the Federated Averaging （FedAvg） algorithm without poisoned samples （97.8%）， whereas the accuracy of the FedAvg algorithm decreases to 68.7% when the proportion of poisoned samples is 30%.
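The final step， aggregating after removing poisoned updates on the basis of Euclidean distances， can be illustrated in plaintext. The sketch below deliberately leaves out secret sharing and the blockchain， and its removal rule （drop the `n_drop` updates with the largest median distance to the others） is an assumption chosen for simplicity， since the abstract does not state the rule the authors use.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two parameter vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def robust_average(updates, n_drop):
    """Average the updates after discarding the n_drop most distant ones.

    Assumes 0 <= n_drop < len(updates). The scoring rule is illustrative,
    not the rule from the paper.
    """
    scores = []
    for i, u in enumerate(updates):
        d = sorted(euclidean(u, v) for j, v in enumerate(updates) if j != i)
        # Score each update by its median distance to all other updates.
        scores.append(d[len(d) // 2] if d else 0.0)
    order = sorted(range(len(updates)), key=lambda i: scores[i])
    keep = sorted(order[:len(updates) - n_drop])
    dim = len(updates[0])
    mean = [sum(updates[i][k] for i in keep) / len(keep) for k in range(dim)]
    return mean, keep
```

An update that sits far from the majority receives a large score and is excluded before averaging， which is why distance-based filtering can limit the damage that plain averaging suffers under poisoning.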


Survey of multimodal pre-training models

Huiru WANG, Xiuhong LI, Zhe LI, Chunming MA, Zeyu REN, Dan YANG

Journal of Computer Applications 2023, 43 (4): 991-1004. DOI: 10.11772/j.issn.1001-9081.2022020296

Abstract （1738）

HTML （149）

PDF （5539KB）（1404）

PDF（mobile）（3280KB）（111）

By using complex pre-training objectives and a large number of model parameters， Pre-Training Models （PTMs） can effectively obtain rich knowledge from unlabeled data. However， the development of multimodal PTMs is still in its infancy. According to the modalities involved， most current multimodal PTMs were divided into image-text PTMs and video-text PTMs. According to the data fusion method， multimodal PTMs were divided into two types： single-stream models and two-stream models. Firstly， common pre-training tasks and downstream tasks used in validation experiments were summarized. Secondly， the common models in the area of multimodal pre-training were sorted out， and the downstream tasks of each model and the performance and experimental data of the models were listed in tables for comparison. Thirdly， the application scenarios of M6 （Multi-Modality to Multi-Modality Multitask Mega-transformer） model， Cross-modal Prompt Tuning （CPT） model， VideoBERT （Video Bidirectional Encoder Representations from Transformers） model， and AliceMind （Alibaba’s collection of encoder-decoders from Mind） model in specific downstream tasks were introduced. Finally， the challenges faced by related multimodal PTM work and its future research directions were summed up.


Review of mean field theory for deep neural network

Mengmei YAN, Dongping YANG

Journal of Computer Applications 2024, 44 (2): 331-343. DOI: 10.11772/j.issn.1001-9081.2023020166

Abstract （497）

HTML （59）

PDF （1848KB）（1365）

Mean Field Theory （MFT） provides profound insights into the operation mechanism of Deep Neural Network （DNN）， and can theoretically guide the engineering design of deep learning. In recent years， more and more researchers have devoted themselves to the theoretical study of DNN， and in particular， a series of works based on mean field theory have attracted a lot of attention. To this end， research related to mean field theory for deep neural networks was reviewed to introduce the latest theoretical findings in three basic aspects： initialization， training process， and generalization performance of deep neural networks. Specifically， the concepts， properties and applications of edge of chaos and dynamical isometry for initialization were introduced， the training properties of overparameterized networks and their equivalent networks were analyzed， and the generalization performance of various network architectures was theoretically analyzed， showing that mean field theory is a very important basic theoretical approach to understanding the mechanisms of deep neural networks. Finally， the main challenges and future research directions were summarized for the investigation of mean field theory in the initialization， training and generalization phases of DNN.
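The edge of chaos mentioned above can be stated compactly in the standard mean field setting for a wide fully connected network. The notation here is an assumption introduced for illustration and does not come from the abstract: weights are drawn with variance σ_w²/N， biases with variance σ_b²， φ is the activation function， q^l is the variance of the pre-activations in layer l， q^* is its fixed point， and Dz denotes the standard Gaussian measure.

```latex
% Layer-to-layer propagation of the pre-activation variance
q^{l} = \sigma_w^{2} \int \mathcal{D}z \, \phi\!\left(\sqrt{q^{l-1}}\, z\right)^{2} + \sigma_b^{2}

% Growth rate of small perturbations at the fixed point q^{*}
\chi_1 = \sigma_w^{2} \int \mathcal{D}z \, \left[\phi'\!\left(\sqrt{q^{*}}\, z\right)\right]^{2}

% Ordered phase: \chi_1 < 1; chaotic phase: \chi_1 > 1; edge of chaos: \chi_1 = 1
```

Initializing on the line χ₁ = 1 keeps signals and gradients from vanishing or exploding on average； dynamical isometry is the stronger requirement that all singular values of the input-output Jacobian stay close to 1.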

Semantic segmentation method for remote sensing images based on multi-scale feature fusion

Ning WU, Yangyang LUO, Huajie XU

Journal of Computer Applications 2024, 44 (3): 737-744. DOI: 10.11772/j.issn.1001-9081.2023040439

Abstract （401）

HTML （26）

PDF （2809KB）（1320）

To improve the accuracy of semantic segmentation for remote sensing images and address the loss of small-target information during feature extraction by Deep Convolutional Neural Network （DCNN）， a semantic segmentation method based on multi-scale feature fusion named FuseSwin was proposed. Firstly， an Attention Enhancement Module （AEM） was introduced in the Swin Transformer to highlight the target area and suppress background noise. Secondly， the Feature Pyramid Network （FPN） was used to fuse the detailed information and high-level semantic information of the multi-scale features to complement the features of the target. Finally， the Atrous Spatial Pyramid Pooling （ASPP） module was used to capture the contextual information of the target from the fused feature map and further improve the model segmentation accuracy. Experimental results demonstrate that the proposed method outperforms current mainstream segmentation methods. The mean Pixel Accuracy （mPA） and mean Intersection over Union （mIoU） of the proposed method on the Potsdam remote sensing dataset are 2.34 and 3.23 percentage points higher than those of the DeepLabV3 method， and 1.28 and 1.75 percentage points higher than those of the SegFormer method. Additionally， the proposed method was applied to identify and segment oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou， Guangxi， and achieved Pixel Accuracy （PA） and Intersection over Union （IoU） of 96.21% and 91.70%， respectively.
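
For reference， PA， mPA and mIoU as reported above are commonly computed from a per-class confusion matrix. The following is a minimal sketch of those standard definitions， not the authors' evaluation code； the function name `segmentation_metrics` and the toy matrix are illustrative assumptions.

```python
def segmentation_metrics(confusion):
    """Return (PA, mPA, mIoU) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    per_class_acc = []
    per_class_iou = []
    for i in range(n):
        tp = confusion[i][i]
        true_i = sum(confusion[i])                        # pixels labeled i
        pred_i = sum(confusion[r][i] for r in range(n))   # pixels predicted i
        union = true_i + pred_i - tp
        # Classes absent from both labels and predictions are skipped.
        if true_i > 0:
            per_class_acc.append(tp / true_i)
        if union > 0:
            per_class_iou.append(tp / union)

    pa = correct / total
    mpa = sum(per_class_acc) / len(per_class_acc)
    miou = sum(per_class_iou) / len(per_class_iou)
    return pa, mpa, miou


# Hypothetical two-class example (rows: true class, columns: predicted class).
pa, mpa, miou = segmentation_metrics([[3, 1], [1, 5]])
print(pa, mpa, miou)
```

Because mIoU penalizes both false positives and false negatives for every class， it is usually lower than mPA on the same predictions.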

Few-shot object detection via fusing multi-scale and attention mechanism

Hongtian LI, Xinhao SHI, Weiguo PAN, Cheng XU, Bingxin XU, Jiazheng YUAN

Journal of Computer Applications 2024, 44 (5): 1437-1444. DOI: 10.11772/j.issn.1001-9081.2023050699

Abstract （296）

HTML （14）

PDF （2781KB）（1300）

The existing two-stage few-shot object detection methods based on fine-tuning are not sensitive to the features of new classes， which causes new classes to be misclassified as highly similar base classes and thus degrades the detection performance of the model. To address this issue， a few-shot object detection algorithm that incorporates multi-scale features and an attention mechanism was proposed， namely MA-FSOD （Few-Shot Object Detection via fusing Multi-scale and Attention mechanism）. Firstly， grouped convolutions and large convolution kernels were used to extract more class-discriminative features in the backbone network， and Convolutional Block Attention Module （CBAM） was added to achieve adaptive feature augmentation. Then， a modified pyramid network was used to achieve multi-scale feature fusion， which enables Region Proposal Network （RPN） to accurately find Regions of Interest （RoI） and provide more abundant high-quality positive samples from multiple scales to the classification head. Finally， the cosine classification head was used for classification in the fine-tuning stage to reduce the intra-class variance. Compared with the Few-Shot object detection via Contrastive proposal Encoding （FSCE） algorithm on PASCAL-VOC 2007/2012 dataset， the MA-FSOD algorithm improved AP₅₀ for new classes by 5.6 percentage points； and on the more challenging MSCOCO dataset， compared with Meta-Faster-RCNN， the APs corresponding to 10-shot and 30-shot were improved by 0.1 percentage points and 1.6 percentage points， respectively. Experimental results show that MA-FSOD can more effectively alleviate the misclassification problem and achieve higher accuracy in few-shot object detection than some mainstream few-shot object detection algorithms.
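
The cosine classification head mentioned above can be illustrated with a common formulation： each class score is the scaled cosine similarity between the L2-normalized feature and the L2-normalized class weight vector. The sketch below assumes that formulation； the function name `cosine_head_logits` and the `scale` value are hypothetical and are not taken from the paper.

```python
import math


def cosine_head_logits(feature, class_weights, scale=20.0):
    """Score one feature vector against each class weight vector.

    Each logit is scale * cos(feature, weight), so it depends only on
    the direction of the feature, not on its magnitude.
    `scale` is an assumed temperature-like factor.
    """
    def l2_normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)

    f_hat = l2_normalize(feature)
    logits = []
    for weight in class_weights:
        w_hat = l2_normalize(weight)
        logits.append(scale * sum(a * b for a, b in zip(f_hat, w_hat)))
    return logits


# Hypothetical 2-D example: the first class weight is parallel to the
# feature, the second is orthogonal to it.
print(cosine_head_logits([3.0, 4.0], [[6.0, 8.0], [-4.0, 3.0]], scale=10.0))
```

Normalizing away the feature magnitude is the usual argument for why such a head reduces intra-class variance when only a few samples per class are available.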

Review on privacy-preserving technologies in federated learning

Teng WANG, Zheng HUO, Yaxin HUANG, Yilin FAN

Journal of Computer Applications 2023, 43 (2): 437-449. DOI: 10.11772/j.issn.1001-9081.2021122072

Abstract （1745）

HTML （165）

PDF （2014KB）（1296）

In recent years， federated learning has become a new way to solve the problems of data islands and privacy leakage in machine learning. Federated learning architecture does not require multiple parties to share data resources： participants only need to train local models on local data and periodically upload parameters to the server to update the global model， so that a machine learning model can be built on large-scale global data. Federated learning architecture is privacy-preserving by nature and is a new scheme for machine learning on large-scale data in the future. However， the parameter interaction mode of this architecture may lead to data privacy disclosure. At present， strengthening the privacy-preserving mechanism in federated learning architecture has become a new research hotspot. Starting from the privacy disclosure problem in federated learning， the attack models and sensitive information disclosure paths in federated learning were discussed， and several types of privacy-preserving techniques in federated learning were highlighted and reviewed， such as privacy-preserving technology based on differential privacy， privacy-preserving technology based on homomorphic encryption， and privacy-preserving technology based on Secure Multiparty Computation （SMC）. Finally， the key issues of privacy protection in federated learning were discussed， and the future research directions were prospected.
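
The parameter-upload step described above can be illustrated with sample-size-weighted parameter averaging， as used in FedAvg-style aggregation. The abstract does not specify an aggregation rule， so this is only one common choice； the function name `federated_average` and the toy values are assumptions.

```python
def federated_average(client_params, client_sizes):
    """Aggregate client parameter vectors into global parameters.

    client_params: one flat parameter list per client, all of equal length.
    client_sizes:  number of local training samples held by each client.
    Each client's contribution is weighted by its share of the total data.
    Raw training data never reaches the server, only the parameters do.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for k in range(dim):
            global_params[k] += weight * params[k]
    return global_params


# Hypothetical round with two clients holding 1 and 3 samples.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

The uploaded parameters are exactly the channel through which the privacy disclosure discussed above can occur， which is why techniques such as differential privacy or homomorphic encryption are applied to them before or during aggregation.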

Lightweight gesture recognition algorithm for basketball referee

Zhongyu LI, Haodong SUN, Jiao LI

Journal of Computer Applications 2023, 43 (7): 2173-2181. DOI: 10.11772/j.issn.1001-9081.2022060810

Abstract （425）

HTML （39）

PDF （4447KB）（1290）

Aiming at the problem that the number of parameters， calculation amount and accuracy of general gesture recognition algorithms are difficult to balance， a lightweight gesture recognition algorithm for basketball referees was proposed. The proposed algorithm was reconstructed on the basis of the YOLOV5s （You Only Look Once Version 5s） algorithm. Firstly， the Involution operator was used to replace the CSP1_1 （Cross Stage Partial 1_1） convolution operator to expand the range of captured context information and reduce kernel redundancy. Secondly， the Coordinate Attention （CA） mechanism was added after the C3 module to obtain stronger gesture feature extraction ability. Thirdly， a lightweight content-aware upsampling operator was used to improve the original upsampling module， so that the sampling points were concentrated in the object area and the background was ignored. Finally， the Ghost-Net with SiLU （Sigmoid Weighted Linear Unit） as the activation function was used for lightweight pruning. Experimental results on the self-built basketball referee gesture dataset show that the calculation amount， number of parameters and model size of the proposed algorithm are 3.3 GFLOPs， 4.0×10⁶ and 8.5 MB respectively， which are only 79%， 44% and 40% of those of the YOLOV5s algorithm. The mAP@0.5 of the proposed algorithm is 91.7%， and its detection frame rate on game video with a resolution of 1 920×1 280 reaches 89.3 frame/s， verifying that the proposed algorithm can meet the requirements of low error， high detection rate and a lightweight model.

Review on interpretability of deep learning

Xia LEI, Xionglin LUO

Journal of Computer Applications 2022, 42 (11): 3588-3602. DOI: 10.11772/j.issn.1001-9081.2021122118

Abstract （1640）

HTML （94）

PDF （1703KB）（1283）

With the widespread application of deep learning， human beings are increasingly relying on a large number of complex systems that adopt deep learning techniques. However， the black-box property of deep learning models poses challenges to the use of these models in mission-critical applications and raises ethical and legal concerns. Therefore， making deep learning models interpretable is the first problem to be solved to make them trustworthy. As a result， research in the field of interpretable artificial intelligence has emerged， which mainly focuses on explaining model decisions or behaviors explicitly to human observers. A review of interpretability for deep learning was performed to build a good foundation for further in-depth research and the establishment of more efficient and interpretable deep learning models. Firstly， the interpretability of deep learning was outlined， and the requirements and definitions of interpretability research were clarified. Then， several typical models and algorithms of interpretability research were introduced from the three aspects of explaining the logic rules， decision attribution and internal structure representation of deep learning models. In addition， three common methods for constructing intrinsically interpretable models were pointed out. Finally， the four evaluation indicators of fidelity， accuracy， robustness and comprehensibility were introduced briefly， and the possible future development directions of deep learning interpretability were discussed.

Summary of network intrusion detection systems based on deep learning

Miaolei DENG, Yupei KAN, Chuanchuan SUN, Haihang XU, Shaojun FAN, Xin ZHOU

Journal of Computer Applications 2025, 45 (2): 453-466. DOI: 10.11772/j.issn.1001-9081.2024020229

Abstract （186）

HTML （19）

PDF （1427KB）（1258）

Security mechanisms such as Intrusion Detection System （IDS） have been used to protect network infrastructure and communication from network attacks. With the continuous progress of deep learning technology， IDSs based on deep learning have gradually become a research hotspot in the field of network security. Through extensive literature research， a detailed introduction to the latest research progress in network intrusion detection using deep learning technology was given. Firstly， a brief overview of several IDSs was given. Secondly， the commonly used datasets and evaluation metrics in deep learning-based IDSs were introduced. Thirdly， the commonly used deep learning models in network IDSs and their application scenarios were summarized. Finally， the problems faced in the current related research were discussed， and the future development directions were proposed.

Transformer based U-shaped medical image segmentation network： a survey

Liyao FU, Mengxiao YIN, Feng YANG

Journal of Computer Applications 2023, 43 (5): 1584-1595. DOI: 10.11772/j.issn.1001-9081.2022040530

Abstract （1704）

HTML （85）

PDF （1887KB）（1179）

U-shaped Network （U-Net） based on Fully Convolutional Network （FCN） is widely used as the backbone of medical image segmentation models， but Convolutional Neural Network （CNN） is not good at capturing long-range dependencies， which limits further performance improvement of segmentation models. To solve this problem， researchers have applied Transformer to medical image segmentation models to make up for the deficiency of CNN， and U-shaped segmentation networks combined with Transformer have become a hot research topic. After a detailed introduction of U-Net and Transformer， the related medical image segmentation models were categorized by the position of the Transformer module： only in the encoder or decoder， in both the encoder and decoder， as a skip-connection， and others. The basic contents， design concepts and possible improvements of these models were discussed， and the advantages and disadvantages of placing Transformer in different positions were analyzed. The analysis shows that the most important factor in deciding the position of Transformer is the characteristics of the target segmentation task， and that segmentation models combining Transformer with U-Net can make better use of the advantages of both CNN and Transformer to improve segmentation performance， which gives them great development prospects and research value.

Review of marine ship communication cybersecurity

Zhongdai WU, Dezhi HAN, Haibao JIANG, Cheng FENG, Bing HAN, Chongqing CHEN

Journal of Computer Applications 2024, 44 (7): 2123-2136. DOI: 10.11772/j.issn.1001-9081.2023070975

Abstract （310）

HTML （6）

PDF （3942KB）（1173）

Maritime transportation is one of the most important modes of human transportation. Maritime cybersecurity is crucial to avoid financial loss and ensure shipping safety. Due to the obvious weaknesses of maritime cybersecurity， maritime cyberattacks are frequent. There is a large body of research literature on maritime cybersecurity at home and abroad， but most of it has not yet been reviewed. The structures， risks and countermeasures of the maritime network were systematically organized and comprehensively introduced. On this basis， some suggestions were put forward to deal with maritime cyberthreats.

Research progress of blockchain‑based federated learning

Rui SUN, Chao LI, Wei WANG, Endong TONG, Jian WANG, Jiqiang LIU

Journal of Computer Applications 2022, 42 (11): 3413-3420. DOI: 10.11772/j.issn.1001-9081.2021111934

Abstract （1439）

HTML （102）

PDF （1086KB）（1140）

Federated Learning （FL） is a novel privacy-preserving learning paradigm that can keep users' data local. With the progress of research on FL， the shortcomings of FL， such as single point of failure and lack of credibility， are gradually gaining attention. In recent years， the blockchain technology originating from Bitcoin has developed rapidly， which pioneers the construction of decentralized trust and provides a new possibility for the development of FL. The existing research works on blockchain-based FL were reviewed， and the frameworks for blockchain-based FL were compared and analyzed. Then， the key problems of FL solved by the combination of blockchain and FL were discussed. Finally， the application prospects of blockchain-based FL were presented in various fields， such as Internet of Things （IoT）， Industrial Internet of Things （IIoT）， Internet of Vehicles （IoV） and medical services.

Most Download articles