With the development of artificial intelligence, deep neural networks have become an essential tool in various pattern recognition tasks. However, deploying deep Convolutional Neural Networks (CNNs) on edge computing devices is challenging due to constraints on storage space and computing resources, so deep network compression has become an important research topic in recent years. Low-rank decomposition and vector quantization are among the most popular network compression techniques; both seek a compact representation of the original network, thereby reducing the redundancy of its parameters. By establishing a joint compression framework, a deep network compression method based on low-rank decomposition and vector quantization, Quantized Tensor Decomposition (QTD), was proposed to obtain a higher compression ratio by performing further quantization on the low-rank structure of the network. Experimental results with the classical ResNet on the CIFAR-10 dataset show that QTD can compress the model to 1% of its original size with a slight accuracy drop of 1.71 percentage points. Moreover, the proposed method was compared on the large-scale ImageNet dataset with the quantization-based method PQF (Permute, Quantize, and Fine-tune), the low-rank decomposition-based method TDNR (Tucker Decomposition with Nonlinear Response), and the pruning-based method CLIP-Q (Compression Learning by In-parallel Pruning-Quantization). Experimental results show that QTD maintains better classification accuracy within the same compression range.
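
To make the two-stage idea concrete, the sketch below pairs a truncated SVD (used here only as a stand-in for the tensor decomposition in QTD) with a plain k-means vector quantizer applied to the resulting factor matrices. This is a minimal illustration of "further quantization on the low-rank structure", not the authors' implementation: the rank, codebook size, and sub-vector length are illustrative assumptions.

import numpy as np

def low_rank_then_quantize(W, rank=8, codebook_size=64, subvec_len=4, iters=20):
    """Factor W (out x in) into U @ V via truncated SVD, then vector-quantize
    each factor with a small k-means codebook. Hyperparameters are
    illustrative, not the settings reported in the paper."""
    # Step 1: low-rank decomposition (truncated SVD as a stand-in for
    # the tensor decomposition used in QTD).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U[:, :rank] * s[:rank]          # absorb singular values into U
    V = Vt[:rank, :]

    def vq(M):
        # Step 2: split the factor into sub-vectors and learn a codebook
        # with a few rounds of Lloyd's k-means, then reconstruct.
        vecs = M.reshape(-1, subvec_len)
        rng = np.random.default_rng(0)
        codebook = vecs[rng.choice(len(vecs), codebook_size, replace=False)]
        for _ in range(iters):
            dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(codebook_size):
                members = vecs[assign == k]
                if len(members):
                    codebook[k] = members.mean(0)
        return codebook[assign].reshape(M.shape)

    # Reconstructed weight from the quantized low-rank factors.
    return vq(U) @ vq(V)

# Usage: reconstruction error on a random stand-in "weight matrix",
# e.g. a 64 x (16*3*3) unfolded convolution kernel.
W = np.random.randn(64, 144)
W_hat = low_rank_then_quantize(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))

Storing only the two codebooks and the per-sub-vector indices (instead of the dense factors) is what drives the compression ratio beyond what the low-rank step alone achieves, which is the intuition behind combining the two techniques in one framework.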