Journal of Computer Applications

Review of interactive machine translation

Xingbin LIAO, Xiaolin QIN, Siqi ZHANG, Yangge QIAN

2023, 43(2): 329-334. DOI: 10.11772/j.issn.1001-9081.2021122067

With the development and maturation of deep learning, the quality of neural machine translation has improved, yet it is still imperfect and requires human post-editing to reach acceptable results. Interactive Machine Translation (IMT) is an alternative to this serial workflow in which human interaction takes place during translation itself. The user verifies the candidate translations produced by the translation system and, if necessary, provides new input; the system then generates new candidate translations based on the user's current feedback, and the process repeats until a satisfactory output is produced. Firstly, the basic concept and current research progress of IMT were introduced. Then, common methods and state-of-the-art works were presented by category, and the background and innovation of each work were briefly described. Finally, the development trends and research difficulties of IMT were discussed.

Weakly-supervised text classification with label semantic enhancement

Chengyu LIN, Lei WANG, Cong XUE

2023, 43(2): 335-342. DOI: 10.11772/j.issn.1001-9081.2021122221

To address category vocabulary noise and label noise in weakly-supervised text classification, a weakly-supervised text classification model with label semantic enhancement was proposed. Firstly, the category vocabulary was denoised on the basis of the contextual semantic representations of its words, in order to construct a highly accurate category vocabulary. Then, a word category prediction task based on the MASK mechanism was constructed to fine-tune the pre-trained model BERT (Bidirectional Encoder Representations from Transformers), so as to learn the relationship between words and categories. Finally, a self-training module incorporating label semantics was used to make full use of all data and reduce the impact of label noise, achieving a conversion from word-level to sentence-level semantics and thereby accurately predicting the categories of text sequences. Experimental results show that, compared with the current state-of-the-art weakly-supervised text classification model LOTClass (Label-name-Only Text Classification), the proposed method improves classification accuracy by 5.29, 1.41 and 1.86 percentage points on the public datasets THUCNews, AG News and IMDB, respectively.

Temporal convolutional knowledge tracing model with attention mechanism

Xiaomeng SHAO, Meng ZHANG

2023, 43(2): 343-348. DOI: 10.11772/j.issn.1001-9081.2022010024

To address the problems of insufficient interpretability and long sequence dependency in deep knowledge tracing models based on Recurrent Neural Network (RNN), a model named Temporal Convolutional Knowledge Tracing with Attention mechanism (ATCKT) was proposed. Firstly, embedded representations of students' historical interactions were learned during training. Then, an exercise-based attention mechanism was used to learn a specific weight matrix that identifies and strengthens the influence of students' historical interactions on the knowledge state at each moment. Finally, the students' knowledge states were extracted by a Temporal Convolutional Network (TCN), in which dilated convolution and a deep neural network were used to expand the scope of sequence learning and alleviate the problem of long sequence dependency. Experimental results on four datasets (ASSISTments2009, ASSISTments2015, Statics2011 and Synthetic-5) show that, compared with four models such as Deep Knowledge Tracing (DKT) and Convolutional Knowledge Tracing (CKT), ATCKT significantly improves the Area Under the Curve (AUC) and Accuracy (ACC). The gain is largest on ASSISTments2015, where AUC increases by 6.83 to 20.14 percentage points and ACC by 7.52 to 11.22 percentage points. At the same time, the training time of the proposed model is 26% lower than that of DKT. In summary, the model can accurately capture students' knowledge states and efficiently predict their future performance.
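The role of dilated convolution in widening the scope of sequence learning can be illustrated with a minimal sketch. It is not the ATCKT implementation: the function names are hypothetical, and the receptive-field formula assumes one causal convolution per layer.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_j weights[j] * x[t - j * dilation].

    Inputs before time 0 are treated as zero, so y[t] never depends
    on future values of x (the causal property a TCN relies on).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past steps one output can see, assuming one conv per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 3 and dilations 1, 2, 4 and 8, four layers already cover 31 time steps, whereas four undilated layers would cover only 9.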

Abductive reasoning model based on attention balance list

Ming XU, Linhao LI, Qiaoling QI, Liqin WANG

2023, 43(2): 349-355. DOI: 10.11772/j.issn.1001-9081.2021122105

Abductive reasoning is an important task in Natural Language Inference (NLI) that aims to infer reasonable process events (hypotheses) between a given initial observation event and a final observation event. Earlier studies trained the inference model on each training sample independently; recently, mainstream studies have considered the semantic correlation between similar training samples and fitted the reasonableness of hypotheses to their frequency in the training set, so as to describe the reasonableness of hypotheses in different environments more accurately. On this basis, difference and relativity constraints between reasonable and unreasonable hypotheses were added while describing the reasonableness of the hypotheses, thereby characterizing both the reasonableness and the unreasonableness of hypotheses, and the overall relativity was modeled through many-to-many training. In addition, considering that words differ in importance when expressing an event, an attention module was constructed for the different words in the samples. Finally, an abductive reasoning model based on attention balance list was formed. Experimental results show that, compared with the L2R² (Learning to Rank for Reasoning) model, the proposed model improves accuracy and AUC by about 0.46 and 1.36 percentage points respectively on the mainstream abductive inference dataset Abductive Reasoning in narrative Text (ART), which proves the effectiveness of the proposed model.

Understanding of math word problems integrating commonsense knowledge base and grammatical features

Qingtang LIU, Xinqian MA, Jie ZHOU, Linjing WU, Pengxiao ZHOU

2023, 43(2): 356-364. DOI: 10.11772/j.issn.1001-9081.2021122142

Understanding the meaning of mathematical problems is the key to automatic problem solving. However, in previous studies the accuracy of understanding word problems with complex situations and many parameters is relatively low, and effective optimization solutions need further study. Therefore, a math word problem understanding method integrating commonsense knowledge base and grammatical features was proposed for classical probability word problems with complex context. Firstly, a representation model for classical probability word problems containing seven kinds of key problem-solving parameters was constructed according to the text and structure characteristics of these problems. Then, based on this model, the task of understanding word problems was transformed into the identification of problem-solving parameters, and a Conditional Random Field (CRF) parameter identification method integrating multi-dimensional grammatical features was presented to solve it. Furthermore, to identify implicit parameters, a commonsense completion module was added, yielding the proposed understanding method that integrates commonsense knowledge base and grammatical features. Experimental results show that the proposed method achieves an average F1-score of 93.56% for problem-solving parameter identification and an accuracy of 66.54% for word problem understanding, both better than those of Maximum Entropy Model (MaxEnt), Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) and traditional CRF methods. This proves the effectiveness of the method in understanding classical probability word problems.

Answer selection model based on pooling and feature combination enhanced BERT

Jie HU, Xiaoxi CHEN, Yan ZHANG

2023, 43(2): 365-373. DOI: 10.11772/j.issn.1001-9081.2021122167

Current mainstream models cannot fully express the semantics of question and answer pairs, do not fully consider the relationships between the topic information of question and answer pairs, and use activation functions that suffer from soft saturation, all of which affect overall model performance. To solve these problems, an answer selection model based on pooling and feature combination enhanced BERT (Bidirectional Encoder Representations from Transformers) was proposed. Firstly, adversarial samples and a pooling operation were introduced to represent the semantics of question and answer pairs on the basis of the pre-trained model BERT. Secondly, the relationships between the topic information of question and answer pairs were strengthened by feature combination of the topic information. Finally, the activation function in the hidden layer was improved, and the spliced vector was passed through the hidden layer and the classifier to complete the answer selection task. Model validation was performed on the datasets SemEval-2016CQA and SemEval-2017CQA. The results show that, compared with the tBERT model, the proposed model increases accuracy by 3.1 and 2.2 percentage points and F1 score by 2.0 and 3.1 percentage points on the two datasets, respectively. The proposed model therefore improves overall performance on the answer selection task, with both accuracy and F1 score better than those of the comparison model.

Cross-corpus speech emotion recognition based on decision boundary optimized domain adaptation

Yang WANG, Hongliang FU, Huawei TAO, Jing YANG, Yue XIE, Li ZHAO

2023, 43(2): 374-379. DOI: 10.11772/j.issn.1001-9081.2021122043

Domain adaptation algorithms are widely used for cross-corpus speech emotion recognition. However, many domain adaptation algorithms lose the discriminability of target domain samples while minimizing domain discrepancy, so that these samples lie densely around the decision boundary of the model, which degrades model performance. To address this problem, a cross-corpus speech emotion recognition method based on Decision Boundary Optimized Domain Adaptation (DBODA) was proposed. Firstly, the features were processed by convolutional neural networks. Then, the features were fed into the Maximum Nuclear-norm and Mean Discrepancy (MNMD) module to maximize the nuclear norm of the emotion prediction probability matrix of the target domain while reducing the inter-domain discrepancy, thereby enhancing the discriminability of the target domain samples and optimizing the decision boundary. In six sets of cross-corpus experiments based on the Berlin, eNTERFACE and CASIA speech databases, the average recognition accuracy of the proposed method is 1.68 to 11.01 percentage points higher than those of the other algorithms, indicating that the proposed model effectively reduces the sample density around the decision boundary and improves prediction accuracy.

End-to-end speech recognition method based on prosodic features

Cong LIU, Genshun WAN, Jianqing GAO, Zhonghua FU

2023, 43(2): 380-384. DOI: 10.11772/j.issn.1001-9081.2022010009

In traditional speech recognition systems, the optimal decoding paths are determined by a language model constrained by its training data. Almost inevitably, a correct pronunciation may produce wrong character recognition results in some scenarios. In order to use the prosodic information in speech to raise the probability of correct character combinations in the language model, an end-to-end speech recognition method based on prosodic features was proposed. Built on the attention-based encoder-decoder speech recognition framework, the method first used the coefficient distribution of the attention mechanism to extract prosodic features such as pronunciation interval and pronunciation energy. Then, the prosodic features were combined with the decoder to significantly improve the accuracy of speech recognition in cases of identical or similar pronunciation and semantic ambiguity. Experimental results show that, compared with the baseline end-to-end speech recognition method, the proposed method achieves relative accuracy improvements of 5.2% and 5.0% on 1,000 h and 10,000 h speech recognition tasks respectively, and improves the intelligibility of the recognition results.

Inverse distance weight interpolation algorithm based on particle swarm local optimization

Feng XIANG, Zhongzhi LI, Xi XIONG, Binyong LI

2023, 43(2): 385-390. DOI: 10.11772/j.issn.1001-9081.2022010056

The accuracy of Inverse Distance Weighting (IDW) is affected by the selection of reference points and parameters. To address the neglect of local characteristics in the multi-Parameter co-optimization Inverse Distance Weighting algorithm (PIDW), an improved algorithm based on particle swarm local optimization of IDW was proposed, namely Particle swarm Local optimization Inverse Distance Weight (PLIDW). Firstly, the parameters of each sample point in the study area were optimized separately and evaluated by cross-validation, and the optimal set of parameters for each sample point was recorded. At the same time, in order to improve query efficiency, a K-Dimensional Tree (KD-Tree) was used to store the spatial positions and optimal parameters. Finally, according to spatial proximity, the nearest set of parameters was selected from the KD-Tree to optimize IDW. Experimental results on simulated data and a real temperature dataset show that, compared with PIDW, PLIDW improves accuracy on the real dataset by more than 4.18%. This shows that the proposed algorithm remedies the low accuracy that PIDW exhibits in some scenarios because it ignores local features, and increases adaptability at the same time.
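The idea of per-sample parameters looked up by spatial proximity can be sketched as follows. This is an illustration under stated assumptions, not the PLIDW algorithm: only the distance power is tuned, a small grid search with leave-one-out evaluation stands in for particle swarm optimization, and a linear scan stands in for the KD-Tree query. All function names are hypothetical.

```python
import math


def idw(query, points, values, power=2.0):
    """Inverse distance weighted estimate at `query`."""
    pairs = [(math.dist(query, p), v) for p, v in zip(points, values)]
    for d, v in pairs:
        if d == 0.0:  # query coincides with a sample point
            return v
    weights = [d ** -power for d, _ in pairs]
    return sum(w * v for w, (_, v) in zip(weights, pairs)) / sum(weights)


def fit_local_powers(points, values, candidates=(1.0, 2.0, 3.0)):
    """For each sample, pick the power that best predicts it from the others.

    Grid search replaces the particle swarm optimizer here (assumption).
    """
    best = []
    for i, (p, v) in enumerate(zip(points, values)):
        rest_p = points[:i] + points[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        best.append(min(candidates,
                        key=lambda a: abs(idw(p, rest_p, rest_v, a) - v)))
    return best


def local_idw(query, points, values, powers):
    """Interpolate with the power stored at the nearest sample point.

    A linear scan replaces the KD-Tree nearest-neighbor query (assumption).
    """
    nearest = min(range(len(points)),
                  key=lambda i: math.dist(query, points[i]))
    return idw(query, points, values, powers[nearest])
```

A typical use is to call `fit_local_powers` once on the sample set and then `local_idw` for each query location.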

Partial periodic pattern incremental mining of time series data based on multi-scale

Yaling XUN, Linqing WANG, Jianghui CAI, Haifeng YANG

2023, 43(2): 391-397. DOI: 10.11772/j.issn.1001-9081.2021122190

To address the high computational complexity and poor scalability of mining partial periodic patterns from dynamic time series data, a partial periodic pattern mining algorithm for dynamic time series data combined with multi-scale theory, named MSI-PPPGrowth (Multi-Scale Incremental Partial Periodic Frequent Pattern), was proposed. In MSI-PPPGrowth, full use was made of the inherent multi-scale characteristics of time series data, and multi-scale theory was introduced into the mining of partial periodic patterns from time series data. Firstly, both the original data after scale division and the incremental time series data were used as finer-grained benchmark scale datasets for independent mining. Then, the correlation between different scales was used to realize scale transformation, so as to indirectly obtain the global frequent patterns corresponding to the dynamically updated dataset. In this way, repeated scanning of the original dataset and constant adjustment of the tree structure were avoided. In this process, a new frequent missing count estimation model, PJK-EstimateCount, was designed based on the Kriging method and the periodicity of time series, to effectively estimate the support count of frequent missing items in scale transformation. Experimental results show that MSI-PPPGrowth has good scalability and real-time performance, with significant performance advantages especially on dense datasets.

Parameter calculation algorithm of structural graph clustering driven by instance clusters

Chuanyu ZONG, Chao XIAN, Xiufeng XIA

2023, 43(2): 398-406. DOI: 10.11772/j.issn.1001-9081.2022010082

Clustering results of the pSCAN (pruned Structural Clustering Algorithm for Network) algorithm are influenced by the density constraint parameter and the similarity threshold parameter. If the clustering results obtained with the parameters provided by the user do not meet the user's requirements, the user can express those requirements through instance clusters. To handle clustering query requirements expressed as instance clusters, an instance cluster-driven parameter calculation algorithm for structural graph clustering, PART, and its improved version, ImPART, were proposed. Firstly, the influences of the two clustering parameters on the clustering results were analyzed, and the correlation subgraph of the instance cluster was extracted. Secondly, the feasible interval of the density constraint parameter was obtained by analyzing the correlation subgraph, and the nodes in the instance cluster were divided into core nodes and non-core nodes according to the current density constraint parameter and the structural similarity between nodes. Finally, according to the node division result, the optimal similarity threshold parameter corresponding to the current density constraint parameter was calculated, and the obtained parameters were verified and optimized on the correlation subgraph until clustering parameters satisfying the requirements of the instance cluster were obtained. Experimental results on real datasets show that the proposed algorithm returns a set of effective parameters for user instance clusters, and that the improved algorithm ImPART is more than 20% faster than the basic algorithm PART, returning the optimal clustering parameters that satisfy the requirements of instance clusters quickly and effectively.
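How the two parameters decide whether a node is a core node can be sketched with the structural similarity commonly used in SCAN-style clustering. This is an illustration, not the PART or ImPART algorithm: the names `eps` (similarity threshold) and `mu` (density constraint) are assumed labels, and counting a node among its own similar neighbors follows the closed-neighborhood convention.

```python
import math


def closed_neighbors(adj, u):
    """Neighbors of u plus u itself."""
    return adj[u] | {u}


def structural_similarity(adj, u, v):
    """Shared closed neighbors, normalized by the geometric mean of sizes."""
    nu, nv = closed_neighbors(adj, u), closed_neighbors(adj, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def is_core(adj, u, eps, mu):
    """u is a core node if at least mu closed neighbors are eps-similar to it."""
    similar = [v for v in closed_neighbors(adj, u)
               if structural_similarity(adj, u, v) >= eps]
    return len(similar) >= mu
```

On a triangle with one pendant node attached, a triangle node can be core at `eps=0.8, mu=3` while the pendant node is not, which shows why an instance cluster constrains the feasible values of both parameters.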

Diversity represented deep subspace clustering algorithm

Zhifeng MA, Junyang YU, Longge WANG

2023, 43(2): 407-412. DOI: 10.11772/j.issn.1001-9081.2021122126

To tackle the challenging task of mining complementary information across different levels of features in deep subspace clustering, a Diversity Represented Deep Subspace Clustering (DRDSC) algorithm was proposed, which is based on the deep autoencoder and explores the complementary information between the low-level and high-level features obtained by the encoder. Firstly, based on the Hilbert-Schmidt Independence Criterion (HSIC), a diversity representation measurement model was established for different levels of features. Secondly, a feature diversity representation module was introduced into the deep autoencoder network structure to explore image features that help enhance the clustering effect. Furthermore, the form of the loss function was updated to effectively fuse the underlying subspaces of the multi-level representations. Finally, several experiments were conducted on commonly used clustering datasets. Experimental results show that on the datasets Extended Yale B, ORL, COIL20 and Umist, the clustering error rates of DRDSC reach 1.23%, 10.50%, 1.74% and 17.71% respectively, which are 10.41, 16.75, 13.12 and 12.92 percentage points lower than those of Efficient Dense Subspace Clustering (EDSC), and 1.44, 3.50, 3.68 and 9.17 percentage points lower than those of Deep Subspace Clustering (DSC), indicating that the proposed DRDSC algorithm has a better clustering effect.
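For reference, the standard empirical HSIC estimator for n paired samples is given below. K and L denote the kernel (Gram) matrices of the two feature sets being compared; which kernels DRDSC uses and how the term enters its loss are not stated in the abstract, so this is only the general definition.

```latex
\mathrm{HSIC}(X, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H),
\qquad
H = I_n - \frac{1}{n}\,\mathbf{1}_n \mathbf{1}_n^{\top}
```

A value near zero indicates that the two representations are close to independent, which is why HSIC is often used as a diversity measure between feature levels.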

Fast sanitization algorithm based on BCU-Tree and dictionary for high-utility mining

Chunyong YIN, Ying LI

2023, 43(2): 413-422. DOI: 10.11772/j.issn.1001-9081.2021122161

Privacy Preserving Utility Mining (PPUM) suffers from long sanitization time, high computational complexity and large side effects. To solve these problems, a fast sanitization algorithm based on BCU-Tree and Dictionary (BCUTD) for high-utility mining was proposed. In the algorithm, a new tree structure called BCU-Tree was presented to store sensitive item information, and a bitwise operator coding model was used to reduce the tree construction time and search space. A dictionary table was used to store all nodes of the tree structure, so that only the dictionary table needed to be accessed when a sensitive item was modified, after which the sanitization process was completed. In experiments on four different datasets, the BCUTD algorithm outperforms Hiding High Utility Item First (HHUIF), Maximum Sensitive Utility-MAximum item Utility (MSU-MAU) and Fast Perturbation Using Tree and Table structures (FPUTT) in sanitization time and side effects. Experimental results show that the BCUTD algorithm effectively speeds up the sanitization process and reduces the side effects and computational complexity of the algorithm.

Efficient complex event matching algorithm based on ordered event lists

Tao QIU, Jianli DING, Xiufeng XIA, Hongmei XI, Peiliang XIE, Qingyi ZHOU

2023, 43(2): 423-429. DOI: 10.11772/j.issn.1001-9081.2021122186

To address the high matching cost of existing complex event matching methods, a complex event matching algorithm named ReCEP was proposed, which uses event buffers (ordered event lists) for recursive traversal. Unlike existing methods that use an automaton to match on the event stream, this method decomposes the constraints in the complex event query pattern into different types and then recursively verifies the different constraints on the ordered lists. Firstly, according to the query pattern, the related event instances were cached by event type. Secondly, query filtering was performed on the event instances in the ordered lists, and a recursive traversal algorithm was given to determine the initial event instance and obtain candidate sequences. Finally, the attribute constraints of the candidate sequences were further verified. Experimental testing and analysis on simulated stock transaction data show that, compared with the current mainstream matching methods SASE and Siddhi, the ReCEP algorithm effectively reduces the processing time of query matching, has better overall performance than both methods, and improves query matching efficiency by more than 8.64%. The proposed method can therefore effectively improve the efficiency of complex event processing.
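Recursive traversal over per-type ordered event lists can be sketched as below. This is a simplified illustration, not ReCEP itself: events are reduced to bare timestamps, the pattern is a plain sequence of event types with a single time window, and attribute constraints are left out. The function and parameter names are hypothetical.

```python
from bisect import bisect_right


def match_sequences(buffers, pattern, window):
    """Enumerate timestamp tuples that follow `pattern` in strict time order.

    buffers: dict mapping event type to a sorted list of timestamps.
    pattern: list of event types, e.g. ["A", "B", "C"].
    window:  maximum allowed span between the first and last event.
    """
    results = []

    def extend(partial, depth):
        if depth == len(pattern):
            results.append(tuple(partial))
            return
        events = buffers.get(pattern[depth], [])
        # Skip everything not strictly later than the previous event.
        start = bisect_right(events, partial[-1]) if partial else 0
        for ts in events[start:]:
            if partial and ts - partial[0] > window:
                break  # list is ordered, so later events also fail
            partial.append(ts)
            extend(partial, depth + 1)
            partial.pop()

    extend([], 0)
    return results
```

Because each list is ordered, binary search finds the first admissible event and the window check can stop the scan early, which is the kind of pruning an ordered buffer makes possible.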

Reviewer recommendation algorithm based on affinity and research direction coverage

Lei ZHONG, Yunsheng ZHOU, Dunhui YU, Haibo CUI

2023, 43(2): 430-436. DOI: 10.11772/j.issn.1001-9081.2021122127

To deal with the problem that existing reviewer recommendation algorithms assign reviewers only by affinity score and ignore how well the research directions of reviewers match those of the papers to be reviewed, a reviewer recommendation algorithm based on Affinity and Research Direction Coverage (ARDC) was proposed. Firstly, the order in which papers select reviewers was determined according to the frequencies with which research directions appear in the papers and in the reviewers' paper groups. Secondly, each reviewer's comprehensive review score for a paper was calculated from the affinity score between the reviewer and the paper and the reviewer's research direction coverage score for the paper, and a pre-assigned review team for each paper was obtained by round-robin scheduling. Finally, the final recommendation of the review team was made after conflict-of-interest inspection and resolution. Experimental results show that, compared with assignment-based reviewer recommendation algorithms such as Fair matching via Iterative Relaxation (FairIR) and Fair and Accurate reviewer assignment in Peer Review (PR4A), the proposed algorithm increases the research direction coverage score by 38% on average at the expense of a small amount of affinity score, making the recommendation result more accurate and reasonable.

Review on privacy-preserving technologies in federated learning

Teng WANG, Zheng HUO, Yaxin HUANG, Yilin FAN

2023, 43(2): 437-449. DOI: 10.11772/j.issn.1001-9081.2021122072

In recent years, federated learning has become a new way to solve the problems of data silos and privacy leakage in machine learning. Federated learning architecture does not require multiple parties to share data resources: participants only need to train local models on local data and periodically upload parameters to the server to update the global model, so that a machine learning model can be built on large-scale global data. Federated learning architecture is privacy-preserving by nature and is a promising scheme for machine learning on large-scale data. However, the parameter interaction mode of this architecture may still lead to disclosure of private data. At present, strengthening the privacy-preserving mechanism in federated learning architecture has become a research hotspot. Starting from the privacy disclosure problem in federated learning, the attack models and sensitive information disclosure paths in federated learning were discussed, and several types of privacy-preserving techniques in federated learning were highlighted and reviewed, such as those based on differential privacy, homomorphic encryption and Secure Multiparty Computation (SMC). Finally, the key issues of privacy protection in federated learning were discussed, and future research directions were outlined.
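The train-locally, upload-parameters, update-globally cycle described above can be sketched with weighted parameter averaging, in the style commonly known as federated averaging. The toy below is an assumption-laden illustration: the model is a linear least-squares fit, each client takes a single gradient step per round, and all names are hypothetical. It includes no privacy protection, which is exactly the gap the reviewed techniques target.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step of least squares (y ~ w . x) on a client's own data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]


def federated_average(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]


def train(clients, dim, rounds=50, lr=0.05):
    """Run several rounds; only parameters, never raw data, reach the server."""
    weights = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(weights, data, lr) for data in clients]
        weights = federated_average(updates, [len(d) for d in clients])
    return weights
```

Note that the uploaded parameters still depend on the private data, which is why the parameter exchange itself can leak information.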

Design scheme of digital review system for online conference based on privacy computing

Tengteng WANG, Zhe CUI, Dan TANG

2023, 43(2): 450-457. DOI: 10.11772/j.issn.1001-9081.2022010025

To address the issue that current digital review systems for online conferences cannot protect the privacy of reviewers' opinions and obtain safe and reliable results at the same time, a design scheme of a digital review system for online conferences based on privacy computing was proposed. Firstly, the review data were secret-shared through the encoding matrix of a Reed-Solomon (RS) code to obtain secret shares, and the hash code of each party's review data was submitted to the organizer as a stub to prevent denial. Secondly, the secret shares were cooperatively computed by means of the monotone expansion matrix, a kind of encoding matrix, to obtain the voting results. Finally, the parity check matrix was used to check whether any sharing error or tampering with the secret shares occurred during multi-party cooperative computing. Theoretical analysis and simulation show that the proposed scheme can realize the privacy computing function in digital review systems for small- and medium-sized online conferences.
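The general principle of tallying votes on secret shares can be sketched with polynomial (Shamir-style) sharing, which corresponds to encoding with a Vandermonde matrix as in Reed-Solomon codes. This is not the paper's scheme: the prime modulus, the threshold and every function name are assumptions, and the hash stubs and parity-check verification are omitted.

```python
import random

PRIME = 2_147_483_647  # field modulus 2**31 - 1, an illustrative choice


def make_shares(secret, threshold, n_parties, rng=random):
    """Evaluate a random degree-(threshold-1) polynomial at x = 1..n_parties."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME)
                                 for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n_parties + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def tally(all_shares):
    """Each party adds the shares it holds; no party sees an individual vote."""
    n = len(all_shares[0])
    return [(all_shares[0][k][0],
             sum(voter[k][1] for voter in all_shares) % PRIME)
            for k in range(n)]
```

Because the sharing is linear, interpolating the summed shares yields the sum of the votes directly. A production system would draw randomness from a cryptographically secure source rather than `random`.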

Consensus mechanism of voting scheme in blockchain based on Chinese remainder theorem

Shumin TANG, Yu JIN

2023, 43(2): 458-466. DOI: 10.11772/j.issn.1001-9081.2021122176

Current consensus mechanisms have the following problems: 1) "monopolization" of bookkeeping rights, that is, as the system runs, bookkeeping rights come to be held by a few nodes with more resources, driving away small nodes with poor resources and affecting system security; 2) during the election of stakeholders, the transaction records of all participating nodes must be traversed, causing the consensus delay to increase rapidly. To solve these problems, a new consensus mechanism, CRT-PoT (Chinese Remainder Theorem-Proof of Trust), was proposed. Firstly, based on the Chinese Remainder Theorem (CRT), a voting model named CRT-Election was proposed for selecting stakeholders. In this model, candidates win the support of voters through their numbers of successful blocks and successful votes in order to compete for bookkeeping rights. Then, based on this voting model, a multi-voting mechanism was proposed to give small nodes more opportunities to compete for bookkeeping rights, effectively solving the problem of "monopolization" of bookkeeping rights. The mechanism also ensures that the consensus delay grows slowly as the number of candidates increases: because the scheme does not need to traverse the transaction records of all participating nodes, the consensus delay depends only on the number of participating nodes and increases linearly. Finally, theoretical analysis and experimental results verified that, compared with existing consensus mechanisms, CRT-PoT not only effectively solves the problem of "monopolization" of bookkeeping rights but also reduces the consensus delay.
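As background, the Chinese Remainder Theorem lets several small values be packed into one integer and recovered independently, one per modulus. The sketch below shows only this general reconstruction for pairwise coprime moduli; how CRT-Election actually encodes votes is not described in the abstract, so nothing here should be read as the paper's model.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x % m == r for each pair.

    The moduli must be pairwise coprime.
    """
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        x += r * partial * pow(partial, -1, m)
    return x % big_m
```

For example, `crt([2, 3, 2], [3, 5, 7])` yields 23, and each holder of a modulus recovers its own residue with a single `%` operation.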

Poisoning attack toward visual classification model

Jie LIANG, Xiaoyan HAO, Yongle CHEN

2023, 43(2): 467-473. DOI: 10.11772/j.issn.1001-9081.2021122068

In data poisoning attacks, backdoor attackers manipulate the distribution of training data by inserting samples with hidden triggers into the training set, so that test samples are misclassified, model behavior is changed and model performance is reduced. However, existing triggers have the drawback of sample independence, that is, whatever trigger mode is adopted, different poisoned samples contain the same trigger. Therefore, by combining image steganography and Deep Convolutional Generative Adversarial Network (DCGAN), a sample-based attack method was put forward. In this method, image texture feature maps were generated according to the gray level co-occurrence matrix, the target label characters were embedded into the texture feature maps as a trigger by image steganography, and the texture feature maps with the trigger were combined with clean samples to form poisoned samples. Then, a large number of fake pictures with the trigger were generated by DCGAN. The original poisoned samples and the fake pictures generated by DCGAN were mixed into the training set, so that after the poisoner injected a small number of poisoned samples, the attack success rate was high and the effectiveness, sustainability and concealment of the trigger were ensured. Experimental results show that this method avoids the disadvantages of sample independence, with the model accuracy reaching 93.78%. When the proportion of poisoned samples is 30%, data preprocessing, pruning defense and AUROR defense have the least influence on the attack success rate, which can reach about 56%.
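The gray level co-occurrence matrix that underlies the texture feature maps can be computed as in the sketch below. It covers only this textbook building block, for one pixel offset and a small number of gray levels; the steganographic embedding and the DCGAN stage are not shown, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy).

    image:  2-D list of integer gray levels in range(levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """A common GLCM texture statistic: mean squared gray-level difference."""
    total = sum(map(sum, matrix))
    return sum((i - j) ** 2 * n
               for i, row in enumerate(matrix)
               for j, n in enumerate(row)) / total
```

Statistics such as contrast depend on the content of each image, which is what makes a texture-derived trigger differ from sample to sample.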

Hybrid adaptive particle swarm optimization algorithm for workflow scheduling

Xuesen MA, Xuemei XU, Gonghui JIANG, Yan QIAO, Tianbao ZHOU

2023, 43(2): 474-483. DOI: 10.11772/j.issn.1001-9081.2022010001

Aiming at the conflict between the makespan and execution cost of cloud workflows with deadlines， a Hybrid Adaptive Particle Swarm Optimization algorithm for workflow scheduling （HAPSO） was proposed. Firstly， a Directed Acyclic Graph （DAG） cloud workflow scheduling model was established based on deadlines. Secondly， through the combination of norm ideal points and adaptive weights， the DAG scheduling model was transformed into a multi-objective optimization problem that weighs DAG makespan and execution cost. Finally， based on Particle Swarm Optimization （PSO） algorithm， the adaptive inertia weight， the adaptive learning factors， the probability switching mechanism of flower pollination algorithm， Firefly Algorithm （FA） and the particle out-of-bound processing method were added to balance the global search ability and the local search ability of the particle swarm， and then to solve the objective optimization problem of DAG makespan and execution cost. The optimization results of PSO， Weight Particle Swarm Optimization （WPSO）， Ant Colony Optimization （ACO） and HAPSO were compared and analyzed in the experiment. Experimental results show that HAPSO reduces the multi-objective function value by 40.9% to 81.1% that weighs the makespan and execution cost of workflow （30~300 tasks）， and HAPSO effectively weighs the makespan and execution cost with the constraints of workflow deadlines. In addition， HAPSO also has a good effect on the single objective of reducing the makespan or execution cost， which verifies the universality of HAPSO.
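
As background for the PSO core that HAPSO extends, here is a minimal sketch of a standard particle swarm with a linearly decreasing inertia weight and position clamping as a simple out-of-bound handling rule. It is not HAPSO: the adaptive learning factors, the flower pollination switching and the firefly step are omitted, and all parameter values and names are assumptions.

```python
import random


def pso(objective, dim, bounds, n_particles=20, iters=100, seed=0):
    """Minimize `objective` over a box; returns (best_position, best_value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for t in range(iters):
        # Inertia weight decays linearly from 0.9 to 0.4 (a common heuristic).
        w = 0.9 - 0.5 * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + 2.0 * r1 * (pbest[i][d] - pos[i][d])
                             + 2.0 * r2 * (gbest[d] - pos[i][d]))
                # Out-of-bound handling: clamp the new position to the box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    return sum(v * v for v in x)


best, best_val, history = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a scheduling setting the objective would be the weighted combination of makespan and execution cost; the sphere function here is only a stand-in.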

Design and implementation of block transmission mechanism based on remote direct memory access

Dong SUN, Biao WANG, Yun XU

2023, 43(2): 484-489. DOI: 10.11772/j.issn.1001-9081.2021122243

With the continuous development of blockchain technology， the block transmission delay has become a performance bottleneck of the scalability of the blockchain system. Remote Direct Memory Access （RDMA） technology， which supports high-bandwidth and low-delay data transmission， provides a new idea for block transmission with low latency. Therefore， a block catalogue structure for block information sharing was designed based on the characteristics of RDMA primitives， and the basic working process of block transmission was proposed and implemented on this basis. Experimental results show that compared with TCP（Transmission Control Protocol） transmission mechanism， the RDMA-based block transmission mechanism reduces the transmission delay between nodes by 44%， the transmission delay among the whole network by 24.4% on a block of 1 MB size， and the number of temporary forks appeared in blockchain by 22.6% on a blockchain of 10 000 nodes. It can be seen that the RDMA-based block transmission mechanism takes advantage of the performance of high speed networks， reduces block transmission latency and the number of temporary forks， thereby improving the scalability of the existing blockchain systems.

Improved instruction obfuscation framework based on obfuscator low level virtual machine

Yayi WANG, Chen LIU, Tianbo HUANG, Weiping WEN

2023, 43(2): 490-498. DOI: 10.11772/j.issn.1001-9081.2021122234

Focusing on the issue that only one instruction substitution with 5 operators and 13 substitution schemes is supported in Obfuscator Low Level Virtual Machine （OLLVM） at the instruction obfuscation level， an improved instruction obfuscation framework InsObf was proposed. InsObf， including junk code insertion and instruction substitution， was able to enhance the obfuscation effect at the instruction level based on OLLVM. For junk code insertion， firstly， the dependency of the instruction inside the basic block was analyzed， and then two kinds of junk code， multiple jump and bogus loop， were inserted to disrupt the structure of the basic block. For instruction substitution， based on OLLVM， it was expanded to 13 operators， with 52 instruction substitution schemes. The framework prototype was implemented on Low Level Virtual Machine （LLVM）. Experimental results show that compared to OLLVM， InsObf has the cyclomatic complexity and resilience increased by almost four times， with a time cost of about 10 percentage points and a space cost of about 20 percentage points higher. Moreover， InsObf can provide higher code complexity compared to Armariris and Hikari， which are also improved on the basis of OLLVM， at the same order of magnitude of time and space costs. Therefore， InsObf can provide effective protection at the instruction level.

Repair method for process models with concurrent structures based on token replay

Erjing BAI, Xiaoyan LI, Yuyue DU

2023, 43(2): 499-506. DOI: 10.11772/j.issn.1001-9081.2021122154

Process mining can build a process model according to event logs generated by an enterprise information management system. There always exist some deviations between the process model and event logs when the actual business process changes. At this time， the process model needs to be repaired. For the process model with concurrent structures， the precision of some existing repairing methods will be reduced because of the addition of self-loops and invisible transitions. Therefore， a method for repairing process models with concurrent structures was proposed on the basis of logic Petri net and token replay. Firstly， according to the relationship between the input-output places of the sub-model and event logs， the insertion position of the sub-model was determined. Then， the deviation positions were determined by a token replay method. Finally， a method was designed to repair the process models based on logic Petri net. The correctness and effectiveness of this method were verified by carrying out simulations on ProM platform， and the proposed method was compared with Fahland’s and other methods. The results show that the precision of this method is about 85%， which is increased by 17 and 11 percentage points respectively compared with those of Fahland’s and Goldratt methods. In terms of simplicity， the proposed method does not add any self-loop or invisible transition， while Fahland’s and Goldratt methods add some self-loops and invisible transitions. All of the fitting degrees of the three methods are above 0.9， and the fitting degree of Goldratt method is slightly lower. The above verifies that the model repaired by the proposed method has higher fitness and precision.

Anomaly detection in video via independently recurrent neural network and variational autoencoder network

Qing JIA, Laihua WANG, Weisheng WANG

2023, 43(2): 507-513. DOI: 10.11772/j.issn.1001-9081.2021122081

To effectively extract the temporal information between consecutive video frames， a prediction network IndRNN-VAE （Independently Recurrent Neural Network-Variational AutoEncoder） that fuses Independently Recurrent Neural Network （IndRNN） and Variational AutoEncoder （VAE） network was proposed. Firstly， the spatial information of video frames was extracted through VAE network， and the latent features of video frames were obtained by a linear transformation. Secondly， the latent features were used as the input of IndRNN to obtain the temporal information of the sequence of video frames. Finally， the obtained latent features and temporal information were fused through residual block and input to the decoding network to generate the prediction frame. Experimental results on the UCSD Ped1， UCSD Ped2 and Avenue public datasets show that compared with the existing anomaly detection methods， the method based on IndRNN-VAE performs significantly better， with Area Under Curve （AUC） values of 84.3%， 96.2%， and 86.6% respectively， Equal Error Rate （EER） values of 22.7%， 8.8%， and 19.0% respectively， and differences in mean anomaly scores of 0.263， 0.497， and 0.293 respectively. Besides， the running speed of this method reaches 28 FPS （Frames Per Second）.
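
The AUC metric reported above can be computed from per-frame anomaly scores as the probability that a randomly chosen anomalous frame scores higher than a randomly chosen normal frame. The sketch below uses this pairwise definition (ties count as one half); the function name and the sample scores are illustrative assumptions, not data from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Four frames: label 1 marks an anomalous frame.
frame_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The quadratic pairwise loop is fine for a small illustration; a rank-based formulation would be preferable for long videos.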

Moving object detection based on reliability low-rank factorization and generalized diversity difference

Peng WANG, Dawei ZHANG, Zhengjun LU, Linhao LI

2023, 43(2): 514-520. DOI: 10.11772/j.issn.1001-9081.2021122112

Moving object detection aims to separate the background and foreground of the video. However， the commonly used low-rank factorization methods often struggle to handle dynamic background and intermittent motion together. Considering that the skewed noise distribution after background subtraction has potential background correction effect， a moving object detection model based on the reliability low-rank factorization and generalized diversity difference was proposed. There were three steps in the model. Firstly， the peak position and the nature of skewed distribution of the pixel distribution in the time dimension were used to select a sub-sequence without outlier pixels， and the median of this sub-sequence was calculated to form the static background. Secondly， the noise after static background subtraction was modeled by asymmetric Laplace distribution， and the modeling results based on spatial smoothing were used as reliability weights to participate in low-rank factorization to model comprehensive background （including dynamic background）. Finally， the temporal and spatial continuity constraints were applied in turn to extract the foreground. For the temporal continuity， the generalized diversity difference constraint was proposed， and the expansion of the foreground edge was suppressed by the difference information of adjacent video frames. Experimental results show that， compared with six models such as PCP（Principal Component Pursuit）， DECOLOR（DEtecting Contiguous Outliers in the Low-Rank Representation）， LSD（Low-rank and structured Sparse Decomposition）， TVRPCA（Total Variation regularized Robust Principal Component Analysis）， E-LSD（Extended LSD） and GSTO（Generalized Shrinkage Thresholding Operator）， the proposed model has the highest F-measure. It can be seen that this model can effectively improve the detection accuracy of foreground in complex scenes such as dynamic background and intermittent motion.
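
The first step (a median over time as the static background) can be sketched in its simplest form as follows. This version takes the median over all frames, whereas the paper first selects an outlier-free sub-sequence per pixel; the function names, threshold, and toy frames are assumptions.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median of a list of equally sized grayscale frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(cols)]
            for y in range(rows)]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


# A bright object (200) moves across a dark 2x2 scene (10).
frames = [[[10, 10], [10, 200]],
          [[10, 200], [10, 10]],
          [[200, 10], [10, 10]]]
background = median_background(frames)
mask = foreground_mask(frames[0], background)
```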

Action recognition method based on video spatio-temporal features

Ranyan NI, Yi ZHANG

2023, 43(2): 521-528. DOI: 10.11772/j.issn.1001-9081.2022010017

Aiming at the problems that two-stream networks cannot realize end-to-end recognition because optical flow maps must be calculated in advance to extract motion information， and that three-dimensional convolutional networks have a large number of parameters， an action recognition method based on video spatio-temporal features was proposed. In this method， the spatio-temporal information in videos was extracted efficiently without adding any calculation of optical flows or any three-dimensional convolution operation. Firstly， the motion information extraction module based on attention mechanism was used to capture the motion shift information between two adjacent frames， thereby simulating the function of optical flows in two-stream network. Secondly， a decoupled spatio-temporal information extraction module was proposed to replace the three-dimensional convolution in order to encode the spatio-temporal information. Finally， the two modules were embedded into the two-dimensional residual network to complete the end-to-end action recognition. Experiments were carried out on several mainstream action recognition datasets. The results show that when only using RGB （Red-Green-Blue） video frames as input， the recognition accuracies of the proposed method on UCF101， HMDB51 and Something-Something-V1 datasets are 96.5%， 73.1% and 46.6% respectively. Compared with Temporal Segment Network （TSN） method using two-stream structure， the proposed method has the recognition accuracy on UCF101 improved by 2.5 percentage points. It can be seen that the proposed method is able to extract spatio-temporal features in videos efficiently.

Fall detection algorithm based on scene prior and attention guidance

Ping WANG, Nan CHEN, Lei LU

2023, 43(2): 529-535. DOI: 10.11772/j.issn.1001-9081.2022010114

The existing fall detection works mainly focus on indoor scenes， and most of them only model people’s body posture features， ignoring background information of the scene and the interaction information between people and the ground. To address this problem， from the perspective of practical application in elevator scenes， a fall detection algorithm based on scene prior and attention guidance was proposed. Firstly， elevator historical data was used to automatically learn the scene prior information from people’s trajectories by Gaussian probability distribution modelling. Then， the scene information was taken as a spatial attention mask and fused with the global features of the neural network to focus on local information of the ground area. After that， the fused local and global features were further aggregated using adaptive weighting method to improve the robustness and discriminative ability of the generated features. Finally， the features were fed into a classifier module consisting of a global average pooling layer and a fully connected layer to perform the fall prediction and classification. Experimental results show that the detection accuracy of the proposed algorithm on the self-built elevator scene dataset Elevator Fall Detection Dataset and the public UR Fall Detection Dataset reached 95.36% and 99.01% respectively， which is increased by 3.52 percentage points and 0.61 percentage points respectively compared with that of ResNet50 with complicated network structure. It can be seen that the proposed attention mechanism with Gaussian scene prior guidance can make the network focus on information of the ground area， which is more conducive to detecting fall events. By using it， the detection model has high accuracy， and the algorithm meets the real-time application requirements.

Image inpainting algorithm of multi-scale generative adversarial network based on multi-feature fusion

Gang CHEN, Yongwei LIAO, Zhenguo YANG, Wenying LIU

2023, 43(2): 536-544. DOI: 10.11772/j.issn.1001-9081.2022010015

Aiming at the problems in Multi-scale Generative Adversarial Networks Image Inpainting algorithm （MGANII）， such as unstable training in the process of image inpainting， poor structural consistency， and insufficient details and textures of the inpainted image， an image inpainting algorithm of multi-scale generative adversarial network was proposed based on multi-feature fusion. Firstly， aiming at the problems of poor structural consistency and insufficient details and textures， a Multi-Feature Fusion Module （MFFM） was introduced in the traditional generator， and a perception-based feature reconstruction loss function was introduced to improve the ability of feature extraction in the dilated convolutional network， thereby supplying more details and texture features for the inpainted image. Then， a perception-based feature matching loss function was introduced into local discriminator to enhance the discrimination ability of the discriminator， thereby improving the structural consistency of the inpainted image. Finally， a risk penalty term was introduced into the adversarial loss function to meet the Lipschitz continuity condition， so that the network was able to converge rapidly and stably in the training process. On the dataset CelebA， compared with MGANII， the proposed multi-feature fusion image inpainting algorithm converges faster. Meanwhile， the Peak Signal-to-Noise Ratio （PSNR） and Structural SIMilarity （SSIM） of the images inpainted by the proposed algorithm are improved by 0.45% to 8.67% and 0.88% to 8.06% respectively compared with those of the images inpainted by the baseline algorithms， and Frechet Inception Distance score （FID） of the images inpainted by the proposed algorithm is reduced by 36.01% to 46.97% compared with that of the images inpainted by the baseline algorithms. Experimental results show that the inpainting performance of the proposed algorithm is better than that of the baseline algorithms.
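
PSNR, one of the metrics reported above, follows directly from the mean squared error between a reference image and a restored image. A minimal sketch for 8-bit grayscale images stored as nested lists is given below; the function name and the toy images are illustrative assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100, 120], [140, 160]]
restored = [[101, 119], [141, 159]]  # every pixel is off by exactly 1
score = psnr(reference, restored)    # MSE is 1, so PSNR equals 20 * log10(255)
```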

U-shaped feature pyramid network for image inpainting forensics

Wanli SHEN, Yujin ZHANG, Wan HU

2023, 43(2): 545-551. DOI: 10.11772/j.issn.1001-9081.2021122107

Image inpainting is a common method of image tampering. Image inpainting methods based on deep learning can generate more complex structures and even new objects， making image inpainting forensics more challenging. Therefore， an end-to-end U-shaped Feature Pyramid Network （FPN） was proposed for image inpainting forensics. Firstly， multi-scale feature extraction was performed through the from-top-to-down VGG16 module， and then the from-bottom-to-up feature pyramid architecture was used to carry out up-sampling of the fused feature maps， and a U-shaped structure was formed by the overall process. Next， the global and local attention mechanisms were combined to highlight the inpainting traces. Finally， the fusion loss function was used to improve the prediction rate of the repaired area. Experimental results show that the proposed method achieves an average F1-score and Intersection over Union （IoU） value of 0.791 9 and 0.747 2 respectively on various deep inpainting datasets. Compared with the existing Localization of Diffusion-based Inpainting （LDI）， Patch-based Convolutional Neural Network （Patch-CNN） and High-Pass Fully Convolutional Network （HP-FCN） methods， the proposed method has better generalization ability， and also has stronger robustness to JPEG compression.
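
The two localization metrics reported above, F1-score and IoU, can both be derived from pixel-level counts of true positives, false positives and false negatives between a predicted inpainting mask and the ground-truth mask. The following is a minimal sketch; the function name and the toy masks are assumptions.

```python
def f1_and_iou(pred, truth):
    """Pixel-level F1-score and IoU for two binary masks given as nested 0/1 lists."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


pred = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
f1, iou = f1_and_iou(pred, truth)  # tp = 1, fp = 1, fn = 1
```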

Multi-stage low-illuminance image enhancement network based on attention mechanism

Guihui CHEN, Jinyu LIN, Yuehua LI, Zhongbing LI, Yuli WEI, Kai LU

2023, 43(2): 552-559. DOI: 10.11772/j.issn.1001-9081.2022010093

A multi-stage low-illuminance image enhancement network based on attention mechanism was proposed to solve the problem that the details of low-illuminance images are lost due to the overlapping of image contents and large brightness differences in some regions during the enhancement process of low-illuminance images. At the first stage， an improved multi-scale fusion module was used to perform preliminary image enhancement. At the second stage， the enhanced image information of the first stage was cascaded with the input of this stage， and the result was used as the input of the multi-scale fusion module in this stage. At the third stage， the enhanced image information of the second stage was cascaded with the input of this stage， and the result was used as the input of the multi-scale fusion module in this stage. In this way， with the use of multi-stage fusion， not only the brightness of the image was improved adaptively， but also the details were retained adaptively. Experimental results on open datasets LOL and SICE show that compared to the algorithms and networks such as MSR （Multi-Scale Retinex） algorithm， gray Histogram Equalization （HE） algorithm and RetinexNet （Retina cortex Network）， the proposed network has the value of Peak Signal-to-Noise Ratio （PSNR） 11.0% to 28.9% higher， and the value of Structural SIMilarity （SSIM） increased by 6.8% to 46.5%. By using multi-stage method and attention mechanism to realize low-illuminance image enhancement， the proposed network effectively solves the problems of image content overlapping and large brightness difference， and the images obtained by this network are more detailed and subjectively recognizable， with clearer textures.

Design of guided adaptive mathematical morphology for multimodal images

Mengdi SUN, Zhonggui SUN, Xu KONG, Hongyan HAN

2023, 43(2): 560-566. DOI: 10.11772/j.issn.1001-9081.2021122168

Traditional Mathematical Morphology （TMM） does not preserve structure well， and the existing adaptive variants usually lose mathematical properties. To address the problems， a Guided Adaptive Mathematical Morphology （GAMM） for multimodal images was proposed. Firstly， the structure elements were constructed by considering the joint information of the input and the guidance images， so that the corresponding operators were more robust to the noise. Secondly， according to 3σ rule， the selected members of structure elements were able to be adapted to image contents. Finally， by using the Hadamard product of sparse matrices， the structure elements were imposed with a symmetry constraint. Both theoretical verification and simulation show that the corresponding operators of the proposed mathematical morphology have important mathematical properties， such as order preservation and adjunction， at the same time. Denoising experimental results on multimodal images show that the Peak Signal-to-Noise Ratio （PSNR） of GAMM is 2 to 3 dB higher than those of TMM and Robust Adaptive Mathematical Morphology （RAMM）. Meanwhile， comparison of subjective visual effect shows that GAMM significantly outperforms TMM and RAMM in noise removal and structure preservation.

Nonhomogeneous image dehazing based on dual-branch conditional generative adversarial network

Li’an ZHU, Hong ZHANG

2023, 43(2): 567-574. DOI: 10.11772/j.issn.1001-9081.2021122091

The pictures taken on hazy days have color distortion and blurry details， which will affect the quality of the pictures to a certain extent. Many deep learning based methods have good results on synthetic homogeneous haze images， but they have poor results on the real nonhomogeneous dehazing dataset introduced in the latest NTIRE （New Trends in Image Restoration and Enhancement） challenge. The main reason is that the non-uniform distribution of haze is complicated， and the texture details are easily lost in the process of dehazing. Moreover， the sample number of this dataset is limited， which easily leads to overfitting. Therefore， a Conditional Generative Adversarial Network with Dual-Branch generators （DB-CGAN） was proposed. In one branch， with U-net used as the basic architecture， through the strategy of "Strengthen-Operate-Subtract"， enhancement modules were added to the decoder to enhance the recovery of features in the decoder， and the dense feature fusion was used to build enough connections for non-adjacent levels. In the other branch， a multi-layer residual structure was used to speed up the training of the network， and a large number of channel attention modules were concatenated to extract as many high-frequency detail features as possible. Finally， a simple and efficient fusion subnet was used to fuse the two branches. In the experiment， this model is significantly better than the previous Dark Channel Prior （DCP）， All-in-One Dehazing Network （AODNet）， Gated Context Aggregation Network （GCANet）， and Multi-Scale Boosted Dehazing Network （MSBDN） dehazing models on the evaluation indexes Peak Signal-to-Noise Ratio （PSNR） and Structural SIMilarity （SSIM）. Experimental results show that the proposed network has better performance on nonhomogeneous dehazing datasets.

Floorplan generation algorithm integrating user requirements and boundary constraints

Ruoying WANG, Fan LYU, Liuqing ZHAO, Fuyuan HU

2023, 43(2): 575-582. DOI: 10.11772/j.issn.1001-9081.2021122143

Floorplan design is an important step of house design. However， the existing automatic floorplan design methods lack the common constraints of considering user requirements and building boundaries. Thus， these methods suffer from unreasonable layout problems such as missing corners of generated room， severe occlusion between rooms and room getting out of the boundary. In order to solve the above problems， a building floorplan GBC-GAN （Graph Boundary Constrained-Generative Adversarial Network） was proposed based on user requirements and boundary constraints， and the proposed method consists of a constraint layout generator and a room relation discriminator. Firstly， the user-specified floorplan layout requirements （including the number and types of rooms and the adjacency relationship between houses） were transformed into a constraint relation graph structure， after that， the building boundary and constraint relation graph were encoded separately for feature fusion. Then， by adding the prediction module of bounding box， the constraint layout generator was used to convert the floorplan generation problem into a bounding box generation problem of each room object， and the geometric boundary optimization loss was used to solve the problems of severe occlusion between rooms and room getting out of the boundary. Finally， the room bounding box layout and the constraint relation graph were input into the room relation discriminator for training to generate the floorplan layout meeting the room objects and their relations. The Frechet Inception Distance （FID） and Structural Similarity Index Measure （SSIM） of the proposed method are improved by 4.39% and 2.3% compared with those of House-GAN on the large-scale real building dataset RPLAN. Experimental results show the proposed method improves the rationality and authenticity of the floorplan layout under different user requirements and boundary constraints.

Three-dimensional human reconstruction model based on high-resolution net and graph convolutional network

Yating SU, Cuixiang LIU

2023, 43(2): 583-588. DOI: 10.11772/j.issn.1001-9081.2021122075

Focusing on the problems of head pose flipping and missing implicit spatial cues between image features when reconstructing human body from monocular images， a three-dimensional human reconstruction model based on High-Resolution Net （HRNet） and Graph Convolutional Network （GCN） was proposed. Firstly， the rich human feature information was extracted from the original image by using HRNet and residual blocks as the backbone network. Then， the accurate spatial feature representation was obtained by using GCN to capture the implicit spatial cues. Finally， the parameters of Skinned Multi-Person Linear model （SMPL） were predicted by using the features， thereby obtaining more accurate reconstruction results. At the same time， to effectively solve the problem of human head pose flipping， the joint points of SMPL were redefined and the definition of the head joint points was added on the basis of the original joints. Experimental results show that this model can exactly reconstruct the three-dimensional human body. The reconstruction accuracy of this model on the 2D dataset LSP reaches 92.41%， and the joint error and reconstruction error of the model are greatly reduced on the 3D dataset MPI-INF-3DHP with the average of only 97.73 mm and 64.63 mm respectively， verifying the effectiveness of the proposed model in the field of human reconstruction.

2D/3D spine medical image real-time registration method based on pose encoder

Shaokang XU, Zhancheng ZHANG, Haonan YAO, Zhiwei ZOU, Baocheng ZHANG

2023, 43(2): 589-594. DOI: 10.11772/j.issn.1001-9081.2021122147

2D/3D medical image registration is a key technology in 3D real-time navigation of orthopedic surgery. However， the traditional 2D/3D registration methods based on optimization iteration require multiple iterative calculations， which cannot meet the requirements of doctors for real-time registration during surgery. To solve this problem， a pose regression network based on autoencoder was proposed. In this network， the geometric pose information was captured through hidden space decoding， thereby quickly regressing the 3D pose of preoperative spine pose corresponding to the intraoperative X-ray image， and the final registration image was generated through reprojection. By introducing new loss functions， the model was constrained by “Rough to Fine” combined registration method to ensure the accuracy of pose regression. In CTSpine1K spine dataset， 100 CT scan image sets were extracted for 10-fold cross-validation. Experimental results show that the registration result image generated by the proposed model has the Mean Absolute Error （MAE） with the X-ray image of 0.04， the mean Target Registration Error （mTRE） with the X-ray image of 1.16 mm， and the single frame consumption time of 1.7 s. Compared to the traditional optimization based method， the proposed model has registration time greatly shortened. Compared with the learning-based method， this model ensures a high registration accuracy with quick registration. Therefore， the proposed model can meet the requirements of intraoperative real-time high-precision registration.

Deep face verification under pose interference

Qi WANG, Hang LEI, Xupeng WANG

2023, 43(2): 595-600. DOI: 10.11772/j.issn.1001-9081.2021122214

Face verification is widely used in various scenes in life， and the acquisition of ordinary RGB images is extremely dependent on illumination conditions. In order to solve the interference of illumination and head pose， a convolutional neural network based Siamese network L2-Siamese was proposed. Firstly， the paired depth images were taken as input. Then， after using two convolutional neural networks that share weights to extract facial features respectively， L2 norm was introduced to constrain the facial features with different poses on a hypersphere with a fixed radius. Finally， the fully connected layer was used to map the difference between the features to the probability value in （0，1） to determine whether the group of images belonged to the same object. In order to verify the effectiveness of L2-Siamese， a test was conducted on the public dataset Pandora. Experimental results show that L2-Siamese has good overall performance. After the dataset was grouped according to the size of head pose interference， the test results show that the prediction accuracy of L2-Siamese is 4 percentage points higher than that of the state-of-the-art algorithm fully-convolutional Siamese network under the maximum head pose interference， illustrating that the accuracy of prediction has been significantly improved.
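
The L2-norm constraint described above rescales every feature vector so that it lies on a hypersphere of fixed radius, which removes magnitude differences before two embeddings are compared. A minimal sketch follows; the function name, the radius value and the sample vector are assumptions, and the network layers around this step are omitted.

```python
import math


def l2_constrain(features, radius=1.0):
    """Rescale a feature vector so that its L2 norm equals `radius`."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [radius * x / norm for x in features]


# A feature vector of norm 5 projected onto a sphere of radius 10.
constrained = l2_constrain([3.0, 4.0], radius=10.0)
```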

Controllable face editing algorithm with closed-form solution

Lingling TAO, Bo LIU, Wenbo LI, Xiping HE

2023, 43(2): 601-607. DOI: 10.11772/j.issn.1001-9081.2022010030

To solve problems in face editing such as unnatural editing results and large changes in generated images, a controllable face editing algorithm with a closed-form solution was proposed. Firstly, n latent vectors were sampled randomly to construct a sample matrix, and the top k principal component vectors of the matrix were calculated. Then, five attributes of the face image were obtained by ResNet-50, and the semantic boundary of each attribute was calculated by Support Vector Machine (SVM). Finally, the interpretable direction vectors of these attributes were calculated to be as close to the principal component vectors as possible while staying as far away from the semantic boundary of the corresponding attribute as possible, thereby reducing the coupling between facial attributes and improving controllability in face editing. Because the algorithm has a closed-form solution, it is highly efficient. Experimental results show that, compared with the Closed-Form Factorization of Latent Semantics in GANs (SeFa) algorithm and the Discovering Interpretable Generative Adversarial Network Controls (GANSpace) algorithm, the proposed algorithm increases the Inception Score (IS) by 19% and 26% respectively, decreases the Fréchet Inception Distance (FID) by 4% and 37% respectively, and decreases the Maximum Mean Discrepancy (MMD) by 15% and 48% respectively. It can be seen that this algorithm has good controllability and decoupling.
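The first step of the algorithm, sampling n latent vectors and taking the top k principal components, can be sketched as below. The standard normal sampling distribution and the function name are assumptions; the later steps (attribute prediction, SVM boundaries, and the closed-form direction vectors) are not reproduced because the abstract does not give their formulas.

```python
import numpy as np


def top_k_principal_components(n, dim, k, seed=0):
    """Sample n latent vectors and return the top-k principal directions.

    Latent vectors are drawn from a standard normal distribution, which is
    an assumption; a real generator may use a different latent space.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n, dim))   # sample matrix, one row per vector
    centered = samples - samples.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]
```

The returned rows are orthonormal, so each can serve as a unit-length candidate editing direction in latent space.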

Lightweight traffic sign recognition model based on coordinate attention

Wenju LI, Gan ZHANG, Liu CUI, Wanghui CHU

2023, 43(2): 608-614. DOI: 10.11772/j.issn.1001-9081.2022010100

To address the imbalance between detection speed and recognition accuracy in traffic sign recognition models, as well as the difficulty of detecting occluded and small targets, the YOLOv5 (You Only Look Once version 5) model was improved, and a lightweight traffic sign recognition model based on Coordinate Attention (CA) was proposed. Firstly, the CA mechanism was integrated into the backbone network to effectively capture the relationships between location information and channels, so as to obtain regions of interest more accurately while avoiding excessive computational overhead. Then, cross-layer connections were added to the feature fusion network to fuse more feature information without increasing the cost, improving the feature extraction ability of the network and the detection of occluded targets. Finally, an improved CIoU (Complete Intersection over Union) function was introduced to calculate the localization loss, thereby alleviating the uneven distribution of sample sizes in the detection process and further improving the recognition accuracy of small targets. On the TT100K (Tsinghua-Tencent 100K) dataset, the model achieves a recognition accuracy of 91.5% and a recall of 86.64%, which are 20.96% and 11.62% higher respectively than those of the traditional YOLOv5n model, and a frame processing rate of 140.84 FPS (Frames Per Second). These experimental results verify the accuracy and real-time performance of the proposed model for traffic sign detection and recognition in real scenes.
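For orientation, the standard CIoU measure that the localization loss builds on can be sketched as follows. This is the commonly used baseline formulation, not the improved variant proposed in the paper, whose modification the abstract does not describe. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Standard Complete IoU between two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def ciou_loss(box_a, box_b):
    """Localization loss: 1 - CIoU."""
    return 1.0 - ciou(box_a, box_b)
```

Unlike plain IoU, the center-distance term still produces a useful signal when the predicted and ground-truth boxes do not overlap, which matters for small targets.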

Non-fragile dissipative control scheme for event-triggered networked systems

Chao GE, Yaxin ZHANG, Yue LIU, Hong WANG

2023, 43(2): 615-621. DOI: 10.11772/j.issn.1001-9081.2022010007

To address limited bandwidth resources, external disturbance, and parameter uncertainty, a non-fragile dissipative control scheme for event-triggered networked systems was proposed. Firstly, based on the Networked Control System (NCS) model, a non-periodic sampling event-triggered scheme was proposed, and a time-delay closed-loop system model was established. Then, a novel bilateral Lyapunov functional was constructed by using the structural characteristics of the sawtooth wave. Finally, sufficient conditions for the stability of the system were derived by using methods such as the Jensen inequality, free weighting matrices, and convex combination, and the gain of the feedback controller was calculated. Numerical simulation results show that the proposed bilateral functional is less conservative than the unilateral functional, that the event-triggered mechanism saves bandwidth compared with the common sampling mechanism, and that the proposed controller is feasible.

Virtual screening of drug synthesis reaction based on multimodal data fusion

Xiaofei SUN, Jingyuan ZHU, Bin CHEN, Hengzhi YOU

2023, 43(2): 622-629. DOI: 10.11772/j.issn.1001-9081.2021122228

Drug synthesis reactions, especially asymmetric reactions, are key components of modern pharmaceutical chemistry. Chemists have invested considerable manpower and resources in identifying various chemical reaction patterns in order to achieve efficient synthesis and asymmetric catalysis. Recent research on quantum mechanical computing and machine learning algorithms in this field has demonstrated the great potential of accurate virtual screening and of learning from existing drug synthesis reaction data by computer. However, the existing methods use only a small amount of single-modal data and can only use common machine learning methods because of insufficient data, which hinders their application in a wider range of scenarios. Therefore, two screening models of drug synthesis reactions integrating multimodal data were proposed for virtual screening of reaction yield and enantioselectivity. In addition, a 3D conformation descriptor based on the Boltzmann distribution was proposed to combine the 3D spatial information of molecules with quantum mechanical properties. These two multimodal data fusion models were trained and verified on two representative organic synthesis reactions (C-N cross-coupling reaction and N,S-acetal formation). The R² (R-squared) of the former is more than 1 percentage point higher than those of the baseline methods in most data splits, and the MAE (Mean Absolute Error) of the latter is more than 0.5 percentage points lower than those of the baseline methods in most data splits. It can be seen that models based on multimodal data fusion perform well in different organic reaction screening tasks.
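One common way to build a descriptor from the Boltzmann distribution is to average per-conformer descriptors with Boltzmann weights. The sketch below shows that generic technique only; the abstract does not say how the paper's descriptor is constructed, and the energy unit (kcal/mol), the temperature, and the function names are assumptions.

```python
import numpy as np

K_B = 0.0019872041  # gas constant in kcal/(mol*K); assumes energies in kcal/mol


def boltzmann_weights(energies, temperature=298.15):
    """Population weights of conformers from their relative energies."""
    e = np.asarray(energies, dtype=float)
    rel = e - e.min()  # shift by the minimum for numerical stability
    w = np.exp(-rel / (K_B * temperature))
    return w / w.sum()


def boltzmann_averaged_descriptor(descriptors, energies, temperature=298.15):
    """Weighted average of per-conformer descriptor vectors.

    descriptors: (n_conformers, n_features) array (hypothetical layout).
    """
    w = boltzmann_weights(energies, temperature)
    return w @ np.asarray(descriptors, dtype=float)
```

Low-energy conformers dominate the average, so the resulting vector reflects the geometries a molecule is most likely to adopt at the chosen temperature.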

Design of very short antipollution error correcting code based on global distance optimization

Jianqiang LIU, Yepin QU, Yuhai LYU

2023, 43(2): 630-635. DOI: 10.11772/j.issn.1001-9081.2021122065

The existing two-dimensional codes suffer from weak antipollution ability and slow decoding speed in complex environments. To solve these problems, a very short antipollution error correcting code based on global distance optimization was proposed. Firstly, a concave-convex polygon mathematical model was constructed to characterize the polluted environment. Then, a very short error correcting code was designed, in which three coding points represent one target data bit. Finally, a coding point arrangement method was designed to optimize the global distance within a limited constrained domain, and the corresponding decoding algorithm was given. The antipollution ability and recognition speed of the very short error correcting code were simulated and analyzed, and the proposed code was compared with the classical Bose-Chaudhuri-Hocquenghem (BCH) codes. The results show that when the target data length is 18 and the number of coding points is 63, the recognition accuracy of the very short error correcting code is close to that of BCH codes in the same polluted environment, while its decoding speed is 130 times that of BCH codes. The proposed code also has the advantages of a simple and clear structure, strong adaptability of coding points, and ease of standardization and popularization.

Low-carbon multimodal transportation path optimization based on multi-objective fuzzy chance-constrained programming

Min ZHANG, Xiaolong HAN

2023, 43(2): 636-644. DOI: 10.11772/j.issn.1001-9081.2021122085

To address multimodal transportation path optimization under uncertain time windows and demand, trapezoidal fuzzy numbers were used to express the fuzzy demand and fuzzy time window, and a multi-objective fuzzy chance-constrained model was established considering carbon emission costs, transportation costs, and customer satisfaction. Because fixed crossover and mutation probabilities directly affect the convergence of the algorithm, an adaptive mechanism was combined with the Non-dominated Sorting Genetic Algorithm-Ⅱ (NSGA-Ⅱ), and the effectiveness of the proposed model and algorithm was verified by comparison with DOCPLEX and NSGA-Ⅱ. Finally, the influence of changes in the carbon tax and the fuzzy demand preference value on the optimization results was explored. The research results show that introducing a carbon tax can effectively promote “road-to-rail transportation and road-to-water transportation”, thereby reducing carbon emissions. However, an excessively high carbon tax does not necessarily reduce carbon emissions and imposes excessive costs on enterprises. In addition, an increase in the fuzzy demand preference value leads to an increase in total cost, which means that transportation economy and reliability cannot be achieved at the same time. Therefore, setting a reasonable carbon tax and fuzzy demand preference value is an effective way to improve the environmental and transportation benefits of multimodal transportation.
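A trapezoidal fuzzy number (a, b, c, d) is defined by its membership function, which can be sketched as below. The function name and the example values are hypothetical; how the paper turns such numbers into chance constraints is not stated in the abstract and is not reproduced here.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy number (a, b, c, d).

    The degree rises linearly on [a, b], equals 1 on [b, c],
    and falls linearly on [c, d].
    """
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)
```

For example, a fuzzy demand of (10, 20, 30, 40) units treats any demand between 20 and 30 as fully plausible and a demand of 15 as plausible to degree 0.5.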

Joint operation of quay crane and straddle carrier under dual-cycle strategy

Yuqing ZHOU, Xiaolong HAN

2023, 43(2): 645-653. DOI: 10.11772/j.issn.1001-9081.2021122042

The use of straddle carriers in container terminals can reduce the number of operation links and the types and quantity of terminal mechanical equipment; at the same time, the setting of buffer capacity is very important. Firstly, in order to reduce the overall completion time of the terminal, improve its operation efficiency, and solve the spatial-temporal coordination problem caused by joint loading and unloading operations in which straddle carriers serve as horizontal transportation equipment for quay cranes, the dual-cycle operation strategy was introduced, and the joint operation sequence optimization problem of quay cranes and straddle carriers was studied. Secondly, a mixed integer programming model was established to minimize the total completion time. In the model, the practical constraints of dual-cycle operation of quay cranes and straddle carriers, as well as the constraints of quay crane buffer capacity and safety time, were considered. Thirdly, to address the limitations of the traditional Tabu Search (TS) algorithm, a greedy-algorithm-based reactive TS algorithm was designed by introducing a greedy algorithm, a multi-neighborhood search method, and a reactive algorithm, and numerical experiments were conducted. Experimental results verify the effectiveness of the proposed model and algorithm. Finally, through experimental analysis of the buffer capacity, the number of straddle carriers, and the ratio of quay cranes to straddle carriers, the optimal number of straddle carriers, the optimal buffer capacity, and the optimal ratio of quay cranes to straddle carriers were obtained. The results show that, compared with the traditional terminal equipment configuration, the dual-cycle operation strategy can reduce the number of straddle carriers and improve the utilization of quay cranes and straddle carriers.

Mobile robot path planning based on improved SAC algorithm

Yongdi LI, Caihong LI, Yaoyu ZHANG, Guosheng ZHANG

2023, 43(2): 654-660. DOI: 10.11772/j.issn.1001-9081.2021122053

To solve the problems of long training time and slow convergence when applying the SAC (Soft Actor-Critic) algorithm to the local path planning of mobile robots, a PER-SAC algorithm was proposed by introducing the Prioritized Experience Replay (PER) technique. Firstly, to improve the convergence speed and stability of the robot training process, a priority strategy was applied to draw samples from the experience pool instead of traditional random sampling, so that the network prioritized training on samples with larger errors. Then, the calculation of the Temporal-Difference (TD) error was optimized to reduce the training deviation. Next, transfer learning was used to train the robot gradually from a simple environment to a complex one in order to improve the training speed. In addition, an improved reward function was designed to increase the intrinsic reward of the robot, thereby solving the sparsity problem of the environmental reward. Finally, simulation was carried out on the ROS (Robot Operating System) platform. The simulation results show that the PER-SAC algorithm outperforms the original algorithm in terms of convergence speed and length of the planned path in different obstacle environments. Moreover, the PER-SAC algorithm reduces the training time and is significantly better than the original algorithm in path planning performance.
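The priority-based sampling described in this abstract can be illustrated with a minimal sketch of proportional prioritized experience replay. It follows the generic PER formulation rather than the paper's optimized TD-error calculation, which the abstract does not detail; the class name, the `alpha`, `beta`, and `eps` defaults, and the list-based storage are assumptions.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: P(i) is proportional to priority_i ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps            # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0              # next slot to overwrite once full
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[int(i)] = abs(float(err)) + self.eps
```

After each training step, the absolute TD errors of the sampled batch are passed to `update_priorities`, so transitions with larger errors are drawn more often in later batches.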
