Journal of Computer Applications

Discriminative multidimensional scaling for feature learning

Haitao TANG, Hongjun WANG, Tianrui LI

2023, 43(5): 1323-1329. DOI: 10.11772/j.issn.1001-9081.2022030419

Traditional multidimensional scaling methods produce low-dimensional embeddings that preserve the topological structure of the data points but ignore how discriminative the embeddings themselves are. To address this, an unsupervised discriminative feature learning method based on multidimensional scaling, named the Discriminative MultiDimensional Scaling model (DMDS), was proposed to discover the cluster structure while learning the low-dimensional data representation. DMDS pulls the low-dimensional embeddings of the same cluster closer together, which makes the learned representation more discriminative. Firstly, a new objective function for DMDS was designed so that the learned representation both preserves the topology and enhances discriminability. Secondly, the objective function was derived and solved, and a corresponding iterative optimization algorithm was designed from the derivation. Finally, comparison experiments were carried out on twelve public datasets in terms of the average accuracy and average purity of clustering. Experimental results, evaluated comprehensively with Friedman statistics, show that DMDS outperforms both the original data representation and the traditional multidimensional scaling model, and that the low-dimensional embeddings learned by DMDS are more discriminative.
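
The abstract does not state the DMDS objective. The sketch below shows, in minimal form, how the two goals can be traded off. It assumes a raw-stress term for topology preservation and a hypothetical within-cluster compactness term weighted by `lam`. Both terms are our assumptions, not the authors' formulation. `labels` stands for the current cluster assignments, for example from k-means.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def euclid(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def raw_stress(X: List[Point], Y: List[Point]) -> float:
    """Topology term: sum over pairs i<j of (||y_i - y_j|| - ||x_i - x_j||)^2."""
    n = len(X)
    return sum((euclid(Y[i], Y[j]) - euclid(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def within_cluster_scatter(Y: List[Point], labels: Sequence[int]) -> float:
    """Hypothetical discriminability term: squared distance of each embedding to its cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [y for y, lab in zip(Y, labels) if lab == c]
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(euclid(m, centroid) ** 2 for m in members)
    return total

def dmds_like_objective(X: List[Point], Y: List[Point], labels: Sequence[int], lam: float = 0.5) -> float:
    # Hypothetical trade-off: keep pairwise distances, pull same-cluster embeddings together.
    return raw_stress(X, Y) + lam * within_cluster_scatter(Y, labels)
```

Minimizing such an objective over `Y`, for example by gradient descent or a stress-majorization scheme, moves same-cluster points together only as far as the stress term allows.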

Improved capsule network based on multipath feature

Qinghai XU, Shifei DING, Tongfeng SUN, Jian ZHANG, Lili GUO

2023, 43(5): 1330-1335. DOI: 10.11772/j.issn.1001-9081.2022030367

To address the poor classification performance of the Capsule Network (CapsNet) on complex datasets and the large number of parameters in its routing process, a Capsule Network based on Multipath features (MCNet) was proposed. It comprises a novel capsule feature extractor and a novel capsule pooling method. The capsule feature extractor extracts features of different layers and locations in parallel along multiple paths and then encodes them into capsule features that carry richer semantic information. The capsule pooling method selects the most active capsules at each position of the capsule feature map, so that the effective capsule features are represented by a small number of capsules. MCNet was compared with models such as CapsNet on four datasets: CIFAR-10, SVHN, Fashion-MNIST and MNIST. On CIFAR-10, MCNet achieves a classification accuracy of 79.27% with 6.25×10⁶ trainable parameters. Compared with CapsNet, MCNet improves classification accuracy by 8.7% and reduces the number of parameters by 46.8%. MCNet can therefore improve classification accuracy while reducing the number of trainable parameters.
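
The capsule pooling step can be pictured with a minimal sketch. It assumes the CapsNet convention that a capsule's activity is the length of its vector and keeps a single capsule per position. Both are our assumptions; the paper may score activity differently or keep more than one capsule.

```python
import math
from typing import List

Vector = List[float]

def capsule_length(v: Vector) -> float:
    # Assumed activity measure: Euclidean length of the capsule vector.
    return math.sqrt(sum(x * x for x in v))

def max_activity_capsule_pool(capsule_map: List[List[List[Vector]]]) -> List[List[Vector]]:
    """capsule_map[h][w] holds all capsules at grid position (h, w);
    return the single most active (longest) capsule at each position."""
    return [[max(caps, key=capsule_length) for caps in row] for row in capsule_map]
```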

Comparison of three-way concepts under attribute clustering

Xiaoyan ZHANG, Jiayi WANG

2023, 43(5): 1336-1341. DOI: 10.11772/j.issn.1001-9081.2022030399

Three-way concept analysis is an important topic in artificial intelligence. Its main advantage is that it can study, at the same time, the “attributes commonly possessed” and the “attributes commonly not possessed” by the objects in a formal context. The new formal context generated by attribute clustering is strongly connected with the original formal context, and the original three-way concepts are closely related to the new three-way concepts obtained by attribute clustering. Therefore, a comparative study and analysis of three-way concepts under attribute clustering was carried out. Firstly, the notions of pessimistic, optimistic and general attribute clustering were proposed on the basis of attribute clustering, and the relationships among these three notions were studied. Secondly, the differences between the original three-way concepts and the new ones were studied by comparing the clustering process with the formation process of three-way concepts. Furthermore, two minimum constraint indexes were put forward, one from the object-oriented perspective and one from the attribute-oriented perspective, and the influence of attribute clustering on the three-way concept lattice was explored. These results further enrich the theory of three-way concept analysis and provide feasible ideas for visual data processing.
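
The two kinds of attributes mentioned above can be illustrated with a minimal sketch of the object-induced three-way operator. It maps a set of objects to the attributes all of them share and the attributes none of them has. The dictionary-of-sets encoding of the formal context is our illustrative choice.

```python
from typing import Dict, FrozenSet, Set, Tuple

def three_way_intent(objects: Set[str], context: Dict[str, Set[str]],
                     attributes: Set[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (attributes possessed by every object in `objects`,
    attributes possessed by no object in `objects`)."""
    common = set(attributes)
    absent = set(attributes)
    for o in objects:
        common &= context[o]   # keep only attributes this object also has
        absent -= context[o]   # drop attributes this object has
    return frozenset(common), frozenset(absent)
```

For an empty object set both components are the full attribute set, which matches the usual vacuous-truth convention.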

Iteratively modified robust extreme learning machine

Xinwei LYU, Shuxia LU

2023, 43(5): 1342-1348. DOI: 10.11772/j.issn.1001-9081.2022030429

Many variants of the Extreme Learning Machine (ELM) aim to improve its robustness to outliers, yet the traditional Robust Extreme Learning Machine (RELM) remains very sensitive to them. Handling data that contain an excessive number of extreme outliers has become the most difficult problem in constructing RELM models. For outliers with large residuals, a bounded loss function was used to eliminate their contamination of the model. To handle an excessive number of outliers, an iterative modification technique was used to modify the data and reduce their influence. Combining these two approaches, an Iteratively Modified RELM (IMRELM) was proposed and solved iteratively. In each iteration, the samples were reweighted to reduce the influence of outliers, and under-fitting was avoided during the continuous modification. IMRELM, ELM, Weighted ELM (WELM), Iteratively Re-Weighted ELM (IRWELM) and Iterative Reweighted Regularized ELM (IRRELM) were compared on synthetic and real datasets with different outlier levels. On the synthetic dataset with 80% outliers, the Mean-Square Error (MSE) of IRRELM is 2.450 44, whereas that of IMRELM is 0.000 79. Experimental results show that IMRELM has good prediction accuracy and robustness on data with excessive extreme outliers.
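
The abstract names a bounded loss and per-iteration reweighting but not the exact loss. The sketch below is illustrative only. It places a Welsch/correntropy-type loss, whose reweighting factor is exp(-r²/2σ²), inside a regularized single-input ELM. The hidden-layer size, `C`, `sigma` and the iteration count are hypothetical values. The paper's data-modification step is not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square, non-singular A."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def robust_elm(xs: Sequence[float], ys: Sequence[float], hidden: int = 20, C: float = 100.0,
               sigma: float = 1.0, iters: int = 5, seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    W = [rng.uniform(-1, 1) for _ in range(hidden)]   # random input weights (never trained)
    B = [rng.uniform(-1, 1) for _ in range(hidden)]   # random biases
    H = [[math.tanh(w * x + b) for w, b in zip(W, B)] for x in xs]
    n = len(xs)
    weights = [1.0] * n  # first pass is an ordinary regularized ELM
    for _ in range(iters):
        # Weighted ridge normal equations: (H^T Λ H + I/C) beta = H^T Λ y
        A = [[sum(weights[k] * H[k][i] * H[k][j] for k in range(n)) + (1.0 / C if i == j else 0.0)
              for j in range(hidden)] for i in range(hidden)]
        rhs = [sum(weights[k] * H[k][i] * ys[k] for k in range(n)) for i in range(hidden)]
        beta = solve(A, rhs)
        preds = [sum(h * bt for h, bt in zip(row, beta)) for row in H]
        # Bounded loss: weights lie in (0, 1], so samples with large residuals barely count
        weights = [math.exp(-((y - p) ** 2) / (2 * sigma ** 2)) for y, p in zip(ys, preds)]
    return beta, weights
```

Because the loss is bounded, an outlier's weight falls towards zero instead of its squared residual dominating the fit. Plain least squares has no such protection.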

Multi-label cross-modal hashing retrieval based on discriminative matrix factorization

Yu TAN, Xiaoqin WANG, Rushi LAN, Zhenbing LIU, Xiaonan LUO

2023, 43(5): 1349-1354. DOI: 10.11772/j.issn.1001-9081.2022030424

Existing cross-modal hashing algorithms underestimate the importance of semantic differences between class labels and ignore the balance condition of hash vectors, which makes the learned hash codes less discriminative. In addition, some methods use label information to construct a similarity matrix and model multi-label data as single-label data, which causes large semantic loss in multi-label cross-modal retrieval. To preserve accurate similarity relationships between heterogeneous data and the balance property of hash vectors, a novel supervised hashing algorithm, namely Discriminative Matrix Factorization Hashing (DMFH), was proposed. In this method, Collective Matrix Factorization (CMF) of the kernelized features was used to obtain a shared latent subspace. The proportion of labels shared between data items was used to describe the degree of similarity of the heterogeneous data. In addition, a balanced matrix was constructed from label balance information to generate hash vectors with the balance property and to maximize the inter-class distances among different class labels. DMFH was compared with seven advanced cross-modal hashing retrieval methods on two commonly used multi-label datasets, MIRFlickr and NUS-WIDE. It achieves the best mean Average Precision (mAP) on both the I2T (Image to Text) and T2I (Text to Image) tasks, and its mAPs on T2I are higher than on I2T. This indicates that DMFH exploits the multi-label semantic information in the text modality more effectively. The validity of the constructed balanced matrix and similarity matrix was also analyzed, verifying that DMFH preserves semantic information and similarity relations and is effective for cross-modal hashing retrieval.

Teaching-learning-based optimization algorithm based on cooperative mutation and Lévy flight strategy and its application

Hao GAO, Qingke ZHANG, Xianglong BU, Junqing LI, Huaxiang ZHANG

2023, 43(5): 1355-1364. DOI: 10.11772/j.issn.1001-9081.2022030420

The Teaching-Learning-Based Optimization (TLBO) algorithm has several shortcomings on optimization problems: unbalanced search, a tendency to fall into local optima and weak overall solution performance. To address them, an improved TLBO based on equilibrium optimization and a Lévy flight strategy, namely ELMTLBO (Equilibrium-Lévy-Mutation TLBO), was proposed. Firstly, an elite equilibrium guidance strategy was designed to improve the global optimization ability of the algorithm through the equilibrium guidance of multiple elite individuals in the population. Secondly, a strategy combining Lévy flight with an adaptive weight was added after the learner phase of TLBO, in which the weight adaptively scales the step size generated by Lévy flight. This improved the population's local optimization ability and enhanced the self-adaptability of individuals to complex environments. Finally, a mutation operator pool escape strategy was designed to improve the population diversity of the algorithm through the cooperative guidance of multiple mutation operators. To verify the effectiveness of these improvements, the overall convergence performance of ELMTLBO was compared on 15 international test functions with 7 state-of-the-art intelligent optimization algorithms, such as the Dwarf Mongoose Optimization Algorithm (DMOA), and with algorithms of the same type, such as Balanced TLBO (BTLBO) and standard TLBO. Statistical experimental results show that, compared with advanced intelligent optimization algorithms and TLBO variants, ELMTLBO effectively balances its search ability. It solves both unimodal and multimodal problems and shows significant optimization ability on complex multimodal problems. Under the combined effect of the different strategies, ELMTLBO therefore has outstanding overall optimization performance and stable global convergence performance.
In addition, ELMTLBO was successfully applied to the Multiple Sequence Alignment (MSA) problem based on the Hidden Markov Model (HMM). The high-quality aligned sequences it obtains can be used in disease diagnosis, gene tracing and other fields, providing good algorithmic support for the development of bioinformatics.
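
The Lévy flight step with an adaptive weight can be sketched as follows. The step is drawn with Mantegna's algorithm, a common way to generate Lévy-distributed steps. The update form `x + w(t) * alpha * step * (x - best)` and the linearly decreasing weight `w(t) = 1 - t / t_max` are hypothetical choices. The abstract does not specify ELMTLBO's weight schedule.

```python
import math
import random
from typing import List, Optional

def mantegna_levy_step(dim: int, beta: float = 1.5, rng: Optional[random.Random] = None) -> List[float]:
    """Draw one Lévy-distributed step vector using Mantegna's algorithm."""
    if rng is None:
        rng = random.Random()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    step = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, 1.0)
        step.append(u / abs(v) ** (1 / beta))  # heavy-tailed: mostly small, occasionally long jumps
    return step

def levy_move(x: List[float], best: List[float], t: int, t_max: int,
              alpha: float = 0.01, rng: Optional[random.Random] = None) -> List[float]:
    # Hypothetical adaptive weight: large early (exploration), zero at the last iteration.
    w = 1.0 - t / t_max
    step = mantegna_levy_step(len(x), rng=rng)
    return [xi + w * alpha * s * (xi - bi) for xi, s, bi in zip(x, step, best)]
```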

J-SGPGN: paraphrase generation network based on joint learning of sequence and graph

Zhirong HOU, Xiaodong FAN, Hua ZHANG, Xiaonan MA

2023, 43(5): 1365-1371. DOI: 10.11772/j.issn.1001-9081.2022040626

Paraphrase generation is a text data augmentation method based on Natural Language Generation (NLG). Paraphrase generation methods based on the Sequence-to-Sequence (Seq2Seq) framework suffer from repetitive generation, semantic errors and poor diversity. To address these problems, a Paraphrase Generation Network based on Joint learning of Sequence and Graph (J-SGPGN) was proposed. The encoder of J-SGPGN fuses graph encoding and sequence encoding for feature enhancement, and its decoder runs two decoding methods, sequence generation and graph generation, in parallel. The model was then trained with joint learning, which combines syntactic supervision with semantic supervision to improve the accuracy and diversity of generation simultaneously. On the Quora dataset, J-SGPGN scores 3.44 percentage points higher on METEOR (Metric for Evaluation of Translation with Explicit ORdering), a generation accuracy metric, than RNN+GCN, the baseline with the best accuracy. It also scores 12.79 percentage points lower on Self-BLEU (Self-BiLingual Evaluation Understudy), a generation diversity metric, than the Back-Translation guided multi-round Paraphrase Generation (BTmPG) model, the baseline with the best diversity. This verifies that J-SGPGN can generate paraphrases with more accurate semantics and more diverse expressions.

Pedestrian head tracking model based on full-body appearance features

Guangyao ZHANG, Chunfeng SONG

2023, 43(5): 1372-1377. DOI: 10.11772/j.issn.1001-9081.2022030377

In dense scenes, existing pedestrian multi-object tracking algorithms fail to detect some pedestrians and confuse associations between frames. To improve the precision of pedestrian tracking in dense scenes, a head tracking model based on full-body appearance features, namely HT-FF (Head Tracking with Full-body Features), was proposed. Firstly, a head detector was used instead of a full-body detector to improve the detection rate of pedestrians in dense scenes. Secondly, guided by human pose estimation, noise-removed full-body appearance features were obtained as tracking cues, which greatly reduced confusion in associations across frames. HT-FF achieves the best results on multiple metrics, such as MOTA (Multiple Object Tracking Accuracy) and IDF1 (ID F1 Score), on Head Tracking 21 (HT21), a benchmark dataset for pedestrian tracking in dense scenes. HT-FF effectively alleviates lost and confused pedestrian tracks in dense scenes, and the proposed tracking model, which combines multiple cues, offers a new paradigm for pedestrian tracking models.
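
The two headline metrics follow their standard definitions: MOTA from the CLEAR MOT metrics and IDF1 from the identity measures. Both are shown below as plain functions. The counts are assumed to be accumulated over the whole sequence by an external matching step, which is not shown.

```python
def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (fn + fp + id_switches) / num_gt

def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), from identity-level matching."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

MOTA is dominated by detection errors (FN and FP), whereas IDF1 rewards keeping the same identity over time. This is why a method aimed at both missed pedestrians and association confusion reports both metrics.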

Stock movement prediction with market dynamic hierarchical macro information

Yafei ZHANG, Jing WANG, Yaoshuai ZHAO, Zhihao WU, Youfang LIN

2023, 43(5): 1378-1384. DOI: 10.11772/j.issn.1001-9081.2022030400

The complex structure and diverse information of stock markets make stock movement prediction extremely challenging. However, most existing studies either treat each stock individually or use graph structures to model complex higher-order relationships in stock markets, without considering the hierarchy and dynamics among stocks, industries and markets. To address these problems, a Dynamic Macro Memory Network (DMMN) was proposed and used to predict price movements for multiple stocks simultaneously. In this method, market macro-environmental information was modeled through the “stock-industry-market” hierarchy, and its long-term dependences over time were captured. The market macro-environmental information was then dynamically integrated with stock micro-characteristic information. This enhances each stock's perception of the overall market state and indirectly captures the interdependences among stocks, industries and markets. Experiments were conducted on the collected CSI300 dataset against stock prediction methods based on the Attentive Long Short-Term Memory (ALSTM) network, GCN-LSTM (Graph Convolutional Network with Long Short-Term Memory), the Convolutional Neural Network (CNN) and other models. The DMMN-based method achieves better F1-score and Sharpe ratio, improving them by 4.87% and 31.90% respectively over ALSTM, the best of the comparison methods. This indicates that DMMN has better prediction performance and practicability.
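
The Sharpe ratio reported above is the mean excess return divided by its standard deviation. The abstract does not state the conventions used. The sketch below assumes a constant per-period risk-free rate, the sample standard deviation, and an optional annualization factor `periods_per_year`. All three are our assumptions.

```python
import math
import statistics
from typing import Sequence

def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 1) -> float:
    """Mean excess return over its sample standard deviation, scaled by sqrt(periods)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```

For daily strategy returns, `periods_per_year=252` is a common annualization choice. Ratios computed under different conventions are not directly comparable.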

Stock return prediction via multi-scale kernel adaptive filtering

Xingheng TANG, Qiang GUO, Tianhui XU, Caiming ZHANG

2023, 43(5): 1385-1393. DOI: 10.11772/j.issn.1001-9081.2022030401

In the stock market, investors can predict future stock returns by capturing the potential trading patterns in historical data. The key issue in predicting stock returns is how to identify these trading patterns accurately. However, they are generally difficult to capture because of uncertain factors such as corporate performance, financial policies and national economic growth. To solve this problem, a Multi-Scale Kernel Adaptive Filtering (MSKAF) method was proposed to capture multi-scale trading patterns from past market data. To describe the multi-scale features of stocks, the method employs the Stationary Wavelet Transform (SWT) to obtain data components at different scales. These components contain the different trading patterns hidden in stock price fluctuations. Kernel Adaptive Filtering (KAF) was then used to capture the trading patterns at each scale and predict future stock returns. Compared with the prediction model based on Two-Stage KAF (TSKAF), the proposed method reduces the Mean Absolute Error (MAE) by 10% and increases the Sharpe Ratio (SR) by 8.79%. These results verify that the proposed method achieves better stock return prediction performance.
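
The abstract does not say which KAF algorithm MSKAF uses. As a minimal sketch, the kernel least-mean-square (KLMS) filter below is one of the simplest members of the family. It learns online by adding each input as a Gaussian-kernel center whose coefficient is the step size times the prediction error. `step_size` and `kernel_width` are hypothetical defaults.

```python
import math
from typing import List, Sequence

class KLMS:
    """Kernel least-mean-square filter, a basic kernel adaptive filtering algorithm."""

    def __init__(self, step_size: float = 0.5, kernel_width: float = 1.0):
        self.eta = step_size
        self.width = kernel_width
        self.centers: List[List[float]] = []
        self.coeffs: List[float] = []

    def _kernel(self, a: Sequence[float], b: Sequence[float]) -> float:
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2 * self.width ** 2))  # Gaussian kernel

    def predict(self, x: Sequence[float]) -> float:
        return sum(c * self._kernel(u, x) for u, c in zip(self.centers, self.coeffs))

    def update(self, x: Sequence[float], target: float) -> float:
        err = target - self.predict(x)
        # Every sample becomes a new center with coefficient eta * error
        self.centers.append(list(x))
        self.coeffs.append(self.eta * err)
        return err
```

In a multi-scale setting, one filter per SWT component could be fed sliding windows of lagged values from that component, with the per-scale predictions then combined. That wiring is an assumption, not taken from the paper. An SWT is available, for example, as `pywt.swt` in PyWavelets.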

Extraction of PM2.5 diffusion characteristics based on candlestick pattern matching

Rui XU, Shuang LIANG, Hang WAN, Yimin WEN, Shiming SHEN, Jian LI

2023, 43(5): 1394-1400. DOI: 10.11772/j.issn.1001-9081.2022030437

Most existing air quality prediction methods perform trend prediction on simple time series data. They ignore the laws of pollutant transport and diffusion and the corresponding classified pattern features. To solve this problem, a PM2.5 diffusion characteristic extraction method based on Candlestick Pattern Matching (CPM) was proposed. Firstly, basic periodic candlestick charts were generated from a large number of historical PM2.5 sequences, borrowing the convolution idea of the Convolutional Neural Network (CNN). Then, the concentration patterns of different candlestick chart feature vectors were clustered and analyzed using a distance formula. Finally, exploiting the strengths of CNN in image recognition, a hybrid model integrating graphical features and time series features was built. This model was used to identify the trend reversals signaled by candlestick charts with reversal signals. Experiments were conducted on the time series dataset from the Guilin online air quality monitoring stations. Compared with the VGG (Visual Geometry Group)-based method that uses only time series data, the CPM-based method improves accuracy by 1.9 percentage points. The CPM-based method can thus effectively extract the trend features of PM2.5 and be used to predict periodic changes in future pollutant concentrations.
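
The first step, turning a PM2.5 series into periodic candlesticks, can be sketched by sliding a window over the series in the manner of a 1-D convolution kernel. Each window is summarized by its open, high, low and close. The window length `period` and the `stride` are hypothetical parameters. The paper's exact generation procedure is not given in the abstract.

```python
from typing import List, NamedTuple, Optional, Sequence

class Candle(NamedTuple):
    open: float
    high: float
    low: float
    close: float

def to_candles(series: Sequence[float], period: int, stride: Optional[int] = None) -> List[Candle]:
    """Summarize each window of `period` readings as an OHLC candle.
    stride == period gives non-overlapping candles; a smaller stride gives overlapping ones."""
    stride = stride or period
    candles = []
    for start in range(0, len(series) - period + 1, stride):
        w = series[start:start + period]
        candles.append(Candle(w[0], max(w), min(w), w[-1]))
    return candles
```

A candle whose close exceeds its open marks a rising-concentration period. Sequences of such candles are what the pattern-matching and reversal-signal steps then operate on.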

Adaptive multi-scale feature channel grouping optimization algorithm based on NSGA-II

Bin WANG, Tian XIANG, Yidong LYU, Xiaofan WANG

2023, 43(5): 1401-1408. DOI: 10.11772/j.issn.1001-9081.2022040581

To balance accuracy and complexity in Lightweight Convolutional Neural Networks (LCNN), an adaptive multi-scale feature channel grouping optimization algorithm based on the fast Non-dominated Sorting Genetic Algorithm (NSGA-II) was proposed to optimize the feature channel grouping structure of LCNN. Firstly, minimizing the complexity and maximizing the accuracy of the feature fusion layer structure in LCNN were taken as two optimization objectives, and the resulting dual-objective function was modeled and analyzed theoretically. Then, an NSGA-II-based LCNN structure optimization framework was designed. An NSGA-II-based adaptive grouping layer was added to the deep convolution layer of the original LCNN structure, yielding an Adaptive Multi-scale Feature Fusion Network based on NSGA2 (NSGA2-AMFFNetwork). Experiments were conducted on image classification datasets. Compared with the manually designed network structure M_blockNet_v1, NSGA2-AMFFNetwork improves the average accuracy by 1.220 2 percentage points and reduces the running time by 41.07%. This indicates that the proposed algorithm can balance the complexity and accuracy of LCNN. It also offers ordinary users who lack domain knowledge more choices of network structures with balanced performance.
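
The core ranking step of NSGA-II is fast non-dominated sorting. Below is a minimal sketch of the standard procedure for minimization. To match the two objectives above, a candidate structure is represented here as `(complexity, -accuracy)`; negating accuracy turns maximization into minimization. The candidate values in the usage test are invented for illustration.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Return Pareto fronts as lists of indices; front 0 is the non-dominated set."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # S_p: points that p dominates
    counts = [0] * n                                        # n_p: how many points dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:   # all dominators of q are in earlier fronts
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front
```

NSGA-II then fills the next population front by front, using crowding distance to break ties within the last front that fits.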

Global image captioning method based on graph attention network

Jiahong SUI, Yingchi MAO, Huimin YU, Zicheng WANG, Ping PING

2023, 43(5): 1409-1415. DOI: 10.11772/j.issn.1001-9081.2022040513

Existing image captioning methods focus only on grid spatial location features, without sufficient interaction among grid features or full use of global image features. To generate higher-quality image captions, a global image captioning method based on the Graph ATtention network (GAT) was proposed. Firstly, a multi-layer Convolutional Neural Network (CNN) was used for visual encoding. It extracts the grid features and whole-image features of the given image, from which a grid feature interaction graph is built. Then, GAT was used to transform the feature extraction problem into a node classification problem over one global node and many local nodes, so that global and local features can be fully used after the update optimization. Finally, a Transformer-based decoding module generated image captions from the improved visual features. Experimental results on the Microsoft COCO dataset show that the proposed method effectively captures the global and local features of the image, achieving 133.1% on the CIDEr (Consensus-based Image Description Evaluation) metric. The proposed method therefore improves the accuracy of image captioning, supporting tasks such as the classification, retrieval and analysis of images through text.
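
One GAT update over a grid-plus-global-node graph can be sketched as follows. It is a single attention head with no output non-linearity. The graph layout, a 4-neighbor grid in which every cell is also linked to one global node, and the weights `W` and `a` are our illustrative assumptions. In the described method the global node would be initialized from the whole-image feature.

```python
import math
from typing import List, Sequence, Tuple

def matvec(W: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x

def gat_layer(h: List[List[float]], neighbours: List[List[int]], W: List[List[float]],
              a: List[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Single-head graph attention: e_ij = LeakyReLU(a . [W h_i || W h_j]),
    alpha_ij = softmax over j in neighbours[i], h'_i = sum_j alpha_ij W h_j."""
    z = [matvec(W, hi) for hi in h]
    out, attn = [], []
    for i, nbrs in enumerate(neighbours):
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]
        attn.append(alpha)
        out.append([sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(z[i]))])
    return out, attn

def grid_with_global_node(rows: int, cols: int) -> List[List[int]]:
    """Hypothetical graph: each grid cell attends to itself, its 4 neighbors and the
    global node (index rows * cols); the global node attends to every node."""
    g = rows * cols
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            adj = [r * cols + c, g]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adj.append(rr * cols + cc)
            nbrs.append(adj)
    nbrs.append(list(range(g + 1)))
    return nbrs
```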

Text image editing method based on font and character attribute guidance

Jingchao CHEN, Shugong XU, Youdong DING

2023, 43(5): 1416-1421. DOI: 10.11772/j.issn.1001-9081.2022040520

In text image editing tasks, the text style often changes between the original and edited images, and the newly generated text can be hard to read. To address these problems, a text image editing method guided by font and character attributes was proposed. Firstly, a font attribute classifier, combined with font classification, perceptual and texture losses, guided the generation of the text foreground style. This improved the consistency of text style before and after editing. Secondly, a character attribute classifier, combined with a character classification loss, guided the accurate generation of text glyphs. This reduced text artifacts and generation errors and improved the readability of the newly generated text. Finally, an end-to-end fine-tuning strategy was applied to the entire staged editing model to refine the generated results. The method was compared with SRNet (Style Retention Network) and SwapText. It achieves a PSNR (Peak Signal-to-Noise Ratio) of 25.48 dB and an SSIM (Structural SIMilarity) of 0.842. These are 2.57 dB and 0.055 higher than those of SRNet, and 2.11 dB and 0.046 higher than those of SwapText, respectively. Its Mean Square Error (MSE) is 0.004 3, which is 0.003 1 and 0.024 lower than those of SRNet and SwapText, respectively. Experimental results show that the proposed method effectively improves the generation quality of text image editing.
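
PSNR and MSE, two of the reported metrics, are related by PSNR = 10·log10(MAX²/MSE). The sketch below assumes images flattened to lists of pixel values scaled to [0, 1], so `max_value=1.0`. For 8-bit images `max_value` would be 255.

```python
import math
from typing import Sequence

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)
```

Because PSNR is usually averaged per image, a dataset's mean PSNR need not equal the PSNR computed from its mean MSE.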

Multi-learning behavior collaborated knowledge tracing model

Kai ZHANG, Zhengchu QIN, Yue LIU, Xinyi QIN

2023, 43(5): 1422-1429. DOI: 10.11772/j.issn.1001-9081.2022091313

Knowledge tracing models mainly use three types of learning behavior data: learning process, learning end and learning interval. However, existing studies do not fuse these types of learning behavior and cannot accurately describe the interactions among them. To address these issues, a Multi-Learning Behavior collaborated Knowledge Tracing (MLB-KT) model was proposed. First, a multi-head attention mechanism was used to describe the homo-type constraint within each type of learning behavior. Then, a channel attention mechanism was used to model the multi-type collaboration among the three types. MLB-KT was compared with the Deep Knowledge Tracing (DKT) and Temporal Convolutional Knowledge Tracing with Attention mechanism (ATCKT) models on three datasets. MLB-KT significantly increases the Area Under the Curve (AUC) and performs best on the ASSISTments2017 dataset, where its AUC is 12.26% and 2.77% higher than those of DKT and ATCKT respectively. Representation quality comparison experiments also confirm that MLB-KT performs better. In summary, modeling the homo-type constraint and multi-type collaboration better determines students' knowledge states and predicts their future answers.
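
The abstract does not detail the channel attention used for multi-type collaboration. The sketch below treats each behavior type as one channel and applies a squeeze-and-excitation style gate. Each channel is averaged, passed through a small ReLU/sigmoid bottleneck, and rescaled by the resulting gate. The SE-style design and the matrices `W1` and `W2` are assumptions for illustration.

```python
import math
from typing import List, Tuple

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def channel_attention(channels: List[List[float]], W1: List[List[float]],
                      W2: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """channels[c] is the feature vector of behavior type c.
    W1 (r x C) and W2 (C x r) are hypothetical learned bottleneck weights."""
    squeeze = [sum(ch) / len(ch) for ch in channels]                              # average per channel
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in W1]   # ReLU
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W2]      # each in (0, 1)
    return [[g * x for x in ch] for g, ch in zip(gates, channels)], gates
```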

Dialogue state tracking model based on slot correlation information extraction

Lifeng SHI, Zhengwei NI

2023, 43(5): 1430-1437. DOI: 10.11772/j.issn.1001-9081.2022040508

Dialogue State Tracking (DST) is an important module in task-oriented dialogue systems. However, existing open-vocabulary DST models do not make full use of slot correlation information or of the structural information of the dataset itself. To solve these problems, a new DST model based on slot correlation information extraction, named SCEL-DST (SCE and LOW for Dialogue State Tracking), was proposed. Firstly, a Slot Correlation Extractor (SCE) was constructed, using an attention mechanism to learn the correlation information between slots. Then, the Learning Optimal sample Weights (LOW) strategy was applied during training to enhance the model's use of dataset information without substantially increasing training time. Finally, the model details were optimized to build the complete SCEL-DST model. Experimental results show that SCE and LOW are critical to the performance of SCEL-DST, enabling it to achieve higher joint goal accuracy on both experimental datasets. Under the same conditions, SCEL-DST improves joint goal accuracy by 1.6 percentage points over TripPy (Triple coPy) on the MultiWOZ 2.3 (Wizard-of-OZ 2.3) dataset. It also improves joint goal accuracy by 2.0 percentage points over AG-DST (Amendable Generation for Dialogue State Tracking) on the WOZ 2.0 (Wizard-of-OZ 2.0) dataset.
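
Joint goal accuracy, the metric reported above, counts a turn as correct only if the entire predicted dialogue state matches the gold state. The sketch assumes each state is a dict of slot-value pairs with unfilled slots omitted. Benchmarks differ in how they encode empty or "none" values.

```python
from typing import Dict, List

def joint_goal_accuracy(predicted: List[Dict[str, str]], gold: List[Dict[str, str]]) -> float:
    """Fraction of turns whose predicted state equals the gold state exactly
    (every slot-value pair correct, no missing or extra slots)."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

The metric is strict: a single wrong slot makes the whole turn wrong. Small per-slot gains can therefore show up as gains of a few percentage points in joint goal accuracy.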

Joint entity and relation extraction based on contextual semantic enhancement

Jingsheng LEI, Kaijun LA, Shengying YANG, Yi WU

2023, 43(5): 1438-1444. DOI: 10.11772/j.issn.1001-9081.2022040625

Span-based joint extraction models share the semantic representation of entity spans between the entity recognition and Relation Extraction (RE) tasks, which effectively reduces the cascading errors of pipeline models. However, existing models cannot adequately integrate contextual information into the representations of entities and relations. To solve this problem, a Joint Entity and Relation extraction model based on Contextual semantic Enhancement (JERCE) was proposed. Firstly, semantic feature representations of sentence-level text and inter-entity text were obtained by contrastive learning. Then, these representations were added to the entity and relation representations to predict entities and relations jointly. Finally, the loss values of the two tasks were adjusted dynamically to optimize the overall performance of the joint model. JERCE was compared with the Trigger-sense Memory Flow framework (TriMF) on the public datasets CoNLL04, ADE and ACE05. It improves the F1 scores of entity recognition by 1.04, 0.13 and 2.12 percentage points respectively, and those of RE by 1.19, 1.14 and 0.44 percentage points respectively. Experimental results show that JERCE can fully capture semantic information in context.

Threat intelligence entity relation extraction method integrating bootstrapping and semantic role labeling

Shunhang CHENG, Zhihua LI, Tao WEI

2023, 43(5): 1445-1453. DOI: 10.11772/j.issn.1001-9081.2022040551

To mine threat intelligence entities and their relations efficiently and automatically from open-source heterogeneous big data, a Threat Intelligence Entity Relation Extraction (TIERE) method was proposed. Firstly, a data preprocessing method was developed by analyzing the characteristics of open-source cyber security reports. Then, two algorithms were developed to cope with the high text complexity and the small size of standard datasets in the cyber security field. They are an Improved BootStrapping-based Named Entity Recognition (NER-IBS) algorithm and a Semantic Role Labeling-based Relation Extraction (RE-SRL) algorithm. Initial seeds were constructed from a small number of samples and rules, entities in the unstructured text were mined through iterative training, and the relations between entities were mined by constructing semantic roles. Experiments were conducted on a few-shot cyber security information extraction dataset. The F1 value of NER-IBS is 84%, which is 2 percentage points higher than that of the RDF-CRF (Regular expression and Dictionary combined with Feature templates as well as Conditional Random Field) algorithm. The F1 value of RE-SRL for uncategorized relation extraction is 94%. These results show that TIERE extracts entities and relations efficiently.
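
Seed-based bootstrapping, the idea behind NER-IBS, can be sketched in its simplest generic form. Each round learns the words that precede known entities as patterns, then accepts every token that follows a learned pattern as a new entity, until nothing new is found. The one-token left context and the stopping rule are hypothetical simplifications. NER-IBS's actual improvements are not described in the abstract.

```python
from typing import List, Set

def bootstrap_entities(sentences: List[List[str]], seeds: Set[str], rounds: int = 3) -> Set[str]:
    """Generic bootstrapping with one-token left contexts as extraction patterns."""
    entities = set(seeds)
    for _ in range(rounds):
        # Learn patterns: tokens that immediately precede a known entity
        patterns = {toks[i - 1] for toks in sentences
                    for i in range(1, len(toks)) if toks[i] in entities}
        # Apply patterns: tokens that immediately follow a learned pattern
        found = {toks[i] for toks in sentences
                 for i in range(1, len(toks)) if toks[i - 1] in patterns}
        if found <= entities:
            break  # converged: no new entities this round
        entities |= found
    return entities
```

Real systems score and filter candidate patterns to limit semantic drift, where a loose pattern pulls in unrelated words. This sketch does no such filtering.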

Aspect-oriented fine-grained opinion tuple extraction with adaptive span features

Linying CHEN, Jianhua LIU, Shuihua SUN, Zhixiong ZHENG, Honghui LIN, Jie LIN

2023, 43(5): 1454-1460. DOI: 10.11772/j.issn.1001-9081.2022040502

Aspect-oriented Fine-grained Opinion Extraction (AFOE) extracts aspect terms and opinion terms from reviews as opinion pairs. It may additionally extract the sentiment polarities of the aspect terms to form opinion triplets. Existing methods neglect the correlation between opinion pairs and their contexts. To address this, an aspect-oriented Adaptive Span Feature-Grid Tagging Scheme (ASF-GTS) model was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) model was used to obtain the feature representation of the sentence. Then, the Adaptive Span Feature (ASF) method was used to strengthen the correlation between each opinion pair and its local context. Next, the Grid Tagging Scheme (GTS) transformed Opinion Pair Extraction (OPE) into a unified grid tagging task. Finally, a specific decoding strategy generated the corresponding opinion pairs or opinion triplets. Experiments were carried out on four AFOE benchmark datasets adapted to the opinion tuple extraction task. Compared with the GTS-BERT (Grid Tagging Scheme-BERT) model, the proposed model improves the F1-score by 2.42% to 7.30% on the opinion pair task and by 2.62% to 6.61% on the opinion triplet task. The proposed model effectively preserves the sentiment correlation between opinion pairs and their contexts, and extracts opinion pairs and their sentiment polarities more accurately.

Text classification of agricultural news based on ERNIE+DPCNN+BiGRU

Senqi YANG, Xuliang DUAN, Zhan XIAO, Songsong LANG, Zhiyong LI

2023, 43(5): 1461-1466. DOI: 10.11772/j.issn.1001-9081.2022040641

To address the poor targeting, unclear classification and lack of datasets in agricultural news classification, an agricultural news classification model based on Enhanced Representation through kNowledge IntEgration (ERNIE), Deep Pyramid Convolutional Neural Network (DPCNN) and Bidirectional Gated Recurrent Unit (BiGRU), called EGC, was proposed. The dataset was first encoded by ERNIE; then the features of the news text were extracted in parallel by an improved DPCNN and by BiGRU, the extracted features were combined, and the final results were obtained by Softmax. To make EGC better suited to agricultural news classification, the DPCNN was improved by reducing its number of convolution layers so as to preserve more features. Experimental results show that compared with ERNIE, the precision, recall and F1 score of EGC are improved by 1.47, 1.29 and 1.42 percentage points respectively, verifying that EGC outperforms traditional classification models.

Attribute reduction for high-dimensional data based on bi-view of similarity and difference

Yuanjiang LI, Jinsheng QUAN, Yangyi TAN, Tian YANG

2023, 43(5): 1467-1472. DOI: 10.11772/j.issn.1001-9081.2022081154

Concerning the curse of dimensionality caused by very high data dimensionality and redundant information, a high-dimensional Attribute Reduction algorithm based on Similarity and Difference Matrix (ARSDM) was proposed. Building on the discernibility matrix, the algorithm adds a similarity measure for samples of the same class, so that all samples are evaluated comprehensively. Firstly, the distances between samples under each attribute were calculated, and from these distances the similarity within the same class and the difference between different classes were obtained. Secondly, a similarity and difference matrix was established to evaluate the entire dataset. Finally, attribute reduction was performed: each column of the similarity and difference matrix was summed, the attribute with the largest sum was added to the reduct in turn, and the row vector of the corresponding sample pair was set to the zero vector. Experimental results show that compared with the classical attribute reduction algorithms DMG (Discernibility Matrix based on Graph theory), FFRS (Fitting Fuzzy Rough Sets) and GBNRS (Granular Ball Neighborhood Rough Sets), ARSDM increases the average classification accuracy by 1.07, 6.48 and 8.92 percentage points respectively under the Classification And Regression Tree (CART) classifier, and by 1.96, 11.96 and 12.39 percentage points under the Support Vector Machine (SVM) classifier. At the same time, ARSDM runs more efficiently than GBNRS and FFRS. ARSDM can therefore effectively remove redundant information and improve classification accuracy.
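
The matrix construction and greedy column-sum selection described above can be sketched as follows. Several details are assumptions, not taken from the paper: attributes are min-max scaled to [0, 1], the same-class entry is 1 minus the per-attribute distance, and "the corresponding sample pairs" whose rows are zeroed are taken to be those whose entry under the selected attribute is at least a hypothetical `threshold`.

```python
import numpy as np

def similarity_difference_matrix(X, y):
    """Rows: sample pairs (i < j); columns: attributes.
    Assumes attribute values are scaled to [0, 1]."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            d = np.abs(X[i] - X[j])          # per-attribute distance
            # same class: reward similarity; different classes: reward difference
            rows.append(1.0 - d if y[i] == y[j] else d)
    return np.vstack(rows)

def greedy_reduct(M, threshold=0.5, max_attrs=None):
    """Repeatedly pick the attribute with the largest column sum, then zero the
    rows of sample pairs it already handles (threshold rule is an assumption)."""
    M = np.array(M, dtype=float)             # work on a copy
    limit = M.shape[1] if max_attrs is None else max_attrs
    reduct = []
    while len(reduct) < limit and M.any():
        a = int(np.argmax(M.sum(axis=0)))
        reduct.append(a)
        M[M[:, a] >= threshold] = 0.0        # pairs covered by attribute a
        M[:, a] = 0.0                        # never select a twice
    return reduct
```

On a four-sample toy set where attribute 0 separates the two classes and attribute 1 is noise, every pair row scores at least 0.5 under attribute 0, so the reduct is just [0].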

Bat algorithm for high utility itemset mining based on length constraint

Quan YUAN, Chengliang TANG, Yunpeng XU

2023, 43(5): 1473-1480. DOI: 10.11772/j.issn.1001-9081.2022040622

To mine High Utility Itemsets (HUIs) that meet special user needs, such as a specified number of items, a Bat Algorithm for High Utility Itemset Mining based on Length Constraint (HUIM-LC-BA) was proposed. By combining the Bat Algorithm (BA) with length constraints, a new High Utility Itemset Mining (HUIM) model was constructed, in which the database was transformed into a bitmap matrix to enable efficient utility calculation and database scanning. Then, the search space was reduced by the Redefined Transaction Weighted Utility (RTWU) strategy. Finally, itemsets were pruned by length according to the items determined by roulette wheel selection and depth-first search. Experiments on four datasets show that when the maximum length is 6, the number of patterns mined by HUIM-LC-BA is reduced by 91%, 98%, 99% and 97% respectively compared with HUIM-BA (High Utility Itemset Mining-Bat Algorithm), with less running time; and under different length constraints, the running time of HUIM-LC-BA is more stable than that of the FHM+ (Faster High-utility itemset Mining plus) algorithm. The results indicate that HUIM-LC-BA can effectively mine HUIs under length constraints and reduce the number of mined patterns.
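
The bitmap representation and the length constraint can be illustrated with the sketch below. It assumes each transaction is a dict mapping an item to its utility in that transaction, and, to keep the example small, it replaces the bat-algorithm search (and the RTWU pruning) with exhaustive enumeration of itemsets up to `max_len`; the function names are hypothetical.

```python
from itertools import combinations

def build_bitmaps(db):
    """item -> int bitmap; bit t is set if the item occurs in transaction t."""
    bitmaps = {}
    for t, trans in enumerate(db):
        for item in trans:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps

def itemset_utility(itemset, db, bitmaps):
    mask = (1 << len(db)) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)         # transactions containing every item
    total, t = 0, 0
    while mask:
        if mask & 1:
            total += sum(db[t][item] for item in itemset)
        mask >>= 1
        t += 1
    return total

def mine_huis(db, min_util, max_len):
    """Exhaustive stand-in for the bat-algorithm search (illustration only):
    only itemsets with at most max_len items are ever generated."""
    bitmaps = build_bitmaps(db)
    items = sorted(bitmaps)
    result = {}
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, db, bitmaps)
            if u >= min_util:
                result[itemset] = u
    return result
```

For db = [{"a": 5, "b": 2}, {"a": 3, "c": 4}, {"a": 1, "b": 6, "c": 2}], min_util = 10 and max_len = 2 yield {("a", "b"): 14, ("a", "c"): 10}; the bitmap AND replaces a full database scan when locating the supporting transactions.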

Semi-supervised three-way clustering ensemble based on Seeds set and pairwise constraints

Chunmao JIANG, Peng WU, Zhicong LI

2023, 43(5): 1481-1488. DOI: 10.11772/j.issn.1001-9081.2022071094

By fusing multiple diverse base clusterings with appropriate strategies, clustering ensemble can effectively improve the stability, robustness and precision of clustering results. Current research on clustering ensemble rarely uses known prior information, and it is difficult to describe the membership of objects in clusters when facing complex data. Therefore, a semi-supervised three-way clustering ensemble method based on Seeds set and pairwise constraints was proposed. Firstly, based on the existing label information, a new three-way label propagation algorithm was proposed to construct the base cluster members. Secondly, a semi-supervised three-way clustering ensemble framework was designed to integrate the base cluster members into a consensus similarity matrix, which was then optimized using pairwise constraint information. Finally, three-way spectral clustering was employed as the consensus function on the similarity matrix to obtain the final clustering results. Experimental results on several real UCI datasets show that compared with Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), Meta-CLustering Algorithm (MCLA), Label Propagation Algorithm (LPA) and Cop-Kmeans, the proposed method achieves the best results on most datasets in terms of Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and F-measure.
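
One common way to turn base clusterings into a consensus similarity matrix, and to inject pairwise constraints into it, is sketched below. It uses the standard co-association matrix (fraction of base clusterings that co-cluster two objects) and a simple override rule (must-link pairs set to 1, cannot-link pairs set to 0); both are assumptions for illustration, and the three-way (core/fringe) memberships of the proposed method are not modeled.

```python
def co_association(base_labelings):
    """S[i][j] = fraction of base clusterings placing objects i and j together."""
    m, n = len(base_labelings), len(base_labelings[0])
    return [[sum(lab[i] == lab[j] for lab in base_labelings) / m
             for j in range(n)] for i in range(n)]

def apply_pairwise_constraints(S, must_link=(), cannot_link=()):
    """Override rule (assumption): must-link -> 1.0, cannot-link -> 0.0."""
    S = [row[:] for row in S]                # leave the input matrix untouched
    for i, j in must_link:
        S[i][j] = S[j][i] = 1.0
    for i, j in cannot_link:
        S[i][j] = S[j][i] = 0.0
    return S
```

The constrained matrix would then be passed to a spectral clustering step, which the abstract describes as a three-way variant.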

Community mining algorithm based on multi-relationship of nodes and its application

Lin ZHOU, Yuzhi XIAO, Peng LIU, Youpeng QIN

2023, 43(5): 1489-1496. DOI: 10.11772/j.issn.1001-9081.2022081218

To measure the similarity of multi-relational nodes and mine community structures containing such nodes, a community mining algorithm based on multi-relationship of nodes, called LSL-GN, was proposed. Firstly, based on node similarity and node reachability, a similarity index for multi-relational nodes, LHN-ISL, was defined and used to reconstruct a low-density model of the target network, and community division was then completed with the GN (Girvan-Newman) algorithm. LSL-GN was compared with several classical community mining algorithms on Modularity (Q value), Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI). The results show that LSL-GN achieves the best results on all three indexes, indicating better community division quality. LSL-GN divided the “User-Application” mobile roaming network model into community structures based on basic applications such as Ctrip, Amap and Didi Travel. These community division results can provide strategic reference for designing personalized package services.
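
If the "LHN" in LHN-ISL refers to the Leicht-Holme-Newman similarity index (an assumption; the abstract does not expand it), the node-similarity component could look like the sketch below, which computes the classic LHN-I score |N(x) ∩ N(y)| / (k_x · k_y). The reachability term and the multi-relational weighting of LHN-ISL are not shown, and the helper names are hypothetical.

```python
def adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def lhn_index(adj, x, y):
    """Leicht-Holme-Newman (LHN-I) similarity: common neighbours / (k_x * k_y)."""
    kx, ky = len(adj.get(x, ())), len(adj.get(y, ()))
    if kx == 0 or ky == 0:
        return 0.0
    return len(adj[x] & adj[y]) / (kx * ky)
```

Dividing by the degree product (rather than, say, the union size as Jaccard does) penalizes pairs of hub nodes that share neighbours merely because both have many.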

Improved K-anonymity privacy protection algorithm based on different sensitivities

Ran ZHAI, Xuebin CHEN, Guopeng ZHANG, Langtao PEI, Zheng MA

2023, 43(5): 1497-1503. DOI: 10.11772/j.issn.1001-9081.2022040552

To meet the need of machine learning for large numbers of real datasets that are both secure and usable, an improved K-anonymity privacy protection algorithm based on Random Forest (RF), namely RFK-anonymity privacy protection, was proposed. Firstly, the sensitivity of each attribute value was predicted by the RF algorithm. Secondly, the attribute values were clustered by sensitivity using the k-means algorithm, and the data were hidden to different degrees by the K-anonymity algorithm according to the sensitivity clusters of the attributes. Finally, users select data tables with different hiding degrees according to their needs. Experimental results on the Adult dataset show that compared with the data processed by the K-anonymity algorithm, the accuracies of the data processed by RFK-anonymity are increased by 0.5 and 1.6 percentage points at thresholds of 3 and 4 respectively; compared with the data processed by the (p, α, k)-anonymity algorithm, the accuracies are improved by 0.4 and 1.9 percentage points at thresholds of 4 and 5. RFK-anonymity therefore effectively improves data availability while protecting data privacy and security, and is better suited to classification and prediction in machine learning.
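
The "hide to different degrees by sensitivity" step can be pictured as choosing a generalization level per quasi-identifier and then checking K-anonymity. In this sketch the RF sensitivity prediction and k-means clustering are not reproduced: the attribute-to-level mapping is supplied by hand, and the numeric hierarchy (level L bins values into intervals of width 10**L, level 3 suppresses) is a hypothetical choice.

```python
from collections import Counter

def generalize(value, level):
    """Hypothetical numeric hierarchy: 0 = exact, 1-2 = bins of width 10**level,
    >= 3 = suppressed."""
    if level <= 0:
        return value
    if level >= 3:
        return "*"
    width = 10 ** level
    low = value // width * width
    return (low, low + width - 1)

def anonymize(rows, attr_level):
    """Generalize each attribute by the level of its (assumed) sensitivity cluster."""
    return [{a: generalize(v, attr_level[a]) if a in attr_level else v
             for a, v in row.items()} for row in rows]

def is_k_anonymous(rows, quasi_ids, k):
    """Every combination of quasi-identifier values must occur at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())
```

With ages 23, 27, 35, 38 and ZIP codes 12345, 12349, 12371, 12388, the levels {"age": 1, "zip": 2} produce two groups of two records, so the table becomes 2-anonymous but not 3-anonymous; a more sensitive attribute would simply be assigned a higher level.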

Compact constraint analysis of SPONGENT S-box based on mixed integer linear programming model

Yipeng SHI, Jie LIU, Jinyuan ZU, Tao ZHANG, Guoqun ZHANG

2023, 43(5): 1504-1510. DOI: 10.11772/j.issn.1001-9081.2022040496

Applying the compact constraint calculation method for S-boxes based on the Mixed Integer Linear Programming (MILP) model can address the low efficiency of differential path search for SPONGENT in differential cryptanalysis. To find the best description of the S-box, a compactness verification algorithm was proposed to verify the inequality constraints of the S-box from the perspective of whether each constraint is necessary. Firstly, the MILP model was introduced to analyze the inequality constraints of the SPONGENT S-box, and a constraint consisting of 23 inequalities was obtained. Then, an index for evaluating the necessity of each constraint inequality was proposed, and based on this index, an algorithm for verifying the compactness of a group of constraint inequalities was proposed. Finally, the compactness of the obtained SPONGENT S-box constraint was verified by the proposed algorithm. Calculation and analysis show that each of the 23 inequalities excludes a unique impossible difference pattern, that is, every inequality is necessary. Furthermore, for the same case, the number of inequalities is reduced by 20% compared with that screened by the greedy algorithm principle. Therefore, the obtained inequality constraint of the SPONGENT S-box is compact, and the proposed compactness verification algorithm outperforms the greedy algorithm.
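
The necessity criterion (every inequality must be the only one that excludes at least one impossible difference pattern) can be checked exhaustively for small S-boxes. This sketch assumes inequalities are given as (coefficients, constant) meaning sum(c_i * x_i) + constant >= 0 over the concatenated input/output difference bits; the S-box is a parameter, and the 23 SPONGENT inequalities themselves are not reproduced here.

```python
from itertools import product

def to_bits(v, n):
    return [(v >> (n - 1 - i)) & 1 for i in range(n)]   # MSB first

def ddt_patterns(sbox, n):
    """All (input diff || output diff) bit vectors with a non-zero DDT entry."""
    pats = set()
    for a in range(1 << n):
        for x in range(1 << n):
            b = sbox[x] ^ sbox[x ^ a]
            pats.add(tuple(to_bits(a, n) + to_bits(b, n)))
    return pats

def satisfies(ineq, point):
    coeffs, const = ineq                  # encodes sum(c_i * x_i) + const >= 0
    return sum(c * x for c, x in zip(coeffs, point)) + const >= 0

def is_compact(ineqs, possible, n_vars):
    impossible = [p for p in product((0, 1), repeat=n_vars) if p not in possible]
    # Validity: no inequality may cut off a possible differential pattern.
    if not all(satisfies(q, p) for q in ineqs for p in possible):
        return False
    # Necessity: each inequality must be the only one excluding some impossible pattern.
    for k, q in enumerate(ineqs):
        others = [r for m, r in enumerate(ineqs) if m != k]
        if not any(not satisfies(q, p) and all(satisfies(r, p) for r in others)
                   for p in impossible):
            return False
    return True
```

For a 4-bit S-box such as SPONGENT's, `ddt_patterns(sbox, 4)` gives the feasible points over 8 variables and `is_compact(ineqs, possible, 8)` enumerates all 256 candidates, which is cheap at this size.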

Improved defense method for graph convolutional network based on singular value decomposition

Kejun JIN, Hongtao YU, Yiteng WU, Shaomei LI, Jianpeng ZHANG, Honghao ZHENG

2023, 43(5): 1511-1517. DOI: 10.11772/j.issn.1001-9081.2022040553

Graph Neural Networks (GNNs) are vulnerable to adversarial attacks, which degrade their performance on downstream tasks such as node classification, link prediction and community detection; defense methods for GNNs therefore have important research value. Aiming at the poor robustness of GNNs under adversarial attack, and taking the Graph Convolutional Network (GCN) as the model, an improved Singular Value Decomposition (SVD)-based defense method against poisoning attacks, named ISVDatt, was proposed. In the poisoning attack scenario, the proposed method purifies the attacked graph. When the GCN was poisoned, edges connecting nodes with very different features were first screened and deleted to keep the graph features smooth. Then, SVD and low-rank approximation were performed to keep the attacked graph low-rank and clean it. Finally, the purified graph was used to train the GCN model, achieving effective defense against the poisoning attack. Experiments against Metattack and DICE were conducted on open-source datasets such as Citeseer, Cora and Pubmed, comparing ISVDatt with defense methods based on SVD, Pro_GNN and Robust Graph Convolutional Network (RGCN). The results show that ISVDatt has a relatively better defense effect; although its classification accuracy is lower than that of Pro_GNN, it has low complexity and negligible time overhead. The experimental results verify that ISVDatt resists poisoning attacks effectively while accounting for both the complexity and the versatility of the algorithm, and has high practical value.
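
The two purification steps can be sketched with NumPy. The abstract only says that edges joining nodes with "large different features" are deleted; here that is assumed to mean a Jaccard similarity of binary features at or below `sim_threshold` (a hypothetical parameter), followed by a rank-k truncated SVD of the pruned adjacency matrix. The result is a dense, real-valued matrix that a GCN would consume in place of the attacked adjacency.

```python
import numpy as np

def jaccard(f1, f2):
    """Jaccard similarity of two binary feature vectors."""
    inter = np.logical_and(f1, f2).sum()
    union = np.logical_or(f1, f2).sum()
    return inter / union if union else 0.0

def purify(adj, feats, k, sim_threshold=0.0):
    """1) Drop edges whose endpoints have dissimilar features (assumed Jaccard rule);
    2) keep a rank-k SVD approximation of the pruned adjacency matrix."""
    A = np.array(adj, dtype=float)           # copy, so the input is not modified
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] and jaccard(feats[i], feats[j]) <= sim_threshold:
                A[i, j] = A[j, i] = 0.0
    U, s, Vt = np.linalg.svd(A)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]    # scale the first k columns by singular values
```

Adversarial edges tend to connect dissimilar nodes and to add high-rank components, which is why the two filters complement each other; the choice of k trades noise removal against losing genuine structure.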

Hierarchical access control and sharing system of medical data based on blockchain

Meng CAO, Sunjie YU, Hui ZENG, Hongzhou SHI

2023, 43(5): 1518-1526. DOI: 10.11772/j.issn.1001-9081.2022050733

Focusing on the coarse access control granularity, low sharing flexibility and security risks such as data leakage of centralized medical data sharing platforms, a blockchain-based hierarchical access control and sharing system for medical data was proposed. Firstly, medical data were classified by sensitivity, and a Ciphertext-Policy Attribute-Based Hierarchical Encryption (CP-ABHE) algorithm was proposed to achieve access control over medical data of different sensitivities. In the algorithm, access control trees were merged and combined with symmetric encryption to improve the performance of the Ciphertext-Policy Attribute-Based Encryption (CP-ABE) algorithm, and multiple authority centers were used to solve the key escrow problem. Then, a medical data sharing mode based on a permissioned blockchain was used to solve the centralized trust problem of centralized sharing platforms. Security analysis shows that the proposed system ensures data security during sharing and can resist user collusion attacks and authority collusion attacks. Experimental results also show that CP-ABHE has lower computational cost than CP-ABE, the maximum average delay of the proposed system is 7.8 s, and the maximum throughput is 236 transactions per second, which meets the expected performance requirements.

Multi-neural network malicious code detection model based on depthwise separable convolution

Ruilin JIANG, Renchao QIN

2023, 43(5): 1527-1533. DOI: 10.11772/j.issn.1001-9081.2022050716

Concerning the high cost and unstable detection results of traditional malicious code detection methods, a multi-neural network malicious code detection model based on depthwise separable convolution was proposed. Using Depthwise Separable Convolution (DSC), the SENet (Squeeze-and-Excitation Network) channel attention mechanism and the Grey Level Co-occurrence Matrix (GLCM), three lightweight neural networks were connected in parallel with GLCM to detect malicious code families and their variants, and the detection results of these strong classifiers were fused by a Naive Bayes classifier to improve detection accuracy while reducing computational cost. Experimental results on a hybrid dataset of MalVis plus benign data show that the proposed model achieves an accuracy of 97.43% in detecting malicious code families and their variants, 6.19 and 2.29 percentage points higher than those of ResNet50 and VGGNet respectively, while its parameter count is only 68% of that of ResNet50 and 13% of that of VGGNet. On the Malimg dataset, the detection accuracy of the model reaches 99.31%. In conclusion, the proposed model achieves good detection performance with fewer parameters.
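
GLCM is a standard texture descriptor, and a plain version is easy to state in code. The sketch below assumes the common "malware as greyscale image" layout (raw bytes written row by row and quantized to `levels` grey levels) and a single pixel offset (dx, dy); the paper's image width, offsets and GLCM statistics are not given in the abstract, so these are illustrative choices, and the neural networks and Naive Bayes fusion are not shown.

```python
def bytes_to_image(data, width, levels=256):
    """Lay out raw bytes row by row and quantize each byte to `levels` grey levels.
    Trailing bytes that do not fill a whole row are dropped (assumption)."""
    rows = len(data) // width
    return [[data[r * width + c] * levels // 256 for c in range(width)] for r in range(rows)]

def glcm(image, levels, dx=1, dy=0, symmetric=True, normed=True):
    """Grey Level Co-occurrence Matrix for one pixel offset (dx, dy)."""
    P = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1             # count the pair in both directions
    if normed:
        total = sum(map(sum, P))
        if total:
            P = [[v / total for v in row] for row in P]
    return P

def contrast(P):
    """A classic GLCM statistic: sum of P[i][j] * (i - j)^2."""
    return sum(P[i][j] * (i - j) ** 2 for i in range(len(P)) for j in range(len(P)))
```

Statistics such as contrast summarize local byte-value transitions, which is one reason GLCM features can complement CNN features that work on the raw image.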

Online detection of SQL injection attacks based on ECA rules and dynamic taint analysis

Jihui LIU, Chengwan HE

2023, 43(5): 1534-1542. DOI: 10.11772/j.issn.1001-9081.2022040636

SQL injection is a common type of attack exploiting Web application vulnerabilities. Any form of SQL injection attack eventually changes the logical structure of the original SQL statement, contrary to the designer's intention. Existing SQL injection detection methods have the shortcomings that the detection code is not easily reusable and cannot be injected into Web applications online. Therefore, a model for online detection of SQL injection attacks based on Event Condition Action (ECA) rules and dynamic taint analysis was proposed. Firstly, taint marking rules were defined to monitor taint source functions, thereby marking data imported from outside the system. Then, taint propagation rules were defined to track the flow of tainted data inside the application in real time. Next, taint checking rules were defined to intercept the parameters of taint sink functions and parse the taint states they may carry. Finally, the ECA rule scripts were loaded into the original Web application at runtime for online detection of SQL injection attacks, without recompiling, repackaging or redeploying the Web application. The proposed model was implemented with Byteman. In test experiments on two different Web applications, the model identified most SQL injection attack samples with no false positives on normal request samples; its detection accuracy reaches 99.42%, better than those of the Support Vector Machine (SVM)-based and Term Frequency-Inverse Document Frequency (TF-IDF)-based methods. Compared with the method based on Aspect-Oriented Programming (AOP), the proposed model can easily load the detection module online after the Web application has started. Experimental results show that the proposed model can detect 6 common forms of SQL injection attacks without modifying the execution engine or the source code of the application, and has the advantage of online detection.
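
The marking, propagation and checking rules can be illustrated with a small language-neutral sketch. The actual model instruments Java code through Byteman ECA rules; here, as an assumption, a `Tainted` string subclass plays the taint source, `build_query` records tainted character positions during concatenation (propagation), and the sink check flags a query when tainted characters spread over more than one SQL token, a simple proxy for "the input changed the statement's logical structure". The tokenizer regex and all names are hypothetical.

```python
import re

# Tokens: quoted literals (with '' escapes), numbers, words, multi-char operators, other symbols.
TOKEN = re.compile(r"'(?:[^']|'')*'|\d+(?:\.\d+)?|\w+|<=|>=|<>|!=|\S")

class Tainted(str):
    """Marks a string as coming from an untrusted taint source."""

def build_query(*parts):
    """Concatenate parts while recording the positions of tainted characters
    (a stand-in for taint propagation through string operations)."""
    text, tainted, pos = "", set(), 0
    for p in parts:
        if isinstance(p, Tainted):
            tainted.update(range(pos, pos + len(p)))
        text += p
        pos += len(p)
    return text, tainted

def is_injection(query, tainted):
    """Sink check: tainted data may occupy at most one SQL token."""
    touched = sum(1 for m in TOKEN.finditer(query)
                  if any(i in tainted for i in range(m.start(), m.end())))
    return touched > 1
```

The input "alice" stays inside one string literal, whereas "' OR '1'='1" closes the literal and adds OR, comparison and new literals, so it touches several tokens and is flagged.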

Edge computing and service offloading algorithm based on improved deep reinforcement learning

Tengfei CAO, Yanliang LIU, Xiaoying WANG

2023, 43(5): 1543-1550. DOI: 10.11772/j.issn.1001-9081.2022050724

To address the limited computing resources and storage space of edge nodes in Edge Computing (EC) networks, an Edge Computing and Service Offloading (ECSO) algorithm based on improved Deep Reinforcement Learning (DRL) was proposed to reduce node processing latency and improve service performance. Specifically, edge node service offloading was formulated as a resource-constrained Markov Decision Process (MDP). Because the request state transition probability of edge nodes is difficult to predict accurately, a DRL algorithm was used to solve the problem. Since the state-action space of an edge node caching services is too large, new action behaviors were defined to replace the original actions, and the optimal action set was obtained by the proposed action selection algorithm; this improved the calculation of action behavior rewards, greatly reducing the size of the action space and improving the training efficiency and reward of the algorithm. Simulation results show that compared with the original Deep Q-Network (DQN) algorithm, the Proximal Policy Optimization (PPO) algorithm and the traditional Most Popular (MP) algorithm, ECSO increases the total reward by 7.0%, 12.7% and 65.6% respectively, and reduces the edge node service offloading latency by 13.0%, 18.8% and 66.4% respectively, verifying the effectiveness of ECSO and showing that it can effectively improve the offloading performance of edge computing services.

Multi-UAV collaborative task assignment method based on improved self-organizing map

Yanan SUN, Jiehong WU, Junling SHI, Lijun GAO

2023, 43(5): 1551-1556. DOI: 10.11772/j.issn.1001-9081.2022040592

To address the deficiencies in load balancing and execution efficiency of existing algorithms for cooperative multi-task assignment of multiple Unmanned Aerial Vehicles (UAVs), an Improved Self-Organizing Map (ISOM) algorithm was proposed. In the algorithm, a load balancing degree of the UAVs was designed from flight time and task execution time to improve task completion efficiency, and a novel nonlinearly changing learning rate and neighborhood function were designed to ensure the stability and fast convergence of ISOM. The validity of ISOM was then verified in different task environments. Experimental results show that compared with Particle Swarm Optimization combined with Genetic Algorithm (GA-PSO), Gurobi and ORTools, ISOM reduces the task completion time by 15.5%, 12.7% and 7.3% respectively. When the reduction of track length was verified on the KroA100, KroA150 and KroA200 instances of the TSPLIB dataset, comparison with the Invasive Weed Optimization (IWO) algorithm, Improved Partheno Genetic Algorithm (IPGA) and Ant Colony-Partheno Genetic Algorithm (AC-PGA) shows that ISOM achieves the minimum track length when the number of UAVs is 2, 3, 4, 5 and 8. ISOM is thus highly effective for the multi-UAV cooperative multi-task assignment problem.
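
For readers unfamiliar with SOM-based routing, the sketch below shows a basic elastic-ring SOM that orders task points for a single tour, with a nonlinearly (exponentially) decaying learning rate and a shrinking Gaussian neighborhood. The decay constants, ring size and the single-UAV setting are assumptions for illustration; ISOM's published schedules and its multi-UAV load-balancing term are not reproduced.

```python
import math
import random

def som_tour(cities, iters=2000, seed=0):
    """Elastic-ring SOM for one tour over points in the unit square; returns a visiting order.
    Decay schedules are illustrative assumptions, not ISOM's."""
    rng = random.Random(seed)
    n = len(cities)
    m = 3 * n                                          # neurons on the ring
    ring = [[rng.random(), rng.random()] for _ in range(m)]
    lr0, radius0 = 0.8, m / 4

    def winner(p):
        return min(range(m), key=lambda k: (ring[k][0] - p[0]) ** 2 + (ring[k][1] - p[1]) ** 2)

    for t in range(iters):
        frac = t / iters
        lr = lr0 * math.exp(-3.0 * frac)               # nonlinear learning-rate decay
        radius = max(radius0 * math.exp(-4.0 * frac), 0.5)
        city = cities[rng.randrange(n)]
        w = winner(city)
        for k in range(m):
            d = min(abs(k - w), m - abs(k - w))        # distance along the ring
            h = math.exp(-d * d / (2 * radius * radius))   # Gaussian neighbourhood
            ring[k][0] += lr * h * (city[0] - ring[k][0])
            ring[k][1] += lr * h * (city[1] - ring[k][1])
    # Visit cities in the order of their winning neurons on the ring.
    return sorted(range(n), key=lambda i: winner(cities[i]))
```

A large, slowly shrinking neighborhood early on pulls the whole ring toward the task cloud, while the decayed learning rate late in training fixes the order; a multi-UAV version would run several rings and balance their workloads.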

Object detection algorithm based on attention mechanism and context information

Hui LIU, Linyu ZHANG, Fugang WANG, Rujin HE

2023, 43(5): 1557-1564. DOI: 10.11772/j.issn.1001-9081.2022040554

Aiming at the missed detection of small objects, an improved YOLOv5 (You Only Look Once) object detection algorithm based on an attention mechanism and multi-scale context information was proposed. Firstly, a Multiscale Dilated Separable Convolutional Module (MDSCM) was added to the feature extraction structure to extract multi-scale feature information, enlarging the receptive field while avoiding the loss of small-object information. Secondly, an attention mechanism was added to the backbone network, embedding location-aware information into channel information to further enhance the feature expression ability of the algorithm. Finally, Soft-NMS (Soft Non-Maximum Suppression) was used instead of the NMS (Non-Maximum Suppression) in YOLOv5 to reduce the missed detection rate. Experimental results show that the improved algorithm achieves detection precisions of 82.80%, 71.74% and 77.11% on the PASCAL VOC dataset, the DOTA aerial image dataset and the DIOR optical remote sensing dataset respectively, 3.70, 1.49 and 2.48 percentage points higher than those of YOLOv5, and detects small objects better. The improved YOLOv5 can therefore be better applied to small object detection scenarios in practice.
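
The difference between NMS and Soft-NMS is easy to see in code. Below is a minimal sketch of the Gaussian variant of Soft-NMS: rather than deleting every box that overlaps the current best box, it decays their scores by exp(-IoU² / sigma), so a nearby small object can survive with a reduced score. The `sigma` and `score_thresh` defaults are illustrative assumptions, not values from the paper.

```python
import math

def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns (index, final score) in selection order."""
    scores = list(scores)                     # decayed in place, keep caller's list intact
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            continue                          # too weak after decay: discard
        keep.append((best, scores[best]))
        for i in remaining:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return keep
```

With boxes (0, 0, 10, 10), (1, 1, 11, 11) and (20, 20, 30, 30) scored 0.9, 0.8 and 0.7, the second box (IoU ≈ 0.68 with the first) drops to roughly 0.32 instead of being removed, and the disjoint third box keeps 0.7.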

Social-interaction GAN for pedestrian trajectory prediction based on state-refinement long short-term memory and attention mechanism

Jiagao WU, Shiwen ZHANG, Yudong JIANG, Linfeng LIU

2023, 43(5): 1565-1570. DOI: 10.11772/j.issn.1001-9081.2022040602

To address the problem that most current research only considers the factors affecting pedestrian interaction, a Social-Interaction Generative Adversarial Network (SIGAN) for pedestrian trajectory prediction based on State-Refinement Long Short-Term Memory (SR-LSTM) and an attention mechanism, namely SRA-SIGAN, was proposed, in which the GAN learns the movement patterns of target pedestrians. Firstly, SR-LSTM was used as a location encoder to extract motion intention information. Secondly, a velocity attention mechanism was set up to assign reasonable weights to the influence of pedestrians in the same scene, thereby handling pedestrian interaction better. Finally, the predicted future trajectory was generated by the decoder. Experimental results on several public datasets show that SRA-SIGAN performs well overall. Specifically, on the Zara1 dataset, compared with the SR-LSTM model, the Average Displacement Error (ADE) and Final Displacement Error (FDE) of SRA-SIGAN are reduced by 20.0% and 10.5% respectively; compared with the SIGAN model, they are reduced by 31.7% and 24.4% respectively.
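
ADE and FDE, the two metrics reported above, are straightforward to compute. The sketch below gives the per-trajectory definitions; the `best_of_n` helper reflects the best-of-N sampling protocol often used to evaluate GAN-based trajectory predictors, which is an assumption here since the abstract does not state the evaluation protocol.

```python
import math

def ade_fde(pred, gt):
    """pred, gt: equal-length lists of (x, y) positions over the prediction horizon.
    ADE = mean Euclidean error over all steps; FDE = error at the final step."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

def best_of_n(samples, gt):
    """Best-of-N protocol (assumed): score the closest of N sampled futures."""
    scores = [ade_fde(s, gt) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)
```

A prediction that drifts sideways by 0, 1 and 2 units over three steps has ADE 1.0 and FDE 2.0, which is why FDE is usually the larger and more volatile of the two.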

Unsupervised face forgery video detection based on reconstruction error

Zhe XU, Zhihong WANG, Cunyu SHAN, Yaru SUN, Ying YANG

2023, 43(5): 1571-1577. DOI: 10.11772/j.issn.1001-9081.2022040568

Current supervised face forgery video detection methods need large amounts of labeled data. To cope with the rapid iteration and wide variety of video forgery methods in practice, the unsupervised idea of temporal anomaly detection was introduced into face forgery video detection, transforming the task into unsupervised video anomaly detection, and an unsupervised face forgery video detection method based on reconstruction error was proposed. Firstly, the facial landmark sequence of consecutive frames in the video under test was extracted. Secondly, the facial landmark sequence was reconstructed based on multi-granularity information such as deviation features, local features and temporal features. Thirdly, the reconstruction error between the original and reconstructed sequences was calculated. Finally, a score was calculated from the peak frequency of the reconstruction error to detect forged videos automatically. Experimental results show that compared with detection methods such as LRNet (Landmark Recurrent Network) and Xception-c23, the proposed method increases the AUC (Area Under Curve) of detection performance by up to 27.6%, and the AUC of transferability by 30.4%.

Multi-task age estimation method based on multi-peak label distribution learning

Jianhui HE, Chunlong HU, Xin SHU

2023, 43(5): 1578-1583. DOI: 10.11772/j.issn.1001-9081.2022040606

Considering the difficulty of extracting label ordinal information and inter-class correlation in facial age estimation, a Multi-Peak Distribution (MPD) age encoding was proposed, and a multi-task age estimation method, MPDNet (MPD Network), was built on it. Firstly, to extract correlation information among age labels and construct aging trend stages, age labels were transformed into age distributions by MPD. Then, a lightweight network was used for multi-stage feature extraction, and Label Distribution Learning (LDL) and regression learning were performed on the extracted features respectively. Finally, the outputs of the two learning tasks were shared and mutually optimized through back-propagation during training, avoiding the error propagation caused by directly regressing the distribution results in traditional label distribution learning. Experimental results on the MORPH II dataset show that the Mean Absolute Error (MAE) of MPDNet reaches 2.67, close to that of methods such as DEX (Deep EXpectation) and RankingCNN (Ranking Convolutional Neural Network) built on VGGNets (Visual Geometry Group Networks), while MPDNet has only 1/788.6 of the parameters of VGGNets. Meanwhile, MPDNet outperforms lightweight methods such as C3AE and SSR-Net (Soft Stagewise Regression Network). MPDNet makes better use of the rich correlation information among age labels to extract more discriminative age features and improve the accuracy of age estimation.
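
Label distribution encoding replaces a one-hot age with a soft distribution over neighboring ages. The sketch below encodes a label as a normalized mixture of Gaussian peaks and decodes a distribution by its expectation (the DEX-style expected-value readout). The abstract does not say where MPD places its extra peaks or how wide they are, so the `peaks`, `weights` and `sigma` arguments are hypothetical parameters supplied by the caller.

```python
import math

AGES = range(0, 101)

def label_distribution(peaks, ages=AGES, sigma=2.0, weights=None):
    """Encode an age label as a normalized mixture of Gaussian peaks over `ages`.
    Peak positions, weights and sigma are caller-chosen assumptions."""
    weights = weights or [1.0] * len(peaks)
    raw = [sum(w * math.exp(-(a - mu) ** 2 / (2 * sigma ** 2)) for mu, w in zip(peaks, weights))
           for a in ages]
    total = sum(raw)
    return [r / total for r in raw]

def expected_age(dist, ages=AGES):
    """Decode a distribution by its expectation."""
    return sum(p * a for p, a in zip(dist, ages))

def mae(pred_ages, true_ages):
    """Mean Absolute Error, the metric reported above."""
    return sum(abs(p - t) for p, t in zip(pred_ages, true_ages)) / len(true_ages)
```

A single peak at 40 decodes back to 40, and two equal peaks at 30 and 40 decode to 35; in a multi-task setup the distribution branch and a direct regression branch would be trained jointly rather than regressing from the decoded distribution alone.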

Transformer based U-shaped medical image segmentation network: a survey

Liyao FU, Mengxiao YIN, Feng YANG

2023, 43(5): 1584-1595. DOI: 10.11772/j.issn.1001-9081.2022040530

U-shaped Network (U-Net), based on the Fully Convolutional Network (FCN), is widely used as the backbone of medical image segmentation models, but Convolutional Neural Networks (CNNs) are poor at capturing long-range dependencies, which limits further performance improvement of segmentation models. To solve this problem, researchers have introduced Transformer into medical image segmentation models to make up for this deficiency of CNNs, and U-shaped segmentation networks combined with Transformer have become a hot research topic. After a detailed introduction to U-Net and Transformer, the related medical image segmentation models were categorized by the position of the Transformer module: only in the encoder or the decoder, in both the encoder and the decoder, as a skip connection, and others. The basic contents, design concepts and possible improvements of these models were discussed, and the advantages and disadvantages of placing Transformer at different positions were analyzed. The analysis shows that the biggest factor in deciding the position of Transformer is the characteristics of the target segmentation task, and that segmentation models combining Transformer with U-Net can better exploit the advantages of both CNN and Transformer to improve segmentation performance, a direction with great development prospects and research value.

Guidewire artifact removal method of structure-enhanced IVOCT based on Transformer

Jinwen GUO, Xinghua MA, Gongning LUO, Wei WANG, Yang CAO, Kuanquan WANG

2023, 43(5): 1596-1605. DOI: 10.11772/j.issn.1001-9081.2022040536

Improving the image quality of IntraVascular Optical Coherence Tomography (IVOCT) by removing guidewire artifacts can help physicians diagnose cardiovascular diseases more accurately, reducing the probability of misdiagnosis and missed diagnosis. Aiming at the difficulties posed by complex structural information and the large proportion of artifact areas in IVOCT images, a Structure-Enhanced Transformer Network (SETN) with a Generative Adversarial Network (GAN) architecture was proposed for guidewire artifact removal in IVOCT images. Firstly, the GAN generator combined, in parallel, an ORiginal Image (ORI) backbone generation network that extracts texture features and an RTV (Relative Total Variation) image-enhanced generation network that obtains image structure information. Next, during reconstruction of the artifact area in the ORI/RTV images, Transformer encoders focusing on temporal/spatial domain information respectively were introduced to capture the contextual information and the correlation between texture/structure features of the IVOCT image sequence. Finally, a structural feature fusion module integrated structural features of different levels into the decoding stage of the ORI backbone generation network, so that the generator cooperated with the discriminator to complete the reconstruction of the guidewire artifact area. Experimental results show that SETN removes guidewire artifacts with excellent texture and structure reconstruction. Besides, the improvement in IVOCT image quality after guidewire artifact removal benefits both vulnerable plaque segmentation and lumen contour extraction in IVOCT images.
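
The abstract does not spell out how the RTV image is produced. For orientation only, the classic Relative Total Variation structure-extraction objective is written below, where I is the input image, S the structure image, and D_y, L_y are defined like D_x, L_x with ∂_y in place of ∂_x; whether SETN uses exactly this form, and with which λ, σ, ε and window R(p), is not stated, so treat these symbols as assumptions.

```latex
\min_{S}\ \sum_{p}\Bigl[(S_p - I_p)^2
  + \lambda\Bigl(\frac{\mathcal{D}_x(p)}{\mathcal{L}_x(p)+\varepsilon}
  + \frac{\mathcal{D}_y(p)}{\mathcal{L}_y(p)+\varepsilon}\Bigr)\Bigr],
\qquad
\mathcal{D}_x(p) = \sum_{q\in R(p)} g_{p,q}\,\bigl|(\partial_x S)_q\bigr|,
\qquad
\mathcal{L}_x(p) = \Bigl|\sum_{q\in R(p)} g_{p,q}\,(\partial_x S)_q\Bigr|,
\qquad
g_{p,q} = \exp\Bigl(-\frac{(x_p-x_q)^2+(y_p-y_q)^2}{2\sigma^2}\Bigr)
```

Inside textured regions the gradients in R(p) point in many directions, so they largely cancel in L while D stays large, and the ratio D/L is heavily penalized; along a coherent structural edge the gradients align, L approaches D, and the ratio stays near 1. Minimizing the objective therefore keeps structure and suppresses texture, which is the kind of structure map an RTV branch can supply.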

Gibbs artifact removal algorithm for magnetic resonance imaging based on self-attention connection UNet

Yang LIU, Zhiyang LU, Jun WANG, Jun SHI

2023, 43(5): 1606-1611. DOI: 10.11772/j.issn.1001-9081.2022040618

To remove Gibbs artifacts in Magnetic Resonance Imaging (MRI), a Self-attention connection UNet based on Self-Distillation training (SD-SacUNet) algorithm was proposed. To reduce the semantic gap between the encoding and decoding features at the two ends of each skip connection in the UNet framework and to help capture the location of artifacts, the output features of each down-sampling layer on the UNet encoding side were fed into the corresponding self-attention connection module for self-attention computation, and the results were then fused with the decoding features to take part in feature reconstruction. Self-distillation training was performed on the decoding side of the network: by building a loss function between deep and shallow features, the feature information of the deep reconstruction network was used to guide the training of the shallow network, while the entire network was optimized to improve image reconstruction quality. The performance of SD-SacUNet was evaluated on the public MRI dataset CC359, where it achieved a Peak Signal-to-Noise Ratio (PSNR) of 30.261 dB and a Structural Similarity Index Measure (SSIM) of 0.917 9. Compared with GRACNN (Gibbs-Ringing Artifact reduction using Convolutional Neural Network), the proposed algorithm increased PSNR by 0.77 dB and SSIM by 0.018 3; compared with SwinIR (Image Restoration using Swin Transformer), it increased PSNR by 0.14 dB and SSIM by 0.003 3. Experimental results show that SD-SacUNet improves the reconstruction quality of MRI images with Gibbs artifacts removed and has potential application value.
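The self-attention connection can be pictured as spatial self-attention applied to an encoder feature map before it crosses the skip connection. The NumPy sketch below treats each pixel of a (C, H, W) encoder map as a token, lets every pixel attend to every other pixel, adds the result back residually and concatenates it with the decoder features at the same scale. The projection matrices `wq`, `wk`, `wv`, the residual add and the concatenation-based fusion are assumptions for illustration; the abstract does not specify the module's exact layout, and training (including the self-distillation loss) is not shown.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_skip(enc, dec, wq, wk, wv):
    """Spatial self-attention on an encoder map, then fusion with decoder features.

    enc, dec: (C, H, W) feature maps at the same scale.
    wq, wk: (d, C) query/key projections; wv: (C, C) value projection (hypothetical).
    Returns the fused (2C, H, W) map and the (H*W, H*W) attention matrix.
    """
    c, h, w = enc.shape
    tokens = enc.reshape(c, h * w)              # one C-dim token per pixel
    q, k, v = wq @ tokens, wk @ tokens, wv @ tokens
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)  # rows sum to 1
    attended = (v @ attn.T).reshape(c, h, w)    # each pixel aggregates all pixels
    refined = enc + attended                    # assumed residual connection
    fused = np.concatenate([refined, dec], axis=0)  # assumed concat fusion
    return fused, attn
```

Because every pixel can attend to every other pixel, artifact ringing far from an edge can still be related to that edge, which is one motivation for attention on the skip path.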

Super-resolution reconstruction of lung CT images based on feature pyramid network and dense network

Lihua SHEN, Bo LI

2023, 43(5): 1612-1619. DOI: 10.11772/j.issn.1001-9081.2022040620

To pay more attention to pulmonary nodules and to ensure that the reconstructed features correspond to structures that actually exist, a Super-Resolution (SR) reconstruction method for lung Computed Tomography (CT) images based on a Feature Pyramid Network (FPN) and a dense network was proposed. Firstly, an FPN was used at the feature extraction layer to extract features. Secondly, a local structure based on a residual network was designed at the feature mapping layer, and a special dense network was used to connect these local structures. Thirdly, at the feature reconstruction layer, a Convolutional Neural Network (CNN) was used to gradually reduce the outputs of convolution layers of different depths to the image size. Finally, a residual network was used to integrate the initial Low-Resolution (LR) features with the reconstructed High-Resolution (HR) features to form the final SR image. In comparison experiments, the network with two feature fusions in the FPN and five local-structure connections in the feature mapping layer gave better results. Compared with classic networks such as the Super-Resolution Convolutional Neural Network (SRCNN), the proposed network achieves a higher Peak Signal-to-Noise Ratio (PSNR) and better visual quality of the reconstructed SR images.
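An FPN fuses features top-down: each coarser level is upsampled and added to a 1×1-projected (lateral) copy of the next finer level. With a three-level pyramid this gives exactly two fusions, which is one plausible reading of the "two feature fusions" configuration above; that reading, the channel counts and the nearest-neighbour upsampling are assumptions. The NumPy sketch below omits the 3×3 smoothing convolutions and learned weights a real network would have.

```python
import numpy as np

def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution as channel mixing. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling along H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features, laterals):
    """Top-down FPN pathway.

    features: (C_i, H_i, W_i) maps ordered fine -> coarse, each level half the
              spatial size of the previous one.
    laterals: (C_out, C_i) 1x1 lateral weights, one per level (hypothetical).
    Returns fused maps ordered fine -> coarse, all with C_out channels.
    """
    outs = [conv1x1(features[-1], laterals[-1])]  # start from the coarsest level
    for feat, w in zip(reversed(features[:-1]), reversed(laterals[:-1])):
        # One fusion: lateral projection plus the upsampled coarser result.
        outs.append(conv1x1(feat, w) + upsample2x(outs[-1]))
    return outs[::-1]
```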

Order dispatching by multi-agent reinforcement learning based on shared attention

Xiaohui HUANG, Kaiming YANG, Jiahao LING

2023, 43(5): 1620-1624. DOI: 10.11772/j.issn.1001-9081.2022040630

Ride-hailing has become a popular way to travel because of its convenience and speed, and how to efficiently dispatch appropriate orders so that passengers are delivered to their destinations is currently a research hotspot. Much existing research focuses on training a single agent that distributes orders centrally, without the vehicles themselves taking part in decision-making. To solve this problem, a multi-agent reinforcement learning algorithm based on shared attention, named SARL (Shared Attention Reinforcement Learning), was proposed. In the algorithm, the order dispatching problem was modeled as a Markov decision process, and multi-agent reinforcement learning with centralized training and decentralized execution was used to make each agent a decision-maker. Meanwhile, a shared attention mechanism was added so that the agents share information and cooperate with each other. Comparison experiments with Random matching (Random), Greedy algorithm (Greedy), Individual Deep-Q-Network (IDQN) and Q-learning MIXing network (QMIX) were conducted under different map scales, numbers of passengers and numbers of vehicles. Experimental results show that SARL achieves the best time efficiency on three maps of different scales (100×100, 10×10 and 500×500) for both fixed and variable combinations of vehicles and passengers, which verifies its generalization and stability. SARL can optimize vehicle-passenger matching, reduce passenger waiting time and improve passenger satisfaction.
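One way to picture shared attention is a per-agent Q-network whose encoder, attention and output weights are shared by all vehicle agents: each agent encodes its own observation, attends over the encodings of all agents, and scores its candidate actions from its own encoding plus the attended context. The sketch below is a hypothetical NumPy illustration of that forward pass only (the class name `SharedAttentionQ`, the random untrained weights, and the observation and action sizes are all assumptions); centralized training, the reward and the order-matching environment of SARL are not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SharedAttentionQ:
    """Toy per-agent Q-network whose weights are shared by all agents (hypothetical)."""

    def __init__(self, obs_dim, hid, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(scale=0.1, size=(obs_dim, hid))
        self.w_q = rng.normal(scale=0.1, size=(hid, hid))
        self.w_k = rng.normal(scale=0.1, size=(hid, hid))
        self.w_v = rng.normal(scale=0.1, size=(hid, hid))
        self.w_out = rng.normal(scale=0.1, size=(2 * hid, n_actions))

    def q_values(self, obs):
        """obs: (n_agents, obs_dim) -> Q-values of shape (n_agents, n_actions)."""
        h = np.tanh(obs @ self.w_enc)                    # per-agent encoding
        q, k, v = h @ self.w_q, h @ self.w_k, h @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(h.shape[1]))    # agent i attends to all agents j
        context = attn @ v                               # information shared across agents
        return np.concatenate([h, context], axis=1) @ self.w_out

    def act(self, obs):
        """Each agent picks the action with the highest value in its own Q row."""
        return self.q_values(obs).argmax(axis=1)
```

Because the weights are shared, reordering the agents simply reorders their Q-values, so the same network serves any number of vehicles.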

Cross-view geo-localization method based on multi-task joint learning

Xianlan WANG, Jinkun ZHOU, Nan MU, Chen WANG

2023, 43(5): 1625-1635. DOI: 10.11772/j.issn.1001-9081.2022040541

A Multi-task Joint Learning Model (MJLM) was proposed to overcome the performance bottleneck caused by the separation between viewpoint-invariant feature methods and view transformation methods in existing cross-view geo-localization approaches. MJLM consists of a proactive image generative model and a posterior image retrieval model. In the proactive generative model, firstly, Inverse Perspective Mapping (IPM) was used for coordinate transformation to explicitly bridge the spatial domain gap, so that the spatial geometric features of the projected image and the real satellite image were approximately the same. Then, the proposed Cross-View Generative Adversarial Network (CVGAN) was used to implicitly match and restore image contents and textures at a fine-grained level and to synthesize smoother and more realistic satellite images. The posterior retrieval model is a Multi-view and Multi-supervision Network (MMNet), which performs image retrieval with multi-scale features and multi-supervised learning. Experimental results on the Unmanned Aerial Vehicle (UAV) dataset University-1652 show that MJLM achieves an Average Precision (AP) of 89.22% and a Recall@1 (R@1) of 87.54%. Compared with LPN (Local Pattern Network) and MSBA (MultiScale Block Attention), MJLM improves R@1 by 15.29% and 1.07%, respectively. Thus, by handling cross-view image synthesis and retrieval jointly, MJLM fuses the view transformation and viewpoint-invariant feature methods in a single framework, significantly improves the precision and robustness of cross-view geo-localization, and verifies the feasibility of UAV localization.
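IPM rests on a plane-induced homography. For a pinhole camera with intrinsics K and pose [R | t], ground points (X, Y, 0) project through H = K [r1 r2 t], so inverting H maps image pixels back onto the ground plane. The NumPy sketch below shows that mapping under a flat-ground assumption; any camera parameters used with it are hypothetical, since the abstract does not give MJLM's camera model or how the projected image is resampled.

```python
import numpy as np

def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through a 3x3 homography H."""
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous coordinates
    mapped = homo @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                # back to Cartesian

def ground_plane_homography(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Homography from ground-plane points (X, Y, Z=0) to image pixels.

    A pinhole camera gives x ~ K [R | t] [X, Y, 0, 1]^T, which collapses to
    x ~ K [r1 r2 t] [X, Y, 1]^T. Its inverse maps pixels back to the ground
    plane, which is the core of inverse perspective mapping.
    """
    return K @ np.column_stack([R[:, 0], R[:, 1], t])
```

For example, `apply_homography(np.linalg.inv(H), pixels)` returns the ground-plane coordinates of the given pixels, from which a top-down image can be resampled.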

Indoor positioning method of multi-fingerprint database based on channel state information and K-means-SVR

Yi WANG, Shenglei PEI, Yu WANG

2023, 43(5): 1636-1640. DOI: 10.11772/j.issn.1001-9081.2022081162

Traditional Wi-Fi indoor positioning methods must match against all fingerprint data in the fingerprint database before positioning, resulting in low positioning efficiency and a poor user experience in crowded areas. Therefore, a multi-fingerprint-database indoor positioning method based on Channel State Information (CSI), the K-means clustering algorithm and the Support Vector Regression (SVR) algorithm was proposed. Firstly, based on the cluster distribution characteristics of CSI, K-means was used to cluster the CSI data from all positioning points into multiple clusters. Then, a fingerprint database was established for each cluster, and the CSI data was stored in these multiple fingerprint databases. After that, an SVR model was trained on each fingerprint database for Wi-Fi positioning. Compared with the traditional Support Vector Machine (SVM) positioning method, the proposed method needs fewer training samples in the offline stage, which improves positioning efficiency; in the online stage, it both reduces matching complexity and improves positioning accuracy. Because multiple fingerprint databases are used, the Wi-Fi positioning system can adjust its resource allocation strategy in real time according to traffic, improving server operating efficiency and the positioning service experience.
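The offline/online split can be sketched as follows: cluster the CSI fingerprints with K-means, keep one fingerprint database and one regressor per cluster, then route each online query to its nearest cluster centre so that only that database is used. In the NumPy sketch below, the CSI vectors are treated as plain real-valued feature vectors (for example subcarrier amplitudes), the initialization is deterministic farthest-point seeding, and ridge regression is swapped in for SVR purely to keep the example dependency-free; all of these are assumptions, not the paper's settings.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest centre for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, iters=100):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

class MultiFingerprintLocator:
    """One fingerprint database and one regressor per CSI cluster (hypothetical)."""

    def __init__(self, k=2, lam=1e-6):
        self.k, self.lam = k, lam

    def fit(self, csi, pos):
        # Offline stage: split the fingerprints into k databases.
        self.centers = kmeans(csi, self.k)
        labels = assign(csi, self.centers)
        self.models = []
        for j in range(self.k):
            mask = labels == j
            A = np.hstack([csi[mask], np.ones((mask.sum(), 1))])  # bias column
            # Ridge regression as a stand-in for the paper's per-database SVR.
            W = np.linalg.solve(A.T @ A + self.lam * np.eye(A.shape[1]), A.T @ pos[mask])
            self.models.append(W)
        return self

    def predict(self, csi):
        # Online stage: each query is matched against one database only.
        labels = assign(csi, self.centers)
        A = np.hstack([csi, np.ones((csi.shape[0], 1))])
        pred = np.array([A[i] @ self.models[labels[i]] for i in range(len(csi))])
        return pred, labels
```

Replacing the ridge solve with an SVR per cluster (for instance scikit-learn's `SVR`, one model per coordinate) would follow the paper more closely.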

Synchronous control of neural network based on event-triggered mechanism

Chao GE, Chenlei CHANG, Zheng YAO, Hao SU

2023, 43(5): 1641-1646. DOI: 10.11772/j.issn.1001-9081.2022040588

Concerning the problem of random perturbation of the controller in synchronization control of neural networks with mixed delays, a non-fragile controller based on an event-triggered mechanism was proposed. Firstly, a random variable obeying the Bernoulli distribution was used to describe the random occurrence of controller gain perturbations. Secondly, the event-triggered mechanism was introduced into the synchronization control process of the neural network. Next, a novel bilateral Lyapunov functional was constructed to fully exploit the system state information, and its derivative was bounded using an improved integral inequality to obtain sufficient conditions for the exponential stability of the synchronization error system. Finally, the non-fragile controller was designed using a decoupling technique. The effectiveness of the proposed controller was verified by simulation examples. Experimental results show that, for the four-tank system under the same sampling period, the exponential decay rate obtained with the proposed controller is 0.16 higher than that of existing results.
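The two ingredients named above, an event-triggered transmission rule and a Bernoulli-distributed gain perturbation, can be illustrated on a toy system. The NumPy sketch below simulates a discrete-time linear error system with hypothetical matrices `A`, `K`, `dK`, a relative triggering threshold `sigma` and a perturbation probability `p`; it is not the paper's delayed neural network, and it does not reproduce the Lyapunov-functional analysis or the decoupling-based controller design. It only shows how the trigger reduces transmissions while the perturbed feedback still drives the error to zero.

```python
import numpy as np

def simulate(A, K, dK, e0, sigma=0.1, p=0.3, steps=300, seed=0):
    """Toy error system e[k+1] = A e[k] + (K + beta[k] dK) e_last.

    e_last is the last transmitted error; a new sample is sent only when
    ||e[k] - e_last||^2 > sigma * ||e[k]||^2 (assumed relative trigger).
    beta[k] ~ Bernoulli(p) decides whether the gain perturbation dK occurs.
    Returns the error norms over time and the number of transmissions.
    """
    rng = np.random.default_rng(seed)
    e = np.asarray(e0, dtype=float)
    e_last, sent = e.copy(), 1                   # the initial state is transmitted
    norms = [np.linalg.norm(e)]
    for _ in range(steps):
        if np.sum((e - e_last) ** 2) > sigma * np.sum(e ** 2):
            e_last, sent = e.copy(), sent + 1    # event: transmit the current error
        beta = float(rng.random() < p)           # Bernoulli gain perturbation
        u = (K + beta * dK) @ e_last             # non-fragile feedback on held sample
        e = A @ e + u
        norms.append(np.linalg.norm(e))
    return np.array(norms), sent
```

With `A = [[1, 0.02], [0, 1]]`, `K = -0.1 I`, `dK = 0.02 I` and `sigma = 0.1`, the error norm contracts by a factor of at most about 0.97 per step, and the trigger cannot fire on two consecutive steps, so at most about half of the samples are transmitted.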

Ultra-short-term photovoltaic power prediction by deep reinforcement learning based on attention mechanism

Zhengkai DING, Qiming FU, Jianping CHEN, You LU, Hongjie WU, Nengwei FANG, Bin XING

2023, 43(5): 1647-1654. DOI: 10.11772/j.issn.1001-9081.2022040542

To address the problem that traditional PhotoVoltaic (PV) power prediction models are affected by random power fluctuations and tend to ignore important information, resulting in low prediction accuracy, the ADDPG and ARDPG models were proposed by combining the attention mechanism with Deep Deterministic Policy Gradient (DDPG) and Recurrent Deterministic Policy Gradient (RDPG), respectively, and a PV power prediction framework was built on this basis. Firstly, the original PV power data and meteorological data were normalized, and PV power prediction was modeled as a Markov Decision Process (MDP) in which historical power data and current meteorological data form the state. Then, the attention mechanism was added to the Actor networks of DDPG and RDPG to assign different weights to different components of the state, highlighting important and critical information, and the Deep Reinforcement Learning (DRL) agents learned the critical information in the data by interacting with historical data. Finally, the MDP was solved to obtain the optimal policy and make accurate predictions. Experimental results on the DKASC and Alice Springs PV system data show that ADDPG and ARDPG achieve the best results in Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and R². Thus, the proposed models can effectively improve the accuracy of PV power prediction and can also be extended to other prediction fields such as grid prediction and wind power prediction.
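The attention-augmented Actor can be sketched as a deterministic policy that first assigns a softmax weight to each state component (past normalized power values and current meteorological features) and then maps the re-weighted state to a normalized power prediction. In the NumPy sketch below, the network shape, the min-max normalization bounds and the reward (negative absolute prediction error) are assumptions for illustration; the Critic, the replay buffer and the DDPG/RDPG updates are not shown.

```python
import numpy as np

def min_max(x, lo, hi):
    """Scale raw values to [0, 1] using assumed training-set min and max."""
    return (x - lo) / (hi - lo)

class AttentionActor:
    """Toy deterministic actor: attention over state components, then a linear
    head with a sigmoid so the normalized power prediction lies in (0, 1)."""

    def __init__(self, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_score = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.w_out = rng.normal(scale=0.1, size=state_dim)
        self.b_out = 0.0

    def attention(self, s):
        scores = self.w_score @ s
        e = np.exp(scores - scores.max())        # numerically stable softmax
        return e / e.sum()                       # one weight per state component

    def act(self, s):
        a = self.attention(s)
        z = self.w_out @ (a * s) + self.b_out    # re-weighted state -> scalar
        return 1.0 / (1.0 + np.exp(-z))

def reward(pred, actual):
    """Assumed reward: negative absolute prediction error (higher is better)."""
    return -abs(pred - actual)
```

A state might, for instance, concatenate the last four normalized power readings with normalized irradiance and temperature; the attention weights then indicate which of these components the actor relies on for a given prediction.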