Chinese image captioning method based on multi-level visual and dynamic text-image interaction
Junyan ZHANG, Yiming ZHAO, Bing LIN, Yunping WU
Journal of Computer Applications    2025, 45 (5): 1520-1527.   DOI: 10.11772/j.issn.1001-9081.2024050616

Image captioning technology can help computers understand image content better and achieve cross-modal interaction. To address the incomplete extraction of multi-granularity features from images and insufficient understanding of image-text correlation in Chinese image captioning tasks, a method was proposed that extracts multi-level visual and semantic features of images and integrates them dynamically during decoding. Firstly, multi-level visual features were extracted by the encoder, and multi-granularity features were obtained through an auxiliary guidance module for the local image feature extractor. Then, a text-image interaction module was designed to dynamically focus on semantic associations between visual and textual information. Concurrently, a dynamic feature fusion decoder was designed to perform closed-loop dynamic fusion and decoding of features with adaptive cross-modal weights, ensuring enhanced information integrity while maintaining semantic relevance. Finally, coherent Chinese descriptive sentences were generated. The method's effectiveness was evaluated using BLEU-n, Rouge, Meteor, and CIDEr metrics, with comparisons against eight existing approaches. Experimental results demonstrate consistent improvements across all semantic relevance evaluation metrics. Specifically, compared with the baseline model NIC (Neural Image Caption), the proposed method improves BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge_L, Meteor, and CIDEr by 5.62%, 7.25%, 8.78%, 10.85%, 14.06%, 5.14%, and 15.16%, respectively, confirming its superior accuracy.
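The adaptive cross-modal weighting described in this abstract can be illustrated with a minimal toy sketch. This is a hypothetical illustration only, with made-up dimensions and a random gate matrix; it is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_fusion(visual, textual, gate_w):
    """Assign adaptive weights to visual and textual features via a
    learned gate, then fuse them into one decoder input vector."""
    scores = gate_w @ np.concatenate([visual, textual])  # (2,) gate scores
    alpha = softmax(scores)                              # cross-modal weights
    return alpha[0] * visual + alpha[1] * textual

d = 8                                     # toy feature dimension
visual = rng.standard_normal(d)           # stand-in visual feature
textual = rng.standard_normal(d)          # stand-in textual feature
gate_w = rng.standard_normal((2, 2 * d))  # hypothetical gate parameters
fused = dynamic_fusion(visual, textual, gate_w)
```

In practice the gate would be trained jointly with the decoder; here it simply shows how a softmax over gate scores yields dynamic per-modality weights.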

Spatio-temporal context network for 3D human pose estimation based on graph attention
Zhengdong ZENG, Ming ZHAO
Journal of Computer Applications    2025, 45 (10): 3161-3169.   DOI: 10.11772/j.issn.1001-9081.2024101489

According to recent research on human pose estimation, making full use of the latent spatial information in 2D poses to acquire representative features can produce more accurate 3D pose results. Therefore, a spatio-temporal context network based on the graph attention mechanism was proposed, which includes a Temporal Context Network with Shifted windows (STCN), an Extremity-Guided global graph ATtention mechanism network (EGAT), and a Pose Grammar-based local graph attention Convolution Network (PGCN). Firstly, STCN was used to transform 2D joint positions in a long sequence into latent features of the human pose in a single sequence, which aggregated and utilized long-range and short-range human pose information effectively and reduced the computational cost significantly. Secondly, EGAT was presented for computing global spatial context effectively: human extremities were treated as “traffic hubs”, and bridges were established for information exchange between them and the other nodes. Thirdly, the graph attention mechanism was utilized for adaptive weight assignment to perform global context computation on human joints. Finally, PGCN was designed to utilize a Graph Convolution Network (GCN) to compute and model local spatial context, thereby emphasizing the motion consistency of symmetrical human joints and the motion correlation structure of human bones. The proposed model was evaluated on two complex benchmark datasets: Human3.6M and HumanEva-Ⅰ. Experimental results demonstrate that the proposed model has superior performance. Specifically, when the input frame length is 81, the proposed model achieves a Mean Per Joint Position Error (MPJPE) of 43.5 mm on the Human3.6M dataset, a 10.5% reduction compared to that of the state-of-the-art algorithm MCFNet (Multi-scale Cross Fusion Network), showcasing higher accuracy.
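The MPJPE metric reported above is the mean Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch, with an illustrative joint count and a fabricated constant offset:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average Euclidean distance
    between predicted and ground-truth 3D joint positions (in mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

gt = np.zeros((17, 3))                 # 17 joints (Human3.6M convention), 3D
pred = gt + np.array([3.0, 0.0, 4.0])  # every joint off by a 3-4-5 triangle
print(mpjpe(pred, gt))                 # → 5.0
```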

Adversarial attack defense model with residual dense block self-attention mechanism and generative adversarial network
Yuming ZHAO, Shenkai GU
Journal of Computer Applications    2022, 42 (3): 921-929.   DOI: 10.11772/j.issn.1001-9081.2021030431

Neural networks have outstanding performance on image classification tasks. However, they are vulnerable to adversarial examples generated by adding small perturbations, which make them output incorrect classification results. Current defense methods suffer from insufficient image feature extraction and too little attention to features in key areas of the image. To address these issues, a defense model fusing a Residual Dense Block (RDB) self-attention mechanism and a Generative Adversarial Network (GAN), namely RD-SA-DefGAN, was proposed. The GAN was combined with the Projected Gradient Descent (PGD) attack algorithm: adversarial samples generated by the PGD attack were added to the training sample set, and the training process of the model was stabilized by conditional constraints. The model also introduced RDB and the self-attention mechanism to fully extract features from the image and enhance the contribution of features from its key areas. Experimental results on the CIFAR10, STL10, and ImageNet20 datasets show that RD-SA-DefGAN can effectively defend against adversarial attacks, outperforming the Adv.Training, Adv-BNN, and Rob-GAN methods in defending against PGD attacks. Compared to the most similar algorithm, Rob-GAN, RD-SA-DefGAN improves the defense success rate by 5.0 to 9.1 percentage points on affected images in the CIFAR10 dataset, with the disturbance threshold ranging from 0.015 to 0.070.
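PGD, the attack used here to generate adversarial training samples, iterates signed-gradient ascent steps and projects each iterate back into an ε-ball around the clean input. A minimal NumPy sketch with a toy gradient function; a real attack would use the classifier's loss gradient, and the step sizes here are illustrative:

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.03, alpha=0.007, steps=10):
    """Projected Gradient Descent: take signed gradient steps,
    projecting back into the eps-ball around the clean input x,
    and keeping pixel values in [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                 # valid image range
    return x_adv

# Toy "loss gradient" that pushes every pixel upward.
x = np.full((4, 4), 0.5)
adv = pgd_attack(x, lambda z: np.ones_like(z), eps=0.03)
```

With this toy gradient every pixel saturates at the ε-ball boundary (0.53), showing that the projection step bounds the perturbation.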

Image denoising algorithms based on Laplacian operator and image inpainting
TIAN Su-yun, WANG Xiao-ming, ZHAO Xue-qing
Journal of Computer Applications    2012, 32 (10): 2793-2797.   DOI: 10.3724/SP.J.1087.2012.02793
Through analysis of Partial Differential Equations (PDEs), two image denoising algorithms based on the Laplacian operator and image inpainting were designed for processing images polluted by noise: the Rudin-Osher-Fatemi (ROF) harmonic Laplacian algorithm and the ROF harmonic inpainting algorithm, abbreviated RHL and RHI respectively. By analyzing the local features of the image, and exploiting the ability of the ROF model to protect image edges, the ability of the harmonic model to overcome the "ladder effect", and the advantage of the Laplacian operator in enhancing edges, the first denoising algorithm, RHL, was designed. The second algorithm, RHI, was then designed by incorporating the image inpainting model. The experimental results show that the two designed algorithms, RHL and RHI, effectively combine the advantages of the ROF model and the harmonic model in image denoising, and perform better both visually and quantitatively than other PDE-based algorithms: they remove noise while better preserving smooth regions and edge information.
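As a simplified stand-in for the PDE models discussed (not the RHL/RHI algorithms themselves), explicit heat-equation smoothing with a discrete Laplacian shows the basic denoising iteration; grid size, noise level, and step count are illustrative:

```python
import numpy as np

def laplacian(u):
    """5-point discrete Laplacian with replicated (Neumann) borders."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u

def harmonic_denoise(noisy, steps=50, dt=0.1):
    """Explicit time-stepping of the heat equation u_t = Δu
    (dt < 0.25 keeps the 5-point scheme stable)."""
    u = noisy.astype(float)
    for _ in range(steps):
        u = u + dt * laplacian(u)
    return u

rng = np.random.default_rng(1)
clean = np.full((32, 32), 0.5)                        # flat reference image
noisy = clean + 0.1 * rng.standard_normal((32, 32))   # additive Gaussian noise
denoised = harmonic_denoise(noisy)
```

Pure harmonic diffusion like this blurs edges as well as noise; the ROF (total variation) term in the paper's hybrids is precisely what counteracts that edge loss.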