Multimedia computing and computer simulation

Extremely dim target search algorithm based on detection and tracking mutual iteration

XIAO Qi, YIN Zengshan, GAO Shuang

Journal of Computer Applications 2021, 41 (10): 3017-3024. DOI: 10.11772/j.issn.1001-9081.2020122000

In the case of an extremely Low Signal-to-Noise Ratio (LSNR), dim moving targets are difficult to distinguish from background noise by intensity. To solve this problem, a new extremely dim target search algorithm based on detection and tracking mutual iteration was proposed, with a new strategy for combining and iterating temporal-domain detection and spatial-domain tracking. Firstly, in the detection process, the difference between the signal segment in the detection window and the extracted background estimation feature was calculated. Then, in the tracking process, the dynamic programming algorithm was adopted to retain the trajectories with the largest accumulated trajectory energy. Finally, the detector threshold parameters for the retained trajectories were adaptively adjusted in the next detection process, so that the pixels on these trajectories were passed to the next detection and tracking stage under a more tolerant strategy. Experimental results show that the proposed algorithm can detect dim moving targets with SNR as low as 0 dB, with a false alarm rate of 1% to 2% and a detection rate of about 70%. It can be seen that the proposed algorithm effectively improves the detection ability for dim targets with extremely LSNR.

High-precision classification method for breast cancer fusing spatial features and channel features

XU Xuebin, ZHANG Jiada, LIU Wei, LU Longbin, ZHAO Yuqing

Journal of Computer Applications 2021, 41 (10): 3025-3032. DOI: 10.11772/j.issn.1001-9081.2020111891

The histopathological image is the gold standard for identifying breast cancer, so automatic and accurate classification of breast cancer histopathological images is of great clinical value. In order to improve the classification accuracy of breast cancer histopathological images and thus meet the needs of clinical applications, a high-precision breast cancer classification method that fuses spatial and channel features was proposed. In the method, the histopathological images were processed by color normalization, the dataset was expanded by data augmentation, and the spatial feature information and channel feature information of the histopathological images were fused based on the Convolutional Neural Network (CNN) models DenseNet and Squeeze-and-Excitation Network (SENet). Three different BCSCNet (Breast Classification fusing Spatial and Channel features Network) models, BCSCNetⅠ, BCSCNetⅡ and BCSCNetⅢ, were designed according to the insertion position and the number of Squeeze-and-Excitation (SE) modules. The experiments were carried out on the breast cancer histopathological image dataset BreaKHis. Experimental comparison first verified that color normalization and data augmentation of the images improved the classification accuracy of breast cancer, and then showed that BCSCNetⅢ had the highest precision among the three designed breast cancer classification models. Experimental results showed that the binary classification accuracy of BCSCNetⅢ ranged from 99.05% to 99.89%, an improvement of 0.42 percentage points over Breast cancer Histopathology image Classification Network (BHCNet), and that its multi-class classification accuracy ranged from 93.06% to 95.72%, an improvement of 2.41 percentage points over BHCNet. This shows that BCSCNet can accurately classify breast cancer histopathological images and provide reliable theoretical support for computer-aided breast cancer diagnosis.

Reconstruction method for uncertain spatial information based on improved variational auto-encoder

TU Hongyan, ZHANG Ting, XIA Pengfei, DU Yi

Journal of Computer Applications 2021, 41 (10): 2959-2963. DOI: 10.11772/j.issn.1001-9081.2020081338

Uncertain spatial information is widely used in many scientific fields. However, the current reconstruction methods for uncertain spatial information need to scan the Training Image (TI) many times and then obtain the simulation results through complex probability calculation, which leads to low efficiency and a complex simulation process. To address this issue, a method that jointly applies Fisher information and Variational Auto-Encoder (VAE) to the reconstruction of uncertain spatial information was proposed. Firstly, the structural features of the spatial information were learned through the encoder neural network, and the mean and variance of the spatial information were obtained by training. Then, random sampling was carried out to reconstruct the intermediate results according to the mean and variance of the sampling results and the spatial information, and the encoder neural network was optimized by combining the optimization function of the network with the Fisher information. Finally, the intermediate results were input into the decoder neural network to decode and reconstruct the spatial information, and the optimization function of the decoder was combined with the Fisher information to optimize the reconstruction results. Comparison of the reconstruction results of different methods with the training data on multiple-point connectivity curve, variogram, pore distribution and porosity shows that the reconstruction quality of the proposed method is better than those of the other methods. Specifically, the average porosity of the reconstruction results of the proposed method is 0.1715, which is closer to the porosity of the training data (0.1705) than those of the other methods. Compared with the traditional method, the proposed method reduces the average CPU utilization from 90% to 25% and the average memory consumption by 50%, which indicates higher reconstruction efficiency. The comparison of reconstruction quality and reconstruction efficiency illustrates the effectiveness of the proposed method.
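
The random sampling step between encoder and decoder can be illustrated with a short sketch. It assumes the standard VAE reparameterization (latent value = mean + standard deviation × standard normal noise); the abstract does not state the exact sampling formula, and the Fisher information term is not shown. The function name `sample_latent` and the example values are hypothetical.

```python
import math
import random

def sample_latent(mean, variance, rng):
    """Draw z = mean + sqrt(variance) * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, variance)]

rng = random.Random(0)           # fixed seed so the sketch is repeatable
mean = [0.5, -1.0]               # hypothetical encoder output (mean)

# With zero variance the sample collapses to the mean.
z_det = sample_latent(mean, [0.0, 0.0], rng)

# With non-zero variance each call gives a different intermediate result.
z_rand = sample_latent(mean, [1.0, 0.25], rng)
print(z_det, z_rand)
```

Writing the sample as a deterministic function of the mean, the variance and external noise is what usually allows the encoder to be trained by gradient descent.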

Robust 3D object detection method based on localization uncertainty

PEI Yiyao, GUO Huiming, ZHANG Danpu, CHEN Wenbo

Journal of Computer Applications 2021, 41 (10): 2979-2984. DOI: 10.11772/j.issn.1001-9081.2020122055

To solve the problem of inaccurate model localization caused by inaccurate manual labeling in 3D point cloud training data, a novel robust 3D object detection method based on localization uncertainty was proposed. Firstly, with the 3D voxel grid-based Sparsely Embedded CONvolutional Detection (SECOND) network as the basic network, the prediction of localization uncertainty was added based on Region Proposal Network (RPN). Then, during the training process, the localization uncertainty was modeled by using Gaussian and Laplace distribution models, and the localization loss function was redefined. Finally, during the prediction process, threshold filtering and Non-Maximum Suppression (NMS) were performed to filter candidate objects based on the object confidence, which was composed of the localization uncertainty and the classification confidence. Experimental results on the KITTI 3D object detection dataset show that, compared with SECOND network, the proposed algorithm improves the detection accuracy by 0.5 percentage points on the car category at moderate level. When simulated disturbance noise is added to the training data, the detection accuracy of the proposed algorithm is 3.1 percentage points higher than that of SECOND network in the best case. The proposed algorithm improves the accuracy of 3D object detection, reduces false detections, improves the accuracy of 3D bounding boxes, and is more robust to noisy data.
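
The prediction-time filtering can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it uses 2D axis-aligned boxes instead of 3D boxes, and the rule for combining classification confidence with localization uncertainty (`score * exp(-uncertainty)`) is an assumption, since the abstract does not give the formula. All names and thresholds are hypothetical.

```python
import math

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(boxes, cls_scores, uncertainties, score_thr=0.3, iou_thr=0.5):
    """Threshold filtering followed by greedy NMS on a combined confidence."""
    # Assumed combination rule: high uncertainty lowers the object confidence.
    conf = [s * math.exp(-u) for s, u in zip(cls_scores, uncertainties)]
    # Threshold filtering, then sort the survivors by confidence (highest first).
    order = sorted((i for i in range(len(boxes)) if conf[i] >= score_thr),
                   key=lambda i: conf[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
cls_scores = [0.9, 0.95, 0.8, 0.2]
uncertainties = [0.1, 0.9, 0.2, 0.1]
kept = filter_detections(boxes, cls_scores, uncertainties)
print(kept)  # indices of the boxes that survive
```

In this toy input, box 1 has the highest classification score but also the highest uncertainty, so the overlapping box 0 wins the suppression step.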

Unmanned aerial vehicle image positioning algorithm based on scene graph division

ZHANG Chi, LI Zhuhong, LIU Zhou, SHEN Weiming

Journal of Computer Applications 2021, 41 (10): 3004-3009. DOI: 10.11772/j.issn.1001-9081.2020111795

To address the slow speed and error drift in the positioning of large-scale long-sequence Unmanned Aerial Vehicle (UAV) images, a positioning algorithm for UAV images based on scene graph division was proposed according to the characteristics of UAV images. Firstly, the Global Positioning System (GPS) ancillary information was used to narrow the spatial search scope for feature matching, so as to accelerate the extraction of corresponding points. After that, visual consistency and spatial consistency were combined to construct the scene graphs, and Normalized Cut (Ncut) was used to divide them. Then, incremental reconstruction was performed on each group of scene graphs. Finally, all scene graphs were fused to establish a 3D scene model by Bundle Adjustment (BA). In addition, the GPS spatial constraint information was added to the cost function in the BA stage. In the experiments on four UAV image datasets, compared with COLMAP and other Structure From Motion (SFM) algorithms, the proposed algorithm increased the positioning speed by 50%, decreased the reprojection error by 41%, and kept the positioning error within 0.5 m. The experimental comparison of algorithms with and without GPS assistance shows that BA with relative and absolute GPS constraints solves the problem of error drift, avoids ambiguous results and greatly reduces the positioning error.
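
The idea of using GPS to narrow the search scope for feature matching can be sketched as below: only image pairs whose camera positions lie within some radius are passed on to the (expensive) matcher. This is a minimal illustration under stated assumptions: GPS coordinates are assumed to be already converted to local planar coordinates in metres, and the function name `candidate_pairs` and the 50 m radius are hypothetical, not taken from the paper.

```python
import math

def candidate_pairs(positions, radius):
    """Return index pairs (i, j), i < j, whose positions are within `radius` metres."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= radius:
                pairs.append((i, j))
    return pairs

# Hypothetical camera positions in local planar coordinates (metres).
positions = [(0.0, 0.0), (30.0, 0.0), (60.0, 0.0), (500.0, 500.0)]
pairs = candidate_pairs(positions, radius=50.0)
print(pairs)  # only nearby images are matched against each other
```

For large image sets, a spatial index would replace the double loop; the brute-force form is kept here for clarity.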

Video abnormal behavior detection based on dual prediction model of appearance and motion features

LI Ziqiang, WANG Zhengyong, CHEN Honggang, LI Linyi, HE Xiaohai

Journal of Computer Applications 2021, 41 (10): 2997-3003. DOI: 10.11772/j.issn.1001-9081.2020121906

In order to make full use of appearance and motion information in video abnormal behavior detection, a Siamese network model that can capture appearance and motion information at the same time was proposed. The two branches of the network were composed of the same autoencoder structure. Several consecutive frames of RGB images were used as the input of the appearance sub-network to predict the next frame, while RGB frame difference images were used as the input of the motion sub-network to predict the future frame difference. In addition, two factors were considered that limit the detection performance of prediction-based methods: the diversity of normal samples, and the powerful "generation" ability of the autoencoder network, which gives it a good prediction effect even on some abnormal samples. Therefore, a memory enhancement module that learns and stores the "prototype" features of normal samples was added between the encoder and the decoder, so that the abnormal samples were able to obtain greater prediction error. Extensive experiments were conducted on three public anomaly detection datasets: Avenue, UCSD-ped2 and ShanghaiTech. Experimental results show that, compared with other video abnormal behavior detection methods based on reconstruction or prediction, the proposed method achieves better performance. Specifically, the average Area Under Curve (AUC) of the proposed method on Avenue, UCSD-ped2 and ShanghaiTech datasets reaches 88.2%, 97.5% and 73.0% respectively.

Lightweight real-time semantic segmentation algorithm based on separable pyramid

GAO Shiwei, ZHANG Changzhu, WANG Zhuping

Journal of Computer Applications 2021, 41 (10): 2937-2944. DOI: 10.11772/j.issn.1001-9081.2020121939

The existing semantic segmentation algorithms have too many parameters and huge memory usage, so that it is difficult for them to meet the requirements of real-world applications such as automatic driving. In order to solve the problem, a novel, effective and lightweight real-time semantic segmentation algorithm based on Separable Pyramid Module (SPM) was proposed. Firstly, factorized convolution and dilated convolution were adopted in the form of a feature pyramid to construct the bottleneck structure, providing a simple but effective way to extract local and contextual information. Then, the Context Channel Attention (CCA) module based on computer vision attention was proposed to modify the channel weights of shallow feature maps by utilizing deep semantic features, thereby optimizing the segmentation results. Experimental results show that, without pre-training or any additional processing, the proposed algorithm achieves a mean Intersection-over-Union (mIoU) of 71.86% on Cityscapes test set at the speed of 91 Frames Per Second (FPS). Compared to Efficient Residual Factorized ConvNet (ERFNet), the proposed algorithm has an mIoU 3.86 percentage points higher and 2.2 times the processing speed. Compared with the latest Light-weighted Network with Efficient Reduced Non-local operation for real-time semantic segmentation (LRNNet), the proposed algorithm has an mIoU lower by 0.34 percentage points, but a processing speed higher by 20 FPS. The experimental results show that the proposed algorithm has great value for tasks such as the efficient and accurate street scene image segmentation required in automatic driving.
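
For readers unfamiliar with the mIoU metric quoted above, the following sketch computes it on a toy label sequence. It is a generic definition of the metric, not the Cityscapes evaluation code (which accumulates counts over the whole test set and ignores certain labels); the function name and data are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean over classes of |pred ∩ target| / |pred ∪ target|, on flat label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1, 2, 2]     # predicted class per pixel
target = [0, 1, 1, 1, 2, 0]   # ground-truth class per pixel
miou = mean_iou(pred, target, num_classes=3)
print(miou)
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.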

Deepfake image detection method based on autoencoder

ZHANG Ya, JIN Xin, JIANG Qian, LEE Shin-jye, DONG Yunyun, YAO Shaowen

Journal of Computer Applications 2021, 41 (10): 2985-2990. DOI: 10.11772/j.issn.1001-9081.2020122046

The image forgery method based on deep learning can generate images which are difficult to distinguish with the human eye. Once the technology is abused to produce fake images and videos, it will have a serious negative impact on a country's politics, economy, and culture, as well as the social life and personal privacy. To solve the problem, a Deepfake detection method based on autoencoder was proposed. Firstly, the Gaussian filtering was used to preprocess the image, and the high-frequency information was extracted as the input of the model. Secondly, the autoencoder was used to extract features from the image. In order to obtain better classification effect, an attention mechanism module was added to the encoder. Finally, it was proved by the ablation experiments that the proposed preprocessing method and the addition of attention mechanism module were helpful for the Deepfake image detection. Experimental results show that, compared with ResNet50, Xception and InceptionV3, the proposed method can effectively detect images forged by multiple generation methods when the dataset has a small sample size and contains multiple scenes, and its average accuracy is up to 97.10%, which is significantly better than those of the comparison methods, and its generalization performance is also significantly better than those of the comparison methods.

High-precision sparse reconstruction of CT images based on multiply residual UNet

ZHANG Yanjiao, QIAO Zhiwei

Journal of Computer Applications 2021, 41 (10): 2964-2969. DOI: 10.11772/j.issn.1001-9081.2020121985

To better suppress the streak artifacts produced during sparse analytic reconstruction of Computed Tomography (CT), a Multiply residual UNet (Mr-UNet) network architecture was proposed based on the classical UNet network architecture. Firstly, sparsely reconstructed images with streak artifacts were obtained by the traditional Filtered Back Projection (FBP) analytic reconstruction algorithm. Then, the reconstructed images were used as the input of the network, and the corresponding high-precision images were used as the labels for training, so that the network learned to suppress streak artifacts well. Finally, the original four-layer down-sampling of the classical residual UNet was deepened to five layers, and the residual learning mechanism was introduced into the proposed model, so that each convolution unit was built as a residual structure to improve the training performance of the network. In the experiments, 2000 image pairs with the size of 256×256, each consisting of an image with streak artifacts and the corresponding high-precision image, were used as the dataset: 1900 pairs as the training set, 50 pairs as the validation set, and the rest as the test set, to train the network and to verify and evaluate its performance. The experimental results show that, compared with the traditional Total Variation (TV) minimization algorithm and the classical deep learning method of UNet, the proposed model can reduce the Root Mean Square Error (RMSE) by about 0.0025 on average and improve the Structural SIMilarity (SSIM) by about 0.003 on average, and can better retain the texture and detail information of the image.
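
As a reminder of what the RMSE figure measures, here is a small sketch that compares a reconstructed image with its reference pixel by pixel. The tiny 2×2 "images" and the function name are hypothetical; the paper's own evaluation pipeline and intensity scaling are not described in the abstract.

```python
import math

def rmse(image_a, image_b):
    """Root mean square error between two equally sized 2D images (lists of rows)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(image_a, image_b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
            count += 1
    return math.sqrt(total / count)

reconstruction = [[0.0, 0.5], [1.0, 0.5]]
reference = [[0.0, 0.0], [1.0, 1.0]]
err = rmse(reconstruction, reference)
print(err)  # lower is better
```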

Semantic segmentation method of power line on mobile terminals based on encoder-decoder structure

HUANG Juting, GAO Hongli, DAI Zhikun

Journal of Computer Applications 2021, 41 (10): 2952-2958. DOI: 10.11772/j.issn.1001-9081.2020122037

The traditional vision algorithms have low accuracy and are greatly affected by environmental factors during the detection of long and slender power lines in complex scenes, and the existing power line detection algorithms based on deep learning are not efficient. In order to solve these problems, an end-to-end fully convolutional neural network model suitable for power line detection on mobile terminals was proposed. Firstly, a symmetrical encoder-decoder structure was adopted. In the encoder part, the max-pooling layer was used for down-sampling, so as to extract multi-scale features. In the decoder part, non-linear up-sampling based on max-pooling indices was used to fuse multi-scale features layer by layer to restore the image details. Then, a weighted loss function was adopted to train the model, thereby solving the imbalance problem between power line pixels and background pixels. Finally, a power line dataset with complex background and pixel-level labels was constructed to train and evaluate the model, and a public power line dataset was relabeled as a different-source test set. Compared with a model named Dilated ConvNet for power line semantic segmentation on mobile devices, the proposed model has the prediction speed for 512×512 resolution images on the mobile device GPU NVIDIA Jetson TX2 twice that of Dilated ConvNet, which is 8.2 frame/s; the proposed model achieves a mean Intersection over Union (mIoU) of 0.8573, F1 score of 0.8447 and Average Precision (AP) of 0.9279 on the same-source test set, which are increased by 0.011, 0.014 and 0.008 respectively; and the proposed model achieves mIoU of 0.7244, F1 score of 0.6341 and AP of 0.6644 on the public test set, which are increased by 0.004, 0.007 and 0.032 respectively. Experimental results show that the proposed model has better performance of real-time power line segmentation on mobile terminals.
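
One common way to build a weighted loss for such a pixel imbalance is a weighted binary cross-entropy, sketched below. The abstract does not specify which weighting scheme the authors used, so the `pos_weight` formulation, the function name and the numbers are assumptions for illustration only.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy where positive (power line) pixels count pos_weight times."""
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

labels = [1, 0, 0, 0]             # one power line pixel among background pixels
probs = [0.5, 0.5, 0.5, 0.5]      # an uninformative prediction
plain = weighted_bce(probs, labels, pos_weight=1.0)
weighted = weighted_bce(probs, labels, pos_weight=3.0)
print(plain, weighted)
```

With `pos_weight=3.0`, the single positive pixel contributes as much to the loss as the three background pixels together, which is the balancing effect a weighted loss aims for.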

Pedestrian re-identification method based on multi-scale feature fusion

HAN Jiandong, LI Xiaoyu

Journal of Computer Applications 2021, 41 (10): 2991-2996. DOI: 10.11772/j.issn.1001-9081.2020121908

Pedestrian re-identification tasks lack the consideration of the pedestrian feature scale variation during feature extraction, so that they are easily affected by environment and have low accuracy of pedestrian re-identification. In order to solve the problem, a pedestrian re-identification method based on multi-scale feature fusion was proposed. Firstly, in the shallow layer of the network, multi-scale pedestrian features were extracted through mixed pooling operation, which was helpful to improve the feature extraction capability of the network. Then, strip pooling operation was added to the residual block to extract the remote context information in horizontal and vertical directions respectively, which avoided the interference of irrelevant regions. Finally, after the residual network, the dilated convolutions with different scales were used to further preserve the multi-scale features, so as to help the model to analyze the scene structure flexibly and effectively. Experimental results show that, on Market-1501 dataset, the proposed method has the Rank1 of 95.9%, and the mean Average Precision (mAP) of 88.5%; on DukeMTMC-reID dataset, the proposed method has the Rank1 of 90.1%, and the mAP of 80.3%. It can be seen that the proposed method can retain the pedestrian feature information better, thereby improving the accuracy of pedestrian re-identification tasks.
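
The pooling step at the heart of strip pooling can be sketched as follows: a feature map is averaged along each row (a horizontal strip) and along each column (a vertical strip). This shows only the pooling itself on a single-channel toy map; the convolutions, expansion and fusion that normally follow, and how the authors integrate the operation into the residual block, are not shown. Names and values are hypothetical.

```python
def strip_pool(feature_map):
    """Average a 2D feature map over horizontal strips (rows) and vertical strips (columns)."""
    height, width = len(feature_map), len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [sum(feature_map[r][c] for r in range(height)) / height
                 for c in range(width)]
    return row_means, col_means

feature_map = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
rows, cols = strip_pool(feature_map)
print(rows, cols)
```

Because each strip spans the full width or height of the map, every output value summarizes context from one end of the image to the other along a single direction.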

Rapid calculation method of orthopedic plate fit based on improved iterative closest point algorithm

ZHU Xincheng, HE Kunjin, NI Na, HAO Bo

Journal of Computer Applications 2021, 41 (10): 3033-3039. DOI: 10.11772/j.issn.1001-9081.2020122012

In order to quickly calculate the optimal fitting position of the orthopedic plate on the surface of the broken bone and reduce the number of repeated adjustments of the orthopedic plate during the surgical operation, a rapid calculation method of orthopedic plate fit based on improved Iterative Closest Point (ICP) algorithm was proposed. Firstly, under the guidance of the doctor, the fitting area was selected on the surface of the broken bone, and the point cloud of the inner surface of the orthopedic plate was extracted by using the angle between the normal vectors of the surface points of the orthopedic plate. Then, the two groups of point cloud models were smoothed, and the grid sampling method was adopted to simplify the point cloud models. After these operations, the characteristic relationship between the point clouds was used for the initial registration. Finally, the boundary and internal feature key points of the inner surface point cloud model of the orthopedic plate were extracted, and K-Dimensional Tree (KD-Tree) was used to search the adjacent points, so that the feature key points of the orthopedic plate and the selected area of the broken bone surface were accurately registered by ICP. Experiments were carried out with the tibia as the example, and the results show that the proposed method can improve the registration efficiency while maintaining a relatively high registration degree compared with other registration algorithms proposed in recent years. The proposed algorithm can realize the rapid registration between different damage types of tibia and orthopedic plate, and it is also applicable to other damaged bones.
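
The grid sampling step used to simplify the point clouds can be sketched as below: space is divided into cubic cells and each occupied cell is replaced by the centroid of its points. This is a generic voxel-grid downsampling sketch; the cell size, the use of centroids and the function name are assumptions, since the abstract does not describe the authors' exact variant.

```python
from collections import defaultdict

def grid_downsample(points, cell):
    """Replace all 3D points falling into the same cubic cell by their centroid."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell) for c in p)  # integer cell index per axis
        cells[key].append(p)
    reduced = []
    for key in sorted(cells):  # sorted for a deterministic output order
        group = cells[key]
        reduced.append(tuple(sum(q[k] for q in group) / len(group) for k in range(3)))
    return reduced

points = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0), (1.4, 0.2, 0.0)]
reduced = grid_downsample(points, cell=1.0)
print(reduced)  # one representative point per occupied cell
```

Fewer points mean fewer nearest-neighbor queries per ICP iteration, which is where most of the registration time is typically spent.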

Shipping monitoring image recognition model based on attention mechanism network

ZHANG Kaiyue, ZHANG Hong

Journal of Computer Applications 2021, 41 (10): 3010-3016. DOI: 10.11772/j.issn.1001-9081.2020121899

In the existing shipping monitoring image recognition model named Convolutional 3D (C3D), the intermediate representation learning ability is limited, the extraction of effective features is easily disturbed by noise, and the relationship between global features and local features is ignored in feature extraction. In order to solve these problems, a new shipping monitoring image recognition model based on attention mechanism network was proposed. The model was based on the Convolutional Neural Network (CNN) framework. Firstly, the shallow features of the image were extracted by the feature extractor. Then, the attention information was generated and the local discriminant features were extracted based on the different response strengths of the CNN to the active features of different regions. Finally, the multi-branch CNN structure was used to fuse the local discriminant features and the global texture features of the image, thus the interaction between the local discriminant features and the global texture features of the image was utilized to improve the learning ability of CNN to the intermediate representations. Experimental results show that, the recognition accuracy of the proposed model is 91.8% on the shipping image dataset, which is improved by 7.2 percentage points and 0.6 percentage points compared with the current C3D model and Discriminant Filter within a Convolutional Neural Network (DFL-CNN) model respectively. It can be seen that the proposed model can accurately judge the state of the ship, and can be effectively applied to the shipping monitoring project.

Salient object detection in weak light images based on ant colony optimization algorithm

WANG Hongyu, ZHANG Yu, YANG Heng, MU Nan

Journal of Computer Applications 2021, 41 (10): 2970-2978. DOI: 10.11772/j.issn.1001-9081.2020111814

Having received substantial attention from industry and academia over the last decade, salient object detection has become an important fundamental research topic in computer vision, and its solution will help to make breakthroughs in various visual tasks. Although various works have achieved remarkable success in saliency detection tasks in visible light scenes, it remains challenging to extract salient objects with clear boundaries and accurate internal structures from weak light images with low signal-to-noise ratios and limited effective information. Since fuzzy boundaries and incomplete internal structures cause low accuracy of salient object detection in weak light scenes, a saliency detection framework based on Ant Colony Optimization (ACO) algorithm was proposed. Firstly, the input image was transformed into an undirected graph with different nodes by multi-scale superpixel segmentation. Secondly, the optimal feature selection strategy was adopted to capture the useful information contained in the salient object and eliminate the redundant noise information from the weak light image with low contrast. Then, the spatial contrast strategy was introduced to explore the global saliency cues with relatively high contrast in the weak light image. Finally, to acquire more accurate saliency estimation at low signal-to-noise ratio, the ACO algorithm was used to optimize the saliency map. Experiments on three public datasets (MSRA, CSSD and PASCAL-S) and the Nighttime Image (NI) dataset show that the Area Under the Curve (AUC) value of the proposed model reached 87.47%, 84.27% and 81.58% on the three public datasets respectively, and that on the NI dataset the AUC value of the model was 2.17 percentage points higher than that of the second-ranked Low Rank Matrix Recovery (LR) model. The results demonstrate that, compared to 11 mainstream saliency detection models, the proposed model produces detection results with more accurate structure and clearer boundaries, and effectively suppresses the interference of weak light scenes on the detection performance of salient objects.

Semantic SLAM algorithm based on deep learning in dynamic environment

ZHENG Sicheng, KONG Linghua, YOU Tongfei, YI Dingrong

Journal of Computer Applications 2021, 41 (10): 2945-2951. DOI: 10.11772/j.issn.1001-9081.2020111885

Concerning the problem that moving objects in the application scenes reduce the positioning accuracy and robustness of the visual Simultaneous Localization And Mapping (SLAM) system, a semantic information based visual SLAM algorithm in dynamic environment was proposed. Firstly, the traditional visual SLAM front end was combined with the YOLOv4 object detection algorithm: during the extraction of ORB (Oriented FAST and Rotated BRIEF) features of the input image, the image was semantically segmented. Then, the object type was judged to obtain the area of the dynamic object in the image, and the feature points distributed on the dynamic object were eliminated. Finally, the camera pose was solved by using inter-frame matching between the processed feature points and the adjacent frames. The test results on TUM dataset show that, in a high dynamic environment, the accuracy of the pose estimation of this algorithm is 96.78% higher than that of ORB-SLAM2 (Oriented FAST and Rotated BRIEF SLAM2), and the average time consumption per frame of the tracking thread of the algorithm is 0.0655 s, which is the shortest among the compared SLAM algorithms used in dynamic environment. The above experimental results illustrate that the proposed algorithm can realize real-time precise positioning and mapping in dynamic environment.
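
The step of eliminating feature points that lie on dynamic objects can be sketched as follows. The sketch assumes that the detector returns labeled axis-aligned bounding boxes and that only the class "person" is treated as dynamic; both are illustrative assumptions, and the data layout and names are hypothetical rather than the authors' interfaces.

```python
DYNAMIC_CLASSES = {"person"}  # assumption: which labels count as dynamic objects

def remove_dynamic_keypoints(keypoints, detections):
    """Drop keypoints (x, y) that fall inside a box (label, x1, y1, x2, y2) of a dynamic class."""
    dynamic_boxes = [d[1:] for d in detections if d[0] in DYNAMIC_CLASSES]

    def inside(point, box):
        x, y = point
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2

    return [kp for kp in keypoints
            if not any(inside(kp, box) for box in dynamic_boxes)]

keypoints = [(10, 10), (55, 60), (200, 120), (90, 90)]
detections = [("person", 50, 50, 100, 100), ("chair", 180, 100, 220, 140)]
static_keypoints = remove_dynamic_keypoints(keypoints, detections)
print(static_keypoints)  # only these points are used for pose estimation
```

The point inside the chair box is kept because a chair is treated as static here; only points on the person are discarded before inter-frame matching.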

Facial landmark detection based on ResNeXt with asymmetric convolution and squeeze excitation

WANG Hebing, ZHANG Chunmei

Journal of Computer Applications 2021, 41 (9): 2741-2747. DOI: 10.11772/j.issn.1001-9081.2020111847

The cascaded Deep Convolutional Neural Network (DCNN) algorithm was the first model to use Convolutional Neural Network (CNN) in facial landmark detection, and the use of CNN improved the accuracy significantly. However, this strategy needs to repeatedly perform regression processing on the data between adjacent stages, resulting in a complex algorithm procedure. Therefore, a facial landmark detection algorithm based on Asymmetric Convolution-Squeeze Excitation-Next Residual Network (AC-SE-ResNeXt) was proposed, which uses only single-stage regression to simplify the procedure and solve the non-real-time problem of data preprocessing between adjacent stages. In order to maintain the accuracy, the Asymmetric Convolution (AC) module and the Squeeze-and-Excitation (SE) module were added to the Next Residual Network (ResNeXt) block to construct the AC-SE-ResNeXt network model. At the same time, in order to better fit faces in complex environments with different illuminations, postures and expressions, the AC-SE-ResNeXt network model was deepened to 101 layers. The trained model was tested on the datasets BioID and LFPW respectively. The overall mean error rate of the model for five-point facial landmark detection was 1.99% on BioID dataset and 2.3% on LFPW dataset. Experimental results show that, with the simplified algorithm procedure and end-to-end processing, the improved algorithm maintains the same accuracy as the cascaded DCNN algorithm while significantly increasing the robustness.

General object detection framework based on improved Faster R-CNN

MA Jialiang, CHEN Bin, SUN Xiaofei

Journal of Computer Applications 2021, 41 (9): 2712-2719. DOI: 10.11772/j.issn.1001-9081.2020111852

Aiming at the problem that current detectors based on deep learning cannot effectively detect objects with irregular shapes or large differences between length and width, an improved two-stage object detection framework named Accurate R-CNN was proposed based on the traditional Faster Region-based Convolutional Neural Network (Faster R-CNN) algorithm. First of all, a novel Intersection over Union (IoU) metric, Effective Intersection over Union (EIoU), was proposed to reduce the proportion of redundant bounding boxes in the training data by using the centrality weight. Then, a context related Feature Reassignment Module (FRM) was proposed to re-encode the features by the remote dependency and local context information of objects, so as to make up for the loss of shape information in the pooling process. Experimental results show that on the Microsoft Common Objects in COntext (MS COCO) dataset, for the bounding box detection task, when using Residual Networks (ResNets) with two different depths of 50 and 101 as the backbone networks, Accurate R-CNN has Average Precision (AP) improvements of 1.7 percentage points and 1.1 percentage points respectively compared to the baseline model Faster R-CNN, which are significantly higher than those of the mask-based detectors with the same backbone networks. After adding the mask branch, for the instance segmentation task, when ResNets with two different depths are used as the backbone networks, the mask Average Precisions of Accurate R-CNN are increased by 1.2 percentage points and 1.1 percentage points respectively compared with Mask Region-based Convolutional Neural Network (Mask R-CNN). The research results illustrate that, compared to the baseline model, Accurate R-CNN achieves better performance on different datasets and different tasks.

3D point cloud face recognition based on deep learning

GAO Gong, YANG Hongyu, LIU Hong

Journal of Computer Applications 2021, 41 (9): 2736-2740. DOI: 10.11772/j.issn.1001-9081.2020111826

In order to enhance the robustness of 3D point cloud face recognition systems to multiple expressions and multiple poses, a deep learning-based point cloud feature extraction network named ResPoint was proposed. Modules such as grouping, sampling and local feature extraction (ResConv) were used in the ResPoint network, and skip connections were used in the ResConv module, so that the proposed network achieved good recognition results on sparse point clouds. Firstly, the nose tip was located by using the geometric feature points of the face, and the face area was cropped with this point as the center. Because the obtained area contained noisy points and holes, Gaussian filtering and 3D cubic interpolation were applied to it. Secondly, the ResPoint network was used to extract features from the preprocessed point cloud data. Finally, the features were combined in the fully connected layer to realize the classification of 3D faces. In the experiments on the CASIA 3D face database, the recognition accuracy of the ResPoint network is increased by 5.06% compared with that of the Relation-Shape Convolutional Neural Network (RS-CNN). Experimental results show that the ResPoint network increases the depth of the network while using different convolution kernels to extract features, and therefore has better feature extraction capability.
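
The cropping step in the preprocessing can be sketched as follows. This is a minimal illustration that assumes the nose tip is already known and that the face area is taken as a sphere around it; the function name `crop_face`, the radius value and the sample coordinates are hypothetical, and the filtering and interpolation steps are omitted.

```python
import math


def crop_face(points, nose_tip, radius):
    """Keep the 3D points that lie within `radius` of the nose tip."""
    return [p for p in points if math.dist(p, nose_tip) <= radius]


# Toy point cloud: two points near the nose tip and one far away.
cloud = [(0.0, 0.0, 0.0), (30.0, 40.0, 0.0), (0.0, 0.0, 120.0)]
face = crop_face(cloud, nose_tip=(0.0, 0.0, 0.0), radius=90.0)
```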

Indoor scene recognition method combined with object detection

XU Jianglang, LI Linyan, WAN Xinjun, HU Fuyuan

Journal of Computer Applications 2021, 41 (9): 2720-2725. DOI: 10.11772/j.issn.1001-9081.2020111815

In methods that combine an Object detection Network (ObjectNet) with a scene recognition network, the object features extracted by the ObjectNet and the scene features extracted by the scene network are inconsistent in dimensionality and property, and the object features contain redundant information that affects the scene judgment, resulting in low scene recognition accuracy. To solve this problem, an improved indoor scene recognition method combined with object detection was proposed. First, the Class Conversion Matrix (CCM) was introduced into the ObjectNet to convert the object features output by the ObjectNet, so that the dimension of the object features was consistent with that of the scene features. As a result, the information loss caused by the inconsistency of feature dimensions was reduced. Then, the Context Gating (CG) mechanism was used to suppress the redundant information in the features, reducing the weight of irrelevant information and increasing the contribution of object features to scene recognition. The recognition accuracy of the proposed method on the MIT Indoor67 dataset reaches 90.28%, which is 0.77 percentage points higher than that of the Spatial-layout-maintained Object Semantics Features (SOSF) method; and the recognition accuracy of the proposed method on the SUN397 dataset is 81.15%, which is 1.49 percentage points higher than that of the Hierarchy of Alternating Specialists (HoAS) method. Experimental results show that the proposed method improves the accuracy of indoor scene recognition.
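
The two steps can be sketched roughly as below. The sketch assumes that the CCM acts as a plain matrix multiplication that maps object-class scores to the scene-feature dimension, and that context gating takes the commonly used form of an element-wise product between the features and a sigmoid gate; the abstract specifies neither, and all names and numbers are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def context_gate(features, weights, bias):
    """Scale each feature by a gate in (0, 1) computed from all features."""
    gates = [sigmoid(z + b) for z, b in zip(matvec(weights, features), bias)]
    return [g * f for g, f in zip(gates, features)]


# Three object-class scores converted to a two-dimensional scene-sized vector.
object_scores = [0.2, 0.5, 0.3]
ccm = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
converted = matvec(ccm, object_scores)

# With zero weights and bias every gate is 0.5, so each feature is halved.
gated = context_gate(converted, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```

In a trained model the gate weights would be learned, so features that do not help the scene decision receive gates close to 0.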

Object tracking algorithm of fully-convolutional Siamese networks with rotation and scale estimation

JI Zhangjian, REN Xingwang

Journal of Computer Applications 2021, 41 (9): 2705-2711. DOI: 10.11772/j.issn.1001-9081.2020111805

In the object tracking task, the Fully-Convolutional Siamese networks (SiamFC) tracking method suffers from tracking errors or inaccurate tracking results caused by the rotation and scale variation of objects. Therefore, a SiamFC tracking algorithm with rotation and scale estimation was proposed, which consists of a location module and a rotation-scale estimation module. Firstly, in the location module, the tracking position was obtained by using the SiamFC algorithm, and this position was adjusted by combining the rotation and scale information. Then, in the rotation-scale estimation module, since image rotation and scale variation become translational motions in the log-polar coordinate system, the object search area was transformed from the Cartesian coordinate system to the log-polar coordinate system, so that the scale and rotation angle of the object were estimated by using correlation filtering. Finally, an object tracking model that can simultaneously estimate object position, rotation angle and scale variation was obtained. In the comparison experiments, the proposed algorithm achieved an average success rate and accuracy of 57.7% and 81.4% on the Visual Tracker Benchmark 2015 (OTB2015) dataset, and an average success rate and accuracy of 51.8% and 53.3% on the Planar Object Tracking in the wild (POT) dataset with object rotation and scale variation. Compared with those of the SiamFC algorithm, the success rate and accuracy of the proposed algorithm were increased by 13.5 percentage points and 13.4 percentage points on average. Experimental results verify that the proposed algorithm can effectively solve the tracking challenges caused by object rotation and scale variation.
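
The property that the rotation-scale module relies on can be checked numerically: in log-polar coordinates, rotating a point by an angle and scaling it by a factor simply shifts its coordinates by that angle and by the logarithm of that factor. The sketch below illustrates only this geometric fact on a single point, not the tracker itself; the helper names are hypothetical.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log radius, angle) around the center (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)


def rotate_scale(x, y, angle, scale):
    """Rotate a point around the origin by `angle` radians, then scale it."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)


p = (3.0, 1.0)
q = rotate_scale(*p, angle=math.radians(20), scale=1.5)

rho_p, theta_p = to_log_polar(*p)
rho_q, theta_q = to_log_polar(*q)

# The transform appears as a pure translation in log-polar space.
d_rho = rho_q - rho_p        # equals log(1.5)
d_theta = theta_q - theta_p  # equals 20 degrees in radians
```

Because the change is a translation, a correlation filter applied to the log-polar image can locate it as a peak shift, in the same way that position is estimated in Cartesian space.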

Remote sensing scene classification based on bidirectional gated scale feature fusion

SONG Zhongshan, LIANG Jiarui, ZHENG Lu, LIU Zhenyu, TIE Jun

Journal of Computer Applications 2021, 41 (9): 2726-2735. DOI: 10.11772/j.issn.1001-9081.2020111778

Images in remote sensing image datasets differ greatly in shape, texture and color, and the classification accuracy of remote sensing scenes is low due to the scale differences caused by different shooting heights and angles. Therefore, a Feature Aggregation Compensation Convolution Neural Network (FAC-CNN) was proposed, which used active rotation aggregation to fuse features of different scales and improved the complementarity between bottom features and top features through a bidirectional gated method. In the network, the image pyramid was used to generate images of different scales, which were input into the branch network to extract multi-scale features, and the active rotation aggregation method was proposed to fuse features of different scales, so that the fused features had directional information. This improved the generalization ability of the model to inputs of different scales and rotations, and improved the classification accuracy of the model. On the NorthWestern Polytechnical University REmote Sensing Image Scene Classification (NWPU-RESISC) dataset, the accuracy of FAC-CNN was increased by 2.05 percentage points and 2.69 percentage points respectively compared to those of Attention Recurrent Convolutional Network based on VGGNet (ARCNet-VGGNet) and Gated Bidirectional Network (GBNet); and on the Aerial Image Dataset (AID), the accuracy of FAC-CNN was increased by 3.24 percentage points and 0.86 percentage points respectively compared to those of the two comparison networks. Experimental results show that FAC-CNN can effectively solve the problems in remote sensing image datasets and improve the accuracy of remote sensing scene classification.
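
An image pyramid of the kind mentioned above can be sketched as repeated downsampling. The version below averages non-overlapping 2×2 blocks, which is an assumption made for simplicity; the abstract does not state which scales or which resampling filter were used, and the function names are hypothetical.

```python
def downsample(image):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks.

    Assumes the image is a list of rows with even height and width.
    """
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def pyramid(image, levels):
    """Return `levels` images, from the original down to the coarsest scale."""
    out = [image]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out


image = [[1, 1, 3, 3],
         [1, 1, 3, 3],
         [5, 5, 7, 7],
         [5, 5, 7, 7]]
scales = pyramid(image, levels=3)
```

Each level would then be fed to its own branch of the network to extract features at that scale.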

Detection of left and right railway tracks based on deep convolutional neural network and clustering

ZENG Xiangyin, ZHENG Bochuan, LIU Dan

Journal of Computer Applications 2021, 41 (8): 2324-2329. DOI: 10.11772/j.issn.1001-9081.2021030385

In order to improve the accuracy and speed of railway track detection, a new method of detecting left and right railway tracks based on deep Convolutional Neural Network (CNN) and clustering was proposed. Firstly, the labeled images in the dataset were processed: each original labeled image was divided uniformly into many grids, and the railway track information in each grid region was represented by one pixel, so as to construct reduced versions of the railway track labeled images. Secondly, based on the reduced labeled images, a new deep CNN for railway track detection was proposed. Finally, a clustering method was proposed to distinguish the left and right railway tracks. The proposed method reaches an accuracy of 96% and a speed of 155 frames/s on images with a size of 1000 pixel×1000 pixel. Experimental results demonstrate that the proposed method has both high detection accuracy and fast detection speed.
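
The label reduction step can be sketched as follows. The sketch assumes a binary label mask and marks a grid cell as track if any pixel inside it is track; the abstract does not state the exact rule for choosing the representative pixel, so this rule, the name `reduce_label` and the toy mask are assumptions.

```python
def reduce_label(mask, cell):
    """Shrink a binary label mask so that each cell x cell grid becomes one pixel.

    A reduced pixel is 1 if the corresponding grid contains any track pixel.
    """
    rows, cols = len(mask), len(mask[0])
    reduced = []
    for r0 in range(0, rows, cell):
        out_row = []
        for c0 in range(0, cols, cell):
            block = [
                mask[r][c]
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            out_row.append(1 if any(block) else 0)
        reduced.append(out_row)
    return reduced


mask = [[0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0]]
small = reduce_label(mask, cell=2)
```

Predicting the reduced image instead of the full-resolution mask shrinks the output the network has to produce, which is consistent with the high frame rate reported above.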

Indefinite reconstruction method of spatial data based on multi-resolution generative adversarial network

GUAN Qijie, ZHANG Ting, LI Deya, ZHOU Shaojing, DU Yi

Journal of Computer Applications 2021, 41 (8): 2306-2311. DOI: 10.11772/j.issn.1001-9081.2020101541

In the field of indefinite spatial data reconstruction, Multiple-Point Statistics (MPS) has been widely used, but its applicability is limited by its high computational cost. A spatial data reconstruction method based on a multi-resolution Generative Adversarial Network (GAN) model was proposed, in which a pyramid-structured fully convolutional GAN model was used to learn training images with different resolutions. In the method, detailed features were captured from high-resolution training images and large-scale features were captured from low-resolution training images. Therefore, the image reconstructed by this method contained the global and local structural information of the training image while maintaining a certain degree of randomness. By comparing the proposed algorithm with the representative algorithms in MPS and the GAN method applied in spatial data reconstruction, it can be seen that the total time of 10 reconstructions of the proposed algorithm is reduced by about 1 h, the difference between the average porosity of the algorithm and the training image porosity is reduced to 0.000 2, and the variogram curve and the Multi-Point Connectivity (MPC) curve of the algorithm are closer to those of the training image, showing that the proposed algorithm has better reconstruction quality.
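
The porosity comparison used in the evaluation can be illustrated with a small calculation. The sketch assumes binary images in which the value 1 marks a pore and 0 marks solid material; this encoding, the name `porosity` and the toy images are assumptions, not data from the paper.

```python
def porosity(image, pore_value=1):
    """Fraction of pixels in a binary image that are pores."""
    total = sum(len(row) for row in image)
    pores = sum(1 for row in image for v in row if v == pore_value)
    return pores / total


training = [[1, 0, 0, 1],
            [0, 0, 1, 0]]
reconstruction = [[1, 0, 1, 1],
                  [0, 0, 1, 0]]

# A smaller gap means the reconstruction preserves the pore fraction better.
gap = abs(porosity(reconstruction) - porosity(training))
```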

Classification of functional magnetic resonance imaging data based on semi-supervised feature selection by spectral clustering

ZHU Cheng, ZHAO Xiaoqi, ZHAO Liping, JIAO Yuhong, ZHU Yafei, CHENG Jianying, ZHOU Wei, TAN Ying

Journal of Computer Applications 2021, 41 (8): 2288-2293. DOI: 10.11772/j.issn.1001-9081.2020101553

To address the high dimensionality and small sample size of functional Magnetic Resonance Imaging (fMRI) data, a Semi-Supervised Feature Selection by Spectral Clustering (SS-FSSC) model was proposed. Firstly, a prior brain region template was used to extract the time series signals. Then, the Pearson correlation coefficient and the Order Statistics Correlation Coefficient (OSCC) were selected to describe the functional connection features between brain regions, and spectral clustering was performed on the features. Finally, a feature importance criterion based on the Constraint score was adopted to select feature subsets, and the subsets were input into the Support Vector Machine (SVM) classifier for classification. In the experiments, 100 runs of five-fold cross-validation on the public COBRE (Center for Biomedical Research Excellence) schizophrenia dataset show that when the number of retained features is 152, the highest average accuracy of the proposed model for schizophrenia is about 77%, and the highest accuracy is 95.83%. Analysis of the experimental results shows that by retaining only 16 functional connection features for classifier training, the model can stably achieve an average accuracy of more than 70%. In addition, in the results obtained by the proposed model, the Intracalcarine Cortex has the highest occurrence frequency among the 10 brain regions corresponding to the functional connections, which is consistent with existing research on schizophrenia.
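
The first feature-construction step, describing functional connections by Pearson correlation, can be sketched as below. The OSCC, the spectral clustering and the Constraint score selection are not reproduced; the function names and the toy time series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_features(series):
    """Correlations for every pair of brain regions (upper triangle only)."""
    n = len(series)
    return [
        pearson(series[i], series[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


# One time series per brain region (three regions, four time points each).
regions = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
features = connectivity_features(regions)
```

With R regions this yields R(R-1)/2 features per subject, which explains why the feature count quickly exceeds the number of subjects and why feature selection is needed.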

Review of remote sensing image change detection

REN Qiuru, YANG Wenzhong, WANG Chuanjian, WEI Wenyu, QIAN Yunyun

Journal of Computer Applications 2021, 41 (8): 2294-2305. DOI: 10.11772/j.issn.1001-9081.2020101632

As a key technology of land use/land cover detection, change detection aims to detect the changed part and its type in the remote sensing data of the same region in different periods. In view of the problems in traditional change detection methods, such as heavy manual labor and poor detection results, a large number of change detection methods based on remote sensing images have been proposed. In order to better understand the change detection technology based on remote sensing images and to support further study of change detection methods, a comprehensive review of change detection was carried out by sorting, analyzing and comparing a large number of studies on change detection. Firstly, the development process of change detection was described. Then, the research progress of change detection was summarized in detail from three aspects: data selection and preprocessing, change detection technology, and post-processing and precision evaluation, where the change detection technology was summarized mainly in terms of analysis unit and comparison method. Finally, the problems in each stage of change detection were summarized and future development directions were proposed.

Nonlinear constraint based quasi-homography warps for image stitching

WANG Huai, WANG Zhanqing

Journal of Computer Applications 2021, 41 (8): 2318-2323. DOI: 10.11772/j.issn.1001-9081.2020101637

In order to solve the problem of longitudinal projection distortion in non-overlapping regions of images caused by the quasi-homography warp algorithm for image stitching, an image stitching algorithm based on nonlinear constraint was proposed. Firstly, the nonlinear constraint was used to make the image regions around the dividing line transition smoothly. Then, the linear equation of the quasi-homography warp was replaced by a parabolic equation. Finally, the mesh-based method was used to improve the speed of image texture mapping, and the method based on the optimal stitching line was used to fuse the images. For images of 1 200 pixel×1 600 pixel, the texture mapping time of the proposed algorithm ranges from 4 s to 7 s, and the average deviation degree of the diagonal structure ranges from 11 to 31. Compared with the quasi-homography warp algorithm for image stitching, the proposed algorithm reduces the texture mapping time by 55% to 67% and the average deviation degree of the diagonal structure by 36% to 62%. It can be seen that the proposed algorithm not only corrects the oblique diagonal structure, but also improves the efficiency of image stitching. Experimental results show that the proposed algorithm achieves better results in improving the visual effect of stitched images.

Review of deep learning-based medical image segmentation

CAO Yuhong, XU Hai, LIU Sun'ao, WANG Zixiao, LI Hongliang

Journal of Computer Applications 2021, 41 (8): 2273-2287. DOI: 10.11772/j.issn.1001-9081.2020101638

As a fundamental and key task in computer-aided diagnosis, medical image segmentation aims to accurately recognize target regions such as organs, tissues and lesions at pixel level. Different from natural images, medical images show high complexity in texture and have boundaries that are difficult to judge because of ambiguity, which results from the heavy noise caused by the limitations of imaging technology and equipment. Furthermore, annotating medical images depends heavily on the expertise and experience of experts, which leads to limited available annotations for training and to potential annotation errors. Because medical images suffer from ambiguous boundaries, limited annotated data and large annotation errors, it is a great challenge for auxiliary diagnosis systems based on traditional image segmentation algorithms to meet the demands of clinical applications. Recently, with the wide application of Convolutional Neural Network (CNN) in computer vision and natural language processing, deep learning-based medical segmentation algorithms have achieved tremendous success. Firstly, the latest research progress of deep learning-based medical image segmentation was summarized, including the basic architecture, loss function, and optimization method of the medical image segmentation algorithms. Then, in view of the limited annotated data for medical images, the mainstream semi-supervised studies on medical image segmentation were summed up and analyzed. Besides, the studies related to measuring the uncertainty of annotation errors were introduced. Finally, the characteristics of medical image segmentation were summarized and analyzed, and its potential future trends were listed.

Noise image segmentation by adaptive wavelet transform based on artificial bee swarm and fuzzy C-means

SHI Xuesong, LI Xianhua, SUN Qing, SONG Tao

Journal of Computer Applications 2021, 41 (8): 2312-2317. DOI: 10.11772/j.issn.1001-9081.2020101684

To address the problem that the traditional Fuzzy C-Means (FCM) clustering algorithm is easily affected by noise when processing noisy images, a noisy image segmentation method based on wavelet domain feature enhancement and FCM was proposed. Firstly, the noisy image was decomposed by a two-dimensional wavelet transform. Secondly, edge enhancement was applied to the approximation coefficients, the Artificial Bee Colony (ABC) optimization algorithm was used to perform threshold processing on the detail coefficients, and then wavelet reconstruction was carried out on the processed coefficients. Finally, the reconstructed image was segmented by the FCM algorithm. Five typical grayscale images were selected, and Gaussian noise and salt-and-pepper noise were added to them respectively. Various methods were used to segment them, and the Peak Signal-to-Noise Ratio (PSNR) and Misclassification Error (ME) of the segmented images were taken as performance indicators. Experimental results show that the PSNR of the images segmented by the proposed method is at most 281% and 54% higher than that of the images segmented by the traditional FCM clustering segmentation method and the Particle Swarm Optimization (PSO) segmentation method respectively, and the ME of the images segmented by the proposed method is at most 55% and 41% lower than those of the comparison methods respectively. It can be seen that the proposed segmentation method preserves the edge texture information well, and that its anti-noise and segmentation performance are improved.
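
The two performance indicators can be sketched as follows. PSNR is computed with its usual definition for 8-bit images, and ME is taken here as the fraction of pixels whose foreground/background label differs from the ground truth, which is the common definition for binary segmentation; the abstract does not spell out either formula, so both, together with the function names, should be read as assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size grayscale images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(reference, test)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def misclassification_error(truth, result):
    """Fraction of pixels whose binary label differs from the ground truth."""
    pairs = [
        (t, r)
        for row_t, row_r in zip(truth, result)
        for t, r in zip(row_t, row_r)
    ]
    return sum(1 for t, r in pairs if t != r) / len(pairs)


truth = [[1, 0], [1, 0]]
segmented = [[1, 1], [1, 0]]
me = misclassification_error(truth, segmented)
```

A higher PSNR and a lower ME both indicate a segmentation closer to the reference.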

Medical image fusion based on edge-preserving decomposition and improved sparse representation

PEI Chunyang, FAN Kuangang, MA Zheng

Journal of Computer Applications 2021, 41 (7): 2092-2099. DOI: 10.11772/j.issn.1001-9081.2020081303

To address the problems of artifacts and loss of details in multimodal medical image fusion, a two-scale multimodal medical image fusion framework using multi-scale edge-preserving decomposition and sparse representation was proposed. Firstly, the source image was decomposed at multiple scales by using an edge-preserving filter to obtain the smoothing and detail layers of the source image. Then, an improved sparse representation fusion algorithm was employed to fuse the smoothing layers: an image block selection strategy was proposed to construct the dataset of the over-complete dictionary, the dictionary learning algorithm was used to train the joint dictionary, and a novel multi-norm based activity level measurement method was introduced to select the sparse coefficients. The detail layers were merged by an adaptive weighted local regional energy fusion rule. Finally, multi-scale reconstruction was performed on the fused smoothing layer and detail layers to obtain the fused image. Comparison experiments were conducted on medical images from three different imaging modalities. The results demonstrate that the proposed method preserves more salient edge features with improved contrast, and has advantages in both visual effect and objective evaluation compared to other multi-scale transform and sparse representation methods.

Medical image fusion with intuitionistic fuzzy set and intensity enhancement

ZHANG Linfa, ZHANG Yufeng, WANG Kun, LI Zhiyao

Journal of Computer Applications 2021, 41 (7): 2082-2091. DOI: 10.11772/j.issn.1001-9081.2020101539

Image fusion technology plays an important role in computer-aided diagnosis. Detail extraction and energy preservation are two key issues in image fusion, and traditional fusion methods address them simultaneously through the design of the fusion method. However, this tends to cause information loss or insufficient energy preservation. In view of this, a fusion method was proposed to solve the problems of detail extraction and energy preservation separately. The first part of the method aimed at detail extraction. Firstly, the Non-Subsampled Shearlet Transform (NSST) was used to divide the source image into low-frequency and high-frequency subbands. Then, an improved energy-based fusion rule was used to fuse the low-frequency subbands, and a strategy based on the intuitionistic fuzzy set theory was proposed for the fusion of the high-frequency subbands. Finally, the inverse NSST was employed to reconstruct the image. In the second part, an intensity enhancement method was proposed for energy preservation. The proposed method was verified on 43 groups of images and compared with eight other fusion methods such as Principal Component Analysis (PCA) and Local Laplacian Filtering (LLF). The fusion results on two different categories of medical image fusion (Magnetic Resonance Imaging (MRI) and Positron Emission computed Tomography (PET), MRI and Single-Photon Emission Computed Tomography (SPECT)) show that the proposed method achieves more competitive performance in both visual quality and objective evaluation indicators including Mutual Information (MI), Spatial Frequency (SF), Q value, Average Gradient (AG), Entropy of Information (EI), and Standard Deviation (SD), and can improve the quality of medical image fusion.
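
Two of the listed indicators, EI and SD, can be sketched with their textbook definitions: the Shannon entropy of the gray-level histogram and the population standard deviation of the pixel values. The paper may normalize them differently, so the formulas and the function names below are assumptions.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [v for row in image for v in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def std_dev(image):
    """Population standard deviation of the pixel values."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))


fused = [[0, 255],
         [0, 255]]
ei = entropy(fused)
sd = std_dev(fused)
```

Under these definitions, larger values indicate that the fused image carries more information and more contrast.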
