Commonsense question answering model based on cross-modal contrastive learning
Yuanlong WANG, Tinghua LIU, Hu ZHANG
Journal of Computer Applications    2025, 45 (3): 732-738.   DOI: 10.11772/j.issn.1001-9081.2024081139

Commonsense Question Answering (CQA) aims to use commonsense knowledge to answer questions described in natural language automatically to obtain accurate answer, and it belongs to intelligent question answering field. Typically, this task demands background commonsense knowledge to enhance the model in problem-solving capability. While most related methods rely on extracting and utilizing commonsense from textual data, however, commonsense is often implicit and not always represented in the text directly, which affects the application range and effectiveness of these methods. Therefore, a cross-modal contrastive learning-based CQA model was proposed to fully utilize cross-modal information for enriching the expression of commonsense knowledge. Firstly, a cross-modal commonsense representation module was designed to integrate the commonsense bases and a cross-modal large model, thereby obtaining a cross-modal commonsense representation. Secondly, in order to enhance the ability of the model to distinguish among different options, contrastive learning was carried out on the cross-modal representations of problems and options. Finally, the softmax layer was used to generate relevance scores for the problem option pairs, and the option with the highest score was taken as the final predicted answer. Experimental results on public datasets CommonSenseQA (CSQA) and OpenBookQA (OBQA) show that compared to DEKCOR (DEscriptive Knowledge for COmmonsense question answeRing), the proposed model is improved by 1.46 and 0.71 percentage points respectively in accuracy.

Knowledge-guided visual relationship detection model
Yuanlong WANG, Wenbo HU, Hu ZHANG
Journal of Computer Applications    2024, 44 (3): 683-689.   DOI: 10.11772/j.issn.1001-9081.2023040413

The task of Visual Relationship Detection (VRD) is to detect the relationships between target objects on the basis of object recognition, and it is a key technology for visual understanding and reasoning. Because objects interact and combine with each other, the number of candidate object pairs explodes combinatorially, producing many weakly correlated entity pairs and in turn lowering the recall of subsequent relationship detection. To solve these problems, a knowledge-guided visual relationship detection model was proposed. Firstly, visual knowledge was constructed: data analysis and statistics were carried out on the entity labels and relationship labels in common visual relationship detection datasets, and the co-occurrence frequencies between entities and relationships were obtained as visual knowledge. Then, the constructed visual knowledge was used to optimize the combination of entity pairs, decreasing the scores of weakly correlated pairs and increasing those of strongly correlated ones; the entity pairs were then ranked by score and the low-scoring pairs were deleted. The relationship scores between entities were also optimized in a knowledge-guided way, so as to improve the recall of the model. The effectiveness of the proposed model was verified on the public datasets VG (Visual Genome) and VRD. In the predicate classification task, compared with the existing model PE-Net (Prototype-based Embedding Network), the recall rates Recall@50 and Recall@100 are improved by 1.84 and 1.14 percentage points respectively on the VG dataset; compared with Coacher, Recall@20, Recall@50 and Recall@100 are increased by 0.22, 0.32 and 0.31 percentage points respectively on the VRD dataset.
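The knowledge-guided re-scoring of entity pairs can be sketched as below. The co-occurrence table and the linear blending formula are hypothetical placeholders for the statistics the model derives from dataset labels:

```python
# Co-occurrence counts between (subject, object) label pairs, as would be
# gathered from dataset statistics; the numbers here are made up.
CO_OCCURRENCE = {
    ("person", "horse"): 120,
    ("person", "shirt"): 300,
    ("horse", "shirt"): 2,
}
TOTAL = sum(CO_OCCURRENCE.values())

def knowledge_score(pair, detector_score, weight=0.5):
    """Blend the detector's pair score with the co-occurrence prior."""
    prior = CO_OCCURRENCE.get(pair, 0) / TOTAL
    return (1 - weight) * detector_score + weight * prior

def rank_and_prune(pairs, scores, keep=2):
    """Re-rank entity pairs by knowledge-adjusted score and keep the top-k,
    deleting the weakly correlated pairs before relationship detection."""
    adjusted = [(p, knowledge_score(p, s)) for p, s in zip(pairs, scores)]
    adjusted.sort(key=lambda x: x[1], reverse=True)
    return [p for p, _ in adjusted[:keep]]
```

Given equal detector scores, pairs that rarely co-occur in the dataset (here, "horse"-"shirt") are pushed below the cut and pruned.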

Multivariate time series anomaly detection based on multi-domain feature extraction
Pei ZHAO, Yan QIAO, Rongyao HU, Xinyu YUAN, Minyue LI, Benchu ZHANG
Journal of Computer Applications    2024, 44 (11): 3419-3426.   DOI: 10.11772/j.issn.1001-9081.2023111636

Due to the high dimensionality and complex variable distributions of Multivariate Time Series (MTS) data, existing anomaly detection models generally suffer from high error rates and training difficulties on MTS datasets. Moreover, most models only consider the spatial-temporal features of time series samples, which are insufficient to fully characterize time series. To solve these problems, a multivariate Time Series anomaly detection model based on Multi-domain Feature Extraction (MFE-TS) was proposed. Firstly, starting from the original data domain, a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN) were used to extract the temporal and spatial correlation features of the MTS respectively. Secondly, the Fourier transform was used to convert the original time series into the frequency domain, and a Transformer was used to learn the amplitude and phase features of the data there. Multi-domain feature learning models time series features more comprehensively, thereby improving the anomaly detection performance of the model on MTS. In addition, a masking strategy was introduced to further enhance the feature learning ability of the model and give it a certain degree of noise resistance. Experimental results show that MFE-TS has superior performance on multiple real MTS datasets, and still maintains good detection accuracy on noisy datasets.
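The frequency-domain branch rests on converting each series into amplitude and phase features via the Fourier transform. A minimal sketch, with a naive DFT standing in for the model's FFT front end:

```python
import cmath
import math

def dft(xs):
    """Naive discrete Fourier transform (O(n^2)), enough for a sketch."""
    n = len(xs)
    return [sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def frequency_features(series):
    """Amplitude and phase per frequency bin: the inputs the Transformer
    branch would consume in the frequency domain."""
    coeffs = dft(series)
    amplitude = [abs(c) for c in coeffs]
    phase = [cmath.phase(c) for c in coeffs]
    return amplitude, phase
```

For a constant series all energy lands in the zero-frequency bin, which is a quick sanity check on the transform.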

Lightweight algorithm of 3D mesh model for preserving detailed geometric features
Yun ZHANG, Shuying WANG, Qing ZHENG, Haizhu ZHANG
Journal of Computer Applications    2023, 43 (4): 1226-1232.   DOI: 10.11772/j.issn.1001-9081.2022030434

An important strategy for lightweighting a 3D model is to use a mesh simplification algorithm to reduce the number of triangular meshes on the model surface. The widely used edge collapse algorithm is more efficient and achieves a better simplification effect than other mesh simplification algorithms, but some detailed geometric features may be damaged or lost during its simplification process. Therefore, the approximate curvature of the curve and the average area of the first-order neighborhood triangles of the edge to be collapsed were added as penalty factors to optimize the edge collapse cost of the original algorithm. Firstly, according to the definition of curve curvature in geometry, a calculation formula for the approximate curvature of the curve was proposed. Then, in the calculation of the vertex normal vector, two stages, area weighting and interior-angle weighting, were used to modify the initial normal vector, so that richer geometric information of the model was taken into account. The performance of the optimized algorithm was verified by experiments. Compared with the classical Quadratic Error Metric (QEM) algorithm and a mesh simplification algorithm considering angle error, the optimized algorithm reduces the maximum error by at least 73.96% and 49.77% respectively; compared with the QEM algorithm, it reduces the Hausdorff distance by at least 17.69%. It can be seen that in the process of model lightweighting, the optimized algorithm reduces the deformation of the model and better preserves its detailed geometric features.
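The penalized collapse cost can be sketched as below. The multiplicative combination of the QEM cost with curvature and area penalty factors is an illustrative assumption; the paper's exact formula is not reproduced here:

```python
def collapse_cost(quadric_error, curve_curvature, neighborhood_area,
                  alpha=1.0, beta=1.0):
    """Edge-collapse cost with curvature and area penalty factors.

    `quadric_error` is the classical QEM cost; the two penalty terms raise
    the cost of edges lying on highly curved or large neighborhoods so
    those edges are collapsed later, preserving detailed geometry.
    """
    return quadric_error * (1.0 + alpha * curve_curvature) \
                         * (1.0 + beta * neighborhood_area)

def next_edge_to_collapse(edges):
    """Pick the cheapest edge; each edge is (name, qem, curvature, area)."""
    return min(edges, key=lambda e: collapse_cost(e[1], e[2], e[3]))[0]
```

With equal QEM costs, the edge in a flat region is collapsed first while the high-curvature edge (a detailed feature) survives longer.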

Machine reading comprehension model based on event representation
Yuanlong WANG, Xiaomin LIU, Hu ZHANG
Journal of Computer Applications    2022, 42 (7): 1979-1984.   DOI: 10.11772/j.issn.1001-9081.2021050719

To truly understand a piece of text, it is very important to grasp the main clues of the original text during reading comprehension. Aiming at questions about the main clues in machine reading comprehension, a machine reading comprehension method based on event representation was proposed. Firstly, a textual event graph, covering the representation of events, the extraction of event elements and the extraction of event relations, was built from the reading material by using clue phrases. Secondly, after considering the temporal and emotional elements of events and the importance of each word in the document, the TextRank algorithm was used to select the events related to the clues. Finally, the answers to the questions were constructed from the selected clue events. Experimental results show that on a test set composed of 339 collected clue-related questions, the proposed method outperforms the sentence ranking method based on the TextRank algorithm on the BiLingual Evaluation Understudy (BLEU) and Consensus-based Image Description Evaluation (CIDEr) metrics: the BLEU-4 score is increased by 4.1 percentage points and the CIDEr score by 9 percentage points.
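The clue-event selection step uses TextRank. A minimal sketch over a toy event-similarity matrix; in the method described above, the similarity weights would come from the time, emotion and word-importance features:

```python
def textrank(similarity, d=0.85, iters=50):
    """Plain TextRank power iteration over a symmetric similarity matrix
    (zero diagonal). Returns one importance score per node."""
    n = len(similarity)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in similarity]
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = sum(similarity[j][i] / out_weight[j] * scores[j]
                       for j in range(n) if j != i and out_weight[j] > 0)
            new.append((1 - d) / n + d * rank)
        scores = new
    return scores

def top_events(events, similarity, k=2):
    """Select the k events most central to the clue graph."""
    scores = textrank(similarity)
    order = sorted(range(len(events)), key=lambda i: -scores[i])
    return [events[i] for i in order[:k]]
```

An event connected to every other event accumulates the highest score and is selected first.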

Personalized privacy protection method for data with multiple numerical sensitive attributes
Meishu ZHANG, Yabin XU
Journal of Computer Applications    2020, 40 (2): 491-496.   DOI: 10.11772/j.issn.1001-9081.2019091639

Existing privacy protection methods for data with multiple numerical sensitive attributes not only cause a large loss of information about quasi-identifier attributes, but also fail to satisfy users' personalized needs for ranking the importance of numerical sensitive attributes. To solve these problems, a personalized privacy protection method based on clustering and weighted Multi-Sensitive Bucketization (MSB) was proposed. Firstly, according to the similarity of quasi-identifiers, the dataset was divided into several subsets with similar quasi-identifier attribute values. Then, considering users' different sensitivities to sensitive attributes, the sensitivity and the bucket capacity of the multi-dimensional buckets were used to calculate the weighted selectivity and to construct the weighted multi-dimensional buckets. Finally, the data were grouped and anonymized accordingly. Eight attributes in UCI's standard Adult dataset were selected for experiments, and the proposed method was compared with MNSACM and WMNSAPM. Experimental results show that the proposed method is better overall, and is significantly superior to the comparison methods in reducing information loss and running time, thus improving data quality and operating efficiency.
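The multi-dimensional bucketization step can be sketched as below. The range boundaries are hypothetical, and the weighted-selectivity computation is omitted for brevity:

```python
def bucket_key(record, boundaries):
    """Map a record's numerical sensitive attributes to a multi-dimensional
    bucket: one index per attribute, found from its range boundaries."""
    key = []
    for value, bounds in zip(record, boundaries):
        idx = sum(1 for b in bounds if value >= b)
        key.append(idx)
    return tuple(key)

def bucketize(records, boundaries):
    """Group records into multi-dimensional buckets keyed by the ranges
    their sensitive values fall into."""
    buckets = {}
    for r in records:
        buckets.setdefault(bucket_key(r, boundaries), []).append(r)
    return buckets
```

Records whose sensitive values fall into different ranges land in distinct buckets, which is the structure the weighted grouping then operates on.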

Motor imagery electroencephalogram signal recognition method based on convolutional neural network in time-frequency domain
HU Zhangfang, ZHANG Li, HUANG Lijia, LUO Yuan
Journal of Computer Applications    2019, 39 (8): 2480-2483.   DOI: 10.11772/j.issn.1001-9081.2018122553
To solve the problem of the low recognition rate of motor imagery ElectroEncephaloGram (EEG) signals, and considering that EEG signals contain abundant time-frequency information, a recognition method based on a Convolutional Neural Network (CNN) in the time-frequency domain was proposed. Firstly, the Short-Time Fourier Transform (STFT) was applied to the relevant frequency bands of the EEG signals to construct a two-dimensional time-frequency map composed of the time-frequency maps of multiple electrodes, which was used as the input of the CNN. Secondly, focusing on the time-frequency characteristics of this map, a novel CNN structure was designed using one-dimensional convolutions. Finally, the features extracted by the CNN were classified by a Support Vector Machine (SVM). Experimental results on a BCI dataset show that the average recognition rate of the proposed method is 86.5%, higher than that of traditional motor imagery EEG signal recognition methods; the method has also been applied to an intelligent wheelchair, which proves its effectiveness.
Adaptive Monte-Carlo localization algorithm integrated with two-dimensional code information
HU Zhangfang, ZENG Linquan, LUO Yuan, LUO Xin, ZHAO Liming
Journal of Computer Applications    2019, 39 (4): 989-993.   DOI: 10.11772/j.issn.1001-9081.2018091910
The Monte Carlo Localization (MCL) algorithm suffers from problems such as heavy computation and poor positioning accuracy. Given the rich information carried by two-dimensional codes and the ease and convenience of recognizing them, an adaptive MCL algorithm integrating two-dimensional code information was proposed. Firstly, the cumulative error of the odometer model was corrected with the absolute position information provided by the two-dimensional codes, and then sampling was performed. Secondly, the measurement model provided by the laser sensor was used to determine the importance weights of the particles. Finally, since the fixed sample set used in the resampling step caused heavy computation, Kullback-Leibler Distance (KLD) sampling was used in resampling to reduce the computation by adaptively adjusting the number of particles required for the next iteration according to the distribution of the particles in state space. Experimental results on a mobile robot show that, compared with the traditional Monte Carlo algorithm, the proposed algorithm improves the localization accuracy by 15.09% and reduces the localization time by 15.28%.
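The KLD-based adaptive particle count follows the standard KLD-sampling bound; a sketch, where the error bound and normal quantile defaults are illustrative rather than the paper's settings:

```python
import math

def kld_sample_size(k, epsilon=0.05, z_quantile=1.96):
    """Particles needed so the KL divergence between the sampled and true
    distributions stays below `epsilon` with the confidence given by the
    normal quantile (1.96 ~ 97.5%), following the KLD-sampling bound.
    `k` is the number of occupied histogram bins in state space."""
    if k < 2:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    return int(math.ceil((k - 1) / (2 * epsilon)
                         * (1 - a + math.sqrt(a) * z_quantile) ** 3))
```

When the particles concentrate in few bins (the robot is well localized), `k` is small and few particles are kept; a spread-out distribution demands more, which is where the computation savings come from.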
Visual simultaneous location and mapping based on improved closed-loop detection algorithm
HU Zhangfang, BAO Hezhang, CHEN Xu, FAN Tingkai, ZHAO Liming
Journal of Computer Applications    2018, 38 (3): 873-878.   DOI: 10.11772/j.issn.1001-9081.2017082004
Aiming at the problem that maps may be inconsistent due to the accumulation of errors in visual Simultaneous Location and Mapping (SLAM), a Visual SLAM (V-SLAM) system based on an improved closed-loop detection algorithm was proposed. To reduce the cumulative error caused by the long-term operation of mobile robots, an improved closed-loop detection algorithm was introduced: by improving the similarity score function, perceptual ambiguity was reduced and the closed-loop recognition rate was improved. At the same time, to reduce the computational complexity, the environment images and depth information were obtained directly by Kinect, and feature extraction and matching were carried out using the compact and robust ORB (Oriented FAST and Rotated BRIEF) features. The RANdom SAmple Consensus (RANSAC) algorithm was used to delete mismatched pairs to obtain more accurate matches, and the camera poses were then calculated by PnP. Stable and accurate initial pose estimates are critical to back-end processing, so the camera poses were refined by unstructured iterative optimization with g2o. Finally, in the back end, Bundle Adjustment (BA) was used as the core of the map optimization method to optimize the poses and landmarks. The experimental results show that the system meets real-time requirements and obtains more accurate pose estimates.
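The RANSAC mismatch-deletion step can be illustrated with a deliberately simplified motion model, a pure 2-D translation instead of the full PnP pose; all names here are hypothetical:

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """RANSAC over point matches [((px, py), (qx, qy)), ...] under a 2-D
    translation model: sample one match, hypothesize the shift, and keep
    the hypothesis with the most inliers. Outliers (mismatches) are the
    matches left outside the best inlier set."""
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iters):
        (px, py), (qx, qy) = rng.choice(matches)
        dx, dy = qx - px, qy - py
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tol
                   and abs(m[1][1] - m[0][1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers
```

Three consistent matches shifted by (5, 0) plus one gross mismatch yield the correct shift with the mismatch excluded from the inlier set.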
Scale-adaptive face tracking algorithm based on graph cuts theory
HU Zhangfang, QIN Yanghong
Journal of Computer Applications    2017, 37 (4): 1189-1192.   DOI: 10.11772/j.issn.1001-9081.2017.04.1189
Aiming at the problem that the tracking window grows excessively as the target is enlarged in the traditional Continuously Adaptive MeanShift (Camshift) face tracking algorithm, an adaptive-window Camshift face tracking method based on graph cuts theory was proposed. Firstly, a graph-cut region was created from the Camshift iteration result of each frame using graph cuts theory, and the skin lump was found using a Gaussian mixture model as the weights of the graph cuts, so that the tracking window could be updated from the skin lump. Then the real size of the target was obtained by computing the size of the skin lump, and whether the target needed to be re-tracked was determined by comparing the size of the skin lump in the tracking window with that in the previous frame. Finally, the skin lump in the last frame was used as the tracking target of the next frame. The experimental results demonstrate that the proposed graph-cuts-based method avoids the interference of other skin-colored targets in the background, effectively reflects the real size changes of a fast-moving face, and prevents the Camshift algorithm from losing the tracking target and falling into a local optimum, with good usability and robustness.
Destriping method based on transform domain
LIU Haizhao, YANG Wenzhu, ZHANG Chen
Journal of Computer Applications    2013, 33 (09): 2603-2605.   DOI: 10.11772/j.issn.1001-9081.2013.09.2603
To remove stripe noise from line-scan images, a transform-domain destriping method combining the Fourier transform and wavelet decomposition was proposed. Firstly, the image was decomposed by multi-resolution wavelet decomposition to separate the subband containing the stripe noise from the other subbands. Then the subband containing the stripe noise was transformed into Fourier coefficients, and these coefficients were processed by a band-stop filter to remove the stripe noise. Cotton foreign fiber images with stripe noise collected on site were used in the simulation experiments. The experimental results indicate that the proposed approach, combining the Fourier transform with wavelet decomposition, effectively removes the stripe noise from the image while preserving the characteristics of the original image, and achieves a better destriping effect than using the Fourier transform or wavelet decomposition alone.
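The band-stop filtering of Fourier coefficients can be sketched in one dimension as below. A naive DFT stands in for the subband's Fourier transform; in the method above it would be applied to the wavelet subband containing the stripes:

```python
import cmath
import math

def dft(xs):
    """Naive forward discrete Fourier transform."""
    n = len(xs)
    return [sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(cs):
    """Inverse DFT, returning the real part of each sample."""
    n = len(cs)
    return [sum(cs[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def band_stop(signal, stop_bins):
    """Zero the frequency bins holding the stripe energy (and their
    mirrored counterparts), then transform back."""
    n = len(signal)
    coeffs = dft(signal)
    for k in stop_bins:
        coeffs[k] = 0
        coeffs[(n - k) % n] = 0
    return idft(coeffs)
```

A constant background plus a periodic stripe at frequency bin 2 is restored to the flat background once that bin (and its mirror) is suppressed.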
Semi-fragile watermark algorithm based on second generation Bandelet and slant transform
WANG Shu, ZHANG Min-qing, SHEN Jun-wei, XIAO Hai-yan
Journal of Computer Applications    2012, 32 (08): 2265-2287.   DOI: 10.3724/SP.J.1087.2012.02265
Concerning general image operations and attack methods, a new semi-fragile watermarking algorithm was proposed. In this algorithm, the recovery bits generated from the compressed original image were embedded into the Least Significant Bits (LSB) of the watermarked image, and the switch coefficients, coded by a Turbo code after the second-generation Bandelet transform, were embedded into mid-frequency block regions after the slant transform. Authentication could be realized and tampering could be located by comparing the error codes generated by the Turbo code with the coefficients of the Bandelet transform. The experimental results show that the algorithm is sufficiently robust to general image operations, can accurately detect and locate maliciously tampered regions, and has a good ability to recover lost image content.
A Solution for Workflow Patterns involving Multiple Instances based on Network Partition
HU Fei-hu, ZHANG Dan-dan, YANG Hui-yuan, MA Ling
Journal of Computer Applications    2011, 31 (05): 1420-1422.   DOI: 10.3724/SP.J.1087.2011.01420
To realize the building and control of workflow patterns involving multiple instances, a solution was proposed from the perspective of network partition. The implementation method was discussed based on the RTWD net proposed by HU Fei-hu, et al. in Patent China 201010114083.9. Firstly, the sub-workflows involving multiple instances were divided into a subnet. Then the related parameters of the multiple instances were defined, and the multiple instances were controlled based on them. The control of sequential, synchronous and asynchronous parallel workflow patterns involving multiple instances with this method was also discussed. Because the divided subnet stays consistent with the definition of the workflow model, multiple instances can be scheduled by the original workflow engine, which simplifies the realization of multiple-instance patterns.
Audio source separation based on Hilbert-Huang transform
Chao-zhu ZHANG, Jian-pei ZHANG, Xiao-dong SUN
Journal of Computer Applications   
The energy-frequency distribution of a non-stationary signal cannot be obtained correctly with the short-time Fourier transform. A new method was therefore proposed to separate audio sources from a single mixture based on the Hilbert-Huang transform. The Hilbert transform, combined with the Intrinsic Mode Functions (IMFs), constitutes the Hilbert Spectrum (HS) of the mixture, which is a time-frequency representation of a non-stationary signal. The HS of the mixture was used to derive the independent source subspaces, and the time-domain source signals were reconstructed by applying the inverse transformation. The simulation results show that the proposed method is efficient and improves separation performance, and that the HS-based time-frequency representation performs better than the STFT.
Research of CT image edge detection based on improved ant colony algorithm
Jing-Hu ZHANG, Min GUO
Journal of Computer Applications   
The Ant Colony Algorithm (ACA) was applied to CT image edge detection, and a new ACA-based method of CT image edge detection was proposed. To improve the efficiency of the algorithm, the detection accuracy and the adaptability to various CT images, the basic ant colony algorithm was modified by applying different transfer principles and pheromone updating rules according to the different contents of the CT image. Computer experiments demonstrate the effectiveness of the proposed algorithm, which satisfies the demands of 3D reconstruction of CT images.
Improvement on adaptive mixture Gaussian background model
Quan-min LI, Yun-chu ZHANG
Journal of Computer Applications   
To improve the quality of motion segmentation, a background reconstruction algorithm and a foreground merging time control mechanism were incorporated into the adaptive mixture Gaussian background model. The background reconstruction algorithm constructed a static background image from a video sequence containing moving objects in the scene, and this static background image was then used to initialize the background model. The foreground merging time control mechanism was introduced to make the foreground merging time adjustable and independent of the model's learning rate. The experimental results prove the effectiveness of the algorithm.
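The matched-update rule of the mixture Gaussian model can be sketched for a single pixel and a single component; the full model keeps several Gaussians per pixel, and the learning rate and match threshold here are illustrative:

```python
import math

def update_pixel(mean, var, value, alpha=0.05, match_sigmas=2.5):
    """One running-Gaussian update for a single pixel.

    If the new intensity falls within `match_sigmas` standard deviations
    of the component, it is treated as background and the mean/variance
    are blended toward it with learning rate `alpha`; otherwise the pixel
    is flagged as foreground and the component is left unchanged.
    Returns (mean, var, is_foreground)."""
    matched = abs(value - mean) <= match_sigmas * math.sqrt(var)
    if matched:
        mean = (1 - alpha) * mean + alpha * value
        var = (1 - alpha) * var + alpha * (value - mean) ** 2
        return mean, var, False
    return mean, var, True
```

Initializing `mean`/`var` from a reconstructed static background, as the abstract describes, means this update starts from a clean model instead of absorbing the moving objects present in the first frames.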