Project Articles

    The 39th CCF National Database Conference (NDBC 2022)

    Parallel algorithm of betweenness centrality for dynamic networks
    Zhenyu LIU, Chaokun WANG, Gaoyang GUO
    Journal of Computer Applications    2023, 43 (7): 1987-1993.   DOI: 10.11772/j.issn.1001-9081.2022071121

    Betweenness centrality is a common metric for evaluating the importance of nodes in a graph. However, the update efficiency of betweenness centrality on large-scale dynamic graphs is too low to meet application requirements. With the development of multi-core technology, parallelization has become one of the effective ways to address this problem. Therefore, a Parallel Algorithm of Betweenness centrality for dynamic networks (PAB) was proposed. Firstly, the time cost of redundant node pairs was reduced through operations such as community filtering, equidistant pruning and classification screening. Then, the determinism of the algorithm was analyzed and handled to enable parallelization. Comparison experiments were conducted on real and synthetic datasets, and the results show that the update efficiency of PAB is on average 4 times that of the latest batch-iCENTRAL algorithm when edges are added. It can be seen that the proposed algorithm effectively improves the update efficiency of betweenness centrality in dynamic networks.
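
    The incremental techniques above are specific to PAB; as a general illustration of how betweenness centrality parallelizes over independent single-source computations, the minimal Python sketch below runs Brandes' dependency accumulation for each source in a worker pool and sums the per-source contributions. The graph, function names and the multiprocessing setup are illustrative assumptions, not the paper's implementation.

```python
from collections import deque
from functools import partial
from multiprocessing import Pool

def single_source_contribution(graph, source):
    """Brandes' dependency accumulation from one source (unweighted BFS)."""
    sigma = {v: 0 for v in graph}      # number of shortest paths from source
    dist = {v: -1 for v in graph}
    preds = {v: [] for v in graph}
    sigma[source], dist[source] = 1, 0
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    for w in reversed(order):          # back-propagate dependencies
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[source] = 0.0                # the source itself is excluded
    return delta

def parallel_betweenness(graph, workers=4):
    """Sum per-source contributions in parallel; each source is independent."""
    with Pool(workers) as pool:
        parts = pool.map(partial(single_source_contribution, graph), list(graph))
    bc = {v: 0.0 for v in graph}
    for part in parts:
        for v, d in part.items():
            bc[v] += d
    return bc  # divide by 2 for undirected graphs if desired

if __name__ == "__main__":
    g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
    print(parallel_betweenness(g, workers=2))
```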

    Dictionary partition vector space model for ciphertext ranked search in cloud environment
    Jiaxing LU, Hua DAI, Yuanlong LIU, Qian ZHOU, Geng YANG
    Journal of Computer Applications    2023, 43 (7): 1994-2000.   DOI: 10.11772/j.issn.1001-9081.2022071111

    Aiming at the problems that the vectors generated by the Traditional Vector Space Model (TVSM) are high-dimensional and that the vector dot product used to compute the relevance between documents and queried keywords is time-consuming, a Dictionary Partition Vector Space Model (DPVSM) for ciphertext ranked search in cloud environments was proposed. Firstly, the formal definition of DPVSM was given, and it was proved that the relevance score between queried keywords and documents in DPVSM is exactly the same as that in TVSM. Then, by adopting an equal-length dictionary partition method, an encrypted vector generation algorithm and a relevance score calculation algorithm between documents and queried keywords were proposed. Experimental results show that the space occupied by document vectors in DPVSM is much lower than that in TVSM, and the larger the number of documents, the greater the reduction. In addition, the space occupied by query vectors and the time consumed by relevance score calculation are also much lower than those of TVSM. Clearly, DPVSM is superior to TVSM in both the space efficiency of the generated vectors and the time efficiency of relevance score calculation.
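
    As a plaintext illustration of the equal-length dictionary partition idea (the encryption layer of DPVSM is omitted), the sketch below splits the dictionary into fixed-length partitions, stores only the non-empty per-partition TF-IDF sub-vectors of each document, and computes the relevance score partition by partition so that it equals the full TVSM dot product; all identifiers are hypothetical.

```python
import math
from collections import Counter

def build_dictionary(docs):
    """Global keyword dictionary in a fixed order."""
    return sorted({w for d in docs for w in d})

def partition_dictionary(vocab, part_len):
    """Split the dictionary into equal-length partitions (last one may be shorter)."""
    return [vocab[i:i + part_len] for i in range(0, len(vocab), part_len)]

def doc_subvectors(doc, partitions, idf):
    """Store only non-empty per-partition TF-IDF sub-vectors to save space."""
    tf = Counter(doc)
    subs = {}
    for pid, part in enumerate(partitions):
        vec = [tf[w] * idf[w] for w in part]
        if any(vec):
            subs[pid] = vec
    return subs

def relevance(query, partitions, subs, idf):
    """Dot product computed partition by partition; equals the full TVSM score."""
    q = Counter(query)
    score = 0.0
    for pid, part in enumerate(partitions):
        if pid not in subs:
            continue
        qvec = [q[w] * idf[w] for w in part]
        score += sum(a * b for a, b in zip(qvec, subs[pid]))
    return score

docs = [["cloud", "search", "rank"], ["cloud", "privacy"], ["rank", "search", "search"]]
vocab = build_dictionary(docs)
idf = {w: math.log(len(docs) / sum(w in d for d in docs)) for w in vocab}
parts = partition_dictionary(vocab, part_len=2)
subs = [doc_subvectors(d, parts, idf) for d in docs]
scores = [relevance(["search", "rank"], parts, s, idf) for s in subs]
print(sorted(range(len(docs)), key=lambda i: -scores[i]))  # ranked document ids
```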

    Explainable recommendation mechanism fusing collaborative knowledge graph and counterfactual inference
    Zifang XIA, Yaxin YU, Ziteng WANG, Jiaqi QIAO
    Journal of Computer Applications    2023, 43 (7): 2001-2009.   DOI: 10.11772/j.issn.1001-9081.2022071113

    In order to construct a transparent and trustworthy recommendation mechanism, related research mainly provides reasonable explanations for personalized recommendation through explainable recommendation mechanisms. However, existing explainable recommendation mechanisms have three major limitations: 1) using only correlations can provide rational but not causal explanations, and using paths to provide explanations risks privacy leakage; 2) the problem of sparse user feedback is ignored, so the fidelity of explanations is difficult to guarantee; 3) the granularities of explanations are relatively coarse, and users' personalized preferences are not considered. To solve the above problems, an explainable recommendation mechanism ERCKCI based on Collaborative Knowledge Graph (CKG) and counterfactual inference was proposed. Firstly, based on the user's own behavior sequence, counterfactual inference was applied to causal relations to achieve causal decorrelation under high sparsity, and counterfactual explanations were derived iteratively. Secondly, to improve the fidelity of explanations, the CKG and the neighborhood propagation mechanism of Graph Neural Networks (GNNs) were used to learn user and item representations on a single time slice, and users' long- and short-term preferences were captured over multiple time slices through a self-attention mechanism to enhance the user preference representation. Finally, the multi-granularity personalized preferences of users were captured via a higher-order connected subgraph of the counterfactual set to enhance the counterfactual explanations. To verify the effectiveness of the ERCKCI mechanism, comparison experiments were performed on the public datasets MovieLens(100k), Book-Crossing and MovieLens(1M). The results show that, compared with the Explainable recommendation based on Counterfactual Inference (ECI) algorithm under the Relational Collaborative Filtering (RCF) recommendation model on the first two datasets, the proposed mechanism improves the explanation fidelity by 4.89 and 3.38 percentage points respectively, reduces the size of the CF set by 63.26% and 66.24% respectively, and improves the sparsity index by 1.10 and 1.66 percentage points respectively; thus the proposed mechanism effectively improves explainability.
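
    ERCKCI itself relies on a CKG, GNN encoders and self-attention; the model-agnostic sketch below only illustrates the counterfactual-explanation idea on which it builds: search for the smallest subset of a user's history whose removal changes the top recommendation. The scoring function and item names are toy assumptions.

```python
from itertools import combinations

def top_item(score_fn, history, candidates):
    """Item the recommender would rank first given this interaction history."""
    return max(candidates, key=lambda item: score_fn(history, item))

def counterfactual_explanation(score_fn, history, candidates, max_size=2):
    """Smallest subset of the history whose removal changes the top recommendation."""
    original = top_item(score_fn, history, candidates)
    for k in range(1, max_size + 1):
        for removed in combinations(history, k):
            reduced = [h for h in history if h not in removed]
            if top_item(score_fn, reduced, candidates) != original:
                return removed, original
    return None, original

# Toy scorer: an item's score is how many of its "supporting" items are in the history.
SUPPORT = {"movie_c": {"movie_a"}, "movie_d": {"movie_b"}}
def toy_score(history, item):
    return len(SUPPORT.get(item, set()) & set(history))

explanation, rec = counterfactual_explanation(
    toy_score, history=["movie_a", "movie_b"], candidates=["movie_c", "movie_d"])
print(rec, "would not have been recommended without", explanation)
```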

    OmegaDB: concurrent computing framework of relational operators for heterogeneous architecture
    Jinhui LAI, Zichen XU, Yicheng TU, Guolong TAN
    Journal of Computer Applications    2023, 43 (7): 2017-2025.   DOI: 10.11772/j.issn.1001-9081.2022071131

    Different queries in a database system may have overlapping data access paths and shared computations, and batching the queries of a workload is known as the Multiple-Query-at-a-Time model. Several multi-query processing frameworks have been developed and proven effective, but all of them lack a general framework covering complete query processing and optimization. On the basis of a query-time operator merging optimization framework constructed through equivalent transformations, a concurrent computing framework of relational operators for heterogeneous architectures, called OmegaDB, was proposed. In this framework, by studying a GPU-oriented flow-batch computing model of relational operators and constructing a relational data query pipeline, a flow-batch computing method that aggregates multiple queries was implemented on a CPU-GPU heterogeneous architecture. Through theoretical analysis, a prototype implementation and experimental comparison with a Relational Database Management System (RDBMS), the advantages of OmegaDB were verified and its potential for exploiting new hardware was shown. Based on a theoretical study of query optimization frameworks for the Multiple-Query-at-a-Time model under traditional relational algebraic rules, several optimization methods were proposed and future research directions were discussed. Using the TPC-H business intelligence benchmark, the results show that OmegaDB achieves up to 24 times end-to-end speedup while consuming less disk I/O and CPU time than the modern commercial database system SQL Server.
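
    The core saving of the Multiple-Query-at-a-Time model can be illustrated independently of the GPU pipeline: the sketch below (hypothetical names, plain Python rather than OmegaDB's GPU operators) evaluates the selection predicates of several queries during a single shared scan of a table.

```python
def shared_scan(rows, queries):
    """Evaluate many selection queries in one pass over the table.

    Each query is (query_id, predicate); the table is scanned once instead of
    once per query, which is the core saving of Multiple-Query-at-a-Time.
    """
    results = {qid: [] for qid, _ in queries}
    for row in rows:                       # single shared data path
        for qid, predicate in queries:
            if predicate(row):
                results[qid].append(row)
    return results

lineitem = [
    {"orderkey": 1, "quantity": 17, "discount": 0.05},
    {"orderkey": 2, "quantity": 36, "discount": 0.10},
    {"orderkey": 3, "quantity": 28, "discount": 0.02},
]
queries = [
    ("q1", lambda r: r["quantity"] > 20),
    ("q2", lambda r: r["discount"] >= 0.05),
]
print(shared_scan(lineitem, queries))
```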

    Dynamic aggregate nearest neighbor query algorithm in weighted road network space
    Fangshu CHEN, Wei ZHANG, Xiaoming HU, Yufei ZHANG, Xiankai MENG, Linxiang SHI
    Journal of Computer Applications    2023, 43 (7): 2026-2033.   DOI: 10.11772/j.issn.1001-9081.2022091371

    As a classical problem in spatial databases, the Aggregate Nearest Neighbor (ANN) query is of great importance in the optimization of network link structures, the location selection of logistics distribution points and car-sharing services, and can effectively contribute to the development of fields such as logistics, the mobile Internet industry and operations research. Existing research has some shortcomings: there is no efficient index structure for large-scale dynamic road network data, and query efficiency is low when data point locations move in real time and network weights are updated dynamically. To address these problems, an ANN query algorithm for dynamic scenarios was proposed. Firstly, adopting G-tree as the road network index, a pruning algorithm combining spatial index structures such as quadtrees and k-d trees with the Incremental Euclidean Restriction (IER) algorithm was proposed to answer ANN queries in static scenarios. Then, for the frequent updates of data point locations in dynamic scenarios, a time window and a safe zone update strategy were added to reduce the number of iterations of the algorithm; experimental results show that this improves efficiency by 8% to 85%. Finally, for ANN queries with changing road network weights, two correction-based continuous query algorithms were proposed to derive the current query results from historical results according to the increments of the weight changes; in certain scenarios, these algorithms reduce errors by approximately 50%. The theoretical analysis and experimental results show that the proposed algorithms solve ANN query problems in dynamic scenarios efficiently and accurately.
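
    The IER-style pruning used for the static case can be sketched as follows: candidates are ordered by their cheap Euclidean aggregate distance, which lower-bounds the road-network aggregate distance, so the expensive network computation stops as soon as the bound exceeds the best cost found. The road_network_distance function here is a toy placeholder assumed to be at least the Euclidean distance; all names are illustrative.

```python
import math

def euclid(p, q):
    return math.dist(p, q)

def road_network_distance(p, q):
    """Placeholder for an expensive road-network shortest path; assumed >= Euclidean."""
    return 1.3 * euclid(p, q)          # toy detour factor for the demo

def ann_query_ier(query_points, candidates):
    """Aggregate (sum) nearest neighbor with Euclidean lower-bound pruning (IER idea)."""
    # Cheap lower bound first: the Euclidean sum never exceeds the network sum.
    order = sorted(candidates, key=lambda c: sum(euclid(c, q) for q in query_points))
    best, best_cost = None, float("inf")
    for c in order:
        lower = sum(euclid(c, q) for q in query_points)
        if lower >= best_cost:          # remaining candidates cannot win
            break
        cost = sum(road_network_distance(c, q) for q in query_points)
        if cost < best_cost:
            best, best_cost = c, cost
    return best, best_cost

queries = [(0, 0), (4, 0), (2, 3)]
candidates = [(2, 1), (10, 10), (3, 2)]
print(ann_query_ier(queries, candidates))
```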

    Approximate query processing approach based on deep autoregressive model
    Libin CEN, Jingdong LI, Chunbo LIN, Xiaoling WANG
    Journal of Computer Applications    2023, 43 (7): 2034-2039.   DOI: 10.11772/j.issn.1001-9081.2022071128

    Approximate Query Processing (AQP) of aggregate functions has recently become a research hotspot in the database field. Existing approximate query techniques suffer from high query response time, high storage overhead, and lack of support for multi-predicate queries. Thus, a deep autoregressive model-based AQP approach, DeepAQP (Deep Approximate Query Processing), was proposed. DeepAQP leverages a deep autoregressive model to learn the joint probability distribution of the multi-column data in a table, and uses it to estimate the selectivity and the target column's probability distribution for a given query, enhancing the ability to handle approximate aggregate queries with multiple predicates over a single table. Experiments were conducted on the TPC-H and TPC-DS datasets. The results show that, compared with VerdictDB, a sample-based method, DeepAQP reduces query response time by 2 to 3 orders of magnitude and storage space by 3 orders of magnitude; compared with DBEst++, a machine learning-based method, DeepAQP reduces query response time by 1 order of magnitude and reduces model training time significantly. Besides, DeepAQP can handle multi-predicate query requests, which DBEst++ does not support. It can be seen that DeepAQP achieves both good accuracy and speed while significantly reducing the training and storage overhead of the algorithm.
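
    The query flow of such an approach can be sketched with a stand-in model: below, an empirical joint distribution over a small sample plays the role of the deep autoregressive model (a deliberate simplification), and selectivity estimates from it answer approximate COUNT/SUM/AVG queries with multiple predicates; class and column names are illustrative.

```python
from collections import Counter
import random

class JointDistributionAQP:
    """Answers approximate aggregates from a learned joint distribution.

    DeepAQP fits a deep autoregressive model; here an empirical distribution
    over a small sample stands in for it, purely to illustrate the query flow.
    """
    def __init__(self, rows, sample_size=1000):
        self.n = len(rows)
        sample = random.sample(rows, min(sample_size, len(rows)))
        self.dist = Counter(tuple(sorted(r.items())) for r in sample)
        self.total = sum(self.dist.values())

    def _prob(self, predicate):
        hit = sum(c for row, c in self.dist.items() if predicate(dict(row)))
        return hit / self.total

    def approx_count(self, predicate):
        return self._prob(predicate) * self.n          # selectivity * table size

    def approx_sum(self, predicate, column):
        s = sum(dict(row)[column] * c for row, c in self.dist.items()
                if predicate(dict(row)))
        return s / self.total * self.n

    def approx_avg(self, predicate, column):
        cnt = self.approx_count(predicate)
        return self.approx_sum(predicate, column) / cnt if cnt else 0.0

rows = [{"region": i % 3, "price": 10 + (i % 7)} for i in range(10000)]
aqp = JointDistributionAQP(rows, sample_size=500)
pred = lambda r: r["region"] == 1 and r["price"] > 12   # multi-predicate query
print(aqp.approx_count(pred), aqp.approx_avg(pred, "price"))
```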

    GTC: geo-distributed triangle counting algorithm in graph streams
    Chunze CAO, Delong MA, Ye YUAN
    Journal of Computer Applications    2023, 43 (7): 2040-2048.   DOI: 10.11772/j.issn.1001-9081.2022071130

    Existing distributed triangle counting algorithms assume that all computing nodes are in the same location, whereas in reality the nodes may be located in multiple data centers across continents. Geo-distributed data centers connected by wide area networks are characterized by heterogeneous network bandwidth, high communication cost and uneven data distribution, so existing distributed algorithms cannot be applied to the geo-distributed environment. At the same time, existing research mostly adopts strategies such as random sampling and edge elimination while ignoring the temporal locality of triangle formation. Therefore, the triangle counting problem over real graph streams in a geo-distributed environment was studied, and a Geo-distributed Triangle Counting (GTC) algorithm was proposed. Firstly, to address the excessive data transmission caused by existing edge distribution strategies, a geo-distributed edge distribution strategy was proposed, which builds a benefit formula combining the time benefit and data benefit of communication and replaces edge broadcasting with point-to-point communication. Then, for the repeated counting of triangles caused by point-to-point communication in the geo-distributed environment, a final-edge calculation rule was proposed to ensure that no triangle is counted more than once. Finally, based on a time-weighted sampling algorithm, a time-weighted triangle counting algorithm was proposed to exploit the temporal locality of triangles during sampling. GTC was compared with Conditional Counting and Sampling (CoCoS) and Tri-Fly on five real graph streams. The results show that GTC reduces the size of communication data by 17% compared to CoCoS and by 44% compared to Tri-Fly; its error rate is 53% lower than that of Tri-Fly and slightly lower than that of CoCoS; and its running time is 34% shorter than that of Tri-Fly and slightly longer than that of CoCoS. It can be seen that GTC effectively reduces the size of communication data while maintaining high accuracy and a short running time.
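
    GTC's geo-aware edge distribution and time weighting sit on top of a sampled-stream triangle estimator; the single-machine sketch below (a Triest-style reservoir estimator, not GTC itself) shows how triangles closed by each arriving edge are counted and scaled by the sampling probability. Names and the reservoir size are assumptions.

```python
import random

class StreamingTriangleCounter:
    """Reservoir-sampled triangle estimator over an edge stream (Triest-style)."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.sample = set()               # sampled edges
        self.adj = {}                     # adjacency over sampled edges
        self.t = 0                        # edges seen so far
        self.estimate = 0.0

    def _neighbors(self, v):
        return self.adj.get(v, set())

    def add_edge(self, u, v):
        self.t += 1
        # Probability that both earlier edges of a closed triangle are still sampled.
        if self.t - 1 > self.capacity:
            p = (self.capacity / (self.t - 1)) * ((self.capacity - 1) / (self.t - 2))
        else:
            p = 1.0
        closed = self._neighbors(u) & self._neighbors(v)
        self.estimate += len(closed) / p
        # Standard reservoir sampling on the edge stream.
        if self.t <= self.capacity:
            self._insert(u, v)
        elif random.random() < self.capacity / self.t:
            self._remove(random.choice(tuple(self.sample)))
            self._insert(u, v)

    def _insert(self, u, v):
        self.sample.add((u, v))
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def _remove(self, edge):
        u, v = edge
        self.sample.discard(edge)
        self.adj[u].discard(v)
        self.adj[v].discard(u)

counter = StreamingTriangleCounter(capacity=5000)
for u, v in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    counter.add_edge(u, v)
print(counter.estimate)   # exact (2) while the stream fits in the reservoir
```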

    Closed high utility quantitative itemset mining algorithm on incremental data
    Zhihui SHAN, Meng HAN, Qiang HAN
    Journal of Computer Applications    2023, 43 (7): 2049-2056.   DOI: 10.11772/j.issn.1001-9081.2022091333

    High Utility Itemset (HUI) mining provides information about combinations of highly profitable items in a dataset, which is useful for developing effective marketing strategies in real-world applications. However, HUIs only provide the itemsets and their total utility, not the purchased quantities of individual items, while item quantities in real scenarios carry more precise information. Therefore, High Utility Quantitative Itemset (HUQI) mining algorithms have been proposed. Focusing on the facts that current HUQI mining algorithms can only process static data and produce redundant result sets, an incrementally updated quantitative utility list structure was proposed for storing and updating the utility information of items in the dataset, and based on this structure, an algorithm for mining Closed High Utility Quantitative Itemsets (CHUQIs) was proposed. The time and memory consumption of the proposed algorithm were compared with those of the Faster High Utility Quantitative Itemset Miner (FHUQI-Miner) algorithm in terms of the number of result sets, the minimum utility threshold, the number of batches, and scalability. Experimental results show that the proposed algorithm can process incremental data effectively and mine more interesting itemsets.
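
    To make the notion of a quantitative itemset's utility concrete, the brute-force sketch below (illustrative data and names; the paper's algorithm instead maintains an incrementally updated quantitative utility list and a closure check) sums the utility of exact item-quantity combinations over a toy transaction database and keeps those above a minimum utility threshold.

```python
from itertools import combinations

# Transactions: {item: purchased quantity}; external utilities: unit profit per item.
transactions = [
    {"bread": 2, "milk": 1, "beer": 3},
    {"bread": 2, "milk": 2},
    {"milk": 1, "beer": 3},
]
unit_profit = {"bread": 1.0, "milk": 2.0, "beer": 2.5}

def utility(q_itemset, tx):
    """Utility of a quantitative itemset {(item, qty)} in one transaction."""
    if all(tx.get(i) == q for i, q in q_itemset):
        return sum(q * unit_profit[i] for i, q in q_itemset)
    return 0.0

def mine_huqi(transactions, min_util, max_len=2):
    """Brute-force enumeration of high utility quantitative itemsets."""
    occurring = {(i, q) for tx in transactions for i, q in tx.items()}
    results = {}
    for k in range(1, max_len + 1):
        for combo in combinations(sorted(occurring), k):
            if len({i for i, _ in combo}) < k:     # same item twice -> skip
                continue
            total = sum(utility(combo, tx) for tx in transactions)
            if total >= min_util:
                results[combo] = total
    return results

print(mine_huqi(transactions, min_util=9.0))
```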

    PrivSPM: frequent sequential pattern mining algorithm under local differential privacy
    Shuo HUANG, Yanhui LI, Jianqiu CAO
    Journal of Computer Applications    2023, 43 (7): 2057-2064.   DOI: 10.11772/j.issn.1001-9081.2022091365

    Sequential data may contain a great deal of sensitive information, so directly mining frequent patterns from sequential data poses a significant risk to individual privacy. Local Differential Privacy (LDP) resists attackers with arbitrary background knowledge and can therefore provide more comprehensive protection for sensitive information. However, due to the inherent sequentiality and high dimensionality of sequential data, it is challenging to mine frequent sequential patterns under LDP. To tackle this problem, a top-k frequent sequential pattern mining algorithm satisfying ε-LDP, called PrivSPM, was proposed. In this algorithm, padding and sampling techniques, an adaptive frequency estimation algorithm and a frequent item prediction technique were integrated to construct the candidate domain. Based on this new domain, an exponential-mechanism-based strategy was employed to perturb user data, and the final frequent sequential patterns were identified by combining the frequency estimation algorithm. Theoretical analysis proves that the proposed algorithm satisfies ε-LDP. Experimental results on three real datasets demonstrate that PrivSPM performs better than the comparison algorithms on True Positive Rate (TPR) and Normalized Cumulative Rank (NCR), and effectively improves the accuracy of the mined results.
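
    The local perturbation step can be illustrated with the exponential mechanism over a candidate domain: with utility 1 for the user's true item and 0 otherwise (sensitivity 1), each user reports an item with probability proportional to exp(ε·u/2), and the collector inverts the known report probabilities to estimate frequencies. This sketch covers only that step, not PrivSPM's padding, sampling or candidate prediction; names are hypothetical.

```python
import math
import random

def exponential_mechanism(true_item, domain, epsilon):
    """Report one item from the domain; the true item is exponentially more likely."""
    weights = [math.exp(epsilon * (1.0 if item == true_item else 0.0) / 2.0)
               for item in domain]
    return random.choices(domain, weights=weights, k=1)[0]

def estimate_frequencies(reports, domain, epsilon):
    """Unbiased frequency estimate that inverts the perturbation probabilities."""
    n, d = len(reports), len(domain)
    e = math.exp(epsilon / 2.0)
    p_true = e / (e + d - 1)                 # P(report = true item)
    p_other = 1.0 / (e + d - 1)              # P(report = any specific other item)
    counts = {item: 0 for item in domain}
    for r in reports:
        counts[r] += 1
    return {item: (counts[item] - n * p_other) / (p_true - p_other)
            for item in domain}

domain = ["a", "b", "c", "d"]
truth = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
reports = [exponential_mechanism(x, domain, epsilon=2.0) for x in truth]
print(estimate_frequencies(reports, domain, epsilon=2.0))
```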

    Differential privacy generative adversarial network algorithm with dynamic gradient threshold clipping
    Shaoquan CHEN, Jianping CAI, Lan SUN
    Journal of Computer Applications    2023, 43 (7): 2065-2072.   DOI: 10.11772/j.issn.1001-9081.2022071114

    Most existing methods combining Generative Adversarial Networks (GANs) with differential privacy use gradient perturbation to achieve privacy protection: during optimization, gradient clipping constrains the optimizer's sensitivity to individual samples, and random noise is added to the clipped gradients to protect the model. However, most methods treat the clipping threshold as a fixed parameter during training, and a threshold that is either too large or too small degrades model performance. To solve this problem, DGC_DPGAN (Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network), an algorithm with dynamic gradient threshold clipping, was proposed to take both privacy protection and model performance into account. Combined with pre-training, during optimization the mean gradient F-norm of each batch of private data is first taken as the dynamic gradient clipping threshold, and the gradients are then perturbed. Considering different clipping orders, CLIP_DGC_DPGAN (Clip Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network), which clips first and then adds noise, and DGC_DPGAN, which adds noise first and then clips, were proposed, and the Rényi accountant was used to calculate the privacy loss. Experimental results show that under the same privacy budget, the two proposed dynamic gradient clipping algorithms outperform fixed-threshold clipping. On the MNIST dataset, the two proposed algorithms improve the Inception Score (IS), Structural SIMilarity (SSIM) and Convolutional Neural Network (CNN) classification accuracy by 0.32 to 3.92, 0.03 to 0.27, and 7% to 44% respectively; on the Fashion-MNIST dataset, they improve the IS, SSIM and CNN classification accuracy by 0.40 to 4.32, 0.01 to 0.44, and 20% to 51% respectively. In addition, the usability of the images generated by the GAN model is better.
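
    A simplified NumPy sketch of the dynamic threshold idea is given below: the clipping threshold is set to the mean Frobenius norm of the current batch's per-sample gradients, and the two clipping orders correspond to the clip-then-noise and noise-then-clip variants. The function names, noise calibration and the omission of the Rényi accountant are assumptions of the sketch, not the paper's exact procedure.

```python
import numpy as np

def dp_perturb_batch(per_sample_grads, noise_multiplier, clip_first=True, rng=None):
    """Clip per-sample gradients with a dynamic threshold and add Gaussian noise.

    The threshold is the mean Frobenius norm of the current batch's gradients,
    so it tracks the gradient scale instead of staying fixed.
    """
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_sample_grads.reshape(len(per_sample_grads), -1), axis=1)
    threshold = norms.mean()                        # dynamic clipping threshold

    def clip(g, n):
        return g * min(1.0, threshold / (n + 1e-12))

    if clip_first:                                  # clip-then-noise variant
        clipped = np.stack([clip(g, n) for g, n in zip(per_sample_grads, norms)])
        summed = clipped.sum(axis=0)
        noised = summed + rng.normal(0.0, noise_multiplier * threshold, summed.shape)
    else:                                           # noise-then-clip variant
        noisy = per_sample_grads + rng.normal(
            0.0, noise_multiplier * threshold, per_sample_grads.shape)
        new_norms = np.linalg.norm(noisy.reshape(len(noisy), -1), axis=1)
        noised = np.stack([clip(g, n) for g, n in zip(noisy, new_norms)]).sum(axis=0)
    return noised / len(per_sample_grads)           # averaged, ready for the optimizer

grads = np.random.randn(32, 128)                    # 32 per-sample gradients
print(dp_perturb_batch(grads, noise_multiplier=1.1).shape)
```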

    Self-regularization optimization methods for Non-IID data in federated learning
    Mengjie LAN, Jianping CAI, Lan SUN
    Journal of Computer Applications    2023, 43 (7): 2073-2081.   DOI: 10.11772/j.issn.1001-9081.2022071122

    Federated Learning (FL) is a new distributed machine learning paradigm that breaks down data barriers while protecting data privacy, enabling clients to collaboratively train a machine learning model without sharing local data. However, how to deal with Non-Independent and Identically Distributed (Non-IID) data from different clients remains a major challenge for FL. Some existing solutions do not exploit the implicit relationship between the local and global models to solve this problem simply and efficiently. To address the Non-IID issue across clients in FL, novel FL optimization algorithms, Federated Self-Regularization (FedSR) and Dynamic Federated Self-Regularization (Dyn-FedSR), were proposed. In FedSR, a self-regularization penalty term was introduced in each training round to dynamically modify the local loss function; by building a relationship between the local and global models, the local model was pulled closer to the global model that aggregates rich knowledge, thereby alleviating the client drift caused by Non-IID data. In Dyn-FedSR, the coefficient of the self-regularization term was determined dynamically from the similarity between the local and global models. Extensive experiments on different tasks demonstrate that FedSR and Dyn-FedSR significantly outperform state-of-the-art FL algorithms such as Federated Averaging (FedAvg), Federated Proximal optimization (FedProx) and Stochastic Controlled Averaging (SCAFFOLD) in various scenarios, achieving efficient communication, high accuracy, and robustness to imbalanced data and uncertain local updates.
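
    The structure of the self-regularization term can be sketched as follows (the exact penalty form and similarity weighting in FedSR/Dyn-FedSR may differ; this is an illustration under stated assumptions): the local objective adds a proximal-style term pulling the local weights toward the global weights, and the coefficient grows as the two models become less similar.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def local_loss(task_loss, w_local, w_global, mu):
    """Local objective with a self-regularization term pulling w_local toward w_global."""
    return task_loss + 0.5 * mu * float(np.sum((w_local - w_global) ** 2))

def dynamic_mu(w_local, w_global, mu_max=1.0):
    """Dyn-FedSR idea: the less similar the local model is to the global one,
    the stronger the regularization coefficient."""
    return mu_max * (1.0 - cosine_similarity(w_local, w_global))

w_global = np.ones(10)
w_local = np.ones(10) + 0.5 * np.random.randn(10)   # drifted local model
mu = dynamic_mu(w_local, w_global)
print(mu, local_loss(task_loss=0.73, w_local=w_local, w_global=w_global, mu=mu))
```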

    Sparse reward exploration mechanism fusing curiosity and policy distillation
    Ziteng WANG, Yaxin YU, Zifang XIA, Jiaqi QIAO
    Journal of Computer Applications    2023, 43 (7): 2082-2090.   DOI: 10.11772/j.issn.1001-9081.2022071116

    In sparse-reward environments, deep reinforcement learning algorithms struggle to learn the optimal policy through interaction with the environment, so intrinsic rewards need to be constructed to guide the updates. However, this approach still has several problems: 1) inaccurate state-classification statistics misjudge reward values, causing the agent to learn wrong behaviors; 2) because the prediction network identifies state information too well, the state novelty reflected by the intrinsic reward decreases, which harms learning of the optimal policy; 3) due to random state transitions, the information in teacher policies is not used effectively, which reduces the agent's ability to explore the environment. To solve the above problems, a reward construction mechanism combining the prediction error of a randomly generated network with hash-discretization counting statistics, namely RGNP-HCE (Randomly Generated Network Prediction and Hash Count Exploration), was proposed, and the knowledge of multiple teacher policies was transferred to the student policy through distillation. In RGNP-HCE, the fused reward is constructed following the idea of curiosity classification: a global curiosity reward is built from the prediction error of the randomly generated network across episodes, and a local curiosity reward is built from hash-discretization counts within an episode, which guarantees the rationality of the intrinsic rewards and the correctness of policy gradient updates. In addition, multi-teacher policy distillation provides the student with multiple reference directions for exploration, effectively improving the exploration ability of the student policy. Finally, on the Montezuma's Revenge and Breakout test environments, the proposed mechanism was compared with four mainstream deep reinforcement learning algorithms, and policy distillation was performed. The results show that, compared with the average performance of current high-performance deep reinforcement learning algorithms, RGNP-HCE achieves higher average performance in both test environments, and the distilled student policy improves average performance further, indicating that RGNP-HCE and policy distillation are effective in improving the exploration ability of the agent.
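
    The two curiosity signals can be sketched as follows: a global bonus from the prediction error of a trainable network against a fixed randomly generated target network (persisting across episodes), plus a local bonus of 1/sqrt(count) over hash-discretized states within the current episode. The linear networks, the hashing scheme and all names below are simplifications assumed for illustration, not the paper's architecture.

```python
import numpy as np

class CuriosityReward:
    """Intrinsic reward combining a global (RND-style) and a local (count-based) signal."""
    def __init__(self, state_dim, feat_dim=16, lr=0.01, rng=None):
        rng = rng or np.random.default_rng(0)
        self.target = rng.normal(size=(state_dim, feat_dim))     # fixed random net
        self.predictor = rng.normal(size=(state_dim, feat_dim))  # trained online
        self.lr = lr
        self.episode_counts = {}

    def reset_episode(self):
        self.episode_counts = {}

    def reward(self, state, bucket=0.5):
        # Global curiosity: squared error between predictor and random target features.
        target_feat = state @ self.target
        err = state @ self.predictor - target_feat
        global_bonus = float(np.mean(err ** 2))
        # One gradient step on the linear predictor (squared loss).
        self.predictor -= self.lr * 2.0 / err.size * np.outer(state, err)
        # Local curiosity: 1/sqrt(count) of the hashed, discretized state.
        key = hash(tuple(np.floor(state / bucket).astype(int)))
        self.episode_counts[key] = self.episode_counts.get(key, 0) + 1
        local_bonus = 1.0 / np.sqrt(self.episode_counts[key])
        return global_bonus + local_bonus

curiosity = CuriosityReward(state_dim=8)
for step in range(3):
    s = np.random.randn(8)
    print(round(curiosity.reward(s), 4))
```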

    Online incentive mechanism based on quality perception in spatio-temporal crowdsourcing
    Yanan PAN, Qingxian PAN, Zhaoyi YU, Jiajing CHU, Song YU
    Journal of Computer Applications    2023, 43 (7): 2091-2099.   DOI: 10.11772/j.issn.1001-9081.2022071095

    In real-time and complex network environments, how to motivate workers to participate in tasks and obtain high-quality perception data is a focus of spatio-temporal crowdsourcing research. Accordingly, an online incentive mechanism for spatio-temporal crowdsourcing based on quality perception was proposed. Firstly, to suit the real-time nature of spatio-temporal crowdsourcing, a Phased Online selection of workers Algorithm (POA) was proposed, in which the whole crowdsourcing activity cycle is divided into multiple stages under budget constraints and workers are selected online in each stage. Secondly, to improve the accuracy and efficiency of quality prediction, an Improved Expectation Maximization (IEM) algorithm was proposed, in which the task results submitted by workers with high reliability are prioritized during the iterations. Finally, the effectiveness of the proposed incentive mechanism in improving platform utility was verified by comparison experiments on real datasets. Experimental results show that, in terms of efficiency, compared with auction algorithms such as the Improved Two-stage Auction (ITA) algorithm, the Multi-attribute ITA (M-ITA) algorithm and Lyapunov-based Vickrey-Clarke-Groves (L-VCG) auctions, POA increases efficiency by 11.11% on average and increases the additional rewards for workers by 12.12% on average, which can encourage workers to move to remote and unpopular areas; in terms of quality estimation, the IEM algorithm improves accuracy by 5.06% and efficiency by 14.2% on average compared with other quality estimation algorithms.
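
    The quality-estimation component can be illustrated with a basic alternating (EM-style) estimator for binary tasks: task truths are inferred from reliability-weighted votes, and reliabilities are re-estimated from agreement with the inferred truths. IEM's prioritization of high-reliability submissions and the POA budget phases are not modeled here; the data and names are toy assumptions.

```python
def em_quality_estimation(answers, n_iters=20):
    """Jointly estimate task truths and worker reliabilities (simplified EM).

    answers: {task_id: {worker_id: 0/1 label}}.
    """
    workers = {w for votes in answers.values() for w in votes}
    reliability = {w: 0.8 for w in workers}            # optimistic prior
    truth = {}
    for _ in range(n_iters):
        # E-step: reliability-weighted vote for each task's true label.
        for task, votes in answers.items():
            weight_1 = sum(reliability[w] if lab == 1 else 1 - reliability[w]
                           for w, lab in votes.items())
            weight_0 = sum(reliability[w] if lab == 0 else 1 - reliability[w]
                           for w, lab in votes.items())
            truth[task] = 1 if weight_1 >= weight_0 else 0
        # M-step: reliability = fraction of a worker's answers matching the truth.
        for w in workers:
            hits = [int(lab == truth[t]) for t, votes in answers.items()
                    for ww, lab in votes.items() if ww == w]
            reliability[w] = sum(hits) / len(hits) if hits else 0.5
    return truth, reliability

answers = {
    "t1": {"w1": 1, "w2": 1, "w3": 0},
    "t2": {"w1": 0, "w2": 0, "w3": 1},
    "t3": {"w1": 1, "w2": 0, "w3": 0},
}
print(em_quality_estimation(answers))
```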

    Prediction of taxi demands between urban regions by fusing origin-destination spatial-temporal correlation
    Yuan WEI, Yan LIN, Shengnan GUO, Youfang LIN, Huaiyu WAN
    Journal of Computer Applications    2023, 43 (7): 2100-2106.   DOI: 10.11772/j.issn.1001-9081.2022091364

    Accurate prediction of taxi demands between urban regions can provide decision support for taxi guidance and scheduling as well as passenger travel recommendation, so as to optimize the balance between taxi supply and demand. However, most existing models only focus on modeling and predicting taxi demands within a region, give insufficient consideration to spatial-temporal correlations between regions, and pay little attention to the finer-grained demand prediction between regions. To solve these problems, a prediction model for taxi demands between urban regions, Origin-Destination fusion with Spatial-Temporal Network (ODSTN), was proposed. In this model, complex spatial-temporal correlations between regions are captured along two spatial dimensions (regions and region pairs) and three temporal dimensions (recent, daily and weekly periods) by graph convolution and attention mechanisms, and a new path-aware fusion mechanism is designed to combine the multi-angle features and realize the prediction of taxi demands between urban regions. Experiments were carried out on two real taxi order datasets from Chengdu and Manhattan. The results show that the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) of ODSTN are 0.8971, 3.5274, 50.6556% and 0.5896, 1.1638, 61.0794% on the two datasets respectively, indicating that ODSTN is highly accurate in taxi demand prediction tasks.
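
    Of ODSTN's components, the fusion of the three temporal views is the easiest to sketch in isolation: below, scaled dot-product attention weights the recent, daily and weekly feature vectors of an OD pair and returns their weighted sum. The graph convolution over regions and region pairs and the path-aware fusion are omitted; all shapes and names are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention_fusion(query, views):
    """Fuse recent/daily/weekly OD features with scaled dot-product attention.

    query: (d,) representation of the OD pair being predicted.
    views: dict of period name -> (d,) feature vector from that temporal view.
    """
    keys = np.stack(list(views.values()))            # (3, d)
    scores = keys @ query / np.sqrt(query.size)
    weights = softmax(scores)
    fused = weights @ keys                           # weighted sum of the views
    return fused, dict(zip(views.keys(), weights.round(3)))

d = 16
rng = np.random.default_rng(1)
views = {"recent": rng.normal(size=d), "daily": rng.normal(size=d),
         "weekly": rng.normal(size=d)}
query = views["recent"] + 0.1 * rng.normal(size=d)   # query resembles the recent pattern
print(temporal_attention_fusion(query, views)[1])
```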

    Prompt learning based unsupervised relation extraction model
    Menglin HUANG, Lei DUAN, Yuanhao ZHANG, Peiyan WANG, Renhao LI
    Journal of Computer Applications    2023, 43 (7): 2010-2016.   DOI: 10.11772/j.issn.1001-9081.2022071133

    Unsupervised relation extraction aims to extract the semantic relations between entities from unlabeled natural language text. Current unsupervised relation extraction models based on the Variational Auto-Encoder (VAE) architecture provide supervision signals for training through a reconstruction loss, offering a new idea for unsupervised relation extraction tasks. Focusing on the issue that such models cannot understand contextual information effectively and rely on dataset inductive biases, a Prompt-based learning based Unsupervised Relation Extraction (PURE) model was proposed, consisting of a relation extraction module and a link prediction module. In the relation extraction module, a context-aware prompt template function was designed to fuse contextual information, and the unsupervised relation extraction task was converted into a mask prediction task, so as to make full use of the knowledge obtained during pre-training to extract relations. In the link prediction module, supervision signals were provided for the relation extraction module by predicting the missing entities in triples to assist model training. Extensive experiments were carried out on two public real-world relation extraction datasets. The results show that PURE uses contextual information effectively, does not rely on dataset inductive biases, and improves the B-cubed F1 evaluation metric by 3.3 percentage points on the NYT dataset compared with the state-of-the-art VAE-based model UREVA (Variational Autoencoder-based Unsupervised Relation Extraction model).
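
    The mask-prediction formulation can be illustrated with an off-the-shelf masked language model: a hand-written template (standing in for PURE's learned context-aware template function) places a [MASK] token where the relation word should appear, and the model scores candidate relation words. This sketch uses the Hugging Face fill-mask pipeline and a generic BERT checkpoint, which are assumptions rather than PURE's trained components, and it downloads the model on first run.

```python
from transformers import pipeline

# A hand-written template stands in for PURE's learned context-aware template;
# the mask position is where the relation word is predicted.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def prompt_relation(sentence, head, tail, relation_words):
    """Turn relation extraction into mask prediction over candidate relation words."""
    prompt = f"{sentence} In this sentence, {head} is the [MASK] of {tail}."
    predictions = fill_mask(prompt, targets=relation_words)
    best = max(predictions, key=lambda p: p["score"])
    return best["token_str"], best["score"]

sentence = "Steve Jobs founded Apple in 1976."
print(prompt_relation(sentence, "Steve Jobs", "Apple",
                      relation_words=["founder", "employee", "customer"]))
```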
