[1] CHARIKAR M. Similarity estimation techniques from rounding algorithms[C]// Proceedings of the 34th Annual ACM Symposium on Theory of Computing. New York: ACM, 2002: 380-388. DOI: 10.1145/509907.509965.
[2] WANG C, WANG Y C. Research on improved large-scale document deduplication algorithm based on Simhash[J]. Computer Technology and Development, 2019, 29(2): 115-119. (in Chinese) DOI: 10.3969/j.issn.1673-629X.2019.02.024.
[3] BRODER A Z. On the resemblance and containment of documents[C]// Proceedings of the 1997 International Conference on Compression and Complexity of Sequences. Piscataway: IEEE, 1997: 21-29.
[4] INDYK P, MOTWANI R. Approximate nearest neighbors: towards removing the curse of dimensionality[C]// Proceedings of the 30th Annual ACM Symposium on Theory of Computing. New York: ACM, 1998: 604-613. DOI: 10.1145/276698.276876.
[5] APPLEBY A. MurmurHash[EB/OL]. (2011-03-01) [2022-08-22].
[6] HUANG P S, HE X, GAO J, et al. Learning deep structured semantic models for Web search using clickthrough data[C]// Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. New York: ACM, 2013: 2333-2338. DOI: 10.1145/2505515.2505665.
[7] SHEN Y, HE X, GAO J, et al. A latent semantic model with convolutional-pooling structure for information retrieval[C]// Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. New York: ACM, 2014: 101-110. DOI: 10.1145/2661829.2661935.
[8] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2022-08-22].
[9] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. (2018-02-04) [2022-08-22].
[10] ZHANG T, LIU B, NIU D, et al. Multiresolution graph attention networks for relevance matching[C]// Proceedings of the 27th ACM International Conference on Information and Knowledge Management. New York: ACM, 2018: 933-942. DOI: 10.1145/3269206.3271806.
[11] LIU B, NIU D, WEI H, et al. Matching article pairs with graphical decomposition and convolutions[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 6284-6294. DOI: 10.18653/v1/P19-1632.
[12] PENG S H, MAITISABIER T, ZHOU Q F. Research on deduplication technique of Chinese text with Simhash[J]. Computer Technology and Development, 2017, 27(11): 137-140, 145. (in Chinese) DOI: 10.3969/j.issn.1673-629X.2017.11.030.
[13] ZHANG Y N, CHEN W W, FU Y J, et al. Improved text deduplication algorithm based on Simhash[J]. Computer Technology and Development, 2022, 32(8): 26-32. (in Chinese) DOI: 10.3969/j.issn.1673-629X.2022.08.005.
[14] SUN Y, QIU H, ZHENG Y, et al. SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model[J]. IEEE Access, 2020, 8: 10896-10906. DOI: 10.1109/ACCESS.2020.2965087.
[15] YE J, GUI T, LUO Y, et al. One2Set: generating diverse keyphrases as a set[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2021: 4598-4608. DOI: 10.18653/v1/2021.acl-long.354.
[16] BARUNI J S, SATHIASEELAN J G R. Keyphrase extraction from document using RAKE and TextRank algorithms[J]. International Journal of Computer Science and Mobile Computing, 2020, 9(9): 83-93. DOI: 10.47760/ijcsmc.2020.v09i09.009.
[17] CHO T, LEE J H. Latent keyphrase extraction using LDA model[J]. Journal of Korean Institute of Intelligent Systems, 2015, 25(2): 180-185. DOI: 10.5391/JKIIS.2015.25.2.180.
[18] ZHU Z D, LI M, ZHANG J, et al. An LDA-based approach to keyphrase extraction[J]. Journal of Central South University (Science and Technology), 2015, 46(6): 2142-2148. (in Chinese)
[19] DING L, ZHANG Z, LIU H, et al. Automatic keyphrase extraction from scientific Chinese medical abstracts based on character-level sequence labeling[J]. Journal of Data and Information Science, 2021, 6(3): 35-57. DOI: 10.2478/jdis-2021-0013.
[20] HAMILTON W L, YING R, LESKOVEC J. Representation learning on graphs: methods and applications[J]. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2017, 40(3): 52-74.
[21] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2022-08-22]. DOI: 10.48550/arXiv.1609.02907.
[22] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 2227-2237. DOI: 10.18653/v1/N18-1202.
[23] CHE W, LIU Y, WANG Y, et al. Towards better UD parsing: deep contextualized word embeddings, ensemble, and treebank concatenation[C]// Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Stroudsburg, PA: ACL, 2018: 55-64.
[24] ARORA S, LIANG Y, MA T. A simple but tough-to-beat baseline for sentence embeddings[EB/OL]. (2022-07-22) [2022-08-22].
[25] CHEN L L, HUANG S, SUN J L, et al. Bug report quality detection based on the BM25 algorithm[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(10): 829-836. (in Chinese)
[26] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[27] ZHENG C, SUN Y, WAN S, et al. RLTM: an efficient neural IR framework for long documents[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 5457-5463. DOI: 10.24963/ijcai.2019/758.
[28] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.