Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2431-2438.DOI: 10.11772/j.issn.1001-9081.2022071108
Special Issue: 数据科学与技术
• Data science and technology • Previous Articles Next Articles
Ting YANG, Ruoyu MO, Xiujuan ZHANG, Zhousen ZHU
Received:
2022-07-29
Revised:
2022-09-19
Accepted:
2022-09-19
Online:
2023-01-15
Published:
2023-08-10
Contact:
Zhousen ZHU
About author:
YANG Ting, born in 1997, M. S. candidate. Her research interests include intelligent information processing.Supported by:
杨婷, 莫若玉, 张秀娟, 朱洲森
通讯作者:
朱洲森
作者简介:
杨婷(1997—),女,四川广元人,硕士研究生,主要研究方向:智能信息处理基金资助:
CLC Number:
Ting YANG, Ruoyu MO, Xiujuan ZHANG, Zhousen ZHU. Enhancement and expansion of full-text search in relational databases based on lightweight caching strategy[J]. Journal of Computer Applications, 2023, 43(8): 2431-2438.
杨婷, 莫若玉, 张秀娟, 朱洲森. 轻量级缓存策略的关系型数据库全文搜索加强与扩展[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2431-2438.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022071108
id | stem | answer |
---|---|---|
1 | Redis支持的数据类型 | String,Hash,List,Set和Zset |
2 | 消息队列在存取消息时要满足的3个需求 | 消息保序、处理重复的消息和保证消息可靠性 |
3 | Redis中Hash的实现 | Hash类型的底层数据结构是由压缩列表或哈希表实现的 |
Tab. 1 Data from table question
id | stem | answer |
---|---|---|
1 | Redis支持的数据类型 | String,Hash,List,Set和Zset |
2 | 消息队列在存取消息时要满足的3个需求 | 消息保序、处理重复的消息和保证消息可靠性 |
3 | Redis中Hash的实现 | Hash类型的底层数据结构是由压缩列表或哈希表实现的 |
词元 | 倒排列表 |
---|---|
Redis | 〈question: stem;{1,3}〉 |
支持 | 〈question: stem;{1}〉 |
数据 | 〈question: stem;{1}〉,〈question: answer;{1}〉 |
类型 | 〈question: stem;{1}〉,〈question: answer;{3}〉 |
Hash | 〈question: stem;{1}〉,〈question: answer;{1,3}〉 |
Tab. 2 Partial index data structure
词元 | 倒排列表 |
---|---|
Redis | 〈question: stem;{1,3}〉 |
支持 | 〈question: stem;{1}〉 |
数据 | 〈question: stem;{1}〉,〈question: answer;{1}〉 |
类型 | 〈question: stem;{1}〉,〈question: answer;{3}〉 |
Hash | 〈question: stem;{1}〉,〈question: answer;{1,3}〉 |
方法 | 自定义 词典 | 数字 识别 | 人名 识别 | 机构名 识别 | 新词 发现 |
---|---|---|---|---|---|
BaseAnalysis | × | √ | × | × | × |
ToAnalysis | √ | √ | √ | × | × |
NlpAnalysis | √ | √ | √ | √ | √ |
IndexAnalysis | √ | √ | √ | × | × |
Tab. 3 Ansj function statistics
方法 | 自定义 词典 | 数字 识别 | 人名 识别 | 机构名 识别 | 新词 发现 |
---|---|---|---|---|---|
BaseAnalysis | × | √ | × | × | × |
ToAnalysis | √ | √ | √ | × | × |
NlpAnalysis | √ | √ | √ | √ | √ |
IndexAnalysis | √ | √ | √ | × | × |
属性 | 说明 |
---|---|
sql | 原查询语句 |
sCol | 查询字段 |
Condition | 条件表达式(字段、条件运算符和条件值) |
fTable | 表名 |
Tab. 4 Properties of ParseQuery (part)
属性 | 说明 |
---|---|
sql | 原查询语句 |
sCol | 查询字段 |
Condition | 条件表达式(字段、条件运算符和条件值) |
fTable | 表名 |
属性 | 数据类型 | 说明 |
---|---|---|
sqlType | String | 操作类型 |
tbName | String | 表单 |
cols | String[] | 列名集合 |
colValues | String[] | 列值集合 |
condition | Condition | 更新条件 |
Tab. 5 Properties of ParseUpdate (part)
属性 | 数据类型 | 说明 |
---|---|---|
sqlType | String | 操作类型 |
tbName | String | 表单 |
cols | String[] | 列名集合 |
colValues | String[] | 列值集合 |
condition | Condition | 更新条件 |
数据 条数/104 | 数据 大小/MB | FullText索引 | Elasticsearch | 缓存模式 | |||
---|---|---|---|---|---|---|---|
大小/MB | 耗时/s | 大小/MB | 耗时/s | 大小/MB | 耗时/s | ||
1 | 100.67 | 2.15 | 1.6 | 28.8 | 53 | 1.10 | 1.4 |
50 | 625.00 | 12.54 | 3.6 | 105.1 | 290 | 4.90 | 24.0 |
100 | 1 996.80 | 22.56 | 7.8 | 140.8 | 480 | 9.46 | 39.0 |
500 | 4 853.76 | 101.68 | 29.0 | 327.6 | 1 852 | 48.40 | 155.0 |
Tab. 6 Comparison of space and time costs of full-index construction
数据 条数/104 | 数据 大小/MB | FullText索引 | Elasticsearch | 缓存模式 | |||
---|---|---|---|---|---|---|---|
大小/MB | 耗时/s | 大小/MB | 耗时/s | 大小/MB | 耗时/s | ||
1 | 100.67 | 2.15 | 1.6 | 28.8 | 53 | 1.10 | 1.4 |
50 | 625.00 | 12.54 | 3.6 | 105.1 | 290 | 4.90 | 24.0 |
100 | 1 996.80 | 22.56 | 7.8 | 140.8 | 480 | 9.46 | 39.0 |
500 | 4 853.76 | 101.68 | 29.0 | 327.6 | 1 852 | 48.40 | 155.0 |
数据条数/104 | Like匹配 | FullText索引 | Elasticsearch | 缓存模式 |
---|---|---|---|---|
10 | 382 | 41 | 18 | 24 |
50 | 2 146 | 1 094 | 101 | 360 |
100 | 4 140 | 2 055 | 718 | 1 215 |
Tab. 7 Comparison of data retrieval response time
数据条数/104 | Like匹配 | FullText索引 | Elasticsearch | 缓存模式 |
---|---|---|---|---|
10 | 382 | 41 | 18 | 24 |
50 | 2 146 | 1 094 | 101 | 360 |
100 | 4 140 | 2 055 | 718 | 1 215 |
查询文本 | Like匹配 | FullText | Elasticsearch | 缓存模式 |
---|---|---|---|---|
通信电缆 | 0 | 0 | 7 001 | 7 001 |
船舶电气钳工 | 1 000 | 1 000 | 13 000 | 13 000 |
飞机测量 | 0 | 0 | 11 000 | 11 000 |
电信交换 | 1 000 | 1 000 | 2 000 | 2 000 |
Tab. 8 Comparison of hits of multi-keyword retrieval
查询文本 | Like匹配 | FullText | Elasticsearch | 缓存模式 |
---|---|---|---|---|
通信电缆 | 0 | 0 | 7 001 | 7 001 |
船舶电气钳工 | 1 000 | 1 000 | 13 000 | 13 000 |
飞机测量 | 0 | 0 | 11 000 | 11 000 |
电信交换 | 1 000 | 1 000 | 2 000 | 2 000 |
1 | FERNÁNDEZ CANDEL C J, RUIZ D S, GARCÍA-MOLINA J J. A unified metamodel for NoSQL and relational databases[J]. Information Systems, 2022, 104: No.101898. 10.1016/j.is.2021.101898 |
2 | YILMAZEL O, YUREKLI B, YILMAZEL B, et al. Relational databases versus information retrieval systems: a case study[C/OL]// Proceedings of the 2009 IADIS International Conference Applied Computing [2022-05-22].. |
3 | 鲁佳文,严丽. 对象关系数据库到RDF(S)的映射方法[J]. 计算机科学, 2021, 48(10):145-151. 10.11896/jsjkx.200800006 |
LU J W, YAN L. Mapping method from object-relational database to RDF(S)[J]. Computer Science, 2021, 48(10): 145-151. 10.11896/jsjkx.200800006 | |
4 | ČEREŠŇÁK R, KVET M. Comparison of query performance in relational a non-relation databases[J]. Transportation Research Procedia, 2019, 40: 170-177. 10.1016/j.trpro.2019.07.027 |
5 | JIN D W, CHEN G, HAO W N, et al. Whole database retrieval method of general relational database based on Lucene[C]// Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications. Piscataway: IEEE, 2020: 1277-1279. 10.1109/icaica50127.2020.9182496 |
6 | GYŐRÖDI C A, DUMŞE-BURESCU D V, ZMARANDA D R, et al. Performance analysis of NoSQL and relational databases with CouchDB and MySQL for application’s data storage[J]. Applied Sciences, 2020, 10(23): No.8524. 10.3390/app10238524 |
7 | 吴小青. 嵌入式数据库(MySQL)中数据模糊检索方法解析[J]. 软件, 2020, 41(9):115-118. 10.3969/j.issn.1003-6970.2020.09.032 |
WU X Q. Analysis of data fuzzy retrieval methods in embedded database (MySQL)[J]. Software, 2020, 41(9): 115-118. 10.3969/j.issn.1003-6970.2020.09.032 | |
8 | KELEC A, DUJLOVIC I, OBRADOVIC N. One approach for full-text search of files in MongoDB based systems[C]// Proceedings of the 18th International Symposium INFOTEH-JAHORINA. Piscataway: IEEE, 2019: 1-6. 10.1109/infoteh.2019.8717777 |
9 | CHAITANYA B S S K, REDDY D A K, CHANDRA B P S E, et al. Full-text search using database index[C]// Proceedings of the 5th International Conference on Computing, Communication, Control and Automation. Piscataway: IEEE, 2019: 1-5. 10.1109/iccubea47591.2019.9128683 |
10 | 吴东明,王丽娟. 基于MySQL的中文全文搜索研究[J]. 电脑知识与技术, 2017, 13(32):7-8. |
WU D M, WANG L J. Research on Chinese full text search based on MySQL[J]. Computer Knowledge and Technology, 2017, 13(32): 7-8. | |
11 | SHAH N, WILLICK D, MAGO V. A framework for social media data analytics using Elasticsearch and Kibana[J]. Wireless Networks, 2022, 28(3): 1179-1187. 10.1007/s11276-018-01896-2 |
12 | MARINŠEK B. Analiza integracije iskalnega mehanizma Elasticsearch na primeru sistema eArhiv Pošte Slovenije[D]. Maribor : University of Maribor, 2018. |
13 | TAWARE U, SHAIKH N. Heterogeneous database system for faster data querying using Elasticsearch[C]// Proceedings of the 4th International Conference on Computing Communication Control and Automation. Piscataway: IEEE, 2018: 1-4. 10.1109/iccubea.2018.8697437 |
14 | ZMARANDA D R, MOISI C I, GYŐRÖDI C A, et al. An analysis of the performance and configuration features of MySQL document store and Elasticsearch as an alternative backend in a data replication solution[J]. Applied Sciences, 2021, 11(24): No.11590. 10.3390/app112411590 |
15 | 刘泽恩. 基于Elasticsearch的垂直搜索推荐系统研究[D]. 武汉:湖北大学, 2021:56-58. 10.47056/0365-9615-2022-174-9-361-368 |
LIU Z E. Research of vertical search recommendation system based on Elasticsearch[D]. Wuhan: Hubei University, 2021:56-58. 10.47056/0365-9615-2022-174-9-361-368 | |
16 | 王伟,魏乐,刘文清,等. 基于ElasticSearch的分布式全文搜索系统[J]. 电子科技, 2018, 31(8):56-59, 65. |
WANG W, WEI L, LIU W Q, et al. Design and implementation on distributed full-text search system based on ElasticSearch[J]. Electronic Science and Technology, 2018, 31(8): 56-59, 65. | |
17 | 梁爽,赵宝军,张海霞. 基于Elasticsearch的海量数据入库及快速检索方法研究[J]. 测绘与空间地理信息, 2020, 43(12):74-76. 10.3969/j.issn.1672-5867.2020.12.019 |
LIANG S, ZHAO B J, ZHANG H X. Research on storage and quick retrieval of massive data based on Elasticsearch[J]. Geomatics & Spatial Information Technology, 2020, 43(12): 74-76. 10.3969/j.issn.1672-5867.2020.12.019 | |
18 | BHANDARKAR S, BN N. A full-text-based search algorithm vs Elasticsearch[J]. Studies in Indian Place Names, UGC Care Journal, 2020, 40(74): 2168-2171. |
19 | 毛海宇,舒继武,李飞,等. 内存计算研究进展[J]. 中国科学:信息科学, 2021, 51(2):173-205. 10.1360/ssi-2020-0037 |
MAO H Y, SHU J W, LI F, et al. Development of progressing-in-memory[J]. SCIENTIA SINICA Informationis, 2021, 51(2):173-205. 10.1360/ssi-2020-0037 | |
20 | 葛微,罗圣美,周文辉,等. HiBase:一种基于分层式索引的高效HBase查询技术与系统[J]. 计算机学报, 2016, 39(1):140-153. 10.11897/SP.J.1016.2016.00140 |
GE W, LUO S M, ZHOU W H, et al. HiBase: a hierarchical indexing mechanism and system for efficient HBase query[J]. Chinese Journal of Computers, 2016, 39(1):140-153. 10.11897/SP.J.1016.2016.00140 | |
21 | ZULFA M I, FADLI A, WARDHANA A W. Application caching strategy based on in-memory using Redis server to accelerate relational data access[J]. Jurnal Teknologi dan Sistem Komputer, 2020, 8(2): 157-163. 10.14710/jtsiskom.8.2.2020.157-163 |
22 | SU Q K, GAO X, ZHANG X, et al. A novel cache strategy leveraging Redis with filters to speed up queries[C]// Proceedings of the SPIE 12162, International Conference on High Performance Computing and Communication. Bellingham, WA: SPIE, 2022: No.121620O. 10.1117/12.2628119 |
23 | EDDELBUETTEL D. A brief introduction to Redis[EB/OL]. (2022-03-13) [2022-04-10].. |
24 | MATALLAH H, BELALEM G, BOUAMRANE K. Evaluation of NoSQL databases: MongoDB, Cassandra, HBase, Redis, Couchbase, OrientDB[J]. International Journal of Software Science and Computational Intelligence, 2020, 12(4): 71-91. 10.4018/ijssci.2020100105 |
25 | CHEN X, WANG F J, XU J M, et al. A distributed cache system based on Redis for high-speed railway catenary monitoring system[C]// Proceedings of the 2020 Chinese Automation Congress. Piscataway: IEEE, 2020: 2048-2053. 10.1109/cac51589.2020.9326531 |
26 | 黄立冬. 分布式搜索引擎中关键词倒排索引方法仿真[J]. 计算机仿真, 2019, 36(8):380-383, 398. 10.3969/j.issn.1006-9348.2019.08.079 |
HUANG L D. Simulation of keyword inverted index method in distributed search engine[J]. Computer Simulation, 2019, 36(8): 380-383, 398. 10.3969/j.issn.1006-9348.2019.08.079 | |
27 | 王莱,周腾达,王正飞,等. 基于布隆过滤器和B+树构建倒排索引的电子病历密文搜索[J]. 计算机应用与软件, 2021, 38(4):276-280, 333. |
WANG L, ZHOU T D, WANG Z F, et al. Search on encrypted electronic medical records using inverted index based on bloom filter and B+tree[J]. Computer Applications and Software, 2021, 38(4): 276-280, 333. | |
28 | 李子骅. Redis入门指南[M]. 北京:人民邮电出版社, 2013:19-57. |
LI Z H. Redis Getting Started Guide[M]. Beijing: Posts & Telecom Press, 2013:19-57. | |
29 | 霍晨鹏. 科技专家遴选系统关键技术研究与实现[D]. 广州:华南理工大学, 2020:9-10. |
HUO C P. The research and implementation of key techniques in science-technology expert selection system[D]. Guangzhou: South China University of Technology, 2020:9-10. | |
30 | 邱涛,王斌,舒昭维,等. 面向关系数据库的智能索引调优方法[J]. 软件学报, 2020, 31(3):634-647. |
QIU T, WANG B, SHU Z W, et al. Intelligent index tuning approach for relational databases[J]. Journal of Software, 2020, 31(3): 634-647. | |
31 | 李燚,顾乃杰,黄增士,等. Redis集群可靠性的研究与优化[J]. 计算机工程, 2018, 44(5):40-46. |
LI Y, GU N J, HUANG Z S, et al. Research and optimization of Redis cluster reliability[J]. Computer Engineering, 2018, 44(5): 40-46. | |
32 | TIAN Q, LI J M, ZHAO S, et al. A cache consistency protocol with improved architecture[J]. International Journal of Performability Engineering, 2018, 14(1):178-185. |
33 | ÖZSU M T, VALDURIEZ P. Principles of Distributed Database Systems[M]. 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1999:247-248. |
34 | MEGIDDO N, MODHA D S. Outperforming LRU with an adaptive replacement cache algorithm[J]. Computer, 2004, 37(4): 58-65. 10.1109/mc.2004.1297303 |
35 | MELNIK S, GUBAREV A, LONG J J, et al. Dremel: interactive analysis of web-scale datasets[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2):330-339. 10.14778/1920841.1920886 |
36 | 瞿龙俊. 基于HBase的交通流数据实时存储与查询优化方案的设计与实现[D]. 镇江:江苏大学, 2017:32-33. |
QU L J. Design and implementation of HBase-based traffic stream data real-time storage and query optimization[D]. Zhenjiang: Jiangsu University, 2017:32-33. |
[1] | Wentao JIANG, Wanxuan LI, Shengchong ZHANG. Correlation filtering based target tracking with nonlinear temporal consistency [J]. Journal of Computer Applications, 2024, 44(8): 2558-2570. |
[2] | Zhuo ZHANG, Huazhu CHEN. Deep subspace clustering based on multiscale self-representation learning with consistency and diversity [J]. Journal of Computer Applications, 2024, 44(2): 353-359. |
[3] | Tingting GAO, Zhongyuan YAO, Miao JIA, Xueming SI. Overview of on-chain and off-chain consistency protection technologies [J]. Journal of Computer Applications, 2024, 44(12): 3658-3668. |
[4] | Shuaihua ZHANG, Shufen ZHANG, Mingchuan ZHOU, Chao XU, Xuebin CHEN. Malicious traffic detection model based on semi-supervised federated learning [J]. Journal of Computer Applications, 2024, 44(11): 3487-3494. |
[5] | Qiujie LIU, Yuan WAN, Jie WU. Deep bi-modal source domain symmetrical transfer learning for cross-modal retrieval [J]. Journal of Computer Applications, 2024, 44(1): 24-31. |
[6] | Jianwen GAN, Yan CHEN, Peng ZHOU, Liang DU. Clustering ensemble algorithm with high-order consistency learning [J]. Journal of Computer Applications, 2023, 43(9): 2665-2672. |
[7] | Ruiqi WANG, Shujuan JI, Ning CAO, Yajie GUO. Semi-supervised fake job advertisement detection model based on consistency training [J]. Journal of Computer Applications, 2023, 43(9): 2932-2939. |
[8] | Yiheng LI, Chenxi DU, Yanyan YANG, Xiangyu LI. Feature selection algorithm for imbalanced data based on pseudo-label consistency [J]. Journal of Computer Applications, 2022, 42(2): 475-484. |
[9] | WU Yue, LUO Jiangtao, LIU Rui, HU Zhongyin. Video similarity detection method based on perceptual hashing and dicing [J]. Journal of Computer Applications, 2021, 41(7): 2070-2075. |
[10] | TANG Chen, HUANG Guorui, JIN Peiquan. NVM-LH: non-volatile memory-friendly linear hash index [J]. Journal of Computer Applications, 2021, 41(3): 623-629. |
[11] | JIANG Kun, LIU Zheng, ZHU Lei, LI Xiaoxing. Fixed word-aligned partition compression algorithm of inverted list based on directed acyclic graph [J]. Journal of Computer Applications, 2021, 41(3): 727-732. |
[12] | Zhenjun PAN, Cheng LIANG, Huaxiang ZHANG. Robust multi-view subspace clustering based on consistency graph learning [J]. Journal of Computer Applications, 2021, 41(12): 3438-3446. |
[13] | Xiaoling SUN, Guang YANG, Yanping SHEN, Qiuge YANG, Tao CHEN. Searchable encryption scheme based on splittable inverted index [J]. Journal of Computer Applications, 2021, 41(11): 3288-3294. |
[14] | CHENG Yaling, BAI Zhi, TAN Aiping. Multi-focus image fusion method based on guided filtering and difference image [J]. Journal of Computer Applications, 2021, 41(1): 220-224. |
[15] | ZHU Mengdi, SHU Yong’an. Verification of control-data plane consistency in software defined network [J]. Journal of Computer Applications, 2020, 40(6): 1751-1754. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||