Information Retrieval (IR) organizes and processes information with specific techniques and methods to meet users' information needs. In recent years, dense retrieval methods based on pre-trained models have achieved significant success. However, these methods only use vector representations of the whole text and of individual words to calculate the relevance between a query and a document, ignoring semantic information at the phrase level. To address this issue, an IR method called MSIR (Multi-Scale Information Retrieval) was proposed, which enhances retrieval performance by integrating semantic information of different granularities from the query and the document. First, semantic units of three granularities (word, phrase, and text) were constructed in the query and the document. Then, a pre-trained model was used to encode these three kinds of semantic units separately to obtain their semantic representations. Finally, these representations were used to calculate the relevance between the query and the document. Comparison experiments were conducted on three classic datasets of different sizes: COVID-19, TREC2019, and Robust04. Compared with ColBERT (a ranking model based on contextualized late interaction over BERT (Bidirectional Encoder Representations from Transformers)), MSIR achieves an improvement of approximately 8% in P@10, P@20, NDCG@10, and NDCG@20 on the Robust04 dataset, as well as some improvement on the COVID-19 and TREC2019 datasets. The experimental results demonstrate that MSIR can effectively integrate multi-granularity semantic information and thereby improve retrieval accuracy.
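A minimal sketch of how such multi-granularity relevance scoring can be combined is given below. It assumes a ColBERT-style late-interaction (MaxSim) operator at the word level, sliding-window mean pooling to form phrase vectors, and mean pooling over tokens for the text-level vector; the function names, pooling scheme, and fusion weights are illustrative assumptions, not the exact MSIR formulation.

```python
# Sketch: fuse word-, phrase-, and text-level relevance between a query and a document.
# Token embeddings would come from a pre-trained encoder such as BERT; random tensors
# stand in for them here so the script is self-contained.
import torch
import torch.nn.functional as F


def phrase_embeddings(token_emb: torch.Tensor, window: int = 3) -> torch.Tensor:
    """Build phrase-level vectors by mean-pooling a sliding window of token vectors.
    token_emb: (seq_len, dim) -> (seq_len - window + 1, dim)."""
    return token_emb.unfold(0, window, 1).mean(dim=-1)


def maxsim(q: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: for each query vector, take the maximum
    cosine similarity over all document vectors, then sum over query vectors."""
    q = F.normalize(q, dim=-1)
    d = F.normalize(d, dim=-1)
    return (q @ d.T).max(dim=-1).values.sum()


def msir_score(q_tok: torch.Tensor, d_tok: torch.Tensor,
               weights=(1.0, 1.0, 1.0), window: int = 3) -> torch.Tensor:
    """Combine word-, phrase-, and text-level relevance into a single score.
    The text-level vector is mean-pooled here (an assumption; a [CLS] vector
    would serve the same purpose)."""
    s_word = maxsim(q_tok, d_tok)                                  # word level
    s_phrase = maxsim(phrase_embeddings(q_tok, window),            # phrase level
                      phrase_embeddings(d_tok, window))
    s_text = F.cosine_similarity(q_tok.mean(dim=0, keepdim=True),  # text level
                                 d_tok.mean(dim=0, keepdim=True)).squeeze()
    w_word, w_phrase, w_text = weights
    return w_word * s_word + w_phrase * s_phrase + w_text * s_text


if __name__ == "__main__":
    torch.manual_seed(0)
    query_emb = torch.randn(8, 128)   # stand-in for encoded query tokens
    doc_emb = torch.randn(64, 128)    # stand-in for encoded document tokens
    print(float(msir_score(query_emb, doc_emb)))
```

In this sketch the word-level score follows the late-interaction scheme of the ColBERT baseline, while the phrase and text scores add the coarser granularities the abstract describes; in practice the fusion weights would be learned or tuned on a validation set rather than fixed at 1.0.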