The consensus mechanism is the core of blockchain technology, and consensus algorithms are the concrete technical means of realizing this mechanism. The consensus mechanism ensures the consistency and correctness of the blockchain database, and is crucial to system properties of the blockchain such as security, scalability and throughput. Therefore, firstly, from the perspective of the underlying storage of blockchain technology, consensus algorithms were divided into two categories, chain-based and graph-based, and the working principles, optimization strategies and typical representative algorithms of each category were classified and reviewed. Then, in view of the complex application backgrounds of blockchain, the mainstream improved algorithms of chain-structure and graph-structure consensus algorithms were sorted out comprehensively, and the main line of consensus algorithm development was given; in particular, the algorithms were compared in depth in terms of security, and their advantages, disadvantages and possible security risks were pointed out. Finally, from multiple dimensions such as security, scalability, fairness and incentive strategy, the challenges faced by current blockchain consensus algorithms were discussed in depth, and their development trends were prospected, so as to provide theoretical reference for researchers.
Aiming at the problem of model compatibility caused by modality absence in real complex scenes, an emotion recognition method supporting input from any available modality was proposed. Firstly, during the pre-training and fine-tuning stages, a modality-random-dropout training strategy was adopted to ensure model compatibility during inference. Secondly, a spatio-temporal masking strategy and a feature fusion strategy based on a cross-modal attention mechanism were proposed respectively, so as to reduce the risk of over-fitting and enhance cross-modal feature fusion. Finally, to solve the noisy label problem caused by inconsistent emotion labels across modalities, an adaptive denoising strategy based on multi-prototype clustering was proposed. In this strategy, class centers were set for different modalities, and noisy labels were removed by comparing the consistency between the clustering category of each modality's features and its label. Experimental results show that on a self-built dataset, compared with the baseline Audio-Visual Hidden unit Bidirectional Encoder Representation from Transformers (AV-HuBERT), the proposed method improves the Weighted Average Recall (WAR) by 6.98 percentage points under modality-aligned inference, by 4.09 percentage points when the video modality is absent, and by 33.05 percentage points when the audio modality is absent; compared with AV-HuBERT on the public video dataset DFEW, the proposed method achieves the highest WAR, reaching 68.94%.
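To make the modality-random-dropout idea concrete, the minimal PyTorch sketch below zeroes out one modality per sample during training so the model learns to cope with missing inputs at inference time; the function name, feature shapes and drop probability are illustrative assumptions, not the paper's implementation.

```python
import torch

def modality_random_dropout(audio_feat, video_feat, p_drop=0.25, training=True):
    """Randomly zero out one modality per sample (hypothetical sketch).

    audio_feat, video_feat: (batch, seq_len, dim) tensors.
    p_drop: probability of dropping each modality (never both at once here).
    """
    if not training:
        return audio_feat, video_feat
    batch = audio_feat.size(0)
    # One decision per sample: 0 = keep both, 1 = drop audio, 2 = drop video
    choice = torch.multinomial(
        torch.tensor([1 - 2 * p_drop, p_drop, p_drop]), batch, replacement=True)
    audio_mask = (choice != 1).float().view(batch, 1, 1)
    video_mask = (choice != 2).float().view(batch, 1, 1)
    return audio_feat * audio_mask, video_feat * video_mask
```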
Aiming at the complexity of traffic intersection images, the difficulty of detecting small targets, the tendency of targets to occlude each other, and the color distortion, noise and blurring caused by changes in weather and lighting, a multi-target detection algorithm for traffic intersection images, ITD-YOLOv9 (Intersection Target Detection-YOLOv9), based on YOLOv9 (You Only Look Once version 9), was proposed. Firstly, the CoT-CAFRNet (Chain-of-Thought prompted Content-Aware Feature Reassembly Network) image enhancement network was designed to improve image quality and optimize input features. Secondly, the iterative Channel Adaptive Feature Fusion (iCAFF) module was added to enhance feature extraction for small targets as well as overlapped and occluded targets. Thirdly, the feature fusion pyramid structure BiHS-FPN (Bi-directional High-level Screening Feature Pyramid Network) was proposed to enhance multi-scale feature fusion capability. Finally, the IF-MPDIoU (Inner-Focaler-Minimum Point Distance based Intersection over Union) loss function was designed to focus on key samples and enhance generalization ability by adjusting variable factors. Experimental results show that on the self-made dataset and the SODA10M dataset, the ITD-YOLOv9 algorithm achieves detection accuracies of 83.8% and 56.3% and detection speeds of 64.8 frame/s and 57.4 frame/s, respectively; compared with the YOLOv9 algorithm, the detection accuracies are improved by 3.9 and 2.7 percentage points respectively. It can be seen that the proposed algorithm realizes multi-target detection at traffic intersections effectively.
To address the limitations of existing Large Language Models (LLMs) in processing cross-domain knowledge, updating real-time academic information, and ensuring output quality, ScholatGPT, a scholar LLM based on Academic Social Networks (ASNs), was proposed. In ScholatGPT, the abilities of precise semantic retrieval and dynamic knowledge update were enhanced by integrating Knowledge-Graph Augmented Generation (KGAG) and Retrieval-Augmented Generation (RAG), and optimization and fine-tuning were used to improve the generation quality of academic text. Firstly, a scholar knowledge graph was constructed based on relational data from SCHOLAT, with LLMs employed to enrich the graph semantically. Then, a KGAG-based retrieval model was introduced and combined with RAG to realize multi-path hybrid retrieval, thereby enhancing the model's retrieval precision. Finally, fine-tuning techniques were applied to optimize the model's generation quality in academic fields. Experimental results demonstrate that ScholatGPT achieves a precision of 83.2% in academic question answering tasks, outperforming GPT-4o and AMiner AI by 69.4 and 11.5 percentage points, and performs well in tasks such as scholar profiling, representative work identification, and research field classification. Furthermore, ScholatGPT obtains stable and competitive results in answer relevance, coherence, and readability, achieving a good balance between specialization and readability. Additionally, ScholatGPT-based intelligent applications such as a scholar think tank and an academic information recommendation system improve the efficiency of academic resource acquisition effectively.
Commonsense Question Answering (CQA) aims to answer questions described in natural language automatically by using commonsense knowledge to obtain accurate answers, and belongs to the field of intelligent question answering. Typically, this task demands background commonsense knowledge to enhance the model's problem-solving capability. Most related methods rely on extracting and utilizing commonsense from textual data; however, commonsense is often implicit and not always represented in the text directly, which limits the application range and effectiveness of these methods. Therefore, a cross-modal contrastive learning-based CQA model was proposed to fully utilize cross-modal information for enriching the expression of commonsense knowledge. Firstly, a cross-modal commonsense representation module was designed to integrate commonsense bases and a cross-modal large model, thereby obtaining cross-modal commonsense representations. Secondly, in order to enhance the model's ability to distinguish among different options, contrastive learning was carried out on the cross-modal representations of questions and options. Finally, a softmax layer was used to generate relevance scores for question-option pairs, and the option with the highest score was taken as the final predicted answer. Experimental results on the public datasets CommonSenseQA (CSQA) and OpenBookQA (OBQA) show that, compared to DEKCOR (DEscriptive Knowledge for COmmonsense question answeRing), the proposed model improves accuracy by 1.46 and 0.71 percentage points respectively.
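To illustrate the contrastive step over options, the sketch below shows an InfoNCE-style loss that pulls the question embedding toward the gold option and pushes it away from distractors; the function name, tensor shapes and temperature are hypothetical choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def option_contrastive_loss(question_emb, option_embs, answer_idx, temperature=0.07):
    """InfoNCE-style contrast between a question and its candidate options.

    question_emb: (batch, dim)
    option_embs:  (batch, n_options, dim)
    answer_idx:   (batch,) index of the gold option
    """
    q = F.normalize(question_emb, dim=-1).unsqueeze(1)   # (batch, 1, dim)
    o = F.normalize(option_embs, dim=-1)                 # (batch, n, dim)
    logits = (q * o).sum(-1) / temperature               # cosine similarities
    return F.cross_entropy(logits, answer_idx)
```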
Since a camouflaged object is highly similar to its background, it is easily confounded with background features, making it difficult to distinguish boundary information and extract object features. Current mainstream Camouflaged Object Detection (COD) algorithms mainly study the camouflaged object itself and its boundaries, ignoring the relationship between the image background and the object, and their detection results are not ideal in complex scenes. To this end, in order to explore the potential connection between background and object, a camouflaged object detection algorithm mining boundaries and background, called I2DNet (Indirect to Direct Network), was proposed. The algorithm consists of five parts: in the encoder, the initial raw data was processed; in the Boundary-guided feature Extracting and Mining Framework (BEMF), more refined boundary features were extracted through feature processing and feature mining; in the Latent-feature Exploring Framework based on Background guidance (LEFB), more salient features were explored through multi-scale convolution, and a Hybrid Attention Module (HAM) was designed on the basis of attention to enhance the selection of background features; in the Information Supplement Module (ISM), the detailed information lost during feature processing was made up; in the Multi-task Co-segmentation Decoder (MCD), the features extracted by different tasks and modules were fused efficiently and the final prediction results were output. Experimental results show that the proposed algorithm outperforms 15 state-of-the-art models on three widely used datasets; in particular, on the CAMO dataset, the proposed algorithm reduces the mean absolute error to 0.042.
In the field of Human Pose Estimation (HPE), heatmap-based methods suffer from large quantization error, high computational complexity, and the need to post-process the heatmap. To address the above issues, with the coordinate-regression method SimCC as a baseline, a lightweight HPE model based on a Merge State Space Model (MSSM), namely Lite-SimCC, was proposed. Firstly, ShuffleNet V2 was adopted as the backbone network to replace the original HRNet (High-Resolution Net), simplifying the structure to a single-branch form and making the model lightweight. Secondly, to reduce the loss of precision, a large-kernel convolution was introduced to extract global feature information. Thirdly, an MSSM was designed to handle both local and full long-sequence features, so as to enhance the representational ability of the key points. Finally, a soft-label based loss function was proposed to replace the traditional one-hot loss calculation. Experimental results show that compared with the baseline SimCC, Lite-SimCC decreases the parameters by 87.1% and improves the Average Precision (AP) by 1.4% on the COCO2017 test set, and results on the MPII dataset prove that Lite-SimCC reduces the model parameters effectively while guaranteeing detection precision.
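As a minimal sketch of the soft-label idea in a SimCC-style coordinate classifier, the code below replaces the one-hot target over coordinate bins with a Gaussian soft label and uses a KL-divergence loss; the function name, Gaussian width and loss choice are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def soft_label_simcc_loss(pred_logits, gt_bins, num_bins, sigma=2.0):
    """Soft-label loss over SimCC coordinate bins (hypothetical sketch).

    pred_logits: (batch, num_bins) logits over coordinate bins (x or y axis).
    gt_bins:     (batch,) ground-truth bin indices.
    """
    bins = torch.arange(num_bins, device=pred_logits.device).float()
    # Gaussian centred on the ground-truth bin, normalised to a distribution
    target = torch.exp(-(bins.unsqueeze(0) - gt_bins.unsqueeze(1).float()) ** 2
                       / (2 * sigma ** 2))
    target = target / target.sum(dim=1, keepdim=True)
    log_pred = F.log_softmax(pred_logits, dim=1)
    return F.kl_div(log_pred, target, reduction='batchmean')
```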
Reptile Search Algorithm (RSA) has strong global exploration ability, but its exploitation ability is relatively weak and it cannot converge well in the late stage of iteration. To address these issues, by combining the Teaching-Learning-Based Optimization (TLBO) algorithm, the Beetle Antennae Search (BAS) algorithm based on quadratic interpolation, and the lens opposition-based learning strategy, a Reptile Search Algorithm based on Multi-Hunting Coordination Strategy (MHCS-RSA) was proposed. In MHCS-RSA, the position update formula of hunting cooperation in the encircling phase (global exploration) and hunting phase (local exploitation) of RSA was retained. In the hunting coordination of the hunting phase, the learner phase of the TLBO algorithm and BAS based on quadratic interpolation were integrated to perform position updates, in order to improve the exploitation and convergence ability of the algorithm. In addition, the lens opposition-based learning strategy was introduced to enhance the algorithm's ability to jump out of local optima. Experimental results on the CEC 2020 test functions show that MHCS-RSA has good optimization ability, convergence and robustness. The validity of MHCS-RSA in solving practical problems is further verified by solving the tension/compression spring design problem and the speed reducer design problem.
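For reference, lens opposition-based learning generates a candidate on the opposite side of the search-space midpoint, scaled by a lens factor k; a minimal NumPy sketch of the standard formula x* = (a+b)/2 + (a+b)/(2k) - x/k is shown below, with the acceptance rule as a comment. The particular k value is an illustrative assumption.

```python
import numpy as np

def lens_opposition(x, lb, ub, k=1000.0):
    """Lens opposition-based learning candidate (standard formula).

    x: (dim,) current position;  lb, ub: per-dimension bounds;  k > 1.
    """
    mid = (lb + ub) / 2.0
    x_opp = mid + mid / k - x / k   # x* = (a+b)/2 + (a+b)/(2k) - x/k
    return np.clip(x_opp, lb, ub)

# Greedy acceptance, as commonly used:
# x = x_opp if f(x_opp) < f(x) else x
```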
In response to the significantly reduced detection performance of face detection models in low-light conditions, a low-light face detection method based on image enhancement was developed. Firstly, image enhancement techniques were applied to preprocess low-light images, enhancing the effective facial features. Secondly, an attention mechanism was introduced after the model's backbone network to increase the network's focus on facial regions while reducing the negative impact of non-uniform lighting and noise. Furthermore, an attention-based bounding box loss function, Wise Intersection over Union (WIoU), was incorporated to improve the network's accuracy in detecting low-quality faces. Finally, a more efficient feature fusion module was used to replace the original model structure. Experimental results on the low-light face dataset DARK FACE indicate that, compared to the original YOLOv7 model, the improved method achieves an increase of 2.4 percentage points in average detection precision AP@0.5 and an increase of 1.4 percentage points in mean average precision AP@0.5:0.95, without introducing additional parameters or computational complexity. Additionally, results on two other low-light face datasets confirm the effectiveness and robustness of the proposed method, demonstrating its applicability to low-light face detection in diverse scenarios.
Deep learning has further promoted the research of gait recognition algorithms, but some problems remain, such as ignoring the detailed information extracted by shallow networks and the difficulty of fusing the spatio-temporal information of gait videos. In order to utilize shallow features effectively and fuse spatio-temporal features, a cross-view gait recognition algorithm based on multi-layer refined feature fusion was proposed. The proposed algorithm consists of two parts: an Edge Motion Capture Module (EMCM) used to extract edge motion features containing temporal information, and a Multi-layer refined Feature Extraction Module (MFEM) used to extract multi-layer refined features containing global and local information at different granularities. Firstly, EMCM and MFEM were used to extract multi-layer refined features and edge motion features. Then, the features extracted by the two modules were fused to obtain discriminative gait features. Finally, comparative experiments were conducted in multiple scenarios on the public datasets CASIA-B and OU-MVLP. The average recognition accuracy on CASIA-B reaches 89.9%, which is 1.1 percentage points higher than that of GaitPart, and the average recognition accuracy in the 90-degree view of the OU-MVLP dataset is 3.0 percentage points higher than that of GaitSet. The proposed algorithm can effectively improve the accuracy of gait recognition in many situations.
In response to the interference of multiple aspect words in the syntactic dependency tree, the redundant information caused by invalid words and punctuation marks, and the weak correlation between aspect words and their corresponding sentiment words, an aspect-level sentiment analysis model combining Strong Association Dependencies and Concise Syntax (SADCS) was proposed. Firstly, a sentiment Part-Of-Speech (POS) list was constructed to enhance the association between aspect words and the corresponding sentiments. Then, a joint list incorporating the POS list and dependency relationships was constructed to eliminate the redundant information of invalid words and punctuation marks, yielding an optimized dependency tree. Next, the optimized dependency tree was combined with a Graph ATtention network (GAT) to model and extract contextual features. Finally, the contextual feature information and the feature information of dependency relationship types were learned and fused to enhance the feature representation, enabling the classifier to predict the sentiment polarity of each aspect word efficiently. The proposed model was analyzed experimentally on four public datasets. Compared with the DMF-GAT-BERT (Dynamic Multichannel Fusion mechanism based on the GAT and BERT (Bidirectional Encoder Representations from Transformers)) model, the accuracy of the proposed model increased by 1.48, 1.81, 0.09 and 0.44 percentage points, respectively. Experimental results demonstrate that the proposed model effectively enhances the association between aspect words and sentiment words, resulting in more accurate prediction of aspect word sentiment polarity.
Concerning the problems that the trapdoor basis is too large and that the public keys of ring members require digital certificate authentication in lattice-based ring signature schemes, an NTRU (Number Theory Research Unit) lattice-based Identity-Based Ring Signature scheme (NTRU-IBRS) was proposed. Firstly, the trapdoor generation algorithm on the NTRU lattice was used to generate the system master public-private key pair. Secondly, the master private key was taken as the trapdoor information and the one-way function was inverted to obtain the private key of every ring member. Finally, based on the Small Integer Solution (SIS) problem, the ring signature was generated by using the rejection sampling technique. Security analysis shows that NTRU-IBRS is anonymous and existentially unforgeable under adaptive chosen message and chosen identity attacks. Performance analysis and experimental simulation show that, compared with a ring signature scheme on ideal lattices and an identity-based linkable ring signature scheme on NTRU lattices, NTRU-IBRS reduces the system private key length by up to 99.6%, the signing private key length by 50.0% to 98.4%, and the total time overhead by 15.3% to 21.8%. Simulation results of applying NTRU-IBRS to the dynamic Internet of Vehicles (IoV) scenario show that NTRU-IBRS can ensure privacy security and improve communication efficiency during vehicle interaction at the same time.
Most entity relationships in the real world cannot be represented by simple binary relations, while a hypergraph can represent the n-ary relations among entities well. Therefore, definitions of hypergraph clique and maximal clique were proposed, and an exact algorithm and an approximation algorithm for searching hypergraph maximal cliques were given. Firstly, the reason why existing maximal clique search algorithms on ordinary graphs cannot be applied to hypergraphs directly was analyzed. Then, based on the characteristics of hypergraphs and the definition of maximal clique, a novel data structure for preserving the adjacency relations among hypergraph vertices was proposed, and an exact maximal clique search algorithm on hypergraphs was proposed. As the exact algorithm runs slowly, the pruning idea of pivots was incorporated to reduce the number of recursion levels, and an approximate maximal clique search algorithm on hypergraphs was proposed. Experimental results on multiple real hypergraph datasets show that, under the premise of finding most of the maximal cliques, the proposed approximation algorithm improves the search speed; when the size of the tested hypergraph cliques on a 3-uniform hypergraph is 22, the speedup ratio exceeds 1 000.
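For context, the pivot-based pruning idea on ordinary graphs is the classic Bron-Kerbosch refinement sketched below: the pivot's neighbours are skipped at each level, cutting the number of recursive branches. This minimal Python version works on an adjacency-set graph only; the paper's approximation algorithm adapts the same idea to its hypergraph adjacency structure, which is not reproduced here.

```python
def bron_kerbosch_pivot(R, P, X, adj, cliques):
    """Maximal-clique enumeration with pivoting on an ordinary graph.

    adj: dict mapping each vertex to the set of its neighbours.
    R, P, X: current clique, candidate set, excluded set (all Python sets).
    """
    if not P and not X:
        cliques.append(set(R))   # R is maximal
        return
    # Pivot with the most neighbours in P minimises the branching factor
    pivot = max(P | X, key=lambda u: len(P & adj[u]))
    for v in list(P - adj[pivot]):
        bron_kerbosch_pivot(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

# Usage: cliques = []; bron_kerbosch_pivot(set(), set(adj), set(), adj, cliques)
```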
Focusing on the issue that current Aspect-Based Sentiment Analysis (ABSA) models rely too much on the syntactic dependency tree, whose relationships are relatively sparse, to learn feature representations, which leads to an insufficient ability to learn local information, an ABSA model fusing multi-window local information, called MWGAT (combining Multi-Window local information and Graph ATtention network), was proposed. Firstly, local contextual features were learned through the multi-window local feature learning mechanism, and the potential local information contained in the text was mined. Secondly, a Graph ATtention network (GAT), which can better understand the syntactic dependency tree, was used to learn the syntactic structure information represented by the tree, generating syntax-aware contextual features. Finally, these two types of features representing different semantic information were fused to form a feature representation containing both the syntactic information of the dependency tree and the local information, so that the sentiment polarities of aspect words could be discriminated by the classifier efficiently. Three public datasets, Restaurant, Laptop and Twitter, were used for experiments. The results show that, compared with the T-GCN (Type-aware Graph Convolutional Network) model combined with the syntactic dependency tree, the proposed model improves the Macro-F1 score by 2.48%, 2.37% and 0.32% respectively. It can be seen that the proposed model can mine potential local information effectively and predict the sentiment polarities of aspect words more accurately.
Aspect-oriented Fine-grained Opinion Extraction (AFOE) extracts aspect terms and opinion terms from reviews in the form of opinion pairs, or additionally extracts the sentiment polarities of aspect terms to form opinion triplets. Aiming at the problem of neglecting the correlation between opinion pairs and contexts, an aspect-oriented Adaptive Span Feature-Grid Tagging Scheme (ASF-GTS) model was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) model was used to obtain the feature representation of the sentence. Then, the correlation between an opinion pair and its local context was enhanced by the Adaptive Span Feature (ASF) method. Next, Opinion Pair Extraction (OPE) was transformed into a unified grid tagging task by the Grid Tagging Scheme (GTS). Finally, the corresponding opinion pairs or opinion triplets were generated by a specific decoding strategy. Experiments were carried out on four AFOE benchmark datasets adapted to the opinion tuple extraction task. The results show that, compared with the GTS-BERT (Grid Tagging Scheme-BERT) model, the proposed model improves the F1-score by 2.42% to 7.30% on opinion pair tasks and by 2.62% to 6.61% on opinion triplet tasks. The proposed model can effectively preserve the sentiment correlation between opinion pairs and contexts, and extract opinion pairs and their sentiment polarities more accurately.
To mine threat intelligence entities and their relations in open-source heterogeneous big data efficiently and automatically, a Threat Intelligence Entity Relation Extraction (TIERE) method was proposed. Firstly, a data preprocessing method was presented by analyzing the characteristics of open-source cyber security reports. Then, an Improved BootStrapping-based Named Entity Recognition (NER-IBS) algorithm and a Semantic Role Labeling-based Relation Extraction (RE-SRL) algorithm were developed to address the high text complexity and the scarcity of standard datasets in the cyber security field. Initial seeds were constructed from a small number of samples and rules, entities in unstructured text were mined through iterative training, and the relations between entities were mined by constructing semantic roles. Experimental results show that on the few-shot cyber security information extraction dataset, the F1 value of the NER-IBS algorithm is 84%, which is 2 percentage points higher than that of the RDF-CRF (Regular expression and Dictionary combined with Feature templates as well as Conditional Random Field) algorithm, and the F1 value of the RE-SRL algorithm for uncategorized relation extraction is 94%, proving that the TIERE method has efficient entity and relation extraction capability.
A multi-stage low-illuminance image enhancement network based on an attention mechanism was proposed to solve the loss of detail in low-illuminance images caused by overlapping image contents and large brightness differences in some regions during enhancement. At the first stage, an improved multi-scale fusion module was used to perform preliminary image enhancement. At the second and third stages, the enhanced result of the previous stage was concatenated with the input of the current stage, and the result was used as the input of the multi-scale fusion module of that stage. In this way, through multi-stage fusion, both the brightness of the image was improved adaptively and the details were retained adaptively. Experimental results on the open datasets LOL and SICE show that, compared to algorithms and networks such as the MSR (Multi-Scale Retinex) algorithm, the gray Histogram Equalization (HE) algorithm and RetinexNet (Retina cortex Network), the proposed network achieves a Peak Signal-to-Noise Ratio (PSNR) 11.0% to 28.9% higher and a Structural SIMilarity (SSIM) 6.8% to 46.5% higher. By using a multi-stage method and an attention mechanism to realize low-illuminance image enhancement, the proposed network effectively alleviates the problems of image content overlapping and large brightness differences, and the images it produces are more detailed and subjectively recognizable, with clearer textures.
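A minimal PyTorch sketch of the cascading pattern is given below: stages 2 and 3 each receive the previous stage's output concatenated with the original image. The stand-in fusion block is a plain convolutional stack, an assumption replacing the paper's attention-equipped multi-scale fusion module.

```python
import torch
import torch.nn as nn

def fusion_block(in_ch):
    """Stand-in for the attention-equipped multi-scale fusion module."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

class MultiStageEnhancer(nn.Module):
    """Three-stage cascade over an RGB low-light image (hypothetical sketch)."""
    def __init__(self):
        super().__init__()
        self.stage1 = fusion_block(3)
        self.stage2 = fusion_block(6)   # original image + stage-1 result
        self.stage3 = fusion_block(6)   # original image + stage-2 result

    def forward(self, x):
        y1 = self.stage1(x)
        y2 = self.stage2(torch.cat([x, y1], dim=1))
        y3 = self.stage3(torch.cat([x, y2], dim=1))
        return y3
```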
The aim of Network Representation Learning (NRL) is to learn potential, low-dimensional representations of network vertices, which are then applied to downstream network analysis tasks. Existing autoencoder-based NRL algorithms extract node attribute information insufficiently and tend to produce information bias, which affects the learning effect. Aiming at these problems, a Network Representation learning model based on an Autoencoder with optimized Graph Structure (NR-AGS) was proposed to improve accuracy by optimizing the graph structure. Firstly, the structure and attribute information were fused to generate a joint transition matrix, thereby forming the high-dimensional representation. Secondly, the low-dimensional embedded representation was learnt by an autoencoder. Finally, a deep embedded clustering algorithm was introduced during learning to form a self-supervision mechanism over the autoencoder training and the category distribution division of nodes. At the same time, an improved Maximum Mean Discrepancy (MMD) algorithm was used to reduce the gap between the distribution of the learnt low-dimensional embedded representation and that of the original data. In the proposed model, the reconstruction loss of the autoencoder, the deep embedded clustering loss and the improved MMD loss were used to optimize the network jointly. NR-AGS was applied to the learning of three real datasets, and the obtained low-dimensional representations were used for downstream tasks such as node classification and node clustering. Experimental results show that, compared with the deep graph representation model DNGR (Deep Neural networks for Graph Representations), NR-AGS improves the Micro-F1 score by at least 7.2, 13.5 and 8.2 percentage points on the Cora, Citeseer and Wiki datasets respectively. It can be seen that NR-AGS can improve the effect of NRL effectively.
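For reference, the standard (unimproved) squared MMD with an RBF kernel, which measures the gap between the embedding distribution and the data distribution, can be sketched in a few lines of PyTorch; the function name, bandwidth and biased estimator are illustrative assumptions, and the paper's improved MMD is not reproduced here.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased estimate of squared MMD with an RBF kernel.

    x: (n, d) embedded samples;  y: (m, d) reference samples.
    """
    def gram(a, b):
        dist = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-dist / (2 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()
```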
Focusing on the target tracking problem in the docking stage of autonomous aerial refueling, a joint detection and tracking algorithm for targets in aerial refueling scenes was proposed. In the algorithm, the CenterTrack network, which integrates detection and tracking, was adopted to track the drogue. In view of its large computational cost and long training time, the network was improved from two aspects: model design and network optimization. Firstly, a dilated convolution group was introduced into the tracker to make the network lighter without changing the size of the receptive field. At the same time, the convolutional layers of the output part were replaced with depthwise separable convolutional layers to reduce the network parameters and computational cost. Then, the network was further optimized to converge to a stable state faster by combining the Stochastic Gradient Descent (SGD) method with the Adaptive moment estimation (Adam) algorithm. Finally, videos of real-world aerial refueling scenes and ground simulations were made into a dataset in the corresponding format for experimental verification. Training and testing were carried out on the self-built drogue dataset and the MOT17 (Multiple Object Tracking 17) public dataset respectively, verifying the effectiveness of the proposed algorithm. Compared to the original CenterTrack network, the improved network Tiny-CenterTrack reduces training time by about 48.6% and improves real-time performance by 8.8%. Experimental results show that the improved network can effectively save computing resources and improve real-time performance to a certain extent without loss of network performance.
At present, most deep learning models have difficulty classifying bird sounds under complex background noise. Since bird sound is continuous in the time domain and has a high-low (pitch) structure in the frequency domain, a fusion model of homologous spectrogram features was proposed for bird sound classification under complex background noise. Firstly, a Convolutional Neural Network (CNN) was used to extract Mel-spectrogram features of bird sound. Then, the time-domain and frequency-domain dimensions of the same Mel-spectrogram feature were each compressed to 1 by specific convolution and down-sampling operations, so that a frequency-domain feature with only the high-low structure and a time-domain feature with only the continuity characteristic were obtained; extracting the Mel-spectrogram feature in both domains at once yielded a time-frequency-domain feature with both characteristics. A self-attention mechanism was then applied to the obtained time-domain, frequency-domain and time-frequency-domain features to strengthen their own characteristics. Finally, the decision-fused results of these three homologous spectrogram features were used for bird sound classification. The proposed model was used for audio classification of 8 bird species from the Xeno-canto website and achieved the best result in the comparison experiments with a Mean Average Precision (MAP) of 0.939. Experimental results show that the proposed model can address the poor classification performance of bird sounds under complex background noise.
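To illustrate the axis-collapsing step, the sketch below reduces one axis of a Mel-spectrogram feature map at a time by mean pooling, leaving a time-only and a frequency-only feature; the paper uses specific convolution and down-sampling operations instead, so the pooling here is a simplifying assumption.

```python
import torch

def split_time_freq(mel_feat):
    """Collapse one axis of a Mel-spectrogram feature map at a time.

    mel_feat: (batch, channels, freq_bins, time_steps)
    Averaging over frequency keeps only the temporal continuity cue;
    averaging over time keeps only the high-low frequency cue.
    """
    time_feat = mel_feat.mean(dim=2)   # (batch, channels, time_steps)
    freq_feat = mel_feat.mean(dim=3)   # (batch, channels, freq_bins)
    return time_feat, freq_feat
```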
In online kernel regression learning, the inverse of the kernel matrix needs to be computed when a new sample arrives, and the computational complexity is at least quadratic in the number of rounds. The idea of applying the sketching method to hypothesis updating was introduced, and a more efficient online kernel regression algorithm via sketching was proposed. Firstly, with the loss function set to the square loss, a new gradient descent algorithm, called FTL-Online Kernel Regression (F-OKR), was proposed, using the Nyström approximation method to approximate the kernel and applying the idea of Follow-The-Leader (FTL). Then, the sketching method was used to accelerate F-OKR so that the computational complexity was reduced to linear in the number of rounds and the sketch scale, and quadratic in the data dimension. Finally, an efficient online kernel regression algorithm called Sketched Online Kernel Regression (SOKR) was designed. Compared to F-OKR, SOKR achieves the same accuracy while reducing the runtime by about 16.7% on some datasets. The sub-linear regret bounds of both algorithms were proved, and experimental results on standard regression datasets also verify that the algorithms perform better than the NOGD (Nyström Online Gradient Descent) algorithm, with the average loss over all datasets reduced by about 64%.
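For intuition, the standard Nyström approximation maps each sample to an m-dimensional feature whose inner products approximate the kernel, so that a kernel regression reduces to a linear one in m dimensions; a minimal NumPy sketch for an RBF kernel follows, with the landmark count m playing the role of the sketch scale. Names and the gamma value are illustrative.

```python
import numpy as np

def nystrom_features(X, m, gamma=1.0, seed=0):
    """Nystrom feature map for an RBF kernel: K is approximated by C W^+ C^T.

    X: (n, d) data;  m: number of landmark points.
    Returns an (n, m) matrix Z with Z Z^T approximating the kernel matrix.
    """
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    W = rbf(landmarks, landmarks)      # (m, m) landmark kernel
    C = rbf(X, landmarks)              # (n, m) cross kernel
    U, s, _ = np.linalg.svd(W)         # W is symmetric PSD
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return C @ W_inv_sqrt              # feature map Z = C W^{-1/2}
```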
In the feature learning process, existing hashing methods cannot distinguish the importance of the feature information of each region, and cannot utilize label information to explore the correlation between modalities. Therefore, an Adaptive Hybrid Attention Hashing for deep cross-modal retrieval (AHAH) model was proposed. Firstly, channel attention and spatial attention were combined through weights obtained by autonomous learning, so as to strengthen attention to relevant target regions and weaken attention to irrelevant ones. Secondly, the similarity between modalities was expressed more finely through statistical analysis of modality labels and quantification of similarity degrees to values between 0 and 1 with the proposed similarity measurement method. Compared with the most advanced method, Multi-Label Semantics Preserving Hashing (MLSPH), on four commonly used datasets, MIRFLICKR-25K, NUS-WIDE, MSCOCO and IAPR TC-12, the proposed method increases the retrieval mean Average Precision (mAP) by 2.25%, 1.75%, 6.8% and 2.15% respectively with a hash code length of 16 bit. In addition, ablation experiments and efficiency analysis also prove the effectiveness of the proposed method.
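A minimal sketch of combining channel and spatial attention through learned scalar weights is shown below in PyTorch; the module structure (CBAM-style branches) and the two mixing parameters are assumptions for illustration, not the AHAH architecture itself.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel and spatial attention mixed by two learnable scalars."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(          # channel attention branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.spatial_conv = nn.Sequential(         # spatial attention branch
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid())
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned mixing weights
        self.beta = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        ca = x * self.channel_fc(x)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        sa = x * self.spatial_conv(pooled)
        return self.alpha * ca + self.beta * sa
```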
Existing Nonnegative Matrix Factorization (NMF) algorithms are often designed based on the Euclidean distance, which makes them sensitive to noise. In order to enhance their robustness, a Manifold Regularized Nonnegative Matrix Factorization based on Clean Data (MRNMF/CD) algorithm was proposed. In MRNMF/CD, low-rank constraints, manifold regularization and NMF were seamlessly integrated, giving the algorithm relatively excellent performance. Firstly, by adding low-rank constraints, MRNMF/CD can recover clean data from noisy data and obtain the global structure of the data. Secondly, in order to use the local geometric structure information of the data, manifold regularization was incorporated into the objective function. In addition, an iterative algorithm for solving MRNMF/CD was proposed, and the convergence of this algorithm was analyzed theoretically. Experimental results on the ORL, Yale and COIL20 datasets show that MRNMF/CD achieves better accuracy than existing algorithms including k-means, Principal Component Analysis (PCA), NMF and Graph Regularized Nonnegative Matrix Factorization (GNMF).
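For context, the manifold regularization term tr(H L H^T) also appears in the classic GNMF baseline, whose multiplicative updates are sketched below in NumPy; this illustrates the regularizer MRNMF/CD builds on, not the MRNMF/CD solver with its additional low-rank constraints. Parameter values are illustrative.

```python
import numpy as np

def gnmf_updates(X, A, rank, lam=0.1, iters=200, eps=1e-9):
    """Multiplicative updates for graph-regularized NMF: X ~ W H,
    with regularizer lam * tr(H L H^T), L = D - A on the samples.

    X: (m, n) nonnegative data;  A: (n, n) nonnegative sample affinity matrix.
    """
    m, n = X.shape
    D = np.diag(A.sum(axis=1))                 # degree matrix
    rng = np.random.default_rng(0)
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```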
Functional queries are an important operation in big data applications, and the query answering problem has always been a core problem in database theory. In order to analyze the complexity of the functional query answering problem on big data, firstly, the functional query language was reduced to a known decidable language by the mapping reduction method, which proves the computability of the functional query answering problem. Secondly, first-order language was used to describe functional queries, and the complexity of this first-order language was analyzed. On this basis, the NC-factor reduction method was used to reduce the functional query class to the known ΠTQ-complete class. It is proved that the functional query answering problem can be solved in NC time after PTIME (Polynomial TIME) preprocessing. It can be concluded that the functional query answering problem is tractable on big data.
The constant expansion of cloud storage systems and the neglect of energy consumption factors in their design bring high energy consumption and low efficiency, and this problem has become a main bottleneck in the development of cloud computing and big data. Most previous studies saved energy by switching entire storage nodes to a low-power mode. According to the redundancy of data and access patterns, a new storage model based on data classification was proposed. The storage area was divided into HotZone, ColdZone and ReduplicationZone, so that each data file was stored by zone according to its redundancy and activity characteristics. Based on the new storage model, an energy-efficient storage algorithm was designed. The experimental results show that the new storage model improves the energy utilization rate of the distributed storage system by nearly 25%, especially when the system load is lower than the given threshold.
Building an interpretable and large-scale protein-compound interaction model is a very important subject. A new chemically interpretable model for protein-compound interactions was proposed. The core idea of the model is the hypothesis that a protein-compound interaction can be decomposed into interactions between protein fragments and compound fragments, so that composing the fragment interactions yields the protein-compound interaction. Firstly, amino acid oligomer clusters and compound substructures were applied to describe the protein and the compound respectively. Then the protein fragments and the compound fragments were viewed as the two parts of a bipartite graph, with fragment interactions as the edges. Based on the hypothesis, the protein-compound interaction is determined by the summation of the protein fragment and compound fragment interactions. The experiments demonstrate that the model achieves a prediction accuracy of 97% and has very good interpretability.
In order to overcome the staircase effect and edge blurring caused by the traditional anisotropic diffusion model, the perfect reconstruction and good direction selectivity of the complex wavelet transform were exploited to design an adaptive diffusion model combining the gradient and the complex wavelet transform modulus in the complex wavelet domain, and an adaptive image diffusion filtering algorithm based on a variable exponent was proposed. Finally, the filtering performance of the proposed algorithm was tested through computer simulation. The experimental results show that noise can be filtered effectively under low Signal-to-Noise Ratio (SNR) conditions, and edges and textures are preserved well by the proposed method.
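For reference, one explicit step of the classic Perona-Malik anisotropic diffusion, the baseline whose staircase artifacts the wavelet-domain model is designed to avoid, can be written in a few lines of NumPy; the conduction function, kappa and time step below are conventional choices, and the periodic boundary handling via np.roll is a simplification.

```python
import numpy as np

def perona_malik_step(img, kappa=20.0, dt=0.2):
    """One explicit step of classic Perona-Malik anisotropic diffusion."""
    # Finite-difference gradients toward the four neighbours
    dn = np.roll(img, -1, axis=0) - img
    ds = np.roll(img, 1, axis=0) - img
    de = np.roll(img, -1, axis=1) - img
    dw = np.roll(img, 1, axis=1) - img
    # Edge-stopping conduction coefficient g(|grad I|) = exp(-(|grad I|/kappa)^2)
    g = lambda d: np.exp(-(d / kappa) ** 2)
    return img + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
```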
In order to improve the robustness and accuracy of relative orientation, an approach combining direct resolving and iterative refinement was proposed. Firstly, the essential matrix was estimated from corresponding points. Afterwards, the initial relative position and posture of the two cameras were obtained by decomposing the essential matrix, and the process for determining the unique position and posture parameters was introduced in detail. Finally, by constructing the horizontal epipolar coordinate system, a constraint equation group was built from the corresponding points based on the coplanarity constraint, and the initial position and posture parameters were refined iteratively. The algorithm resists outliers by applying the RANdom Sample Consensus (RANSAC) strategy and dynamically removing outliers during iterative refinement. Simulation experiments illustrate that the resolving efficiency and accuracy of the proposed algorithm outperform those of the traditional algorithm under various levels of injected random error, and an experiment with real data demonstrates that the algorithm can be applied effectively to relative position and posture estimation in 3D reconstruction.
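The direct-resolving step (essential matrix with RANSAC, then decomposition into the unique rotation and translation) can be sketched with OpenCV as below; this shows the standard pipeline the paper's refinement then polishes, with the function name and thresholds as illustrative assumptions.

```python
import cv2
import numpy as np

def initial_relative_orientation(pts1, pts2, K):
    """Estimate E with RANSAC and decompose it into initial R, t.

    pts1, pts2: (n, 2) float arrays of corresponding image points.
    K: (3, 3) camera intrinsic matrix.
    """
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # recoverPose resolves the four-fold decomposition ambiguity by keeping
    # the solution that places the points in front of both cameras
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, mask
```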
Aiming at the object classification problem in heavily crowded and complex visual surveillance scenes, a real-time object classification approach based on discriminable features and continuous tracking was proposed. Firstly, rapid feature matching including color, shape and position was utilized to build the initial target correspondence over the whole scene, in which the motion direction and velocity of the moving target were used to predict the preferable search area in the next frame to accelerate target matching. Then the appearance model was utilized to rematch occluded objects for which no correspondence had been established. To enhance the classification precision, the final object classification results were determined by the maximum probability of continuous object feature extraction and classification according to the tracking results. Experimental results show that the proposed method achieves better classification precision than the method without continuous tracking, with an average correct rate of 97%. The new scheme effectively improves the performance of object classification in complex scenes.
A femtocell is a small, low-powered base station that can increase system capacity and provide better indoor coverage for two-tier Long Term Evolution (LTE) networks. However, the interference problem between femtocells and the Microcell eNodeB (MeNB) must be solved first. Concerning this interference, an effective Inter-Cell Interference Coordination (ICIC) scheme using Soft Frequency Reuse (SFR) was proposed for the LTE femtocell system. With the macrocell pre-allocating frequency bands through SFR, femtocell user equipments chose sub-bands that were not used in the local macrocell sub-area to avoid co-channel interference. At the same time, a femtocell located in the center of a macrocell did not select the sub-bands occupied by the boundary region of the same sector. Simulation results show that the proposed scheme improves the overall network throughput by 14% compared to the situation without ICIC, and the average throughput of cell-edge users increases by at least 34%.