Emotion recognition is a technology that enables computers to recognize and understand human emotions. It plays an important role in many fields and is a key development direction in artificial intelligence. Therefore, the research status of bimodal emotion recognition based on speech and text was summarized. Firstly, the representation spaces of emotion were classified and elaborated. Secondly, emotion databases were classified according to their emotion representation spaces, and common multi-modal emotion databases were summarized. Thirdly, the methods of bimodal emotion recognition based on speech and text were introduced, including feature extraction, modal fusion, and decision classification. In particular, the modal fusion methods were highlighted and divided into four categories: feature-level fusion, decision-level fusion, model-level fusion, and multi-level fusion. In addition, the results of a series of bimodal emotion recognition methods based on speech and text were compared and analyzed. Finally, the application scenarios, challenges, and future development directions of emotion recognition were introduced. This review aims to analyze and summarize work on multi-modal emotion recognition, especially bimodal emotion recognition based on speech and text, and to provide valuable references for emotion recognition research.
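To make the difference between two of the fusion categories concrete, the following is a minimal sketch of feature-level fusion (concatenating speech and text features before one classifier) versus decision-level fusion (combining per-modality scores). The feature dimensions, the number of emotion classes, and the random stand-in classifier are illustrative assumptions, not the configuration of any surveyed method.

```python
import numpy as np

# Toy speech/text features for one utterance (dimensions are assumptions)
speech_feat = np.random.rand(128)   # e.g. pooled acoustic features
text_feat = np.random.rand(300)     # e.g. pooled word embeddings

def classify(features, num_classes=4):
    """Stand-in classifier: a fixed random linear map followed by softmax."""
    rng = np.random.default_rng(0)
    logits = rng.standard_normal((num_classes, features.size)) @ features
    return np.exp(logits) / np.exp(logits).sum()

# Feature-level fusion: concatenate modality features, then classify once
feature_level_pred = classify(np.concatenate([speech_feat, text_feat]))

# Decision-level fusion: classify each modality separately, then average the scores
decision_level_pred = 0.5 * classify(speech_feat) + 0.5 * classify(text_feat)
```

Model-level and multi-level fusion sit between these two extremes, combining intermediate representations inside the model rather than raw features or final decisions.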
With the rapid development of Large Language Models (LLMs), dialogue assistants based on LLMs have emerged as a new learning aid for students. These assistants generate answers through interactive Q&A, helping students solve problems and improve learning efficiency. However, existing conversational assistants ignore students’ personalized needs and fail to provide the personalized answers required for “tailored instruction”. To address this, a personalized conversational assistant framework based on student capability perception was proposed, which consists of two main modules: a capability perception module that analyzes students’ exercise records to estimate their knowledge proficiency, and a personalized answer generation module that creates personalized answers based on the students’ capabilities. Three implementation paradigms, namely instruction-based, data-driven, and agent-based, were designed to explore the framework’s practical effects. In the instruction-based assistant, the inference capability of the LLM was used to infer students’ knowledge proficiency from their exercise records and help generate personalized answers; in the data-driven assistant, a small Deep Knowledge Tracing (DKT) model was employed to estimate students’ knowledge proficiency; in the agent-based assistant, tools such as student capability perception, personalization detection, and answer correction were integrated through an LLM agent to assist answer generation. Comparison experiments using Chat General Language Model (ChatGLM) and GPT4o_mini demonstrate that LLMs under all three paradigms can provide personalized answers for students, with the agent-based paradigm achieving higher accuracy, indicating its superior student capability perception and personalized answer generation.
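The data-driven paradigm relies on a DKT model to turn exercise records into per-concept proficiency estimates. Below is a minimal PyTorch sketch of a standard DKT network of that kind; the hidden size, the one-hot interaction encoding, and the toy input are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    """Minimal Deep Knowledge Tracing sketch: an LSTM over one-hot
    (concept, correctness) interactions that predicts, at each step,
    the probability of answering each knowledge concept correctly."""
    def __init__(self, num_concepts, hidden_size=64):
        super().__init__()
        # input: concept id crossed with correctness -> 2 * num_concepts one-hot dims
        self.lstm = nn.LSTM(2 * num_concepts, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_concepts)

    def forward(self, interactions):
        # interactions: (batch, seq_len, 2 * num_concepts) one-hot tensor
        h, _ = self.lstm(interactions)
        return torch.sigmoid(self.out(h))   # per-step mastery estimate per concept

# Toy usage: 10 concepts, one student, 5 recorded exercises
model = DKT(num_concepts=10)
x = torch.zeros(1, 5, 20)
x[0, 0, 3] = 1.0          # assumed encoding: first half = answered incorrectly
mastery = model(x)        # (1, 5, 10) proficiency estimates over time
```

The resulting proficiency vector is what the answer generation module would condition on when producing a personalized explanation.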
Network traffic anomaly detection is a network security defense method that analyzes network traffic to identify potential attacks. To address the low detection accuracy and high false positive rate caused by imbalanced, high-dimensional network traffic data and diverse attack categories, a new approach was proposed in which a One-Dimensional Convolutional Neural Network (1D-CNN) and a Bidirectional Gated Recurrent Unit (BiGRU) were combined to construct a traffic anomaly detection model. For class-imbalanced data, balancing was performed by using an improved Synthetic Minority Oversampling TEchnique (SMOTE), namely Borderline-SMOTE, together with an undersampling clustering technique based on a Gaussian Mixture Model (GMM). Subsequently, the 1D-CNN was utilized to extract local features from the data, and the BiGRU was used to better capture the time-series features in the data. Finally, the proposed model was evaluated on the UNSW-NB15 dataset, achieving an accuracy of 98.12% and a false positive rate of 1.28%. The experimental results demonstrate that the proposed model outperforms other classical machine learning and deep learning models: it improves the recognition rate of minority attack classes and achieves higher overall detection accuracy.
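A minimal PyTorch sketch of the 1D-CNN followed by a BiGRU is given below to show how the two stages connect; the layer widths, kernel size, and the 42-feature input are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CNNBiGRU(nn.Module):
    """Sketch of a 1D-CNN + BiGRU traffic classifier (sizes are assumptions)."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1),  # local feature extraction
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.bigru = nn.GRU(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, x):
        # x: (batch, num_features) flow records treated as 1-D signals
        x = x.unsqueeze(1)              # (batch, 1, num_features)
        x = self.cnn(x)                 # (batch, 32, num_features // 2)
        x = x.transpose(1, 2)           # (batch, seq, channels) for the GRU
        out, _ = self.bigru(x)          # bidirectional temporal features
        return self.fc(out[:, -1, :])   # classify from the last time step

# Toy usage; 42 input features is an assumption about preprocessed UNSW-NB15 records
model = CNNBiGRU(num_features=42, num_classes=10)
logits = model(torch.randn(8, 42))
```

The class-balancing step would be applied to the training set beforehand, for example with the BorderlineSMOTE oversampler from the imbalanced-learn package plus a GMM-based undersampler for the majority classes.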
To address the slow speed of accurate search caused by mixed data storage and the difficulty of security governance caused by data management without classification and grading, a data classification and grading access control model based on a master-slave multi-chain architecture was built to achieve classified and graded protection of data and dynamic secure access. Firstly, a hybrid on-chain and off-chain trusted storage model was constructed to relieve the storage bottleneck faced by blockchain. Secondly, a master-slave multi-chain architecture was proposed and smart contracts were designed to automatically store data of different privacy levels in the corresponding slave chains. Finally, based on Role-Based Access Control, a Multi-Chain and Level Policy-Role Based Access Control (MCLP-RBAC) mechanism was constructed and its specific access control process was designed. Under the graded access control policy, the throughput of the proposed model stabilizes at around 360 TPS (Transactions Per Second). Compared with the BC-BLPM scheme, it offers some superiority in throughput, with the ratio of sending rate to throughput reaching 1:1. Compared with having no access policy, the memory consumption is reduced by about 35.29%; compared with the traditional single-chain structure, the average memory consumption is reduced by 52.03%; and compared with the scheme that stores all data on the chain, the average storage space is reduced by 36.32%. The experimental results show that the proposed model can effectively reduce the storage burden, achieve graded secure access, and is suitable for managing multiple classes of data with high scalability.
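The core of a level-policy RBAC decision can be illustrated with a short Python sketch: a role is mapped to the highest data level it may read, and each level is mapped to the slave chain that stores it. The role names, level numbers, and chain names below are hypothetical placeholders, not the paper's actual policy tables or contract logic.

```python
from typing import Optional

# Hypothetical policy tables (names and levels are illustrative only)
ROLE_MAX_LEVEL = {"auditor": 3, "analyst": 2, "guest": 1}
LEVEL_TO_CHAIN = {1: "public-slave-chain", 2: "internal-slave-chain", 3: "confidential-slave-chain"}

def authorize(role: str, data_level: int) -> Optional[str]:
    """Return the slave chain to query if the role may access this level, else None."""
    if ROLE_MAX_LEVEL.get(role, 0) >= data_level:
        return LEVEL_TO_CHAIN[data_level]
    return None

print(authorize("analyst", 2))  # internal-slave-chain
print(authorize("guest", 3))    # None -> access denied
```

In the proposed model this check would be enforced by smart contracts on the master chain, with the actual payloads kept off-chain or on the level-specific slave chains.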
Unmanned Aerial Vehicle (UAV) targets are small, the distinguishing characteristics among multiple UAVs are not obvious, and interference from birds and flying insects poses a huge challenge to accurate detection and stable tracking of UAV targets. Aiming at the poor detection performance and unstable tracking of small UAV targets by traditional target detection algorithms, a real-time multi-UAV tracking algorithm based on improved PaddlePaddle-YOLO (PP-YOLO) and Simple Online and Realtime Tracking with a Deep association metric (Deep-SORT) was proposed. Firstly, the squeeze-and-excitation module was integrated into the PP-YOLO detection algorithm to achieve feature extraction and detection of UAV targets. Secondly, the Mish activation function was introduced into the ResNet50-vd structure to alleviate the vanishing gradient problem during back propagation and further improve detection precision. Thirdly, the Deep-SORT algorithm was used to track UAV targets in real time, and the backbone network that extracts appearance features was replaced with ResNet50, improving the original network’s weak ability to perceive the appearance of small targets. Finally, the Margin Loss function was introduced, which not only improved class separability but also strengthened intra-class compactness and inter-class difference. Experimental results show that the detection mean Average Precision (mAP) of the proposed algorithm is 2.27 percentage points higher than that of the original PP-YOLO algorithm, and the tracking accuracy is 4.5 percentage points higher than that of the original Deep-SORT algorithm. The proposed algorithm achieves a tracking accuracy of 91.6%, can track multiple UAV targets within 600 m in real time, and effectively solves the problem of "frame loss" during tracking.
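The squeeze-and-excitation module mentioned above is a standard channel-attention block; a minimal PyTorch sketch is shown below. The reduction ratio and where the block is inserted in the PP-YOLO backbone are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation block of the kind folded into a detection backbone
    (reduction ratio is an illustrative assumption)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # excitation: per-channel weights
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight feature channels

# Toy usage on a feature map of 64 channels
feat = torch.randn(2, 64, 32, 32)
out = SEBlock(64)(feat)
```

Channel reweighting of this kind helps the detector emphasize the few channels that respond to small UAV targets against cluttered sky backgrounds.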
Nitrogen oxide (NOx) is one of the main pollutants in the regenerated flue gas of a Fluid Catalytic Cracking (FCC) unit, and accurate prediction of NOx emission can effectively help refinery enterprises avoid pollution events. Because of the non-stationarity, nonlinearity, and long-memory characteristics of pollutant emission data, a new hybrid model combining Ensemble Empirical Mode Decomposition (EEMD) and Long Short-Term Memory network (LSTM) was proposed to improve the prediction accuracy of pollutant emission concentration. The NOx emission concentration data were first decomposed into several Intrinsic Mode Functions (IMFs) and a residual by using the EEMD model. According to the correlation analysis between the IMF sub-sequences and the original data, the IMF sub-sequences with low correlation were eliminated, which effectively reduced the noise in the original data. The retained IMFs were then divided into high-frequency and low-frequency sequences, which were trained in LSTM networks of different depths. The final NOx concentration prediction was reconstructed from the predicted results of the sub-sequences. Compared with LSTM alone in the NOx emission prediction of the FCC unit, EEMD-LSTM reduced the Mean Square Error (MSE) and Mean Absolute Error (MAE) by 46.7% and 45.9% respectively, and improved the coefficient of determination (R2) by 43%, which means the proposed model achieves higher prediction accuracy.
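The decomposition and denoising step can be sketched with the PyEMD package (installed as EMD-signal); the synthetic signal and the correlation threshold below are assumptions for illustration, not the paper's data or chosen threshold.

```python
import numpy as np
from PyEMD import EEMD   # pip install EMD-signal (assumed dependency)

# Toy stand-in for the NOx emission concentration series
t = np.linspace(0, 10, 1000)
signal = np.sin(2 * np.pi * t) + 0.3 * np.sin(20 * np.pi * t) + 0.1 * np.random.randn(t.size)

# Decompose into intrinsic mode functions (the last component acts as the residual)
imfs = EEMD().eemd(signal)

# Drop IMFs weakly correlated with the original series (0.1 is an assumed threshold)
kept = [imf for imf in imfs if abs(np.corrcoef(imf, signal)[0, 1]) > 0.1]

# Each kept IMF would then be grouped by frequency, fed to an LSTM of suitable
# depth, and the per-IMF predictions summed to reconstruct the NOx forecast.
print(f"kept {len(kept)} of {len(imfs)} IMFs")
```

Splitting the forecast across IMFs lets the high-frequency noise-like components and the slow trend be modeled by networks of different capacity, which is where the hybrid model gains its accuracy.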
According to the capacities of the multi-level caches, the population individuals and ant data in CPU main memory were assigned to the L3, L2, and L1 caches to reduce the data transfer overhead among the caches during parallel computing. Asynchronous, partial data transfers were performed between the CPU and GPU, and multiple streams were executed asynchronously by multiple GPU kernel functions. The number of threads per GPU block was set to a multiple of 16, and GPU shared memory was divided into banks in multiples of 32. GPU constant memory was used to store frequently read, read-only parameters such as the crossover probability and mutation probability, and read-only large data structures such as the string set and the overlap matrix were bound to GPU texture memory. On this basis, a computation-, cache-, and communication-efficient parallel algorithm in which the CPU and GPU cooperate to solve the shortest common superstring problem was designed and implemented. Experimental results on shortest common superstring instances of several sizes show that the proposed CPU-GPU parallel algorithm is more than 70 times faster than the sequential algorithm.
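Several of these GPU-side choices (constant memory for the read-only probabilities, block sizes in multiples of the warp width, asynchronous stream transfers) can be sketched in Python with Numba's CUDA support. The binary toy population and the simplified mutation kernel below are assumptions for illustration; they are not the paper's string-set encoding or its actual kernels, and running the sketch requires a CUDA device.

```python
import numpy as np
from numba import cuda

# Read-only GA parameters (crossover and mutation probabilities); keeping them
# in constant memory mirrors the use of GPU constant memory described above.
GA_PARAMS = np.array([0.8, 0.05], dtype=np.float32)

@cuda.jit
def mutate_population(population, rand, out):
    params = cuda.const.array_like(GA_PARAMS)   # cached, read-only copy on device
    i = cuda.grid(1)
    if i < population.shape[0]:
        # Flip the gene when the pre-generated random draw falls below the
        # mutation probability (a deliberately simplified mutation step).
        out[i] = 1 - population[i] if rand[i] < params[1] else population[i]

pop = np.random.randint(0, 2, size=4096).astype(np.int32)
rnd = np.random.rand(4096).astype(np.float32)
res = np.empty_like(pop)

stream = cuda.stream()                       # asynchronous copies and launches
d_pop = cuda.to_device(pop, stream=stream)
d_rnd = cuda.to_device(rnd, stream=stream)
d_res = cuda.to_device(res, stream=stream)

threads = 128                                # a multiple of 32 (and of 16), as above
blocks = (pop.size + threads - 1) // threads
mutate_population[blocks, threads, stream](d_pop, d_rnd, d_res)
d_res.copy_to_host(res, stream=stream)
stream.synchronize()
```

Overlapping such transfers and kernel launches across several streams is what allows the CPU-side cache-resident work and the GPU-side evaluation to proceed concurrently.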