To address the lack of publicly available data for building effective dialogue models for human-machine psychological counseling, a psychological counseling dialogue dataset was constructed for dialogue generation and mental disorder detection. Firstly, a multi-turn dialogue dataset containing 3 268 doctor-patient conversations was collected from an online medical consultation platform and enriched with comprehensive metadata, including hospital affiliations, medical departments, disease categories, and patient self-descriptions. Secondly, a knowledge-enhanced dialogue model named Empathy Bidirectional and Auto-Regressive Transformers (EmBART) was proposed to enhance the empathic capability of the dialogue model. Finally, the usability of the dataset was evaluated experimentally on psychological response generation and mental disorder detection tasks. In psychological response generation, EmBART trained on this dataset performed excellently on all metrics in both automatic and human evaluations, with perplexity reduced by 2.31 compared to the baseline model CDial-GPT (Chinese Dialogue Generative Pre-trained Transformer). In mental disorder detection, CPT (Chinese Pre-trained unbalanced Transformer) and RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers approach) trained on this dataset demonstrated outstanding mental disorder prediction capabilities. Experimental results confirm the strong utility of this dataset in generating empathic dialogues and detecting mental disorders, providing a data foundation for future research on human-machine psychological counseling dialogues.
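The perplexity metric reported above is, by the usual convention, the exponential of the mean token-level negative log-likelihood of the language model. A minimal sketch of this computation (the token probabilities below are illustrative values, not outputs of EmBART):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over target tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Illustrative: probabilities a model assigned to each gold token of a response.
probs = [0.5, 0.25, 0.125, 0.5]
ppl = perplexity(probs)

# Sanity check: uniform probability 1/V over a vocabulary of size V
# yields perplexity exactly V.
assert abs(perplexity([1 / 8] * 10) - 8.0) < 1e-9
```

Lower perplexity means the model spreads less probability mass away from the reference tokens, which is why a 2.31 reduction indicates better response modeling.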
Multi-modal 3D object detection is an important task in computer vision, and how to better fuse information from different modalities has always been a research focus of this task. Previous methods lack information filtering when fusing the information of different modalities, and excessive irrelevant and interfering information may degrade model performance. To address the above issues, a LiDAR-camera 3D object detection model based on multi-modal mutual guidance and supplementation was proposed, which adaptively selected information from the other modality during fusion. The adaptive information fusion includes data-level and feature-level mutual guidance and supplementation. In data-level fusion, depth maps generated from point clouds and segmentation masks generated from images were used as input to construct instance-level depth maps and instance-level 3D virtual points, which supplement the images and point clouds, respectively. In feature-level fusion, voxel features generated from point clouds and feature maps generated from images were used as input; for the features to be fused, key regions were selected from the other modality, and feature fusion was conducted through an attention mechanism. Experimental results show that the proposed model achieves good results on the nuScenes test set. Compared with traditional unguided fusion models such as BEVFusion and TransFusion, the proposed model improves the two mainstream evaluation metrics, mean Average Precision (mAP) and nuScenes Detection Score (NDS), by 0.9-28.9 percentage points and 0.6-26.1 percentage points, respectively. These results verify that the proposed model can effectively improve the accuracy of multi-modal 3D object detection.
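The feature-level fusion described above can be sketched as cross-attention in which LiDAR features act as queries over image region features, so each voxel feature attends only to the image regions most relevant to it. This is a minimal NumPy sketch under assumed shapes, not the model's actual layer (the function name and residual fusion are illustrative choices):

```python
import numpy as np

def cross_attention_fuse(lidar_feats, image_feats):
    """Fuse LiDAR queries with image keys/values via scaled dot-product
    cross-attention, so each LiDAR feature weights image regions adaptively.

    lidar_feats: (N, d) voxel/BEV features acting as queries.
    image_feats: (M, d) image region features acting as keys and values.
    """
    d = lidar_feats.shape[1]
    scores = lidar_feats @ image_feats.T / np.sqrt(d)    # (N, M) relevance
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over regions
    attended = weights @ image_feats                     # (N, d) selected info
    return lidar_feats + attended                        # residual fusion

rng = np.random.default_rng(0)
fused = cross_attention_fuse(rng.normal(size=(6, 16)), rng.normal(size=(10, 16)))
assert fused.shape == (6, 16)
```

The softmax weighting is what performs the "information filtering": image regions with low relevance to a given LiDAR query receive near-zero weight and contribute little interference to the fused feature.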
In the node feature extraction stage of deep graph matching models, node feature representations were learned by Graph Convolutional Network (GCN). However, GCN is limited in its ability to learn node feature representations, which affects the distinguishability of node features, causes poor measurement of node similarity, and leads to a loss of matching accuracy. To solve this problem, a deep graph matching model based on self-attention network was proposed. In the node feature extraction stage, a new self-attention network was used to learn node features. The principle of this network is to improve the feature description of nodes by using a spatial encoder to learn the spatial structures of nodes and using the self-attention mechanism to learn the relations among all nodes. In addition, in order to reduce the accuracy loss caused by relaxing the graph matching problem, the graph matching problem was modelled as an integer linear programming problem. At the same time, structural matching constraints were added to the graph matching problem on the basis of node matching, and an efficient combinatorial optimization solver was introduced to calculate a locally optimal solution of the graph matching problem. Experimental results show that, compared with Permutation loss and Cross-graph Affinity based Graph Matching (PCA-GM), the proposed model improves the average matching precision on 20 classes of images of PASCAL VOC dataset by 14.8 percentage points, improves the average matching precision on 5 classes of images of Willow Object dataset by 7.3 percentage points, and achieves the best results on object matching tasks such as bicycles and plants.
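The integer linear programming formulation mentioned above searches over permutation matrices, i.e. hard one-to-one node assignments, rather than over their continuous relaxation. The toy sketch below enumerates that feasible set exactly for a tiny instance; it stands in for the paper's efficient combinatorial solver, and the affinity values and `edge_affinity` hook are illustrative:

```python
from itertools import permutations

def match_graphs(node_affinity, edge_affinity=None):
    """Exact solution of a small graph matching instance by enumerating
    permutations (the feasible set of the integer linear program).

    node_affinity[i][j]: similarity of node i in graph 1 to node j in graph 2.
    edge_affinity: optional callable(perm) adding a structural-constraint score.
    """
    n = len(node_affinity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # Node-matching term of the objective.
        score = sum(node_affinity[i][perm[i]] for i in range(n))
        # Optional structural term, analogous to edge-consistency constraints.
        if edge_affinity is not None:
            score += edge_affinity(perm)
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

affinity = [[0.9, 0.1, 0.0],
            [0.2, 0.8, 0.1],
            [0.0, 0.3, 0.7]]
perm, score = match_graphs(affinity)
assert perm == (0, 1, 2)  # the diagonal-dominant affinity favors identity
```

Enumeration is factorial in the node count, which is exactly why a dedicated combinatorial solver is needed at realistic problem sizes; the sketch only illustrates the objective being optimized.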
Accurate traffic flow prediction is very important in helping traffic management departments take effective traffic control and guidance measures and helping travelers plan routes reasonably. To address the problem that traditional deep learning models do not fully consider the spatial-temporal characteristics of traffic data, a CNN-LSTM prediction model based on attention mechanism, namely STCAL (Spatial-Temporal Convolutional Attention-LSTM network), was established under the theoretical frameworks of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) unit, combined with the spatial-temporal characteristics of urban traffic flow. Firstly, a fine-grained grid division method was used to construct the spatial-temporal matrix of traffic flow. Secondly, a CNN model was used as the spatial component to extract the spatial characteristics of urban traffic flow in different periods. Finally, an LSTM model based on attention mechanism was used as the dynamic temporal component to capture the temporal characteristics and trend variability of traffic flow and realize traffic flow prediction. Experimental results show that, compared with Gated Recurrent Unit (GRU) and Spatio-Temporal Residual Network (ST-ResNet), the STCAL model reduces the Root Mean Square Error (RMSE) by 17.15% and 7.37%, reduces the Mean Absolute Error (MAE) by 22.75% and 9.14%, and increases the coefficient of determination (R2) by 11.27% and 2.37%, respectively. It is also found that the proposed model predicts highly regular weekday traffic better than weekend traffic, with the best prediction effect for the weekday morning peak, showing that it can provide a basis for short-term monitoring of urban regional traffic flow changes.
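The first step above, building a spatial-temporal matrix by fine-grained grid division, can be sketched as counting traffic observations into a (time, row, column) tensor that the CNN then consumes as image-like frames. The grid dimensions and record format below are assumptions for illustration; the paper's actual granularity is not specified here:

```python
import numpy as np

def build_spatiotemporal_matrix(records, n_rows, n_cols, n_steps):
    """Aggregate traffic observations into a (time, row, col) flow tensor
    after a fine-grained grid division of the urban region.

    records: iterable of (t, r, c) tuples, where t is the time-step index
             and (r, c) is the grid cell containing the observation.
    """
    flow = np.zeros((n_steps, n_rows, n_cols), dtype=np.int32)
    for t, r, c in records:
        flow[t, r, c] += 1  # one vehicle observation in cell (r, c) at step t
    return flow

# Illustrative records: two observations in cell (1, 2) at step 0, one at step 1.
records = [(0, 1, 2), (0, 1, 2), (1, 0, 0)]
flow = build_spatiotemporal_matrix(records, n_rows=3, n_cols=4, n_steps=2)
assert flow[0, 1, 2] == 2 and flow[1, 0, 0] == 1
```

Each time slice `flow[t]` is a 2D grid of flow counts, which is what lets a CNN extract spatial patterns per period before the attention-based LSTM models their evolution over time.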