Head pose estimation has been extensively studied in various fields. In the medical field, however, research on utilizing head pose estimation to monitor patient recovery in the Post-Anesthesia Care Unit (PACU) is limited. Existing approaches, such as Learning Fine-Grained Structure Aggregation (FSA-Net) for head pose estimation from a single image, suffer from poor convergence and overfitting. To address these issues, a method for classifying the amplitude of patient head movements based on head pose estimation was proposed to monitor the head movements of patients during anesthesia resuscitation, trained and evaluated on three publicly available datasets: 300W-LP, AFLW2000 and BIWI. Firstly, the Rectified Linear Unit (ReLU) activation function in one of the streams of FSA-Net was replaced with a Leaky Rectified Linear Unit (LeakyReLU) to improve the convergence of the model, and the Adam with Weight decay optimizer (AdamW) was employed instead of Adaptive Moment Estimation (Adam) to mitigate overfitting. Secondly, the magnitude of head movements during anesthesia resuscitation was classified into three categories: small, medium, and large movements. Finally, the collected data was visualized using Hypertext Preprocessor (PHP), Enterprise Charts (ECharts), and PostgreSQL to provide real-time monitoring graphs of patient head movements. The experimental results show that the mean absolute error of the improved FSA-Net is reduced by 0.334° and 0.243° compared with that of the original FSA-Net on the AFLW2000 and BIWI datasets, respectively. Thus, the improved model demonstrates practical effectiveness in anesthesia resuscitation monitoring and serves as a valuable tool for healthcare professionals making decisions about patient anesthesia resuscitation.
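The activation swap described above can be sketched without any framework; the 0.01 negative slope below is an assumed default rather than a value given in the abstract:

```python
def relu(x):
    # Standard ReLU: zero output (and zero gradient) for negative
    # inputs, which can stall convergence when activations go negative.
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    # LeakyReLU keeps a small non-zero slope for negative inputs,
    # so some gradient still flows through "dead" regions.
    return x if x > 0 else negative_slope * x
```

For a negative input such as -2.0, `relu` returns 0.0 and blocks the gradient, while `leaky_relu` returns -0.02 and keeps a small signal, which is the property the abstract relies on to improve convergence.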
In the task of handwriting identification, most of a handwriting image is background and the handwriting information is sparse, so key information is difficult to capture; an individual's signature style varies slightly while imitated handwriting is highly similar to the genuine one; and there are few public Chinese handwriting datasets. By improving the attention mechanism and the Siamese network model, a handwriting identification method based on Multi-scale and Mixed Domain Attention mechanism (MMDANet) was proposed. Firstly, a maximum pooling layer was connected in parallel to the efficient channel attention module, and the number of channels of the two-dimensional strip pooling module was extended to three dimensions. The improved efficient channel attention module and strip pooling module were fused to generate a Mixed Domain Module (MDM), thereby addressing the problems that most of a handwriting image is background, handwriting information is sparse, and detailed features are difficult to extract. Secondly, the Path Aggregation Network (PANet) feature pyramid was used to extract features at multiple scales to capture the subtle differences between genuine and forged handwriting, and the contrastive loss of the Siamese network and the Additive Margin Softmax (AM-Softmax) loss were fused with weights for training to increase inter-class discrimination, addressing the problems of personal handwriting style variation and high similarity between genuine and forged handwriting. Finally, a Chinese Handwriting Dataset (CHD) with a total sample size of 8 000 was self-built. The accuracy of the proposed method on the Chinese dataset CHD reached 84.25%; compared with the suboptimal method Two-stage Siamese Network (Two-stage SiamNet), the proposed method increased the accuracy by 4.53%, 1.02% and 1.67% on the three foreign-language datasets CEDAR, Bengali and Hindi, respectively.
The experimental results show that MMDANet can more accurately capture the subtle differences between genuine and forged handwriting and complete complex handwriting identification tasks.
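A minimal sketch of the weighted loss fusion described above; the margin, scale, and fusion weight are all assumed for illustration, not taken from the paper:

```python
import math

def contrastive_loss(dist, same, margin=1.0):
    # Siamese contrastive loss: pull genuine pairs together,
    # push forged pairs at least `margin` apart.
    if same:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def am_softmax_loss(cos_scores, target, margin=0.35, scale=30.0):
    # AM-Softmax: subtract an additive margin from the target-class
    # cosine score before a scaled softmax, widening class gaps.
    logits = [scale * (c - margin if i == target else c)
              for i, c in enumerate(cos_scores)]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def fused_loss(dist, same, cos_scores, target, alpha=0.5):
    # Weighted fusion of the two losses; alpha is an assumed weight.
    return alpha * contrastive_loss(dist, same) + \
           (1 - alpha) * am_softmax_loss(cos_scores, target)
```

The contrastive term constrains pairwise distances while the AM-Softmax term enlarges the margin between classes, matching the two goals the abstract names.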
Considering the low positioning accuracy and strong scene dependence of optimization strategies for the Distance Vector Hop (DV-Hop) localization model, an improved DV-Hop model, Function correction Distance Vector Hop (FuncDV-Hop), based on function analysis and coefficients determined by simulation, was presented. First, the average hop distance, distance estimation, and least squares error in the DV-Hop model were analyzed, and the following concepts were introduced: undetermined-coefficient optimization, a step-function segmentation experiment, a weight function approach using equivalent points, and modified maximum likelihood estimation. Then, to design controlled trials, multi-scenario comparison experiments over the number of nodes, the proportion of beacon nodes, the communication radius, the number of beacon nodes, and the number of unknown nodes were designed using the control variable technique. Finally, the experiment was split into two phases: determining coefficients by simulation, and integrated optimization testing. Compared with the original DV-Hop model, the positioning accuracy of the final improved strategy is improved by 23.70%-75.76%, with an average optimization rate of 57.23%. The experimental results show that the optimization rate of the FuncDV-Hop model reaches up to 50.73%, and compared with DV-Hop models improved by genetic algorithms and neurodynamics, its positioning accuracy is increased by 0.55%-18.77%. The proposed model introduces no additional parameters, does not increase the protocol overhead of Wireless Sensor Networks (WSNs), and effectively improves positioning accuracy.
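The average-hop-distance step of classic DV-Hop, which the FuncDV-Hop corrections build on, can be sketched as follows (the coordinates and hop counts are illustrative):

```python
import math

def average_hop_distance(beacon, other_beacons, hops_to):
    # A beacon's average hop distance in classic DV-Hop:
    # total straight-line distance to all other beacons divided by
    # the total hop count to them.
    total_dist = sum(math.dist(beacon, b) for b in other_beacons)
    total_hops = sum(hops_to[b] for b in other_beacons)
    return total_dist / total_hops

# Illustrative layout: beacons at (0,0), (3,0) and (0,4).
b0 = (0.0, 0.0)
others = [(3.0, 0.0), (0.0, 4.0)]
hops = {(3.0, 0.0): 3, (0.0, 4.0): 4}
avg = average_hop_distance(b0, others, hops)  # (3 + 4) / (3 + 4) = 1.0
est = avg * 2  # an unknown node 2 hops from b0 -> estimated distance 2.0
```

It is this hop-count-based distance estimate, and the least squares step that follows it, that the paper corrects with simulation-fitted functions.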
To effectively extract the temporal information between consecutive video frames, a prediction network, IndRNN-VAE (Independently Recurrent Neural Network-Variational AutoEncoder), which fuses the Independently Recurrent Neural Network (IndRNN) and the Variational AutoEncoder (VAE) network, was proposed. Firstly, the spatial information of video frames was extracted through the VAE network, and the latent features of the video frames were obtained by a linear transformation. Secondly, the latent features were used as the input of the IndRNN to obtain the temporal information of the video frame sequence. Finally, the obtained latent features and temporal information were fused through a residual block and input to the decoding network to generate the predicted frame. Testing on the UCSD Ped1, UCSD Ped2 and Avenue public datasets shows that, compared with existing anomaly detection methods, the IndRNN-VAE based method achieves significantly improved performance: its Area Under Curve (AUC) values reach 84.3%, 96.2%, and 86.6% respectively, its Equal Error Rate (EER) values reach 22.7%, 8.8%, and 19.0% respectively, and its differences in mean anomaly score reach 0.263, 0.497, and 0.293 respectively. Besides, the running speed of this method reaches 28 FPS (Frames Per Second).
The domestic DCU (Deep Computing Unit) adopts the Single Instruction Multiple Thread (SIMT) parallel execution model. When programs are executed, inconsistent control flow generated in a kernel function causes the threads in a warp to execute serially, which is known as warp divergence. Aiming at the problem that kernel function performance is severely restricted by warp divergence, a compilation optimization method for reducing warp divergence time, Partial-Control-Flow-Merging (PCFM), was proposed. Firstly, divergence analysis was performed to find fusible divergent regions that are isomorphic and contain a large number of identical and similar instructions. Then, the fusion profit of the fusible divergent regions was evaluated by counting the percentage of instruction cycles saved by merging. Finally, the alignment sequence was searched, and the profitable fusible divergent regions were merged. Test cases from the Graphics Processing Unit (GPU) benchmark suite Rodinia and a classic sorting algorithm were selected to test PCFM on DCU. Experimental results show that PCFM achieves an average speedup of 1.146 on the test cases, and its speedup is 5.72% higher than that of the branch fusion + tail merging method. It can be seen that the proposed method is more effective at reducing warp divergence.
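The profit evaluation step can be illustrated with a simplified cycle-count model; the formula below is a sketch of the general idea (fraction of serialized cycles saved by merging), not the exact metric used by PCFM:

```python
def fusion_profit(then_cycles, else_cycles, merged_cycles):
    # Under SIMT, a divergent branch serializes both paths, costing
    # roughly then_cycles + else_cycles; after merging the shared
    # instructions, merged_cycles remain. Profit is the fraction of
    # cycles saved, so merging is worthwhile only when profit > 0.
    diverged = then_cycles + else_cycles
    return (diverged - merged_cycles) / diverged
```

For example, two 100- and 60-cycle paths that merge into 120 cycles save a quarter of the serialized cost, while a merge that leaves the total unchanged yields zero profit and would be rejected.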
Since the introduction of MonoDepth2, unsupervised monocular ranging has made great progress in the visible-light field. However, visible light is not applicable in some scenes, such as at night and in low-visibility environments. Infrared thermal imaging can obtain clear target images at night and under low-visibility conditions, so it is necessary to estimate depth from infrared images. However, because visible and infrared images have different characteristics, directly transferring existing monocular depth estimation algorithms to infrared images is unreasonable. An infrared monocular ranging algorithm based on multi-scale feature fusion, obtained by improving the MonoDepth2 algorithm, can solve this problem. A new loss function, the edge loss function, was designed for the low-texture characteristic of infrared images to reduce pixel mismatch during image reprojection. Previous unsupervised monocular ranging methods simply upsample the four-scale depth maps to the original image resolution to compute projection errors, ignoring the correlation between scales and the differing contributions of different scales. A weighted Bi-directional Feature Pyramid Network (BiFPN) was applied to feature fusion of the multi-scale depth maps, so that blurring of depth map edges was reduced. In addition, the Residual Network (ResNet) structure was replaced by Cross Stage Partial Network (CSPNet) to reduce network complexity and increase running speed. The experimental results show that the edge loss is more suitable for infrared image ranging and yields better depth map quality; after adding the BiFPN structure, the edges of the depth maps are clearer; and after replacing ResNet with CSPNet, the inference speed is improved by about 20%.
The proposed algorithm can accurately estimate the depth of infrared images, solving the problem of depth estimation in low-light night scenes and some low-visibility scenes, and its application can also reduce the cost of assisted driving to a certain extent.
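One plausible form of an edge loss for low-texture infrared frames is a penalty on the difference between gradient maps; this sketch illustrates the general idea, not the paper's exact definition:

```python
def edge_map(img):
    # Forward-difference gradient magnitude (L1) as a simple edge map
    # over a 2-D list of intensities.
    h, w = len(img), len(img[0])
    return [[abs(img[y][x + 1] - img[y][x]) + abs(img[y + 1][x] - img[y][x])
             for x in range(w - 1)] for y in range(h - 1)]

def edge_loss(pred, target):
    # Mean absolute difference between the edge maps of the predicted
    # and target images: on low-texture infrared frames this emphasizes
    # the few strong structures that photometric terms can miss.
    ep, et = edge_map(pred), edge_map(target)
    n = len(ep) * len(ep[0])
    return sum(abs(ep[y][x] - et[y][x])
               for y in range(len(ep)) for x in range(len(ep[0]))) / n
```

A perfectly reconstructed image has zero edge loss; an image that hallucinates or blurs an edge is penalized in proportion to the gradient error.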
In Named Entity Recognition (NER) for elementary mathematics, aiming at the problems that the word embeddings of traditional NER methods cannot represent the polysemy of words and that some local features are ignored during feature extraction, a Bidirectional Encoder Representations from Transformers (BERT) based NER method for elementary mathematical text, named BERT-BiLSTM-IDCNN-CRF (BERT-Bidirectional Long Short-Term Memory-Iterated Dilated Convolutional Neural Network-Conditional Random Field), was proposed. Firstly, BERT was used for pre-training. Then, the word vectors obtained by training were input into BiLSTM and IDCNN to extract features, after which the output features of the two neural networks were merged. Finally, the output was obtained through correction by the CRF. Experimental results show that the F1 score of BERT-BiLSTM-IDCNN-CRF is 93.91% on a dataset of elementary mathematics test questions, which is 4.29 percentage points higher than that of the BiLSTM-CRF benchmark model and 1.23 percentage points higher than that of the BERT-BiLSTM-CRF model. The F1 scores of the proposed method on entities such as line, angle, plane, and sequence are all higher than 91%, which verifies its effectiveness for elementary mathematical entity recognition. In addition, after adding an attention mechanism to the proposed model, the recall decreases by 0.67 percentage points but the precision increases by 0.75 percentage points, indicating that introducing an attention mechanism has little effect on the recognition performance of the proposed method.
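The dilated convolutions in the IDCNN branch capture local features over a widened receptive field by spacing kernel taps apart; a minimal 1-D sketch:

```python
def dilated_conv1d(seq, kernel, dilation):
    # 1-D dilated convolution: kernel taps are `dilation` steps apart,
    # widening the receptive field without adding parameters, which is
    # the idea behind the iterated dilated CNN (IDCNN) branch.
    k = len(kernel)
    span = (k - 1) * dilation
    return [sum(kernel[j] * seq[i + j * dilation] for j in range(k))
            for i in range(len(seq) - span)]
```

With a two-tap kernel and dilation 2, each output sums positions `i` and `i + 2`, so stacking a few such layers covers a long context with few layers and few weights.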
To address the data sparsity and low recommendation precision of the traditional Collaborative Filtering (CF) recommendation algorithm, a new CF recommendation method integrating social tags and users' background information was proposed. Firstly, the similarities of different social tags and of different users' background information were calculated respectively. Secondly, the similarities of different users' ratings were calculated. Finally, these three similarities were integrated to generate the integrated similarity between users, which was used to recommend items for target users. The experimental results show that, compared with the traditional CF recommendation algorithm, the Mean Absolute Error (MAE) of the proposed algorithm is reduced by 16% and 22.6% on the normal dataset and the cold-start dataset, respectively. The new method can not only improve the accuracy of the recommendation algorithm, but also alleviate the problems of data sparsity and cold start.
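The three-way similarity fusion can be sketched as a weighted sum; the weights below are assumed for illustration, not values from the paper:

```python
def integrated_similarity(sim_tag, sim_background, sim_rating,
                          weights=(0.3, 0.3, 0.4)):
    # Weighted fusion of the three user-user similarities (social tags,
    # background information, ratings). The weights are assumed and
    # should sum to 1; tag and background terms stay informative even
    # when the rating matrix is sparse or the user is cold-start.
    w1, w2, w3 = weights
    return w1 * sim_tag + w2 * sim_background + w3 * sim_rating
```

Because the tag and background similarities do not depend on co-rated items, the fused score remains usable for cold-start users whose rating similarity is zero or undefined.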
Concerning that traditional image retrieval methods are confronted with massive image data processing problems, a new solution for large-scale image retrieval, named MR-BoVW, was proposed. It is based on the traditional Bag of Visual Words (BoVW) approach and the MapReduce model, taking advantage of the massive storage capacity and powerful parallel computing ability of Hadoop. To handle image data well, an improved method for Hadoop image processing was first introduced; then, the MapReduce workflow was divided into three stages: feature vector generation, feature clustering, and image representation with inverted index construction. The experimental results demonstrate that the MR-BoVW solution performs well in terms of speedup, scaleup, and sizeup: the efficiency results are all greater than 0.62, and the scaleup and sizeup curves are gentle. Thus it is suitable for large-scale image retrieval.
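The inverted index stage maps naturally onto MapReduce; a toy single-machine sketch of the map and reduce steps (identifiers are illustrative):

```python
from collections import defaultdict

def map_phase(image_id, visual_words):
    # Map: emit one (visual_word, image_id) pair for each quantized
    # feature of the image.
    return [(w, image_id) for w in visual_words]

def reduce_phase(pairs):
    # Reduce: group image ids by visual word, yielding the inverted
    # index used at query time.
    index = defaultdict(set)
    for word, image_id in pairs:
        index[word].add(image_id)
    return dict(index)

pairs = map_phase("img1", [1, 2]) + map_phase("img2", [2, 3])
index = reduce_phase(pairs)  # {1: {"img1"}, 2: {"img1", "img2"}, 3: {"img2"}}
```

In the Hadoop version each mapper processes a shard of images and the framework shuffles pairs by visual word, so the same two functions scale out without change.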
The available Visual Background extractor (ViBe) builds its background model using only the spatial information of pixels and ignores temporal information, which decreases detection accuracy. In addition, the detection radius and the random sampling factor for updating the background model are fixed parameters, so the detection effect is not ideal under dynamic background interference and camera shake. To solve these problems, an adaptive moving target detection method based on a spatial-temporal background model was proposed. Firstly, temporal information was added to ViBe to set up the spatial-temporal background model. Then, the complexity of the background was reflected by the standard deviation of the samples in the background model, so the standard deviation could be used to adapt the detection radius and the random sampling factor for updating the background model to changes in the background. The experimental results indicate that the proposed method can not only effectively detect the foreground under a static background and uniform lighting, but also has a certain inhibitory effect in cases of large lighting changes, camera shake, and dynamic background interference, thereby improving detection precision.
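The adaptive idea, scaling the detection radius and sampling factor with the sample standard deviation, can be sketched as follows; all scaling constants are assumed for illustration:

```python
import statistics

def adaptive_parameters(samples, base_radius=20.0, base_factor=16,
                        k=0.5, std_ref=10.0):
    # Scale the ViBe detection radius with the standard deviation of
    # the background samples: a complex (high-variance) background gets
    # a larger matching radius and a smaller sampling factor (faster
    # model updates). base_radius=20 and factor=16 follow common ViBe
    # defaults; k and std_ref are assumed scaling constants.
    std = statistics.pstdev(samples)
    scale = 1.0 + k * std / std_ref
    radius = base_radius * scale
    factor = max(2, round(base_factor / scale))
    return radius, factor
```

A perfectly static pixel keeps the default parameters, while a flickering dynamic-background pixel gets a wider radius (fewer false foreground hits) and more frequent updates.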
Drone guide line recognition is an automatic path-finding method based on ground guide lines. To address the slow recognition speed and low recognition accuracy of ground guide line recognition, a multi-task lightweight model with data fusion called Mobile-FuUnet, based on the U-shaped Network (U-Net), was proposed. Firstly, on the U-Net structure, MobileNet-V3 was introduced for feature extraction and Depthwise Separable Convolution (DSC) was introduced to reduce the number of model parameters, so as to establish a multi-task lightweight model. Then, based on an attention mechanism for data fusion, the polynomial feature matrix of the preceding image was introduced to handle the computational problem caused by large missing areas at the edge of the guide line, in order to improve the accuracy of the model. Multiple comparisons were carried out on the Tusimple dataset and a drone guide line dataset. Experimental results show that on the drone guide line dataset, the Mobile-FuUnet model achieves guide line recognition at a frame rate of 109 frame/s, with a Mean Intersection over Union (MIoU) of 98.71%, an F1 score of 99.64%, and a curve-model R2 score of 95.03%. Compared with models such as U-Net, ENet, and DeepLab v3, the proposed model improves both running speed and accuracy.
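The parameter savings from Depthwise Separable Convolution can be verified with a quick count (the channel sizes below are illustrative):

```python
def standard_conv_params(c_in, c_out, k):
    # Parameter count of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # DSC = depthwise k x k convolution (one filter per input channel)
    # followed by a 1 x 1 pointwise convolution across channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 73728 parameters
dsc = depthwise_separable_params(64, 128, 3)  # 576 + 8192 = 8768
```

For this 64-to-128-channel 3x3 layer, DSC uses roughly an eighth of the parameters, which is why swapping it in makes the U-Net backbone lightweight enough for the reported frame rates.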