To address the lack of traditional image segmentation algorithms for guiding Convolutional Neural Networks (CNNs) in the current field of medical image segmentation, a medical image segmentation network with Content-Guided Multi-Angle Feature Fusion (CGMAFF-Net) was proposed. Firstly, grayscale images and Otsu threshold segmentation images were passed through a Transformer-based micro U-shaped feature extraction module to generate lesion region guidance maps, which were then weighted onto the original medical images by Adaptive Combination Weighting (ACW) for initial guidance. Then, a Residual Network (ResNet) was employed to extract downsampled features from the weighted medical images, and a Multi-Angle Feature Fusion (MAFF) module was used to fuse the feature maps at the 1/16 and 1/8 scales. Finally, Reverse Attention (RA) was applied to upsample and gradually restore the feature map size, so as to predict the key lesion regions. Experimental results on the CVC-ClinicDB, Kvasir-SEG, and ISIC 2018 datasets demonstrate that, compared with MSRAformer, the best-performing existing multiscale spatial reverse attention segmentation network, CGMAFF-Net increases the mean Intersection over Union (mIoU) by 0.97, 0.78, and 0.11 percentage points, respectively; compared with the classic U-Net, CGMAFF-Net improves the mIoU by 2.66, 8.94, and 1.69 percentage points, respectively, fully verifying the effectiveness and advancement of CGMAFF-Net.
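As a concrete illustration of the initial guidance step, the following PyTorch sketch shows one plausible form of the ACW weighting: two single-channel guidance maps (e.g., one derived from the grayscale image and one from the Otsu thresholded image, both produced by the micro U-shaped module) are combined with learnable, softmax-normalized weights and applied multiplicatively to the original image. The module name, the softmax normalization, and the residual form of the weighting are illustrative assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn

class AdaptiveCombinationWeighting(nn.Module):
    """Hypothetical sketch of the ACW step: two single-channel lesion
    guidance maps are combined with learnable weights and applied to the
    original image. The softmax normalization and the residual (1 + guide)
    form are assumptions, not the paper's exact formulation."""

    def __init__(self):
        super().__init__()
        # One learnable scalar weight per guidance map.
        self.w = nn.Parameter(torch.ones(2))

    def forward(self, image, gray_guide, otsu_guide):
        # image: (B, 3, H, W); guides: (B, 1, H, W), values in [0, 1].
        a = torch.softmax(self.w, dim=0)               # normalized combination weights
        guide = a[0] * gray_guide + a[1] * otsu_guide  # weighted guidance map
        # Emphasize likely lesion regions while keeping the original content.
        return image * (1.0 + guide)
```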
To address the insufficiency of structural guidance in the frameworks of Zero-Shot Action Recognition (ZSAR) algorithms, an Action Recognition Algorithm based on Attention mechanism and Energy function (ARAAE) was proposed, with the Energy-Based Model (EBM) guiding the framework design. Firstly, to obtain the input for the EBM, a combination of optical flow and the Convolutional 3D (C3D) architecture was designed to extract visual features, achieving spatial non-redundancy. Secondly, a Vision Transformer (ViT) was utilized for visual feature extraction to reduce temporal redundancy, and the ViT cooperating with the optical flow and C3D combination further reduced spatial redundancy, resulting in a non-redundant visual space. Finally, to measure the correlation between the visual space and the semantic space, an energy score evaluation mechanism was realized, with a joint loss function designed for optimization. Experimental results on the HMDB51 and UCF101 datasets against six classical ZSAR algorithms and algorithms from recent literature show that, on HMDB51 with average grouping, the average recognition accuracy of ARAAE is (22.1±1.8)%, better than those of CAGE (Coupling Adversarial Graph Embedding), Bi-dir GAN (Bi-directional Generative Adversarial Network), and ETSAN (Energy-based Temporal Summarized Attentive Network); on UCF101 with average grouping, the average recognition accuracy of ARAAE is (22.4±1.6)%, slightly better than those of all the comparison algorithms; and with the 81/20 split of UCF101, the average recognition accuracy of ARAAE is (40.2±2.6)%, higher than those of the comparison algorithms. It can be seen that ARAAE improves recognition performance in ZSAR effectively.
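The energy score evaluation can be sketched as follows: a bilinear energy measures the compatibility between a visual embedding and a class's semantic embedding, and a margin-based term, standing in for the joint loss, drives the energy of the correct pairing below that of an incorrect one. The bilinear form, the margin formulation, and all names here are illustrative assumptions; the paper's actual joint loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyScore(nn.Module):
    """Hypothetical bilinear energy E(v, s) = -v^T W s between a visual
    embedding v and a semantic (class) embedding s; lower energy means
    stronger visual-semantic compatibility."""

    def __init__(self, vis_dim, sem_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(vis_dim, sem_dim) * 0.01)

    def forward(self, v, s):
        # v: (B, vis_dim), s: (B, sem_dim) -> per-sample energy, shape (B,).
        return -torch.einsum('bd,de,be->b', v, self.W, s)

def joint_loss(energy, v, s_pos, s_neg, margin=1.0):
    """Margin-based stand-in for the joint loss: push the energy of the
    correct class below that of a wrong class by at least `margin`."""
    return F.relu(margin + energy(v, s_pos) - energy(v, s_neg)).mean()
```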
To solve the problems of image detail loss and unclear texture caused by interference factors such as noise, imaging technology, and imaging principles in medical Magnetic Resonance Imaging (MRI), a multi-receptive-field generative adversarial network for medical MRI image super-resolution reconstruction was proposed. Firstly, a multi-receptive-field feature extraction block was used to obtain the global feature information of the image under different receptive fields. To avoid losing detailed texture with receptive fields that are too small or too large, each set of features was divided into two groups: one fed back global feature information under receptive fields of different scales, and the other enriched the local detailed texture information of the next set of features. Secondly, the multi-receptive-field feature extraction blocks were used to construct feature fusion groups, and a spatial attention module was added to each feature fusion group to fully capture the spatial feature information of the image, reducing the loss of shallow and local features in the network and making the image details more realistic. Thirdly, the gradient map of the low-resolution image was converted into a gradient map of the high-resolution image to assist super-resolution reconstruction. Finally, the restored gradient map was integrated into the super-resolution branch to provide structural prior information for reconstruction, which helped to generate high-quality super-resolution images. Experimental results show that, compared with the Structure-Preserving Super-Resolution with gradient guidance (SPSR) algorithm, the proposed algorithm improves the Peak Signal-to-Noise Ratio (PSNR) by 4.8%, 2.7%, and 3.5% at the ×2, ×3, and ×4 scales, respectively, and the reconstructed medical MRI images have richer texture details and more realistic visual effects.
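A minimal PyTorch sketch of the multi-receptive-field feature extraction block follows, under the assumption that the different receptive fields are realized with parallel dilated 3×3 convolutions: each branch's output is split into two channel groups, one kept as the global information for that receptive field and the other fed forward to enrich the local texture of the next branch. The dilation rates, the even channel split, and the residual fusion are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiReceptiveFieldBlock(nn.Module):
    """Hypothetical sketch of the multi-receptive-field feature extraction
    block. Each dilated branch covers a different receptive field; its
    output is chunked into a global-feedback group and a local-texture
    group that feeds the next branch. `channels` must be even."""

    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.expands = nn.ModuleList(
            nn.Conv2d(channels // 2, channels, 1) for _ in dilations
        )
        self.fuse = nn.Conv2d(channels // 2 * len(dilations), channels, 1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        global_feats, carry = [], x
        for conv, expand in zip(self.branches, self.expands):
            feat = self.act(conv(carry))
            g, l = feat.chunk(2, dim=1)  # split each feature set into two groups
            global_feats.append(g)       # group 1: global feedback at this scale
            carry = x + expand(l)        # group 2: enriches the next branch
        return x + self.fuse(torch.cat(global_feats, dim=1))  # residual fusion (assumed)
```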
To meet the demand for high spatio-temporal resolution remote sensing images in power facility safety monitoring and emergency management, a deep convolutional network-based remote sensing image fusion enhancement model for power facilities was proposed. Firstly, a deep convolutional network was designed, comprising an encoder, a Residual Attention (RA) mechanism block, a substitution attention mechanism block, and a decoder. Secondly, the two-layer convolution and the residual block fused with the channel attention mechanism were improved to increase the network's attention to image details and key features and to enhance the feature extraction capability of the network. Thirdly, the multi-channel substitution attention block was improved to make the network pay more attention to image details, thereby improving the performance of high-resolution image fusion reconstruction. Finally, the composition of the model's loss function was improved: a composite loss function consisting of content loss and visual loss was adopted to improve the training effect of the model. Experimental results indicate that the proposed model achieves significantly better image fusion reconstruction performance than the other fusion models, with the detail textures of the predicted image closer to those of the real image. Compared with the Multi-stage Feature Compensation NET (MFCNET) model, the proposed model improves the Correlation Coefficient (CC) by 1.6% and the Structural Similarity Index Measure (SSIM) by 18.4%. It can be seen that the proposed model provides a basis for remote sensing image processing, especially for the high-resolution reconstruction of small-target remote sensing images.
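The composite loss can be sketched in PyTorch as below, assuming pixelwise L1 as the content term and an L1 distance in VGG-19 feature space as the visual term; the choice of VGG features, the truncation depth, and the weighting factor alpha are illustrative assumptions rather than the paper's exact formulation.

```python
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class CompositeLoss(nn.Module):
    """Sketch of a content + visual composite loss: pixelwise L1 as the
    content term plus a perceptual term in VGG-19 feature space as the
    visual term. The feature depth and weight `alpha` are assumptions."""

    def __init__(self, alpha=0.1):
        super().__init__()
        # Frozen VGG-19 up to an intermediate layer as the feature extractor.
        vgg = vgg19(weights=VGG19_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.alpha = alpha

    def forward(self, pred, target):
        # pred/target: (B, 3, H, W), assumed normalized for VGG input.
        content = self.l1(pred, target)                  # content loss
        visual = self.l1(self.vgg(pred), self.vgg(target))  # visual loss
        return content + self.alpha * visual
```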