Real-world networks are often composed of multiple types of entities and interaction relationships, and their topological structure and attributes evolve continuously over time. The heterogeneity and dynamics inherent in such networks can be fully described by a Dynamic Heterogeneous Graph (DHG). To solve the problems of coarse spatio-temporal information fusion and the heavy reliance of the supervised learning paradigm on manual labels in existing DHG representation learning models, a Masked AutoEncoder (MAE) enhanced DHG representation learning model was proposed. Firstly, heterogeneous spatial information was fused through a multi-level attention structure, and temporal information was fused across snapshots. Then, the representation information of nodes was enriched by leveraging the reconstruction loss of the masked autoencoder. Experimental results show that, compared with baseline models on multiple real-world datasets, the proposed model improves the Area Under the receiver operating Characteristic curve (AUC) on link prediction tasks by at least 1.26 to 3.99 percentage points. It can be seen that the proposed model provides an effective self-supervised framework for DHG representation learning, facilitating more precise capture of heterogeneous information and dynamic evolution laws in real networks.
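As a rough illustration of how a masked reconstruction loss can act as a self-supervised signal, the sketch below scores a reconstruction only on the nodes whose features were hidden. The abstract does not specify the loss form, so the mean squared error, the function name `masked_reconstruction_loss`, and the toy feature values are all assumptions made for illustration, not the proposed model.

```python
def masked_reconstruction_loss(original, reconstructed, mask):
    """Mean squared error computed over masked nodes only.

    original, reconstructed: lists of per-node feature vectors (lists of floats).
    mask: list of booleans, True where the node's features were hidden
          from the encoder and must be reconstructed.
    MSE is an assumed choice; the source does not state the loss form.
    """
    total, count = 0.0, 0
    for x, x_hat, is_masked in zip(original, reconstructed, mask):
        if is_masked:
            total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
            count += len(x)
    # With no masked nodes there is nothing to reconstruct.
    return total / count if count else 0.0


# Toy example (hypothetical values): only the second node is masked,
# so the large error on the third node does not affect the loss.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
decoded = [[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]]
loss = masked_reconstruction_loss(features, decoded, [False, True, False])
```

Because unmasked nodes are excluded, the loss rewards the encoder only for inferring hidden features from context, which is what removes the need for manual labels.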
With the advancement of deep learning technologies, artificial intelligence has been driven to evolve from single-modality intelligence toward multimodal intelligence. Vision-Language Models (VLMs), which serve as the pivotal means of bridging vision and language, have been established as a core research area. Focusing on the technological evolution of VLMs, the architecture development of VLMs was reviewed systematically, and the core technologies and latest research progress in this field were summarized. Firstly, the progression of VLMs from early explorations to the current flourishing state was traced, key technological nodes and development trends were analyzed, and a technology roadmap with “architecture development” as the core theme was delineated. Secondly, the current foundational techniques of VLMs were analyzed in depth, including core architectures built around vision encoders, language encoders, and cross‐modal fusion mechanisms, as well as key pretraining optimization objectives such as Masked Language Modeling (MLM), Masked Image Modeling (MIM), and Contrastive Learning (CL). Concurrently, the mainstream datasets on which VLM pretraining relies, such as COCO and LAION-5B, were listed systematically. Finally, representative VLMs were compared and analyzed to uncover the relationships among model performance, data scale, architectural innovations, and training strategies, and the advantages and limitations of the related core technologies were discussed, thereby providing a comprehensive VLM technology map for researchers in related fields and offering reference and inspiration for future research.
When traditional models are used for the early diagnosis of retinopathy in high-risk patients with Diabetic Nephropathy (DN), diagnostic accuracy is often compromised because retinal images of diabetic patients are limited and category-imbalanced. To address this issue, an auxiliary diagnostic method for retinopathy based on a dual-branch structure with knowledge distillation was proposed to improve the recognition capability for minority categories. Firstly, a teacher network pre-trained on large medical datasets was employed to guide the student network's learning process, transferring acquired knowledge to improve the student network's generalization ability and mitigate data scarcity. Secondly, a dual-branch structure was proposed in the student network. Branch 1 utilized a rebalancing strategy with the Focal Loss function to emphasize challenging samples by adjusting loss function weights, while Branch 2 employed a Category Attention Module (CAM) to learn discriminative features for each category, preventing model bias towards majority categories. These two branches respectively promoted classifier learning and feature learning to alleviate category imbalance. Evaluated on clinically collected retinal image data, experimental results demonstrate that the proposed method achieves improvements of 1.05 and 1.53 percentage points in accuracy and specificity, respectively, compared with the Lesion-aware Attention Model (LAM) in screening tasks involving 66 cases (89 eyes) of high-risk patients with DN. The proposed method improves the recognition accuracy for retinopathy in DN patients and realizes the auxiliary diagnosis of retinal diseases.
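To make concrete how Focal Loss emphasizes challenging samples, the following minimal sketch implements the standard per-sample form, -alpha_t (1 - p_t)^gamma log(p_t). The values gamma = 2.0 and the toy probabilities are assumptions for illustration; the abstract does not report the hyperparameters or the exact rebalancing weights used in Branch 1.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample.

    probs: predicted class probabilities (assumed to sum to 1).
    target: index of the true class.
    gamma: focusing parameter; gamma = 0 reduces to plain cross-entropy.
    alpha: optional list of per-class weights (e.g. larger for minority classes).
    """
    # Clamp to avoid log(0) for extremely confident wrong predictions.
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions for a sample whose true class is 0.
easy = focal_loss([0.9, 0.1], 0)                # well-classified sample
hard = focal_loss([0.3, 0.7], 0)                # misclassified sample
ce_hard = focal_loss([0.3, 0.7], 0, gamma=0.0)  # plain cross-entropy for comparison
```

The factor (1 - p_t)^gamma shrinks the loss of the well-classified sample far more than that of the misclassified one, so gradient updates are dominated by hard (often minority-category) samples.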
A multi-scale dilated-convolution-based Unmanned Aerial Vehicle (UAV) image target detection algorithm, Swin-Det, was proposed to address the issues of complex target scenes, diverse target scales, dense small targets, and severe target occlusion in UAV aerial images. Firstly, Swin Transformer was used as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone network to solve the problem of fuzzy target information caused by occlusion between objects. Secondly, a Fusion of Dilation Feature Pyramid Network (FDFPN) was proposed to fuse feature information through multi-branch dilated convolution, thereby effectively enlarging the receptive field of the network and improving the reuse of feature information, so that the model was able to learn detailed features of different dimensions. Finally, the issues of mismatches in the prediction area and sample imbalance were addressed by using a linear interpolation method and a multi-task loss function, thereby improving the detection precision of the model. Experimental results on the VisDrone dataset show that the Swin-Det algorithm reaches a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and converges faster under the same training batch. It can be seen that the Swin-Det algorithm can achieve high-precision detection of UAV image targets in complex scenes.
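The effect of multi-branch dilated convolution on the receptive field can be sketched in one dimension. The code below is a generic illustration, not the FDFPN itself: the helper names, the dilation rates (1, 2, 3), the odd kernel size, and the fusion by summation are all assumptions. A kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions without adding weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with gaps of `dilation` between taps."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def multi_branch_fusion(signal, kernel, dilations):
    """Run one branch per dilation rate, center-crop to a common length, and sum.

    Assumes an odd kernel size so that all branches stay aligned on the
    same center positions after cropping.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    length = min(len(b) for b in branches)
    cropped = []
    for b in branches:
        offset = (len(b) - length) // 2
        cropped.append(b[offset:offset + length])
    return [sum(values) for values in zip(*cropped)]


# Hypothetical input: a ramp signal and an all-ones kernel of size 3.
signal = [float(i) for i in range(10)]
fused = multi_branch_fusion(signal, [1.0, 1.0, 1.0], dilations=(1, 2, 3))
```

Each output position in `fused` mixes context from windows of width 3, 5, and 7, which is the sense in which parallel dilation rates let one layer see several scales at once.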
Community detection helps to comprehend the complex structure of social networks, but most existing community detection methods do not consider the imbalance in the sizes of the communities to be detected, so the discovered community structures are relatively homogeneous and their accuracy is low. Therefore, a non-overlapping community detection method based on Local Expansion of Initial Community Structure (LEICS) was proposed. LEICS was divided into three stages: in the first stage, initial community structures with different scales were detected by utilizing the hierarchical structure information and local structure information of the network; in the second stage, each initial community was expanded by calculating the connection intensity between a node and the nodes in the community together with the modularity contribution of the node, and the Label Propagation Algorithm (LPA) was then used to deal with the remaining nodes; in the third stage, for unstable communities smaller than the average community size, the nodes were redistributed to further optimize the results of community detection. Experimental results on twelve datasets of real-world networks and Lancichinetti-Fortunato-Radicchi (LFR) simulated networks show that, compared to the suboptimal Local Balanced Label Diffusion (LBLD) algorithm, LEICS improves the Normalized Mutual Information (NMI) by at least 5 percentage points on the Polbooks and YouTube networks. The accuracy and robustness of LEICS on both small-size and large-size networks are validated, indicating that LEICS can adapt to the imbalance of community size.
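For readers unfamiliar with the evaluation metric, the sketch below computes NMI between a detected partition and a ground-truth partition. The abstract does not say which normalization is used; the arithmetic-mean form 2I(A;B) / (H(A) + H(B)), common in community detection, is assumed here, and the label lists are toy values.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions given as per-node community labels.

    Uses the normalization 2 * I(A;B) / (H(A) + H(B)); this choice is an
    assumption, since other normalizations (min, max, sqrt) also exist.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    if entropy_a + entropy_b == 0.0:
        # Both partitions put every node in a single community.
        return 1.0
    return 2.0 * mutual_info / (entropy_a + entropy_b)


# Same partition up to label renaming, and two independent partitions.
same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to how communities are numbered, which is why it is suited to comparing a detected partition against ground truth.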
To solve the problem of limited computing resources and storage space of edge nodes in the Edge Computing (EC) network, an Edge Computing and Service Offloading (ECSO) algorithm based on improved Deep Reinforcement Learning (DRL) was proposed to reduce node processing latency and improve service performance. Specifically, the problem of edge node service offloading was formulated as a resource-constrained Markov Decision Process (MDP). Because the state transition probability of requests at the edge node is difficult to predict accurately, a DRL algorithm was used to solve the problem. Considering that the state-action space of an edge node for caching services is too large, new action behaviors were defined to replace the original actions, and the optimal action set was obtained according to the proposed action selection algorithm. In this way, the process of calculating the action behavior reward was improved, the size of the action space was reduced greatly, and the training efficiency and reward of the algorithm were improved. Simulation results show that, compared with the original Deep Q-Network (DQN) algorithm, the Proximal Policy Optimization (PPO) algorithm, and the traditional Most Popular (MP) algorithm, the total reward value of the proposed ECSO algorithm is increased by 7.0%, 12.7% and 65.6%, respectively, and the latency of edge node service offloading is reduced by 13.0%, 18.8% and 66.4%, respectively. These results verify the effectiveness of the proposed ECSO algorithm and show that it can effectively improve the offloading performance of edge computing services.
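The reason a learning-based method suits an MDP with unknown transition probabilities can be seen in the tabular Q-learning update, which DQN approximates with a neural network. The sketch below is not the ECSO algorithm: the state names, action names, reward, learning rate, and discount factor are all hypothetical. It shows only that the update uses an observed transition rather than a transition model.

```python
def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Only the sampled transition (state, action, reward, next_state) is
    needed; no transition probabilities are required.
    """
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += lr * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state caching example: caching a service costs
# latency now (negative reward) but leads to a more valuable state.
q_table = {
    "service_not_cached": {"cache": 0.0, "skip": 0.0},
    "service_cached": {"cache": 1.0, "skip": 0.0},
}
new_value = q_update(
    q_table, "service_not_cached", "cache", reward=-0.5, next_state="service_cached"
)
```

The `max` over next-state actions is evaluated over the whole action set, so its cost grows with the number of actions; this is the step that a smaller action space makes cheaper.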
RAMCloud stores data in a log-structured segment layout. When a large number of small files are stored in RAMCloud, each small file occupies a whole segment, which may lead to heavy fragmentation inside the segments and low memory utilization. To solve this small-file problem, a strategy based on file classification was proposed to optimize the storage of small files. First, small files were classified into three categories: structurally related, logically related, and independent files. Before uploading, a merging algorithm and a grouping algorithm were used to deal with these files respectively. Experiments demonstrate that, compared with non-optimized RAMCloud, the proposed strategy can improve memory utilization.
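The utilization gain from merging can be illustrated with a small calculation. The abstract does not describe the merging or grouping algorithms, so first-fit packing is used below purely as a stand-in, and the file sizes and segment size are hypothetical units rather than RAMCloud's actual configuration.

```python
def segments_without_merging(file_sizes):
    """Baseline described in the abstract: each small file occupies a whole segment."""
    return len(file_sizes)


def segments_with_first_fit(file_sizes, segment_size):
    """Pack files into segments with first-fit (an assumed stand-in for merging).

    Assumes every file is no larger than one segment.
    """
    used = []  # bytes already used in each open segment
    for size in file_sizes:
        for i, occupied in enumerate(used):
            if occupied + size <= segment_size:
                used[i] += size
                break
        else:
            # No existing segment has room, so open a new one.
            used.append(size)
    return len(used)


def memory_utilization(file_sizes, segment_count, segment_size):
    """Fraction of allocated segment space that holds file data."""
    return sum(file_sizes) / (segment_count * segment_size)


# Hypothetical workload: five small files and a segment size of 8 units.
sizes = [2, 3, 1, 4, 2]
segment_size = 8
baseline = memory_utilization(sizes, segments_without_merging(sizes), segment_size)
merged = memory_utilization(
    sizes, segments_with_first_fit(sizes, segment_size), segment_size
)
```

In this toy workload the baseline allocates five segments for 12 units of data, while packing needs only two, so utilization rises from 0.3 to 0.75.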