Masked autoencoder enhanced dynamic heterogeneous graph representation learning model

Haoran YUAN, Huan LIU, Pengfei JIAO, Zhidong ZHAO, Xianfei ZHANG, Zunliang LIU

Journal of Computer Applications 2026, 46 (6): 1728-1737. DOI: 10.11772/j.issn.1001-9081.2025060754

Real-world networks are often composed of multiple types of entities and interaction relationships, with topological structures and attributes that evolve continuously over time. The heterogeneity and dynamics inherent in such networks can be fully described by a Dynamic Heterogeneous Graph (DHG). Existing DHG representation learning models fuse spatio-temporal information only coarsely, and their supervised learning paradigm relies heavily on manual labels. To address these problems, a Masked AutoEncoder (MAE) enhanced DHG representation learning model was proposed. Firstly, heterogeneous spatial information was fused through a multi-level attention structure, and temporal information was fused across snapshots. Then, the node representations were enriched by leveraging the reconstruction loss of the masked autoencoder. Experimental results on multiple real-world datasets show that, compared with the baseline models, the proposed model improves the Area Under the receiver operating Characteristic curve (AUC) on link prediction tasks by at least 1.26 to 3.99 percentage points. It can be seen that the proposed model provides an effective self-supervised framework for DHG representation learning, enabling more precise capture of heterogeneous information and dynamic evolution patterns in real networks.
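The masked-autoencoder objective described above can be illustrated on a single snapshot. The sketch below is not the authors' model: the zero-vector [MASK] token, the mean-of-neighbours encoder (a toy stand-in for the multi-level attention and cross-snapshot fusion), and the masked-only mean squared error are illustrative assumptions, and `mask_nodes`, `mean_neighbor_encoder` and `masked_reconstruction_loss` are hypothetical names.

```python
import random


def mask_nodes(features, mask_rate, seed=0):
    """Replace a random subset of node feature vectors with a zero [MASK] vector (assumed token)."""
    rng = random.Random(seed)
    nodes = sorted(features)
    k = max(1, int(len(nodes) * mask_rate))
    masked = set(rng.sample(nodes, k))
    dim = len(features[nodes[0]])
    corrupted = {v: ([0.0] * dim if v in masked else list(x)) for v, x in features.items()}
    return corrupted, masked


def mean_neighbor_encoder(corrupted, adjacency):
    """Toy encoder/decoder: each node is rebuilt as the mean of its neighbours' (corrupted) features."""
    out = {}
    for v, x in corrupted.items():
        nbrs = adjacency.get(v, [])
        if not nbrs:
            out[v] = list(x)  # isolated node: nothing to aggregate
            continue
        out[v] = [sum(corrupted[u][i] for u in nbrs) / len(nbrs) for i in range(len(x))]
    return out


def masked_reconstruction_loss(original, reconstructed, masked):
    """Mean squared error averaged over masked nodes only; unmasked nodes do not contribute."""
    total = 0.0
    for v in masked:
        total += sum((a - b) ** 2 for a, b in zip(original[v], reconstructed[v]))
    return total / len(masked)


# Usage on a tiny snapshot: a triangle whose nodes share identical features.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0]}
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
corrupted, masked = mask_nodes(features, mask_rate=0.34)
recon = mean_neighbor_encoder(corrupted, adjacency)
print(masked_reconstruction_loss(features, recon, masked))  # 0.0: the neighbours recover the masked node
```

Because the loss is computed only on masked nodes, the encoder is rewarded for inferring hidden information from structure rather than copying its input, which is what makes the objective usable without manual labels.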

Review of vision-language model architecture development

Ziquan LIU, Xuyang SHI, Ke LI, Liang LIU, Zhewei ZHU

Journal of Computer Applications 2026, 46 (6): 1703-1711. DOI: 10.11772/j.issn.1001-9081.2025060695

With the advancement of deep learning technologies, artificial intelligence has been evolving from single-modality intelligence toward multimodal intelligence. Vision-Language Models (VLMs), which serve as the pivotal means of bridging vision and language, have become a core research area. Focusing on the technological evolution of VLMs, the architecture development of VLMs was reviewed systematically, and the core technologies and latest research progress in this field were summarized. Firstly, the progression of VLMs from early explorations to the current flourishing state was traced, key technological milestones and development trends were analyzed, and a technology roadmap centered on “architecture development” was delineated. Secondly, the current foundational techniques of VLMs were analyzed in depth, including core architectures built around vision encoders, language encoders, and cross-modal fusion mechanisms, as well as key pretraining objectives such as Masked Language Modeling (MLM), Masked Image Modeling (MIM), and Contrastive Learning (CL). Meanwhile, the mainstream datasets on which VLM pretraining relies, such as COCO and LAION-5B, were listed systematically. Finally, representative VLMs were compared and analyzed to reveal the relationships among model performance, data scale, architectural innovations, and training strategies, and the advantages and limitations of the related core technologies were discussed, thereby providing a comprehensive VLM technology map for researchers in related fields and offering reference and inspiration for future research.
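Of the pretraining objectives listed above, contrastive learning is the most compact to illustrate. The sketch below computes a symmetric image-text InfoNCE loss in the style popularized by CLIP, assuming a batch of paired embeddings in which row k of each list describes the same sample; the temperature 0.07 and the function names are illustrative assumptions, and the review itself does not prescribe this exact formulation.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_style_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: image k should match caption k and no other caption in the batch."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[r][c] = cosine similarity of image r and text c, divided by the temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts] for img in imgs]
    # image -> text direction: the correct class for row k is column k
    loss_i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n
    # text -> image direction: the same cross-entropy over the transposed matrix
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(_log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


aligned = clip_style_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = clip_style_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned, swapped)  # correctly paired embeddings give a far lower loss
```

The other in-batch captions act as negatives, so the loss pulls matched image-text pairs together and pushes mismatched pairs apart in the shared embedding space.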

Auxiliary diagnostic method for retinopathy based on dual-branch structure with knowledge distillation

Sijie NIU, Yuliang LIU

Journal of Computer Applications 2025, 45 (5): 1410-1414. DOI: 10.11772/j.issn.1001-9081.2024060856

When traditional models are used for the early diagnosis of retinopathy in high-risk patients with Diabetic Nephropathy (DN), diagnostic accuracy is often compromised because the retinal images of diabetic patients are limited in number and imbalanced across categories. To address this issue, an auxiliary diagnostic method for retinopathy based on a dual-branch structure with knowledge distillation was proposed to improve the recognition of minority categories. Firstly, a teacher network pre-trained on large medical datasets was employed to guide the learning of the student network, transferring acquired knowledge to improve the student network's generalization ability and mitigate data scarcity. Secondly, a dual-branch structure was proposed in the student network. Branch 1 utilized a rebalancing strategy with the Focal Loss function to emphasize hard samples by adjusting loss weights, while Branch 2 employed a Category Attention Module (CAM) to learn discriminative features for each category, preventing the model from being biased towards majority categories. These two branches promoted classifier learning and feature learning respectively to alleviate category imbalance. Experimental results on clinically collected retinal image data demonstrate that, in screening tasks involving 66 cases (89 eyes) of high-risk patients with DN, the proposed method improves accuracy and specificity by 1.05 and 1.53 percentage points respectively compared with the Lesion-aware Attention Model (LAM). The proposed method improves recognition accuracy in DN patients and realizes auxiliary diagnosis of retinal diseases.
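The two loss ingredients named above, Focal Loss for rebalancing and a teacher-to-student distillation term, can be sketched on raw logits. This is a generic sketch, not the paper's training objective: the default values gamma = 2, alpha = 1 and temperature T = 4, and how the terms would be weighted across the two branches, are assumptions. The distillation term is the common temperature-softened KL divergence scaled by T squared.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); gamma = 0 reduces to cross-entropy."""
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


logits, label = [4.0, 0.5, 0.2], 0
print(focal_loss(logits, label, gamma=0.0))  # plain cross-entropy
print(focal_loss(logits, label, gamma=2.0))  # smaller: an easy, confident sample is down-weighted
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))  # positive when student and teacher disagree
```

The (1 - p_t)^gamma factor shrinks the contribution of well-classified majority-class samples, so gradients concentrate on hard and minority-class examples.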

Small target detection method in UAV images based on fusion of dilated convolution and Transformer

Lin WANG, Jingliang LIU, Wuwei WANG

Journal of Computer Applications 2024, 44 (11): 3595-3602. DOI: 10.11772/j.issn.1001-9081.2023111575

A multi-scale dilated convolution based Unmanned Aerial Vehicle (UAV) image target detection algorithm, Swin-Det, was proposed to address complex target scenes, diverse target scales, dense small targets, and severe target occlusion in UAV aerial images. Firstly, Swin Transformer was used as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone to alleviate the blurring of target information caused by occlusion between objects. Secondly, a Fusion of Dilation Feature Pyramid Network (FDFPN) was proposed to fuse feature information through multi-branch dilated convolution, effectively enlarging the receptive field of the network and improving the reuse of feature information, so that the model can learn detailed features of different dimensions. Finally, mismatches in the prediction area and sample imbalance were addressed by a linear interpolation method and a multi-task loss function, improving the detection precision of the model. Experimental results on the VisDrone dataset show that Swin-Det reaches a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and converges faster with the same training batch. It can be seen that Swin-Det achieves high-precision detection of UAV image targets in complex scenes.
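The receptive-field argument for multi-branch dilated convolution can be checked in one dimension. The sketch below is a simplification, not FDFPN: it works on a 1-D signal instead of 2-D feature maps, and the dilation rates (1, 2, 3) and fusion by summation are assumptions. A k-tap kernel with dilation d covers k + (k - 1)(d - 1) inputs while keeping only k weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """A k-tap kernel with dilation d spans k + (k - 1)(d - 1) input positions."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d_same(signal, kernel, dilation):
    """1-D dilated cross-correlation with zero padding so the output length equals the input length."""
    span = effective_kernel_size(len(kernel), dilation)
    if span % 2 == 0:
        raise ValueError("use an odd-length kernel so the output stays centred")
    pad = (span - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_branch_dilated(signal, kernel, dilations=(1, 2, 3)):
    """Run one kernel at several dilation rates in parallel and fuse the branches by summation."""
    branches = [dilated_conv1d_same(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(effective_kernel_size(3, 2))                            # 5
print(dilated_conv1d_same(impulse, [1, 1, 1], dilation=2))    # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(multi_branch_dilated(impulse, [1, 1, 1]))               # the impulse now reaches every position
```

With three taps per branch, the fused output already responds to inputs up to three positions away, which is the kind of receptive-field growth without extra parameters that the abstract relies on.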

Non-overlapping community detection with imbalanced community sizes

Shiliang LIU, Yi WANG, Yinglong MA

Journal of Computer Applications 2024, 44 (11): 3396-3402. DOI: 10.11772/j.issn.1001-9081.2023101536

Community detection helps in understanding the complex structure of social networks, but most existing community detection methods do not consider that the communities to be detected may have imbalanced sizes, so the discovered community structures are relatively homogeneous and their accuracy is low. Therefore, a non-overlapping community detection method based on Local Expansion of Initial Community Structure (LEICS) was proposed. LEICS consists of three stages: in the first stage, initial community structures of different scales were detected by utilizing the hierarchical structure information and local structure information of the network; in the second stage, the initial communities were expanded by calculating the connection intensity between each node and the nodes in a community together with the node's modularity contribution, and the Label Propagation Algorithm (LPA) was then used to handle the remaining nodes; in the third stage, the nodes of unstable communities smaller than the average community size were redistributed to further optimize the detection results. Experimental results on twelve datasets covering real-world networks and Lancichinetti-Fortunato-Radicchi (LFR) synthetic networks show that, compared with the second-best algorithm, Local Balanced Label Diffusion (LBLD), LEICS improves Normalized Mutual Information (NMI) by at least 5 percentage points on the Polbooks and YouTube networks. The accuracy and robustness of LEICS on both small-size and large-size networks are thus validated, showing that LEICS can adapt to imbalanced community sizes.
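Two quantities from the second stage, a node's modularity contribution and label propagation for the remaining nodes, can be sketched for an undirected, unweighted graph. This is not LEICS itself: the connection-intensity measure and the stage-one hierarchy are omitted, and `modularity_gain` uses the standard Louvain-style formula for moving an isolated node into a community. `assign_remaining` is a hypothetical helper that only labels nodes left unassigned, breaking ties by the smallest label.

```python
from collections import Counter, defaultdict


def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    label = {v: c for c, members in enumerate(communities) for v in members}
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in range(len(communities)))


def modularity_gain(edges, node, community):
    """Delta Q when an isolated `node` joins `community`: k_in / m - sigma_tot * k_i / (2 m^2)."""
    m = len(edges)
    degree = defaultdict(int)
    k_in = 0  # edges between `node` and members of `community`
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if (u == node and v in community) or (v == node and u in community):
            k_in += 1
    sigma_tot = sum(degree[w] for w in community)
    return k_in / m - sigma_tot * degree[node] / (2 * m * m)


def assign_remaining(adjacency, labels, max_iter=10):
    """Give each unlabelled node the most frequent label among its labelled neighbours."""
    labels = dict(labels)
    for _ in range(max_iter):
        pending = [v for v in sorted(adjacency) if v not in labels]
        if not pending:
            break
        for v in pending:
            counts = Counter(labels[u] for u in adjacency[v] if u in labels)
            if counts:
                best = max(counts.values())
                labels[v] = min(lab for lab, c in counts.items() if c == best)
    return labels


# Two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = defaultdict(list)
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))  # about 0.357 (= 5/14)
print(assign_remaining(adjacency, {0: "A", 1: "A", 4: "B", 5: "B"}))
```

The gain formula equals the difference in Q between the partition with the node merged and the partition with it isolated, so it can score candidate nodes during local expansion without recomputing Q from scratch.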

Edge computing and service offloading algorithm based on improved deep reinforcement learning

Tengfei CAO, Yanliang LIU, Xiaoying WANG

Journal of Computer Applications 2023, 43 (5): 1543-1550. DOI: 10.11772/j.issn.1001-9081.2022050724

To address the limited computing resources and storage space of edge nodes in Edge Computing (EC) networks, an Edge Computing and Service Offloading (ECSO) algorithm based on improved Deep Reinforcement Learning (DRL) was proposed to reduce node processing latency and improve service performance. Specifically, the edge node service offloading problem was formulated as a resource-constrained Markov Decision Process (MDP). Because the state transition probabilities of edge node requests are difficult to predict accurately, a DRL algorithm was used to solve the problem. Since the state-action space of edge nodes for caching services is very large, new action behaviors were defined to replace the original actions, and the optimal action set was obtained with the proposed action selection algorithm. This improved the calculation of action behavior rewards, greatly reduced the size of the action space, and improved the training efficiency and reward of the algorithm. Simulation results show that, compared with the original Deep Q-Network (DQN) algorithm, the Proximal Policy Optimization (PPO) algorithm and the traditional Most Popular (MP) algorithm, the proposed ECSO algorithm increases the total reward value by 7.0%, 12.7% and 65.6% respectively, and reduces the latency of edge node service offloading by 13.0%, 18.8% and 66.4% respectively, which verifies that ECSO can effectively improve the offloading performance of edge computing services.
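The value update underlying DQN-style methods can be shown with a tabular stand-in. In this sketch a dictionary replaces the Q-network (DQN would also compute the target with a separate target network), the action names are hypothetical, and restricting choices to `candidate_actions` only mimics the idea of acting within a reduced action set; it is not the paper's action selection algorithm.

```python
import random


def select_action(q, state, candidate_actions, epsilon, rng):
    """Epsilon-greedy choice restricted to a (hypothetically pre-pruned) candidate action set."""
    if rng.random() < epsilon:
        return rng.choice(candidate_actions)
    return max(candidate_actions, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    """One temporal-difference step toward the Bellman target r + gamma * max_a' Q(s', a')."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Hypothetical caching decisions at an edge node.
q = {("s1", "cache"): 2.0, ("s1", "offload"): 1.0}
rng = random.Random(0)
print(select_action(q, "s1", ["cache", "offload"], epsilon=0.0, rng=rng))  # greedy pick: "cache"
print(q_update(q, "s0", "cache", reward=1.0, next_state="s1",
               next_actions=["cache", "offload"], alpha=0.5, gamma=0.9))
```

Shrinking the candidate set reduces both the maximization in the Bellman target and the exploration burden, which is why action-space reduction tends to speed up training.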

Optimal storing strategy based on small files in RAMCloud

YING Changtian, YU Jiong, LU Liang, LIU Jiankuang

Journal of Computer Applications 2014, 34 (11): 3104-3108. DOI: 10.11772/j.issn.1001-9081.2014.11.3104

RAMCloud stores data using a log-segment structure. When a large number of small files are stored in RAMCloud, each small file occupies an entire segment, which can cause heavy fragmentation inside segments and low memory utilization. To solve this small-file problem, a strategy based on file classification was proposed to optimize the storage of small files. Firstly, small files were classified into three categories: structurally related, logically related, and independent files. Before uploading, the files were processed with a merging algorithm and a grouping algorithm according to their category. Experimental results demonstrate that, compared with non-optimized RAMCloud, the proposed strategy improves memory utilization.
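The utilization problem described above can be quantified with a simple packing model. This sketch takes the abstract's premise (one small file per segment when unoptimized) and uses first-fit packing as a stand-in for the paper's merging algorithm, whose details are not given here; `segment_size` and the 1024 KB figure in the usage are arbitrary illustrative values, not RAMCloud configuration.

```python
def pack_first_fit(file_sizes, segment_size):
    """Merge small files into segments with first-fit: each file goes into the first segment with room."""
    segments = []  # each entry lists the file sizes stored in one segment
    for size in file_sizes:
        if size > segment_size:
            raise ValueError("file larger than a segment")
        for seg in segments:
            if sum(seg) + size <= segment_size:
                seg.append(size)
                break
        else:
            segments.append([size])  # no existing segment has room: open a new one
    return segments


def utilization(segments, segment_size):
    """Fraction of allocated segment memory actually occupied by file data."""
    used = sum(sum(seg) for seg in segments)
    return used / (len(segments) * segment_size)


sizes_kb, segment_kb = [100, 200, 300], 1024
unmerged = [[s] for s in sizes_kb]             # premise from the abstract: one file per segment
merged = pack_first_fit(sizes_kb, segment_kb)
print(utilization(unmerged, segment_kb))       # 0.1953125
print(utilization(merged, segment_kb))         # 0.5859375
```

Merging trades per-file segment isolation for far less internal fragmentation; the classification step in the paper decides which files are safe to merge or group together.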

Multi-objective evolutionary algorithm for grid job scheduling based on adaptive neighborhood

YANG Ming, XUE Sheng-jun, CHEN Liang, LIU Yong-sheng

Journal of Computer Applications 2012, 32 (03): 599-602. DOI: 10.3724/SP.J.1087.2012.00599

A new adaptive neighborhood Multi-Objective Grid Task Scheduling Algorithm (ANMO-GTSA) was proposed for the multi-objective collaborative optimization of job scheduling in grid computing. In ANMO-GTSA, an adaptive neighborhood method was applied to find the non-dominated (non-inferior) solution set and maintain the diversity of the multi-objective job scheduling population. Experimental results indicate that the proposed algorithm can not only balance the multiple scheduling objectives, but also improve resource utilization and task execution efficiency. Moreover, the proposed algorithm achieves better performance in both time and cost than the traditional Min-min and Max-min algorithms.
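The non-dominated set and a neighborhood-based diversity signal can be sketched for two minimized objectives such as completion time and cost. Both `adaptive_radius` (mean nearest-neighbour distance within the current front) and `neighbourhood_counts` are hypothetical illustrations of an adaptive neighborhood, not the ANMO-GTSA definitions, and the schedule values in the usage are made up.

```python
import math


def dominates(a, b):
    """a dominates b (minimization): no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Keep the solutions that no other solution dominates (the Pareto front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


def adaptive_radius(front):
    """Hypothetical adaptive neighborhood size: mean distance from each point to its nearest other point."""
    if len(front) < 2:
        return 0.0
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(front) if j != i)
        for i, p in enumerate(front)
    ]
    return sum(nearest) / len(nearest)


def neighbourhood_counts(front):
    """Number of other front members inside each point's adaptive neighborhood (higher = more crowded)."""
    r = adaptive_radius(front)
    return [
        sum(1 for j, q in enumerate(front) if j != i and math.dist(p, q) <= r)
        for i, p in enumerate(front)
    ]


schedules = [(1, 5), (2, 3), (3, 4), (4, 1)]  # hypothetical (time, cost) pairs
front = non_dominated(schedules)
print(front)                        # [(1, 5), (2, 3), (4, 1)]: (3, 4) is dominated by (2, 3)
print(neighbourhood_counts(front))  # [1, 1, 0]
```

Preferring less crowded front members when the population must be truncated is one way to keep the set spread across the time-cost trade-off instead of clustering around one region.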

Snapshot K neighbor query processing on moving objects in road networks

LU Bing-liang, LIU Na

Journal of Computer Applications 2011, 31 (11): 3078-3083. DOI: 10.3724/SP.J.1087.2011.03078

The functionality of a framework supporting location-based services on moving objects in road networks was extended, and Snapshot K Nearest Neighbor (SKNN) queries based on Mobile Network Distance Range (MNDR) queries were proposed, using an on-disk R-tree to store the network connectivity and an in-memory grid structure to maintain moving-object position updates. For an arbitrary edge in the space, the minimum and maximum numbers of grid cells that may be affected were analyzed, and it was shown that the maximum bound can be used in snapshot range query processing to prune the search space. SKNN estimates the subspace containing the query results and uses this subspace as the query range to efficiently compute the K nearest Points Of Interest (POIs) from the query points, reducing I/O cost and query time. Comparative experiments show that SKNN achieves better system throughput than S-GRID while scaling to hundreds of thousands of moving objects.
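The in-memory grid for position updates and the idea of growing a search range until the K results are guaranteed can be sketched as follows. This is not SKNN: Euclidean distance stands in for road-network distance, there is no R-tree or MNDR estimation, and the `Grid` class and its methods are hypothetical. The search expands square rings of cells and stops once the K-th candidate is no farther than r times the cell size, because every unvisited object is strictly farther than that.

```python
import math
from collections import defaultdict


class Grid:
    """Uniform in-memory grid holding the latest position of each moving object."""

    def __init__(self, cell_size):
        self.cell = cell_size
        self.cells = defaultdict(dict)  # (cx, cy) -> {object_id: (x, y)}
        self.where = {}                 # object_id -> current cell key

    def _key(self, x, y):
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def update(self, obj_id, x, y):
        """Insert an object or move it, removing it from its previous cell."""
        old = self.where.get(obj_id)
        if old is not None:
            del self.cells[old][obj_id]
        key = self._key(x, y)
        self.cells[key][obj_id] = (x, y)
        self.where[obj_id] = key

    def knn(self, qx, qy, k):
        """Return up to k (distance, object_id) pairs nearest to the query point."""
        cx, cy = self._key(qx, qy)
        found, seen, r = [], 0, 0
        while True:
            # Visit the ring of cells at Chebyshev distance r from the query cell.
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    if max(abs(dx), abs(dy)) != r:
                        continue
                    for oid, (x, y) in self.cells.get((cx + dx, cy + dy), {}).items():
                        found.append((math.hypot(x - qx, y - qy), oid))
                        seen += 1
            found.sort()
            # Any object outside the visited square is more than r * cell away from the query.
            if len(found) >= k and found[k - 1][0] <= r * self.cell:
                return found[:k]
            if seen == len(self.where):  # every object examined
                return found[:k]
            r += 1


g = Grid(cell_size=1.0)
g.update("a", 0.5, 0.5)
g.update("b", 2.5, 2.5)
g.update("c", 9.0, 9.0)
print([oid for _, oid in g.knn(0.0, 0.0, 2)])  # ['a', 'b']
g.update("c", 0.1, 0.1)                         # position update moves c next to the query
print([oid for _, oid in g.knn(0.0, 0.0, 1)])  # ['c']
```

Keeping positions in a grid makes each update O(1), while the stopping bound plays the same pruning role that the abstract attributes to the maximum cell bound.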