Search Result

New dish recognition network based on lightweight YOLOv5

Chenghanyu ZHANG, Yuzhe LIN, Chengke TAN, Junfan WANG, Yeting GU, Zhekang DONG, Mingyu GAO

Journal of Computer Applications 2024, 44 (2): 638-644. DOI: 10.11772/j.issn.1001-9081.2023030271

In order to better meet the accuracy and timeliness requirements of Chinese food dish recognition, a new dish recognition network was designed. The original YOLOv5 model was pruned by combining the Supermask method with structured channel pruning, and was then made more lightweight by Int8 quantization. This allowed the proposed model to balance accuracy and speed in dish recognition while improving its portability. Experimental results show that the proposed model achieves a mean Average Precision (mAP) of 99.00% at an Intersection over Union (IoU) threshold of 0.5 and an average recognition speed of 59.54 ms/frame, which is 20 ms/frame faster than the original YOLOv5 model at the same level of accuracy. In addition, the new dish recognition network was ported to the Renesas RZ/G2L board using Qt. On this basis, an intelligent service system was built that covers the whole process of ordering, order generation, and automatic meal distribution. This provides a theoretical and practical foundation for the future construction and application of truly intelligent service systems in restaurants.
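For readers unfamiliar with the IoU threshold quoted in this abstract, the following is a minimal sketch of how IoU between a predicted and a ground-truth box is commonly computed. The `(x1, y1, x2, y2)` box format and the function names `iou` and `is_match` are assumptions for illustration and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(pred_box, true_box) >= threshold
```

With a threshold of 0.5, a predicted box must overlap the ground-truth box by at least half of their combined area to count toward the reported mAP.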

Survey of multimodal pre-training models

Huiru WANG, Xiuhong LI, Zhe LI, Chunming MA, Zeyu REN, Dan YANG

Journal of Computer Applications 2023, 43 (4): 991-1004. DOI: 10.11772/j.issn.1001-9081.2022020296

By using complex pre-training objectives and a large number of model parameters, a Pre-Training Model (PTM) can effectively obtain rich knowledge from unlabeled data. However, the development of multimodal PTMs is still in its infancy. According to the modalities involved, most current multimodal PTMs were divided into image-text PTMs and video-text PTMs. According to the data fusion method, multimodal PTMs were divided into two types: single-stream models and two-stream models. Firstly, common pre-training tasks and the downstream tasks used in validation experiments were summarized. Secondly, the common models in the area of multimodal pre-training were reviewed, and the downstream tasks, performance, and experimental data of each model were listed in tables for comparison. Thirdly, the application scenarios of the M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer) model, the Cross-modal Prompt Tuning (CPT) model, the VideoBERT (Video Bidirectional Encoder Representations from Transformers) model, and the AliceMind (Alibaba’s collection of encoder-decoders from Mind) model in specific downstream tasks were introduced. Finally, the challenges faced by multimodal PTM work and its future research directions were summarized.

Parameter asynchronous updating algorithm based on multi-column convolutional neural network

Xinyu CHEN, Mingzhe LIU, Jun REN, Ying TANG

Journal of Computer Applications 2022, 42 (2): 395-403. DOI: 10.11772/j.issn.1001-9081.2021020367

Existing algorithms optimize deep learning networks synchronously and manually and ignore the negative information in network learning, which leads to a large number of redundant parameters or even overfitting and thereby degrades counting accuracy. To address this problem, a parameter asynchronous updating algorithm based on Multi-column Convolutional Neural Network (MCNN) was proposed. Firstly, a single frame image was input to the network, and after three columns of convolutions extracted features at different scales, the correlation between every two columns of feature maps was learned through the mutual information between columns. Then, the parameters of each column were updated asynchronously according to the optimized mutual information and the updated loss function until the algorithm converged. Finally, dynamic Kalman filtering was used to deeply fuse the density maps output by the columns, and all pixels in the fused density map were summed to obtain the total number of people in the image. Experimental results show that on the UCSD (University of California San Diego) dataset, the Mean Absolute Error (MAE) of the proposed algorithm is 1.1% lower than that of ic-CNN+McML (iterative crowd counting Convolution Neural Network Multi-column Multi-task Learning), which has the best MAE on the dataset, and the Mean Square Error (MSE) of the proposed algorithm is 4.3% lower than that of Contextual Pyramid Convolution Neural Network (CP-CNN), which has the best MSE on the dataset. On the ShanghaiTech Part_A dataset, the MAE of the proposed algorithm is 1.7% lower than that of ic-CNN+McML, which has the best MAE on the dataset, and the MSE is 3.2% lower than that of ACSCP (Adversarial Cross-Scale Consistency Pursuit), which has the best MSE on the dataset. On the ShanghaiTech Part_B dataset, the MAE and MSE of the proposed algorithm are 18.3% and 35.2% lower respectively than those of ic-CNN+McML, which has the best MAE and MSE on the dataset. On the UCF_CC_50 (University of Central Florida Crowd Counting) dataset, the MAE and MSE of the proposed algorithm are 1.9% and 9.8% lower respectively than those of ic-CNN+McML, which has the best MAE and MSE on the dataset. These results show that the algorithm can effectively improve the accuracy and robustness of crowd counting, accepts input images of any size or resolution, and can adapt to large changes in the scale of the detected targets.
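The counting step and the two error metrics named in this abstract can be illustrated with a minimal sketch. The function names and the nested-list representation of a density map are assumptions for illustration. The sketch computes MSE as the plain mean of squared errors; some crowd-counting papers report the square root of this value instead, and the abstract does not say which convention is used.

```python
def count_from_density(density_map):
    """Estimate the crowd count by summing every pixel of a density map.

    density_map is a list of rows of floats (assumed representation).
    """
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted, actual):
    """Mean Absolute Error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_square_error(predicted, actual):
    """Mean of squared count errors (no square root applied; see note above)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

In this sketch each image contributes one predicted count (from `count_from_density`) and one ground-truth count, and the metrics are averaged over all test images.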

Personalized recommendation service system based on cloud-client-convergence

Jialiang HAN, Yudong HAN, Xuanzhe LIU, Yaoshuai ZHAO, Di FENG

Journal of Computer Applications 2022, 42 (11): 3506-3512. DOI: 10.11772/j.issn.1001-9081.2021111992

Mainstream personalized recommendation systems usually perform recommendation with models deployed in the cloud, so private data such as user interaction behaviors must be uploaded to the cloud, which creates a risk of user privacy leakage. To protect user privacy, user-sensitive data can be processed on the client; however, clients face communication and computation resource bottlenecks. To address these challenges, a personalized recommendation service system based on cloud-client-convergence was proposed. In this system, the cloud-based recommendation model was divided into a user representation model and a sorting model. After being pre-trained on the cloud, the user representation model was deployed to the client, while the sorting model was deployed to the cloud. A small-scale Recurrent Neural Network (RNN) was used to model user behavior characteristics by extracting temporal information from user interaction logs, and the Lasso (Least absolute shrinkage and selection operator) algorithm was used to compress user representations, thereby preventing a drop in recommendation accuracy while reducing both the communication overhead between the cloud and the client and the computation overhead of the client. Experiments were conducted on the RecSys Challenge 2015 dataset, and the results show that the recommendation accuracy of the proposed system is comparable to that of the GRU4REC model, while the volume of the compressed user representations is only 34.8% of that before compression, with a lower computational overhead.

Survey of event extraction

Chunming MA, Xiuhong LI, Zhe LI, Huiru WANG, Dan YANG

Journal of Computer Applications 2022, 42 (10): 2975-2989. DOI: 10.11772/j.issn.1001-9081.2021081542

Event extraction is the task of extracting the events a user is interested in from unstructured information and presenting them to the user in a structured form. Event extraction has a wide range of applications in information collection, information retrieval, document synthesis, and question answering. Overall, event extraction algorithms can be divided into four categories: pattern matching algorithms, trigger-word methods, ontology-based algorithms, and cutting-edge joint model methods. In the research process, different evaluation methods and datasets can be used as needed, and different event representation methods are also relevant to event extraction research. By task type, meta-event extraction and subject event extraction are the two basic tasks of event extraction. Meta-event extraction has three kinds of methods, based on pattern matching, machine learning, and neural networks respectively, while subject event extraction has two kinds of methods, based on the event framework and on ontology respectively. Event extraction research has achieved excellent results in single languages such as Chinese and English, but cross-language event extraction still faces many problems. Finally, the related work on event extraction was summarized and future research directions were discussed in order to provide guidance for subsequent research.

Linking algorithm of discontinuity crack block based on autonomous edge growing

ZHU Ping-zhe, LI Wei

Journal of Computer Applications 2011, 31 (12): 3382-3384.

To deal with the false information and edge breakpoints that appear in binary images produced by asphalt pavement crack segmentation, a new method for linking discontinuous crack blocks was developed based on autonomous edge growing. The method removed false information according to the characteristics of circular noise and linear cracks, filled the gaps by region filling, and thereby linked the discontinuous crack blocks through autonomous edge growing. Experimental results show that the algorithm performs well at edge linking of discontinuous crack blocks in different cases and removes noise at the same time, which benefits subsequent image processing such as image measurement and evaluation.