Against the background of growing emphasis on data ownership confirmation and privacy protection, federated learning, as a new machine learning paradigm, can address the problems of data silos and privacy protection without exposing the data of any participant. Since modeling methods based on federated learning have become mainstream and achieve good results, it is worthwhile to summarize and analyze the concepts, technologies, applications and challenges of federated learning. Firstly, the development of machine learning and the inevitability of the emergence of federated learning were elaborated, and the definition and classification of federated learning were given. Secondly, the three federated learning methods currently recognized by the industry (horizontal federated learning, vertical federated learning and federated transfer learning) were introduced and analyzed. Thirdly, with regard to privacy protection in federated learning, the common existing privacy protection technologies were summarized. In addition, the recent mainstream open-source frameworks were introduced and compared, and the application scenarios of federated learning were given. Finally, the challenges and future research directions of federated learning were discussed.
With the continuous development of network applications, network resources are growing exponentially and information overload is becoming increasingly serious, so how to efficiently obtain resources that meet user needs has become a pressing problem. Recommendation systems can effectively filter massive amounts of information and recommend resources that meet users' needs. The research status of recommendation systems was introduced in detail, including the three traditional recommendation methods of content-based recommendation, collaborative filtering recommendation and hybrid recommendation, and the research progress of four common deep learning recommendation models based on Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN) and Graph Neural Network (GNN) was analyzed in particular. The commonly used datasets in the recommendation field were summarized, and the differences between the traditional recommendation algorithms and the deep learning-based recommendation algorithms were analyzed and compared. Finally, the representative recommendation models in practical applications were summarized, and the challenges and future research directions of recommendation systems were discussed.
Federated learning was proposed to resolve the contradiction between the demand for data sharing and the requirements of privacy protection. As a form of distributed machine learning, federated learning requires a large number of model parameters to be exchanged between the participants and the central server, resulting in high communication overhead. At the same time, federated learning is increasingly deployed on mobile devices with limited communication bandwidth and limited power, and the limited network bandwidth and the sharply rising number of clients make the communication bottleneck worse. To address the communication bottleneck of federated learning, the basic workflow of federated learning was analyzed first. Then, from the perspective of methodology, three mainstream types of methods, based respectively on reducing the frequency of model updates, model compression and client selection, as well as special methods such as model partition were introduced, and an in-depth comparative analysis of specific optimization schemes was carried out. Finally, the development trends of research on federated learning communication overhead were summarized and discussed.
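As an illustration of the model compression family mentioned above, the following minimal Python sketch applies top-k sparsification to a client update, so that only the k largest-magnitude entries are transmitted. Top-k sparsification is an assumption chosen here for illustration; the abstract does not name a specific compression scheme, and the function names `top_k_sparsify` and `densify` are hypothetical.

```python
from typing import Dict, List


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries of a model update.

    The result maps index -> value, which is what a client would upload
    instead of the full dense vector (illustrative assumption).
    """
    if k <= 0:
        return {}
    # Rank indices by absolute value, largest first; ties keep the lower index.
    ranked = sorted(range(len(update)), key=lambda i: -abs(update[i]))
    return {i: update[i] for i in sorted(ranked[:k])}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Server side: rebuild a dense vector, treating dropped entries as zero."""
    dense = [0.0] * size
    for index, value in sparse.items():
        dense[index] = value
    return dense


update = [0.02, -1.5, 0.3, 0.0, 2.1, -0.01]
sparse = top_k_sparsify(update, k=2)      # 2 of 6 entries are sent
restored = densify(sparse, len(update))
```

In this sketch the upload shrinks from six values to two index-value pairs, at the cost of discarding the small entries; practical schemes often accumulate the discarded residual locally, which is omitted here.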
Multi-modal medical images can provide clinicians with rich information about target areas (such as tumors, organs or tissues). However, effective fusion and segmentation of multi-modal images is still a challenging problem due to the independence and complementarity of multi-modal images. Traditional image fusion methods have difficulty in addressing this problem, which has led to widespread research on deep learning-based multi-modal medical image segmentation algorithms. The multi-modal medical image segmentation task based on deep learning was reviewed in terms of principles, techniques, problems, and prospects. Firstly, the general theory of deep learning and multi-modal medical image segmentation was introduced, including the basic principles and development of deep learning and Convolutional Neural Network (CNN), as well as the importance of the multi-modal medical image segmentation task. Secondly, the key concepts of multi-modal medical image segmentation were described, including data dimension, preprocessing, data enhancement, loss function, and post-processing. Thirdly, multi-modal segmentation networks based on different fusion strategies were summarized and analyzed. Finally, several common problems in medical image segmentation were discussed, and a summary and prospects for future research were given.
By using complex pre-training objectives and a large number of model parameters, Pre-Training Model (PTM) can effectively obtain rich knowledge from unlabeled data. However, the development of multimodal PTMs is still in its infancy. According to the modalities involved, most of the current multimodal PTMs were divided into image-text PTMs and video-text PTMs. According to the data fusion method, the multimodal PTMs were divided into two types: single-stream models and two-stream models. Firstly, the common pre-training tasks and the downstream tasks used in validation experiments were summarized. Secondly, the common models in the area of multimodal pre-training were sorted out, and the downstream tasks, performance and experimental data of each model were listed in tables for comparison. Thirdly, the application scenarios of the M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer) model, the Cross-modal Prompt Tuning (CPT) model, the VideoBERT (Video Bidirectional Encoder Representations from Transformers) model, and the AliceMind (Alibaba’s collection of encoder-decoders from Mind) model in specific downstream tasks were introduced. Finally, the challenges and future research directions of multimodal PTM work were summed up.
In recent years, federated learning has become a new way to solve the problems of data silos and privacy leakage in machine learning. The federated learning architecture does not require multiple parties to share data resources: participants only need to train local models on local data and periodically upload parameters to the server to update the global model, so that a machine learning model can be built on large-scale global data. The federated learning architecture is privacy-preserving by nature and is a new scheme for large-scale data machine learning in the future. However, the parameter interaction mode of this architecture may lead to data privacy disclosure. At present, strengthening the privacy-preserving mechanism in the federated learning architecture has become a new research hotspot. Starting from the privacy disclosure problem in federated learning, the attack models and sensitive information disclosure paths in federated learning were discussed, and several types of privacy-preserving techniques in federated learning were highlighted and reviewed, such as those based on differential privacy, homomorphic encryption, and Secure Multiparty Computation (SMC). Finally, the key issues of privacy protection in federated learning were discussed, and the future research directions were outlined.
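The workflow described above, in which participants train on local data and periodically upload parameters for the server to update the global model, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the server aggregates by a sample-count-weighted average (the FedAvg-style rule, which the abstract does not name), the model is a toy linear regression, and `local_train` and `federated_average` are hypothetical helper names.

```python
from typing import List, Tuple

Model = List[float]  # [w, b] of the toy model y = w * x + b


def local_train(global_model: Model, data: List[Tuple[float, float]],
                lr: float = 0.05, epochs: int = 5) -> Model:
    """Client side: a few gradient-descent steps on private data only."""
    w, b = global_model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Tuple[Model, int]]) -> Model:
    """Server side: average parameters, weighted by each client's sample count."""
    total = sum(count for _, count in updates)
    size = len(updates[0][0])
    return [sum(model[j] * count for model, count in updates) / total
            for j in range(size)]


# Each client's raw (x, y) pairs stay local; only parameters are uploaded.
clients = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(0.0, 1.0), (3.0, 7.0), (4.0, 9.0)],
]
global_model: Model = [0.0, 0.0]
for _ in range(200):  # communication rounds
    updates = [(local_train(global_model, data), len(data)) for data in clients]
    global_model = federated_average(updates)
```

Because the toy data on both clients lie on the line y = 2x + 1, the global model approaches w = 2 and b = 1. Note that the uploaded parameters are sent in the clear in this sketch, which is exactly the interaction channel that the privacy-preserving techniques reviewed above aim to protect.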
U-shaped Network (U-Net) based on Fully Convolutional Network (FCN) is widely used as the backbone of medical image segmentation models, but Convolutional Neural Network (CNN) is not good at capturing long-range dependencies, which limits further performance improvement of segmentation models. To solve this problem, researchers have applied Transformer to medical image segmentation models to make up for the deficiency of CNN, and U-shaped segmentation networks combined with Transformer have become a hot research topic. After a detailed introduction of U-Net and Transformer, the related medical image segmentation models were categorized by the position of the Transformer module: only in the encoder or decoder, in both the encoder and decoder, as a skip-connection, and others. The basic contents, design concepts and possible improvements of these models were discussed, and the advantages and disadvantages of placing Transformer in different positions were also analyzed. The analysis shows that the most important factor in deciding the position of Transformer is the characteristics of the target segmentation task, and that segmentation models combining Transformer with U-Net can make better use of the advantages of both CNN and Transformer to improve segmentation performance, which gives them great development prospects and research value.
With the widespread application of deep learning, human beings are increasingly relying on a large number of complex systems that adopt deep learning techniques. However, the black-box property of deep learning models poses challenges to the use of these models in mission-critical applications and raises ethical and legal concerns. Therefore, making deep learning models interpretable is the first problem to be solved to make them trustworthy. As a result, research in the field of interpretable artificial intelligence has emerged. This research mainly focuses on explaining model decisions or behaviors explicitly to human observers. A review of interpretability for deep learning was performed to build a good foundation for further in-depth research and the establishment of more efficient and interpretable deep learning models. Firstly, the interpretability of deep learning was outlined, and the requirements and definitions of interpretability research were clarified. Then, several typical models and algorithms of interpretability research were introduced from the three aspects of explaining the logic rules, decision attribution and internal structure representation of deep learning models. In addition, three common methods for constructing intrinsically interpretable models were pointed out. Finally, the four evaluation indicators of fidelity, accuracy, robustness and comprehensibility were introduced briefly, and the possible future development directions of deep learning interpretability were discussed.
With the advancement of technologies such as sensor networks and global positioning systems, the volume of meteorological data with both temporal and spatial characteristics has exploded, and research on deep learning models for Spatiotemporal Sequence Forecasting (STSF) has developed rapidly. The traditional machine learning methods that have long been applied to weather forecasting are unsatisfactory at extracting the temporal correlations and spatial dependences of data, whereas deep learning methods can extract features automatically through artificial neural networks to improve the accuracy of weather forecasting effectively, and are very effective at encoding and modeling long-term spatial information. At the same time, deep learning models driven by observational data and Numerical Weather Prediction (NWP) models based on physical theories are combined to build hybrid models with higher prediction accuracy and longer prediction horizons. On this basis, the application and research progress of deep learning in the field of weather forecasting were reviewed. Firstly, the deep learning problems in the field of weather forecasting and the classical deep learning problems were compared from three aspects: data format, problem model and evaluation metrics. Then, the development history and application status of deep learning in the field of weather forecasting were reviewed, and the latest progress in combining deep learning technologies with NWP was summarized and analyzed. Finally, the future development directions and research focuses were discussed to provide a reference for future deep learning research in the field of weather forecasting.
Virtual digital currency provides a breeding ground for terrorist financing, money laundering, drug trafficking and other criminal activities. As a representative emerging digital currency, Monero is universally acknowledged to be highly anonymous. To address the problem of Monero's anonymity being used to commit crimes, Monero anonymity technology and tracking technology were explored, and the research progress in recent years was reviewed, so as to provide technical support for effectively tackling crimes based on blockchain technology. Specifically, the evolution of Monero anonymity technology was summarized, and the tracking strategies against Monero anonymity technology proposed in academic circles were sorted out. Firstly, among the anonymity technologies, ring signature, guaranteed unlinkability (one-off public key), guaranteed untraceability, and the important version upgrades for improving anonymity were introduced. Then, among the tracking technologies, attacks such as zero mixin attack, output merging attack, guess-newest attack, closed set attack, transaction flooding attack, tracing attacks from remote nodes and Monero ring attack were introduced. Finally, based on the analysis of anonymity technologies and tracking strategies, four conclusions were obtained: the development of anonymity technology and the development of tracking technology of Monero promote each other; the application of Ring Confidential Transactions (RingCT) is a double-edged sword, which makes the passive attack methods based on currency value ineffective but also makes the active attack methods easier to succeed; output merging attack and zero mixin attack complement each other; Monero’s system security chain still needs to be sorted out.
With the continuous development of information technology, the scale of time series data has grown exponentially, which provides both opportunities and challenges for the development of time series anomaly detection algorithms and has gradually made them a new research hotspot in the field of data analysis. However, research in this area is still at an initial stage and is not yet systematic. Therefore, by sorting out and analyzing the domestic and foreign literature, the research content of multi-dimensional time series anomaly detection was divided, in logical order, into three aspects: dimension reduction, time series pattern representation and anomaly pattern detection, and the mainstream algorithms were summarized to comprehensively show the current research status and characteristics of anomaly detection. On this basis, the research difficulties and trends of multi-dimensional time series anomaly detection algorithms were summarized in order to provide a useful reference for related theoretical and applied research.
The purpose of disentangled representation learning is to model the key factors that affect the form of data, so that a change in one key factor only changes the data in a certain feature while the other features are not affected. It helps to meet the challenges of machine learning in model interpretability, object generation and manipulation, zero-shot learning and other issues. Therefore, disentangled representation learning has always been a research hotspot in the field of machine learning. Starting from the history and motives of disentangled representation learning, the research status and applications of disentangled representation learning were summarized, and the invariance, reusability and other characteristics of disentangled representation learning were analyzed. The research on the factors of variation via generative entangling, the research on the factors of variation with manifold interaction, and the research on the factors of variation using adversarial training were introduced, as well as the latest research trends such as a Variational Auto-Encoder (VAE) named β-VAE. At the same time, the typical applications of disentangled representation learning were shown, and the future research directions were discussed.
In recent years, deep learning has been widely used in many fields. However, due to the highly nonlinear operations of deep neural network models, the interpretability of these models is poor; they are often referred to as “black box” models and cannot be applied in some key fields with high performance requirements. Therefore, it is necessary to study the interpretability of deep learning. Firstly, deep learning was introduced briefly. Then, around the interpretability of deep learning, the existing research work was analyzed from eight aspects: hidden layer visualization, Class Activation Mapping (CAM), sensitivity analysis, frequency principle, robust disturbance test, information theory, interpretable module and optimization method. At the same time, the applications of deep learning in the fields of network security, recommender systems, medicine and social networks were demonstrated. Finally, the existing problems and future development directions of deep learning interpretability research were discussed.
In the field of deep learning, a large number of correctly labeled samples are essential for model training. However, in practical applications, labeling data is costly. At the same time, the quality of labeled samples is affected by subjective factors or by the tools and techniques of manual labeling, which inevitably introduces label noise in the annotation process. Therefore, the training data available for practical applications is subject to a certain amount of label noise. How to effectively train models on data with label noise has become a research hotspot. Focusing on label noise learning algorithms based on deep learning, firstly, the source, classification and impact of label noise were elaborated; secondly, four label noise learning strategies based on data, loss function, model and training method were analyzed according to different elements of machine learning; then, a basic framework for learning with label noise in various application scenarios was provided; finally, some optimization ideas were given, and the challenges and future development directions of label noise learning algorithms were proposed.
Single object tracking is an important research direction in the field of computer vision, and has a wide range of applications in video surveillance, autonomous driving and other fields. Although a large number of surveys of single object tracking algorithms have been conducted, most of them are based on correlation filters or deep learning. In recent years, Siamese network-based tracking algorithms have received extensive attention from researchers for their balance between accuracy and speed, but there are relatively few surveys of this type of algorithm, and systematic analysis of the algorithms at the architectural level is lacking. In order to deeply understand the single object tracking algorithms based on Siamese network, a large amount of related literature was organized and analyzed. Firstly, the structures and applications of the Siamese network were expounded, and each tracking algorithm was introduced according to a classification of the components of Siamese tracking algorithm architectures. Then, the commonly used datasets and evaluation metrics in the field of single object tracking were listed, the overall and per-attribute performance of 25 mainstream tracking algorithms was compared and analyzed on the OTB 2015 (Object Tracking Benchmark) dataset, and the performance and inference speed of 23 Siamese network-based tracking algorithms on the LaSOT (Large-scale Single Object Tracking) and GOT-10K (Generic Object Tracking) test sets were listed. Finally, the research on Siamese network-based tracking algorithms was summarized, and the possible future research directions of this type of algorithm were discussed.
Event extraction is the task of extracting the events that a user is interested in from unstructured information and presenting them to the user in a structured form. Event extraction has a wide range of applications in information collection, information retrieval, document synthesis, and question answering. From an overall perspective, event extraction algorithms can be divided into four categories: pattern matching algorithms, trigger-word methods, ontology-based algorithms, and cutting-edge joint model methods. In the research process, different evaluation methods and datasets can be used according to the needs at hand, and different event representation methods are also relevant to event extraction research. Distinguished by task type, meta-event extraction and subject event extraction are the two basic tasks of event extraction. Meta-event extraction has three kinds of methods, based on pattern matching, machine learning and neural networks respectively, while there are two ways to extract subject events: based on the event framework and based on ontology. Event extraction research has achieved excellent results in single languages such as Chinese and English, but cross-language event extraction still faces many problems. Finally, the related works on event extraction were summarized and the future research directions were discussed in order to provide guidelines for subsequent research.
Imbalanced data classification is an important research topic in machine learning, but most of the existing imbalanced data classification algorithms focus on binary classification, and there are relatively few studies on imbalanced multi-class classification. However, datasets in practical applications usually have multiple classes and imbalanced data distribution, and the diversity of classes further increases the difficulty of imbalanced data classification, so the multi-class classification problem has become a research topic to be solved urgently. The imbalanced multi-class classification algorithms proposed in recent years were reviewed. According to whether a decomposition strategy was adopted, imbalanced multi-class classification algorithms were divided into decomposition methods and ad-hoc methods. Furthermore, according to the decomposition strategy adopted, the decomposition methods were divided into two frameworks: One Vs. One (OVO) and One Vs. All (OVA). According to the technologies used, the ad-hoc methods were divided into data-level methods, algorithm-level methods, cost-sensitive methods, ensemble methods and deep network-based methods. The advantages and disadvantages of these methods and their representative algorithms were systematically described, the evaluation indicators of imbalanced multi-class classification methods were summarized, the performance of the representative methods was analyzed in depth through experiments, and the future development directions of imbalanced multi-class classification were discussed.
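To make the two decomposition frameworks named above concrete, the following minimal Python sketch shows how a multi-class label vector is split into binary problems under OVA and OVO. It only builds the binary label sets and trains no classifier; the helper names `ova_problems` and `ovo_problems` and the toy label distribution are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def ova_problems(labels: List[str]) -> Dict[str, List[int]]:
    """One Vs. All: one binary problem per class, that class (1) against the rest (0)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def ovo_problems(labels: List[str]) -> Dict[Tuple[str, str], List[Tuple[int, int]]]:
    """One Vs. One: one binary problem per class pair, using only samples of those
    two classes. Each entry is (sample index, 1 if first class of the pair else 0)."""
    classes = sorted(set(labels))
    return {
        (a, b): [(i, 1 if y == a else 0)
                 for i, y in enumerate(labels) if y in (a, b)]
        for a, b in combinations(classes, 2)
    }


# Toy imbalanced label set: 6 samples of "a", 3 of "b", 1 of "c".
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
ova = ova_problems(labels)
ovo = ovo_problems(labels)

# Positive-to-negative counts differ sharply between the two frameworks.
ova_c_ratio = (sum(ova["c"]), len(labels) - sum(ova["c"]))      # "c" vs the rest
bc = ovo[("b", "c")]
ovo_bc_ratio = (sum(flag for _, flag in bc), sum(1 - flag for _, flag in bc))
```

With n classes, OVA yields n binary problems and OVO yields n(n-1)/2. In this toy example the OVA problem for the rarest class has 1 positive against 9 negatives, while the OVO problem for the pair ("b", "c") has only 3 against 1, which illustrates why the choice of decomposition changes the degree of imbalance each binary learner faces.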
Object detection in autonomous driving scenes is one of the important research directions in computer vision. Research focuses on ensuring that autonomous vehicles detect objects accurately and in real time. Recently, deep learning technology has developed rapidly, and its wide application in the field of autonomous driving has prompted substantial progress in this field. The research status of object detection by YOLO (You Only Look Once) algorithms in the field of autonomous driving was analyzed from the following four aspects. Firstly, the ideas and improvement methods of the single-stage YOLO series of detection algorithms were summarized, and the advantages and disadvantages of the YOLO series of algorithms were analyzed. Secondly, the YOLO algorithm-based object detection applications in autonomous driving scenes were introduced, and the research status and applications for the detection and recognition of traffic vehicles, pedestrians, and traffic signals were expounded and summarized respectively. Additionally, the commonly used evaluation indicators in object detection, as well as the object detection datasets and autonomous driving scene datasets, were summarized. Lastly, the problems and future development directions of object detection were discussed.