In recent years, federated learning has emerged as a new way to address data islands and privacy leakage in machine learning. A federated learning architecture does not require the parties to share their data: participants only need to train local models on local data and periodically upload parameters to a server that updates the global model, so that a machine learning model can be built on large-scale global data. The architecture is privacy-preserving by nature and is a promising scheme for future large-scale machine learning. However, its parameter exchange can itself lead to data privacy disclosure, and strengthening the privacy-preserving mechanisms in federated learning architecture has become a research hotspot. Starting from the privacy disclosure problem in federated learning, the attack models and sensitive information disclosure paths were discussed, and several types of privacy-preserving techniques were highlighted and reviewed, including those based on differential privacy, homomorphic encryption, and Secure Multiparty Computation (SMC). Finally, the key issues of privacy protection in federated learning were discussed and future research directions were outlined.
Virtual digital currency provides a breeding ground for terrorist financing, money laundering, drug trafficking and other criminal activities. As a representative emerging digital currency, Monero is widely acknowledged to offer high anonymity. Aiming at the problem of crimes committed by exploiting Monero's anonymity, Monero anonymity technology and tracking technology were explored and the research progress of recent years was reviewed, so as to provide technical support for effectively tackling crimes based on blockchain technology. Specifically, the evolution of Monero anonymity technology was summarized, and the tracking strategies against it proposed in academia were sorted out. Firstly, for the anonymity technologies, ring signature, guaranteed unlinkability (one-off public key), guaranteed untraceability, and the important version upgrades for improving anonymity were introduced. Then, for the tracking technologies, attacks such as the zero mixin attack, output merging attack, guess-newest attack, closed set attack, transaction flooding attack, tracing attacks from remote nodes and Monero ring attack were introduced. Finally, based on the analysis of anonymity technologies and tracking strategies, four conclusions were obtained: the development of Monero's anonymity technology and that of its tracking technology promote each other; the application of Ring Confidential Transactions (RingCT) is a double-edged sword, which makes passive attack methods based on currency value ineffective but also makes active attack methods easier to succeed; the output merging attack and the zero mixin attack complement each other; and Monero's system security chain still needs to be sorted out.
Federated Learning (FL) can effectively protect users' personal data from attackers. Differential Privacy (DP) is applied to enhance the privacy of FL, which can solve the problem of privacy disclosure caused by the parameters exchanged during model training. However, existing DP-based FL methods concentrate only on a unified privacy protection budget and ignore the personalized privacy requirements of users. To solve this problem, a two-stage Federated Learning with Personalized Differential Privacy (PDP-FL) algorithm was proposed. In the first stage, each user's privacy was graded according to the user's privacy preference, and noise matching that preference was added to achieve personalized privacy protection. At the same time, the privacy level corresponding to the privacy preference was uploaded to the central aggregation server. In the second stage, in order to fully protect the global data, a strategy of simultaneous local and central protection was adopted, and, according to the privacy levels uploaded by the users, noise conforming to the global DP threshold was added to quantify the global privacy protection level. Experimental results show that on the MNIST and CIFAR-10 datasets, the classification accuracy of the PDP-FL algorithm reaches 93.8% to 94.5% and 43.4% to 45.2% respectively, which is better than those of the Federated learning with Local Differential Privacy (LDP-Fed) algorithm and the Federated Learning with Global Differential Privacy (GDP-FL) algorithm. The PDP-FL algorithm thus meets the needs of personalized privacy protection.
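The first stage described above (grading a user's privacy preference and adding matching noise) can be illustrated with a minimal sketch. It is not the authors' implementation: the level-to-budget mapping `PRIVACY_LEVELS`, the clipping norm, the value of delta, and the use of the classic Gaussian mechanism (whose noise formula is stated for epsilon below 1) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical mapping from a user's privacy preference to a budget epsilon.
# Smaller epsilon means stronger privacy and therefore more noise.
PRIVACY_LEVELS = {"high": 0.2, "medium": 0.5, "low": 0.9}


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (stated for 0 < epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def privatize_update(update, level, clip_norm=1.0, delta=1e-5, rng=None):
    """Clip a local update and add noise scaled to the user's privacy level.

    The sensitivity is taken to be clip_norm, which is an assumption of this
    sketch. Returns the noisy update and the noise scale that was used.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    sigma = gaussian_sigma(PRIVACY_LEVELS[level], delta, clip_norm)
    noisy = clipped + rng.normal(0.0, sigma, size=update.shape)
    return noisy, sigma
```

In such a scheme a client would upload the noisy update together with its privacy level (not the raw preference data), and the server could use the reported levels when deciding how much central noise to add in the second stage.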
Considering that different blockchains are isolated from one another and that data interaction and sharing between them are difficult in the current rapid development of blockchain technology, a cross-chain mechanism based on Spark blockchain was proposed. Firstly, common cross-chain technologies and current mainstream cross-chain projects were analyzed, the implementation principles of different technologies and projects were studied, and their differences, advantages and disadvantages were summarized. Then, using a blockchain architecture named main-sub blockchain mode, key components such as the smart contract component, transaction verification component and transaction timeout component were designed, and the four stages of the cross-chain process were elaborated in detail, including transaction initiation, transaction routing, transaction verification and transaction confirmation. Finally, experiments were designed for performance testing and security testing, and the security was analyzed. Experimental results show that Spark blockchain has significant advantages over other blockchains in terms of transaction delay, throughput and spike testing. Besides, when the proportion of malicious nodes is low, the success rate of cross-chain transactions is 100%, and different sub chains can conduct cross-chain transactions safely and stably. This mechanism solves the problem of data interaction and sharing between blockchains, and provides a technical reference for the subsequent design of Spark blockchain application scenarios.
Smart contract technology, as a milestone of blockchain 2.0, has received widespread attention from both academia and industry. It runs on an underlying infrastructure without a trusted computing environment and has characteristics that distinguish it from traditional programs, and many of its security vulnerabilities have a huge impact, so that research on security auditing for smart contracts has become a popular and urgent key scientific problem in the field of blockchain security. Aiming at the detection and automatic repair of smart contract vulnerabilities, firstly, the main types and classifications of smart contract vulnerabilities were introduced. Secondly, the three most important methods of smart contract vulnerability detection in the past five years were reviewed, and representative and innovative research techniques of each method were introduced. Thirdly, smart contract upgrade schemes and cutting-edge automatic repair technologies were introduced in detail. Finally, the challenges and future work of smart contract vulnerability detection and automatic repair technologies for online, real-time, multi-platform, automatic, and intelligent requirements were analyzed and outlined as a framework of technical solutions.
Federated Learning (FL) has emerged as a novel privacy-preserving Machine Learning (ML) paradigm. However, the distributed training structure of FL is more vulnerable to poisoning attacks, in which adversaries contaminate the global model by uploading poisoning models, slowing the convergence and degrading the prediction accuracy of the global model. To solve this problem, a poisoning attack detection scheme based on Generative Adversarial Network (GAN) was proposed. Firstly, the benign local models were fed into the GAN to output testing samples. Then, the testing samples were used to test the local models uploaded by the clients. Finally, the poisoning models were eliminated according to the testing metrics. Meanwhile, two test metrics named F1 score loss and accuracy loss were defined to detect the poisoning models and to extend the detection scope from one single type of poisoning attack to all types of poisoning attacks. Besides, a threshold determination method was designed to deal with misjudgment, so that robustness against misjudgment was ensured. Experimental results on the MNIST and Fashion-MNIST datasets show that the proposed scheme can generate high-quality testing samples, and then detect and eliminate poisoning models. Compared with the global models trained with the detection scheme based on directly gathering test data from clients and with the detection scheme based on generating test data and using test accuracy as the test metric, the global model trained with the proposed scheme achieves a significant accuracy improvement of 2.7 to 12.2 percentage points.
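The filtering step described above can be sketched as follows. The abstract does not give the exact definitions of the two metrics, so this sketch assumes that each "loss" is the gap between the best score among the uploaded models and a given model's score on the generated testing samples, and that a single hypothetical `threshold` applies to both. The function names are illustrative only.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of testing samples that are classified correctly."""
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def filter_models(predictions, y_true, num_classes, threshold):
    """Return the indices of local models whose metric losses stay under the threshold.

    predictions: one array of predicted labels per uploaded local model,
                 all evaluated on the same generated testing samples.
    Assumption: loss = (best score among models) - (this model's score).
    """
    accs = np.array([accuracy(y_true, p) for p in predictions])
    f1s = np.array([macro_f1(y_true, p, num_classes) for p in predictions])
    acc_loss = accs.max() - accs
    f1_loss = f1s.max() - f1s
    return [i for i in range(len(predictions))
            if acc_loss[i] <= threshold and f1_loss[i] <= threshold]
```

Using both metrics rather than accuracy alone is meant to catch attacks that damage only a few classes, where overall accuracy may barely move while the per-class F1 of the targeted class drops.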
With increasingly severe network security threats and increasingly complex security defense means, zero trust network has emerged as a re-evaluation and re-examination of the traditional boundary security architecture. Zero trust emphasizes never trusting anything by default and verifying continuously. Zero trust network emphasizes that identity is not determined by location, that all access controls strictly enforce minimum permissions, and that all access processes are tracked in real time and evaluated dynamically. Firstly, the basic definition of zero trust network was given, the main problems of traditional perimeter security were pointed out, and the zero trust network model was described. Secondly, the key technologies of zero trust network, such as Software Defined Perimeter (SDP), identity and access management, micro segmentation and Automated Configuration Management System (ACMS), were analyzed. Finally, zero trust network was summarized and its future development was discussed.
In order to deal with the low accuracy of anomaly detection caused by data imbalance and the highly complex temporal correlation of time series, a re-encoding based unsupervised time series anomaly detection model built on Generative Adversarial Network (GAN), named RTGAN (Re-encoding Time series based on GAN), was proposed. Firstly, multiple generators with cycle consistency were used to ensure the diversity of generated samples and thereby learn different anomaly patterns. Secondly, the stacked Long Short-Term Memory-dropout Recurrent Neural Network (LSTM-dropout RNN) was used to capture temporal correlation. Thirdly, the differences between the generated samples and the real samples were compared in the latent space by improved re-encoding. These differences, as the re-encoding errors, served as a part of the anomaly score to improve the accuracy of anomaly detection. Finally, the new anomaly score was used to detect anomalies on univariate and multivariate time series datasets. The proposed model was compared with seven baseline anomaly detection models on univariate and multivariate time series. Experimental results show that the proposed model obtains the highest average F1-score (0.815) on all datasets, and that its overall performance is 36.29% and 8.52% higher than those of the original AutoEncoder (AE) model Dense-AE (Dense-AutoEncoder) and the latest benchmark model USAD (UnSupervised Anomaly Detection on multivariate time series), respectively. The robustness of the model was tested under different Signal-to-Noise Ratios (SNRs). The results show that the proposed model consistently outperforms LSTM-VAE (Variational Autoencoder based on LSTM), USAD and OmniAnomaly; in particular, at 30% SNR, the F1-score of RTGAN is 13.53% and 10.97% higher than those of USAD and OmniAnomaly, respectively. It can be seen that RTGAN can effectively improve the accuracy and robustness of anomaly detection.
Focusing on the coarse granularity of access control, low sharing flexibility and security risks such as data leakage in centralized medical data sharing platforms, a blockchain-based hierarchical access control and sharing system for medical data was proposed. Firstly, medical data was classified according to sensitivity, and a Ciphertext-Policy Attribute-Based Hierarchical Encryption (CP-ABHE) algorithm was proposed to achieve access control of medical data with different sensitivity. In the algorithm, access control trees were merged and symmetric encryption methods were combined to improve the performance of the Ciphertext-Policy Attribute-Based Encryption (CP-ABE) algorithm, and multiple authority centers were used to solve the key escrow problem. Then, a medical data sharing mode based on permissioned blockchain was used to solve the centralized trust problem of the centralized sharing platform. Security analysis shows that the proposed system ensures the security of data during the data sharing process, and can resist user collusion attacks and authority collusion attacks. Experimental results also show that the proposed CP-ABHE algorithm has lower computational cost than the CP-ABE algorithm, the maximum average delay of the proposed system is 7.8 s, and the maximum throughput is 236 transactions per second, which meets the expected performance requirements.
In research on image classification tasks in deep learning, adversarial attacks pose severe challenges to the secure application of deep learning models and have attracted widespread attention from researchers. Firstly, focusing on the adversarial attack technologies for generating adversarial perturbations, the important white-box adversarial attack algorithms in image classification tasks were introduced in detail, and the advantages and disadvantages of different attack algorithms were analyzed. Then, the application status of white-box adversarial attack technologies was illustrated through three realistic application scenarios: mobile application, face recognition and autonomous driving. Additionally, some typical white-box adversarial attack algorithms were selected to perform experiments on different target models, and the experimental results were analyzed. Finally, the white-box adversarial attack technologies were summarized, and valuable research directions were outlined.
Aiming at the problem of model information leakage caused by interpretability in Deep Neural Network (DNN), the feasibility of using the Gradient-weighted Class Activation Mapping (Grad-CAM) interpretation method to generate adversarial samples in a white-box environment was proved. Moreover, an untargeted black-box attack algorithm named dynamic genetic algorithm was proposed. In the algorithm, first, the fitness function was improved according to the changing relationship between the interpretation area and the positions of the disturbed pixels. Then, through multiple rounds of the genetic algorithm, the disturbance value was continuously reduced while the number of disturbed pixels was increased, and the set of result coordinates of each round was maintained and used in the next round of iteration until the perturbed pixel set caused the predicted label to be flipped without exceeding the perturbation boundary. In the experiments, the average attack success rate of the proposed algorithm under the AlexNet, VGG-19, ResNet-50 and SqueezeNet models was 92.88%, which was 16.53 percentage points higher than that of the One pixel algorithm, although the running time increased by 8% compared with that of the One pixel algorithm. In addition, with a shorter running time, the proposed algorithm had a success rate 3.18 percentage points higher than that of the Adaptive Fast Gradient Sign Method (Ada-FGSM) algorithm, 8.63 percentage points higher than that of the Projection & Probability-driven Black-box Attack (PPBA) algorithm, and not much different from that of the Boundary-attack algorithm. The results show that the dynamic genetic algorithm based on the interpretation method can effectively execute the adversarial attack.
Aiming at the problems of low computational efficiency and insufficient accuracy in privacy-preserving neural networks based on homomorphic encryption, an efficient Homomorphic Neural Network (HNN) supporting privacy-preserving training under three-party collaboration was proposed. Firstly, in order to reduce the computational cost of ciphertext-ciphertext multiplication in homomorphic encryption, the idea of secret sharing was combined to design a secure fast multiplication protocol that converts ciphertext-ciphertext multiplication into plaintext-ciphertext multiplication with low complexity. Then, in order to avoid the multiple iterations of ciphertext polynomials generated during the construction of HNN and to improve the nonlinear calculation accuracy, a secure nonlinear calculation method was studied, which executes the corresponding nonlinear operator on the plaintext message obfuscated with a random mask. Finally, the security, correctness and efficiency of the proposed protocols were analyzed theoretically, and the effectiveness and superiority of HNN were verified by experiments. Experimental results show that compared with the dual server scheme PPML (Privacy Protection Machine Learning), HNN has the training efficiency improved by 18.9 times and the model accuracy improved by 1.4 percentage points.
After the introduction of federated learning technology in intrusion detection scenarios, there is a problem that the traffic data across nodes is non-independent and identically distributed (non-iid), which makes it difficult for models to aggregate and obtain a high recognition rate. To solve this problem, an efficient federated learning algorithm named H-E-Fed was constructed, and a network intrusion detection model based on this algorithm was proposed. Firstly, a global model for traffic data was designed by the coordinator and sent to the intrusion detection nodes for model training. Then, the coordinator collected the local models and evaluated the skewness of the covariance matrix of the local models between nodes, so as to measure the correlation of models between nodes, thereby reassigning the model aggregation parameters and generating a new global model. Finally, multiple rounds of interaction between the coordinator and the nodes were carried out until the global model converged. Experimental results show that compared with the models based on the FedAvg (Federated Averaging) algorithm and the FedProx algorithm, the proposed model has relatively low communication consumption when the data across nodes is non-iid. Compared with the baseline models, the accuracy of the proposed model is improved by 10.39% and 8.14% on the KDDCup99 dataset and by 4.40% and 5.98% on the CICIDS2017 dataset.
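The aggregation step that the coordinator performs can be sketched as a weighted average of the local models. The abstract does not specify how the proposed algorithm derives its aggregation parameters from the covariance-matrix skewness, so the sketch below only shows the FedAvg baseline weighting (by local sample count) and a generic `aggregate` function into which any reassigned weights could be plugged. Both function names are hypothetical.

```python
import numpy as np


def aggregate(local_models, weights):
    """Weighted average of local models.

    local_models: list of models, each a list of per-layer numpy arrays
                  with identical shapes across clients.
    weights:      one non-negative aggregation weight per client; they are
                  normalized here so that they sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # zip(*local_models) groups the same layer from every client together.
    return [sum(wi * layer for wi, layer in zip(w, layers))
            for layers in zip(*local_models)]


def fedavg_weights(sample_counts):
    """FedAvg baseline: weight each client by its number of local samples."""
    return np.asarray(sample_counts, dtype=float)
```

A correlation-aware scheme would replace `fedavg_weights` with a function that scores each local model against the others and lowers the weight of models that diverge under non-iid data; the form of that scoring function is left open here because the source does not state it.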
Current reversible data hiding algorithms in the encrypted domain have the problem that the ciphertext images carrying secret have poor fault tolerance and disaster resistance after the secret data is embedded: once they are attacked or damaged, the original image cannot be reconstructed and the secret data cannot be extracted. In order to solve this problem, a new reversible data hiding algorithm in the encrypted domain based on secret image sharing was proposed, and its application scenarios in cloud environment were analyzed. Firstly, the encrypted image was divided into n different ciphertext images carrying secret with the same size. Secondly, in the process of segmentation, the random quantities in the Lagrange interpolation polynomial were taken as redundant information, and the mapping relationship between the secret data and each polynomial coefficient was established. Finally, the reversible embedding of the secret data was realized by modifying the built-in parameters of the encryption process. When k ciphertext images carrying secret were collected, the original image could be fully recovered and the secret data could be extracted. Experimental results show that the proposed algorithm has the advantages of low computational complexity, large embedding capacity and complete reversibility. In the (3,4) threshold scheme, the maximum embedding rate of the proposed algorithm is 4 bits per pixel (bpp), and in the (4,4) threshold scheme, the maximum embedding rate of the proposed algorithm is 6 bpp. The proposed algorithm gives full play to the disaster recovery characteristic of the secret sharing scheme. Without reducing the security of secret sharing, the proposed algorithm enhances the fault tolerance and disaster resistance of ciphertext images carrying secret, improves the embedding capacity of the algorithm and the disaster recovery ability in the application scenario of cloud environment, and ensures the security of the carrier image and the secret data.
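The idea of carrying secret data in the otherwise random coefficients of a (k, n) sharing polynomial can be illustrated with a small sketch for a single pixel. This is a simplified illustration, not the proposed algorithm: it works over the prime field GF(257), places the data values directly in the higher-order coefficients (the actual mapping between secret data and coefficients is not given in the abstract), and ignores that a share value of 256 does not fit in an 8-bit pixel. All function names are hypothetical. The modular inverse via `pow(x, -1, P)` requires Python 3.8 or later.

```python
P = 257  # prime slightly above the 8-bit pixel range (simplifying assumption)


def make_shares(pixel, coeffs, n):
    """Evaluate f(x) = pixel + coeffs[0]*x + coeffs[1]*x^2 + ... (mod P) at x = 1..n.

    len(coeffs) + 1 is the threshold k; coeffs stand in for the embedded data.
    """
    poly = [pixel] + list(coeffs)
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(poly)) % P)
            for x in range(1, n + 1)]


def poly_mul_linear(poly, root):
    """Multiply a polynomial (lowest degree first) by (x - root) mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def recover_poly(shares):
    """Lagrange interpolation mod P: return all coefficients from k shares."""
    k = len(shares)
    result = [0] * k
    for j, (xj, yj) in enumerate(shares):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                basis = poly_mul_linear(basis, xm)
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse of the denominator
        for i in range(k):
            result[i] = (result[i] + scale * basis[i]) % P
    return result
```

For example, `make_shares(200, [17, 42], 4)` produces four shares under a (3,4) threshold, and `recover_poly` applied to any three of them returns the coefficient list whose first entry is the pixel and whose remaining entries are the embedded values, which is what gives the scheme its tolerance to one lost or damaged share.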
In data poisoning attacks, backdoor attackers manipulate the distribution of training data by inserting samples with hidden triggers into the training set, so that test samples are misclassified, model behavior is changed and model performance is reduced. However, the drawback of the existing triggers is sample independence, that is, no matter what trigger mode is adopted, different poisoned samples contain the same trigger. Therefore, by combining image steganography and Deep Convolutional Generative Adversarial Network (DCGAN), a sample-based attack method was put forward: image texture feature maps were generated according to the gray level co-occurrence matrix, the target label characters were embedded into the texture feature maps as a trigger by using image steganography technology, and the texture feature maps with the trigger were combined with clean samples into poisoned samples. Then, a large number of fake pictures with the trigger were generated through DCGAN. In the training set, the original poisoned samples and the fake pictures generated by DCGAN were mixed together, so that after the poisoner injected a small number of poisoned samples, the attack success rate was high and the effectiveness, sustainability and concealment of the trigger were ensured. Experimental results show that this method avoids the disadvantages of sample independence and achieves a model accuracy of 93.78%. When the proportion of poisoned samples is 30%, data preprocessing, pruning defense and AUROR defense have the least influence on the success rate of the attack, and the success rate of the attack can reach about 56%.
Clustering analysis can uncover hidden interconnections between data and segment the data according to multiple indicators, which can facilitate personalized and refined operations. However, the data fragmentation and isolation caused by data islands seriously affect the effectiveness of cluster analysis applications. To solve the data island problem and protect data privacy, an Equivalent Local differential privacy Federated K-means (ELFedKmeans) algorithm was proposed. A grid-based initial cluster center selection method and a privacy budget allocation scheme were designed for the horizontal federated learning model. To generate the same random noise with lower communication cost, all organizations jointly negotiated random seeds, protecting local data privacy. The ELFedKmeans algorithm was shown through theoretical analysis to satisfy differential privacy protection, and it was also compared with the Local Differential Privacy distributed K-means (LDPKmeans) algorithm and the Hybrid Privacy K-means (HPKmeans) algorithm on different datasets. Experimental results show that all three algorithms increase F-measure and decrease SSE (Sum of Squares due to Error) gradually as the privacy budget increases. As a whole, the F-measure values of the ELFedKmeans algorithm were 1.7945% to 57.0663% and 21.2452% to 132.0488% higher than those of the LDPKmeans and HPKmeans algorithms respectively; the Log(SSE) values of the ELFedKmeans algorithm were 1.2042% to 12.8946% and 5.6175% to 27.5752% less than those of the LDPKmeans and HPKmeans algorithms respectively. With the same privacy budget, the ELFedKmeans algorithm outperforms the comparison algorithms in terms of clustering quality and utility metric.
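The shared-seed idea mentioned above, where organizations agree on a random seed so that each can generate the same noise locally instead of transmitting it, can be sketched as follows. The seed negotiation itself is not shown, and the use of Laplace noise with scale sensitivity/epsilon is an assumption of this sketch (the standard Laplace mechanism), not a detail taken from the source. The function name is hypothetical.

```python
import numpy as np


def shared_laplace_noise(seed, shape, sensitivity, epsilon):
    """Generate Laplace noise that is reproducible from a negotiated seed.

    Every organization that holds the same seed and parameters obtains the
    same array, so the noise never has to be sent over the network.
    """
    rng = np.random.default_rng(seed)
    return rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=shape)
```

Two parties calling `shared_laplace_noise` with the same arguments obtain identical arrays, whereas a party without the seed cannot reproduce the noise. This trades one short seed exchange for what would otherwise be a noise vector sent in every round.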