Graph data generation approach for graph neural network model extraction attacks

doi:10.11772/j.issn.1001-9081.2023081110

Journal of Computer Applications, 2024, Vol. 44, Issue 8: 2483-2492. DOI: 10.11772/j.issn.1001-9081.2023081110

• Cyber security •

Ying YANG, Xiaoyan HAO, Dan YU, Yao MA, Yongle CHEN

College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong, Shanxi 030600, China

Received: 2023-08-20 Revised: 2023-11-01 Accepted: 2023-11-03 Online: 2023-12-18 Published: 2024-08-10
Contact: Xiaoyan HAO
About the authors: YANG Ying, born in 1999, M.S. candidate. Her research interests include artificial intelligence security.
YU Dan, born in 1983, Ph.D., lecturer. Her research interests include Internet of Things security.
MA Yao, born in 1982, Ph.D., lecturer. His research interests include Internet of Things security.
CHEN Yongle, born in 1983, Ph.D., professor. His research interests include Internet of Things security.
Supported by:
Basic Research Program of Shanxi Province (20210302123131); Natural Science Foundation of Shanxi Province (202203021221234); Unscheduled Technical Services Horizontal Project (RH2100005181)

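Original title (Chinese): 面向图神经网络模型提取攻击的图数据生成方法

Author details:
YANG Ying (born 1999), female, from Taiyuan, Shanxi; M.S. candidate; CCF member. Research interests: artificial intelligence security.
HAO Xiaoyan (born 1970), female, from Taiyuan, Shanxi; associate professor, Ph.D. Research interests: natural language processing, information security. E-mail: 1006390817@qq.com
YU Dan (born 1983), female, from Taiyuan, Shanxi; lecturer, Ph.D.; CCF member. Research interests: Internet of Things security.
MA Yao (born 1982), male, from Taiyuan, Shanxi; lecturer, Ph.D.; CCF member. Research interests: Internet of Things security.
CHEN Yongle (born 1983), male, from Weifang, Shandong; professor, Ph.D.; CCF member. Research interests: Internet of Things security.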

Abstract

Data-free model extraction attacks are a class of machine learning security problems in which the attacker has no knowledge of the training data needed to carry out the attack. To address the lack of research on data-free model extraction attacks in the field of Graph Neural Networks (GNNs), a GNN model extraction attack method was proposed. Graph node feature information and edge information were optimized with the GNN interpretability method GNNExplainer and the graph data augmentation method GAUG-M, respectively, to generate the required graph data and ultimately extract the GNN model. Firstly, GNNExplainer was used to identify the important graph node features through interpretability analysis of the target model's responses. Secondly, the node features were optimized as a whole by up-weighting the important features and down-weighting the unimportant ones. Then, a graph autoencoder was used as the edge prediction module to obtain the connection probabilities between nodes from the optimized node features. Finally, the edge information was optimized by adding or deleting edges according to these probabilities. Three GNN model architectures trained on five graph datasets were used as target models for extraction attacks. The resulting substitute models achieved 73% to 87% accuracy on the node classification task and 76% to 89% fidelity to the target model, which verifies the effectiveness of the proposed method.
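
The four steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the `importance` scores stand in for a GNNExplainer feature mask, the scaling factors `up`/`down` and the `threshold` are hypothetical, the reweighted features are used directly as node embeddings in place of a trained graph autoencoder encoder, and the edge update simply adds the most probable missing edges and drops the least probable existing ones.

```python
import math


def reweight_features(x, importance, up=1.2, down=0.8, threshold=0.5):
    """Scale a feature up if its importance score reaches the threshold, else down.

    `importance` is a hypothetical per-feature score in [0, 1] (stand-in for an
    explainer's feature mask); `up`, `down`, `threshold` are illustrative values.
    """
    return [
        [v * (up if s >= threshold else down) for v, s in zip(row, importance)]
        for row in x
    ]


def edge_probabilities(z):
    """Inner-product decoder: p(i, j) = sigmoid(z_i . z_j) for every pair i < j."""
    probs = {}
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            dot = sum(a * b for a, b in zip(z[i], z[j]))
            probs[(i, j)] = 1.0 / (1.0 + math.exp(-dot))
    return probs


def update_edges(edges, probs, n_add=1, n_remove=1):
    """Add the n_add most probable missing edges, drop the n_remove least probable ones."""
    existing = {tuple(sorted(e)) for e in edges}
    # Ties are broken by node indices so the result is deterministic.
    missing = sorted((e for e in probs if e not in existing),
                     key=lambda e: (-probs[e], e))
    present = sorted(existing, key=lambda e: (probs[e], e))
    to_add = set(missing[:n_add])
    to_remove = set(present[:n_remove])
    return sorted((existing - to_remove) | to_add)


# Toy graph: 4 nodes, 3 features each, 2 initial edges.
x = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
importance = [0.9, 0.2, 0.7]      # assumed explainer output
edges = [(0, 1), (2, 3)]

x_opt = reweight_features(x, importance)
probs = edge_probabilities(x_opt)  # assumption: features used directly as embeddings
new_edges = update_edges(edges, probs)
print(new_edges)                   # [(0, 2), (2, 3)]
```

In this toy run, nodes 0 and 2 share the highly weighted features, so the edge between them is added, while the weakly supported edge (0, 1) is removed. In the setting the abstract describes, such a round would be repeated, with the target model queried on each generated graph.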

Key words: data-free model extraction attack, graph data generation, Graph Neural Network (GNN), GNN interpretability, graph data augmentation

CLC Number: TP309

Ying YANG, Xiaoyan HAO, Dan YU, Yao MA, Yongle CHEN. Graph data generation approach for graph neural network model extraction attacks[J]. Journal of Computer Applications, 2024, 44(8): 2483-2492.

Figures/Tables 16

Tab. 1 Related symbols and explanations

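Symbol	Meaning
$G = (V, E)$	Basic representation of a graph
$v_i$ or $v_i'$	A node in the graph
$e_i = (v_a, v_b)$	Edge $e_i$ formed by nodes $v_a$ and $v_b$
$e_i' = (v_a', v_b')$	Edge $e_i'$ formed by nodes $v_a'$ and $v_b'$
$V = \{v_1, v_2, \dots, v_n\}$	Node set
$E = \{e_1, e_2, \dots, e_m\}$	Edge set
$X = (x_1, x_2, \dots, x_k)$	Node feature vector
$C = \{c_1, c_2, \dots, c_j\}$	Node label set
$G'$	Optimized graph data
$V' = \{v_1', v_2', \dots, v_n'\}$	Optimized node data
$E' = \{e_1', e_2', \dots, e_m'\}$	Optimized edge data
$X' = (x_1', x_2', \dots, x_k')$	Optimized node feature vector
$M_g$	Target model
$M_s$	Substitute model
$R$	Response of the target model
$R'$	Response of the substitute model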

Fig. 1 Graph data structure

Fig. 2 Relationship between node features, labels and edges

Fig. 3 Overall architecture for graph node feature optimization based on GNNExplainer

Tab. 2 Node and edge information for five graph datasets

Dataset	Number of nodes	Number of edges	Feature vector dimension	Number of labels
DBLP	17 716	105 734	1 639	4
PubMed	19 717	88 648	500	3
Citeseer	4 230	5 358	602	6
ACM	3 025	26 256	1 870	3
Coauthor	34 493	495 924	8 415	5

Fig. 4 t-SNE visualization of graph node features after optimization at epochs 0, 50, 100, 150, and 200

Fig. 5 Edge optimization results for three graph data samples at epochs 0, 50, 100, and 150

Fig. 6 Quality of graph data generated between epoch 0 and 200

Fig. 7 Influence of number of initial graph nodes on quality of generated graph data

Fig. 8 Influence of initial edge connection probability on quality of generated graph data

Tab. 3 Attack performance with GraphSAGE as target model architecture

Dataset	Substitute model $M_s$
	GraphSAGE		GAT		GIN
	Acc	Fid	Acc	Fid	Acc	Fid
DBLP	0.799±0.003	0.832±0.005	0.735±0.003	0.768±0.011	0.735±0.008	0.779±0.013
PubMed	0.830±0.012	0.867±0.007	0.812±0.007	0.846±0.004	0.772±0.002	0.824±0.004
Citeseer	0.812±0.002	0.853±0.005	0.809±0.004	0.847±0.003	0.758±0.015	0.797±0.010
ACM	0.837±0.005	0.870±0.008	0.836±0.002	0.850±0.004	0.823±0.013	0.854±0.007
Coauthor	0.866±0.001	0.889±0.005	0.856±0.005	0.882±0.003	0.846±0.004	0.877±0.010

Tab. 4 Attack performance with GAT as target model architecture

Dataset	Substitute model $M_s$
	GraphSAGE		GAT		GIN
	Acc	Fid	Acc	Fid	Acc	Fid
DBLP	0.758±0.002	0.811±0.006	0.812±0.001	0.846±0.004	0.749±0.004	0.788±0.013
PubMed	0.781±0.005	0.832±0.004	0.832±0.002	0.874±0.001	0.768±0.002	0.820±0.002
Citeseer	0.763±0.003	0.809±0.004	0.828±0.004	0.862±0.003	0.756±0.020	0.805±0.011
ACM	0.823±0.004	0.857±0.002	0.845±0.002	0.879±0.006	0.813±0.001	0.847±0.003
Coauthor	0.841±0.005	0.879±0.003	0.870±0.005	0.891±0.012	0.836±0.005	0.872±0.013

Tab. 5 Attack performance with GIN as target model architecture

Dataset	Substitute model $M_s$
	GraphSAGE		GAT		GIN
	Acc	Fid	Acc	Fid	Acc	Fid
DBLP	0.742±0.002	0.773±0.003	0.733±0.002	0.769±0.010	0.806±0.005	0.822±0.003
PubMed	0.774±0.006	0.791±0.004	0.758±0.003	0.801±0.003	0.819±0.004	0.835±0.002
Citeseer	0.760±0.017	0.782±0.005	0.741±0.002	0.796±0.001	0.811±0.002	0.847±0.003
ACM	0.790±0.003	0.828±0.003	0.808±0.004	0.832±0.003	0.841±0.005	0.863±0.002
Coauthor	0.832±0.005	0.856±0.010	0.838±0.005	0.861±0.004	0.859±0.002	0.877±0.004
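
The "Acc" and "Fid" columns in Tables 3 to 5 can be read with the following sketch. It assumes the definitions commonly used in model extraction work, which this page does not spell out: accuracy is the share of nodes where the substitute model $M_s$ predicts the ground-truth label, and fidelity is the share of nodes where $M_s$ agrees with the target model $M_g$. The function names and the toy predictions are hypothetical.

```python
def accuracy(substitute_preds, true_labels):
    """Fraction of nodes where the substitute model predicts the ground-truth label."""
    matches = sum(p == y for p, y in zip(substitute_preds, true_labels))
    return matches / len(true_labels)


def fidelity(substitute_preds, target_preds):
    """Fraction of nodes where the substitute model agrees with the target model."""
    matches = sum(p == t for p, t in zip(substitute_preds, target_preds))
    return matches / len(target_preds)


# Toy predictions for five nodes (illustrative values only).
true_labels = [0, 1, 2, 1, 0]
target_preds = [0, 1, 2, 2, 0]       # the target model errs on node 3
substitute_preds = [0, 1, 1, 2, 0]   # the substitute copies that error

acc = accuracy(substitute_preds, true_labels)
fid = fidelity(substitute_preds, target_preds)
print(acc, fid)                      # 0.6 0.8
```

Under these definitions fidelity can exceed accuracy, as it does in every cell of the tables, because a substitute that reproduces the target model's mistakes is counted as agreeing with it.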

Fig. 9 Influence of number of initial graph nodes on extraction attacks on GNN models

Fig. 10 Influence of initial edge connection probability on extraction attacks on GNN models

Fig. 11 Comparative experimental results with GraphSAGE as target model architecture
