Journal of Computer Applications, 2024, Vol. 44, Issue (3): 788-796. DOI: 10.11772/j.issn.1001-9081.2023030290

• Cyber security •

Consistency analysis of sensitive behaviors and privacy policies of Android applications

Baoshan YANG1, Zhi YANG1, Xingyuan CHEN1,2, Bing HAN1, Xuehui DU1

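  1. Information Engineering University, Zhengzhou 450001
  2. State Key Laboratory of Cryptography Science and Technology (State Cryptography Administration), Beijing 100094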
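  • Received: 2023-03-20 Revised: 2023-06-05 Accepted: 2023-06-08 Online: 2023-09-14 Published: 2024-03-10
  • Corresponding author: Zhi YANG
  • About the authors: YANG Baoshan (born 1998), male, from Zhumadian, Henan; M.S. candidate. Main research interest: software security analysis.
    CHEN Xingyuan (born 1963), male, from Wuwei, Anhui; professor, Ph.D. Main research interest: network and information security.
    HAN Bing (born 1978), female, from Minquan, Henan; lecturer, Ph.D. Main research interest: management and evaluation of cyberspace information.
    DU Xuehui (born 1968), female, from Xinxiang, Henan; professor, Ph.D. Main research interests: spatial information networks and cloud computing security.
  • Funding:
    National Natural Science Foundation of China (62176265)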

Analysis of consistency between sensitive behavior and privacy policy of Android applications

Baoshan YANG1, Zhi YANG1, Xingyuan CHEN1,2, Bing HAN1, Xuehui DU1

  1. Information Engineering University, Zhengzhou, Henan 450001, China
  2. State Key Laboratory of Cryptography Science and Technology (State Cryptography Administration), Beijing 100094, China
  • Received: 2023-03-20 Revised: 2023-06-05 Accepted: 2023-06-08 Online: 2023-09-14 Published: 2024-03-10
  • Contact: Zhi YANG
  • About the authors: YANG Baoshan, born in 1998, M.S. candidate. His research interests include software security analysis.
    CHEN Xingyuan, born in 1963, Ph.D., professor. His research interests include network and information security.
    HAN Bing, born in 1978, Ph.D., lecturer. Her research interests include management and evaluation of cyberspace information.
    DU Xuehui, born in 1968, Ph.D., professor. Her research interests include spatial information networks and cloud computing security.
  • Supported by:
    National Natural Science Foundation of China (62176265)

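Abstract:

A privacy policy document states the private information that an application needs to collect, but it does not guarantee clear and complete disclosure of the types of private information the application actually collects, and research on the consistency between applications' actual sensitive behaviors and their privacy policies remains insufficient. To address these problems, a method for analyzing the consistency between the sensitive behaviors of Android applications and their privacy policies was proposed. In the privacy policy analysis stage, key information was extracted from privacy policy statements with a Bi-GRU-CRF (Bi-directional Gated Recurrent Unit Conditional Random Field) neural network that was incrementally trained on an added custom annotation library. In the sensitive behavior analysis stage, the IFDS (Interprocedural, Finite, Distributive, Subset) algorithm was optimized in three ways: sensitive Application Programming Interface (API) calls were classified, sensitive API calls that had already been analyzed were deleted from the input list of sensitive sources, and sensitive paths that had already been extracted were marked. As a result, the output of sensitive behavior analysis matched the linguistic granularity of privacy policy descriptions, the results contained less redundancy, and the analysis ran more efficiently. In the consistency analysis stage, the semantic relationships between ontologies were divided into equivalence, subordination, and approximation, and a formal model of consistency between sensitive behaviors and privacy policies was defined on that basis. Consistent cases were divided into clear statements and vague statements, and inconsistent cases into omitted statements, incorrect statements, and ambiguous statements. Finally, the proposed consistency analysis algorithm based on semantic similarity was applied to sensitive behaviors and privacy policies. Experiments on 928 applications show that, with a privacy policy analysis accuracy of 97.34%, 51.4% of the Android applications exhibit inconsistencies between their actual sensitive behaviors and their privacy policy statements.

Key words: Android, IFDS, sensitive behavior, privacy policy, natural language processing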
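
The bookkeeping used in the sensitive behavior stage can be illustrated with a small sketch. It is not IFDS and not the authors' implementation: it is a plain breadth-first search over a hypothetical data-flow graph that only demonstrates the three ideas named in the abstract (grouping sensitive API calls into categories, removing a source from the input list once it has been analyzed, and marking paths that have already been extracted). The category names, the graph edges, the helper methods `Utils.pack` and `Tracker.report`, and the function `extract_sensitive_paths` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical grouping of sensitive API calls into policy-level categories.
API_CATEGORY = {
    "TelephonyManager.getDeviceId": "device identifier",
    "TelephonyManager.getImei": "device identifier",
    "LocationManager.getLastKnownLocation": "location",
}

# Hypothetical data-flow edges: data produced by the key reaches the listed methods.
FLOW_GRAPH = {
    "TelephonyManager.getDeviceId": ["Utils.pack"],
    "TelephonyManager.getImei": ["Utils.pack"],
    "LocationManager.getLastKnownLocation": ["Tracker.report"],
    "Utils.pack": ["HttpURLConnection.connect"],
    "Tracker.report": ["Log.d"],
}
SINKS = {"HttpURLConnection.connect", "Log.d"}


def extract_sensitive_paths(sources):
    """Return a list of (category, sink) pairs without redundant entries."""
    pending = list(sources)   # input sensitive-source list
    marked = set()            # (category, sink) pairs already extracted
    result = []
    while pending:
        source = pending[0]
        category = API_CATEGORY[source]
        queue, seen = deque([source]), {source}
        while queue:
            node = queue.popleft()
            if node in SINKS:
                # Report a category-to-sink path only the first time it is found.
                if (category, node) not in marked:
                    marked.add((category, node))
                    result.append((category, node))
                continue
            for nxt in FLOW_GRAPH.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Delete every occurrence of the analyzed source from the input list.
        pending = [s for s in pending if s != source]
    return result
```

In this toy setting, two different identifier APIs that reach the same network sink produce a single "device identifier" record, which is the kind of category-level result that can be compared with privacy policy wording.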

Abstract:

A privacy policy document declares the privacy information that an application needs to obtain, but it cannot guarantee clear and complete disclosure of the types of privacy information that the application actually obtains. Research on the consistency between the actual sensitive behaviors of applications and their privacy policies remains insufficient. To address these issues, a method for analyzing the consistency between sensitive behaviors and privacy policies of Android applications was proposed. In the privacy policy analysis stage, a Bi-GRU-CRF (Bi-directional Gated Recurrent Unit Conditional Random Field) neural network was incrementally trained with an added custom annotation library to extract key information from privacy policy statements. In the sensitive behavior analysis stage, the IFDS (Interprocedural, Finite, Distributive, Subset) algorithm was optimized by classifying sensitive API (Application Programming Interface) calls, deleting already-analyzed sensitive API calls from the input sensitive source list, and marking already-extracted sensitive paths. These optimizations matched the results of sensitive behavior analysis to the language granularity of privacy policy descriptions, reduced redundancy in the results, and improved analysis efficiency. In the consistency analysis stage, the semantic relationships between ontologies were classified into equivalence, subordination, and approximation relationships, and a formal model of consistency between sensitive behaviors and privacy policies was defined on this basis. Consistent cases were classified into clear expression and vague expression, and inconsistent cases into omitted expression, incorrect expression, and ambiguous expression. Finally, the proposed semantic-similarity-based consistency analysis algorithm was used to analyze the consistency between sensitive behaviors and privacy policies. Experimental results on 928 applications show that, with a privacy policy analysis accuracy of 97.34%, 51.4% of the Android applications have inconsistencies between their actual sensitive behaviors and their privacy policy declarations.

Key words: Android, IFDS (Interprocedural, Finite, Distributive, Subset), sensitive behavior, privacy policy, Natural Language Processing (NLP)
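
The five outcome labels of the consistency analysis stage can be made concrete with a toy classifier. The abstract does not give the formal model, so the rules below are only one plausible reading, stated as assumptions: a behavior counts as clearly disclosed when the policy names an equivalent term; as vaguely disclosed when the policy names only a broader (subordination) or similar (approximation) term, where "vague" is used for the second consistent category to keep it distinct from the inconsistent "ambiguous" one; as omitted when nothing in the policy matches; as incorrect when the policy denies collecting it; and as ambiguous when the policy both affirms and denies it. The small ontology and the functions `relation` and `classify` are hypothetical, and exact lookups stand in for the semantic-similarity computation.

```python
# Hypothetical ontology fragments keyed by the data type observed in the app.
EQUIVALENT = {"imei": {"imei"}, "location": {"location", "geolocation"}}
BROADER = {"imei": {"device identifier"}, "location": {"personal information"}}
SIMILAR = {"imei": {"serial number"}}


def relation(behavior, term):
    """Return the semantic relationship between an observed data type and a policy term."""
    if term in EQUIVALENT.get(behavior, set()):
        return "equivalence"
    if term in BROADER.get(behavior, set()):
        return "subordination"
    if term in SIMILAR.get(behavior, set()):
        return "approximation"
    return None


def classify(behavior, statements):
    """Classify one observed behavior against policy statements.

    statements: list of (term, collects) pairs, where collects is False for
    sentences of the form "we do not collect <term>".
    """
    affirmed = [r for t, c in statements if c and (r := relation(behavior, t))]
    denied = [r for t, c in statements if not c and (r := relation(behavior, t))]
    if affirmed and denied:
        return "ambiguous"   # policy contradicts itself
    if denied:
        return "incorrect"   # policy denies what the app does
    if "equivalence" in affirmed:
        return "clear"
    if affirmed:
        return "vague"       # only a broader or similar term is disclosed
    return "omitted"         # policy is silent about this behavior


if __name__ == "__main__":
    policy = [("device identifier", True), ("geolocation", False)]
    for observed in ("imei", "location"):
        print(observed, "->", classify(observed, policy))
```

Under these assumed rules, "clear" and "vague" are the consistent outcomes and the other three are inconsistent, which mirrors the two-way split described in the abstract.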

CLC number: