Label-Only Model Inversion Attacks via Knowledge Transfer

¹Singapore University of Technology and Design (SUTD), ²Stanford University
NeurIPS 2023
*Equal contribution. Work done while at SUTD.

Abstract

In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private training data. Remarkable progress has been made in the white-box and black-box setups, where the adversary has access to the complete model or the model's soft output, respectively. However, there has been very limited study of the most challenging yet practically important setup: label-only MI attacks, where the adversary has access only to the model's predicted label (hard label), without confidence scores or any other model information.

In this work, we propose a new approach for label-only MI attacks. Our idea is to transfer knowledge from the opaque target model to surrogate models; with the surrogate models, our approach can then harness advanced white-box attacks. We propose knowledge transfer based on generative modeling and introduce a new Target model-assisted ACGAN (T-ACGAN) for effective knowledge transfer. Our method casts the challenging label-only MI problem into the more tractable white-box setup. We provide analysis to support that surrogate models obtained with our approach are good proxies for the target model in MI. Our experiments show that our method significantly outperforms the existing SOTA label-only MI attack by more than 15% across all MI benchmarks. Furthermore, our method compares favorably in terms of query budget. Our study highlights rising privacy threats to ML models even when minimal information (i.e., hard labels) is exposed.
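To make the knowledge-transfer idea concrete, below is a minimal PyTorch sketch of one T-ACGAN-style training step, written from the abstract's description rather than the paper's released code: the generator G, the discriminator D with an auxiliary classification head, the optimizer handles, and the unweighted loss sum are all illustrative assumptions. The point it shows is that the opaque target model is queried only for hard labels, which then supervise the surrogate's classification head.

import torch
import torch.nn.functional as F

# Illustrative sketch (not the paper's code). Assumed interfaces:
#   G(z, y)         -> fake images conditioned on class y
#   D(x)            -> (real/fake logit, class logits), ACGAN-style heads
#   target_model(x) -> logits of the opaque target; only argmax is used
def knowledge_transfer_step(G, D, target_model, real_images,
                            num_classes, z_dim, opt_G, opt_D):
    device = real_images.device
    b = real_images.size(0)
    z = torch.randn(b, z_dim, device=device)
    y = torch.randint(0, num_classes, (b,), device=device)
    fake = G(z, y)

    # Label-only access: query the target and keep only the hard label.
    with torch.no_grad():
        hard_labels = target_model(fake).argmax(dim=1)

    # Discriminator/surrogate update: real-vs-fake loss plus classification
    # supervised by the target's hard labels (the knowledge transfer).
    opt_D.zero_grad()
    adv_real, _ = D(real_images)
    adv_fake, cls_fake = D(fake.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(adv_real, torch.ones_like(adv_real))
        + F.binary_cross_entropy_with_logits(adv_fake, torch.zeros_like(adv_fake))
        + F.cross_entropy(cls_fake, hard_labels)
    )
    d_loss.backward()
    opt_D.step()

    # Generator update: fool D and produce samples D classifies as y.
    opt_G.zero_grad()
    adv_fake, cls_fake = D(fake)
    g_loss = (
        F.binary_cross_entropy_with_logits(adv_fake, torch.ones_like(adv_fake))
        + F.cross_entropy(cls_fake, y)
    )
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()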

Figure 1: Overview and our contributions. (a) Under a label-only model inversion (MI) attack, the target model T is opaque. (b) Stage 1: As our first contribution, we propose a knowledge transfer scheme to render surrogate model(s). (c) Stage 2: We then cast the label-only MI attack as a white-box MI attack on surrogate model(s) S; this casting eases the challenging label-only setup into a white-box one. To our knowledge, our proposed approach is the first to address label-only MI via white-box MI attacks. (d) We propose T-ACGAN to leverage generative modeling and the target model for effective knowledge transfer. Knowledge transfer renders the discriminator D as a surrogate model, and generated samples from the T-ACGAN can further be used to train other surrogate variants S. (e) Our analysis demonstrates that S is a good proxy for T for MI attacks; in particular, a white-box MI attack on S mimics the white-box attack on the opaque T. (f) Our proposed approach significantly improves label-only MI attacks (e.g., 20% improvement on the standard CelebA benchmark compared to the existing SOTA), resulting in significant improvement in private data reconstruction.
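For Stage 2, once a differentiable surrogate S is available, a standard white-box inversion can be run against it. The sketch below shows a generic GAN-based recipe (gradient descent in the generator's latent space against S's classification loss); the function names and hyperparameters are placeholders, and the paper's actual attack objective may include additional terms.

import torch
import torch.nn.functional as F

# Illustrative white-box inversion on the surrogate S (assumed to return
# class logits); G is the generator from Stage 1. Step count and learning
# rate are placeholders, not the paper's settings.
def invert_class(G, S, target_class, z_dim, steps=1500, lr=0.02, device="cpu"):
    z = torch.randn(1, z_dim, device=device, requires_grad=True)
    y = torch.tensor([target_class], device=device)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = S(G(z, y))                # fully differentiable: no queries to T
        loss = F.cross_entropy(logits, y)  # drive S toward the target class
        loss.backward()                    # gradients flow through S and G to z
        opt.step()
    with torch.no_grad():
        return G(z, y)                     # candidate reconstruction of class y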

BibTeX

@inproceedings{nguyen2023labelonly,
  title={Label-Only Model Inversion Attacks via Knowledge Transfer},
  author={Ngoc-Bao Nguyen and Keshigeyan Chandrasegaran and Milad Abdollahzadeh and Ngai-man Cheung},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=NuoIThPPag}
}

Acknowledgements

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No.: AISG2-TC-2022-007) and SUTD project PIE-SGP-AI-2018-01. This research work is also supported by the Agency for Science, Technology and Research (A*STAR) under its MTC Programmatic Funds (Grant No. M23L7b0021). This material is based on research/work supported in part by the Changi General Hospital and the Singapore University of Technology and Design under the HealthTech Innovation Fund (HTIF Award No. CGH-SUTD-2021-004). We thank the anonymous reviewers for their insightful feedback and discussion.