Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?


Viet-Hung Tran(*1,2) | Ngoc-Bao Nguyen(*1)
Son T. Mai(2) | Hans Vandierendonck(2) | Ira Assent(3) | Alex Kot(4) | Ngai-Man Cheung(1†)
(1)Singapore University of Technology and Design (SUTD) (2)Queen's University Belfast
(3)Aarhus University (4)Nanyang Technological University (NTU)
(*) The first two authors contributed equally. (†) Corresponding author.
TMLR 08/2025

Paper | Github | Models

Abstract


Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored.

In this work, we explore Random Erasing (RE)—a technique traditionally used for improving model generalization under occlusion—and uncover its surprising effectiveness as a defense against MI attacks. Specifically, our novel feature space analysis shows that models trained with RE-images introduce a significant discrepancy between the features of MI-reconstructed images and those of the private data. At the same time, features of private images remain distinct from other classes and well-separated from different classification regions. These effects collectively degrade MI reconstruction quality and attack accuracy while maintaining reasonable natural accuracy. Furthermore, we examine two critical properties of RE: Partial Erasure and Random Location. Partial Erasure prevents the model from observing entire objects during training. We find this has a significant impact on MI, which aims to reconstruct entire objects. Random Location of erasure plays a crucial role in achieving a strong privacy-utility trade-off. Our findings highlight RE as a simple yet effective defense mechanism that can be easily integrated with existing privacy-preserving techniques. Extensive experiments across 37 setups demonstrate that our method achieves state-of-the-art (SOTA) performance in the privacy-utility trade-off. The results consistently demonstrate the superiority of our defense over existing methods across different MI attacks, network architectures, and attack configurations. For the first time, we achieve a significant degradation in attack accuracy without a decrease in utility for some configurations.
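To make the two properties concrete, below is a minimal sketch of a Random Erasing operation in numpy. The function name, default hyperparameters (patch area scale, aspect-ratio range, noise fill), and the clipping of the patch to the image are illustrative choices, not the paper's exact configuration: Partial Erasure corresponds to the patch covering only a fraction of the image area, and Random Location to the uniformly sampled patch position.

```python
import numpy as np

def random_erase(img, p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), rng=None):
    """Erase one rectangle of random size, aspect ratio, and position,
    filling it with uniform noise. Defaults are illustrative."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() > p:          # apply erasing with probability p
        return img
    h, w = img.shape[:2]
    # Partial Erasure: the patch covers only a fraction of the image area.
    area = rng.uniform(*scale) * h * w
    aspect = np.exp(rng.uniform(np.log(ratio[0]), np.log(ratio[1])))
    eh = int(np.clip(round(np.sqrt(area * aspect)), 1, h - 1))
    ew = int(np.clip(round(np.sqrt(area / aspect)), 1, w - 1))
    # Random Location: the patch position is sampled uniformly.
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[top:top + eh, left:left + ew] = rng.random((eh, ew) + img.shape[2:])
    return out
```

Because the patch never covers the whole image, the model only ever sees occluded views of each private sample, while the random position varies the occlusion across epochs.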


Inverted examples

Figure 1: Our Proposed Model Inversion (MI) Defense via Random Erasing (MIDRE). (a) “No Defense”: Training a model without MI defense. 𝓛(θ) is the standard training loss, e.g., cross-entropy. Training a model with state-of-the-art MI defenses (SOTA): (b) BiDO [Peng et al., 2022], (c) NLS [Struppek et al., 2024], (d) TL-DMI [Ho et al., 2024], (e) MI-RAD [Koh et al., 2024], and (f) Our method. Studies in [Peng et al., 2022; Struppek et al., 2024] focus on adding a new loss term to the training objective in order to find a balance between model utility and privacy. Both TL-DMI [Ho et al., 2024] and MI-RAD [Koh et al., 2024] focus on the model's parameters to defend against MI. For our proposed method (f), the training procedure and objective are the same as in (a) “No Defense.” However, the training samples presented to the model are partially masked, thus reducing the amount of private training information encoded in the model and preventing the model from observing the entire images. MIDRE therefore differs from other approaches in that it defends through the input data alone. We find that this can significantly degrade MI attacks, which rely on substantial private training information being encoded inside the model in order to reconstruct high-dimensional private images.
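As the caption notes, the training procedure and objective stay identical to "No Defense"; only the inputs change. The self-contained sketch below illustrates this with a plain softmax-regression training step in numpy (the batch masking, patch sizes, learning rate, and model are all simplified stand-ins for the actual RE augmentation and deep network used in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def erase_batch(x, p=0.5):
    """Mask a random square patch in each image with noise
    (a simplified stand-in for Random Erasing)."""
    out = x.copy()
    n, h, w = x.shape[:3]
    for i in range(n):
        if rng.random() < p:
            s = int(rng.integers(h // 4, h // 2))          # Partial Erasure
            top = int(rng.integers(0, h - s + 1))          # Random Location
            left = int(rng.integers(0, w - s + 1))
            out[i, top:top + s, left:left + s] = rng.random((s, s) + x.shape[3:])
    return out

def train_step(W, x, y, lr=0.1):
    """One standard softmax cross-entropy gradient step -- identical to
    "No Defense" training; the only MIDRE change is the erase_batch call."""
    x = erase_batch(x)                                     # the only change
    feats = x.reshape(len(x), -1)
    logits = feats @ W
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad = feats.T @ (probs - np.eye(W.shape[1])[y]) / len(x)
    return W - lr * grad
```

The point of the sketch is structural: no extra loss term, no parameter-space constraint; the defense lives entirely in the data pipeline.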

Feature space analysis of Random Erasing's defense effectiveness

Figure 2: Feature space analysis showing that, under MIDRE, f_recon^MIDRE and f_priv^MIDRE exhibit a discrepancy, degrading the MI attack. We visualize penultimate-layer activations of private images (f_priv), RE-private images (f_RE), and MI-reconstructed images (f_recon) generated by both (a) the No Defense model and (b) our MIDRE model. We also visualize the convex hull for private images, RE-private images, and MI-reconstructed images. In (a), f_recon^NoDef closely resembles f_priv^NoDef, consistent with high attack accuracy. In (b), private images and RE-private images share some similarity but are not identical, with partial overlap between f_priv^MIDRE and f_RE^MIDRE. Importantly, f_recon^MIDRE closely resembles f_RE^MIDRE, as RE-private images are the training data for MIDRE. This results in a reduced overlap between f_recon^MIDRE and f_priv^MIDRE, explaining why MI does not accurately capture the private image features under MIDRE. More visualizations can be found in the Supplementary.

Citation


@article{tran2025random,
  title={Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?},
  author={Tran, Viet-Hung and Nguyen, Ngoc-Bao and Mai, Son T. and Vandierendonck, Hans and Assent, Ira and Kot, Alex and Cheung, Ngai-Man},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2025},
  url={https://openreview.net/forum?id=S9CwKnPHaO}
}

Acknowledgements


This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No.: AISG2-TC-2022-007) and by the Agency for Science, Technology and Research (A*STAR) under its MTC Programmatic Funds (Grant No. M23L7b0021). This research is also supported by the National Research Foundation, Singapore and Infocomm Media Development Authority under its Trust Tech Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore and Infocomm Media Development Authority. This research is also part-funded by the European Union (Horizon Europe 2021-2027 Framework Programme, Grant Agreement number 10107245; views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union, which cannot be held responsible for them) and by the Engineering and Physical Sciences Research Council under grant number EP/X029174/1.
