Federated Learning (FL) has emerged as a powerful paradigm for collaborative machine learning, enabling model training across decentralized devices while preserving data privacy. However, the distributed nature of FL makes it vulnerable to backdoor attacks, where malicious participants can inject subtle triggers into the global model, causing it to malfunction in specific scenarios while performing normally otherwise. Addressing this critical security concern, our work introduces BackdoorIndicator, a novel and proactive backdoor detection method that leverages Out-of-Distribution (OOD) data to identify and mitigate backdoor threats in federated learning systems.
Installation
To get started with BackdoorIndicator, you need to install the necessary Python packages. This can be easily done using pip with the provided requirements file:
pip install -r requirement.txt
For optimal compatibility, it is recommended to use the following package versions:
Python==3.7.15
torch==1.13.0
torchvision==0.14.0
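To quickly confirm that an existing environment matches these versions before launching experiments, a short check along the following lines can help (a minimal sketch; your pinned versions may legitimately differ):

# env_check.py - minimal sanity check for the recommended versions (illustrative only)
import sys
import torch
import torchvision

print(f"Python      : {sys.version.split()[0]}  (recommended 3.7.15)")
print(f"torch       : {torch.__version__}  (recommended 1.13.0)")
print(f"torchvision : {torchvision.__version__}  (recommended 0.14.0)")
print(f"CUDA available: {torch.cuda.is_available()}")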
For users interested in experimenting with edge-case datasets, these can be acquired by following the instructions available in the official repository of the “Yes-you-can-really-backdoor-FL” paper: https://github.com/ksreenivasan/OOD_Federated_Learning. This repository provides resources and guidance for obtaining and utilizing these specialized datasets.
First Run: Training and Checkpoint Setup
When you run the BackdoorIndicator code for the first time, it initiates the training of a federated learning global model from scratch. For initial setup and experimentation, we recommend running the code without any defense mechanisms activated. This initial run is primarily focused on establishing a baseline and saving checkpoints of the global model at different training rounds.
To configure this initial run, modify the params_vanilla_Indicator.yaml file located under the utils/yamls/ directory. Specifically, set the poisoned_start_round and global_watermarking_start_round parameters to a value significantly larger than the largest global round index at which you intend to save checkpoints. This effectively disables both poisoning and the defense mechanism during the initial training phase.
poisoned_start_round: 10000 # Larger than the biggest global round index you want to save
global_watermarking_start_round: 10000
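Conceptually, these two round indices act as switches inside the federated training loop; setting both far beyond the rounds you actually run keeps the initial training benign. The repository's real control flow lives in main.py and may differ, but the gating can be pictured roughly as in the following sketch (all names and values here are illustrative assumptions):

# Illustrative gating of poisoning / Indicator injection by round index
# (hypothetical structure, not the repository's actual code)
params = {
    "poisoned_start_round": 10000,
    "global_watermarking_start_round": 10000,
    "save_on_round": [200, 400, 600],
}

for round_idx in range(1, 601):
    poisoning_active = round_idx >= params["poisoned_start_round"]
    indicator_active = round_idx >= params["global_watermarking_start_round"]

    # With both thresholds at 10000, neither branch triggers during the
    # initial benign run, so the global model trains from scratch undisturbed.
    if poisoning_active:
        pass  # malicious clients would submit backdoored updates here
    if indicator_active:
        pass  # the server would plant and check the OOD indicator task here

    if round_idx in params["save_on_round"]:
        print(f"round {round_idx}: checkpoint would be saved")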
Upon execution, the code will automatically create a timestamped recording folder within the saved_models directory. This folder serves as the repository for saved model checkpoints. After the initial run, you can select any saved checkpoint to resume training or initiate backdoor detection. To specify a checkpoint for resuming, update the resumed_model parameter in your YAML configuration file with the path to the desired checkpoint file. Additionally, the save_on_round parameter allows you to define the specific global rounds at which model checkpoints should be saved.
resumed_model: "Jun.05_06.09.03/saved_model_global_model_1200.pt.tar"
save_on_round: [xxx, yyy, zzz] # Specify rounds for saving checkpoints
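What exactly the .pt.tar checkpoint contains is determined by the repository's saving code; the sketch below only illustrates a typical resume pattern under the assumption that the file holds a model state_dict (the stand-in ResNet-18 and the key names are assumptions, and the path is the example above):

# Illustrative resume-from-checkpoint pattern (key names and model are assumptions)
import os
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # stand-in for the global model

resumed_model = "Jun.05_06.09.03/saved_model_global_model_1200.pt.tar"
ckpt_path = os.path.join("saved_models", resumed_model)

if os.path.exists(ckpt_path):
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # The file may store a raw state_dict or wrap it under a key such as "state_dict".
    state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
    model.load_state_dict(state_dict)
    print(f"resumed global model from {ckpt_path}")
else:
    print(f"{ckpt_path} not found; training would start from scratch")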
To activate and explore the BackdoorIndicator defense mechanism or other defenses, specify the corresponding YAML configuration file when launching the main.py script. For instance, to run BackdoorIndicator, ensure that the global_watermarking_start_round and poisoned_start_round parameters in the params_vanilla_Indicator.yaml file are configured to initiate the BackdoorIndicator and poisoning processes at the desired rounds. Then execute the code using the following command, replacing "x" with your GPU ID:
python main.py --GPU_id "x" --params utils/yamls/indicator/params_vanilla_Indicator.yaml
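main.py owns the actual argument handling and YAML parsing; the snippet below is only a rough sketch of what such an entry point plausibly does with the two flags shown above (everything beyond the flag names is an assumption):

# Rough sketch of the launch flow (not the repository's actual main.py)
import argparse
import os
import torch
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--GPU_id", default="0", help="CUDA device index to run on")
parser.add_argument("--params", default="utils/yamls/indicator/params_vanilla_Indicator.yaml",
                    help="path to the YAML configuration file")
args = parser.parse_args()

device = torch.device(f"cuda:{args.GPU_id}" if torch.cuda.is_available() else "cpu")

params = {}
if os.path.exists(args.params):
    with open(args.params) as f:
        params = yaml.safe_load(f)

print(f"device={device}, "
      f"poisoned_start_round={params.get('poisoned_start_round')}, "
      f"global_watermarking_start_round={params.get('global_watermarking_start_round')}")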
The experimental results and logs will be systematically recorded in the corresponding timestamped folder within the saved_models directory, facilitating analysis and reproducibility.
Hyperparameter Configuration
For in-depth experimentation and performance tuning, BackdoorIndicator offers a range of hyperparameters that can be adjusted to suit specific federated learning scenarios and datasets. These hyperparameters, which are discussed in detail in our research paper, can be modified within the YAML configuration files to investigate their impact on the method's effectiveness. Key hyperparameters include the following (an illustrative sketch of how they might be consumed appears after the list):
ood_data_source: # Specifies the source of out-of-distribution data
ood_data_sample_lens: # Defines the number of OOD samples to be used
global_retrain_no_times: # Controls the number of global retraining iterations
watermarking_mu: # Parameter influencing the watermarking strength
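The precise role of each setting is defined in the paper and the training code. Purely as an illustration of how such settings might be consumed on the server side, the sketch below draws ood_data_sample_lens samples from a stand-in OOD source and repeats an indicator-style training pass global_retrain_no_times times (every value, the dataset choice, and the loop structure are assumptions, not the paper's configuration):

# Purely illustrative consumption of the indicator hyperparameters
# (values, dataset, and training details are assumptions, not the paper's settings)
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

params = {
    "ood_data_source": "MNIST",    # illustrative stand-in OOD dataset
    "ood_data_sample_lens": 64,    # number of OOD samples used for the indicator task
    "global_retrain_no_times": 2,  # number of indicator training passes
    "watermarking_mu": 0.1,        # strength coefficient for the indicator objective
}

# Reshape the OOD samples to a CIFAR-style 3x32x32 input format (illustrative).
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize(32),
    transforms.ToTensor(),
])
ood_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
indicator_set = Subset(ood_set, range(params["ood_data_sample_lens"]))
indicator_loader = DataLoader(indicator_set, batch_size=32, shuffle=True)

for _ in range(params["global_retrain_no_times"]):
    for images, _labels in indicator_loader:
        # An indicator objective scaled by watermarking_mu would be applied to the
        # global model here; the actual loss and labeling follow the paper, not this sketch.
        pass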
We encourage users to explore the effects of these hyperparameters to gain a deeper understanding of BackdoorIndicator’s behavior and optimize its performance in diverse federated learning environments.
Citation
If you find BackdoorIndicator and this repository valuable for your research or applications, we kindly request that you cite our paper. Proper citation acknowledges our work and contributes to the advancement of research in federated learning security and backdoor detection.
@inproceedings {299824,
author = {Songze Li and Yanbo Dai},
title = {{BackdoorIndicator}: Leveraging {OOD} Data for Proactive Backdoor Detection in Federated Learning},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {4193--4210},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/li-songze},
publisher = {USENIX Association},
month = aug
}
By leveraging OOD data, BackdoorIndicator offers a proactive and effective approach to bolstering the security of federated learning systems against backdoor attacks, paving the way for more robust and trustworthy collaborative machine learning in privacy-sensitive applications.