VeryFL: A Blockchain Federated Learning Model Benchmark Framework

Introduction

Federated Learning (FL) has emerged as a revolutionary approach to machine learning, enabling model training across decentralized devices or servers holding local data samples, without exchanging the data itself. This paradigm shift addresses critical data privacy and security concerns, particularly in sensitive domains like healthcare and finance. However, ensuring transparency, security, and trust in federated learning processes, especially in decentralized settings, remains a significant challenge. Blockchain technology, with its inherent characteristics of immutability, transparency, and decentralization, offers a compelling solution to enhance the reliability and trustworthiness of federated learning systems.

VeryFL is presented as a streamlined and accessible framework designed to explore and benchmark blockchain-integrated federated learning models, offering a practical environment in which to study the synergy between the two technologies. Built upon PyTorch for the federated learning components and Solidity for blockchain functionality deployed on Ethereum, VeryFL serves several purposes: grasping the fundamental workflow of federated learning, validating centralized federated learning algorithms, and, most importantly, experimenting with blockchain-based federated learning algorithms within a real Ethereum environment. This makes VeryFL a valuable blockchain federated learning model benchmark for researchers, developers, and educators alike.

Dependencies

To effectively utilize VeryFL, ensure your environment meets the following dependency requirements:

Ethereum Environment:

  • Node.js (>= 16.0.0) & npm (>= 7.10.0): These are essential for setting up and managing the blockchain development environment. Node.js provides the runtime environment, and npm (Node Package Manager) is used to install necessary packages and tools.

  • Ganache: Ganache is a personal blockchain for Ethereum development. It allows you to quickly set up a private Ethereum blockchain on your local machine, which is crucial for testing and deploying smart contracts without the need for real cryptocurrencies or interaction with public networks. Install Ganache globally using npm:

    npm install ganache --global

Python Environment:

  • Anaconda: Anaconda is recommended for managing Python environments and dependencies. It simplifies the process of installing and managing different Python versions and packages required for VeryFL.

  • Python (3.6 ~ 3.9): VeryFL is designed to be compatible with Python versions 3.6 to 3.9. These versions are chosen for their stability and compatibility with the required libraries.

  • PyTorch (1.13): PyTorch is a fundamental deep learning framework used for implementing the federated learning algorithms within VeryFL. Version 1.13 is specified for compatibility.

  • Brownie: Brownie is a Python-based development and testing framework for smart contracts targeting the Ethereum Virtual Machine. VeryFL uses Brownie to interact with the smart contracts deployed on the Ethereum blockchain, enabling seamless integration between the federated learning process and blockchain functionality (a short usage sketch follows this list). Install Brownie using pip:

    pip install eth-brownie
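
To illustrate how Brownie typically connects to a local development chain, the snippet below attaches to a Ganache-backed network and inspects a pre-funded test account. This is a minimal sketch of the general pattern under Brownie's default "development" network configuration; VeryFL's own blockchain interaction code lives under ./chainfl/interact.

# Minimal sketch: connect Brownie to a local Ganache-backed development chain.
# This shows the general Brownie pattern, not VeryFL's actual wrapper code.
from brownie import network, accounts

network.connect("development")     # attach to (or launch) the local chain
print(network.is_connected())      # True once the connection is established
print(accounts[0].balance())       # balance of a pre-funded Ganache test account
network.disconnect()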

Basic Functions

VeryFL is equipped with several core functionalities that make it a robust blockchain federated learning model benchmark and experimentation platform:

Execute Federated Learning Experiments

VeryFL can simulate a wide range of federated learning experiments under both centralized and decentralized paradigms, letting users explore different federated learning scenarios and compare their performance. The framework ships with image classification datasets commonly used in federated learning research, such as FashionMNIST, and integrates classic federated learning algorithms like FedAvg and FedProx, so researchers can readily test variations of these algorithms or develop new ones within the VeryFL environment. By providing pre-built datasets and algorithm implementations, VeryFL significantly lowers the barrier to entry for federated learning experimentation.
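
To make the aggregation step concrete, the snippet below sketches the weighted averaging at the heart of FedAvg in plain PyTorch. It is an illustrative sketch only, not VeryFL's aggregator implementation (which lives under ./server/aggregation_alg).

# Illustrative FedAvg: average client model weights, weighted by local sample count.
import torch

def fedavg(client_states, client_sizes):
    """Weighted average of client state_dicts by local dataset size."""
    total = float(sum(client_sizes))
    return {
        key: sum(state[key].float() * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Tiny check with dummy one-parameter "models":
states = [{"w": torch.tensor([1.0, 2.0])}, {"w": torch.tensor([3.0, 4.0])}]
print(fedavg(states, client_sizes=[10, 30]))   # {'w': tensor([2.5000, 3.5000])}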

On-chain Mechanism Implemented with Solidity

A primary objective of VeryFL is to furnish a practical experimental environment for blockchain-based federated learning. To achieve this, VeryFL incorporates an embedded Ethereum network, enabling the implementation of on-chain mechanisms using Solidity, the primary smart contract language for Ethereum. These smart contracts, deployed within VeryFL, can govern various aspects of the federated learning process, such as participant registration, incentive mechanisms, model aggregation rules, and result verification. This on-chain functionality is crucial for building transparent and auditable federated learning systems, where key operations are recorded and verifiable on the blockchain.

Model Copyright Protection and Transaction

VeryFL demonstrates a pioneering approach to model copyright protection and secure model transactions within a federated learning context. By integrating model watermarking techniques, VeryFL provides a framework to embed unique identifiers (watermarks) into trained models. These watermarks can then be managed and verified on the blockchain, establishing a decentralized record of model ownership and provenance. This functionality enables the creation of a demo framework that can protect model copyright and facilitate model transactions. For a detailed understanding of this feature, refer to the research article cited below [2], which elaborates on the “Tokenized Model” concept, showcasing a blockchain-empowered decentralized model ownership verification platform. This feature positions VeryFL as a valuable blockchain federated learning model benchmark for exploring intellectual property rights in distributed machine learning.
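
Model watermarking can be realized in several ways. As one common illustration (not necessarily the exact scheme used by VeryFL or described in [2]), a bit string can be embedded into a chosen layer's weights through an additional regularization term during training and later recovered with the same secret projection matrix; the layer name in the usage comment is hypothetical.

# Illustrative weight-based watermarking: embed an owner's bit string into a
# layer's parameters via a regularization loss. A hedged sketch, not VeryFL's code.
import torch
import torch.nn.functional as F

def make_watermark(weight, n_bits=64):
    bits = torch.randint(0, 2, (n_bits,)).float()   # the owner's signature
    proj = torch.randn(n_bits, weight.numel())      # secret projection matrix
    return bits, proj

def watermark_loss(weight, proj, bits):
    # Push the projection of the weights toward the sign pattern encoded by `bits`.
    return F.binary_cross_entropy_with_logits(proj @ weight.flatten(), bits)

def extract_watermark(weight, proj):
    return ((proj @ weight.flatten()) > 0).float()  # should reproduce `bits` after training

# During local training (hypothetical layer name `model.fc`):
#   loss = task_loss + 0.1 * watermark_loss(model.fc.weight, proj, bits)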

Code Structure and Usage

VeryFL is designed with a modular and user-friendly code structure to facilitate ease of use and customization.

Quick Start

To quickly initiate a federated learning experiment using VeryFL, you can utilize the test.py script. For instance, to run an experiment using the FashionMNIST dataset, execute the following command in your terminal:

python test.py --benchmark FashionMNIST

This command leverages the argparse library to parse command-line arguments, allowing you to specify the benchmark you wish to run. The test.py script then retrieves the corresponding benchmark configuration from ./config/benchmark.py, sets up the global and training arguments, initializes the chosen federated learning algorithm, and instantiates a Task object to execute the federated learning process.

# test.py
import argparse
from task import Task
import config.benchmark

if __name__ == "__main__":
    # Parse the benchmark name from the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument('--benchmark', type=str, default="FashionMNIST",
                        help="Running benchmark (see ./config/benchmark.py)")
    args = parser.parse_args()

    # Look up the benchmark configuration and unpack its three components.
    benchmark = config.benchmark.get_benchmark(args.benchmark)
    global_args, train_args, algorithm = benchmark.get_args()

    # Build and run the federated learning task.
    classification_task = Task(global_args=global_args, train_args=train_args, algorithm=algorithm)
    classification_task.run()

Customize Task Parameters

VeryFL allows for extensive customization of experiment parameters through the ./config/benchmark.py file. Each benchmark defined in this file is structured into three key components:

  1. global_args: This section defines global parameters relevant to the federated learning setup, such as the number of clients participating in the learning process, the dataset to be used (e.g., FashionMNIST, CIFAR-10), and the machine learning model architecture (e.g., CNN, ResNet).
  2. train_args: This section encompasses training-specific hyperparameters, including the learning rate, weight decay, number of training rounds, local epochs, batch size, and optimization algorithm.
  3. Algorithm: This component specifies the federated learning algorithm to be employed. It comprises definitions for the Aggregator (server-side aggregation logic), Client (client-side training process), and Trainer (local model training mechanism).

By modifying these parameters within the benchmark configuration files, users can tailor experiments to their specific research questions or application requirements, making VeryFL a highly adaptable blockchain federated learning model benchmark.
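
As a hypothetical illustration of this three-part structure (class and field names below are invented for the example and do not reproduce VeryFL's exact schema), a benchmark entry might look like:

# Hypothetical benchmark definition illustrating the three-part structure.
class FashionMNISTBenchmark:
    def get_args(self):
        global_args = {
            "client_num": 10,            # number of participating clients
            "dataset": "FashionMNIST",   # dataset identifier
            "model": "CNN",              # model architecture
        }
        train_args = {
            "lr": 0.01,
            "weight_decay": 1e-4,
            "communication_round": 50,   # global training rounds
            "local_epoch": 2,
            "batch_size": 32,
            "optimizer": "SGD",
        }
        algorithm = "FedAvg"             # maps to an Aggregator/Client/Trainer triple
        return global_args, train_args, algorithm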

Add New FL Algorithms

VeryFL’s modular design facilitates the integration of novel federated learning algorithms. To incorporate a new FL algorithm, you need to implement two key components:

  1. Client-side Algorithm (Trainer): Implement a new Trainer class within the ./client/trainer directory. This class should encapsulate the logic for local model training on each client device. It will define how each client updates its local model based on its data and the global model received from the server.
  2. Server-side Algorithm (Aggregator): Develop a new Aggregator class within the ./server/aggregation_alg directory. This class will define the server-side aggregation strategy, specifying how the server combines model updates received from clients to generate a new global model. Common aggregation strategies include FedAvg, FedProx, and variations thereof.

By implementing these two components and registering them within the benchmark configuration, you can seamlessly extend VeryFL with custom federated learning algorithms, further enhancing its utility as a research and development platform.
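
As a rough skeleton of these two components (class names, constructor arguments, and method signatures below are illustrative rather than VeryFL's exact base-class interfaces):

# Hypothetical skeletons for a custom FL algorithm.
import copy
import torch

class MyTrainer:
    """Client side: one round of local training on the client's data."""
    def __init__(self, model, dataloader, train_args):
        self.model, self.dataloader, self.args = model, dataloader, train_args

    def train(self):
        optimizer = torch.optim.SGD(self.model.parameters(), lr=self.args["lr"])
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(self.args["local_epoch"]):
            for x, y in self.dataloader:
                optimizer.zero_grad()
                loss_fn(self.model(x), y).backward()
                optimizer.step()
        return copy.deepcopy(self.model.state_dict())

class MyAggregator:
    """Server side: combine client updates into a new global model (here, a plain mean)."""
    def aggregate(self, client_states):
        return {k: torch.stack([s[k].float() for s in client_states]).mean(dim=0)
                for k in client_states[0]}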

Add New On-chain Mechanisms

VeryFL provides a clear pathway to integrate and experiment with new on-chain mechanisms using blockchain technology. To add new blockchain-based functionalities, follow these steps:

  1. Implement Smart Contracts (Solidity): Develop the desired on-chain logic using Solidity within the ./chainEnv/contracts directory. This involves writing smart contracts that define the rules and functionalities of your blockchain-based mechanism, such as managing participant identities, handling incentive distribution, or recording model updates.
  2. Deploy Smart Contracts: Modify the network startup scripts (within ./chainfl/interact or related setup files) to deploy your newly created smart contracts to the embedded Ethereum network when VeryFL is initialized. This ensures that your smart contracts are active and accessible during federated learning experiments.
  3. Wrap Function Calls (Brownie SDK): Create a wrapper class or functions within the ./chainfl/interact directory using the Brownie SDK (a sketch of such a wrapper follows this list). This wrapper provides a Python interface to your deployed smart contracts, abstracting away the complexities of blockchain interaction so that contract functions can be called directly from the Python-based federated learning code.
  4. Interact with Blockchain during Training: Integrate calls to your Brownie SDK wrapper functions within the federated learning training loop (e.g., in the Trainer or Aggregator classes). This allows you to trigger on-chain operations, such as recording model updates on the blockchain or verifying participant contributions, at relevant points during the federated learning process. This seamless integration of on-chain mechanisms makes VeryFL a powerful blockchain federated learning model benchmark for exploring decentralized and secure federated learning paradigms.
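
The fragment below sketches what such a Brownie wrapper (step 3) might look like. The contract name ModelRegistry, its recordModelHash method, and the project path are invented for illustration and are not VeryFL's actual contracts or interfaces.

# Hypothetical Brownie wrapper around a deployed contract (names are invented).
from brownie import accounts, network, project

class ModelRegistryWrapper:
    def __init__(self, project_path="./chainEnv"):
        self.proj = project.load(project_path)     # compile and load the Brownie project
        if not network.is_connected():
            network.connect("development")         # local Ganache chain
        self.contract = self.proj.ModelRegistry.deploy({"from": accounts[0]})

    def record_model_hash(self, round_id, model_hash):
        # Send a transaction that stores the hash of the global model for this round.
        return self.contract.recordModelHash(round_id, model_hash, {"from": accounts[0]})

A call such as wrapper.record_model_hash(round_id, model_hash) could then be placed in the Aggregator after each global round, which is exactly the kind of integration described in step 4.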

Related Articles

For deeper insights into the design, applications, and theoretical underpinnings of VeryFL and related concepts, refer to the following research articles:

[1] [VeryFL Design] VeryFL: A Verify Federated Learning Framework Embedded with Blockchain (arXiv) – This paper details the architecture and design principles behind the VeryFL framework.

[2] [Model Copyright] Tokenized Model: A Blockchain-Empowered Decentralized Model Ownership Verification Platform (arXiv) – This article explores the concept of tokenized models and blockchain-based mechanisms for model copyright protection, a feature demonstrated in VeryFL.

[3] [Overall Background] Towards Reliable Utilization of AIGC: Blockchain-Empowered Ownership Verification Mechanism (OJCS 2023) – This paper provides broader context on using blockchain to enhance the reliability and trustworthiness of AI-generated content, including federated learning models.

[4] [Using VeryFL] A Decentralized Federated Learning Framework via Committee Mechanism with Convergence Guarantee (TPDS 2022) – This research uses VeryFL to implement and evaluate a decentralized federated learning framework based on a committee mechanism, showcasing practical applications of VeryFL.
