Quick Start

This section walks you through running a method in FedPruning with multi-GPU, multi-process execution. As the example, we use a simple Federated Dynamic Pruning method, FedTiny-clean.

Step 1: Check Available GPUs

First, determine how many GPUs are available on your machine and their memory status. You can use either gpustat or nvidia-smi to check. In this example, we have four NVIDIA RTX 5880 GPUs:

> gpustat
[0] NVIDIA RTX 5880 Ada Generation | 38°C,   0 % |     1 / 49140 MB |
[1] NVIDIA RTX 5880 Ada Generation | 38°C,   0 % |     1 / 49140 MB |
[2] NVIDIA RTX 5880 Ada Generation | 40°C,   0 % |     1 / 49140 MB |
[3] NVIDIA RTX 5880 Ada Generation | 41°C,   0 % |     1 / 49140 MB |
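
If you prefer to check the GPUs from a script, the same information can be read from the CSV query output of nvidia-smi. The following is only a minimal sketch: the helper names parse_gpu_csv and query_gpus are hypothetical and not part of FedPruning, and it assumes nvidia-smi is available on the PATH.

```python
import subprocess

# nvidia-smi CSV query; with "nounits", memory values are plain MiB numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV rows into one dict per GPU (hypothetical helper)."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list (hypothetical helper)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)
```

Calling len(query_gpus()) gives the number of GPUs, which is the length of the list you will later write in gpu_mapping.yaml.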

Step 2: Download the Dataset

To download a training dataset such as CIFAR-10, run:

cd data/cifar10
sh download_cifar10.sh
cd ../../

This will download and prepare the CIFAR-10 dataset in the data/cifar10 directory for training.

Step 3: Set the Processes

Navigate to the directory containing the method you want to run. For FedAVG, this would be:

cd experiments/distributed/fedavg

Decide how many processes you plan to use. The number of processes is client_num_per_round + 1, where client_num_per_round is the number of clients selected per round and the additional process is for the server. Set this number of processes in gpu_mapping.yaml.

For example, if client_num_per_round is set to 10, you should configure 11 processes in gpu_mapping.yaml. On a machine with four GPUs, the per-GPU process list would be [3, 3, 3, 2].

In this list:

  • The elements represent the number of processes assigned to each GPU.

  • The length of the list indicates the number of GPUs available.

In the above example, three processes will run on each of the first three GPUs, and two will run on the fourth GPU.
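
The arithmetic behind such a list can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and not part of FedPruning, and gpu_for_rank assumes processes are placed on GPUs in list order, which this guide does not state explicitly.

```python
def required_processes(client_num_per_round):
    """One process per selected client, plus one for the server."""
    return client_num_per_round + 1


def spread_processes(total, num_gpus):
    """Split `total` processes over `num_gpus` GPUs as evenly as possible."""
    base, extra = divmod(total, num_gpus)
    # The first `extra` GPUs take one more process than the rest.
    return [base + 1 if gpu < extra else base for gpu in range(num_gpus)]


def gpu_for_rank(rank, per_gpu_counts):
    """Return the GPU index for a process rank, filling GPUs in list order (assumption)."""
    for gpu, count in enumerate(per_gpu_counts):
        if rank < count:
            return gpu
        rank -= count
    raise ValueError("rank exceeds the number of configured processes")


total = required_processes(10)        # 10 clients + 1 server
mapping = spread_processes(total, 4)  # one entry per GPU
# The list must account for every process exactly once.
assert sum(mapping) == total
print(mapping)
```

An even split is just one reasonable choice. If some GPUs already hold other jobs, you can assign them fewer processes, as long as the entries still sum to client_num_per_round + 1.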

Step 4: Set the Arguments and Run the Code

To run the code, we will use a bash script. You can find details about the arguments and their meanings in the files run_fedtinyclean_distributed_pytorch.sh and main_fedtinyclean.py. By default, you can control these arguments directly through the terminal when using the bash script:

Arguments in [] are mandatory; arguments in {} are optional.

where

For example, if you want to evaluate the ResNet18 model on the CIFAR-10 dataset, you can run the following command:

If you are logged into WandB and its status is set to online, you can view the evaluation results in your WandB workspace.
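
One quick way to see whether a run is likely to upload results is to inspect the WANDB_MODE environment variable, which WandB reads at startup. The sketch below is a rough check under that assumption only: the function names are hypothetical, it treats any value other than online as not uploading, and it cannot see a mode set in code or in WandB settings files, nor whether you are logged in.

```python
import os


def wandb_mode(env=None):
    """Return the WandB mode requested via the environment, defaulting to 'online'."""
    env = os.environ if env is None else env
    return env.get("WANDB_MODE", "online").lower()


def results_will_upload(env=None):
    """True when the environment does not switch WandB away from online mode."""
    return wandb_mode(env) == "online"
```

If results_will_upload() returns False, unset WANDB_MODE or set it to online before launching the script.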
